We normally encounter four different kinds of addresses in an IP network. We tend to assign specific purposes to each one. There are other address-like things, of course, such as the protocol number, a router ID, an MPLS label, etc. But let’s stick to these four for the moment. Looking through this list, the first thing you should notice is we often use the IP address as if it identified a host—which is generally not a good thing. There have been some efforts in the past to split the locator from the identifier, but the IP protocol suite was designed with a separate locator and identifier already: the IP address is the location and the DNS name is the identifier.
A few weeks ago, I was in the midst of a conversation about EVPNs, how they work, and the use cases for deploying them, when one of the participants exclaimed: “This is so complicated… why don’t we stick with the older way of doing things with multi-chassis link aggregation and virtual chassis device?” Sometimes it does seem like we create complex solutions when a simpler solution is already available. Since simpler is always better, why not just use them? After all, simpler solutions are easier to understand, which means they are easier to deploy and troubleshoot.
The problem is we too often forget the other side of the simplicity equation—complexity is required to solve hard problems and adapt to demanding environments. While complex systems can be fragile (primarily through ossification), simple solutions can flat out fail just because they can’t cope with changes in their environment.
One “sideways” place to look for value in the network is in a place that initially seems far away from infrastructure, data gravity. Data gravity is not something you might often think about directly when building or operating a network, but it is something you think about indirectly. For instance, speeds and feeds, quality of service, and convergence time are all three side effects, in one way or another, of data gravity.
As with all things in technology (and life), data gravity is not one thing, but two, one good and one bad—and there are tradeoffs. Because if you haven’t found the tradeoffs, you haven’t looked hard enough. All of this is, in turn, related to the CAP Theorem.
Data gravity is, first, a relationship between applications and data location.
Two things which seem to be universally true in the network engineering space right this moment. The first is that network engineers are convinced their jobs will not exist or there will only be network engineers “in the cloud” within the next five years. The second is a mad scramble to figure out how to add value to the business through the network. These two movements are, of course, mutually exclusive visions of the future. If there is absolutely no way to add value to a business through the network, then it only makes sense to outsource the whole mess to a utility-level provider.
The result, far too often, is for the folks working on the network to run around like they’ve been in the hot aisle so long that your hair is on fire. This result, however, somehow seems less than ideal.
A post on Martin Fowler’s blog this week started me thinking about lock-in—building a system that only allows you to purchase components from a single vendor so long as the system is running. The point of Martin’s piece is that lock-in exists in all systems, even open source, and hence you need to look at lock-in as a set of tradeoffs, rather than always being a negative outcome. Given that lock-in is a tradeoff, and that lock-in can happen regardless of the systems you decide to deploy, I want to go back to one of the foundational points Martin makes in his post and think about avoiding lock-in a little differently than just choosing between open source and vendor-based solutions.
If cannot avoid lock-in either by choosing a vendor-based solution or by choosing open source, then you have two choices. The first is to just give up and live with the results of lock-in. In fact, I have worked with a lot of companies who have done just this—they have accepted that lock-in is just a part of building networks, that lock-in results in a good transfer of risk to the vendor from the operator, or that lock-in results in a system that is easier to deploy and manage.
Giving in to lock-in, though, does not seem like a good idea on the surface, because architecture is about creating opportunity. If you cannot avoid lock-in and yet lock-in is antithetical to good architecture, what are your other options?
I recently spoke at CHINOG on the business value of disaggregation, and participated in a panel on getting involved in the IETF.
Over at the Communications of the ACM, Micah Beck has an article up about the hourglass model. While the math is quite interesting, I want to focus on transferring the observations from the realm of protocol and software systems development to network design. Specifically, start with the concept and terminology, which is very useful. Taking a typical design, such as this—
The first key point made in the paper is this—
The thin waist of the hourglass is a narrow straw through which applications can draw upon the resources that are available in the less restricted lower layers of the stack.
DevOps Research and Assessment (DORA) released their 2018 Accelerate report on the state of DevOps at the end of 2018; I’m a little behind in my reading, so I just got around to reading it, and trying to figure out how to apply their findings to the infrastructure (networking) side of the world.
DORA found organizations that outsource entire functions, such as building an entire module or service, tend to perform more poorly than organizations that outsource by integrating individual developers into existing internal teams (page 43). It is surprising companies still think outsourcing entire functions is a good idea, given the many years of experience the IT world has with the failures of this model. Outsourced components, it seems, too often become a bottleneck in the system, especially as contracts constrain your ability to react to real-world changes.
One thing we often forget in our drive to “more” is this—context matters. Ivan hit on this with a post about the impact of controller failures in software-defined networks just a few days ago: A controller (or management/provisioning) system is obviously the central point of failure in any network, but we have to go beyond…
Way back in the day, when telephone lines were first being installed, running the physical infrastructure was quite expensive. The first attempt to maximize the infrastructure was the party line. In modern terms, the party line is just an Ethernet segment for the telephone. Anyone can pick up and talk to anyone else who happens to be listening. In order to schedule things, a user could contact an operator, who could then “ring” the appropriate phone to signal another user to “pick up.” CSMA/CA, in essence, with a human scheduler.
This proved to be somewhat unacceptable to everyone other than various intelligence agencies, so the operator’s position was “upgraded.” A line was run to each structure (house or business) and terminated at a switchboard. Each line terminated into a jack, and patch cables were supplied to the operator, who could then connect two telephone lines by inserting a jumper cable between the appropriate jacks.
An important concept: this kind of operator driven system is nonblocking. If Joe calls Susan, then Joe and Susan cannot also talk to someone other than one another for the duration of their call. If Joe’s line is tied up, when someone tries to call him, they will receive a busy signal. The network is not blocking in this case, the edge is—because the node the caller is trying to reach is already using 100% of its available bandwidth for an existing call. Blocking networks did exist in the form of trunk connections, or connections between these switch panels. Trunk connections not only consume ports on the switchboard, they are expensive to build, and they require a lot of power to run. Hence, making a “long distance call” costs money because it consumes a blocking resource. It is only when we get to packet switched digital networks that the cost of a “long distance call” drops to the rough equivalent of a “normal” call, and we see “long distance” charges fade into memory (many of my younger readers have never been charged for “long distance calls,” in fact, and may not even know what I’m talking about).