If you haven’t found the tradeoffs, you haven’t looked hard enough. Something I say rather often—as Eyvonne would say, a “Russism.” Fair enough, and it’s easy enough to say “if you haven’t found the tradeoffs, you haven’t looked hard enough,” but what does it mean, exactly? How do you apply this to the everyday world of designing, deploying, operating, and troubleshooting networks?
Humans tend to extremes in their thoughts. In many cases, we end up considering everything a zero-sum game, where any gain on the part of someone else means an immediate and opposite loss on my part. In others, we end up thinking we are going to get a free lunch. The reality is there is no such thing as a free lunch, and while there are situations that are a zero-sum game, not all situations are. What we need is a way to “cut the middle” to realistically appraise each situation and realistically decide what the tradeoffs might be.
If you are looking for a good resolution for 2020 still (I know, it’s a bit late), you can’t go wrong with this one: this year, I will focus on making the networks and products I work on truly simpler. . . We need to go beyond just figuring out how to make the user interface simpler, more “intent-driven,” automated, or whatever it is. We need to think of the network as a system, rather than as a collection of bits and bobs that we’ve thrown together across the years. We need to think about the modules horizontally and vertically, think about how they interact, understand how each piece works, understand how each abstraction leaks, and be able to ask hard questions.
We normally encounter four different kinds of addresses in an IP network. We tend to assign specific purposes to each one. There are other address-like things, of course, such as the protocol number, a router ID, an MPLS label, etc. But let’s stick to these four for the moment. Looking through this list, the first thing you should notice is we often use the IP address as if it identified a host—which is generally not a good thing. There have been some efforts in the past to split the locator from the identifier, but the IP protocol suite was designed with a separate locator and identifier already: the IP address is the location and the DNS name is the identifier.
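To make the locator/identifier split concrete, here is a minimal sketch. It assumes a hypothetical in-memory name table standing in for DNS; the host name, the addresses (taken from the documentation ranges), and the helper functions are all illustrative and not part of any real resolver API. The point is only that the name stays fixed while the address changes when the host moves.

```python
import ipaddress

# Hypothetical name table standing in for DNS: identifier -> current locator.
name_table = {"db1.example.com": ipaddress.ip_address("192.0.2.10")}


def locate(identifier):
    """Resolve a stable identifier (a name) to its current locator (an IP address)."""
    return name_table[identifier]


def move_host(identifier, new_address):
    """Moving a host changes its locator; the identifier is untouched."""
    name_table[identifier] = ipaddress.ip_address(new_address)


before = locate("db1.example.com")
move_host("db1.example.com", "198.51.100.7")
after = locate("db1.example.com")

# Same identifier, two different locators over time.
print("db1.example.com:", before, "->", after)
```

Anything that stored the address rather than the name would now be pointing at the wrong place, which is the cost of treating the IP address as if it identified the host.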
A few weeks ago, I was in the midst of a conversation about EVPNs, how they work, and the use cases for deploying them, when one of the participants exclaimed: “This is so complicated… why don’t we stick with the older way of doing things with multi-chassis link aggregation and virtual chassis devices?” Sometimes it does seem like we create complex solutions when a simpler solution is already available. Since simpler is always better, why not just use the simpler one? After all, simpler solutions are easier to understand, which means they are easier to deploy and troubleshoot.
The problem is we too often forget the other side of the simplicity equation—complexity is required to solve hard problems and adapt to demanding environments. While complex systems can be fragile (primarily through ossification), simple solutions can flat out fail just because they can’t cope with changes in their environment.
One “sideways” place to look for value in the network is somewhere that initially seems far away from infrastructure: data gravity. Data gravity is not something you might often think about directly when building or operating a network, but it is something you think about indirectly. For instance, speeds and feeds, quality of service, and convergence time are all side effects, in one way or another, of data gravity.
As with all things in technology (and life), data gravity is not one thing, but two, one good and one bad—and there are tradeoffs. Because if you haven’t found the tradeoffs, you haven’t looked hard enough. All of this is, in turn, related to the CAP Theorem.
Data gravity is, first, a relationship between applications and data location.
Two things seem to be universally true in the network engineering space right now. The first is that network engineers are convinced their jobs will not exist, or that there will only be network engineers “in the cloud,” within the next five years. The second is a mad scramble to figure out how to add value to the business through the network. These two movements are, of course, mutually exclusive visions of the future. If there is absolutely no way to add value to a business through the network, then it only makes sense to outsource the whole mess to a utility-level provider.
The result, far too often, is for the folks working on the network to run around like they’ve been in the hot aisle so long that their hair is on fire. This result, however, somehow seems less than ideal.
A post on Martin Fowler’s blog this week started me thinking about lock-in—building a system that only allows you to purchase components from a single vendor so long as the system is running. The point of Martin’s piece is that lock-in exists in all systems, even open source, and hence you need to look at lock-in as a set of tradeoffs, rather than always being a negative outcome. Given that lock-in is a tradeoff, and that lock-in can happen regardless of the systems you decide to deploy, I want to go back to one of the foundational points Martin makes in his post and think about avoiding lock-in a little differently than just choosing between open source and vendor-based solutions.
If you cannot avoid lock-in either by choosing a vendor-based solution or by choosing open source, then you have two choices. The first is to just give up and live with the results of lock-in. In fact, I have worked with a lot of companies who have done just this: they have accepted that lock-in is just a part of building networks, that lock-in results in a good transfer of risk to the vendor from the operator, or that lock-in results in a system that is easier to deploy and manage.
Giving in to lock-in, though, does not seem like a good idea on the surface, because architecture is about creating opportunity. If you cannot avoid lock-in and yet lock-in is antithetical to good architecture, what are your other options?
I recently spoke at CHINOG on the business value of disaggregation, and participated in a panel on getting involved in the IETF.
Over at the Communications of the ACM, Micah Beck has an article up about the hourglass model. While the math is quite interesting, I want to focus on transferring the observations from the realm of protocol and software systems development to network design. Specifically, start with the concept and terminology, which is very useful. Taking a typical design, such as this—
The first key point made in the paper is this—
The thin waist of the hourglass is a narrow straw through which applications can draw upon the resources that are available in the less restricted lower layers of the stack.
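One way to picture the thin waist is as a single narrow function that every application must call and every lower layer must sit behind. The sketch below is an illustration only, not code from the paper: the two link types, the two applications, and the `waist_send` name are hypothetical.

```python
from typing import Callable, Dict, List


# Hypothetical lower layers: each carries bytes in its own way.
def ethernet_carry(payload: bytes) -> str:
    return f"ethernet:{len(payload)}"


def wifi_carry(payload: bytes) -> str:
    return f"wifi:{len(payload)}"


LOWER_LAYERS: Dict[str, Callable[[bytes], str]] = {
    "ethernet": ethernet_carry,
    "wifi": wifi_carry,
}


def waist_send(link: str, payload: bytes) -> str:
    """The thin waist: the only operation applications may use to reach a lower layer."""
    return LOWER_LAYERS[link](payload)


# Hypothetical applications: both are written against the waist, not against a link.
def send_mail(link: str, text: str) -> str:
    return waist_send(link, text.encode())


def send_page(link: str, html: str) -> str:
    return waist_send(link, html.encode())


# Two applications and two links yield four working combinations
# with no application-to-link code written for any pair.
results: List[str] = [
    app(link, "hello")
    for app in (send_mail, send_page)
    for link in ("ethernet", "wifi")
]
print(results)
```

Adding a third link type means one new entry below the waist, and adding a third application means one new function above it; neither side has to know the other changed. The cost is that applications can only draw on what the narrow interface exposes.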
DevOps Research and Assessment (DORA) released their 2018 Accelerate report on the state of DevOps at the end of 2018; I’m a little behind in my reading, so I just got around to reading it, and trying to figure out how to apply their findings to the infrastructure (networking) side of the world.
DORA found organizations that outsource entire functions, such as building an entire module or service, tend to perform more poorly than organizations that outsource by integrating individual developers into existing internal teams (page 43). It is surprising companies still think outsourcing entire functions is a good idea, given the many years of experience the IT world has with the failures of this model. Outsourced components, it seems, too often become a bottleneck in the system, especially as contracts constrain your ability to react to real-world changes.