It’s time for a short lecture on complexity…

It’s time for a short lecture on complexity.

Networks are complex. This should not be surprising, as building a system that can solve hard problems, while also adapting quickly to changes in the real world, requires complexity—the harder the problem, the more adaptable the system needs to be, the more resulting design will tend to be. Networks are bound to be complex, because we expect them to be able to support any application we throw at them, adapt to fast-changing business conditions, and adapt to real-world failures of various kinds.

There are several reactions I’ve seen to this reality over the years, each of which has their own trade-offs.

The first is to cover the complexity up with abstractions. Here we take a massively complex underlying system and “contain” it within certain bounds so the complexity is no longer apparent. I can’t really make the system simpler, so I’ll just make the system simpler to use. We see this all the time in the networking world, including things like intent driven replacing the command line with a GUI, and replacing the command line with an automation system. The strong point of these kinds of solutions is they do, in fact, make the system easier to interact with, or (somewhat) encapsulate that “huge glob of legacy” into a module so you can interface with it in some way that is not… legacy.

One negative side of these kinds of solutions, however, is that they really don’t address the complexity, they just hide it. Many times hiding complexity has a palliative effect, rather than a real world one, and the final state is worse than the starting state. Imagine someone who has back pain, so they take pain-killers, and then go back to the gym to life even heavier weights than they have before. Covering the pain up gives them the room to do more damage to their bodies—complexity, like pain, is sometimes a signal that something is wrong.

Another negative side effect of this kind of solution is described by the law of leaky abstractions: all nontrivial abstractions leak. I cannot count the number of times engineers have underestimated the amount of information that leaks through an abstraction layer and the negative impacts such leaks will have on the overall system.

The second solution I see people use on a regular basis is to agglutinate multiple solutions into a single solution. The line of thinking here is that reducing the number of moving parts necessarily makes the overall system simpler. This is actually just another form of abstraction, and it normally does not work. For instance, it’s common in data center designs to have a single control plane for both the overlay and underlay (which is different than just not having an overlay!). This will work for some time, but at some level of scale it usually creates more complexity, particularly in trying to find and fix problems, than it solves in reducing configuration effort.

As an example, consider if you could create some form of wheel for a car that contained its own little engine, braking system, and had the ability to “warp” or modify its shape to produce steering effects. The car designer would just provide a single fixed (not moving) attachment point, and let the wheel do all the work. Sounds great for the car designer, right? But the wheel would then be such a complex system that it would be near impossible to troubleshoot or understand. Further, since you have four wheels on the car, you must somehow allow them to communicate, as well as having communication to the driver to know what to do from moment to moment, etc. The simplification achieved by munging all these things into a single component will ultimately be overcome by complexity built around the “do-it-all” system to make the whole system run.

Or imagine a network with a single transport protocol that does everything—host-to-host, connection-oriented, connectionless, encrypted, etc. You don’t have to think about it long to intuitively know this isn’t a good idea.

An example for the reader: Geoff Huston joins the Hedge this week to talk about DNS over HTTPS. Is this an example of munging systems together than shouldn’t be munged together? Or is this a clever solution to a hard problem? Listen to the two episodes and think it through before answering—because I’m not certain there is a clear answer to this question.

Finally, what a lot of people do is toss the complexity over the cubicle wall. Trust me, this doesn’t work in the long run–the person on the other side of the wall has a shovel, too, and they are going to be pushing complexity at you as fast as they can.

There are no easy solutions to solving complexity. The only real way to deal with these problems is by looking at the network as part of a larger system including applications, the business environment, and many other factors. Then figure out what needs to be done, how to divide the work up (where the best abstraction points are), and build replaceable components that can solve each of these problems while leaking the least amount of information, and are internally as simple as possible.

Every other path leads to building more complex, brittle systems.