Design Intelligence from the Hourglass Model

Over at the Communications of the ACM, Micah Beck has an article up about the hourglass model. While the math is quite interesting, I want to focus on transferring the observations from the realm of protocol and software systems development to network design. Specifically, start with the concept and terminology, which is very useful. Taking a typical design, such as this—

The first key point made in the paper is this—

The thin waist of the hourglass is a narrow straw through which applications can draw upon the resources that are available in the less restricted lower layers of the stack.

A somewhat obvious point to be made here is applications can only use services available in the spanning layer, and the spanning layer can only build those services out of the capabilities of the supporting layers. If fewer applications need to be supported, or the applications deployed do not require a lot of “fancy services,” a weaker spanning layer can be deployed. Based on this, the paper observes—

The balance between more applications and more supports is achieved by first choosing the set of necessary applications N and then seeking a spanning layer sufficient for N that is as weak as possible. This scenario makes the choice of necessary applications N the most directly consequential element in the process of defining a spanning layer that meets the goals of the hourglass model.

Beck calls the weakest possible spanning layer to support a given set of applications the minimally sufficient spanning layer (MSSL). There is one thing that seems off about this definition, however—the correlation between the number of applications supported and the strength of the spanning layer. There are many cases where a network supports thousands of applications, and yet the network itself is quite simple. There are many other cases where a network supports just a few applications, and yet the network is very complex. It is not the number of applications that matter, it is the set of services the applications demand from the spanning layer.

Based on this, we can change the definition slightly: an MSSL is the weakest spanning layer that can provide the set of services required by the applications it supports. This might seem intuitive or obvious, but it is often useful to work these kinds of intuitive things out, so they can be expressed more precisely when needed.

First lesson: the primary driver in network complexity is application requirements. To make the network simpler, you must reduce the requirements applications place on the network.

There are, however, several counter-intuitive cases here. For instance, TCP is designed to emulate (or abstract) a circuit between two hosts—it creates what appears to be a flow controlled, error free channel with no drops on top of IP, which has no flow control, and drops packets. In this case, the spanning layer (IP), or the wasp waist, does not support the services the upper layer (the application) requires.

In order to make this work, TCP must add a lot of complexity that would normally be handled by one of the supporting layers—in fact, TCP might, in some cases, recreate capabilities available in one of the supporting layers, but hidden by the spanning layer. There are, as you might have guessed, tradeoffs in this neighborhood. Not only are the mechanisms TCP must use more complex that the ones some supporting layer might have used, TCP represents a leaky abstraction—the underlying connectionless service cannot be completely hidden.

Take another instance more directly related to network design. Suppose you aggregate routing information at every point where you possibly can. Or perhaps you are using BGP route reflectors to manage configuration complexity and route counts. In most cases, this will mean information is flowing through the network suboptimally. You can re-optimize the network, but not without introducing a lot of complexity. Further, you will probably always have some form of leaky abstraction to deal with when abstracting information out of the network.

Second lesson: be careful when stripping information out of the spanning layer in order to simplify the network. There will be tradeoffs, and sometimes you end up with more complexity than what you started with.

A second counter-intuitive case is that of adding complexity to the supporting layers in order to ultimately simplify the spanning layer. It seems, on the model presented in the paper, that adding more services to the spanning layer will always end up adding more complexity to the entire system. MPLS and Segment Routing (SR), however, show this is not always true. If you need traffic steering, for instance, it is easier to implement MPLS or SR in the support layer rather than trying to emulate their services at the application level.

Third lesson: sometimes adding complexity in a lower layer can simplify the entire system—although this might seem to be counter-intuitive from just examining the model.

The bottom line: complexity is driven by applications (top down), but understanding the full stack, and where interactions take place, can open up opportunities for simplifying the overall system. The key is thinking through all parts of the system carefully, using effective mental models to understand how they interact (interaction surfaces), and the considering the optimization tradeoffs you will make by shifting state to different places.