The 4D Network

I think we can all agree networks have become too complex. This complexity is largely a result of the network becoming the “final dumping ground” for every problem that seems like it might impact more than one system, or for everything no one else can figure out how to solve. It’s rather humorous, in fact, to watch a lot of server and application folks sit around saying “this networking stuff is so complex—let’s design something better and simpler in our bespoke overlay…” and then fall into the same complexity traps once they start facing the real problems of policy and scale.

This complexity cannot be “automated away.” It can be smeared over with intent, but we’re going to find—soon enough—that smearing intent on top of complexity just makes for a dirty kitchen and a sub-standard meal.

While this is always “top of mind” in my world, what brings it to mind this particular week is a paper by Jen Rexford et al. (I know Jen isn’t in the lead position on the author list, but still…) called A Clean Slate 4D Approach to Network Control and Management. Of course, I can appreciate the paper in part because I agree with a lot of what’s being said here. For instance—

We believe the root cause of these problems lies in the control plane running on the network elements and the management plane that monitors and configures them. In this paper, we argue for revisiting the division of functionality and advocate an extreme design point that completely separates a network’s decision logic from the protocols that govern interaction of network elements.

In other words, we’ve not done our modularization homework very well, and that lack of focus is adding a lot of unnecessary complexity to our world. The four planes proposed in the paper are decision, dissemination, discovery, and data. The decision plane drives network control, including reachability, load balancing, and access control. The dissemination plane “provides a robust and efficient communication substrate” across which the other planes can send information. The discovery plane “is responsible for discovering the physical components of the network,” giving each item an identifier, etc. The data plane carries packets edge-to-edge.
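To make the decomposition concrete, here is a minimal sketch of the four planes expressed as Go interfaces. Every name and signature here is my own illustration, since the paper defines an architecture, not an API.

```go
package fourd

// Illustrative types only; the paper does not define an API.
type DeviceID string

type Device struct{ ID DeviceID }

type Link struct{ A, B DeviceID }

type ForwardingEntry struct {
	Device  DeviceID
	Prefix  string
	NextHop DeviceID
}

// DiscoveryPlane learns what physically exists: devices, links, and
// an identifier for each item. No policy lives here.
type DiscoveryPlane interface {
	Devices() []Device
	Links() []Link
}

// DisseminationPlane is the robust communication substrate the other
// planes use to move information, independent of data-plane state.
type DisseminationPlane interface {
	Send(to DeviceID, msg []byte) error
}

// DecisionPlane holds all network-wide logic: reachability, load
// balancing, and access control. It consumes discovery output and
// produces forwarding state.
type DecisionPlane interface {
	Compute(devices []Device, links []Link) []ForwardingEntry
}

// DataPlane simply carries packets edge-to-edge using whatever state
// the decision plane installed.
type DataPlane interface {
	Install(entries []ForwardingEntry) error
}
```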

I do have some quibbles with this architecture, of course. To begin, I’m not certain “plane” is the right word here. Maybe “layers,” or something else that implies a modular concept with interactions, rather than a “ships in the night” sort of affair. My more substantial disagreement is with where “interface configuration” and reachability are placed in the model.

Consider this: reachability and access control are, in a sense, two sides of the same coin. You learn where something is to make it reachable, and then you block access to it by hiding reachability from specific places in the network. There are two ways to control reachability: by hiding the destination, or by blocking traffic sent to the destination. Each of these has positive and negative aspects.
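To make the two options concrete, here is a minimal sketch in Go; the types, names, and the trade-offs noted in the comments are one common way to frame this, not drawn from the paper. Hiding never advertises the route at all, while blocking advertises it but filters the packets.

```go
package reach

// Hypothetical types: an advertised route and a data-plane filter.
type Route struct{ Prefix string }

type Filter struct{ DenyTo string }

// HideDestination suppresses the advertisement entirely. The positive
// side is that no state about the destination leaks; the negative is
// that any path leaked around the suppression restores reachability.
func HideDestination(routes []Route, prefix string) []Route {
	var out []Route
	for _, r := range routes {
		if r.Prefix != prefix {
			out = append(out, r)
		}
	}
	return out
}

// BlockTraffic leaves the route in place but filters packets. The
// destination is still visible to everyone, but the filter holds no
// matter how the traffic arrives.
func BlockTraffic(filters []Filter, prefix string) []Filter {
	return append(filters, Filter{DenyTo: prefix})
}
```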

But notice the paradox: the control plane cannot hide reachability towards something it does not know about. You must know about something to prevent someone from reaching it. So while reachability and access control are two sides of the same coin, they pull in opposite directions; access control relies on reachability to do its job.

To solve this paradox, I would put reachability into discovery rather than decision. Discovery would then cover physical devices, paths, and reachability through the network. No policy would live here; discovery would just determine what exists. All the policy about what to expose, out of everything that exists, would live in the decision plane.
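Here is a minimal sketch of that division of labor, again with made-up names: discovery produces the full set of facts, and the decision plane applies policy to decide which of those facts each part of the network gets to see.

```go
package decision

// Fact is something discovery learned: a prefix exists at a device.
// Discovery records everything; it applies no policy.
type Fact struct {
	Prefix string
	Device string
}

// Exposed applies decision-plane policy to raw discovery output for a
// given zone of the network. Note the dependency direction: policy can
// only hide (or block) a destination discovery already knows about.
func Exposed(all []Fact, zone string, allow func(zone string, f Fact) bool) []Fact {
	var out []Fact
	for _, f := range all {
		if allow(zone, f) {
			out = append(out, f)
		}
	}
	return out
}
```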

While the paper implies a network built on these principles must wait for some future day, I think you can get pretty close today. My “ideal design” for a data center fabric right now is (modified) IS-IS or RIFT in the underlay and eVPN in the overlay, with a set of controllers sitting on top. Why?

IS-IS doesn’t rely on IP, so it can serve in a mostly pure discovery role, telling the upper layers where things are and what is changing. eVPN can provide segmented reachability on top of this, along with any other policies. Controllers can manage the eVPN configuration to implement intent, while a separate controller works with IS-IS on inventory and lifecycle for installed hardware. This creates a clean “break” between the infrastructure underlay and the overlays, and pretty much eliminates any dependence the underlay has on the overlay. Replacing the BGP-based overlay with a more SDN-ish solution is perfectly doable and reasonable.
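To illustrate the shape of that break (this is a sketch of the architecture, not any real protocol’s API), the underlay can be modeled as a one-way event feed the overlay controllers consume:

```go
package fabric

// TopoEvent is the only thing the underlay knows how to say: a node
// or link appeared or went away. No tenant or policy state.
type TopoEvent struct {
	Node string
	Up   bool
}

// Underlay exposes discovery as a receive-only stream. The overlay can
// consume it but can never push state back down; that one-way arrow is
// the "clean break" described above.
type Underlay interface {
	Events() <-chan TopoEvent
}

// OverlayController renders tenant intent into overlay configuration
// (eVPN today, something more SDN-ish tomorrow), reacting to underlay
// events without the underlay ever knowing it exists.
type OverlayController struct{}

func (c *OverlayController) Run(u Underlay) {
	for ev := range u.Events() {
		// Recompute overlay reachability for segments touching ev.Node.
		_ = ev
	}
}
```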

While not perfect, the link-state underlay/BGP overlay model comes pretty close to implementing what Jen and her co-authors are describing, using protocols we have around today (although with some modification).

But the main point of this paper stands: a lot of the complexity we face today exists simply because we modularize using aggregation and summarization and call the job “done.” We aren’t thinking about the network as a system, but rather as a bunch of individual appliances we slap together into places, which we then connect through other places (or strings, like between tin cans).

Something to think about the next time you get that 2AM call.