Leaky Abstractions

Much of our life, as engineers, is about building, manipulating, and using abstractions. For instance, C is nothing but an abstraction on top of the actual register set provided by a particular processor. HTML is nothing but an abstraction for formatting and display (a markup language), implemented in — well, C. There is a lot of power in such abstractions, of course. Without them we couldn’t build operating systems, applications, browsers, web pages — or networks.

Ethernet is an abstraction of electronic signals (anyone remember Manchester Encoding?). IP is an abstraction of every physical layer in the world. TCP is a simulation, or abstraction, of a reliable, connection-oriented link over (completely unreliable) IP. HTTP is an abstraction of a flow of information, a stream, between two computers. It’s all abstractions; as the philosopher might say, “it’s abstractions all the way down.” So what’s wrong with this?

All abstractions are leaky. What do I mean when I say abstractions are leaky? Let’s turn to the originator of the phrase, Joel Spolsky:

Abstractions fail. Sometimes a little, sometimes a lot. There’s leakage. Things go wrong. It happens all over the place when you have abstractions.

This is the “other side” of the coin of unintended consequences, so to speak — or perhaps the underlying “root cause” of “Murphy’s Law.” The problem is, of course, that networks are one huge abstraction, layer to layer and end to end. Let’s consider a few examples:

  • Summarization. We use summarization to break up failure domains, essentially by stopping the spread of changes in reachability and topology through the entire network. Where is the leak? First, summarization almost always results in suboptimal paths through the network. Second, summarization can hide reachability problems that really need to be exposed, and seeking out a forwarding problem across a summarization boundary can increase mean time to repair in a major way.
  • Virtualization. We use virtual topologies in much the same way we do summarization, to break up failure domains, and, beyond summarization, to contain control plane and traffic state within a subset of the network. Where is the leak? Shared fate risk groups. Although this isn’t something you hear a lot about today, a single failure in the network can impact hundreds or thousands of topologies in a highly virtualized environment. These types of failures can be difficult to plan for and to understand.
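
To make the summarization leak concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the two exit routers (exit_a, exit_b), the link costs, and the 10.1.0.0/23 summary are hypothetical, and the "routing" is just a cost comparison rather than any real protocol. It shows both leaks named above: the ingress router picks a worse exit when it can see only the summary, and it keeps a route to a destination whose more specific prefix has gone away.

```python
import ipaddress

# Hypothetical costs from one ingress router to two exit routers.
COST_TO_EXIT = {"exit_a": 10, "exit_b": 20}

# Hypothetical costs from each exit router to the two /24s behind them.
COST_FROM_EXIT = {
    "exit_a": {"10.1.0.0/24": 5, "10.1.1.0/24": 50},
    "exit_b": {"10.1.0.0/24": 50, "10.1.1.0/24": 5},
}

# Both exits advertise this one summary instead of the two /24s.
SUMMARY = ipaddress.ip_network("10.1.0.0/23")


def path_cost(exit_router, prefix):
    """True end-to-end cost through a given exit to a given /24."""
    return COST_TO_EXIT[exit_router] + COST_FROM_EXIT[exit_router][prefix]


def best_exit_with_specifics(prefix):
    """No summarization: the ingress compares true end-to-end costs."""
    return min(COST_TO_EXIT, key=lambda e: path_cost(e, prefix))


def best_exit_with_summary():
    """Summarization: the ingress can only compare cost to each exit."""
    return min(COST_TO_EXIT, key=lambda e: COST_TO_EXIT[e])


def ingress_has_route(dest, advertised):
    """Does any advertised prefix cover the destination address?"""
    addr = ipaddress.ip_address(dest)
    return any(addr in net for net in advertised)


if __name__ == "__main__":
    target = "10.1.1.0/24"
    optimal = best_exit_with_specifics(target)
    chosen = best_exit_with_summary()
    print("optimal exit:", optimal, "cost", path_cost(optimal, target))
    print("exit chosen behind the summary:", chosen, "cost", path_cost(chosen, target))

    # Suppose 10.1.1.0/24 fails. With specifics, the route disappears;
    # with the summary, the ingress still believes it can reach 10.1.1.7.
    only_specifics = [ipaddress.ip_network("10.1.0.0/24")]
    print("route with specifics:", ingress_has_route("10.1.1.7", only_specifics))
    print("route with summary:", ingress_has_route("10.1.1.7", [SUMMARY]))
```

With these made-up numbers, the summary sends traffic for 10.1.1.0/24 through exit_a even though exit_b is the cheaper path, and the ingress continues to forward toward 10.1.1.7 after the /24 is gone. The failure only becomes visible on the far side of the summarization boundary, which is exactly where troubleshooting gets slow.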

Perhaps another example is the very concept of a control plane. We build protocols that abstract the topology in a way that makes it easy to automatically find the shortest path. But those protocols themselves leak: they must be understood, troubleshot, and managed. Wherever you see a leaky abstraction, you need to understand the underlying protocol, process, or packet flow to effectively deploy, manage, and troubleshoot the implementation of that abstraction.

Ten years ago, we might have imagined that new programming paradigms would have made programming easier by now. Indeed, the abstractions we’ve created over the years do allow us to deal with new orders of complexity in software development that we didn’t have to deal with ten or fifteen years ago, like GUI programming and network programming. And while these great tools, like modern OO forms-based languages, let us get a lot of work done incredibly quickly, suddenly one day we need to figure out a problem where the abstraction leaked, and it takes 2 weeks.

Replace “programming” with “networking,” and you get the same result. A crucial skill to add to your engineering toolbox, then, is to be able to see the abstractions and figure out where and how they leak: to be able to look through the abstraction when needed and see how the underlying network really works.