CAP Theorem and Routing
In 2000, Eric Brewer was observing and discussing the various characteristics of database systems. Through this work, he observed that a distributed database generally has three desirable characteristics:
- Consistency, which means the database will produce the same result for any two readers who happen to read a value at the same moment in time.
- Availability, which means the database can be read by any reader at any moment in time.
- Partition Tolerance, which means the database continues to function even when the nodes holding it are split apart (partitioned) by the network.
Brewer, in explaining the relationship between the three in a 2012 article, notes that a networked shared-data system can have at most two of these three properties at any one time.
The CAP theorem, therefore, represents a two out of three situation—yet another two out of three “set” we encounter in the real world, probably grounded someplace in the larger space of complexity. We’ll leave the relationship to complexity on the side for the moment, however, and just look at how CAP relates to routing.
Network engineers familiar with the concepts of complexity, and their application in the network design realm, should quickly realize that this “two out of three” paradigm almost never represents a set of “hard and fast realities,” but is better represented as a set of continuums along which a system designer can choose. Moving towards one pole in the two out of three space inevitably means giving up some measure along one of the other two; the primary choices you can make are which one and how much.
One more detail is important before diving into a deeper discussion: A partition is normally thought of as a hard divide in database theory, but for the purposes of explaining how CAP applies to routing, I’m going to soften the definition a little. Instead of strict partitions, I want to consider partitioning as relating to the distribution of the database across a set of nodes separated by a network.
So how does CAP apply to routing?
To see the application, let’s ignore the actual algorithm used to compute the best path in any given routing protocol—whether Dijkstra’s SPF, Garcia’s DUAL, or path vector—and focus, for a moment, on just getting the information required to compute loop free paths to every router running in a single failure (and/or flooding) domain. This last point is important, because failure and flooding domains represent an intentional form of information hiding that leaves the database correct for the purposes of routing (most of the time), but permanently inconsistent in strict database terms. We therefore need to treat aggregation and summarization as a separate topic when considering how CAP impacts routing. Since the tradeoffs are easiest to see in link state protocols, we’ll focus on those here.
Once we’ve softened the definition of partition tolerance, it’s immediately obvious that all forms of flooding, for instance, are simply the synchronization process of a highly partitioned database. Further, we can observe that the database flooded through the network must be, largely, always available (as in always able to be read by a local device). Given CAP, we should immediately suspect there is some sort of consistency tradeoff, as we’ve already defined two of the three legs of the two out of three set pretty strongly.
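To make the idea concrete, here is a minimal sketch of flooding as database synchronization. Everything in it (the node names, the Lsa record, the sequence number rule) is a toy model invented for illustration, not any particular protocol implementation:

```python
# A toy model of flooding as database synchronization: each node holds
# a local copy of the database, and updates spread hop by hop.
from dataclasses import dataclass

@dataclass(frozen=True)
class Lsa:
    originator: str
    sequence: int   # a higher sequence number wins, as in link state protocols

class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.lsdb = {}  # originator -> Lsa (the local copy of the database)

    def read(self, originator):
        # Always available: a local read never waits on the network...
        return self.lsdb.get(originator)

    def receive(self, lsa):
        # ...but consistency is only eventual: install and re-flood the
        # LSA only if it is newer than what the local copy already holds.
        current = self.lsdb.get(lsa.originator)
        if current is None or lsa.sequence > current.sequence:
            self.lsdb[lsa.originator] = lsa
            for neighbor in self.neighbors:
                neighbor.receive(lsa)

# Build a small chain of nodes: A - B - C
a, b, c = Node("A"), Node("B"), Node("C")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]

a.receive(Lsa("A", 1))   # A originates a new LSA; flooding synchronizes
print(c.read("A"))       # every copy of the highly partitioned database
```

Any node can answer a read at any moment, but until flooding completes, C’s copy can lag A’s; availability is preserved at the cost of momentary inconsistency.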
And, in fact, we do. In link state protocols, the result of inconsistency is most easily seen in the problem of microloops during convergence. To understand this, let’s look at the network just below.
Assume, for a moment, that the link between routers A and B fails. At this moment:
- The best path for B is through A
- The best path for C is through B
- The best path for D is through C
Let’s also assume that each router in the network takes precisely the same amount of time to flood a packet and to run SPF. When the link fails, C is going to detect the failure immediately, and hence will calculate SPF first. If we assume that D can only process this information about the topology change once it has received an update to its database, then we can see that C will always process the new information before D.
But we can also observe this point: once C has processed the topology change (run SPF), router C will point at router D for its new best path. Until router D receives the new topology information and runs SPF itself, D will continue to point to C as the best path.
This is the microloop.
It happens with any link state protocol, any time the topology contains a ring of more than four hops. And it’s a direct result of the relationship between consistency and availability in the distributed database. We can verify this by examining the way in which the microloop can be resolved: specifically, by using ordered FIB.
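To make the timing concrete before we get to the fix, here is a toy timeline of the loop. The delay values, the hop counts, and router E (assumed here to be D’s post-failure next hop) are all invented for illustration:

```python
# A toy timeline of the microloop. Assumed model: flooding takes one
# time unit per hop, SPF takes one unit, and the updated LSA must
# travel B -> C -> D after the A-B link fails.
FLOOD, SPF = 1, 1

# hops the updated LSA travels before each router can run SPF
hops_from_failure = {"B": 0, "C": 1, "D": 2}

# next hops toward the destination behind A, before and after the failure
old_next_hop = {"C": "B", "D": "C"}
new_next_hop = {"C": "D", "D": "E"}   # E is a hypothetical alternate path

def installs_at(router):
    # a router installs new routes only after the LSA arrives and SPF runs
    return hops_from_failure[router] * FLOOD + SPF

for t in range(4):
    c_hop = new_next_hop["C"] if t >= installs_at("C") else old_next_hop["C"]
    d_hop = new_next_hop["D"] if t >= installs_at("D") else old_next_hop["D"]
    loop = " <-- microloop" if (c_hop, d_hop) == ("D", "C") else ""
    print(f"t={t}: C->{c_hop}, D->{d_hop}{loop}")
```

Between the moment C installs its new route and the moment D installs its own, the two routers forward traffic to one another; that window is the microloop.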
In ordered FIB, each router in the network calculates an amount of time it should wait before installing newly calculated routes. This calculation is, in turn, based on the number of routers relying on the local router for the best path. If you back up and think through how the microloop happens, you can see that the effect is to make C wait for D to calculate before installing the new route into its local forwarding table.
In CAP terms, what ordered FIB does is reduce availability by some small amount—in effect, by not allowing the forwarding plane at C to read the newly calculated routes for some period of time—in order to ensure consistency in the database. This observation should confirm that microloops grow out of the relationship between CAP and distributed routing.
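Here is a sketch of that ordering logic. The dependency tree and the delay unit are assumptions made up for this example; RFC 6976 describes the actual oFIB computation:

```python
# A sketch of the ordered FIB idea: delay each router's route
# installation in proportion to how many routers still depend on it
# along the path being withdrawn.
depends_on = {            # who uses whom as next hop on the OLD path
    "C": ["D"],           # D depends on C (D's old best path is via C)
    "D": [],              # nobody depends on D for the old path
}

DELAY_UNIT = 1            # abstract time units per dependent router

def count_dependents(router, graph):
    # count every router below this one on the old forwarding tree
    total = 0
    for child in graph[router]:
        total += 1 + count_dependents(child, graph)
    return total

for router in ("C", "D"):
    wait = count_dependents(router, depends_on) * DELAY_UNIT
    print(f"{router} waits {wait} unit(s) before installing new routes")
```

Because C’s wait is proportional to the routers downstream of it on the old path, D always installs first, and the window in which C and D point at each other never opens.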
There are a number of other relationships between CAP and routing; I’ll put some of these on my list for later posts—but this one has reached the 1000 word mark, and so is long enough for a starter on the topic.
Interesting article, especially since it’s not often that distributed computing concepts are applied to detailed networking issues.
I have a higher level question, though. Would you please explain what you mean by “Instead of strict partitions, I want to consider partitioning as relating to the distribution of the database across a set of nodes separated by a network.”
“Partition” in CAP means a network disruption. Saying “distribution of the database across a set of nodes separated by a network” sounds as if you may be using “partition” to mean a subset of nodes containing replicated data, where all the nodes are connected by an unreliable network. This is a common use of “partition,” too, but different than the other.
I’m using partitions in the second sense rather than the first, as you’ve surmised. My reasoning is this: there is actually no difference, from an application perspective, between a network that is temporarily unavailable separating two copies of a single database, and the simple timing artifacts of distributing a database in near real time. A distributed database is still “partitioned,” even if the word is not used in the traditional CAP meaning. The time it takes to synchronize the database has the same impact on its consistency in either case, it seems to me.
The only difference is we know the database will be eventually consistent if the network is working correctly. If the network is not working correctly, the time might be longer, or the partition may be permanent.
🙂
Russ