Thoughts on Auto Disaggregation and Complexity

Way in the past, the EIGRP team (including me) had an interesting idea: why not aggregate routes automatically as much as possible, along classless bounds, and then deaggregate routes when we could detect some failure was causing a routing black hole? To understand this concept better, consider the network below.

In this network, B and C are connected to four different routers, each of which is advertising a different subnet. In turn, B and C are aggregating these four routes into 2001:db8:3e8:10::/60, and advertising this aggregate towards A. From a control plane state perspective, this is a major win. The obvious gain is that the amount of state is reduced from four routes to one. The less obvious gain is A doesn’t need to know about any changes in the state for the four destinations aggregated into the /60. Depending on how often these links change state, the reduction in the rate of change is, perhaps, more important than the reduction in the amount of control plane state.
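The containment relationship behind this aggregate can be checked with Python's standard ipaddress module. This is a minimal sketch: the text names only 11::/64, so the four component subnets are assumed here to be 2001:db8:3e8:10::/64 through 2001:db8:3e8:13::/64.

```python
import ipaddress

# The aggregate B and C advertise toward A (from the text).
AGGREGATE = ipaddress.ip_network("2001:db8:3e8:10::/60")

# Assumption: the four component subnets are 10::/64 through 13::/64;
# the text only names 11::/64 explicitly.
COMPONENTS = [ipaddress.ip_network(f"2001:db8:3e8:{n:x}::/64") for n in range(0x10, 0x14)]

def covered(aggregate, components):
    """Return True if every component route falls inside the aggregate."""
    return all(c.subnet_of(aggregate) for c in components)

print(covered(AGGREGATE, COMPONENTS))       # True
print(f"{len(COMPONENTS)} routes -> 1 aggregate")
# A /60 spans 2**(64-60) possible /64s, so it also covers unused space.
print(2 ** (64 - AGGREGATE.prefixlen))      # 16
```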
We always know there will be a tradeoff when reducing state; what is the tradeoff here? If C somehow loses its connection to one of the four routers, say the router advertising 11::/64, C’s 10::/60 aggregate will not change. Since A thinks C still has a route to every subnet within 10::/60, it will continue sending traffic destined to addresses in the 11::/64 towards both B and C. C will not have a route towards these destinations, so it will drop the traffic.
We have a routing black hole.
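A toy forwarding model makes the black hole concrete. It is a sketch, not real protocol behavior: it assumes (hypothetically) that B and C keep advertising the /60 as long as at least one component is reachable, that A load-shares across both, and the same four component subnets as before.

```python
import ipaddress

AGGREGATE = ipaddress.ip_network("2001:db8:3e8:10::/60")
COMPONENTS = [ipaddress.ip_network(f"2001:db8:3e8:{n:x}::/64") for n in range(0x10, 0x14)]

# Each aggregating router's view of the components it can actually reach.
reachable = {
    "B": set(COMPONENTS),
    "C": set(COMPONENTS) - {ipaddress.ip_network("2001:db8:3e8:11::/64")},  # C lost 11::/64
}

def advertised(router):
    """Assumption: the /60 stays up while any component is still reachable."""
    return [AGGREGATE] if reachable[router] else []

def forward(router, dst):
    """Return 'delivered' if the router has a component route for dst, else 'dropped'."""
    addr = ipaddress.ip_address(dst)
    return "delivered" if any(addr in net for net in reachable[router]) else "dropped"

# A sees the same /60 from both neighbors, so it load-shares across B and C.
next_hops = [r for r in ("B", "C") if AGGREGATE in advertised(r)]
for hop in next_hops:
    print(hop, forward(hop, "2001:db8:3e8:11::1"))
# B delivered
# C dropped
```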
This much is pretty simple. The harder part is figuring out how to eliminate this routing black hole. Our first choice is to just not aggregate these routes. While you might be cringing right now, this isn’t such a bad option in many networks. We often underestimate the amount of state and the speed of state change modern routing protocols running on modern processors can support. I’ve seen networks running IS-IS in a single flooding domain with tens of thousands of routes and thousands of nodes running “in the wild.” I’ve seen IS-IS networks with thousands of nodes and hundreds of thousands of routes running in lab environments. These networks still converge.
But what if we really think we need to reduce the amount and speed of state, so we really need to aggregate these routes?
One solution that has been proposed a number of times through the years is auto disaggregation.
In this case, suppose D somehow realizes C cannot reach one of the components of a shared aggregate route. D could simply stop advertising the aggregate, advertising each of the components instead. The question here might be: is this a good idea? Looking at this from the perspective of the SOS triad, the aggregation replaced four routes with a single route. In the auto disaggregation case, the single route change is replaced by four route changes. The amount of state is variable, and in some cases the rate of change in state is actually higher than without the aggregation.
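The disaggregation rule in this paragraph can be sketched as a function of what the peer aggregator can reach. The function names and the rule itself (withdraw the aggregate whenever any component is missing at the peer, then advertise every component) are hypothetical illustrations of the idea, not EIGRP or any shipping implementation; the component subnets are the same assumed four.

```python
import ipaddress

AGGREGATE = ipaddress.ip_network("2001:db8:3e8:10::/60")
COMPONENTS = frozenset(
    ipaddress.ip_network(f"2001:db8:3e8:{n:x}::/64") for n in range(0x10, 0x14)
)

def advertisements(peer_reachable):
    """Hypothetical rule: advertise the aggregate only while the peer
    aggregator reaches every component; otherwise advertise each component."""
    if COMPONENTS <= peer_reachable:
        return frozenset({AGGREGATE})
    return COMPONENTS

def updates(old, new):
    """Count (withdrawals, announcements) needed to move from old to new."""
    return len(old - new), len(new - old)

healthy = advertisements(COMPONENTS)
failed = advertisements(COMPONENTS - {ipaddress.ip_network("2001:db8:3e8:11::/64")})

print(len(healthy), len(failed))   # 1 4  (routes A holds from this neighbor)
print(updates(healthy, failed))    # (1, 4) on failure
print(updates(failed, healthy))    # (4, 1) on recovery
```

Counting updates shows the rate-of-change side of the tradeoff: under this sketch, each time C's link to the 11::/64 router flaps, A receives ten updates from the disaggregating router (five on failure, five on recovery). Had the routes never been aggregated, the same flap would cost two updates: C's withdrawal and re-announcement of 11::/64.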
So…
I don’t hold that auto disaggregation is either good or bad; it just presents a different set of challenges to the network designer. Instead of designing for average rates of change and given table sizes, you can count on much smaller tables, but you might find there are times when the rate of change is dramatically higher than you expect. A good question to ask, before deploying this kind of technology, might be: can I foresee a chain of events that will cause a high enough rate of state change that auto disaggregation is actually more destabilizing than just not summarizing at all in this network?
A real danger with auto disaggregation, by the way, is using summarization to dramatically reduce table sizes without understanding how a goldilocks failure (what we used to call in telco a Mother’s Day event, or perhaps a black swan) can cascade into widespread failures. If you’re counting on particular devices in your network to have only a dozen or two dozen table entries, but just the right set of failures can cause them to have several thousand entries because of auto disaggregation, what kinds of failure modes should you anticipate? Can you anticipate or mitigate this kind of problem?
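The table-size risk is easy to put numbers on. The figures below are purely hypothetical (twenty aggregates, each hiding 200 components), chosen only to mirror the "dozen or two dozen" versus "several thousand" contrast above.

```python
# Hypothetical design: 20 aggregates, each hiding 200 component routes.
aggregates = {f"agg{i}": 200 for i in range(20)}

def table_size(components_per_aggregate, disaggregated):
    """Entries held when the named aggregates have been disaggregated."""
    return sum(count if name in disaggregated else 1
               for name, count in components_per_aggregate.items())

print(table_size(aggregates, set()))            # 20   (steady state)
print(table_size(aggregates, {"agg0"}))         # 219  (one aggregate broken)
print(table_size(aggregates, set(aggregates)))  # 4000 (worst case)
```

The steady-state number is what a designer tends to size a device for; the last line is what just the right set of failures can produce.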
The idea of automatically summarizing and disaggregating routes is an interesting study in complexity, state, and optimization. It’s a good brain exercise in thinking through what-if situations, and carefully thinking about when and where to deploy this kind of thing.
What do you think about this idea? When would you deploy it, where, and why? When and where would you be cautious about deploying this kind of technology?
Hedge 108: In Defense of Boring Technology with Andrew Wertkin

Engineers (and marketing folks) love new technology. Watching an engineer learn or unwrap some new technology is like watching a dog chase a squirrel—the point is not to catch the squirrel, it’s just that the chase is really fun. Join Andrew Wertkin (from BlueCat Networks), Tom Ammon, and Russ White as we discuss the importance of simple, boring technologies, and moderating our love of the new.
Hedge 107: Career Advice with Terry Slattery
Whether you’re just starting in your technology career, or you’re an old hand who likes to go back to basics and understand how to move forward in your career, this episode of the Hedge is for you. Terry Slattery joins Tom Ammon and Russ White to discuss the things you can do to build a successful career in the world of network engineering.
Hedge 106: Compositional Network Modeling and Zen
One topic of constant discussion among network engineers is the basic problems surrounding network modeling, which leads to configuration, telemetry, and troubleshooting. In this episode of the Hedge, Ryan Beckett, Tom Ammon, and Russ White discuss Zen, a general framework for compositional network modeling.
Hedge 105: Johan Gustawsson and Changing Provider Architectures
Many service providers have the feeling that they “didn’t do anything wrong, but somehow we still lost.” How are providers reacting to the massive changes in the networking field, and how are they trying to regain their footing so they can move into the coming decades better positioned to compete? Join Johan Gustawsson, Tom Ammon, and Russ White as we discuss the impact of merchant silicon and changing applications on the architecture of service providers.
Hedge 104: Automation with David Gee
Automation is often put forward as the answer to all our problems, but without a map, how can we be certain we are moving in the right direction? David Gee joins Tom Ammon and Russ White on this episode of the Hedge to talk about automation without a map. Where did we come from, what are we doing with automation right now, and what do we need to do to map out a truly better future?
Hedge 103: BGP Security with Geoff Huston
Our community has been talking about BGP security for over 20 years. While MANRS and the RPKI have made some headway in securing BGP, the process of deciding on a method to provide at least the information providers need to make more rational decisions about the validity of individual routes is still ongoing. Geoff Huston joins Alvaro, Russ, and Tom to discuss how we got here and whether we will learn from our mistakes.
