JJ Garcia is Distinguished Professor of Computer Science and Engineering and Jack Baskin Endowed Chair of Computer Engineering at USC Santa Cruz. He first became involved in packet networks in the 1980’s, eventually inventing the Diffusing Update Algorithm (DUAL), which later became the basis for the Enhanced Interior Gateway Protocol (EIGRP). You can find more information about Dr. Garcia at his personal page.
A reader recently emailed me with this question: Why isn’t the condition for a Feasible Successor set to less than (<), rather than less than of equal (<=), in EIGRP? It certainly seems, as noted in the email, that this rules out a lot of possible possible loop free alternate paths. The network below will be used to illustrate.
First, assume all links are cost of 1 except D->C, which is cost of 2. Here D will choose B as the Successor, and the FC will be set to 2. The RD of C will be 1, so C will be an FS. Now consider two failures. The first failure is D->B. D will immediately reroute to the FS, which is C, without changing the FC. This works, because C’s cost to 100::/64 via D is 4, much higher than it’s cost to 100::64 along C->A. Now consider what happens if A->100::/64 fails. If the timing of the query “works right,” C and B will be notified first, then finally D. Even if D is somehow notified before C, and D switches to C as its FS, the traffic is dropped, rather than looped—so all is happy.
Now change the situation a little. Assume the A->C link is cost of 2, and the remaining links are 1. Now assume you make the FS condition <=, rather than "just" <. From D's perspective, C is still an FS. From C's perspective, D is also an FS. Suppose the B->D link fails. D switches to C, who’s path is intact; this works. Assume the C->A link fails. C switches to D, who’s path is intact; this works.
Finally, assume the A->100::/64 link fails. If the timing is just right, D and C will receive the update with this failure at the same moment, and will both switch to their FS’s. Now you have a loop. How long will this loop last? Until C and D can do a diffusing update — probably around 250ms or less… But if you count the outside computation time, it’s the SIA timer, which is around 10 minutes in more recent versions. Hence, “just the wrong circumstances” can cause up to a 10 minute loop. Not good.
The bottom line is this: any time you have a situation where two routers can end up pointing at one another as their local FS, you have a ring of some number of hops. If the final destination is outside the ring, some member of the ring must be the point at which traffic leaves the ring, and moves towards the destination. If the link connecting the ring to the destination fails, the update carrying the information about the loss of connectivity to the destination must travel around the ring in both directions.
When the update reaches the point at which routing would normally split horizon—the “point at which the waterfall splits,” so-to-speak, it will like reach both sides of the split horizon point at a close enough interview to cause a loop in the forwarding tables. This situation causes microloops in a link state protocol, but microloops are often resolved quickly, and hence tend to be tolerable. In a distance vector protocol, like EIGRP, the length of time the microloop can exist can be much longer—ultimately it depends on the speed at which a distributed computation can take place (because the computation is not local to each node), and, failing that, the amount of time the network can remain in an unstable state before “something is done about it.”
This, by the way, is why I am always opposed to increasing the SIA timer in EIGRP above the “factory defaults.” The SIA timer is essentially the amount of time you are willing to allow your network to remain unconverged in the worst case, and hence either dropping or looping traffic.
Reading a paper to build a research post from (yes, I’ll write about the paper in question in a later post!) jogged my memory about an old case that perfectly illustrated the concept of a positive feedback loop leading to a failure. We describe positive feedback loops in Computer Networking Problems and Solutions, and in Navigating Network Complexity, but clear cut examples are hard to find in the wild. Feedback loops almost always contribute to, rather than independently cause, failures.
Many years ago, in a network far away, I was called into a case because EIGRP was failing to converge. The immediate cause was neighbor flaps, in turn caused by Stuck-In-Active (SIA) events. To resolve the situation, someone in the past had set the SIA timers really high, as in around 30 minutes or so. This is a really bad idea. The SIA timer, in EIGRP, is essentially the amount of time you are willing to allow your network to go unconverged in some specific corner cases before the protocol “does something about it.” An SIA event always represents a situation where “someone didn’t answer my query, which means I cannot stay within the state machine, so I don’t know what to do—I’ll just restart the state machine.” Now before you go beating up on EIGRP for this sort of behavior, remember that every protocol has a state machine, and every protocol has some condition under which it will restart the state machine. IT just so happens that EIGRP’s conditions for this restart were too restrictive for many years, causing a lot more headaches than they needed to.
So the situation, as it stood at the moment of escalation, was that the SIA timer had been set unreasonably high in order to “solve” the SIA problem. And yet, SIAs were still occurring, and the network was still working itself into a state where it would not converge. The first step in figuring this problem out was, as always, to reduce the number of parallel links in the network to bring it to a stable state, while figuring out what was going on. Reducing complexity is almost always a good, if counterintuitive, step in troubleshooting large scale system failure. You think you need the redundancy to handle the system failure, but in many cases, the redundancy is contributing to the system failure in some way. Running the network in a hobbled, lower readiness state can often provide some relief while figuring out what is happening.
In this case, however, reducing the number of parallel links only lengthened the amount of time between complete failures—a somewhat odd result, particularly in the case of EIGRP SIAs. Further investigation revealed that a number of core routers, Cisco 7500’s with SSE’s, were not responding to queries. This was a particularly interesting insight. We could see the queries going into the 7500, but there was no response. Why?
Perhaps the packets were being dropped on the input queue of the receiving box? There were drops, but not nearly enough to explain what we were seeing. Perhaps the EIGRP reply packets were being dropped on the output queue? No—in fact, the reply packets just weren’t being generated. So what was going on?
After collecting several
show tech outputs, and looking over them rather carefully, there was one odd thing: there was a lot of free memory on these boxes, but the largest block of available memory was really small. In old IOS, memory was allocated per process on an “as needed basis.” In fact, processes could be written to allocate just enough memory to build a single packet. Of course, if two processes allocate memory for individual packets in an alternating fashion, the memory will be broken up into single packet sized blocks. This is, as it turns out, almost impossible to recover from. Hence, memory fragmentation was a real thing that caused major network outages.
Here what we were seeing was EIGRP allocating single packet memory blocks, along with several other processes on the box. The thing is, EIGRP was actually allocating some of the largest blocks on the system. So a query would come in, get dumped to the EIGRP process, and the building of a response would be placed on the work queue. When the worker ran, it could not find a large enough block in which to build a reply packet, so it would patiently put the work back on its own queue for future processing. In the meantime, the SIA timer is ticking in the neighboring router, eventually timing out and resetting the adjacency.
Resetting the adjacency, of course, causes the entire table to be withdrawn, which, in turn, causes… more queries to be sent, resulting in the need for more replies… Causing the work queue in the EIGRP process to attempt to allocate more packet sized memory blocks, and failing, causing…
You can see how this quickly developed into a positive feedback loop—
- EIGRP receives a set of queries to which it must respond
- EIGRP allocates memory for each packet to build the responses
- Some other processes allocate memory blocks interleaved with EIGRP’s packet sized memory blocks
- EIGRP receives more queries, and finds it cannot allocate a block to build a reply packet
- EIGRP SIA timer times out, causing a flood of new queries…
Rinse and repeat until the network fails to converge.
There are two basic problems with positive feedback loops. The first is they are almost impossible to anticipate. The interaction surfaces between two systems just have to be both deep enough to cause unintended side effects (the law of leaky abstractions almost guarantees this will be the case at least some times), and opaque enough to prevent you from seeing the interaction (this is what abstraction is supposed to do). There are many ways to solve positive feedback loops. In this case, cleaning up the way packet memory was allocated in all the processes in IOS, and, eventually, giving the active process in EIGRP an additional, softer, state before it declared a condition of “I’m outside the state machine here, I need to reset,” resolved most of the incidents of SIA’s in the real world.
But rest assured—there are still positive feedback loops lurking in some corner of every network.
In episode 5 the Network Collective panel dives deep into the inner-workings of EIGRP and how to tune the protocol to work best for you. This isn’t your run of the mill EIGRP training session though, so buckle up and dig in to learn a lot about a protocol which appears pretty straight forward on the surface.
This last week I was on the Network Collective discussing EIGRP with Nick Russo; even if you think this protocol is dead, it’s well worth watching or listening to. And if this isn’t enough EIGRP for you, the EIGRP book on Addision-Wesley is another good resource.