Why is the Feasibility Condition Less Than?
A reader recently emailed me with this question: Why isn’t the condition for a Feasible Successor set to less than (<), rather than less than of equal (<=), in EIGRP? It certainly seems, as noted in the email, that this rules out a lot of possible possible loop free alternate paths. The network below will be used to illustrate.
First, assume all links are cost of 1 except D->C, which is cost of 2. Here D will choose B as the Successor, and the FC will be set to 2. The RD of C will be 1, so C will be an FS. Now consider two failures. The first failure is D->B. D will immediately reroute to the FS, which is C, without changing the FC. This works, because C’s cost to 100::/64 via D is 4, much higher than it’s cost to 100::64 along C->A. Now consider what happens if A->100::/64 fails. If the timing of the query “works right,” C and B will be notified first, then finally D. Even if D is somehow notified before C, and D switches to C as its FS, the traffic is dropped, rather than looped—so all is happy.
Now change the situation a little. Assume the A->C link is cost of 2, and the remaining links are 1. Now assume you make the FS condition <=, rather than "just" <. From D's perspective, C is still an FS. From C's perspective, D is also an FS. Suppose the B->D link fails. D switches to C, who’s path is intact; this works. Assume the C->A link fails. C switches to D, who’s path is intact; this works.
Finally, assume the A->100::/64 link fails. If the timing is just right, D and C will receive the update with this failure at the same moment, and will both switch to their FS’s. Now you have a loop. How long will this loop last? Until C and D can do a diffusing update — probably around 250ms or less… But if you count the outside computation time, it’s the SIA timer, which is around 10 minutes in more recent versions. Hence, “just the wrong circumstances” can cause up to a 10 minute loop. Not good.
The bottom line is this: any time you have a situation where two routers can end up pointing at one another as their local FS, you have a ring of some number of hops. If the final destination is outside the ring, some member of the ring must be the point at which traffic leaves the ring, and moves towards the destination. If the link connecting the ring to the destination fails, the update carrying the information about the loss of connectivity to the destination must travel around the ring in both directions.
When the update reaches the point at which routing would normally split horizon—the “point at which the waterfall splits,” so-to-speak, it will like reach both sides of the split horizon point at a close enough interview to cause a loop in the forwarding tables. This situation causes microloops in a link state protocol, but microloops are often resolved quickly, and hence tend to be tolerable. In a distance vector protocol, like EIGRP, the length of time the microloop can exist can be much longer—ultimately it depends on the speed at which a distributed computation can take place (because the computation is not local to each node), and, failing that, the amount of time the network can remain in an unstable state before “something is done about it.”
This, by the way, is why I am always opposed to increasing the SIA timer in EIGRP above the “factory defaults.” The SIA timer is essentially the amount of time you are willing to allow your network to remain unconverged in the worst case, and hence either dropping or looping traffic.
I believe dropping the traffic is better than looping traffic all the time until it’s dropped eventually.
Don’t you think setting the SIA timer to a very minimal value allows Router A to send EIGRP update packets to its neighbors to remove the network from their routing table and hence speeding up convergence as well in such scenario?! Thanks for your valuable advise!