Section 10 Routing Loops
A (long) time ago, a reader asked me about RFC4456, section 10, which says:
Care should be taken to make sure that none of the BGP path attributes defined above can be modified through configuration when exchanging internal routing information between RRs and Clients and Non-Clients. Their modification could potentially result in routing loops. In addition, when a RR reflects a route, it SHOULD NOT modify the following path attributes: NEXT_HOP, AS_PATH, LOCAL_PREF, and MED. Their modification could potentially result in routing loops.
On first reading, this seems a little strange—how could modifying the next hop, Local Preference, or MED at a route reflector cause a routing loop? While contrived, the following network illustrates the principle.
Note that the best IGP path from C to E is through B, and the best IGP path from B to D is through C. A route is advertised over eBGP from F towards E and D. These two eBGP speakers, in turn, advertise the route to their iBGP neighbors, B and C. Both B and C are route reflectors, so they both reflect the route on to A, which advertises the route to some other eBGP speaker outside AS65000 (not shown in the network diagram). Assume the best path (for whatever reason) should be the route learned through D.
What happens if C changes the next hop for the route so it points to E rather than D? This should be fine, at first glance; when E receives traffic for the destination reachable through F, it will use the local eBGP route learned from F directly to forward the traffic. But there is a subtle problem here. Assume A receives both routes, one from B with a next hop of D, and one from C with a next hop of E. A, for whatever reason, chooses the path with a next hop of D. The best path to D, according to the IGP metrics, is through C, so A forwards the traffic to C.
C, however, has been configured locally to set the next hop to E. The best IGP path to E is through B, so C will forward the traffic to B, expecting B to pass it on to E. B, however, has D as its next hop towards this destination, so when it receives packets destined beyond F in AS65001, it will examine its local routing table for the best path towards D and find this is through C. Hence, B will forward the traffic back to C to be forwarded towards D.
Thus a routing loop is formed: the best IGP path towards each router's BGP next hop runs through another router whose own next hop points back through the router forwarding the traffic. The problem is that B and C have inconsistent best paths, such that each thinks the best path is through the other.
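The hop-by-hop walk above can be sketched as a small simulation. This is only an illustrative model, not anything a router runs: the table names and the `trace` function are hypothetical, the IGP entries marked "from the text" come from the description above, and the two marked "assumption" (C adjacent to D, B adjacent to E) are guesses about the diagram added so the consistent case has somewhere to go.

```python
# Hypothetical model of the example network; router names follow the text.
# BGP next hop each router has installed for the destination behind F.
# D and E are absent: they hold the eBGP route and send traffic straight to F.
BGP_NEXT_HOP_MODIFIED = {"A": "D", "B": "D", "C": "E"}
BGP_NEXT_HOP_CONSISTENT = {"A": "D", "B": "D", "C": "D"}

# IGP result: (router, BGP next hop) -> neighbor to forward to.
IGP_NEXT_HOP = {
    ("A", "D"): "C",  # from the text
    ("B", "D"): "C",  # from the text
    ("C", "E"): "B",  # from the text
    ("C", "D"): "D",  # assumption: C is adjacent to D
    ("B", "E"): "E",  # assumption: B is adjacent to E
}


def trace(start, bgp_next_hop, igp_next_hop=IGP_NEXT_HOP):
    """Follow a packet hop by hop; return (path, looped)."""
    path, seen, current = [start], {start}, start
    # Keep forwarding while the current router relies on an iBGP-learned route.
    while current in bgp_next_hop:
        # Resolve the BGP next hop recursively through the IGP.
        current = igp_next_hop[(current, bgp_next_hop[current])]
        path.append(current)
        if current in seen:
            return path, True  # revisited a router: forwarding loop
        seen.add(current)
    return path, False  # reached an edge router; traffic exits towards F


print(trace("A", BGP_NEXT_HOP_MODIFIED))    # (['A', 'C', 'B', 'C'], True)
print(trace("A", BGP_NEXT_HOP_CONSISTENT))  # (['A', 'C', 'D'], False)
```

With C's next hop rewritten to E, the packet bounces between C and B; with every router agreeing on D, it leaves the AS after two hops.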
This is, of course, an artifact of overlaying two different control planes, each with its own rules about how to determine a loop free path to any given destination. This sort of problem can arise with any pair of control planes overlaid in this way.
What about MED, Local Preference, or the AS Path? C could modify any of these while reflecting the route to cause E to be chosen as the best exit point locally, while B and A continue to choose D as the best exit point. Any of these, then, can be used to create a routing loop in this topology.
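To make the attribute case concrete, here is a deliberately simplified sketch of best path selection. It assumes only three comparison steps (highest Local Preference, then shortest AS Path, then lowest MED) and leaves out the rest of the real BGP decision process; the `Route` class, the `best_path` function, and all attribute values are hypothetical, chosen so D wins by default.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    exit_router: str          # edge router the path leaves the AS through
    local_pref: int = 100
    as_path: tuple = (65001,)
    med: int = 0


def best_path(routes):
    """Simplified comparison: highest LOCAL_PREF, shortest AS_PATH, lowest MED."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


# Hypothetical values: the lower MED is the "whatever reason" D is preferred.
received = [Route("D", med=10), Route("E", med=20)]

# B and A see the attributes unmodified.
b_choice = best_path(received)

# Assumed local policy on C: raise LOCAL_PREF on the path through E.
c_view = [replace(r, local_pref=200) if r.exit_router == "E" else r
          for r in received]
c_choice = best_path(c_view)

print(b_choice.exit_router, c_choice.exit_router)  # D E
```

B still exits through D while C now exits through E, which is the same disagreement that produced the loop in the next hop example.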
Again, this is a somewhat contrived example, but if a loop can be contrived, then it will likely show up in more complex (and not-so-contrived) networks in the real world. It would be much easier to create a loop with a hierarchical route reflector, or even by causing an inconsistent route advertisement on the AS edge (two different eBGP speakers advertising different paths to a given destination reachable through the local AS).
I am curious how these things are discovered. You said that this is a contrived example, but I assume researchers have some sort of methodology to discover issues like this. I am sure some things have been found through operational mishap, but is there some “standardized” way of testing graph logic for the possibility of loops? I trust this is much easier to do today than even a decade ago.
Nice, good question. I would expect that this can be done only through simulation/emulation, since all these caveats and details are very specific to the IOS operation/configuration of a specific device. But let’s hope there is some feedback.