BGP and Suboptimal Route Reflection

One of the crucial points in understanding the operation of BGP is the reliance on the AS path to ensure all routes are loop-free. Within a single AS, however, there is no AS path. How, then, can you ensure the path through an AS is loop-free? The original plan was to fully mesh all the BGP speakers in the AS (a full mesh of iBGP speakers)—but building and maintaining a full mesh of iBGP speakers is difficult, so other solutions were quickly designed. The first of these as the BGP Confederation, which allows a set of autonomous systems to look like a single AS from the outside. This solution, however, is also cumbersome, so… the RR was invented.


  • BGP RR’s abstract information in a way that can cause suboptimal routing
  • To resolve this suboptimal routing, additional paths are advertised to RRCs by RRs


The basic operation of an RR is fairy simple; as new attribute, the cluster list, is added to a route as is passes from client to server. The cluster list contains as list of the clusters the route has passed through, identified by the identifier of the route reflector that “heads” the cluster. If a route is advertised to an iBGP speaker with a cluster ID already on the cluster list, the route is ignored. In a sense, each cluster acts as a “sub-AS,” much like in a confederation, and the cluster ID acts as the sub-AS number. Loop-freeness is guaranteed by making certain the route is not advertised as reachable through the same RR cluster twice, much like eBGP loop-freeness is guaranteed by not advertising a route as reachable through the same AS twice.

Route reflectors emulate the eBGP construction in one other way, as well—only the best path from each cluster’s perspective is advertised. This improves scaling dramatically; with a full mesh of iBGP speakers, every BGP speaker will learn about every path available to reach a destination. If there are ten eBGP peers that can reach 2001:db8:3e8:100::/64, for instance, then every BGP speaker in the AS will know about all ten paths. If route reflectors are used instead of a full mesh of iBGP speakers, the RR server will choose the best exit point from the local AS, and advertise only that one route to each of its clients. An important point to note here: the best path is chosen from the perspective of the RR.

For instance, in this diagram, the best path to 2011:db8:3e8:100::/64 for B is through A. Since B is the route reflector, it will advertise only the path through A to its two clients, which are C and E. Because of this, E will not know about the path through C, even though this is a valid path—and a better path, from E’s perspective.

This is a classic case of aggregation and information hiding; B is removing some available routes towards its route reflector clients, which allows the control plane to scale. According to the state/optimization/surface triad, a reduction in the amount of state should have some effect on the optimization of traffic flow through the network. In this case, E has a suboptimal route.

How can this problem be solved? The most obvious way is to add more information back into the control plane. The solutions offered for this problem vary in which information is added back, and how that information is carried.

The first proposal for solving the suboptimal routing problem was for the RR to send all of its routes to each of its clients. Normally, if a BGP speaker sends a route for a destination, and then follows with another update about this same destination, the receiving speaker will discard the first advertisement. This is called an implicit withdraw. RFC7911 specifies a mechanism for a BGP speaker to carry more than one route for the same destination to a peer by including a Route Identifier (RID) with each route. The function of the route identifier is the same as the route distinguisher in a BGP-VPN deployment—to describe two different routes that happen to share the same destination prefix. By setting a different RID on each route, the reflector can advertise all of its route to its clients. The problem with this solution is that it “undoes” the scaling advantages conferred by the RR; every RRC receives every route. In large networks, the scaling and convergence speed differences can be dramatic.

It seems better to send only part of the routes, but which part? An alternative proposal is for the RR to compute the BGP bestpath from the perspective of each of its RRCs and send only the best route from that RRC’s perspective each RRC. There are several tradeoffs here, as well. First, the RRC must calculate the IGP path from each of its RRC’s in some way; this normally means running SPF from the perspective of each RRC. The RR must then run bestpath for each RRC and send the correct route to each one. Finally, the RRC must keep track of which route it has sent to each RRC, so it can determine when it needs to resend a new route based on changes in the network topology—both for changes in BGP and IGP reachability. This adds a lot of complexity to the BGP RR implementation, and creates a rather deep interaction surface between the IGP and BGP. It also just happens to assume there is an underlying link-state IGP in operation—which may not always be true.

Another option is for the RR to send the bestpath and the second-best path to each RRC. The assumption is one of these two paths will either be optimal, or close to optimal, for all RRCs. This is the solution most implementations have opted for, as it seems to strike a good balance between adding more control plane information, implementation complexity, and solving the suboptimal routing problem. Most implementations will allow the operator to send as many “second-best” paths as they desire to, which allows the operator to fine-tune optimization against state differently in diverse parts of their network.

Weekend Reads ‭18B62‬

Members of the EU Parliament voted to advance the new Copyright Directive, even though it contained two extreme and unworkable clauses: Article 13 (“Censorship Machines”) that would filter everything everyone posts to online platforms to see if matches a crowdsourced database of “copyrighted works” that anyone could add anything to; and Article 11 (“The Link Tax”), a ban on quote more than one word from an article when linking to them unless you are using a platform that has paid for a linking license. The link tax provision allows, but does not require, member states to create exceptions and limitations to protect online speech. —Cory Doctorow @EFF

So it is my belief that 5G will initially be used by industries in constrained, “safe” environments. In these environments, operators and the industries involved will learn what is possible and what the limitations are. Then, if necessary, new requirements can be generated that allow the mobile technology to meet all the needs of the mission critical businesses. If these new requirements are extensive, then it will herald the start of 6G. —David Stokes @LightTalk

The shortage of qualified security analysts is an issue that the IT security industry has been dealing with for years. There is little question that technology tools – from better analytics engines, to increased automation, to artificial intelligence – are seen as methods for dealing with the shortage. But will the fact that artificial intelligence, like its human analog must be carefully trained, limit its ability to help the industry out of its expertise deficit? —Curtis Franklin Jr. @Dark Reading

We finally have a vendor coming out to say that maybe WiFi and not 5G is the answer to IoT connectivity requirements. An SDxCentral report says Broadcom isn’t depending on 5G to open up connectivity, but instead points out that when 4G came along, operators had aspirations of replacing WiFi in buildings, only to come back to WiFi in the end. So, is this an opportunistic play on Broadcom’s part, or maybe even the start of new realism? The concept of the “Internet of things” has an inherent imperfection in the definition of both “Internet” and “things”. —Tom Nolle @CIMI

Finding logic errors is simply a people management issue, in an era where good cybersecurity talent is hard to find. At a basic level, developing secure code is the responsibility of the team that wrote the code. But at a higher level, it’s up to executives to institute processes and gather resources to give development teams the time and resources to uncover these issues. —Jerry Gamblin @Dark Reading

Android’s security model is enforced by the Linux kernel, which makes it a tempting target for attackers. We have put a lot of effort into hardening the kernel in previous Android releases and in Android 9, we continued this work by focusing on compiler-based security mitigations against code reuse attacks. —Sami Tolvanen @Google

Technologies that Didn’t: The OSI Protocol Stack

The Open Systems Interconnect (OSI) model is the most often taught model of data transmission—although it is not all that useful in terms of describing how modern networks work. What many engineers who have come into network engineering more recently do not know is there was an entire protocol suite that went with the OSI model. Each of the layers within the OSI model, in fact, had multiple protocols specified to fill the functions of that layer. For instance, X.25, while older than the OSI model, was adopted into the OSI suite to provide point-to-point connectivity over some specific kinds of physical circuits. Moving up the stack a little, there were several protocols that provided much the same service as the widely used Internet Protocol (IP). @ECI