BGP is one of the foundational protocols that make the Internet “go;” as such, it is a complex intertwined system of different kinds of functionality bundled into a single set of TLVs, attributes, and other functionality. Because it is so widely used, however, BGP tends to gain new capabilities on a regular basis, making the Interdomain Routing (IDR) working group in the Internet Engineering Task Force (IETF) one of the consistently busiest, and hence one of the hardest to keep up with. In this post, I’m going to spend a little time talking about one area in which a lot of work has been taking place, the building and maintenance of peering relationships between BGP speakers.
The first draft to consider is Mitigating the Negative Impact of Maintenance through BGP Session Culling, which is a draft in an operations working group, rather than the IDR working group, and does not make any changes to the operation of BGP. Rather, this draft considers how BGP sessions should be torn down so traffic is properly drained, and the peering shutdown has the minimal effect possible. The normal way of shutting down a link for maintenance would be to for administrators to shut down BGP on the link, wait for traffic to subside, and then take the link down for maintenance. However, many operators simply do not have the time or capability to undertake scheduled shutdowns of BGP speakers. To resolve this problem, graceful shutdown capability was added to BGP in RFC8326. Not all implementations support graceful shutdown, however, so this draft suggests an alternate way to shut down BGP sessions, allowing traffic to drain, before a link is shut down: use link local filtering to block BGP traffic on the link, which will cause any existing BGP sessions to fail. Once these sessions have failed, traffic will drain off the link, allowing it to be safely shut down for maintenance. The draft discusses various timing issues in using this technique to reduce the impact of link removal due to maintenance (or other reasons).
Graceful shutdown, itself, is also in line to receive some new capabilities through Extended BGP Administrative Shutdown Communication. This draft is rather short, as it simply allows an operator to send a short freeform message (presumably in text format) along with the standard BGP graceful shutdown notification. This message can be printed on the console, or saved to syslog, to provide an operator with more information about why a particular BGP has been shut down, whether it coming back up again, how long the shutdown is expected to last, etc.
Graceful Restart (GR) is a long available feature in many BGP implementations that aims to prevent the disruption of traffic flow; the original purpose was to handle a route processor restart in a router where the line cards could continue forwarding traffic based on local forwarding tables (the FIB), including cases where one route processor fails, causing the router switches to a backup route processor in the same chassis. Over time, GR began to be applied to NOTIFICATION messages in BGP. For instance, if a BGP speaker receives a malformed message, it is required (by the BGP RFCs) to send a NOTIFICATION, which will cause the BGP session to be torn down and restarted. GR has been adapted to these situations, so traffic flow is either not impacted, or minimally impacted through the NOTIFICATION/session restart process. This same processing takes place for a hold timer timeout in BGP.
The problem is that only one of the two speakers in a restarting pair will normally retain its local forwarding information. The sending speaker will normally flush its local routing tables, and with them its local forwarding tables, on sending a BGP NOTIFICATION. Notification Message support for BGP Graceful Restart changes this processing, allowing both speakers to enter the “receiving speaker” mode, so both speakers would retain their local forwarding information. A signal is provided to allow the sending speaker to indicate the sessions should be hard reset, rather than gracefully reset, if needed.
Finally, BGP allows speakers to send a route with a next hop other than themselves; this is called a third party next hop, and is illustrated in the figure below.
In this network, router C’s best path to 2001:db8:3e8:100::/64 might be through A, but the operator may prefer this traffic pass through B. While it is possible to change the preferences so C chooses the path through B, there are some situations where it is better for A to advertise C as the next hop towards the destination (for instance, a route server would not normally advertise itself as the nexthop towards a destination). The problem with this situation is that B might not have the same capabilities as a BGP speaker as A. If B, for instance, cannot forward for IPv6, the situation shown in the illustration would clearly not work.
To resolve this, BGP Next-Hop dependent capabilities allows a speaker to advertise the capabilities of these alternate next hops to peered BGP speakers.