BGP
BGP Policy (Part 7)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
In this post—the last post in this series—I’m going to cover do not transit options from the perspective of AS65001 in the following network—
There are cases where an operator does not traffic to be forwarded to them through some specific AS, whether directly connected or multiple hops away. For instance, AS65001 and AS65005 might be operated by companies in politically unfriendly nations. In this case, AS65001 may be legally required to reject traffic that has passed through the nation in which AS65005 is located. There are at least three mechanisms in BGP that are used, in different situations, to enforce this kind of policy.
Do Not Advertise Communities (Provider Specific)
Many providers supply communities a customer can use to block the advertisement of their routes to a particular AS. For instance, if AS65002 were NTT, according to the NTT customer communities site, if AS65001 advertises 100::/64 with the community 65500:65005, NTT would advertise 100::/64 to all its other peers, but not to AS65005.
Note: NTT is not AS65002; this is only used as an illustration of using a community to block advertisement to a peer’s peer.
The operator at AS65001 might reasonably expect that blocking AS65002 from advertising 100::/64 to AS65005 will block all traffic traveling through AS65005—but the vagaries of the global Internet routing table may well cause traffic to be forwarded through AS65005 anyway in some instances.
If AS65006 has a default route pointing to AS65005, traffic destined to 100::/64 may still be forwarded to AS65005. If AS65005 happens to have a covering aggregate route, or learned of the route via AS65004, it might still carry traffic destined for 100::/64.
It is almost impossible to block all traffic to a given reachable destination from being forwarded through a given autonomous system.
AS Path Injection
An alternate, widely used mechanism is to intentionally inject an AS Path loop when advertising a route to prevent some AS from accepting the route. For instance, AS65001 might advertise 100::/64 with the AS Path [65005,65001] to AS65002. AS65005 would then reject this advertisement because the local AS is already in the AS Path.
While this might appear to “break the rules” of BGP, the reality is the AS Path was never really intended to be a “true record” of the path of an “update” (in fact, there is no such thing as an “update” that travels from one router to the next—the “update” is constructed at each hop based on local tables). This technique is problematic in providing “path security” in BGP, but it does not intrinsically break any BGP rules.
Note: For more information about this technique, refer to this episode of the Hedge.
Again, note it is almost impossible to block all traffic to a given reachable destination from being forwarded through a given autonomous system.
Do Not Advertise Communities (Well Known)
Three further well-known communities, although they are not widely used, are worth considering.
When a route is marked with NO-PEER, the AS should only advertise the route to its customers and never its peers. For instance, if AS65001 advertises 100::/64 to AS65003 with NO-PEER, AS65003 will advertise the destination to AS6507 and AS65008 (assuming these are customers), and not to AS65002 or AS65004 (because both of these autonomous systems transit traffic to and from AS65003).
When a route is marked with NO-EXPORT, the AS should not advertise the reachable destination to any other AS. For instance, if AS65001 advertises 100::/64 to AS65003 with NO-EXPORT, AS65003 will not advertise this reachable destination to any other AS, including AS65007, AS65008, AS65002, or AS65004.
When a route is marked with NO-ADVERTISE, the receiving BGP speaker should not advertise the route to any other BGP speaker, including internal and external connections.
BGP Policy (Part 6)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
In this post I’m going to cover local preference via communities, longer prefix match, and conditional advertisement from the perspective of AS65001 in the following network—
Communities an Local Preference
As noted above, MED is the tool “designed into” BGP for selecting an entrance point into the local AS for specific reachable destinations. MED is not very effective, however, because a route’s preference will always win over MED, and because it is not carried between autonomous systems.
Some operators provide an alternate for MED in the form of communities that set a route’s preference within the AS. For instance, assume 100::/64 is geographically closer to the [65001,65003] link than either of the [65001,65002] links, so AS65001 would prefer traffic destined to 100::/64 enter through AS65003.
In this case, AS65001 can advertise 100::/64 with a community that makes AS65001 prefer the route through AS65003 over the direct route to AS65001 (see 2914:450 on NTT’s list of customer set communities as an example).
Note: Many of the communities described here have regional versions for more specific use cases. These operate on the same principles, just in a more restricted topological or geographical area.
Longer Prefix Match
While MED is often not effective, and using communities is both restricted in range and complex to configure and manage, advertising a longer-prefix match always works, is simple to configure, and easy to deploy.
For instance, if AS65001 would like traffic destined to 100::/64 to only enter from AS65003, it may advertise an aggregated route, say 2001:db8:3e8100::/63 to both AS65003 and AS65002, and then advertise 100::/64 only to AS65003. Because all routing systems will select the prefix with the longest match first, the /64 through AS65003 will be selected over the /63 through AS65003 and AS65003, so the traffic always enters AS65001 the way the operator desires.
The overlapping, or covering, aggregate is advertised to provide backup reachability. If the [AS65001,AS65003] link (or peering) fails for any reason, traffic destined to 100::/64 will follow the /63 route, entering from AS65002. This is not optimal from the perspective of AS65001, but it keeps connectivity in place while any problems can be traced down and repaired.
According to Geoff Huston, a large percentage of the routes in the current global table are advertised for traffic engineering—to manipulate the point at which traffic destined to specific reachable destinations enters an AS.
Note: The use of longer prefix routes to control inbound route flows represents a “tragedy of the commons” problem to the global Internet. Work has been put into various mechanisms designed to remove these more specific routes from the routing table when they are no longer needed, but little progress has been made in implementing them, not have any of these solutions achieved widespread adoption and deployment.
Conditional Advertisement
What if AS65001 has signed a contract with AS65003 to carry traffic only if both its links to AS65002 fails? In this case, AS65001 could advertise many more longer prefix specifics through AS65002 and one shorter covering route through AS6503.
This strategy, however, has two flaws. First, it requires AS6501 to manage the more specifics and covering routes as a set, making certain the pairs are correctly configured. Second, it could be that AS65001 does not want anyone to know about this backup arrangement unless and until it is used. This is sometimes the case when two competitors agree to back one another up, and neither wants anyone to know what their backup arrangements are.
To resolve these (and other) policy problems, operators can use conditional advertisement.
Conditional advertisement is conceptually simple; if a router does not have some route, x, in its routing table, it advertises some other route (given the route is in the local tables so it can be advertised). For instance, AS65001 might configure the router at C to advertise 100::/64 only when it does not have some other route.
The hardest part of configuring conditional advertisement is knowing when to trigger the advertisement of the alternate path. Using the lack of reachability to the destination itself (100::/64 in this case) as the trigger will fail in some circumstances, and will always require the global table to converge before the alternate path is advertised. Instead, conditional advertisement is often triggered by the lack of a route to between the BGP speakers being “watched” (in this case, the two [65001,65002] links) learned through from within the AS (within AS65001, rather than through the global routing table).
Triggering on the internal state of a link directly connected to a router managed by the local operator, and carried through internal convergence, removes external convergence from the time required to begin advertising the alternate path.
BGP Policies (Part 5)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
In this post I’m going to cover AS Path Prepending from the perspective of AS65001 in the following network—
Since the length of the AS Path plays a role in choosing which path to use when forwarding traffic towards a given reachable destination, many (if not most) operators prepend the AS Path when advertising routes to a peer. Thus an AS Path of [65001], when advertised towards AS65003, can become [65001,65001] by adding one prepend, [65001,65001,65001] by adding two prepends, etc. Most BGP implementations allow an operator to prepend as many times as they would like, so it is possible to see twenty, thirty, or even higher numbers of prepends.
Note: The usefulness of prepending is generally restricted to around two or three, as the average length of an AS Path in the global Internet is around 4 hops.
If AS65001 would like traffic destined to 100::/64 to enter from AS65003 rather than AS65002, it can prepend the AS Path at every peering point with AS65002 (A and B) with two hops (sending [65001,65001,65001] to AS65002). If preference, MED, and all other metrics are equal, AS65002 would then prefer the path with the shorter AS Path through AS65003, rather than the path directly into AS65001 (either through A or B).
That all metrics are equal is not likely, however. AS65002 will probably have preference set so routes learned directly from customers (such as AS65001) are selected over routes learned from peers (such as AS65003). The impact of prepending on route selection by directly connected peers is, therefore, uncertain.
Moving one step out in the network, consider the routes received by AS65004 to reach 100::/64. There will be one route along [65002,65001,65001,65001], and another with an AS Path of [65003,65001]. All other things being equal (same preference, etc.), AS65004 will choose to send traffic destined to 100::/64 through AS65003 rather than AS65002. How likely is it all the other BGP metrics will be equal at AS65004? So long as the peering between AS65004, AS65003, and AS65002 are all of the same type, the odds are high—so prepending can help move some (not all) traffic from one inbound link to another.
Because AS Path prepending has variable results over time, operators using this technique often “just try it” to see what the effect will be. There’s no real way to predict how effective prepending any number of times will be in moving traffic from one inbound link to another.
What if AS65001 does not want traffic destined to 100::/64 to traverse AS6505? For instance, suppose AS6506 s on across an ocean, mountain range, or other difficult-to-cross geographic feature. AS65005 crosses this geography via a satellite link, while AS65004 crosses the same geography via an optical cable. Sine optical cable runs can provide better delay and jitter than a satellite link, AS65001 may desire to choose which of these two autonomous systems is traversed to reach 100::/64.
This cannot be directly accomplished using AS Path prepend, as both AS65004 and AS65005 will both receive the same prepended path.
To express this kind of policy, some operators allow their customers to set communities that cause the operator to remotely prepend a given route advertisement. For instance, NTT allows their customers to set a community that will cause NTT to prepend specific routes when those routes are advertised to specific autonomous systems—in this case, AS65001 could add the community 65421:65005 to the advertisement for 100::/64, which would cause NTT to prepend AS65001 when advertising 100::64 to AS65005, and not prepend anything when advertising 100::/64 to AS65004.
This technique is subject to the same caveats as using AS Path prepend locally—it may work in some situations, or it may not—because the local operator does not have visibility into the policies of the operators they are trying to influence.
BGP Policies (Part 4)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
In this post, I’ll cover the first of a few ways to give surrounding autonomous systems a hint about where traffic should enter a network. Note this is one of the most vexing problems in BGP policy, so there will be a lot of notes across the next several posts about why some solutions don’t work all that well, or when they will and won’t work.
There are at least three reasons an operator may want to control the point at which traffic enters their network, including:
- Controlling the inbound load on each link. It might be important to balance inbound and outbound load to maintain settlement-free peering, or to equally use all available inbound bandwidth, or to ensure the quality of experience is not impacted by overusing a single link.
- Accounting for geographically dispersed entry points. For instance, while the two entry points into AS65001 might appear to be topologically close, they might be geographically diverse, with one being in South America and the other being in North America.
- Ensuring flows requiring symmetric paths are properly handled. A common use case is the use of stateful packet filters or port address translators, both of which require inbound and outbound traffic to be routed through a single device.
All these reasons apply to all kinds of network operators, so this section will examine the various techniques used to control traffic entry points from the perspective of AS65001 in the following network—
Policies designed to control the point at which traffic enters an operator’s network will often conflict with policies designed to control the point at which traffic exits some other operator’s network. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network from AS65002 may conflict with AS6500’2 policy that all traffic destined to 100::/64 leave its network by being forwarded to AS65003.
This effect is not just seen between directly connected autonomous systems. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network through AS65002 may conflict with AS65004’s policy that all traffic to that same destination exit the network by being forwarded to AS65003.
The original intent of BGP policy was the policy of the sender overrides the policy of the receiver, as expressed in the design of the metrics (the multiple exit discriminator, or MED, has a lower priority than the preference). In real deployments, however, exit and entry policies are more fluid and entangled. These relationships will be considered in each of the sections below, each of which describes a different way to influence or control how traffic destined to a single reachable destination.
Let’s begin with the Multiple Exist Discriminator, or MED.
MED is a suggestion or request to neighboring autonomous systems to forward traffic for reachable destination along a particular path. For instance, AS65001 may desire for traffic being sent to 100::/64 be sent to B in the network diagram, rather than to A or through its link to AS65003.
However, the MED is not a transitive attribute of a BGP route. This means that if AS65001 sets the MED so that entry B is preferred, and sends this MED to AS65003, AS65003 will strip (or reset) the MED before advertising 100::/64 to either AS65004 or AS65002.
MED, in this case, would be useful to help AS65002 determine whether to send this traffic to A or B, but not whether to send the traffic to AS65001 or AS65003. AS65002 will, instead, rely on local policy, primarily preference, to determine which exit point to use. If AS65002 determines the best path to 100::/64 is through one of its direct connections to AS65001 (either A or B), and there is no other reason for AS65002 to choose one path over the other, the MED will be used to determined which path to use.
Because AS65003 only has one connection to AS65001, the MED will not impact its bestpath decision at all. Because AS65001’s MED has been reset or stripped in all the routes to 100::/64 AS65004 receives, AS65001’s MED will not play a role in any bestpath decision there, either (AS65002 or AS65003 may set the MED when sending routes to AS65004, which may influence the path AS65004 chooses, but again only when choosing between multiple connections to the same peering AS).
Because MED is only considered nominally useful, it is often stripped off routes when they are received from another AS.
BGP Policies (Part 2)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
There are many reasons an operator might want to select which neighboring AS through which to send traffic towards a given reachable destination (for instance, 100::/64). Each of these examples assumes the AS in question has learned multiple paths towards 100::/64, one from each peer, and must choose one of the two available paths to forward along.
In the following network—
From AS65004’s perspective…
Transit providers primarily choose the most optimal exit from their AS to reduce the amount of peering settlement they are paying by using and maintaining settlement-free peering where possible and reducing the amount of time and distance traffic is carried through their network (through hot potato routing, discussed in more detail below).
If, for instance, AS65004 has a paid peering relationship with AS65002, and a contract with AS65003 which is settlement-free so long as the traffic between AS65004 and AS65003 is roughly symmetric. AS65004 has two roughly equal-cost paths (both have the same AS Path length) towards 100::/64. In this situation, AS 65004 is going to direct traffic towards AS65003 to maintain symmetrical traffic flows and direct any remaining traffic towards AS65002.
This kind of balancing is normally done through a controller or network management system that monitors the balance of traffic with AS65003, adjusting the preference of sets of routes to attain the correct balance with AS65003 while reducing the costs of using the link to AS65002 to the minimum possible.
From AS65005’s perspective…
AS65005 can either send traffic originating in AS65001, received from AS65002, and destined to AS65006, to either AS65004—a peer—or AS65006—a customer. The internal path between the entry point for this traffic is longer if the traffic is carried to AS65006, and shorter if the traffic is carried to AS65004. These longer and shorter paths give rise to the concepts of hot and cold potato routing.
If AS65006 is paying AS65005 for transit, AS65005 would normally carry traffic across the longer path to its border with AS65006. This is cold potato routing. AS65005’s reason for choosing this option is to maximize revenue from the customer. First, as the link between AS65005 and AS65006 becomes busier, AS65006 is likely to upgrade the link, generating additional revenue for AS65005. Even if the traffic level is not increasing, steady traffic flow encourages the customer to maintain the link, which protects revenue. Second, AS65005 can control the quality-of-service AS65006 receives by keeping the traffic within its network for as long as possible, improving the customer’s perception of the service they are receiving.
Cold potato routing is normally implemented by setting the preference on routes learned from customers, so these routes are preferred over all routes learned from peers.
If AS6006 is not paying AS65005 for transit, it is to AS65005’s advantage to carry the traffic as short a distance as possible. In this case, although AS65005 is directly connected to AS65006, and the destination is in AS65006, AS65005 will choose to direct the traffic towards its border with AS65004 (because there is a valid route learned for this reachable destination from AS65004).
This is hot potato routing—like the kids’ game, you want to hold on to the traffic for as short an amount of time as possible. Hot potato routing is normally implemented by setting the preference on routes to the same and relying on the IGP metric component of the BGP bestpath decision process to find the closest exit point.
Next week I’ll continue this series on BGP interdomain policies… feel free to leave a comment if you think I’ve explained something incorrectly, etc.
BGP Policies (part 1)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
In the following network—
There are many reasons an operator might want to select which neighboring AS through which to send traffic towards a given reachable destination (for instance, 100::/64). Each of these examples assumes the AS in question has learned multiple paths towards 100::/64, one from each peer, and must choose one of the two available paths to forward along.
Examining this from AS65006’s Perspective …
Assuming AS65006 is an edge operator (commonly called enterprise, but generally just originating and terminating traffic, and never transiting traffic), there are several reasons the operator may prefer one exit point (through an upstream provider), including:
- An automated system may determine AS65004 has some sort of brownout; in this case, the operator at 65006 has configured the system to prefer the exit through AS65005
- The traffic destined to 100::/64 may require a class of service (such as video transport) AS65004 cannot support (for instance, because the link between AS65006 and 65005 has low bandwidth, high delay, or high jitter)
The most common way this kind of policy would be implemented is by setting the BGP LOCAL_PREFERENCE (called preference throughout the rest of this document) on routes learned from AS65005 higher than the preference on routes learned from AS65004.
Another common case is AS65006 would prefer to send traffic to AS65005 only when the destination is in an AS directly connected to AS65005 itself, while sending all other traffic through AS65004. This is common when a one provider has good local and poor global coverage, while the other provider has good global but poor local coverage.
For instance, if AS65006 is in a somewhat isolated part of the world, such as some parts of the South Pacific or Central America, there may be a local provider, such as AS65004, that has solid connectivity to most of the other edge operators in the local geographic region but charges a high cost for transiting to the rest of the global Internet. A second provider, such as AS65005, charges less to reach destinations beyond the local geographic region but is relatively expensive to use when sending traffic to other edge operators within the local region.
Preference, by itself, would be difficult to use in this case, because the operator would need to distinguish between geographically local and geographically distant routes. To implement this kind of policy, the operator would accept partial routes from the geographically local provider (AS65004 in this case) and set a high preference on these routes. Partial routes are typically those the local provider learns only from other directly connected autonomous systems, and hence would only include operators in the local geographic region. The operator would then accept full routes, or the entire Internet global routing table, from the second provider (AS65005 in this case) and set a lower preference.
An alternative way to implement geographic preference is using communities. Many transit providers mark individual reachable destinations with information about where the route originated. NTT, for instance, describes their geographic marking here. An operator can create filters using regular expressions to change the preference of a route based on its geographic origin.
This is not a common way to solve the problem because the filtering rules involved can become complex—but it might be deployed if local providers do not offer partial routes for some reason.
Another alternate to implement geographic preference is to use a regular expression filter to set the preference for each reachable destination based on the length of the AS Path. Theoretically, routes originating within the local region should have an AS Path of one or two hops, while those originating outside a region should have longer AS Paths.
This generally does not work for two reasons. First, the average length of an AS Path (after prepending is factored out) is about 4 hops in the entire global Internet—and it is easy to reach four hops even within a local region in some situations. Second, many operators prepend the AS Path to manage inbound entry point preference; these prepended hops must be factored out to use this method.
Next week I’ll continue this series on BGP interdomain policies… feel free to leave a comment if you think I’ve explained something incorrectly, etc.
Hedge 103: BGP Security with Geoff Huston
Our community has been talking about BGP security for over 20 years. While MANRS and the RPKI have made some headway in securing BGP, the process of deciding on a method to provide at least the information providers need to make more rational decisions about the validity of individual routes is still ongoing. Geoff Huston joins Alvaro, Russ, and Tom to discuss how we got here and whether we will learn from our mistakes.
download