CONTENT TYPE
BGP Policy (Part 6)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
In this post I’m going to cover local preference via communities, longer prefix match, and conditional advertisement from the perspective of AS65001 in the following network—

Communities an Local Preference
As noted above, MED is the tool “designed into” BGP for selecting an entrance point into the local AS for specific reachable destinations. MED is not very effective, however, because a route’s preference will always win over MED, and because it is not carried between autonomous systems.
Some operators provide an alternate for MED in the form of communities that set a route’s preference within the AS. For instance, assume 100::/64 is geographically closer to the [65001,65003] link than either of the [65001,65002] links, so AS65001 would prefer traffic destined to 100::/64 enter through AS65003.
In this case, AS65001 can advertise 100::/64 with a community that makes AS65001 prefer the route through AS65003 over the direct route to AS65001 (see 2914:450 on NTT’s list of customer set communities as an example).
Note: Many of the communities described here have regional versions for more specific use cases. These operate on the same principles, just in a more restricted topological or geographical area.
Longer Prefix Match
While MED is often not effective, and using communities is both restricted in range and complex to configure and manage, advertising a longer-prefix match always works, is simple to configure, and easy to deploy.
For instance, if AS65001 would like traffic destined to 100::/64 to only enter from AS65003, it may advertise an aggregated route, say 2001:db8:3e8100::/63 to both AS65003 and AS65002, and then advertise 100::/64 only to AS65003. Because all routing systems will select the prefix with the longest match first, the /64 through AS65003 will be selected over the /63 through AS65003 and AS65003, so the traffic always enters AS65001 the way the operator desires.
The overlapping, or covering, aggregate is advertised to provide backup reachability. If the [AS65001,AS65003] link (or peering) fails for any reason, traffic destined to 100::/64 will follow the /63 route, entering from AS65002. This is not optimal from the perspective of AS65001, but it keeps connectivity in place while any problems can be traced down and repaired.
According to Geoff Huston, a large percentage of the routes in the current global table are advertised for traffic engineering—to manipulate the point at which traffic destined to specific reachable destinations enters an AS.
Note: The use of longer prefix routes to control inbound route flows represents a “tragedy of the commons” problem to the global Internet. Work has been put into various mechanisms designed to remove these more specific routes from the routing table when they are no longer needed, but little progress has been made in implementing them, not have any of these solutions achieved widespread adoption and deployment.
Conditional Advertisement
What if AS65001 has signed a contract with AS65003 to carry traffic only if both its links to AS65002 fails? In this case, AS65001 could advertise many more longer prefix specifics through AS65002 and one shorter covering route through AS6503.
This strategy, however, has two flaws. First, it requires AS6501 to manage the more specifics and covering routes as a set, making certain the pairs are correctly configured. Second, it could be that AS65001 does not want anyone to know about this backup arrangement unless and until it is used. This is sometimes the case when two competitors agree to back one another up, and neither wants anyone to know what their backup arrangements are.
To resolve these (and other) policy problems, operators can use conditional advertisement.
Conditional advertisement is conceptually simple; if a router does not have some route, x, in its routing table, it advertises some other route (given the route is in the local tables so it can be advertised). For instance, AS65001 might configure the router at C to advertise 100::/64 only when it does not have some other route.
The hardest part of configuring conditional advertisement is knowing when to trigger the advertisement of the alternate path. Using the lack of reachability to the destination itself (100::/64 in this case) as the trigger will fail in some circumstances, and will always require the global table to converge before the alternate path is advertised. Instead, conditional advertisement is often triggered by the lack of a route to between the BGP speakers being “watched” (in this case, the two [65001,65002] links) learned through from within the AS (within AS65001, rather than through the global routing table).
Triggering on the internal state of a link directly connected to a router managed by the local operator, and carried through internal convergence, removes external convergence from the time required to begin advertising the alternate path.
BGP Policies (Part 5)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
In this post I’m going to cover AS Path Prepending from the perspective of AS65001 in the following network—

Since the length of the AS Path plays a role in choosing which path to use when forwarding traffic towards a given reachable destination, many (if not most) operators prepend the AS Path when advertising routes to a peer. Thus an AS Path of [65001], when advertised towards AS65003, can become [65001,65001] by adding one prepend, [65001,65001,65001] by adding two prepends, etc. Most BGP implementations allow an operator to prepend as many times as they would like, so it is possible to see twenty, thirty, or even higher numbers of prepends.
Note: The usefulness of prepending is generally restricted to around two or three, as the average length of an AS Path in the global Internet is around 4 hops.
If AS65001 would like traffic destined to 100::/64 to enter from AS65003 rather than AS65002, it can prepend the AS Path at every peering point with AS65002 (A and B) with two hops (sending [65001,65001,65001] to AS65002). If preference, MED, and all other metrics are equal, AS65002 would then prefer the path with the shorter AS Path through AS65003, rather than the path directly into AS65001 (either through A or B).
That all metrics are equal is not likely, however. AS65002 will probably have preference set so routes learned directly from customers (such as AS65001) are selected over routes learned from peers (such as AS65003). The impact of prepending on route selection by directly connected peers is, therefore, uncertain.
Moving one step out in the network, consider the routes received by AS65004 to reach 100::/64. There will be one route along [65002,65001,65001,65001], and another with an AS Path of [65003,65001]. All other things being equal (same preference, etc.), AS65004 will choose to send traffic destined to 100::/64 through AS65003 rather than AS65002. How likely is it all the other BGP metrics will be equal at AS65004? So long as the peering between AS65004, AS65003, and AS65002 are all of the same type, the odds are high—so prepending can help move some (not all) traffic from one inbound link to another.
Because AS Path prepending has variable results over time, operators using this technique often “just try it” to see what the effect will be. There’s no real way to predict how effective prepending any number of times will be in moving traffic from one inbound link to another.
What if AS65001 does not want traffic destined to 100::/64 to traverse AS6505? For instance, suppose AS6506 s on across an ocean, mountain range, or other difficult-to-cross geographic feature. AS65005 crosses this geography via a satellite link, while AS65004 crosses the same geography via an optical cable. Sine optical cable runs can provide better delay and jitter than a satellite link, AS65001 may desire to choose which of these two autonomous systems is traversed to reach 100::/64.
This cannot be directly accomplished using AS Path prepend, as both AS65004 and AS65005 will both receive the same prepended path.
To express this kind of policy, some operators allow their customers to set communities that cause the operator to remotely prepend a given route advertisement. For instance, NTT allows their customers to set a community that will cause NTT to prepend specific routes when those routes are advertised to specific autonomous systems—in this case, AS65001 could add the community 65421:65005 to the advertisement for 100::/64, which would cause NTT to prepend AS65001 when advertising 100::64 to AS65005, and not prepend anything when advertising 100::/64 to AS65004.
This technique is subject to the same caveats as using AS Path prepend locally—it may work in some situations, or it may not—because the local operator does not have visibility into the policies of the operators they are trying to influence.
Hedge 127: FR Routing Update
The FR Routing project is a fully featured open-source routing stack, including BGP, OSPF, and IS-Is (among others), supported by a community including NVDIA, Orange, VMWare, and many others. On today’s episode of the Hedge, Tom Ammon and Russ White are joined by Donald Sharp, Alistair Woodman, and Quentin Young to update listeners on projects completed and underway in FR Routing.
Hedge 126: George Michaelson on ISDN
ISDN, while an old technology, is still around in many parts of the world. When will it go away? George Michaelson joins Tom Ammon and Russ White to discuss the end of ISDN. The conversation then veers into old networking technologies, and the importance of ISDN in setting the terms and ideas we use today—ISDN is one of the key technologies around which network engineers built their mental maps of how to build and maintain networks.
Hedge 125: Brooks Westbrook and DC Fabric Design
DC fabric design is more of an art than a science—a lot of factors come into play, such as future growth, lifecycle management, security, and costs. How can network engineers balance these various factors—how do they even know what questions to ask? Brooks Westrbook joins Tom Ammon and Russ White to discuss three- and five-stage DC fabric design, OPEX, CAPEX, and other topics on this episode of the Hedge.
BGP Policies (Part 4)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
In this post, I’ll cover the first of a few ways to give surrounding autonomous systems a hint about where traffic should enter a network. Note this is one of the most vexing problems in BGP policy, so there will be a lot of notes across the next several posts about why some solutions don’t work all that well, or when they will and won’t work.
There are at least three reasons an operator may want to control the point at which traffic enters their network, including:
- Controlling the inbound load on each link. It might be important to balance inbound and outbound load to maintain settlement-free peering, or to equally use all available inbound bandwidth, or to ensure the quality of experience is not impacted by overusing a single link.
- Accounting for geographically dispersed entry points. For instance, while the two entry points into AS65001 might appear to be topologically close, they might be geographically diverse, with one being in South America and the other being in North America.
- Ensuring flows requiring symmetric paths are properly handled. A common use case is the use of stateful packet filters or port address translators, both of which require inbound and outbound traffic to be routed through a single device.
All these reasons apply to all kinds of network operators, so this section will examine the various techniques used to control traffic entry points from the perspective of AS65001 in the following network—

Policies designed to control the point at which traffic enters an operator’s network will often conflict with policies designed to control the point at which traffic exits some other operator’s network. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network from AS65002 may conflict with AS6500’2 policy that all traffic destined to 100::/64 leave its network by being forwarded to AS65003.
This effect is not just seen between directly connected autonomous systems. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network through AS65002 may conflict with AS65004’s policy that all traffic to that same destination exit the network by being forwarded to AS65003.
The original intent of BGP policy was the policy of the sender overrides the policy of the receiver, as expressed in the design of the metrics (the multiple exit discriminator, or MED, has a lower priority than the preference). In real deployments, however, exit and entry policies are more fluid and entangled. These relationships will be considered in each of the sections below, each of which describes a different way to influence or control how traffic destined to a single reachable destination.
Let’s begin with the Multiple Exist Discriminator, or MED.
MED is a suggestion or request to neighboring autonomous systems to forward traffic for reachable destination along a particular path. For instance, AS65001 may desire for traffic being sent to 100::/64 be sent to B in the network diagram, rather than to A or through its link to AS65003.
However, the MED is not a transitive attribute of a BGP route. This means that if AS65001 sets the MED so that entry B is preferred, and sends this MED to AS65003, AS65003 will strip (or reset) the MED before advertising 100::/64 to either AS65004 or AS65002.
MED, in this case, would be useful to help AS65002 determine whether to send this traffic to A or B, but not whether to send the traffic to AS65001 or AS65003. AS65002 will, instead, rely on local policy, primarily preference, to determine which exit point to use. If AS65002 determines the best path to 100::/64 is through one of its direct connections to AS65001 (either A or B), and there is no other reason for AS65002 to choose one path over the other, the MED will be used to determined which path to use.
Because AS65003 only has one connection to AS65001, the MED will not impact its bestpath decision at all. Because AS65001’s MED has been reset or stripped in all the routes to 100::/64 AS65004 receives, AS65001’s MED will not play a role in any bestpath decision there, either (AS65002 or AS65003 may set the MED when sending routes to AS65004, which may influence the path AS65004 chooses, but again only when choosing between multiple connections to the same peering AS).
Because MED is only considered nominally useful, it is often stripped off routes when they are received from another AS.
Hedge 124: Geoff Huston and the State of BGP
Another year of massive growth in the number and speed of connections to the global Internet—what is the impact on the global routing table? Goeff Huston joins Donald Sharp and Russ White to discuss the current state of the BGP table, the changes in the last several years, where things might go, and what all of this means. This is part two of a two part episode.
