CONTENT TYPE
BGP Policies (Part 4)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
In this post, I’ll cover the first of a few ways to give surrounding autonomous systems a hint about where traffic should enter a network. Note this is one of the most vexing problems in BGP policy, so there will be a lot of notes across the next several posts about why some solutions don’t work all that well, or when they will and won’t work.
There are at least three reasons an operator may want to control the point at which traffic enters their network, including:
- Controlling the inbound load on each link. It might be important to balance inbound and outbound load to maintain settlement-free peering, or to equally use all available inbound bandwidth, or to ensure the quality of experience is not impacted by overusing a single link.
- Accounting for geographically dispersed entry points. For instance, while the two entry points into AS65001 might appear to be topologically close, they might be geographically diverse, with one being in South America and the other being in North America.
- Ensuring flows requiring symmetric paths are properly handled. A common use case is the use of stateful packet filters or port address translators, both of which require inbound and outbound traffic to be routed through a single device.
All these reasons apply to all kinds of network operators, so this section will examine the various techniques used to control traffic entry points from the perspective of AS65001 in the following network—

Policies designed to control the point at which traffic enters an operator’s network will often conflict with policies designed to control the point at which traffic exits some other operator’s network. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network from AS65002 may conflict with AS6500’2 policy that all traffic destined to 100::/64 leave its network by being forwarded to AS65003.
This effect is not just seen between directly connected autonomous systems. For instance, AS65001’s policy that all traffic destined to 100::/64 enter the network through AS65002 may conflict with AS65004’s policy that all traffic to that same destination exit the network by being forwarded to AS65003.
The original intent of BGP policy was the policy of the sender overrides the policy of the receiver, as expressed in the design of the metrics (the multiple exit discriminator, or MED, has a lower priority than the preference). In real deployments, however, exit and entry policies are more fluid and entangled. These relationships will be considered in each of the sections below, each of which describes a different way to influence or control how traffic destined to a single reachable destination.
Let’s begin with the Multiple Exist Discriminator, or MED.
MED is a suggestion or request to neighboring autonomous systems to forward traffic for reachable destination along a particular path. For instance, AS65001 may desire for traffic being sent to 100::/64 be sent to B in the network diagram, rather than to A or through its link to AS65003.
However, the MED is not a transitive attribute of a BGP route. This means that if AS65001 sets the MED so that entry B is preferred, and sends this MED to AS65003, AS65003 will strip (or reset) the MED before advertising 100::/64 to either AS65004 or AS65002.
MED, in this case, would be useful to help AS65002 determine whether to send this traffic to A or B, but not whether to send the traffic to AS65001 or AS65003. AS65002 will, instead, rely on local policy, primarily preference, to determine which exit point to use. If AS65002 determines the best path to 100::/64 is through one of its direct connections to AS65001 (either A or B), and there is no other reason for AS65002 to choose one path over the other, the MED will be used to determined which path to use.
Because AS65003 only has one connection to AS65001, the MED will not impact its bestpath decision at all. Because AS65001’s MED has been reset or stripped in all the routes to 100::/64 AS65004 receives, AS65001’s MED will not play a role in any bestpath decision there, either (AS65002 or AS65003 may set the MED when sending routes to AS65004, which may influence the path AS65004 chooses, but again only when choosing between multiple connections to the same peering AS).
Because MED is only considered nominally useful, it is often stripped off routes when they are received from another AS.
Hedge 124: Geoff Huston and the State of BGP
Another year of massive growth in the number and speed of connections to the global Internet—what is the impact on the global routing table? Goeff Huston joins Donald Sharp and Russ White to discuss the current state of the BGP table, the changes in the last several years, where things might go, and what all of this means. This is part two of a two part episode.
BGP Policies (Part 3)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
There are many reasons an operator might want to select which neighboring AS through which to send traffic towards a given reachable destination (for instance, 100::/64). Each of these examples assumes the AS in question has learned multiple paths towards 100::/64, one from each peer, and must choose one of the two available paths to forward along.
In the following network—

From AS65001’s perspective
Assume AS65001 is some form of content provider, which means it offers some service such as bare metal compute, cloud services, search engines, social media, etc. Customers from AS65006 are connecting to its servers, located on the 100::/64 network, which generates a large amount of traffic returning to the customers.
From the perspective of AS hops, it appears the path from AS65001 to AS65006 is the same length—if this is true, AS65001 does not have any reason to choose one path or another (given there is no measurable performance difference, as in the cases described above from AS65006’s perspective). However, the AS hop count does not accurately describe the geographic distances involved:
- The geographic distance between 100::/64 and the exit towards AS65003 is very short
- The geographic distance between AS100::/64 and the exits towards AS65002 is very long
- The total geographic distance packets travel when following either path is about the same
In this case, AS65001 can either choose to hold on to packets destined to customers in AS65006 for a longer or shorter geographic distance.
While carrying the traffic over a longer geographic distance is more expensive, AS65001 would also like to optimize for the customer’s quality of experience (QoE), which means AS65001 should hold on to the traffic for as long as possible.
Because customers will use AS65001’s services in direct relation to their QoE (the relationship between service usage and QoE is measurable in the real world), AS65001 will opt to carry traffic destined to customers as long as possible—another instance of cold potato routing.
This is normally implemented by setting the preference for all routes equal and relying on the IGP metric part of the BGP bestpath decision process to control the exit point. IGP metrics can then be tuned based on the geographic distance from the origin of the traffic within the network and the exit point closest to the customer.
An alternative, more active, solution would be to have a local controller monitor the performance of individual paths to a given reachable destination, setting the preferences on individual reachable destinations and tuning IGP metrics in near-real-time to adjust for optimal customer experience.
Another alternative is to have a local controller monitor the performance individual paths and use MPLS, segment routing, or some other mechanism to actively engineer or steer the path of traffic through the network.
Some content providers may directly peer with transit and edge providers to reach customers more quickly, to reduce costs, and to increase their control over customer-facing traffic. For instance, if AS65001 is a content provider that transits traffic through [65002,65005] to reach customers in AS65006. To avoid transiting multiple autonomous systems, AS65001 can run a link directly to AS65005.
In some cases, content providers will build long-haul fiber optics (including undersea cable operations, see this site for examples) to avoid transiting multiple autonomous systems.
While the operator can end up paying a lot to build and operate long-haul optical links, this cost is offset is offset by decreasing paying transit providers for high levels of asymmetric traffic flows. Beyond this, content providers can control user experience more effectively the longer they control the user’s traffic. Finally, content providers can gain more information by connecting closer to users, feeding into Kai-Fu Lee’s virtuous cycle.
Note: content providers peering directly with edge providers and through IXPs is one component of the centralization of the Internet.
A failed alternative to the techniques described here was the use of automatic disaggregation at the content provider’s autonomous system borders. For instance, if a customer connected to a server in 100::/64 by sending traffic via the [65003,65001] link, an automated system will examine the routing table to see which route is currently being used to reach the customer’s reachable destination. If traffic forwarded to this customer’s address would normally pass through one of the [65001,65002] links, a local host route is created and distributed into AS65001 to draw this traffic to the exit connected to AS65003.
The theory behind this automatic disaggregation was that the customer will always take the shortest path from their perspective to reach the service. This assumption fails, in practice, however, so this scheme was ultimately abandoned.
Hedge 123: Geoff Huston and the State of BGP
Another year of massive growth in the number and speed of connections to the global Internet—what is the impact on the global routing table? Goeff Huston joins Donald Sharp and Russ White to discuss the current state of the BGP table, the changes in the last several years, where things might go, and what all of this means. This is part one of a two part episode.
Hedge 122: The Internet Architecture Board

What is the Internet Architecture Board (IAB) of the IETF? What role does the IAB play in the larger ecosystem of building and deploying standard protocols? In this episode of the Hedge, Tom and Ethan “flip roles” with Russ to ask these questions.
BGP Policies (Part 2)
At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.
There are many reasons an operator might want to select which neighboring AS through which to send traffic towards a given reachable destination (for instance, 100::/64). Each of these examples assumes the AS in question has learned multiple paths towards 100::/64, one from each peer, and must choose one of the two available paths to forward along.
In the following network—

From AS65004’s perspective…
Transit providers primarily choose the most optimal exit from their AS to reduce the amount of peering settlement they are paying by using and maintaining settlement-free peering where possible and reducing the amount of time and distance traffic is carried through their network (through hot potato routing, discussed in more detail below).
If, for instance, AS65004 has a paid peering relationship with AS65002, and a contract with AS65003 which is settlement-free so long as the traffic between AS65004 and AS65003 is roughly symmetric. AS65004 has two roughly equal-cost paths (both have the same AS Path length) towards 100::/64. In this situation, AS 65004 is going to direct traffic towards AS65003 to maintain symmetrical traffic flows and direct any remaining traffic towards AS65002.
This kind of balancing is normally done through a controller or network management system that monitors the balance of traffic with AS65003, adjusting the preference of sets of routes to attain the correct balance with AS65003 while reducing the costs of using the link to AS65002 to the minimum possible.
From AS65005’s perspective…
AS65005 can either send traffic originating in AS65001, received from AS65002, and destined to AS65006, to either AS65004—a peer—or AS65006—a customer. The internal path between the entry point for this traffic is longer if the traffic is carried to AS65006, and shorter if the traffic is carried to AS65004. These longer and shorter paths give rise to the concepts of hot and cold potato routing.
If AS65006 is paying AS65005 for transit, AS65005 would normally carry traffic across the longer path to its border with AS65006. This is cold potato routing. AS65005’s reason for choosing this option is to maximize revenue from the customer. First, as the link between AS65005 and AS65006 becomes busier, AS65006 is likely to upgrade the link, generating additional revenue for AS65005. Even if the traffic level is not increasing, steady traffic flow encourages the customer to maintain the link, which protects revenue. Second, AS65005 can control the quality-of-service AS65006 receives by keeping the traffic within its network for as long as possible, improving the customer’s perception of the service they are receiving.
Cold potato routing is normally implemented by setting the preference on routes learned from customers, so these routes are preferred over all routes learned from peers.
If AS6006 is not paying AS65005 for transit, it is to AS65005’s advantage to carry the traffic as short a distance as possible. In this case, although AS65005 is directly connected to AS65006, and the destination is in AS65006, AS65005 will choose to direct the traffic towards its border with AS65004 (because there is a valid route learned for this reachable destination from AS65004).
This is hot potato routing—like the kids’ game, you want to hold on to the traffic for as short an amount of time as possible. Hot potato routing is normally implemented by setting the preference on routes to the same and relying on the IGP metric component of the BGP bestpath decision process to find the closest exit point.
Next week I’ll continue this series on BGP interdomain policies… feel free to leave a comment if you think I’ve explained something incorrectly, etc.
Hedge 121: Computing in the Network with Marie-Jose Montpetit

Can computation be drawn into the network, rather than always being pushed to the edge of the network? Taking content distribution networks as a starting point, the COIN research group is looking at ways to make networks more content and computationally aware, bringing compute into the network itself. Join Alvaro Retana, Marie-Jose Montpetit, and Russ White, as we discuss the ongoing research around computing in the network.
