Hedge 218: Longer than /24’s

Most providers will only accept a /24 or shorter IPv4 route because routers have always had limited amounts of forwarding table space. In fact, many hardware and software IPv4 forwarding implementations are optimized for a /24 or shorter prefix length. Justin Wilson joins Tom Ammon and Russ White to discuss why the DFZ might need to be expanded to longer prefix lengths, and the tradeoffs involved in doing so.
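As a rough illustration of the filter described above, the following Python sketch accepts an IPv4 route only if its prefix length is /24 or shorter. The function name `accept_ipv4_route` and the `max_len` parameter are hypothetical; providers apply this kind of policy in router configuration, and this is only a model of the rule, not of any particular implementation.

```python
import ipaddress

def accept_ipv4_route(prefix: str, max_len: int = 24) -> bool:
    """Return True if the IPv4 prefix is no longer than max_len bits.

    Hypothetical helper for illustration only.
    """
    network = ipaddress.ip_network(prefix, strict=True)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 prefixes")
    return network.prefixlen <= max_len

# A /24 and a /15 pass the filter; a /25 is too long and is rejected.
for prefix in ("192.0.2.0/24", "198.18.0.0/15", "198.51.100.0/25"):
    print(prefix, accept_ipv4_route(prefix))
```

Raising `max_len` to 25 or more is the kind of change the episode discusses; every extra bit of allowed length increases the number of routes that can end up in the forwarding table.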



Upcoming Training: BGP Policy

On July 21st I’ll be teaching BGP Policy over at Safari Books Online. From the description:

This course begins by simplifying the entire BGP policy space into three basic kinds of policies that operators implement using BGP—selecting the outbound path, selecting the inbound path, and “do not transit.” A use case is given for each of these three kinds, or classes, of policies from the perspective of a transit provider, and another from the perspective of a nontransit operator connected to the edge of the ‘net. With this background in place, the course will then explore each of the many ways these classes of policy may be implemented using local preference, AS Path prepending, various communities, AS Path poisoning, and other techniques. Positive and negative aspects of each implementation path will be considered.

Please register here.

My courses are going through a bit of updating, but I think August and September will be How the Internet Really Works, followed by an updated course on troubleshooting. I’m incorporating more tools into the course, including (of course!) ChatGPT. Watch this space for upcoming announcements.

Hedge 151: Cecilia Testart and the Value of the RPKI

If you advertise routes through a provider to the global Internet, you might be wondering if you should go through the trouble of registering in the RPKI and advertising ROAs. What is the tradeoff for the work involved in what seems like a complex process? Cecilia Testart joins Jeremy White and Russ White to discuss recent work in measuring the value of the RPKI.


It’s also worth reading Cecilia’s article on this topic.

Hedge 144: IPv6 Lessons Learned

We don’t often do a post-mortem on the development and deployment of new protocols … but here at the Hedge we’re going to brave these deep waters to discuss some of the lessons we can learn from the development and deployment of IPv6, especially as they apply to design and deployment cycles in the “average network” (if there is such a thing). Join us as James Harr, Tom Ammon, and Russ White consider the lessons we can learn from IPv6’s checkered history.


Route Servers and Loops

From the question pile: Route servers (as opposed to route reflectors) don’t change anything about a BGP route when re-advertising it to a peer, whether iBGP or eBGP. Why don’t route servers cause routing loops (or other problems) in a BGP network?

Route servers are often used by Internet Exchange Points (IXPs) to distribute routes between connected BGP speakers. BGP route servers

  • Don’t change anything about a received BGP route when advertising the route to their peers (other BGP speakers)
  • Don’t install routes received through BGP into the local routing table

Shouldn’t using route servers in a network (potentially, at least) cause routing loops or other BGP routing issues? Maybe a practical example will help.

Assume b, e, and s are all route servers in their respective networks. Starting at the far left, a receives some route, 101::/64, and sends it on to b, which then sends the unmodified route to c. When c receives traffic destined to 101::/64, what will happen? Regardless of whether these routers are running iBGP or eBGP, b will not change the next hop, so when c receives the route, a is still the next hop. If there’s no underlying routing protocol, c won’t know how to reach a, so it will ignore the route and drop the traffic. Even if there is an underlying routing protocol, c’s path to 101::/64 passes through b, and b isn’t installing any routing information learned from BGP into its local routing table (because it’s a route server), so b is going to drop traffic destined to 101::/64.
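The next-hop problem above can be sketched in a few lines of Python. This is a minimal model, not a BGP implementation: the `Route` class, the `reachable` set standing in for a speaker's underlying routing table, and the function names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str

def route_server_readvertise(route: Route) -> Route:
    """A route server passes the route on unchanged, next hop included."""
    return route

def is_usable(route: Route, reachable: set[str]) -> bool:
    """A speaker can only use a route whose next hop it can resolve."""
    return route.next_hop in reachable

# a originates the route; b (the route server) relays it to c unchanged.
from_a = Route(prefix="101::/64", next_hop="a")
at_c = route_server_readvertise(from_a)

# c only knows how to reach b, so the next hop a cannot be resolved.
print(is_usable(at_c, reachable={"b"}))       # False
# With some way to reach the originator, the next hop resolves.
print(is_usable(at_c, reachable={"a", "b"}))  # True
```

Note that resolving the next hop is necessary but not sufficient: as described above, if the only path to a runs through the route server, the traffic is still dropped there.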

We can solve this simple problem by adding a new link between the two clients of the route server, as shown in the center diagram. Here, d sends 101::/64 to e, which then sends the unchanged route to g. Since g has a direct connection to d, we can assume g will send traffic destined to 101::/64 directly to d, where it will be forwarded to the destination. Why wouldn’t d and g peer directly instead of counting on e to carry routes between them? In most cases this kind of indirect peering is done to increase network scale. If there are a thousand routers like d and g, it will be simpler for them all to peer with e than to build a full mesh of connections.
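To put rough numbers on the scaling argument: a full mesh of n speakers needs n(n-1)/2 sessions, while peering every speaker with a single route server needs n. The short Python sketch below works this out for a thousand speakers; the function names are illustrative only.

```python
def full_mesh_sessions(n: int) -> int:
    """Sessions needed when every speaker peers with every other speaker."""
    return n * (n - 1) // 2

def route_server_sessions(n: int) -> int:
    """Sessions needed when every speaker peers only with one route server."""
    return n

n = 1000
print(full_mesh_sessions(n))     # 499500
print(route_server_sessions(n))  # 1000
```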

Why not use a route reflector rather than a route server in this situation? Route reflectors can only be used to carry routes between iBGP peers. If d, e, and g are all in different autonomous systems, route reflectors cannot be used to solve this problem.

But this brings us back to the original question. Route reflectors use the cluster list to prevent loops within an AS (the cluster list is similar in form and function to the AS path carried between autonomous systems, but it uses cluster IDs, which default to the reflector’s router ID, rather than AS numbers to describe the path). Route servers leave the route untouched, so what prevents loops when they are used?
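For comparison, the cluster list check that route reflectors rely on can be sketched as follows. This follows the general behavior described in RFC 4456 but is heavily simplified: the function name `reflect` and the list-of-strings representation of the cluster list are assumptions made for illustration.

```python
from typing import Optional

def reflect(cluster_list: list[str], local_cluster_id: str) -> Optional[list[str]]:
    """Return the cluster list to advertise, or None if the route has looped.

    Hypothetical helper: a reflector ignores a route that already carries its
    own cluster ID, and otherwise prepends that ID before reflecting.
    """
    if local_cluster_id in cluster_list:
        return None  # the route already passed through this cluster
    return [local_cluster_id] + cluster_list

first = reflect([], "10.0.0.1")
second = reflect(first, "10.0.0.2")
print(second)                       # ['10.0.0.2', '10.0.0.1']
print(reflect(second, "10.0.0.1"))  # None
```

A route server adds nothing to the route as it passes through, so there is no equivalent record for a later speaker to check.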

If you have multiple route servers connected to one another you can, in fact, form routing loops.

In this network, a is sending 101::/64 to b, which is then sending the route, unmodified, to e. Because of some local policy, e is choosing the path through a, which means e forwards traffic destined to 101::/64 to c. At the same time, e is advertising 101::/64 to b, which is then sending the route (unmodified) to a, and a is choosing the path through c. In this case, a permanent (persistent) routing loop is formed through the control plane, primarily because no single BGP speaker has a complete view of the topology. The two route servers, by hiding the real path to 101::/64, make it possible to form a routing loop.

To deploy route servers without forming these kinds of loops:

  • BGP speakers learning routes from route servers should be directly connected—there should not be destinations reachable via some “hidden” intermediate hop
  • Route servers should send all the routes they learn from clients; they should not use bestpath to choose which routes to send to clients
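The second rule can be sketched as follows. This is a toy model with hypothetical names (`advertise_all`, routes as `(prefix, next_hop)` tuples, and a third client `f` that does not appear in the diagrams), and it leaves out the policy and filtering a real route server would apply. It also assumes a route is not sent back to the client it was learned from. The point is only that every learned route is passed on, unmodified, with no best-path selection in between.

```python
from collections import defaultdict

RouteTable = dict[str, list[tuple[str, str]]]

def advertise_all(learned: RouteTable) -> RouteTable:
    """Map each client to every route learned from the other clients."""
    out = defaultdict(list)
    for receiver in learned:
        for sender, routes in learned.items():
            if sender != receiver:
                out[receiver].extend(routes)  # unchanged, no best-path filtering
    return dict(out)

# Two clients advertise the same prefix; g receives both paths, not just one.
learned = {
    "d": [("101::/64", "d")],
    "f": [("101::/64", "f")],
    "g": [],
}
sent = advertise_all(learned)
print(sent["g"])  # [('101::/64', 'd'), ('101::/64', 'f')]
```

Because g sees both paths with their original next hops, it makes its own choice with full information rather than inheriting a choice made by the route server.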

These restrictions prevent routing loops from forming when deploying route servers—but they also restrict the use of route servers to situations like carrying routes between BGP speakers connected to a single fabric.

Cisco filed a patent some time back describing a method to prevent routing loops when using BGP route servers; it makes interesting reading for folks who want to dive a little deeper.

BGP Peering (2)

I recorded the beginnings of a BGP training series over at Packet Pushers a short while back; they’ve released these onto YouTube (so you can find the entire series there). I’m highlighting one of these every couple of weeks ’til I’ve gone through the entire set of recordings. In this recording, I’m talking through some more interesting aspects of BGP peering, including challenges with IPv6 link-local next hops, promiscuous peering, and capabilities.


Hedge 142: George Michaelson and the Pace of IPv6 Deployment

IPv6 is still being deployed, years after the first world IPv6 day, even more years after its first acceptance as an Internet standard by the IETF. What is taking so long? George Michaelson (APNIC) joins Tom Ammon and Russ White on this episode of the Hedge to discuss the current pace of IPv6 deployment, where there are wins, and why things might be moving more slowly in other areas.