The Hedge 70: FR Routing Update
FR Routing is a widely used and supported open source routing stack. In this episode of the Hedge, Alistair Woodman, Quentin Young, Donald Sharp, Tom Ammon, and Russ White discuss recent updates, additions to the CI/CD system, the release process, and operating system support. If you’re looking for a good open source, containerized routing stack for everything from route servers to DC fabrics and labbing to production, you should check out FR Routing.
It is Easier to Move a Problem than Solve it (RFC1925, Rule 6)

Early on in my career as a network engineer, I learned the value of sharing. When I could not figure out why a particular application was not working correctly, it was always useful to blame the application. Conversely, the application owner was often quite willing to share their problems with me, as well, by blaming the network.
A more cynical way of putting this kind of sharing is the way RFC 1925, rule 6 puts it: “It is easier to move a problem around than it is to solve it.”
Of course, the general principle applies far beyond sharing problems with your co-workers. There are many applications in network and protocol design, as well. Perhaps the most widespread case deployed in networks today is the movement to “let the controller solve the problem.” Distributed routing protocols are hard? That’s okay, just implement routing entirely on a controller. Understanding how to deploy individual technologies to solve real-world problems is hard? Simple—move the problem to the controller. All that’s needed is to tell the controller what we intend to do, and the controller can figure the rest out. If you have problems solving any problem, just call it Software Defined Networking, or better yet Intent Based Networking, and the problems will melt away into the controller.
Pay no attention to the complexity of the code on the controller, or those pesky problems with CAP Theorem, or any protests on the part of long-term engineering staff who talk about total system complexity. They are probably just old curmudgeons who are protecting their territory in order to ensure they have a job tomorrow. Once you’ve automated the process you can safely ignore how the process works; the GUI is always your best guide to understanding the overall complexity of the system.
Examples of moving the problem abound in network engineering. For instance, it is widely known that managing customers is one of the hardest parts of operating a network. Customers move around, buy new hardware, change providers, and generally make it very difficult for providers by losing their passwords and personally identifying information (such as their Social Security Number in the US). To solve this problem, RFC8567 suggests moving the problem of storing enough information to uniquely identify each person into the Domain Name System. Moving the problem from the customer to DNS clearly solves the problem of providers (and governments) being able to track individuals on a global basis. The complexity and scale of the DNS system is not something to be concerned with, as DNS “just works,” and some method of protecting the privacy of individuals in such a system can surely be found. After all, it’s just software.
If the DNS system becomes too complex, it is simple enough to relieve DNS of the problem of mapping IP addresses to the names of hosts. Instead, each host can be assigned a single IP address that is used regardless of where it is attached to the network, and hence uniquely identifies the host throughout its lifetime. Such a system is suggested in RFC2100 and appears to be widely implemented in many networks already, at least from the perspective of application developers. Because DNS is “too slow,” application developers find it easier to move the problem DNS is supposed to solve into the routing system by assigning permanent IP addresses.
Another great example of moving a problem rather than solving it is RFC3215, Electricity over IP. Every building in the world, from houses to commercial storefronts, must have multiple cabling systems installed in order to support multiple kinds of infrastructure. If RFC3215 were widely implemented, a single kind of cable (or even fiber optics, if you want your electricity faster) can be installed in all buildings, and power carried over the IP network running on these cables (once the IP network is up and running). Many ancillary problems could be solved with such a scheme—for instance, power usage could be measured based on a per-packet system, rather than the sloppier kilowatt-hour system currently in use. Any bootstrap problems can be referred to the controller. After all, it’s just software.
The bottom line is this: when you cannot figure out how to solve a problem, just move it to some other system, especially if that system is “just software,” so the problem now becomes “software defined.” This is also especially useful if moving the problem can be accomplished by claiming the result is a form of automation.
Moving problems around is always much easier than solving them.
Rethinking BGP on the DC Fabric (part 2)
In my last post on this topic, I laid out the purpose of this series—to start a discussion about whether BGP is the ideal underlay control plane for a DC fabric—and gave some definitions. Here, I’d like to dive into the reasons to not use BGP as a DC fabric underlay control plane—and the first of these reasons is BGP converges very slowly and requires a lot of help to converge at all.
Examples abound. I’ve seen the results of two testbeds in the last several years where a DC fabric was configured with each router (switch, if you prefer) in a separate AS, and some number of routes pushed into the network. In both cases—one large-scale, the other a more moderately scaled network on physical hardware—BGP simply failed to converge. Why? A quick look at how BGP converges might help explain these results.

Assume we are watching the 110::/64 route (attached to A, on the left side of the diagram), at P. What happens when A loses its connection to 110::/64? Assume every router in this diagram is in a different AS, and the AS path length is the only factor determining the best path at every router.
Watching the route to 110::/64 at P, you would see the route move from G to M as the best path, then from M to K, then from K to N, and then finally completely drop out of P’s table. This is called the hunt because BGP “hunts,” apparently trying every path from the current best path to the longest possible path before finally removing the route from the network entirely. BGP isn’t really “hunting;” this is just an artifact of the way BGP speakers receive, process, and send updates through the network.
If you consider a more complex topology, like a five-stage butterfly fabric, you will find there are many (very many) alternate longer-length paths available for BGP to hunt through on a withdraw. Withdrawing thousands of routes at the same time, combined with the impact of the hunt, can put BGP in a state where it simply never converges.
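The hunt can be reproduced in a few lines of simulation. What follows is a minimal sketch, not a model of any real BGP implementation: it assumes a hypothetical five-router topology (the names A, B, C, D, and P and the link list are made up for illustration and are not the diagram discussed above), one AS per router, best path chosen on AS path length alone, AS path loop prevention, and updates processed one at a time in FIFO order with no timers.

```python
from collections import deque

# Hypothetical topology: origin A hangs off B; B, C, D and P are fully meshed.
CLIQUE_LINKS = [("A", "B"), ("B", "C"), ("B", "D"), ("B", "P"),
                ("C", "D"), ("C", "P"), ("D", "P")]


def simulate_withdraw(links, origin, observer):
    """Return the sequence of best paths seen at `observer` when `origin`
    withdraws its route. Each router is its own AS; None means no route."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    rib_in = {n: {} for n in neighbors}   # rib_in[router][peer] = AS path
    best = {n: None for n in neighbors}   # best path, starting with router itself
    queue = deque()                       # pending (sender, receiver, path) updates

    def advertise(router):
        # Send the current best path (or a withdraw) to every neighbor.
        for peer in sorted(neighbors[router]):
            queue.append((router, peer, best[router]))

    def drain(log):
        # Process updates one at a time until the network is quiet.
        while queue:
            sender, receiver, path = queue.popleft()
            if receiver == origin:
                continue  # the origin never learns its own route from others
            if path is None or receiver in path:
                # A withdraw, or a looped path, removes what this peer gave us.
                rib_in[receiver].pop(sender, None)
            else:
                rib_in[receiver][sender] = path
            # Shortest AS path wins; ties broken by comparing the paths.
            candidates = sorted(rib_in[receiver].values(),
                                key=lambda p: (len(p), p))
            new_best = (receiver,) + candidates[0] if candidates else None
            if new_best != best[receiver]:
                best[receiver] = new_best
                if receiver == observer:
                    log.append(new_best)
                advertise(receiver)

    # Phase 1: the origin announces the route and the network converges.
    best[origin] = (origin,)
    advertise(origin)
    drain([])

    # Phase 2: the origin loses the route; record what the observer sees.
    history = [best[observer]]
    best[origin] = None
    advertise(origin)
    drain(history)
    return history


if __name__ == "__main__":
    for step in simulate_withdraw(CLIQUE_LINKS, "A", "P"):
        print(" ".join(step) if step else "route removed")
```

Under these assumptions, the observer steps through progressively longer stale paths learned from its other neighbors before the route finally disappears, which is the behavior described above. Real implementations add timers, update batching, and policy, so the exact sequence on real hardware will differ.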
To get BGP to converge, various techniques must be used: for instance, placing all the routers in the spine in the same AS, configuring path filters at ToR switches so they are never used as a transit path, etc. Even when these techniques are used, however, BGP can still require a minute or so to perform a withdraw.
This means the BGP configuration cannot be the same on every device (it is determined by where the device is located), which harms repeatability; the BGP configuration must contain complex filters; and messing up the configuration can bring the entire fabric down.
There are several counters to the problem of slow convergence, and the complex configurations required to make BGP converge more quickly, but this post is pushing against its limit … so I’ll leave these until next time.
The Hedge 69: Container Networking Done Right
Everyone who’s heard me talk about container networking knows I think it’s a bit of a disaster. This is what you get, though, when someone says “that’s really complex, I can discard the years of experience others have in designing this sort of thing and build something a lot simpler…” The result is usually something that’s more complex. Alex Pollitt joins Tom Ammon and me to discuss container networking, and new options that do container networking right.
Rethinking BGP on the DC Fabric
Everyone uses BGP for DC underlays now because … well, just because everyone does. After all, there’s an RFC explaining the idea, every tool in the world supports BGP for the underlay, and every vendor out there recommends some form of BGP in their design documents.
I’m going to swim against the current for the moment and spend a couple of weeks here discussing the case against BGP as a DC underlay protocol. I’m not the only one swimming against this particular current, of course—there are at least three proposals in the IETF (more, if you count things that will probably never be deployed) proposing link-state alternatives to BGP. If BGP is so ideal for DC fabric underlays, then why are so many smart people (at least they seem to be smart) working on finding another solution?
But before I get into my reasoning, it’s probably best to define a few things.
In a properly designed data center, there are at least three control planes. The first of these I’ll call the application overlay. This control plane generally runs host-to-host, providing routing between applications, containers, or virtual machines. Kubernetes networking would be an example of an application overlay control plane.
The second of these I’ll call the infrastructure overlay. This is generally going to be eVPN running BGP, most likely with VXLAN encapsulation, and potentially with segment routing for traffic steering support. This control plane will typically run on either workload supporting hosts, providing routing for the hypervisor or internal bridge, or on the Top of Rack (ToR) routers (switches, but who knows what “router” and “switch” even mean any longer?).
Now notice that not all networks will have both application and infrastructure overlays—many data center fabrics will have one or the other. It’s okay for a data center fabric to only have one of these two overlays—whether one or both are needed is really a matter of local application and business requirements. I also expect both of these to use either BGP or some form of controller-based control plane. BGP was originally designed to be an overlay control plane; it only makes sense to use it where an overlay is required.
I’ll call the third control plane the infrastructure underlay. This control plane provides reachability for the tunnel head- and tail-ends. Plain IPv4 or IPv6 transport is supported here; perhaps some might inject MPLS as well.
My argument, over the next couple of weeks, is BGP is not the best possible choice for the infrastructure underlay. What I’m not arguing is every network that runs BGP as the infrastructure underlay needs to be ripped out and replaced, or that BGP is an awful, horrible, no-good choice. I’m arguing there are very good reasons not to use BGP for the infrastructure underlay—that we need to start reconsidering our monolithic assumption that BGP is the “only” or “best” choice.
I’m out of words for this week; I’ll begin the argument proper in my next post… stay tuned.
The Hedge 66: Daniel Migault and the ADD Working Group

The modern DNS landscape is becoming complex even for the end user. With the advent of so many public resolvers, DNS over TLS (DoT) and DNS over HTTPS (DoH), choosing a DNS resolver has become an important task. The ADD working group will, according to their page—
In this episode of the Hedge, Daniel Migault joins Alvaro Retana and Russ White to discuss Requirements for Discovering Designated Resolvers, draft-box-add-requirements-02.
Agglutinating Problems Considered Harmful (RFC1925, Rule 5)
In the networking world, many equate simplicity with the fewest number of moving parts. According to this line of thinking, if there are 100 routers, 10 firewalls, 3 control planes, and 4 management systems in a network, then reducing the number of routers to 95, the number of firewalls to 8, the number of control planes to 1, and the number of management systems to 3 would make the system “much simpler.” Disregarding the reduction in the number of management systems, scientifically proven to always increase in number, it does seem that reducing the number of physical devices, protocols in use, etc., would tend to decrease the complexity of the network.
The wise engineers of the IETF, however, have a word of warning in this area that all network engineers should heed. According to RFC1925, rule 5: “It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases this is a bad idea.” When “conventional wisdom” and the wisdom of engineers with the kind of experience and background as those who write IETF documents contradict one another, it is worth taking a deeper look.
A good place to begin is with other RFCs that might provide examples, or otherwise shed light on this situation. Two of particular interest are RFC1776 and RFC3093.
RFC1776 describes a very simplified transport protocol for use in the Internet and private networks. In normal packet formats there are many different components, such as a header and data sections. The header is normally made up of many different fields, such as the source address, the destination address, the quality of service, etc. The data section of the packet may also be divided into many different fields providing information for such functionality as error detection, flow control, and indicators of which application on the destination host this information is destined to (the port number is an example).
The authors of RFC1776 decided that the wisdom of making a single appliance which provides many services, the firewall being the classic example, and the wisdom of using a single protocol for everything, for instance using BGP for data center fabrics and interdomain connectivity, should be applied fully to the formatting of transport packets. In the spirit of agglutination common to all network engineering, RFC1776 recommends replacing the entire contents of a transport packet with a single address. The address must be a bit longer, of course, to carry the actual data, but using a single large field is inherently simpler than using many different fields. To accomplish this task, RFC1776 specifies a packet with 1696 octets (bytes) of address space. The number of octets originally selected is compatible with ATM, an older technology which uses a 53-octet cell (1696 is exactly 32 × 53), but it should also be compatible with all modern transport systems.
While the many advantages of this system are not fully described in the specification, it should be obvious packets containing a single field, the destination address, will be easier for hosts to generate and transmit, and easier for hosts to receive and process. The entire processing of the packet will just be transferring the address field directly into memory for consumption by any application running on the host that desires to consume it. The specification does note, however, that security is much simpler because there is no “user data” to secure.
RFC3093 is a more recent example of agglutination in order to simplify network design and operation. The authors of this RFC note that applications are already moving to using a single port, 80, for all traffic, as most firewalls already pass traffic transmitted through this port without restrictions. The authors note the operation of the Internet would be much simpler if all applications ran over port 80. In this way, all applications could pass through firewalls without modification, while the firewalls themselves remain perfectly operational, fulfilling their intended purpose. Implementing this specification would also simplify the absolute mess of port and protocol numbers used in transporting data today, agglutinating them all down to a single port. As less is always simpler, this would create a simpler, easier to manage, global Internet.
The lessons to learn, after examining the options, may not be what was originally intended. Reducing the number of parts does not necessarily reduce the complexity of the overall system. If you haven’t found the tradeoffs, you haven’t looked hard enough.
