Lessons from Andromeda

A common complaint I hear among network engineers is that the lessons and techniques used by truly huge-scale networks simply are not applicable to more “standard scale” networks. The key, however, is balance: look for the ideas and concepts that are interesting and at least somewhat novel, and then see how they might be applied to products and systems in any network. Learning these concepts can help you recognize design patterns you might encounter almost anywhere. One recent paper, for instance, details Andromeda, a large-scale networking system designed and operated by Google, one of the few truly huge networks in the world:

Andromeda is designed around a flexible hierarchy of flow processing paths. Flows are mapped to a programming path dynamically based on feature and performance requirements.
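
The flavor of this dynamic mapping is easy to capture in a few lines of code. The sketch below is my own minimal rendering of the idea; the path names follow the paper, but the flow attributes and the threshold are illustrative assumptions, not Andromeda's implementation:

```python
# A minimal sketch of mapping flows to processing paths based on their
# feature and performance requirements. Path names follow the paper;
# the threshold and flow attributes are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Flow:
    src: str
    dst: str
    needs_middlebox: bool = False  # e.g., per-packet firewall processing
    bytes_per_sec: float = 0.0     # observed offered load

FAST_PATH_THRESHOLD = 1_000_000  # bytes/sec; an assumed tuning value

def choose_path(flow: Flow) -> str:
    """Map a flow to the cheapest path that meets its requirements."""
    if flow.needs_middlebox:
        return "coprocessor"   # feature-rich, slower per-packet path
    if flow.bytes_per_sec >= FAST_PATH_THRESHOLD:
        return "fast path"     # host-local fast path for heavy flows
    return "hoverboard"        # default: forward through shared gateways

print(choose_path(Flow("10.0.0.1", "10.0.0.2", bytes_per_sec=5e6)))  # fast path
```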

While the paper describes the general compute environment and the forwarding process on individual nodes, the most interesting part from a network engineering perspective is Hoverboard. While the concept behind Hoverboard has been implemented in previous systems, it is usually hidden under the covers of a vertically integrated system, so you rarely get to see its inner workings. To understand Hoverboard, you have to begin with a little theory about the distribution and management of control plane data in a network.

TL;DR

  • Splitting the control plane between reachability, topology, and policy enables some interesting new ways to think about scaling and complexity in network design
  • The distribution of policy does not need to be static, or fixed, but rather can be distributed to different places at different times depending on need and efficiency
  • The operation of large scale networks has much to teach us about the efficient operation of networks in general


Eyvonne and I did a short take on policy at the edge that might be a helpful starting point.

The closer to the edge you implement policy, the more efficient your use of network bandwidth will be. The closer to the edge you implement policy, however, the more distributed your policy becomes, and distributed policy tends to be difficult to maintain over time. This is a classic example of the state/optimization/surface triad (described in Navigating Network Complexity, for instance): the more state you add to the network, in more places, the more optimal your use of resources will be.
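
To put a rough number on the bandwidth side of this trade-off, consider two hosts whose traffic must hairpin through a gateway just to have policy applied. The arithmetic below uses made-up numbers, not anything from the paper:

```python
# Back-of-the-envelope arithmetic (my numbers, not the paper's) showing
# why policy at the edge saves bandwidth: a flow between two hosts that
# must hairpin through a gateway for policy crosses the fabric twice.
flow_gbps = 2.0
direct_hops = 1   # host -> host, policy applied at the sending host
hairpin_hops = 2  # host -> gateway -> host, policy at the first hop

print("fabric load with edge policy:   ", flow_gbps * direct_hops, "Gbps")   # 2.0
print("fabric load with gateway policy:", flow_gbps * hairpin_hops, "Gbps")  # 4.0
```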

Dalton, Michael, David Schultz, Jacob Adriaens, Ahsan Arefin, Anshuman Gupta, Brian Fahs, Dima Rubinstein, et al. 2018. “Andromeda: Performance, Isolation, and Velocity at Scale in Cloud Network Virtualization.” In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), 373–87. https://www.usenix.org/conference/nsdi18/presentation/dalton.

There are a number of different solutions to this problem. One I have often advocated in the past is layering the control plane: one part of the control plane handles topology and reachability, while the other handles policy implementation. The Hoverboard idea is a variant of this layering. In normal IP routing, all traffic from a host passes through a default gateway, so the host only needs to know a minimal amount of control plane information. Not knowing this information, however, often precludes the host from implementing anything other than “blind” policy. Policy, then, is implemented at the first hop in the network, the default gateway, which often becomes an appliance (like a firewall) to support the policy and forwarding load.
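
As a concrete (if toy) illustration of the layering idea, here is a minimal sketch in Python; the tables and names are my own invention, not anything from the paper:

```python
# A toy model of the layered control plane: reachability in one table,
# policy in another. The host holds only a default route, so every
# decision beyond "send it to the gateway" belongs to the policy layer.
# All tables and names here are invented for illustration.
reachability = {
    "0.0.0.0/0": "gateway",  # the host's entire view of the network
}

policy = {  # held at the first hop, not on the host
    ("10.0.1.0/24", "10.0.2.0/24"): "permit",
    ("10.0.1.0/24", "192.168.0.0/16"): "deny",
}

def forward(src_prefix: str, dst_prefix: str) -> str:
    next_hop = reachability["0.0.0.0/0"]                     # blind decision at the host
    verdict = policy.get((src_prefix, dst_prefix), "deny")   # evaluated at the gateway
    return f"{verdict} via {next_hop}"

print(forward("10.0.1.0/24", "10.0.2.0/24"))  # permit via gateway
```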

Hoverboard takes a slightly different path. The first-hop router (the default gateway) remains in place and is the primary point of policy implementation in the network. However, the first-hop router has a back channel to the host through a controller. The controller manages the policy at all the network edges (the policy overlay in the layered control plane idea above). When the traffic level for a flow crosses a threshold, the first-hop router signals the controller to move the policy from the first hop to the host itself. The paper includes an illustration of this process.
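
To make the mechanics concrete, here is a minimal sketch of the offload signal, assuming a simple byte-rate trigger; the class names and the threshold value are illustrative assumptions, not the paper's implementation:

```python
# A minimal sketch of the Hoverboard offload signal: the gateway watches
# per-flow rates, and when a flow crosses a threshold it asks the
# controller to push that flow's policy down to the sending host.
# The threshold value and class names are assumptions for illustration.
OFFLOAD_BPS = 20_000  # assumed threshold; the real value is a tuning decision

class Controller:
    def __init__(self):
        self.host_rules = {}  # host -> set of flows whose policy now lives there

    def offload(self, host, flow):
        # Install the flow's policy on the host so traffic bypasses the gateway.
        self.host_rules.setdefault(host, set()).add(flow)

class Gateway:
    def __init__(self, controller):
        self.controller = controller
        self.flow_bps = {}

    def observe(self, host, flow, bps):
        self.flow_bps[flow] = bps
        if bps >= OFFLOAD_BPS:
            # Back channel to the controller: move this flow's policy to the host.
            self.controller.offload(host, flow)

ctl = Controller()
gw = Gateway(ctl)
gw.observe("host-a", ("10.0.1.5", "10.0.2.9"), bps=50_000)
print(ctl.host_rules)  # {'host-a': {('10.0.1.5', '10.0.2.9')}}
```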

This balances the distribution of policy against the efficiency of packet forwarding across time. Once a flow has become large enough, or has lasted long enough, the policies related to that flow can be transferred from the network device to the host. This kind of coordination assumes a number of things, including: the ability to “see” the flows at both the host and the network (in other words, deep telemetry); a layered control plane where reachability, topology, and policy are handled separately; and an overlay controller that brokers policy onto the network and attached devices.
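
The first of those assumptions, flow visibility in both places, might look something like the sketch below, where both the host and the gateway export per-flow counters to the controller; the reporters and the reconciliation rule are my own simplifications:

```python
# An illustrative sketch of the flow-visibility assumption: host and
# gateway each export per-flow byte counts, and the controller only
# considers moving policy for flows both layers can see. The reporters
# and the reconciliation rule are my own simplifications.
from collections import defaultdict

telemetry = defaultdict(dict)  # flow -> {reporter: byte count}

def export(reporter, flow, nbytes):
    telemetry[flow][reporter] = nbytes

def visible_everywhere(flow):
    return {"host", "gateway"} <= telemetry[flow].keys()

export("host", ("10.0.1.5", "10.0.2.9"), 1_200_000)
export("gateway", ("10.0.1.5", "10.0.2.9"), 1_180_000)  # counters rarely match exactly
print(visible_everywhere(("10.0.1.5", "10.0.2.9")))  # True
```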

This kind of vertical integration is difficult to achieve in an environment built by multiple vendors. Some form of standard would be needed to carry the policy from one point to another, for instance, such as a well-designed set of policy descriptions built in a common modeling language… Perhaps something like YANG? 🙂
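
YANG would be the natural vehicle for such a model; purely as a sketch of the kind of information a vendor-neutral policy description might carry, here is the idea expressed as a plain Python structure, with every field name hypothetical:

```python
# YANG would be the actual modeling language; this is only a sketch of
# the kind of information a vendor-neutral policy description might
# carry, written as a plain Python structure. Every field name here is
# hypothetical.
policy_description = {
    "policy-name": "web-to-db",
    "match": {
        "source-prefix": "10.0.1.0/24",
        "destination-prefix": "10.0.2.0/24",
        "destination-port": 5432,
    },
    "action": "permit",
    "placement": {
        # where the policy may be instantiated, and when to move it
        "allowed": ["host", "first-hop"],
        "offload-threshold-bps": 20000,
    },
}
print(policy_description["policy-name"])  # web-to-db
```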

But while this kind of system would be difficult to deploy in a multivendor environment without a single point of control on the systems and software side, it is the kind of thing that could make networks scale much better, be simpler to operate, and allow operators to manage complexity in a way that makes sense.