Research: P Fat Trees

Link speeds in data center fabrics continue to climb, with 10g, 25g, 40g, and 100g widely available, and 400g promised in just a few short years. What isn’t so obvious is how these higher speeds are being reached. A 100g link, for instance, is really four 25g links bundled as a single link at the physical layer. If the optics are increasing in speed, and the processors are increasing in their ability to switch traffic, why are these higher speed links being built in this way? According to the paper under investigation today, the reason is the speed of the chips that serialize traffic from and deserialize traffic off the optical medium. The development of the Complementary metal–oxide–semiconductor, of CMOS, chips required to build ever faster optical interfaces seems to have stalled out at around 25g, which means faster speeds must be achieved by bundling multiple lower speed links.

Mellette, William M., Alex C. Snoeren, and George Porter. “P-FatTree: A Multi-Channel Datacenter Network Topology.” In Proceedings of the 15th ACM Workshop on Hot Topics in Networks, 78–84. HotNets ’16. New York, NY, USA: ACM, 2016.

The authors then point out that many data operators have moved towards some form of chassis device in order to reduce the costs of cabling and optics. Chassis devices most often use some form of spine and leaf internally to switch traffic between the input and output ports across a short run copper fabric, resulting in a switching path within the chassis router that looks something like the following figure.

The spine and leaf in connecting the switching ASICs are one of the main reasons data center operators move away from chassis devices; the number of hops through the network becomes unstable with the addition of these internal spine and leaf fabrics, the backpressure and quality of service is essentially unmanageable across this fabric on most devices, and there is little in the way of traffic analysis that can be done on this internal fabric. The authors do not address these problems, however.

Rather, they address the added set of switching ASICs in the spine layer of the internal spine and leaf network. As it turns out, the switching ASICs themselves are a major consumer of power, and heat generator, in switches. They argue that removing this internal spine layer would greatly reduce the amount of power required in a fabric, as well as the amount of heat generated.
To do this, they propose unbundling the links attached to each SerDes CMOS chip, exposing them as individual links to the control plane. This would allow the switching path to be shortened to something like the figure below.

Exposing the unbundled links to the external control plane allows each stage of the internal fabric to be treated as another hop in the network, and hence for “normal” ECMP to choose the path through the chassis fabric.

The authors suggest the four unbundled links attached to a single switching ASIC can be treated as a member of a different “switching plane,” which, in effect, creates four virtual topologies across the fabric, each of which is one quarter the speed of the total fabric bandwidth. Each virtual topology could run its own control plane, producing four somewhat redundant networks, and the ability to steer traffic onto any given plane at the edge of the network for traffic engineering, policy separation, or any other purpose. The result is a fabric that is more flexible in use, while retaining a fixed hop count through the fabric, and reducing the ASIC count in the fabric by around one third.

This is an interesting concept, but it would require an entire fabric to be built this way from the ground up; there is little chance of a brown field deployment of this kind of design. One tradeoff in this kind of design would be the additional control plane state, including assigning four addresses to each host (although this might be mitigated by the clever use of anycast), and the maintenance of four control planes, etc. Another design tradeoff would be the shared risk link groups involved in splitting a single optical fiber and ASIC into four circuits—these aren’t exactly “virtual circuits,” but they share many of the same characteristics.

Weekend Reads 062918

The Internet and related digital systems that the United States did so much to create have effectuated and symbolized US military, economic, and cultural power for decades. The question raised by this essay is whether these systems, like the Roman Empire’s roads, will come to be seen as a platform that accelerated US decline. @The Hoover Institute

Article 13 reverses one of the key legal doctrines that allowed the Internet to thrive: the idea that computer networks are not “publishers” and are therefore not liable for the actions or statements of their users. This means that you can sue an individual user for libel or copyright infringement, but not the e-mail service or bulletin board or social media platform on which he did it. This immunity made it possible for computer networks to open a floodgate of content produced by independent individuals, without requiring service providers to serve as editors or moderators. —Robert Tracinski @The Federalist

The U.S. Supreme Court today ruled that the government needs to obtain a court-ordered warrant to gather location data on mobile device users. The decision is a major development for privacy rights, but experts say it may have limited bearing on the selling of real-time customer location data by the wireless carriers to third-party companies. —Krebs on Security

The need for an access model for non-public Whois data has been apparent since GDPR became a major issue before the community well over a year ago. Now is the time to address it seriously, and not with half measures. We urgently need a temporary model for access to non-public Whois data for legitimate uses, while the community undertakes longer-term policy development efforts. —Fabricio Vayra @CircleID

More and more companies, government agencies, educational institutions, and philanthropic organizations are today in the grip of a new phenomenon. I’ve termed it “metric fixation.” The key components of metric fixation are the belief that it is possible–and desirable–to replace professional judgment (acquired through personal experience and talent) with numerical indicators of comparative performance based upon standardized data (metrics); and that the best way to motivate people within these organizations is by attaching rewards and penalties to their measured performance. —Jerry Muller @Fast Company

I wish 5G, with its 490 Mbit/sec. speeds and download latency times of 17 milliseconds, was just around the corner. It’s not. I know, I know. AT&T Mobility, Verizon Wireless, and the pairing of T-Mobile and Sprint are all promising 5G real soon now. They’re … fibbing. —Steven J. Vaughan-Nichols @IT World

Digital collaboration technologies are accelerating productivity in the post-phone-call workplace, but tools like Yammer, Workplace by Facebook, and Slack have their dark side. While these channels can help speed group decision-making, they also serve as an enterprise blind spot for insider threats to do their worst – not to mention being open conduits for spreading negativity and toxic behaviors among the ranks. —Ericka Chickowski @Dark Reading

We have reached a point in the evolution of cyber security where handsoff, behind-the-scenes cyber defense should be the norm. Clearly, the best solution would be to deploy less-vulnerable systems. This is a topic that has received great attention for approximately five decades, but developers continue to resist using tools and techniques that have been shown to be effective, such as code minimization, employing formal development methods, and using type-safe languages. —Josiah Dykstra, Eugene H. Spafford @ACM

May 2018

April 2018

The Universal Fat Tree

Have you ever wondered why spine-and-leaf networks are the "standard" for data center networks? While the answer has a lot

Whatever is vOLT-HA?

Many network engineers find the entire world of telecom to be confusing—especially as papers are peppered with a lot of