Reaction: AT&T’s Paper on dNOS

The AT&T White Paper: What They Get Right, What They Get Wrong

AT&T recently published a paper on dNOS, an open, disaggregated network operating system intended to run on any kind of hardware. They list three primary purposes for their effort to help the networking industry build an open source dNOS:

  • To improve the rate of innovation and introduction of new features
  • To provide more flexibility in network design
  • To reduce costs where possible by taking advantage of purchasing “at scale”

How could disaggregation help with these three goals? The first of these, the rate of innovation, is really about packaging and perception, but we often forget that perception is a large part of innovation. If software developers always believe they must wait on the hardware, and hardware developers always feel like they must wait on the software, then the two teams develop an interlocking system that can slow down the pace at which either team can operate. One certain way to drive innovation is to break up such interconnected systems, allowing each one to use the features of the other in ways not originally intended, or drive the other team to create new features through competition. For instance, if the software team develops a new switching feature, and that feature proves to be very popular and useful, it can (at least sometimes) drive the hardware team to create a parallel feature in hardware.

The second of these depends on having a team that is ready and able to customize a solution to a particular environment. Most companies appear to be afraid of this sort of customization today, preferring to choose a vendor with a “lot of features,” and enabling only what is needed to achieve a particular goal. The problem with the “throw features into the product, and only use what you want” approach is that you are still paying for those features in development costs, testing costs, deployment costs, and risk of failure.

The third of these is really a matter of simple scale. While it might seem most companies are not going to have the scale needed to take advantage of volume pricing, as the hardware market becomes more accustomed to selling “just hardware,” vendors will become more flexible on pricing, bringing the value of volume down to the mid-scale market over time.

There are some interesting architecture decisions and questions in this document that will need to be worked out over time, however. For instance, in figure 2 the paper shows what appears to be a single data repository for the FIB and other system state.

And on page 8: “Basic network state information is stored in a set of common shared data structures. These data structures include information about interface state, neighbor resolution tables, forwarding information base (FIB) state, and other such items.”

This kind of design, all the rage in NOS design nowadays, is a really bad architectural decision. There are two different kinds of data here, one of which wants moderate access speed with strong and flexible structuring, the other of which wants the fastest possible access with minimal and fixed structuring. Specifically, the Forwarding Information Base (FIB) wants speed, and contains information whose structure does not change all that often. For this kind of data store, fixed-length fields organized into some sort of quick lookup tree is often best. The structure of configuration data changes often, however, and quick lookup isn’t as important as having a strongly modeled database so multiple clients can read the information without needing to impute the structure. This quandary is a classic case of “Type Length Value (TLV) versus fixed-length fields” in the design of protocols. There is no single “right” answer; there are just tradeoffs, and the job of the designer is to choose the right solution for the problem at hand.
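The tradeoff can be made concrete with a small sketch. The Python below (the field layout and type codes are my own illustration, not anything from the AT&T paper) encodes the same route entry two ways: as a fixed-length record of the kind a FIB favors, where every field sits at a known offset, and as a TLV sequence of the kind flexible, evolving data favors, where each field carries its own type and length.

```python
import struct

# Fixed-length FIB-style entry: every field has a known offset, so lookup
# code can index directly into a packed array of entries with no parsing.
# Illustrative layout: prefix (4 bytes), prefix length (1 byte),
# next hop (4 bytes), outgoing interface index (2 bytes) = 11 bytes.
FIB_FMT = "!4sB4sH"  # network byte order, no padding

def pack_fib_entry(prefix, plen, nexthop, ifindex):
    return struct.pack(FIB_FMT, prefix, plen, nexthop, ifindex)

def unpack_fib_entry(data):
    return struct.unpack(FIB_FMT, data)

# TLV encoding: each field is (type, length, value), so new field types
# can be added later without breaking existing readers -- an unknown
# type can simply be skipped using its length.
def pack_tlv(fields):
    out = b""
    for ftype, value in fields:
        out += struct.pack("!BB", ftype, len(value)) + value
    return out

def unpack_tlv(data):
    fields, i = [], 0
    while i < len(data):
        ftype, flen = struct.unpack_from("!BB", data, i)
        fields.append((ftype, data[i + 2:i + 2 + flen]))
        i += 2 + flen
    return fields
```

The fixed record is always exactly 11 bytes and needs no per-field parsing, which is what makes tree-of-records FIB lookups fast; the TLV form pays two extra bytes per field and a parsing loop, but tolerates structural change, which is what strongly modeled configuration stores need.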

It’s hard to judge what the intent is here, though. Do the authors intend multiple tables, or a single table? There seems to be some confusion in the document. For instance, on page 12, the authors state:

It is important the base operating system is the authoritative repository for basic network state information. In a Linux environment, this means network state is stored in the Linux kernel data structures. The netlink protocol is the primary method for populating those structures and acts as the conduit between the base operating system and the control and management plane applications.
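As an illustration of what “netlink as the conduit” means in practice: every netlink message begins with a fixed 16-byte header, `struct nlmsghdr` from `<linux/netlink.h>`, followed by a message-type-specific payload. The sketch below packs and parses just that header in pure Python (it manipulates bytes only, so it runs anywhere, though actually exchanging these messages requires a Linux `AF_NETLINK` socket).

```python
import struct

# struct nlmsghdr from <linux/netlink.h>: total message length, message
# type, flags, sequence number, and sending port id. Netlink uses host
# byte order, hence "=" (native order, no padding).
NLMSGHDR_FMT = "=IHHII"
NLMSGHDR_SIZE = struct.calcsize(NLMSGHDR_FMT)  # 16 bytes

RTM_GETLINK = 18    # request link (interface) state, <linux/rtnetlink.h>
NLM_F_REQUEST = 1   # this message is a request
NLM_F_DUMP = 0x300  # dump the whole table (NLM_F_ROOT | NLM_F_MATCH)

def build_nlmsg(msg_type, flags, seq, payload=b""):
    """Prepend a netlink header to a payload."""
    length = NLMSGHDR_SIZE + len(payload)
    return struct.pack(NLMSGHDR_FMT, length, msg_type, flags, seq, 0) + payload

def parse_nlmsghdr(data):
    """Return (length, type, flags, seq, pid) from the first 16 bytes."""
    return struct.unpack_from(NLMSGHDR_FMT, data, 0)
```

A control-plane application asking the kernel for all interface state would build a `RTM_GETLINK` request with the `NLM_F_REQUEST | NLM_F_DUMP` flags set and write it to a netlink socket; the kernel’s replies carry the interface state the paper describes as living in kernel data structures.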

This seems to contradict the figure and the statement on page 8. Perhaps the intent is to be flexible about which design is chosen?

There is a desire to deploy on both bare metal and virtualized systems (page 11), a goal that is laudable but often difficult to achieve in practice while preserving performance. The entire system must often be redesigned below the netlink interface to get good performance in both situations.

Overall, this is an interesting paper. There are some good points, some bad points, and some confusions that need to be ironed out. Ultimately, however, it is good for the disaggregated ecosystem that a large player is trying to standardize on and prefer a more open platform.

Weekend Reads: The Relay Box Attack

West Midlands Police believe it is the first time the high-tech crime has been caught on camera. Relay boxes can receive signals through walls, doors and windows but not metal. The theft took just one minute and the Mercedes car, stolen from the Elmdon area of Solihull on 24 September, has not been recovered. @BBC

There’s rising worry that corporations are taking over America. But after reviewing a slew of the bids by cities and states wooing Amazon’s massive second headquarters, I don’t think “takeover” quite captures what’s going on. More like “surrender.” Last month Amazon announced it got 238 offers for its new, proposed 50,000-employee HQ2. I set out to see what’s in them, but only about 30 have been released so far under public-record acts. @The Seattle Times

The Supreme Court will hear oral arguments in Carpenter v. United States on November 29th. Carpenter centers on whether law enforcement needs a warrant to access 127 days of historic cell-site location information (CSLI). The case is important because of the great quantity of demands for location information now being made by law enforcement, because the location information that is sought is very revealing, and because law enforcement often obtains such data without obtaining a warrant, which increases the likelihood that sensitive location information about innocent people is collected. @The Center for Democracy and Technology

It’s amazing how congressional Republicans have been singularly unable, since winning the White House and both houses of Congress, to advance any major legislative priorities for their voters, but still quite able to advance legislation that most Republican voters would oppose — if they learned about it.
Republican leaders are sponsoring three bills that would expand the U.S. surveillance state under the guise of improving education and government efficiency. A grassroots opposition letter lists and summarizes the bills, the second of which passed the House last week… @The Federalist

Are the very large, very successful tech megaplatforms a problem that needs solving? Are they suppressing competition, innovation, free speech, democracy? I’m skeptical that case has been proven to the extent a strong public policy response is required ASAP. And I am equally skeptical of the solution set being offered by those who are quite comfortable that the anti-tech case has been proven. Break ‘em up! Regulate the heck out of them! (Sotto voce: Nationalize them.) @The American Enterprise Institute