The CORD Architecture
Edge provider networks, supporting DSL, voice, and other services to consumers and small businesses, tend to be more heavily bound to vendor-specific equipment and hardware-centric standards. These networks are built around the more closed telephone standards, rather than the more open internetworking standards, and hence they tend to be more expensive to operate and manage. As one friend said about a company that supplies this equipment, “they just print money.” The large edge providers, such as AT&T and Verizon, however, are not endless pools of money. These providers are squeezed between the content providers, who are taking large shares of the revenue, and consumers, who are always looking for a lower-priced option for communications as well as new and interesting services.
If this seems to you like an area ripe for virtualization, you’re not alone. AT&T has been working in this area for a few years on a project called CORD (Central Office Re-architected as a Datacenter); they have published a series of papers on the topic that make for interesting reading:
- A set of slides covering the concept can be found here
- A recording of a talk on this topic can be found here
- A shorter white paper on the topic can be found here
- A web site with tests, configurations, and other information is here
On the last site, there is an actual reference implementation document that walks through much of the hardware they’ve selected. The documents certainly push every “modern” idea in the stack, including OpenStack, OpenFlow, Docker containers, and commodity/white box hardware.
My impressions?
First, I’m not convinced OpenFlow is going to represent the best set of tradeoffs possible at scale, even if it can truly scale to tens of thousands of devices. No matter how magical centralizing the control plane might seem in terms of simplicity and ease of management, the control plane is, and always will be, akin to a database, and hence subject to the rules of the CAP theorem: you cannot have consistency, availability, and partition tolerance all at once, and networks do partition. Telco operators are, of course, still more comfortable at the centralized management end of things, so they might be willing (and potentially even able) to make the tradeoffs required to centralize the control plane. This isn’t going to set a wide pattern for the rest of the world, where a hybrid model of some kind is still going to be a better fit.
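To make the centralization point concrete, here is a minimal sketch of what a centralized OpenFlow control plane looks like in practice. It uses the Ryu controller framework purely as an illustration (nothing in the CORD material mandates Ryu): every switch that connects gets a table-miss entry that punts unknown traffic to the controller, so all forwarding state is resolved by this one process.

```python
# Minimal sketch of a centralized OpenFlow control plane, using the Ryu
# framework purely for illustration (CORD does not specify this controller).
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class CentralController(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        datapath = ev.msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser

        # Table-miss entry: anything the switch doesn't already have a flow
        # for is sent to the controller, so the network's forwarding state
        # effectively lives in this single process.
        match = parser.OFPMatch()
        actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER,
                                          ofproto.OFPCML_NO_BUFFER)]
        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS,
                                             actions)]
        datapath.send_msg(parser.OFPFlowMod(datapath=datapath, priority=0,
                                            match=match, instructions=inst))
```

Everything interesting lives in that one process, which is exactly why the database-style consistency and availability tradeoffs apply to it.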
Second, nothing in the papers discusses the problems of hardware abstraction and common management across the various white boxes. If there’s one thing I’ve seen up close and personal since moving to a hyperscaler, it’s that one of the more difficult problems to face in the wild is this: either you lose performance with a single common interface across a range of chipsets, or you have to find a way to manage the multiple chipset interfaces, including having a plan for future changes. There is a practical limit to the number of chipsets you can support either way, and a practical limit to the number of devices you can run an open software package on efficiently. These are just the realities of life intruding on the whole “white box” game: you’re moving from buying everything from Compaq to caring about who makes the chipset inside. I don’t know whether this piece of the puzzle is being glossed over, or whether they’ve already faced this reality and it’s guiding the hardware reference platform choice.
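To illustrate the dilemma, here’s a hypothetical hardware abstraction layer; every driver class and capability below is invented for illustration, but the shape of the problem is real: the common interface can only expose the intersection of what every chipset supports, and anything beyond that becomes chipset-specific code someone has to own and carry forward.

```python
# Hypothetical sketch of the white-box abstraction problem; the chipset names
# and capabilities are invented for illustration only.
from abc import ABC, abstractmethod


class SwitchAsicDriver(ABC):
    """Lowest-common-denominator interface across every supported chipset."""

    @abstractmethod
    def install_route(self, prefix: str, next_hop: str) -> None: ...

    @abstractmethod
    def read_counters(self, port: int) -> dict: ...


class VendorADriver(SwitchAsicDriver):
    def install_route(self, prefix: str, next_hop: str) -> None:
        pass  # Vendor A's SDK call would go here.

    def read_counters(self, port: int) -> dict:
        return {"rx_bytes": 0, "tx_bytes": 0}

    # Vendor A's ASIC also supports in-band telemetry, but Vendor B's does
    # not, so the common interface above cannot expose it. Use it and you
    # are writing chipset-specific code; ignore it and you are leaving
    # capability (or performance) on the table.
    def enable_inband_telemetry(self) -> None:
        pass


class VendorBDriver(SwitchAsicDriver):
    def install_route(self, prefix: str, next_hop: str) -> None:
        pass  # Vendor B's SDK call would go here.

    def read_counters(self, port: int) -> dict:
        return {"rx_bytes": 0, "tx_bytes": 0}
```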
Third, I wonder how much efficiency in processing and network utilization they’re giving up to get rid of these racks of proprietary equipment. Again, there’s little mention of this problem in the papers I’ve read so far, but clearly there’s going to be some additional bandwidth usage and some trombone routing across these fabrics. What is the impact on services, quality of service, and other “stuff”? It would be interesting to see how these questions work out in real deployments.
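As a back-of-the-envelope illustration of the trombone effect (every number below is hypothetical), consider a flow that used to be handled in place by a single integrated appliance but is now chained through several VNFs scattered across the fabric:

```python
# Back-of-the-envelope sketch of the trombone/service-chaining cost; all of
# these numbers are hypothetical, chosen only to show the shape of the math.
flow_rate_gbps = 1.0      # offered load for one aggregate of subscriber traffic
service_chain_vnfs = 3    # e.g., BNG, firewall, CGNAT running on separate servers
hops_per_vnf = 2          # leaf -> spine -> leaf to reach each VNF and return

# Each VNF in the chain pulls the traffic across the fabric and pushes it back,
# so the same bits cross fabric links several times instead of once.
fabric_traversals = service_chain_vnfs * hops_per_vnf
fabric_load_gbps = flow_rate_gbps * fabric_traversals

print(f"Fabric traversals per flow: {fabric_traversals}")
print(f"Fabric bandwidth consumed for 1 Gbps of delivered service: "
      f"{fabric_load_gbps:.1f} Gbps")
```

None of this is fatal, but it is fabric capacity the purpose-built racks did not consume, which is exactly why the quality of service question above matters.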
Finally, the information provided in the papers all points to a small spine-and-leaf fabric at the pod level. You can be certain these pods are being pulled onto larger spine-and-leaf fabrics in local points of presence, data centers, or foglets (whatever we’re calling these things any more), but there’s little mention of the overall network architecture in the public information I’ve seen. Providers will be providers, after all; the overall network architecture is still considered a fairly strategic piece of information.
Nonetheless, if you want a broad idea of how NFV, white box, and other interesting ideas are proposed to play out in the world of large-scale edge providers, this is an interesting area to read up on.