Reaction: Centralization Wins

Warning: in this post, I am going to cross a little into philosophy, governance, and other odd subjects. Here there be dragons. Let me begin by setting the stage:

Decentralized systems will continue to lose to centralized systems until there’s a driver requiring decentralization to deliver a clearly superior consumer experience. Unfortunately, that may not happen for quite some time. —Todd Hoff @High Scalability

And the very helpful diagram which accompanies the quote—

The point Todd Hoff, the author, makes is that five years ago he believed the decentralized model would win in terms of the way the Internet is structured. Today, however, he no longer believes this; centralization is winning. Two points are worth considering before jumping into a more general discussion.

First, the decentralized model is almost always the most efficient in almost every respect. It is the model with the highest signal-to-noise ratio, and the model with the highest gain. The simplest way to explain this is to note that the primary cost in a network is the cost of connectivity, and the primary gain is the amount of support connections provide. The decentralized model offers the best balance of these two.
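To make the balance concrete, here is a small sketch comparing link counts, the primary connectivity cost, across the three classic topologies. The node counts and the topology definitions are illustrative assumptions on my part, not figures from the post:

```python
# Link counts (the primary connectivity cost) for three network models over n nodes.

def star_links(n):
    # Centralized: every node connects to a single hub.
    return n - 1

def full_mesh_links(n):
    # Fully distributed: every node connects to every other node.
    return n * (n - 1) // 2

def decentralized_links(n, hubs):
    # Decentralized: a small set of hubs is fully meshed among
    # themselves; every remaining node attaches to exactly one hub.
    return hubs * (hubs - 1) // 2 + (n - hubs)

n = 100
print(star_links(n))              # 99 links: cheapest, but a single point of failure
print(decentralized_links(n, 5))  # 105 links: slightly more cost, far more resilience
print(full_mesh_links(n))         # 4950 links: maximum support, maximum cost
```

The decentralized middle ground pays only a handful of extra links over the pure star, while avoiding both the star's single point of failure and the full mesh's quadratic connection cost.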

Second, what we are generally talking about here is data and people connections, rather than the physical infrastructure. While physical hardware is important, it will always lag behind the data and people connections by some amount of time.

These things said, what is the point?

If decentralized is the most efficient model, then why is centralization winning? There are two drivers that can cause centralization to win. The first is scarcity of resources. For instance, you want a centralized authority (within some geographic area) to build roads. Who wants two roads to their house? And would that necessitate two garages, one for each road system? Physical space is limited in a way that makes centralization (within geographic areas!) sensible for road systems.

The second is regulatory capture. Treating a resource that is not physically constrained as a scarce resource forces centralization, generally to the benefit of the resulting private/public partnership. The overall system is less efficient, but through regulatory capture, or rent seeking, the power accrues to a single entity, which then uses its power to enforce the centralized model.

In the natural order of things, there is disruption of some sort. During the disruption phase, things are changing, and the cost of maintaining a connection in the network is low, so the network naturally tends towards a distributed model. Then, as one node, or a small set of nodes, gains power, the network will tend towards centralization. If regulatory bodies can be found and captured, the centralized model will be enforced. The more capture, the more strongly the centralized model becomes entrenched.

Over time, if innovation is allowed (there is often an attempt to suppress innovation, but this is a two-edged sword), some new “thing,” whether a social movement or a technology—generally a blend of both—will build a new distributed network of some sort, and thus disrupt the centralized network.

What does all of this have to do with network engineering? We are currently moving towards strong regulatory capture with highly centralized networks. The owners of those centralized resources are battling it out, trying to strengthen their regulatory capture in every way possible—the war over net neutrality is but one component of this ongoing battle (which is why it often seems there are no “good folks” and “bad folks” in this argument). At some point, that battle will be decisively won, and one kind of information broker is going to win over the other.

In the process, the “old guard,” the “vendors,” are being forced to change their focus as they try to survive. Disaggregation and “software defined” are two elements of this shift in power.

The question is: will we reach full centralization? Or will some new idea—a new/old technology that organizes information in a different way—disrupt the coalescing centralized powers?

The answer to this question impacts what skills you should be learning now, and how you approach the rest of your career (or life!). A lot of your career rests not just on understanding the command lines and hardware, but reaching beyond these into understanding the technologies, and even beyond the technologies to understand the way organizations work. What you should be learning now, and what you are paying attention to, should not reflect the last war, but the next one. Should you study for a certification? Which one? Should you focus on a vendor, or a software package, or a specific technology? For how long, and how deeply?

And yes, I fully understand you cannot sell your longer term technical ability to prospective employers; the entire hiring process today is tailored to hedgehogs rather than foxes. This, however, is a topic for some other time.

Reaction: Nerd Knobs and Open Source in Network Software

This is an interesting take on where we are in the data networking world—

Tech is commoditizing, meaning that vendors in the space are losing feature differentiation. That happens for a number of reasons, the most obvious of which is that you run out of useful features. Other reasons include the difficulty in making less-obvious features matter to buyers, lack of insight by vendors into what’s useful to start off with, and difficulty in getting media access for any story that’s not a promise of total revolution. Whatever the reason, or combination of reasons, it’s getting harder for network vendors to promote features they offer as the reasons to buy their stuff. What’s left, obviously, is price. —Tom Nolle @CIMI

There are things here I agree with, and things I don’t agree with.

Tech is commoditizing. I’ve talked about this before; I think networking is commoditizing at the device level, and the days of appliance-based networking are behind us. But are networks themselves a commodity? No more than any other system.

We are running out of useful features, so vendors are losing feature differentiation. This one is going to take a little longer… When I first started in network engineering, the world was multiprotocol, and we had a lot of different transports. For instance, we took cases on IPX, VIP, AppleTalk, NetBIOS, and many other protocols. These all ran on top of Ethernet, T1, Frame Relay, ATM, FDDI, RPR, Token Ring, ARCnet, various sorts of serial links… The list always felt a little too long, to me. Today we have IPv4, IPv6, and MPLS on top of Ethernet, pretty much. All transports are framed as Ethernet, and all upper layer protocols use some form of IP. MPLS sits in the middle as the most common “transport enhancer.” The first thing to note is that the space across which useful features can be created is considerably smaller than it used to be.

To some degree, the second space in which useful features can be developed is in supporting specific application requirements. For instance, it was a big deal getting voice to run on IP. Today, we throw bandwidth and some light QoS at the problem, and call it done. We still have a lot of Ethernet over IP right now, but I suspect this will eventually “go away,” as well. At least I hope so. If we could get a few key vendors to stop pushing mobility as a feature only available across a single flooding domain, it would simplify everyone’s lives.

Hint hint.

To put this another way: features are sold into complexity, and the network has become radically simpler in the areas where appliance-based vendors have always sold features.

At the same time, anyone who reads/listens to/interacts with anything I have said in the last few years knows I do not think networks are becoming simpler. To the contrary, I think networks are becoming more complex. To solve hard problems in a way that interacts with environmental instability well, you need complexity. If one part of the network is becoming simpler, and we still want to solve these kinds of problems, we must be shifting the complexity to another part of the network.

And we are.

Look at eVPNs, for instance, and you will see where one sort of shift is happening—from the transport to the control plane. We are throwing a lot of complexity into BGP to solve a set of problems that, honestly, should not exist in the first place—stretching broadcast domains to solve a mobility problem. Another kind of shift is what we see happening at the hyperscalers. Contrary to popular belief, you should care about what the hyperscalers are doing, because what they are doing will be commonplace in a network near you soon enough. It used to be odd to see a data center fabric with 100,000 hosts. I can name a few dozen companies that are at this scale today, and the number is only growing.

Worse, many of those companies are trying to manage this kind of scale without taking advantage of the lessons learned at the content providers—because the folks in those organizations read all the time about the millions of servers in the hyperscale world, and how “the solutions hyperscalers use don’t apply to you.”

Let’s recount some history. Ethernet-only networking was something only the hyperscalers of the day (the IXs) did many years ago, and would never apply to everyone. Spine and leaf fabrics were something the hyperscalers of the day did many years ago, and would never apply to everyone. High speed fiber links were a thing only the large scale providers of the day did, and would never apply to everyone. 10g to the server was something only the hyperscalers of the day did, and would never apply to everyone. MPLS was something only the hyperscalers of the day did, and would never apply to everyone. Shall I continue? One of the side effects of “there are no providers, there are no enterprises; there are problems, and there are solutions” is just this: whatever tool works, use it. Ignore all the folks saying “you can’t use that tool, because you’re not a part of the select club that is supposed to use it.”

But this does point us in another direction of complexity, the definition of new features, and the differentiation process.

Scale and manageability are the features that sell right now.

We have gone from a world where you could sell features, because everyone assumed money would be spent on the network, to a world in which you must prove the network is worth spending money on. This means treating routers and switches as cattle, and figuring out how to understand what the network is doing and why, are the important places to look. Basic operation might be “done,” but figuring out how to build a good network that runs well, and showing its value, is far from done, and probably will not ever be done.
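As a sketch of what “cattle, not pets” means in practice, the fragment below generates device configurations from a template plus structured data, so any box can be rebuilt from source rather than hand-maintained. The device names, addresses, and template are all invented for illustration:

```python
# Treat devices as cattle: every configuration is generated from a
# template plus structured data, never edited by hand on the box itself.
TEMPLATE = (
    "hostname {name}\n"
    "interface Loopback0\n"
    " ip address {loopback}/32\n"
)

def render(device):
    # Produce a full configuration from one device's data record.
    return TEMPLATE.format(**device)

devices = [
    {"name": "leaf1", "loopback": "10.0.0.1"},
    {"name": "leaf2", "loopback": "10.0.0.2"},
]

# Rebuilding any device is now just a lookup plus a render.
configs = {d["name"]: render(d) for d in devices}
print(configs["leaf1"])
```

The point of the pattern is that the data, not the running device, is the source of truth; a failed switch is replaced and re-rendered rather than nursed back to health.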

In the end, I agree with the original article—open source is going to drive commoditization at the device level.

I will go farther, and say open source is going to force us to rethink the rush to new features, to rethink the entire security space, and to rethink the way we handle code quality. Operators are going to be forced to take more of all of these things into their own hands, or to outsource it through much stronger vertical integration than we’ve seen in the past.

The network world is definitely changing; the collision with open source is one more element in what is likely to come in the future, and it will drive commoditization at some level. At the same time, this very commoditization is going to leave a much more interesting set of problems in its wake, and a more interesting industry to live in.

Are you ready?

Reaction: Network software quality

Over at IT ProPortal, Dr. Greg Law has an article up chiding the networking world for its poor software quality. To wit—

When networking companies ship equipment out containing critical bugs, providing remediation in response to their discovery can be almost impossible. Their engineers back at base often lack the data they need to reproduce the issue as it’s usually kept by clients on premise. An inability to cure a product defect could result in the failure of a product line, temporary or permanent withdrawal of a product, lost customers, reputational damage, and product reengineering expenses, any of which could have a material impact on revenue, margins, and net income.

Let me begin here: Dr. Law, you are correct—we have a problem with software quality. I think the problem is a bit larger than just the networking world—for instance, my family just purchased two new vehicles, a Volvo and a Fiat. Both have Android systems in the center screen. And neither will connect correctly with our Android-based phones. It probably isn’t mission critical, like it could be for a network, but it is annoying.

But even given software quality is a widespread issue in our world, it is still true that networks are something of a special case. While networks are often just hidden behind the plug, they play a much larger role in the way the world works than most people realize. Much like the train system at the turn of the century, and the mixed mode transportation systems that enable us to put dinner on the table every night, the network carries most of what really matters in the virtual world, from our money to our medical records.

Given the assessment is correct—and I think it is—what is the answer?

One answer is to simply do better: to fuss at the vendors, and the open source projects, and demand better quality. The beatings, as they say, will continue until morale improves. If anyone out there thinks this will really work, raise your hands. No, higher. I can’t see you. Or maybe no-one has their hands raised.

What, then, is the solution? I think Dr. Law actually gets at the corner of what the solution needs to be in this line—

The complexity of the network stack though, is higher than ever. An increased number of protocols leads to a more complex architecture, which in turn severely impacts operational efficiency of networks.

For a short review, remember that complexity is required to solve hard problems. Specifically, the one hard problem complexity is designed to solve is environmental uncertainty. Because of this, we are not going to get rid of complexity any time soon. There are too many old applications, and too many old appliances, that no-one is willing to let go of. There are too many vendors trying to keep people within their ecosystems, and too many resulting one-off connectors bridging the gaps, that will never be replaced. Complexity is not really going to be dramatically reduced until we bite the bullet and take these kinds of organizational and people problems head on.

In the meantime, what can we do?

Design simpler. Stop stacking tons of layers. Focus on solving problems, rather than deploying technologies. Stop being afraid to rip things out.

If you have read my work in the past, for instance Navigating Network Complexity, or Computer Networking Problems and Solutions, or even The Art of Network Architecture, you know the drill.

We can all cast blame at the vendors, but part of this is on us as network engineers. If you want better quality in your network, the best place to start is with the network you are working on right now, the people who are designing and deploying that network, and the people who make the business decisions.

Reaction: Some Sayings that Sum Up Networking

Over at the CIMI blog, Tom Nolle has a mixed bag of sayings and thoughts about the computer networking world, in particular how it relates to the media. Some of these were interesting enough that they seemed worth highlighting and writing a bit more on.

“News” means “novelty”, not “truth”. In much of the computer networking world, news is what sells products, rather than business need. In turn, novelty is what drives the news. The “straight line” connection, then, is from novelty to news to product, and product manufacturers know this. This is not just a vendor driven problem, however; it is also driven by recruitment, resume padding, and many other facets of the networking nerd culture.

On the other hand, novelty is never a good starting place for network design. Rather, network design needs to start with problems that need to be solved, proceed by considering how those problems can be solved with technologies, then build requirements based on the problems and technologies, and finally consider which products can be used to implement all of this at the lowest long term cost. This is not to say novelty is not useful, or is not justified, but rather that novelty is not the point.

How can you overcome the drive to novelty through the news cycle? Go back to basics. Every “novel” thing you are looking at in the latest news story is something that has been invented and implemented before in a different package, and with a different name. Apply rule 11 liberally to all marketing claims, look for the problem to be solved, push back on the requirements, think systemically, manage your own expectations, and go back to basics.

To a user, “the network” is whatever isn’t on their desk or in their device. This is a point folks who work on the network for a living often forget. Talking to a non-networking person about networking technology is often like talking to someone who commutes on the train about how the train works; it might be interesting, but they often just do not care. There are several implications here: the first is that if your business relies on the network (and most do, whether or not they realize it), as the network engineer, you need to go beyond just making the train work, to helping others understand that why and how the network (the train) runs is important to reaching the overall business goals. There is an entire movement within the networking world that would say: “networks are a commodity, just like the train is, just move the packets and shut up.” I do not tend to agree with this; for a city, a train is not a commodity, it is a vital resource that grows business and interacts with people’s lives. The network is like the train to a city; it might be a commodity for the person riding it, but it is not for the overall business.

There’s no substitute for knowing what you’re doing. But what does it mean to “know what you are doing?” In a large complex system, you can know what is on “your layer,” or “your piece of the system,” plus one or two levels above and below. The rest is rumor and pop psychology.

In a world where there is just too much information, how can you “know what you are doing?” First, you can use rule 11 to your advantage, and realize that everything that is, has been before. If you know the underlying technology, then the implementation is much easier to learn (if you need to learn it at all!). If you know the pattern, then you can see the details much more easily. Second, you can insist on radical simplicity, which will make the process of knowing the entire system much easier. Third, you can intentionally think systematically, and functionally, rather than orienting yourself to products.

Reaction: The Power of Open APIs

Disaggregation, in the form of splitting network hardware from network software, is often touted as a way to save money (as if network engineering were primarily about saving money, rather than adding value—but this is a different soap box). The primary connections between disaggregation and saving money are the ability to deploy white boxes, and the ability to centralize the control plane to simplify the network (think software defined networks here—again, whether or not both of these are true as advertised is a different discussion).

But drivers that focus on cost miss more than half the picture. A better way to frame disaggregation, and the larger place of networks within the broader technology sphere, is through the value they add. What drives value in network engineering? It’s often simplest to return to Tanenbaum’s example of the station wagon full of backup tapes. To bring the example into more modern terms, it is difficult to beat the raw bandwidth of an overnight box full of USB thumb drives.
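Tanenbaum’s point is easy to check with a little arithmetic. The drive count, capacity, and transit time below are invented for illustration:

```python
# Effective bandwidth of a shipped box of thumb drives.
def sneakernet_gbps(drives, gb_per_drive, hours_in_transit):
    total_bits = drives * gb_per_drive * 8e9   # gigabytes -> bits
    seconds = hours_in_transit * 3600
    return total_bits / seconds / 1e9          # bits per second -> Gbps

# A box of 1,000 drives at 256 GB each, shipped overnight (24 hours):
print(round(sneakernet_gbps(1000, 256, 24), 1))  # 23.7 Gbps, sustained
```

The latency is terrible, of course, which is exactly the point of the paragraph that follows: networks exist to get things done more quickly, not to move the most bits.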

In this view, networks can primarily be seen as a sop to human impatience. They are a way to get things done more quickly. In the case of networks, quantity (speed) often becomes a form of quality (increased value).

But what does disaggregation have to do with speed? The connection is the open API.

When you disaggregate a network device into hardware and software, you necessarily create a stable, openly accessible API between the software and the hardware. Routing protocols and other control plane elements must be able to build a routing table that is somehow passed on to the forwarding hardware, so packets can be forwarded through the network. A fortuitous side effect of this kind of open API is that anyone can use it to control the forwarding hardware.
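A minimal sketch of the idea follows. All of the class and method names here are invented for illustration; real disaggregated stacks expose this boundary through interfaces such as a hardware abstraction layer’s route-install calls. The point is that the same API serves both the routing protocol and any other software:

```python
# A hypothetical route-install API of the kind disaggregation exposes:
# the control plane computes routes, and any software can push them
# down to the forwarding layer through the same open interface.

class ForwardingTable:
    """Stands in for the open API between software and forwarding hardware."""

    def __init__(self):
        self.routes = {}

    def install(self, prefix, next_hop):
        # Any caller, not just a routing protocol, may install a route.
        self.routes[prefix] = next_hop

    def lookup(self, prefix):
        return self.routes.get(prefix)

fib = ForwardingTable()

# A routing protocol uses the API...
fib.install("10.0.0.0/8", "192.0.2.1")

# ...but so can any other application, for instance one steering its own traffic.
fib.install("203.0.113.0/24", "192.0.2.9")
```

Once the boundary is open and stable, the set of things that can program the forwarding path is no longer limited to the vendor’s control plane, which is precisely what makes the ScyllaDB example below possible.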

Enter the new release of ScyllaDB. According to folks who test these things, and should know, ScyllaDB is much faster than Cassandra, another industry-leading open source database system. How much faster? Five to ten times faster. A five- to ten-fold improvement in database performance is, potentially, a change in quantity large enough to become a change in quality. How much faster could your business handle orders, or customer service calls, or many other things, if you could speed the database end of the problem up even five-fold? How many new ways of processing information might you discover to gain insight into business operations, customers, and more?

How does Scylla provide these kinds of improvements over Cassandra? In the first place, the newer database system is written in a faster language, C++ rather than Java. Scylla also shards processing across processor cores more efficiently. It doesn’t rely on the page cache.

None of this has anything to do with network disaggregation—but there is one way the Scylla developers improved the performance of their database that does relate to it: ScyllaDB writes directly to the network interface card using DPDK. The interesting point, from a network engineering perspective, is that this simply would not be possible without the disaggregation of hardware and software opening DPDK up as an interface through which a database can push packets directly to the hardware.

The side effects of disaggregation are only beginning to be felt in the network engineering world; the ultimate effects could reshape the way we think about application performance on the network, and the entire realm of network engineering.

Reaction: DNS Complexity Lessons

Recently, Bert Hubert wrote of a growing problem in the networking world: the complexity of DNS. We have two systems we all use in the Internet, DNS and BGP. Both of these systems appear to be able to handle anything we can throw at them and “keep on ticking.”

this article was crossposted to CircleID

But how far can we drive the complexity of these systems before they ultimately fail? Bert posted this chart to the APNIC blog to illustrate the problem—

I am old enough to remember when the entire Cisco IOS Software (classic) code base was under 150,000 lines; today, I suspect most BGP and DNS implementations are well over this size. Consider this for a moment—a single protocol implementation that is larger than an entire Network Operating System ten to fifteen years back.

What really grabbed my attention, though, was one of the reasons Bert believes we have these complexity problems—

DNS developers frequently see immense complexity not as a problem but as a welcome challenge to be overcome. We say ‘yes’ to things we should say ‘no’ to. Less gifted developer communities would have to say no automatically since they simply would not be able to implement all that new stuff. We do not have this problem. We’re also too proud to say we find something (too) hard.

How often is this the problem in network design and deployment? “Oh, you want a stretched Ethernet link between two data centers 150 miles apart, and you want an eVPN control plane on top of the stretched Ethernet to support MPLS Traffic Engineering, and you want…” All the while, the equipment budget is ringing up numbers in our heads, and the really cool stuff we will be able to play with is piling up on the list we are writing in front of us. Then you hear the ultimate challenge: “if you were a real engineer, you could figure out how to do this all with a pair of routers I can buy down at the local office supply store.”

Some problems just do not need to be solved in the current system. Some problems just need to have their own system built for them, rather than reusing the same old stuff because, well, “we can.”

The real engineer is the one who knows how to say “no.”