Whither Network Engineering? (Part 2)

In the first post of this series at the turn of 2019, I considered the forces I think will cause network engineering to radically change. What about the timing of these changes? I hear a lot of people say, “this stuff isn’t coming for twenty years or more, so don’t worry about it… there is plenty of time to adapt.” This optimism seems completely misplaced to me. Markets and ideas are like that old house you pass all the time; you know the one. No-one has maintained it for years, but it is so … solid. It was built out of the best timber, by people who knew what they were doing. The foundation is deep, and it has lasted all these years.

Then one day you pass a heap of wood on the side of the road and realize—this is that old house that seemed so solid just a few days ago. Sometime in the night, that house that was so solid collapsed. The outer shell was covering up a lot of inner rot. Kuhn, in The Structure of Scientific Revolutions, argues this is the way ideas always go. They appear to be solid one day, and then all the supports that looked so solid just moments before the collapse are all shown to be full of termites. The entire system of theories collapses in what seems like a moment compared to the amount of time the theory has stood. History has borne this way of looking at things out.

The point is: we could wake up in five years’ time and find the entire structure of the network engineering market has changed while we were asleep at the console running traceroute. I hear a lot of people talk about how it will take tens of years for any real change to take place because some class of businesses (usually the enterprise) does not take up new things very quickly. This line of thinking assumes the structure of business will remain the same; I think this tends to underestimate the symbiotic relationship between business and information technology. We have become so accustomed to seeing IT as a cost center that has little bearing on the overall business that it is hard to shift our thinking to the new realities that are starting to take hold.

While some niche retailers are doing okay, most of the broad-based ones are in real trouble. Shopping malls are like ghost towns, bookstores are closing in droves; even grocery stores are struggling in many areas. This is not about second-day delivery; this is about data. Companies must either be in a deep niche or learn to work with data to survive. Companies that can most effectively combine and use data to anticipate, adapt to, and influence consumer behavior will survive. The rest will not.

Let me give some examples that might help. Consider Oak Island Hardware (a local hardware store), Home Depot, Sears, and Amazon. First, there are two kinds of businesses here; while all four have products that overlap, they service two different kinds of needs. In the one case, Home Depot and Oak Island Hardware cater to geographically localized wants where physical presence counts. When your plumbing starts to leak, you don’t have time to wait for next-day delivery. If you are in the middle of rebuilding a wall or a cabinet and you need another box of nails, you are not waiting for a delivery. You will get in your car and drive to the nearest place that sells such things. To some degree, Oak Island Hardware and Home Depot are in a different kind of market from Sears and Amazon.

Consider Sears and Amazon as a pair first. Amazon internalized its data handling, and builds semi-custom solutions to support that data handling. Sears tried to focus on local stores, inventory management, and other traditional methods of retail. Sears is gone; Amazon remains. So Home Depot and Oak Island Hardware have a “niche” that protects them (to some degree) from the ravages of the data-focused world. Now consider Oak Island Hardware versus Home Depot. Here the niche is primarily geographical: there just is not enough room on Oak Island to build a Home Depot. When people need a box of nails “now,” they will often choose the closer store to get those nails.

On the other hand, what kind of IT needs does a stand-alone store like Oak Island Hardware have? I do not think they will be directly hiring any network engineers in the near future. Instead, they will be purchasing IT services in the form of cloud-based applications. These cloud-based applications, in turn, will be hosted on … disaggregated stacks run by providers.

The companies in the broader markets that are doing well have built fully- or semi-customized systems to handle data efficiently. The network is no longer treated as a “thing” to be built; it is just another part of a larger data delivery system. Ultimately, businesses in broader markets that want to survive need to shift their thinking to data. The most efficient way to do this is to shift to a disaggregated, layered model similar to the one the web- and hyper-scalers have moved to.

I can hear you out there now, reading this, saying: “No! They can’t do this! The average IT shop doesn’t have the skilled people, the vision, the leadership, the… The web- and hyper-scalers have specialized systems built for a single purpose! This stuff doesn’t apply to enterprise networks!”

In answer to this plethora of objections, let me tell you a story.

Once, a long time ago, I was sent off to work on installing a project called PC3, a new US Air Force personnel management system. My job was primarily on the network side of the house, running physical circuits through the on-base systems, installing inverse multiplexers, and making certain the circuits were up and running. At the same time, I had been working on the Xerox STAR system on base, as well as helping design the new network core running a combination of Vines and Netware over optical links connecting Cabletron devices. We already had a bunch of other networks on base, including some ARCnet, token bus, thicknet, thinnet, and a few other things, so packet switching was definitely already a “thing.”

In the process of installing this PC3 system, I must have said something about how this was such old technology, and packet switching was eventually going to take over the world. In return, I got an earful or two from one of the older techs working on the job with me. “Russ,” he said, “you just don’t understand! Packet switching is going to be great for some specialized environments, but circuit switching has already solved the general purpose cases.”

Now, before you laugh at the old codger, he made a bunch of good points. At that time, we were struggling to get a packet switched network up between seven buildings, and then trying to figure out how to feed the packet switched network into more buildings. The circuit switched network, on the other hand, already had more bandwidth into every building on base than we could figure out how to bring to those seven buildings. Yes, we could push a lot more bandwidth across a couple of rooms, but even scaling bandwidth out to an entire large building was a challenge.

What changed? The ecosystem. A lot of smart people bought into the vision of packet switched networking and spent a lot of time figuring out how to make it do all the things no-one thought it could do, and apply it to problems no-one thought it could apply to. They learned how to take the best pieces of circuit-switched technology and apply them in the packet switched world (remember the history of MPLS).

So before you say “disaggregation does not apply to the enterprise,” remember the lesson of packet switched networks—and the lessons of a million other similar technologies. Disaggregation might not apply in the same way to web- and hyper-scale networks and enterprise networks, but this does not mean it does not apply at all. Do not throw the baby out with the bathwater.

As the disaggregation ecosystem grows, and it will grow, the options will become both broader and deeper. Rather than seeing the world as standards versus open source, we will need to learn to see standards plus open source. Instead of seeing the ecosystem as commercial versus open source, we will need to learn to see commercial plus open source. Instead of seeing protocols on appliances supporting applications, we will need to learn to see hardware and software. As the ecosystem grows, we will learn to learn from many places, including appliance-based networking, the world of servers, application development, and … the business. We will need to directly apply what makes sense and learn wisdom from the rest.

What does this mean for network engineering skills? That is the topic of the third post in this series.

Whither Network Engineering? (Part 1)

An article on successful writers who end up driving delivery trucks. My current reading in epistemology for an upcoming PhD seminar. An article on the bifurcation of network engineering skills. Several conversations on various slacks I participate in. What do these things have in common? Just this:

What is to become of network engineering?

While it seems obvious network engineering is changing, it is not so easy to say how it is changing, and how network engineers can adapt to those changes. To better understand these things, it is good to back up and take in a larger view. A good place to start is to think about how networks are built today.

Networks today are built using an appliance and circuit model. To build a network, an “engineer” (we can argue over the meaning of that word) tries to gauge how much traffic needs to be moved between different points in the business’ geographical space, and then tries to understand the shape of that traffic. Is it layer 2, or layer 3? Which application needs priority over some other application?

Once this set of requirements is drawn up, a long discussion ensues over the right appliances and circuits to purchase to fulfill them. There may be some thought put into the future of the business, and perhaps some slight interaction with the application developers, but, in general, the network is seen pretty much as plumbing. So long as the water glass is filled quickly, and the toilets flush, no-one really cares how it works.

There are many results of building networks this way. First, the appliances tend to be complex devices with many different capabilities. Since a single appliance must serve many different roles for many different customers running many different applications, each appliance must be like a multitool, or those neat kitchen devices you see on television (it slices, it dices, it can even open cans!). While this is neat, it tends to cause technologies to be misapplied, and means each appliance is running tens of millions of lines of code—code very few people understand.

This situation has led, on the one hand, to a desire to simplify. The first way operators are simplifying is to move all their applications to the cloud. Many people see this as replacing just the data center, but this misunderstands the draw of cloud, and why businesses are moving to it. I have heard people say, “oh, there will still be the wide area, and there will still be the campus, even if my company goes entirely to the cloud.” In my opinion, this answer does not effectively grapple with the concept of cloud computing.

If a business desires to divest itself of its network, it will not stop with the data center. 5G, SD-WAN, and edge computing are going to fundamentally change the way campus and WAN are done. If you could place your application in a public cloud service and have the data and application distributed to every remote site without needing a data center, on-site equipment, and circuits into each of those remote sites, would you do it? To ask is to know the answer.

If most companies move all their data to a cloud service, then the only network engineers who survive will be at those providers, at transit providers, and in other supporting roles. The catch here is that cloud providers do not treat the network as a separate “thing,” and hence they do not really have “network engineers” in the traditional sense. So in this scenario, the network engineer still changes radically, and there are very few of them around, mostly working for providers of various kinds.

On the other hand, the drive to simplify has led to strongly vertically integrated vendor-based solutions consisting of hardware and software. The easy button, the modern mainframe, or whatever you want to call it. In this case, the network engineer works at the vendor rather than the enterprise. They tend to have very specialized knowledge, and there are few of them.

There is a third option, of course: disaggregation.

In this third option, the company will invest in the network and applications as a single, combined strategic asset. Like a cloud provider or web scaler, these companies will not see the network as a “thing” to be invested in separately. Here there will be engineers of one kind or another, and a blend of things purchased from vendors and things built in-house. They will see the applications through the hardware as a complete system, rather than as an investment in appliances and circuits. Perhaps the following diagram will help.

The left side of this diagram is how we build networks today: appliances connected through the control plane, with network management and applications riding on top. The disaggregated view of the network treats the control plane somewhat like an application, and the operating system like any other operating system. The hardware is fit to task; this does not mean it is a “commodity,” but rather that the hardware life cycle and tuning are untied from the optimization of the software operating environment. In the disaggregated view, the software stack is fit to the company and its business, rather than to the hardware. This is the crucial difference between the two models.

There are two ways to view the competition between the company that moves to the cloud, the company that moves to black box integrated solutions, and the company that disaggregates. My view is that the companies that move to the cloud, or choose the black box, will only survive if they live in a fairly narrow niche where the data they collect, produce, and rely on is narrow in scope, or rather, not generally usable.

Those companies that try to live in the broader market, and give their data to a cloud provider, or give their IT systems entirely to a vendor, will be eaten. Why do I think this? Because data is the new oil. Data grants and underlies every kind of power that relates to making any sort of money any longer: political power, social power, supply-chain efficiency, and anything else you can name. There are no chemical companies; there are only data companies. This is the new normal, and companies that do not understand this new normal will either need to be in a niche small enough that their data is unique in a way that protects them, or they will be eaten. George Gilder’s Knowledge and Power is one of the better explanations of this process you can pick up.

If data is at the heart of your business and you either give it to someone else, or you fail to optimize your use of it, you will be at a business disadvantage. That business disadvantage will grow over time until it becomes an economic millstone around the company itself. Can you say Sears? What about Toys-R-Us?

Technologies like 5G, edge computing, and cloud, mixed in with the pressure to reduce the complexity of running a network and subsume it into the larger life of IT, are forming a wrecking ball directed at network engineering as we know it. Which leaves us with the question: whither network engineering?