The Network Sized Holes in Serverless

Until about 2017, the cloud was going to replace all on-premises data centers. As it turns out, however, the cloud has not replaced all on-premises data centers. Why not? Based on the paper under review, one potential answer is that containers in the cloud are still too much like “serverfull” computing. Developers must still create and manage what appear to be virtual machines, including:

  • Machine level redundancy, including georedundancy
  • Load balancing and request routing
  • Scaling up and down based on load
  • Monitoring and logging
  • System upgrades and security
  • Migration to new instances

Serverless solves these problems by placing applications directly onto the cloud, or rather a set of libraries within the cloud.

Jonas, Eric, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, et al. “Cloud Programming Simplified: A Berkeley View on Serverless Computing.” ArXiv:1902.03383 [Cs], February 9, 2019.

The authors define serverless by contrasting it with serverfull computing. While software is run based on an event in a serverless environment, software runs until stopped in a serverfull environment. While an application does not have a maximum run time in a serverfull environment, there is some maximum set by the provider in a serverless environment. The server instance, operating system, and libraries are all chosen by the user in a serverfull environment, but they are chosen by the provider in a serverless environment. The serverless environment is a higher-level abstraction of compute and storage resources than a cloud instance (or an on-premises solution, even a private cloud).

These differences add up to faster application development in a serverless environment; application developers are completely freed from any system administration tasks to focus entirely on developing and deploying useful software. This should, in theory, free application developers to focus on solving business problems, rather than worrying about any of the infrastructure. Two key points the authors point out in the serverless realm are the complex software techniques used to bring serverless processes up quickly (such as preloading and holding the VM instances that back services), and the security isolation provided through VM level separation.

The authors provide a section on challenges in serverless environments and the workarounds to these challenges. For instance, one problem with real-time video compression is that the object store used to communicate between processes running on a serverless infrastructure is too slow to support fine-grained communication, while the functions are too coarse-grained to support some of the required tasks. To solve this problem, they propose using function-to-function communication, which moves the object store out of the process. This provides dramatic processing speedups, as well as reducing the costs of serverless to a fraction of a cloud instance.

One of the challenges discussed here is the problem of communication patterns, including broadcast, aggregation, and shuffle. Each of these, of course, relies on the underlying network to transport data between the compute nodes on which serverless functions are running. Since the serverless user cannot determine where a particular function will run, the performance of the underlying transport is, of course, quite variable. The authors say: “Since the application cannot control the location of the cloud functions, a serverless computing application may need to send two and four orders of magnitude more data than an equivalent VM-based solution.”
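To get a feel for why placement matters for these patterns, here is a rough sketch that counts messages crossing the network under a deliberately simplified model. The model is an assumption made for illustration, not the paper's measurement: every serverless function is treated as its own network endpoint, while in the VM case co-located tasks share data locally and each VM sends one combined message. The function name remote_messages and the values of 1,000 tasks and 100 tasks per VM are hypothetical.

```python
from math import ceil


def remote_messages(num_tasks: int, tasks_per_vm: int) -> dict:
    """Count network messages per communication pattern (simplified model).

    Assumption: a serverless function cannot share memory with its
    neighbours, so every task is a separate endpoint. On VMs, tasks
    packed onto the same machine combine their data locally first.
    """
    vms = ceil(num_tasks / tasks_per_vm)
    return {
        # One sender delivers the same data to every endpoint.
        "broadcast": {"serverless": num_tasks, "vm": vms},
        # Every endpoint sends one partial result to a single collector.
        "aggregation": {"serverless": num_tasks, "vm": vms},
        # Every endpoint sends a partition to every endpoint.
        "shuffle": {"serverless": num_tasks * num_tasks, "vm": vms * vms},
    }


if __name__ == "__main__":
    # Illustrative values only: 1,000 tasks, 100 tasks packed per VM.
    for pattern, count in remote_messages(1000, 100).items():
        ratio = count["serverless"] / count["vm"]
        print(f"{pattern}: serverless={count['serverless']} "
              f"vm={count['vm']} ratio={ratio:.0f}x")
```

With these illustrative numbers the gap is 100x for broadcast and aggregation and 10,000x for shuffle. The exact figures depend entirely on the packing factor chosen; the point is only that losing control over placement multiplies traffic on the network.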

And this is where the network sized hole in serverless comes into play. It is common fare today to say the network is “just a commodity.” Speeds and feeds are so high, and so easy to build, that we do not need to worry about building software that knows how to use a network efficiently, or even understands the network at all. Matching the network to software requirements is, on this view, a thing of the past; bandwidth is all a commodity now.

The law of leaky abstractions, however, will always have its say; a corollary here is that higher level abstractions will always have larger and more consequential leaks. The solutions offered to each of the challenges listed in the paper are all, in fact, layering violations which allow the developer to “work around” an inefficiency at some lower layer in the abstraction. Ultimately, such workarounds will compound into massive technical debt, and some “next new thing” will come along to “solve the problems.”

Moving data ultimately still takes time, still takes energy; the network still (often) needs to be tuned to the data being moved. Serverless is a great technology for some solutions—but there is ultimately no way to abstract out the hard work of building an entire system tuned to do a particular task and do it well. When you face abstraction, you should always ask: what is gained, and what is lost?

Learn to Code?

A long, long time ago, in a galaxy far away, I went to school to learn art and illustration. In those long ago days, folks in my art and illustration classes would sometimes get into a discussion about what, precisely, to do with an art degree. My answer was, ultimately, to turn it into a career building slides and illustrations in the field of network engineering. 😊 And I’m only half joking.

The discussion around the illustration board in those days was whether it was better to become an art teacher, or to focus just on the art and illustration itself. The two sides went at it hammer and tongs over weeks at a time. My only contribution to the discussion was this: even if you want to be the ultimate in the art world, a fine artist, you must still have a subject. While much of modern art might seem to be about nothing much at all, it has always seemed, to me, that art must be about something.

This week I was poking around one of the various places I tend to poke on the ‘net and ran across this collage.

Get the point? If you are a coal miner out of work, just learn to code. This struck me as the same sort of argument we used to have in our art and illustration seminars. But what really concerns me is the number of people who are considering leaving network engineering behind because they really believe there is no future in doing this work. They are replacing “learn network engineering” with “learn to code.”

Not to be too snarky, but sure—you do that. Let me know how it goes when you are out there coding… what, precisely? The entire point of coding is to code something useful, not to just run off and build trivial projects you can find in Git and code training classes.

It seems, to me, that if you are an artist, having some in-depth knowledge of something in the real world would make your art have more impact. If you know, for instance, farming, then you can go beyond just getting the light right. If you know farming, then you can paint (or photograph) a farm with much more understanding of what is going on in the images, and hence capture more than just the light or the mood. You can capture the movement, the flow of work, even the meaning.

In the same way, it seems, to me, that if you are a coder, having some in-depth knowledge of something you might build code for would mean your code will have more impact. It might be good to have a useful subject, like maybe… building networks? Contrary to what seems to be popular belief, building a large-scale network is still not a simple thing to do. So long as there are any kind of resource constraints in the design, deployment, and operation of networks, I do not see how it can ever really be an “easy thing to do.”

So yes—learn to code. I will continue to encourage people to learn to code. But I will also continue to encourage folks to learn how to design, build, and operate big networks. All of us cannot sit around and code full time, after all. There are still engineering problems to be solved in other areas, challenges to be tackled, and things to be built.

Coding is a good skill, but understanding how networks really work, how to design them, how to build them, and how to operate them, are also all good skills. Learning to code can multiply your skills as a network engineer. But if you do not have network engineering skills to start with, multiplying them by learning to code is not going to be the most useful exercise.

Reaction: Open Source

As a long-standing contributor to open standards, and someone trying to become more involved in the open source world (I really need to find an extra ten hours a day!), I am always thinking about these ecosystems, and how they relate to the network engineering world. This article on RedisDB, and in particular this quote, caught my attention:

There’s a longstanding myth in the open-source world that projects are driven by a community of contributors, but in reality, paid developers contribute the bulk of the code in most modern open-source projects, as Puppet founder Luke Kanies explained in our story earlier this year. That money has to come from somewhere.

The point of the article is that a lot of companies that support open source projects, like RedisDB, are moving to more closed source solutions to survive. The cloud providers are called out as a source of a lot of problems in this article, as they consume a lot of open source software, but do not really spend a lot of time or effort in supporting it. Open source, in this situation, becomes a sort of tragedy of the commons, where everyone thinks someone else is going to do the hard work of making a piece of software viable, so no-one does any of the work. Things are made worse because the open source version of the software is often “good enough” to solve 80% of the problems users need solved, so there is little incentive to purchase anything from the companies that do the bulk of the work in the community.

In some ways this problem relates directly to the concept of disaggregated networking. Of course, as I have said many times before, disaggregation is not directly tied to open source, nor even open standards. Disaggregation is simply seeing the hardware and software as two different things. Open source, in the disaggregated world, provides a set of tools the operator can use as a base for customization in those areas where customization makes sense. Hence open source and commercial solutions complement one another, rather than one replacing the other.

All that said, how can the open source community continue to thrive if some parts of the market take without giving back? Simply put, it cannot. There are ways, however, of organizing open source projects which encourage participation in the community, even among corporate interests. FR Routing is an example of a project I think is well organized to encourage community participation.

There are two key points to the way FR Routing is organized that I think are helpful in controlling the tragedy of the commons. First, there is not just one company in the world commercializing FR Routing. Rather, there are many different companies using FR Routing, either by shipping it in a commercial product, or by using it internally to build a network (and the network is then sold as a service to the customers of the company). Not every user of FR Routing is using only this one routing stack in their products or networks, either. This first point means there is a lot of participation from different companies that have an interest in seeing the project succeed.

Second, the way FR Routing is structured, no single company can gain control of the entire community. This allows healthy debate on features, code structure, and other issues within the community. There are people involved who supply routing expertise, others who supply deployment expertise, and a large group of coders, as well.

One thing I think the open source world does too often is to tie a single project to a single company, and that company’s support. Linux thrives because there are many different commercial and noncommercial organizations supporting the kernel and different packages that ride on top of the kernel. FR Routing is thriving for the same reason.

Yes, companies need to do better at supporting open source in their realm, not only for their own good, but for the good of the community. Yes, open source plays a vital role in the networking community. I would even argue closed source companies need to learn to work better with open source options in their area of expertise to provide their customers with a wider range of options. This will ultimately only accrue to the good of the companies that take this challenge on, and figure out how to make it work.

On the other side of things, open source is probably not going to solve all the problems in the networking, or any other, industry in the future. And the open source community needs to learn how to build structures around these projects that are both more independent, and more sustainable, over the long run.

Whither Network Engineering? (Part 3)

In the previous two parts of this series, I have looked at the reasons I think the networking ecosystem is bound to change and why I think disaggregation is going to play a major role in that change. If I am right about the changes happening, what will become of network engineers? The bifurcation of knowledge, combined with the kinds of networks and companies noted in the previous posts in this series, points the way. There will, I think, be three distinct careers where the “network engineer” currently exists on the operational side:

  1. Moving up the stack, towards business, the more management role. This will be captured primarily by the companies that operate in market verticals deep and narrow enough to survive without a strong focus on data, and hence can survive a transition to black box, fully integrated solutions. This position will largely be focused on deploying, integrating, and automating vertically integrated, vendor-driven systems and managing vendor relationships.
  2. Moving up the stack, towards software and business, the disaggregated network engineering role (I don’t have a better name for this presently). This will be in support of companies that value data to the point of focusing on its management as a separate “thing.” The network will no longer be a “separate line item,” however, but rather part of a larger system revolving around the data that makes the company “go.”
  3. Moving down the stack, towards the hardware, the network hardware, rack-and-stack, cabling, power, etc., engineer. Again, I do not have a good name for this role right now.

There will still be a fairly strong “soft division” between design and troubleshooting in the second role. Troubleshooting will primarily be handled by the vendor in the first role.

Perhaps the diagram below will help illustrate what I think is happening, and will continue to happen, in the network engineering field.

The old network engineering role, shown in the lower left corner of the two halves of the illustration, focused on the appliances and circuits used to build networks, with some portion of the job interacting with protocols and management tools. The goal is to provide the movement of data as a service, with minimal regard to the business value of that data. This role will, in my opinion, transition to the entire left side of the illustration as a company moves to black box solutions. The real value offered in this new role will be in managing the contracts and vendors used to supply what is essentially a commodity.

On the right side is what I think the disaggregated path looks like. Here the network engineering role has largely moved away from hardware; this will increasingly become a largely specialized vendor driven realm of work. On the other end, the network engineer will focus more on software, from protocols to applications, and how they drive and add value to the business. Again, the role will need to move up the stack towards the business to continue adding value; away from hardware, and towards software.

I could well be wrong. I would not be happy or sad if I am right or wrong.

None of these are invalid choices to make, or bad roles to fill. I do not know what role fits “you” best, your life, nor your interests. I am simply observing what I think is happening in the market, and trying to understand where things are going, because I think this kind of thinking helps provide clarity in a confusing world.

In both the first and second roles, you must move up the stack to add value. This is what happened in the worlds of electronic engineering and personal computers as they both disaggregated away from an appliance model. Living through these past experiences is part of what leads me to believe this same kind of movement will happen in the world of networking technology. Further, I think I already see these changes happening in parts of the market, and I cannot see any reason these kinds of changes should not move throughout the entire market fairly rapidly.

What is the percentage of these two roles in the market? Some people think the second role will simply not exist, in fact, other than at vendors. Others think the second role will be a vanishingly small part of the market. I tend to think the percentages will be more balanced because of shifts in the business environment that are happening in parallel with (or rather driving) these changes. Ultimately, however, the number of people in each role will be driven by the business environment, rather than the networking world.

Will there be “network engineers” in the future?

If we look at the progress of time from left to right, there is a big bulge ahead, followed by a slope off, and then a long tail. This is my understanding of the current network engineering skill set. We are at A as I write this, just before the big bulge of radical change at B, and I think much farther along than many others believe. At C, there will still be network engineers in the mold of the network engineers of today. They will be valiantly deploying appliance based networks for those companies who have a vertical niche deep enough to survive. There will be vendors still supporting these companies and engineers, too. There will just be very few of them. Like COBOL and FORTRAN coders today, they will live on the long tail of demand. I suspect a number of the folks who live in this long tail will even consider themselves the “real legacy” of network engineering, while seeing the rest of the network operations and engineering market as more “software engineers” and “administrators.”

That’s all fine by me; I just know I’d rather be in the bubble of demand than the long tail. 🙂

What should I do as a network engineer? This is the tricky question.

First, I cannot tell you which path to take of the ones I have presented. I cannot, in fact, tell you precisely what these roles are going to look like, nor whether there will be other roles available. For instance, I have not discussed what I think vendors look like after this change at all; there will be some similar roles, and some different ones, in that world.

Second, all the roles I’ve described (other than the hardware focused role) involve moving up the stack into a more software and business focus. This means that to move into these roles, you need to gain some business acumen and some software skills. If this is all correct, then now is the time to gain those skills, rather than later. I intend to post more on these topics in the future, so watch this space.

Third, don’t be fatalistic about any of this. I hear a lot of people say things like “I don’t have any influence over the market or my company.” Wrong. Rather than throwing our hands up in frustration and waiting for our fates (or heads) to be handed to us on a silver platter, I want to suggest a way forward. I know that none of us can entirely control the future—my worldview does not allot the kind of radical freedom this would entail to individual humans. At the same time, I am not a fatalist, and I tend to get frustrated with people who argue they have no control, so we should just “sit back, relax, and enjoy the ride.” We have freedom to do different things in the future within the context and guard rails set by our past decisions (and other things outside the scope of a technical blog).

My suggestion is this: take a hard look at what I have written here, decide for yourself where you think I am right and where I am wrong, and make career decisions based on what you think is going to happen. I have seen multiple people end up at age 50 or 60 with a desire to work, and yet with no job. I cannot tell you what percentage of any particular person’s situation is because of ageism, declining skills, or just being in the wrong place at the wrong time (I tend to think all three play a different role in every person’s situation). On the other hand, if you focus on what you can change—your skills, attitude, and position—and stop worrying so much about the things you cannot change, you will be a happier person.

Fourth, this fatalism stretches to the company you work for, and anyplace you might work in the future. There is a strong belief that network engineers cannot influence business leadership. Let me turn this around: If you stop talking about chipsets and optical transceivers, and start talking about the value of data and how the company needs to think about that value, then you might get a seat at the table when these discussions are taking place. You are not helpless here; if you learn how to talk to the business, there is at least some chance (depending on the company, of course) that you can shape the future of the company you work for. If nothing else, you can use your thinking in this area to help you decide where you want to work next.

Now, let’s talk about some risk factors. While these trends seem strong to me, it is still worth asking: what could take things in a different direction? One thing that would certainly change the outlook would be a major economic crash or failure like the Great Depression. This might seem unthinkable to most people, but more than a few of the thinkers I follow in the economic and political realms are suggesting this kind of thing is possible. If this happens, companies will be holding things together with tin cans, baling wire, and duct tape; in this case, all bets are off. Another could be the collapse of the entire disaggregation ecosystem. Perhaps another could be someone discovering how to break the State/Optimization/Surface triad, or somehow beat the CAP theorem.

There is also the possibility that people, at large, will reject the data driven economy that is developing, intentionally moving back to a more personally focused world with local shopping, and offline friends rather than online. I would personally support such a thing, but while I think such a move could happen, I do not see it impacting every area of life. The “buy local” mantra is largely focused on bookstores, food, and some other areas. Notice this, however: if “buy local” is really what it means, then it means buying from locally owned stores, rather than shifting from an online retailer to a large chain mixed online/offline retailer. Buy local is not a panacea for appliance based network engineering, and may even help drive the changes I see ahead.

So there you have it: in this first week of 2019, this is what I think is going to happen in the world of networking technology. I could be way wrong, and I am sticking my neck out a good bit in publishing this little series.

As always, this is more of a two-way conversation than you imagine. I read the comments here and on LinkedIn, and even (sometimes) on Twitter, so tell me what you think the future of network engineering will be. I am not so old, and certain of myself, that I cannot learn new things! 🙂

Whither Network Engineering? (Part 2)

In the first post of this series at the turn of 2019, I considered the forces I think will cause network engineering to radically change. What about the timing of these changes? I hear a lot of people say: “this stuff isn’t coming for twenty years or more, so don’t worry about it… there is plenty of time to adapt.” This optimism seems completely misplaced to me. Markets and ideas are like that old house you pass all the time; you know the one. No-one has maintained it for years, but it is so … solid. It was built out of the best timber, by people who knew what they were doing. The foundation is deep, and it has lasted all these years.

Then one day you pass a heap of wood on the side of the road and realize—this is that old house that seemed so solid just a few days ago. Sometime in the night, that house that was so solid collapsed. The outer shell was covering up a lot of inner rot. Kuhn, in The Structure of Scientific Revolutions, argues this is the way ideas always go. They appear to be solid one day, and then all the supports that looked so solid just moments before the collapse are all shown to be full of termites. The entire system of theories collapses in what seems like a moment compared to the amount of time the theory has stood. History has borne this way of looking at things out.

The point is: we could wake up in five years’ time and find the entire structure of the network engineering market has changed while we were asleep at the console running traceroute. I hear a lot of people talk about how it will take tens of years for any real change to take place because some class of businesses (usually the enterprise) does not take up new things very quickly. This line of thinking assumes the structure of business will remain the same; I think this tends to underestimate the symbiotic relationship between business and information technology. We have become so accustomed to seeing IT as a cost center that has little bearing on the overall business that it is hard to shift our thinking to the new realities that are starting to take hold.

While some niche retailers are doing okay, most of the broad-based ones are in real trouble. Shopping malls are like ghost towns, bookstores are closing in droves; even grocery stores are struggling in many areas. This is not about second-day delivery; this is about data. Companies must either be in a deep niche or learn to work with data to survive. Companies that can most effectively combine and use data to anticipate, adapt to, and influence consumer behavior will survive. The rest will not.

Let me give some examples that might help. Consider Oak Island Hardware (a local hardware store), Home Depot, Sears, and Amazon. First, there are two kinds of businesses here; while all four have products that overlap, they service two different kinds of needs. In the one case, Home Depot and Oak Island Hardware cater to geographically localized wants where physical presence counts. When your plumbing starts to leak, you don’t have time to wait for next-day delivery. If you are in the middle of rebuilding a wall or a cabinet and you need another box of nails, you are not waiting for a delivery. You will get in your car and drive to the nearest place that sells such things. To some degree, Oak Island Hardware and Home Depot are in a different kind of market than Sears and Amazon.

Consider Sears and Amazon as a pair first. Amazon internalized its data handling, and builds semi-custom solutions to support that data handling. Sears tried to focus on local stores, inventory management, and other traditional methods of retail. Sears is gone, Amazon remains. So Home Depot and Oak Island Hardware have a “niche” that protects them (to some degree) from the ravages of the data focused world. Now consider Oak Island Hardware versus Home Depot. Here the niche is primarily geographical—there just is not enough room on Oak Island to build a Home Depot. When people need a box of nails “now,” they will often choose the closer store to get those nails.

On the other hand, what kind of IT needs does a stand-alone store like Oak Island Hardware have? I do not think they will be directly hiring any network engineers in the near future. Instead, they will be purchasing IT services in the form of cloud-based applications. These cloud-based applications, in turn, will be hosted on … disaggregated stacks run by providers.

The companies in the broader markets that are doing well have built fully- or semi-customized systems to handle data efficiently. The network is no longer treated as a “thing” to be built; it is just another part of a larger data delivery system. Ultimately, businesses in broader markets that want to survive need to shift their thinking to data. The most efficient way to do this is to shift to a disaggregated, layered model similar to the one the web- and hyper-scalers have moved to.

I can hear you out there now, reading this, saying: “No! They can’t do this! The average IT shop doesn’t have the skilled people, the vision, the leadership, the… The web- and hyper-scalers have specialized systems built for a single purpose! This stuff doesn’t apply to enterprise networks!”

In answer to this plethora of objections, let me tell you a story.

Once, a long time ago, I was sent off to work on installing a project called PC3; a new US Air Force personnel management system. My job was primarily on the network side of the house, running physical circuits through the on-base systems, installing inverse multiplexers, and making certain the circuits were up and running. At the same time, I had been working on the Xerox STAR system on base, as well as helping design the new network core running a combination of Vines and Netware over optical links connecting Cabletron devices. We already had a bunch of other networks on base, including some ARCnet, token bus, thicknet, thinnet, and a few other things, so packet switching was definitely already a “thing.”

In the process of installing this PC3 system, I must have said something about how this was such old technology, and packet switching was eventually going to take over the world. In return, I got an earful or two from one of the older techs working on the job with me. “Russ,” he said, “you just don’t understand! Packet switching is going to be great for some specialized environments, but circuit switching has already solved the general purpose cases.”

Now, before you laugh at the old codger, he made a bunch of good points. At that time, we were struggling to get a packet switched network up between seven buildings, and then trying to figure out how to feed the packet switched network into more buildings. The circuit switched network, on the other hand, already had more bandwidth into every building on base than we could figure out how to bring to those seven buildings. Yes, we could push a lot more bandwidth across a couple of rooms, but even scaling bandwidth out to an entire large building was a challenge.

What changed? The ecosystem. A lot of smart people bought into the vision of packet switched networking and spent a lot of time figuring out how to make it do all the things no-one thought it could do, and apply it to problems no-one thought it could apply to. They learned how to take the best pieces of circuit-switched technology and apply them in the packet switched world (remember the history of MPLS).

So before you say “disaggregation does not apply to the enterprise,” remember the lesson of packet switched networks—and the lessons of a million other similar technologies. Disaggregation might not apply in the same way to web- and hyper-scale networks and enterprise networks, but this does not mean it does not apply at all. Do not throw the baby out with the bathwater.

As the disaggregation ecosystem grows, and it will grow, the options will become both broader and deeper. Rather than seeing the world as standards versus open-source, we will need to learn to see standards plus open source. Instead of seeing the ecosystem as commercial versus open source, we will need to learn to see commercial plus open source. Instead of seeing protocols on appliances supporting applications, we will need to learn to see hardware and software. As the ecosystem grows, we will learn to learn from many places, including appliance-based networking, the world of servers, application development, and … the business. We will need to directly apply what makes sense and learn wisdom from the rest.

What does this mean for network engineering skills? That is the topic of the third post in this series.

Whither Network Engineering? (Part 1)

An article on successful writers who end up driving delivery trucks. My current reading in epistemology for an upcoming PhD seminar. An article on the bifurcation of network engineering skills. Several conversations on various slacks I participate in. What do these things have in common? Just this:

What is to become of network engineering?

While it seems obvious network engineering is changing, it is not so easy to say how it is changing, and how network engineers can adapt to those changes. To better understand these things, it is good to back up and take in a larger view. A good place to start is to think about how networks are built today.

Networks today are built using an appliance and circuit model. To build a network, an “engineer” (we can argue over the meaning of that word) tries to gauge how much traffic needs to be moved between different points in the business’ geographical space, and then tries to understand the shape of that traffic. Is it layer 2, or layer 3? Which application needs priority over some other application?

Once this set of requirements is drawn up, a long discussion follows over the right appliances and circuits to purchase to fulfill them. There may be some thought put into the future of the business, and perhaps some slight interaction with the application developers, but, in general, the network is seen pretty much as plumbing. So long as the water glass is filled quickly, and the toilets flush, no-one really cares how it works.

There are many results of building networks this way. First, the appliances tend to be complex devices with many different capabilities. Since a single appliance must serve many different roles for many different customers running many different applications, each appliance must be like a multitool, or those neat kitchen devices you see on television (it slices, it dices, it can even open cans!). While this is neat, it tends to cause technologies to be misapplied, and means each appliance is running tens of millions of lines of code—code very few people understand.

This situation has led, on the one hand, to a desire to simplify. The first way operators are simplifying is to move all their applications to the cloud. Many people see this as replacing just the data center, but this misunderstands the draw of cloud, and why businesses are moving to it. I have heard people say, “oh, there will still be the wide area, and there will still be the campus, even if my company goes entirely to the cloud.” In my opinion, this answer does not effectively grapple with the concept of cloud computing.

If a business desires to divest itself of its network, it will not stop with the data center. 5G, SD-WAN, and edge computing are going to fundamentally change the way campus and WAN are done. If you could place your application in a public cloud service and have the data and application distributed to every remote site without needing a data center, on site equipment, and circuits into each of those remote sites, would you do it? To ask is to know the answer.

If most companies move all their data to a cloud service, then the only network engineers who survive will be at those providers, transit providers, and other supporting roles. The catch here is that cloud providers do not treat the network as a separate “thing,” and hence they do not really have “network engineers” in the traditional sense. So in this scenario, the network engineer still changes radically, and there are very few of them around, mostly working for providers of various kinds.

On the other hand, the drive to simplify has led to strongly vertically integrated vendor-based solutions consisting of hardware and software. The easy button, the modern mainframe, or whatever you want to call it. In this case, the network engineer works at the vendor rather than the enterprise. They tend to have very specialized knowledge, and there are few of them.

There is a third option, of course: disaggregation.

In this third option, the company will invest in the network and applications as a single, combined strategic asset. Like a cloud provider or web scaler, these companies will not see the network as a “thing” to be invested in separately. Here there will be engineers of one kind or another, and a blend of things purchased from vendors and things built in-house. They will see the applications through the hardware as a complete system, rather than as an investment in appliances and circuits. Perhaps the following diagram will help.

The left side of this diagram is how we build networks today: appliances connected through the control plane, with network management and applications riding on top. The disaggregated view of the network treats the control plane somewhat like an application, and the operating system like any other operating system. The hardware is fit to task; this does not mean it is a “commodity,” but rather that the hardware life cycle and tuning are untied from the optimization of the software operating environment. In the disaggregated view, the software stack is fit to the company and its business, rather than to the hardware. This is the crucial difference between the two models.

There are two ways to view the competition between the company that moves to the cloud, the company that moves to black box integrated solutions, and the company that disaggregates. My view is that the companies that move to the cloud, or choose the black box, will only survive if they live in a fairly narrow niche where the data they collect, produce, and rely on is narrow in scope, or rather, not generally usable.

Those companies that try to live in the broader market, and give their data to a cloud provider, or give their IT systems entirely to a vendor, will be eaten. Why do I think this? Because data is the new oil. Data grants and underlies every kind of power that relates to making any sort of money any longer: political power, social power, supply-chain efficiency, and anything else you can name. There are no chemical companies, there are only data companies. This is the new normal, and companies that do not understand this new normal will either need to be in a niche small enough that their data is unique in a way that protects them, or they will be eaten. George Gilder's Knowledge and Power is one of the better explanations of this process you can pick up.

If data is at the heart of your business and you either give it to someone else, or you fail to optimize your use of it, you will be at a business disadvantage. That business disadvantage will grow over time until it becomes an economic millstone around the company itself. Can you say Sears? What about Toys-R-Us?

Technologies like 5G, edge computing, and cloud, mixed in with the pressure to reduce the complexity of running a network and subsume it into the larger life of IT, are forming a wrecking ball directed at network engineering as we know it. Which leaves us with the question: whither network engineering?

Ossification and Fragmentation: The Once and Future ‘net

Mostafa Ammar, out of Georgia Tech (not my alma mater, but many of my engineering family are alumni there), recently posted an interesting paper titled The Service-Infrastructure Cycle, Ossification, and the Fragmentation of the Internet. I have argued elsewhere that we are seeing the fragmentation of the global Internet into multiple smaller pieces, primarily based on the centralization of content hosting combined with the rational economic decisions of the large-scale hosting services. The paper in hand takes a slightly different path to reach the same conclusion.

cross posted at CircleID


  • Networks are built based on a cycle of infrastructure modifications to support services
  • When new services are added, pressure builds to redesign the network to support these new services
  • Networks can ossify over time so they cannot be easily modified to support new services
  • This causes pressure, and eventually a more radical change, such as the fracturing of the network

The author begins by noting networks are designed to provide a set of services. Each design paradigm not only supports the services it was designed for, but also allows for some headroom, which allows users to deploy new, unanticipated services. Over time, as newer services are deployed, the requirements on the network change enough that the network must be redesigned.

This cycle, the service-infrastructure cycle, relies on a well-known process of deploying something that is “good enough,” which allows early feedback on what does and does not work, followed by quick refinement until the protocols and general design can support the services placed on the network. As an example, the author cites the deployment of unicast routing protocols. He marks the beginning of this process as 1962, when Prosser was first deployed, and the end as 1995, when BGPv4 was deployed. Across this time routing protocols were invented, deployed, and revised rapidly. Since around 1995, however (a period of over 20 years at this point), routing has not changed all that much. So there were around 35 years of rapid development, followed by what is now over 20 years of stability in the routing realm.

Ossification, for those not familiar with the term, is a form of hardening. Petrified wood is an ossified form of wood. An interesting property of petrified wood is that it is fragile; if you pound a piece of “natural” wood with a hammer, it dents, but does not shatter. Petrified, or ossified, wood shatters, like glass.

Multicast routing is held up as an opposite example. Based on experience with unicast routing, the designers of multicast attempted to “anticipate” the use cases, such that early iterations were clumsy, and failed to attain the kinds of deployment required to get the cycle of infrastructure and services started. Hence multicast routing has largely failed. In other words, multicast ossified too soon; the cycle of experience and experiment was cut short by the designers trying to anticipate use cases, rather than allowing them to grow over time.

Some further examples might be:

  • IETF drafts and RFCs were once short, and used few technical terms, in the sense of a term defined explicitly within the context of the RFC or system. Today RFCs are veritable books, and require a small dictionary to read.
  • BGP security, which is mentioned by the author as a victim of ossification, is actually another example of early ossification destroying the experiment/enhancement cycle. Early on, a group of researchers devised the “perfect” BGP security system (which is actually by no means perfect—it causes as many security problems as it resolves), and refused to budge once “perfection” had been reached. For the last twenty years, BGP security has not notably improved; the cycle of trying and changing things has been stopped this entire time.

There are also weaknesses in this argument. It can be argued that widespread multicast failed because the content just wasn’t there when multicast was first considered, and in fact that multicast content still is not what people really want. The first “killer app” for multicast was replacing broadcast television over the Internet. What has developed instead is video on demand; multicast is just not compelling when everyone is watching something different whenever they want to.
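The point about video on demand can be made concrete with a rough count of how many streams a source must originate. The sketch below is only an illustration under simplifying assumptions: the function name `source_streams`, the request tuples, and the viewer and title counts are all made up for the example, and it assumes viewers can share a multicast stream only when they want the same content starting at the same moment.

```python
def source_streams(requests, multicast):
    """Count the streams a source must originate (toy model).

    requests: list of (content_id, start_slot) tuples, one per viewer.
    Unicast: every viewer gets a separate stream.
    Multicast: viewers share a stream only when both the content and
    the start time match (an assumption of this sketch).
    """
    if not multicast:
        return len(requests)
    return len(set(requests))


# Broadcast-style viewing: 1000 viewers, one channel, same start time.
broadcast = [("channel-1", 0)] * 1000

# On-demand viewing: 1000 viewers, 200 titles, each with its own start time.
on_demand = [(f"title-{i % 200}", i) for i in range(1000)]

for name, reqs in (("broadcast", broadcast), ("on demand", on_demand)):
    print(name,
          "unicast:", source_streams(reqs, multicast=False),
          "multicast:", source_streams(reqs, multicast=True))
```

In this toy model multicast collapses the broadcast case to a single stream, but in the on-demand case it originates exactly as many streams as unicast does, which is the sense in which multicast stops being compelling.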

The solution to this problem is novel: break the Internet up. Or rather, allow it to break up. The creation of a single network from many networks was a major milestone in the world of networking, allowing the open creation of new applications. If the Internet were not ossified through business relationships and the impossibility of making major changes in the protocols and infrastructure, it would be possible to undertake radical changes to support new challenges.

The new challenges offered include IoT, the need for content providers to have greater control over the quality of data transmission, and the unique service demands of new applications, particularly gaming. The result has been the flattening of the Internet, followed by the emergence of bypass networks—ultimately leading to the fragmentation of the Internet into many different networks.

Is the author correct? It seems the Internet is, in fact, becoming a group of networks loosely connected through IXPs and some transit providers. What will the impact be on network engineers? One likely result is deeper specialization in sets of technologies—the “enterprise/provider” divide that had almost disappeared in the last ten years may well show up as a divide between different kinds of providers. For operators who run a network that indirectly supports some other business goal (what we might call “enterprise”), the result will be a wide array of different ways of thinking about networks, and an expansion of technologies.

But one lesson engineers can certainly take away is this: the concept of agile must reach beyond the coding realm, and into the networking realm. There must be room “built in” to experiment, deploy, and enhance technologies over time. This means accepting and managing risk rather than avoiding it, and having a deeper understanding of how networks work and why they work that way, rather than the blind focus on configuration and deployment we currently teach.