I remember a time long ago—but then again, everything seems like it was “long ago” to me—when I was flying out to see an operator in a financial district. Someone working with the account asked me what I normally wear… which is some sort of button down and black or grey pants in pretty much any situation. Well, I will put on a sport jacket if I’m teaching in some contexts, but still, the black/grey pants and some sort of button down are pretty much a “uniform” for me. The person working on the account asked me if I could please switch to ragged shorts, a t-shirt, and grow a pony tail because … the folks at the operator would never believe I was an engineer if I dressed to “formal.”
Now I’ve never thought of what I wear as “formal…” it’s just … what I wear. Context, however, is king.
In other situation, I saw a sales engineer go to a store and buy an entirely new outfit because he came to the company’s building wearing a suit and tie … The company in question deals in outdoor gear, and the location was in a small midwestern town, so a suit, well, let’s just say it didn’t “fit.” Again, context is king.
We have—particularly in tech—become accustomed to casual dress codes. Coders, network engineers, and many others just wear whatever whenever. We are expected, in fact, to dress beyond casual—even to be ratty, as in the stories above. Is this really a good thing, though?
As a famous fashion historian has said: “Americans began the 20th century in bustles and bowler hats and ended it in velour sweatsuits and flannel shirts.” I’ve never worn bustles, bowler hats, or velour sweatsuits (do they even make such a thing???), but the flannel shirts I totally relate to. Although… I do only buy flannels that don’t have large-scale checks of any kind. It’s hard to find plain or subtle patterns, but I actively search them out.
The point of this aside about clothes is this—one of the problems with casual clothing is it doesn’t create the kinds of limits, or lines, in your life that more formal clothing creates. If you wear the same clothes all the time, then you have the same mindset all the time. You’re always working, or you’re always playing—whatever it is.
According to Tara Lookabaugh, this is a major cause of stress in our lives. In the “old days,” salesmen wore a three-piece suit at work, then came home to a smoking jacket or perhaps jeans, or something similar. This was not only a matter of preventing dangerous chemicals from entering the home (think of a factory worker), it also helped create a mental barrier between work and home life.
A barrier that no longer exists.
Does removing this barrier increase stress? I think it does. Can it be replaced with some other kind of barrier? This is one reason I have an office in my house—or at least a clearly set-apart apace dedicated to working on the computer. It’s not that I won’t pick my laptop up and go outside on a beautiful day, or even down to the library at the Seminary just for a change of scenery. When I’m home, however, when I cross the “work-space threshold,” I somehow think about work things.
So I’ve attempted to replace the barrier between work and the rest of my world created by clothing with a physical “work-space barrier.” Does this work all the time? Not really, but it’s at least a start in the right direction. Of course, I also only have one computer, which is probably another place where I could create a barrier between work and “other things.”
But creating such a barrier is important for mental health, regardless of how you do it.
What do you think? Do you think “casual dress” has created more stress than it has resolved? Is Tara’s argument right? What other methods are there out there to create this sort of barrier?
If you are looking for a good resolution for 2020 still (I know, it’s a bit late), you can’t go wrong with this one: this year, I will focus on making the networks and products I work on truly simpler. Now, before you pull Tom’s take out on me—
There are those that would say that what we’re doing is just hiding the complexity behind another layer of abstraction, which is a favorite saying of Russ White. I’d argue that we’re not hiding the complexity as much as we’re putting it back where it belongs – out of sight. We don’t need the added complexity for most operations.
Three things: First, complex solutions are always required for hard problems. If you’ve ever listened to me talk about complexity, you’ve probably seen this quote on a slide someplace—
[C]omplexity is most succinctly discussed in terms of functionality and its robustness. Specifically, we argue that complexity in highly organized systems arises primarily from design strategies intended to create robustness to uncertainty in their environments and component parts.
You cannot solve hard problems—complex problems—without complex solutions. In fact, a lot of the complexity we run into in our everyday lives is a result of saying “this is too complex, I’m going to build something simpler.” (here I’m thinking of a blog post I read last year that said “when we were building containers, we looked at routing and realized how complex it was… so we invented something simpler… which, of course, turned out to be more complex than dynamic routing!)
Second, abstraction can be used the right way to manage complexity, and it can be used the wrong way to obfuscate or mask complexity. The second great source of complexity and system failure in our world is we don’t abstract complexity so much as we obfuscate it.
Third, abstraction is not a zero-sum game. If you haven’t found the tradeoffs, you haven’t looked hard enough. This is something expressed through the state/optimization/surface triangle, which you should know at this point.
Returning to the top of this post, the point is this: Using abstraction to manage complexity is fine. Obfuscation of complexity is not. Papering over complexity “just because I can” never solves the problem, any more than sweeping dirt under the rug, or papering over the old paint without bothering to fix the wall first.
We need to go beyond just figuring out how to make the user interface simpler, more “intent-driven,” automated, or whatever it is. We need to think of the network as a system, rather than as a collection of bits and bobs that we’ve thrown together across the years. We need to think about the modules horizontally and vertically, think about how they interact, understand how each piece works, understand how each abstraction leaks, and be able to ask hard questions.
For each module, we need to understand how things work well enough to ask is this the right place to divide these two modules? We should be willing to rethink our abstraction lines, the placement of modules, and how things fit together. Sometimes moving an abstraction point around can greatly simplify a design while increasing optimal behavior. Other times it’s worth it to reduce optimization to build a simpler mouse trap. But you cannot know the answer to this question until you ask it. If you’re sweeping complexity under the rug because… well, that’s where it belongs… then you are doing yourself and the organization you work for a disfavor, plain and simple. Whatever you sweep under the rug of obfuscation will grow and multiply. You don’t want to be around when it crawls back out from under that rug.
For each module, we need to learn how to ask is this the right level and kind of abstraction? We need to learn to ask does the set of functions this module is doing really “hang together,” or is this just a bunch of cruft no-one could figure out what to do with, so they shoved it all in a black box and called it done?
Above all, we need to learn to look at the network as a system. I’ve been harping on this for so long, and yet I still don’t think people understand what I am saying a lot of times. So I guess I’ll just have to keep saying it. 😊
The problems networks are designed to solve are hard—therefore, networks are going to be complex. You cannot eliminate complexity, but you can learn to minimize and control it. Abstraction within a well-thought-out system is a valid and useful way to control complexity and understanding how and where to create modules at the edges of which abstraction can take place is a valid and useful way of controlling complexity.
Don’t obfuscate. Think systemically, think about the tradeoffs, and abstract wisely.
Once the shipping department drops the box off with that new switch, router, or “firewall,” what happens next? You rack it, cable it up, turn it on, and start configuring, right? There are access to controls to configure—SSH, keys, disabling standard accounts, disabling telnet—interface addresses to configure, routing adjacencies to configure, local policies to configure, and… After configuring all of this, you can adjust routing in the network to route around the new device, and then either canary the device “in production” (if you run your network the way it should be run), or find some prearranged maintenance time to bring the new device online and test things out. After all of this, you can leave the new device up and running in the network, and move on to the next task.
Until it breaks.
Then you consult the documentation to remind yourself why it was configured this way, consult the documentation to understand how the application everyone is complaining about not working should work, etc. There are the many hours spent sitting on the console gathering information by running various commands and the output of various logs. Eventually, once you find the problem, you can either replace the right parts, or reconfigure the right bits, and get everything running again.
In the “modern” world (such as it is), we think it’s a huge leap forward to stop configuring devices manually. If we can just automate the configuration of all that “stuff” we have to do at the beginning, after the box is opened and before the device is placed into service, we think we have this whole networking thing pretty well figured out.
Even if you had everything in your network automated, you still haven’t figured this networking thing out.
We need to move beyond automation. Where do we need to move to? It’s not one place, but two. The first is we need to move beyond automation to autonomous operation. As an example, there is a shiny new system that is currently being widely deployed to automate the deployment and management of containers. Part of this system is the automation of connectivity, including routing, between containers. The routing system being deployed as part of this system is essentially statically configured policy-based routing combined with network address translation.
Let me point something out that is not going to be very popular: this is a step backwards in terms of making the system autonomous. Automating static routing information is not a better solution than building a real, dynamic, proactive, autonomic, routing system. It’s not simpler—trust me, I say this as someone who has operated large networks which used automated static routes to do everything.
The “opsification of everything” is neat, but it shouldn’t be our end goal.
Now part of this, I know, is the fault of vendors. Vendors who push EGPs onto data center fabrics because, after all, “the configuration complexity doesn’t matter so long as you can automate it.” The configuration complexity does matter, because configuration complexity belies an underlying protocol complexity, and sets up long and difficult troubleshooting sessions that are completely unnecessary.
The second place we need to move in the networking world? The focus on automation is just another form of focusing on configuration. We abstract the configuration, and we touch a lot more devices at once, but we are still thinking about configuration. The more we think about configuration, the less we think about how the system should work, how it really works, what the gaps are, and how to bridge those gaps. So long as we are focused on the configuration, automated or not, we are not focused on how the network can bring value to the business. The longer we are focused on configuration, the less value we are bringing to the business, and the more likely we are to end up being replaced by … an automated system … no matter how poorly that automated system actually works.
And no, the cloud isn’t going to solve this. Containers aren’t going to solve this. The “automated configuration pattern” is already being repeated in the cloud. As more complex workloads are moved into the cloud, the problems there are only going to get harder. What starts out as a “simple” system using policy-based routing analogs and network address translation configured through an automation server will eventually look complex against the hardest problems we had to solve using T1’s, frame relay circuits, inverse multiplexers, wire down patch panels, and mechanical switch crossbar frames. It’s fun to pretend we don’t need dynamic routing to solve the problems that face the network—at least until you hit hard problems, and have to relearn the lessons of the last 20+ years.
Yes, I know vendors are partly to blame for this. I know that, for a vendor, it’s easier to get people to buy into your CLI, or your entire ecosystem, rather than getting them to think about how to solve the problems your business is handing them.
On the other hand, none of this is going to change from the top down. This is only going to change when the average network engineer starts asking vendors for truly simpler solutions that don’t require reams configuration information. It will change when network engineers get their heads out of the configuration and features, and into the business problems.
We all use the OSI model to describe the way networks work. I have, in fact, included it in just about every presentation, and every book I have written, someplace in the fundamentals of networking. But if you have every looked at the OSI model and had to scratch your head trying to figure out how it really fits with the networks we operate today, or what the OSI model is telling you in terms of troubleshooting, design, or operation—you are not alone. Lots of people have scratched their heads about the OSI model, trying to understand how it fits with modern networking. There is a reason this is so difficult to figure out.
The OSI Model does not accurately describe networks.
What set me off in this particular direction this week is an article over at Errata Security:
The OSI Model was created by international standards organization for an alternative internet that was too complicated to ever work, and which never worked, and which never came to pass. Sure, when they created the OSI Model, the Internet layered model already existed, so they made sure to include today’s Internet as part of their model. But the focus and intent of the OSI’s efforts was on dumb networking concepts that worked differently from the Internet.
This is partly true, and yet a bit … over the top. 🙂 OTOH, the point is well taken: the OSI model is not an ideal model for understanding networks. Maybe a bit of analysis would be helpful in understanding why.
First, while the OSI model was developed with packet switching networks in mind, the general idea was to come as close as possible to emulating the circuit-switched networks widely deployed at the time. A lot of thought had gone into making those circuit-switched networks work, and applications had been built around the way they worked. Applications and circuit-switched networks formed a sort of symbiotic relationship, just as applications form with packet-switched networks today; it was unimaginable, at the time, that “everything would change.”
So while the designers of the OSI model understood the basic value of the packet-switched network, they also understood the value of the circuit-switched network, and tried to find a way to solve both sets of problems in the same network. Experience has shown it is possible to build a somewhat close-to-circuit switched network on top of packet switched networks, but not quite in the way, nor as close to perfect emulation, as those original designers thought. So the OSI model is a bit complex and perhaps overspecified, making it less-than-useful today.
Second, the OSI model largely ignored the role of middleboxes, focusing instead on the stacks implemented and deployed in hosts. This, again, makes sense, as there was no such thing as a device specialized in the switching of packets at the time. Hosts took packets in and processed them. Some packets were sent along to other hosts, other packets were consumed locally. Think PDP-11 with some rough code, rather than even an early Cisco CGS.
Third, the OSI model focuses on what each layer does from the perspective of an application, rather than focusing on what is being done to the data in order to transmit it. The OSI model is built “top down,” rather than “bottom up,” in other words. While this might be really useful if you are an application developer, it is not so useful if you are a network engineer.
So—what should we say about the OSI model?
It was much more useful at some point in the past, when networking was really just “something a host did,” rather than its own sort of sub-field, with specialized protocols, techniques, and designs. It was a very good attempt at sorting out what a network needed to do to move traffic, from the perspective of an application.
What it is not, however, is really all that useful for network engineers working within an engineering specialty to understand how to design protocols, and how to design networks on which those protocols will run. What should we replace it with? I would begin by pointing you to the RINA model, which I think is a better place to start. I’ve written a bit about the RINA model, and used the RINA model as one of the foundational pieces of Computer Networking Problems and Solutions.
Since writing that, however, I have been thinking further about this problem. Over the next six months or so, I plan to build a course around this question. For the moment, I don’t want to spoil the fun, or put any half-backed thoughts out there in the wild.
Until about 2017, the cloud was going to replace all on-premises data centers. As it turns out, however, the cloud has not replaced all on-premises data centers. Why not? Based on the paper under review, one potential answer is because containers in the cloud are still too much like “serverfull” computing. Developers must still create and manage what appear to be virtual machines, including:
- Machine level redundancy, including georedundancy
- Load balancing and request routing
- Scaling up and down based on load
- Monitoring and logging
- System upgrades and security
- Migration to new instances
Serverless solves these problems by placing applications directly onto the cloud, or rather a set of libraries within the cloud.
The authors define serverless by contrasting it with serverfull computing. While software is run based on an event in serverless, software runs until stopped in a cloud environment. While an application does not have a maximum run time in a serverfull environment, there is some maximum set by the provider in a serverless environment. The server instance, operating system, and libraries are all chosen by the user in a serverfull environment, but they are chosen by the provider in a serverless environment. The serverless environment is a higher-level abstraction of compute and storage resources than a cloud instance (or an on-premises solution, even a private cloud).
These differences add up to faster application development in a serverless environment; application developers are completely freed from any system administration tasks to focus entirely on developing and deploying useful software. This should, in theory, free application developers to focus on solving business problems, rather than worrying about any of the infrastructure. Two key points the authors point out in the serverless realm are the complex software techniques used to bring serverless processes up quickly (such as preloading and holding the VM instances that back services), and the security isolation provided through VM level separation.
The authors provide a section on challenges in serverless environments and the workarounds to these challenges. For instance, one problem with real-time video compression is the object store used to communicate between processes running on a serverless infrastructure is too slow to support fine-grained communication, while the functions are too course-grained to support some of the required tasks. To solve this problem, they propose using function-to-function communication, which moves the object store out of the process. This provides dramatic processing speedups, as well as reducing the costs of serverless to a fraction of a cloud instance.
One of the challenges discussed here is the problem of communication patterns, including broadcast, aggregation, and shuffle. Each of these, of course rely on the underlying network to transport data between the compute nodes on which serverless functions are running. Since the serverless user cannot determine where a particular function will run, the performance of the underlying transport is—of course–quite variable. The authors say: “Since the application cannot control the location of the cloud functions, a serverless computing application may need to send two and four orders of magnitude more data than an equivalent VM-based solution.”
And this is where the network sized hole in serverless comes into play. It is common fare today to say the network is “just a commodity.” Speeds are feeds are so high, and so easy to build, that we do not need to worry about building software that knows how to use a network efficiently, or even understands the network at all. That matching network to software requirements is a thing of the past—bandwidth is all a commodity now.
The law of leaky abstractions, however, will always have its say—a corollary here is higher level abstractions will always have larger and more consequential leaks. The solutions offered to each of the challenges listed in the paper are all, in fact, resolved by introducing layering violations which allow the developer to “work around” an inefficiency at some lower layer in the abstraction. Ultimately, such work arounds will compound into massive technical debt, and some “next new thing” will come along to “solve the problems.”
Moving data ultimately still takes time, still takes energy; the network still (often) needs to be tuned to the data being moved. Serverless is a great technology for some solutions—but there is ultimately no way to abstract out the hard work of building an entire system tuned to do a particular task and do it well. When you face abstraction, you should always ask: what is gained, and what is lost?
A long, long time ago, in a galaxy far away, I went to school to learn art and illustration. In those long ago days, folks in my art and illustration classes would sometimes get into a discussion about what, precisely, to do with an art degree. My answer was, ultimately, to turn it into a career building slides and illustrations in the field of network engineering. 😊 And I’m only half joking.
The discussion around the illustration board in those days was whether it was better to become an art teacher, or to focus just on the art and illustration itself. The two sides went at it hammer and tongs over weeks at a time. My only contribution to the discussion was this: even if you want to be the ultimate in the art world, a fine artist, you must still have a subject. While much of modern art might seem to be about nothing much at all, it has always seemed, to me, that art must be about something.
This week I was poking around one of the various places I tend to poke on the ‘net and ran across this collage. Click to see the full image.
Get the point? If you are a coal miner out of work, just learn to code. This struck me as the same sort of argument we used to have in our art and illustration seminars. But what really concerns me is the number of people who are considering leaving network engineering behind because they really believe there is no future in doing this work. They are replacing “learn network engineering” with “learn to code.”
Not to be too snarky, but sure—you do that. Let me know how it goes when you are out there coding… what, precisely? The entire point of coding is to code something useful, not to just run off and build trivial projects you can find in Git and code training classes.
It seems, to me, that if you are an artist, having some in-depth knowledge of something in the real world would make your art have more impact. If you know, for instance, farming, then you can go beyond just getting the light right. If you know farming, then you can paint (or photograph) a farm with much more understanding of what is going on in the images, and hence to capture more than just the light or the mood. You can capture the movement, the flow of work, even the meaning.
In the same way, it seems, to me, that if you are coder, having some in-depth knowledge of something you might build code for would mean your code will have more impact. It might be good to have a useful subject, like maybe… building networks? Contrary to what seems to be popular belief, building a large-scale network is still not a simple thing to do. So long as there are any kind of resource constraints in the design, deployment, and operation of networks, I do not see how it can ever really be an “easy thing to do.”
So yes—learn to code. I will continue to encourage people to learn to code. But I will also continue to encourage folks to learn how to design, build, and operate big networks. All of us cannot sit around and code full time, after all. There are still engineering problems to be solved in other areas, challenges to be tackled, and things to be built.
Coding is a good skill, but understanding how networks really work, how to design them, how to build them, and how to operate them, are also all good skills. Learning to code can multiply your skills as a network engineer. But if you do not have network engineering skills to start with, multiplying them by learning to code is not going to be the most useful exercise.
As long-standing contributor to open standards, and someone trying to become more involved in the open source world (I really need to find an extra ten hours a day!), I am always thinking about these ecosystems, and how the relate to the network engineering world. This article on RedisDB, and in particular this quote, caught my attention—
There’s a longstanding myth in the open-source world that projects are driven by a community of contributors, but in reality, paid developers contribute the bulk of the code in most modern open-source projects, as Puppet founder Luke Kanies explained in our story earlier this year. That money has to come from somewhere.
The point of the article is a lot of companies that support open source projects, like RedisDB, are moving to a more closed source solutions to survive. The cloud providers are called out as a source of a lot of problems in this article, as they consume a lot of open source software, but do not really spend a lot of time or effort in supporting it. Open source, in this situation, becomes a sort of tragedy of the commons, where everyone things someone else is going to do the hard work of making a piece of software viable, so no-one does any of the work. Things are made worse because the open source version of the software is often “good enough” to solve 80% of the problems users need solved, so there is little incentive to purchase anything from the companies that do the bulk of the work in the community.
In some ways this problem relates directly to the concept of disaggregated networking. Of course, as I have said many times before, disaggregation is not directly tied to open source, nor even open standards. Disaggregation is simply seeing the hardware and software as two different things. Open source, in the disaggregated world, provides a set of tools the operator can use as a base for customization in those areas where customization makes sense. Hence open source and commercial solutions compliment one another, rather than one replacing the other.
All that said, how can the open source community continue to thrive if some parts of the market take without giving back? Simply put, it cannot. There are ways, however, of organizing open source projects which encourage participation in the community, even among corporate interests. FR Routing is an example of a project I think is well organized to encourage community participation.
There are two key points to the way FR Routing is organized that I think is helpful in controlling the tragedy of the commons. First, there is not just one company in the world commercializing FR Routing. Rather, there are many different companies using FR Routing, either by shipping it in a commercial product, or by using it internally to build a network (and the network is then sold as a service to the customers of the company). Not every user of FR Routing is using only this one routing stack in their products or networks, either. This first point means there is a lot of participation from different companies that have an interest in seeing the project succeed.
Second, the way FR Routing is structured, no single company can gain control of the entire community. This allows healthy debate on features, code structure, and other issues within the community. There are people involved who supply routing expertise, others who supply deployment expertise, and a large group of coders, as well.
One thing I think the open source world does too often is to tie a single project to a single company, and that company’s support. Linux thrives because there are many different commercial and noncommercial organizations supporting the kernel and different packages that ride on top of the kernel. FR Routing is thriving for the same reason.
Yes, companies need to do better at supporting open source in their realm, not only for their own good, but for the good of the community. Yes, open source plays a vital role in the networking community. I would even argue closed source companies need to learn to work better with open source options in their area of expertise to provide their customers with a wider range of options. This will ultimately only accrue to the good of the companies that take this challenge on, and figure out how to make it work.
On the other side of things, open source is probably not going to solve all the problems in the networking, or any other, industry in the future. And the open source community needs to learn how to build structures around these projects that are both more independent, and more sustainable, over the long run.