The word on the street is that everyone—especially network engineers—must learn to code. A conversation with a friend and an article passing through my RSS reader brought this to mind once again—so once more into the breach. Part of the problem here is that we seem to have a knack for asking the wrong question. When we look at network engineer skill sets, we often think about the ability to configure a protocol or set of features, and then the ability to quickly troubleshoot those protocols or features using a set of commands or techniques.
This is, in some sense, what various certifications have taught us—we have reached the expert level when we can configure a network quickly, or when we can prove we understand a product line. There is, by the way, a point of truth in this. If you claim your expertise is with a particular vendor’s gear, then it is true that you must be able to configure and troubleshoot on that vendor’s gear to be an expert. There is also a problem of how to test for networking skills without actually implementing something, and how to implement things without actually configuring them. This is a problem we are discussing in the new “certification” I’ve been working on, as well.
This is also, in some sense, what the hiring processes we use have taught us. Computers like to classify things in clear and definite ways. The only clear and definite way to classify networking skills is by asking questions like “what protocols do you understand how to configure and troubleshoot?” It is, it seems, nearly impossible to test design or communication skills in a way that can be easily placed on a resume.
Coding, I think, is one of those skills that is easy to appear to measure accurately, and it’s also something the entire world insists is the “coming thing.” No coding skills, no job. So it’s easy to ask the easy question—what languages do you know, how many lines of code have you written, etc. But again, this is the wrong question (or these are the wrong questions).
What is the right question? In terms of coding skills, more along the lines of something like, “do you know how to build and use tools to solve business problems?” I phrase it this way because the one thing I have noticed about every really good coder I have known is they all spend as much time building tools as they do building shipping products. They build tools to test their code, or to modify the code they’ve already written en masse, etc. In fact, the excellent coders I know treat functions like tools—if they have to drive a nail twice, they stop and create a hammer rather than repeating the exercise with some other tool.
So why is coding such an important skill to gain and maintain for the network engineer? This paragraph seems to sum it up nicely for me—
“Coding is not the fundamental skill,” writes startup founder and ex-Microsoft program manager Chris Granger. What matters, he argues, is being able to model problems and use computers to solve them. ”We don’t want a generation of people forced to care about Unicode and UI toolkits. We want a generation of writers, biologists, and accountants that can leverage computers.”
It’s not the coding that matters, it’s “being able to model problems and use computers to solve them.” This is the essence of tool building or engineering—seeing the problem, understanding the problem, and then thinking through (sometimes by trial and error) how to build a tool that will solve the problem in a consistent, easy to manage way. I fear that network engineers are taking their attitude of configuring things and automating it to make the configuration and troubleshooting faster. We seem to end up asking “how do I solve the problem of making the configuration of this network faster,” rather than asking “what business problem am I trying to solve?”
To make effective use of the coding skills we’re telling everyone to learn, we need to go back to basics and understand the problems we’re trying to solve—and the set of possible solutions we can use to solve those problems. Seen this way, the routing protocol becomes “just another tool,” just like a function call, that can be used to solve a specific set of problems—instead of a set of configuration lines that we invoke like a magic incantation to make things happen.
Coding skills are important—but they require the right mindset if we’re going to really gain the sorts of efficiencies we think are possible.
From time immemorial, humor has served to capture truth. This is no different in the world of computer networks. A notable example of using humor to capture truth is the April 1 RFC series published by the IETF. RFC1925, The Twelve Networking Truths, will serve as our guide.
According to RFC1925, the first fundamental truth of networking is: it has to work. While this might seem to be overly simplistic, it has proven—over the years—to be much more difficult to implement in real life than it looks like in a slide deck. Those with extensive experience with failures, however, can often make a better guess at what is possible to make work than those without such experience. The good news, however, is the experience of failure can be shared, especially through self-deprecating humor.
Consider RFC748, which is the first April First RFC published by the IETF, the TELNET RANDOMLY-LOSE Option. This RFC describes a set of additional signals in the TELNET protocol (for those too young to remember, TELNET is what people used to communicate with hosts before SSH and web browsers!) that instruct the server not to provide random losses through such things as “system crashes, lost data, incorrectly functioning programs, etc., as part of their services.” The RFC notes that many systems apparently have undocumented features that provide such losses, frustrating users and system administrators. The option proposed would instruct the server to disable features which cause these random losses.
Lesson learned? Although one of the general rules of application design is the network is not reliable, the counter rule suggested by RFC748 is the application is not reliable, either. This a key point in the race to Mean Time to Innocence (MTTI). RFC1882, published a few years after RFC748, is a veritable guidebook for finding problems in a network, including transceiver failures, databases with broken b-trees, unterminated contacts, and a plethora of other places to look. Published just before Christmas, RFC1882 is an ideal guide for those who want to spend time with their families during the most festive times of the year.
Another common problem in large-scale networks is services that want to choose to operate from the safety and security of an anonymous connection. RFC6593 describes the Doman Pseudonym System, specifically designed to support services that do not wish to be discovered. The specification describes two parties to the protocol, the first being the seeker, or “it,” and the second being the service which is attempting to hide from it. The process used is for the seeker to send a transmission declaring the beginning of the search sequence called the “ready or not,” followed by a countdown during which “it” is not allowed to peek at a list of available services. During this countdown, the service may change its name or location, although it will be penalized if discovered doing so. This Domain Pseudonym System is the perfect counterpart to the Domain Name System normally used to discover services on large-scale networks, as shown by the many networks that already deploy such a hide-and-seek method to managing services.
What if all the above guidance for network operators fails, and you are stuck troubleshooting a problem? RFC2321 has an answer to this problem: RITA — The Reliable Internetwork Troubleshooting Agent. The typical RITA is described as 51.25cm in length, and yellow/orange in color. The first test the operator can perform with the RITA is placing it on the documentation for the suspect system, or on top of the suspect system itself. If the RITA eventually flies away, there is a greater than 90% chance there is a defect in the system tested. The odds of the defects in the tested system being the root cause of the problem the operator is currently troubleshooting is not guaranteed, however. The RITA has such a high success rate because it is believed that 100% of systems in operation do, in fact, contain defects. The 10% failure rate primarily occurs in cases where the RITA itself dies during the test, or decides to go to sleep rather than flying to some other location.
Each of these methods can help the network operator fulfill the first rule of networking: it has to work.
According to Maor Rudick, in a recent post over at Cloud Native, programming is 10% writing code and 90% understanding why it doesn’t work. This expresses the art of deploying network protocols, security, or anything that needs thought about where and how. I’m not just talking about the configuration, either—why was this filter deployed here rather than there? Why was this BGP community used rather than that one? Why was this aggregation range used rather than some other? Even in a fully automated world, the saying holds true.
So how can you improve the understandability of your network design? Maor defines understandability as “the dev who creates the software is to effortlessly … comprehend what is happening in it.” Continuing—“the more understandable a system is, the easier it becomes for the developers who created it to change it in a way that is safe and predictable.” What are the elements of understandability?
Documentation must be complete, clear, concise, and organized. The two primary failings I encounter in documentation are completeness and organization. Why something is done, when it was last changed, and why it was changed are often missing. The person making the change just assumes “I’ll remember this, or someone will figure it out.” You won’t, and they won’t. Concise is the “other side” of complete … Recording unsubstantial changes just adds information that won’t ever be needed. You have to balance between enough and too much, of course.
Organization is another entire problem in documentation—most people have a favorite way to organize things. When you get a team of people all organizing things based on their favorite way, you end up with a mess. Going back in time … I remember that just about everyone who was assigned to the METNAV shop began their time by re-organizing the tools. Each time the re-organization made things so much easier to find, and improved the MTTR for the airfield equipment we supported … After a while, you’d think someone would ask, “Does re-organizing all the tools every year really help? Or are you just making stuff up for new folks to do?”
Moving beyond documentation, what else can we do to make our networks more understandable?
First, we can focus on actually making networks simpler. I don’t mean just glossing things over with a pretty GUI, or automating thousands of lines of configuration using Python. I mean taking steps by using protocols that are simpler to run, require less configuration, and produce more information you can use for troubleshooting—choose something like IS-IS for your DC fabric underlay rather than BGP, unless you really have several hundred thousand of underlay destinations (hint, if you’ve properly separated “customer” routes in the overlay from “infrastructure” routes in the underlay, you shouldn’t have this kind of routing tangle in the underlay anyway).
What about having multiple protocols that do the same job? Do you really need three or four routing protocols, four or five tunneling protocols, and five or six … well, you get the idea. Reducing the sheer number of protocols running in your network can make a huge difference in the tooling troubleshooting time. What about having four or five kinds of boxes in your network that fulfill the same role? Okay—so maybe you have three DC fabrics, and you run each one using a different vendor. But is there is any reason to have three DC fabrics, each of which has a broad mix of equipment from five different vendors? I doubt it.
Second, you can think about what you would measure in the case of failure, how you would measure it, and put the basic piece in place in the design phase to do those measurements. Don’t wait until you need the data to figure out how to get at it, and what the performance results of trying to get it are going to be.
Third, you can think about where you put policy in your network. There is no “right” answer to this question, other than … be consistent. The first option is to put all your policy in one place—say, on the devices that connect the core to the aggregation, or the devices in the distribution layer. The second option is to always put the policy as close to the source or destination of the traffic impacted by the policy. In a DC fabric, you should always put policy and external connectivity in the T0 or ToR, never in the spine (it’s not a core, it’s a spine).
Maybe you have other ideas on how to improve understandability in networks … If you do, get in touch and let’s talk about it. I’m always looking for practical ways to make networks more understandable.
In the argument between OSPF and BGP in the data center fabric over at Justin’s blog, I am decidedly in the camp of IS-IS. Rather than lay my reasons out here, however (a topic for another blog post?), I want to focus on something else Justin said that I think is incredibly important for network engineers to understand.
I think whiteboards are the most important tool for network design currently available, which makes me sad. I wish that wasn’t true, I want much better tools. I can’t even tell you the number of disasters averted by 2-3 great network engineers arguing over a whiteboard.
I remember—way back—when I was working on the problems around making a link-state protocol work well in a Mobile Ad Hoc Network (MANET), we had two competing solutions presented to the IETF. The first solution was primarily based on whiteboarding through various options and coming up with one that should reduce flooding to an acceptable level. The second was less optimal on the whiteboard but supported by simulations showing it should reduce flooding more effectively.
Which solution “won?” I don’t know what “winning” might mean here, but the solution designed on the whiteboard has been widely deployed and is now showing up in other places—take a look at distoptflood and the flooding optimizations in RIFT. Both are like what we designed all those years ago to make OSPF work in a MANET.
Does this mean simulations, labbing, and testing are useless? No.
Each of these tools brings a different set of strengths to the table when trying to solve a problem. For instance, there is no way to optimize the flooding in a protocol unless you really know how flooding works, what information you have available, where things might fail, what features are designed into the protocol to prevent or work around failures, etc. These things are what you need to put together and understand on a whiteboard.
You cannot think of every possible situation that needs to be simulated, nor can you simulate every possible order of operation, or every possible timing problem that might occur. If you know the protocol, however, you can cover most of this ground and design in fail-safes. Simulations are necessary, but not sufficient to network and protocol design.
On the other side of the coin, Justin points out the stack they were using was known to have a weak BGP implementation and a strong OSPF implementation. This, and other scale and timing issues, will only show up under a simulation. For instance, you cannot answer the question how large can the LSDB become before the processor keels over and dies on a whiteboard—it’s just not possible. The whiteboard is necessary, but not sufficient to network and protocol design.
The danger is that we attempt to replace one with the other—that we either ignore the value of simulation because it’s hard (or even impossible), or we ignore the power of the whiteboard because we don’t understand the protocols well enough to stand at the whiteboard and argue. Every team needs to have at least one person who can “see” how the network will converge just because they understand the protocol. And every team needs to have at least one person who can throw together a simulation and show how the network will converge in the same situation.
Whiteboards and simulations are both crucial tools. Learn the protocols and design well enough to use the one and find or build the tools needed to use the other. Missing either one is going to leave a blind spot in your abilities as an engineer.
Which one are you missing in your skill set right now? If you know the answer to that question, then you know at least one thing you need to learn “next.”
A short while back, the Linux Foundation (Networking), or LFN, published a white paper about the open source networking ecosystem. Rather than review the paper, or try to find a single theme, I decided to just write down “random thoughts” as I read through it. This is the (rather experimental) result.
The paper lists five goals of the project which can be reduced to three: reducing costs, increasing operator’s control over the network, and increasing security (by increasing code inspection). One interesting bit is the pairing of cost reduction with increasing control. Increasing control over a network generally means treating it less like an opaque box and more like a disaggregated set of components, each of which can be tuned in some way to improve the fit between network services, network performance, and business requirements. The less a network is an opaque box, however, the more time and effort required to manage it. This only makes sense—tuning a network to perform better requires time and talent, both of which cost money.
The offsetting point here is disaggregation and using open source can save money—although in my experience it never does. Again, running disaggregated software and hardware requires time and talent, both of which cost money. Intuitively, then, reducing costs and increasing control don’t “pair” together like this. Building a more tunable, flexible network might be possible at the same cost as building a network in some “more traditional way,” but “costs less” is generally the last thing on my mind when I’m thinking about disaggregation, open source, and more modular, systemic wholistic designs.
The bottom line—if you are going to try to sell disaggregation to your boss, do not play the “cost card.” Focus on the business side of things, instead. Another way of putting this—don’t reduce what you are selling to a commodity, or an item with no business value other than reducing costs. First, this is simply not true. Second, you’ll never meet a salesman who says, “what I’m selling you is a commodity, so I’m really just after getting it to you for cheaper.” Even toilet paper companies sell on quality. Networks are more important than toilet paper to the business, right?
A second interesting point— “the network control layer is where end-to-end complex network services are designed and executed.” I broadly agree with this statement, but … I have been pushing, for some time, the idea that there is not just one control plane. Part of the problem with network design is we tend to modularize topologically, using one form or another of hierarchy, quite nicely. What we do no do, however, is think about how to modularize vertically to create a true wholistic view of the system “as a whole.” We do create multiple layers of data planes (they’re called tunnels!), but we do not think about creating multiple layers of control planes.
Going way back in time, when transit providers first started scaling, BGP speakers would not advertise a route that was not also present in a local IGP table (not just the routing table, but the actual OSPF. EIGRP, or IS-IS table). This was called synchronization, and it was designed to ensure traffic being forwarded into the network from the edge had a real path through the network. Since BGP can “skip over” routers using multihop, it was far too easy to send traffic to a router that was not running BGP, and hence did not have forwarding information for the destination … even though some router just one or two hops away did because it was running BGP.
Operators soon discovered keeping their IGP and BGP tables synchronized simply isn’t possible—the IGPs just could not support the route counts required. This led to running BGP on every router in the AS, which led to confederations, then to route reflectors, then to MPLS and other tunneling mechanisms to reduce state in the network core.
Running BGP on every router cleanly separates internal or infrastructure routes from external or customer routes. This is an effective use of information hiding vertically, an overlay with some information, and an underlay with other information. This is what I’ve been pressing for in the DC fabric space for quite a while now … and its something the network engineering community needs to explore and flesh out more in many other spaces.
So, there it is—I could go on, but I think you’re probably already bored with my random thoughts.
For those not following the current state of the ITU, a proposal has been put forward to (pretty much) reorganize the standards body around “New IP.” Don’t be confused by the name—it’s exactly what it sounds like, a proposal for an entirely new set of transport protocols to replace the current IPv4/IPv6/TCP/QUIC/routing protocol stack nearly 100% of the networks in operation today run on. Ignoring, for the moment, the problem of replacing the entire IP infrastructure, what can we learn from this proposal?
What I’d like to focus on is deterministic networking. Way back in the days when I was in the USAF, one of the various projects I worked on was called PCI. The PCI network was a new system designed to unify the personnel processing across the entire USAF, so there were systems (Z100s, 200s, and 250s) to be installed in every location across the base where personnel actions were managed. Since the only wiring we had on base at the time was an old Strowger mainframe, mechanical crossbars at a dozen or so BDFs, and varying sizes of punch-downs at BDFs and IDFs, everything for this system needed to be punched- or wrapped-down as physical circuits.
It was hard to get anything like real bandwidth over paper-wrapped cables with lead shielding that had been installed in the 1960s (or before??), wrap-downs, and 66 blocks. The max we could get in terms of bandwidth was about a T1, and often less. We needed more bandwidth than this, so we installed inverse multiplexers that would combine the bandwidth across multiple physical circuits to something large enough for the PCI system to run on.
The problem we ran into with these inverse multiplexers was they would sometimes fall out of synchronization, meaning frames would be delivered out of order—one of the two cardinal violations of deterministic networks. Since PCI was designed around a purely deterministic networking model, these failures caused havoc with HR. Not a good thing.
The second and third cardinal rules of deterministic networking are that there will be no jitter, and the network will deliver all packets accepted by the network. To make these rules work, there must be some form of entrance gating (circuit setup, generally speaking, with busy signals), fixed packet sizes, and strict quality of service.
In contrast, the rules of packet-switched (non-deterministic) networking are: all packets are accepted, packets can be (almost) any size, there is no guarantee any particular packet will be delivered, and there’s no way to know what the jitter and delay on delivering any particular packet might be.
Some kinds of payloads just need deterministic semantics, while others just need deterministic semantics. The problem, then, is how to build a single network that supports both kinds of semantics. Solving this problem is where thinking through the situation using a problem-solution mindset can help you, as a network engineer (or protocol designer, or software creator) understand what can be done, what cannot be done, and what the limitations are going to be no matter what solution you choose.
There are, at base, only three solutions to this problem. The first is to build a network that supports both deterministic and packet-switched traffic from the control planes to the switching path. The complexity of doing something like this would ultimately outweigh any possible benefit, so let’s leave that solution aside. The second is to emulate a deterministic network on top of a packet-switched network, and the third is to emulate a packet-switched network on top of a deterministic network. Consider the last option first—emulating a packet-switched network on top of a deterministic network. For those who are old enough, this is IP-over-ATM. It didn’t work. The inefficiencies of trying to stuff variably sized packets into the fixed-size frames required to create a deterministic network were so significant that it just … didn’t work. The control planes were difficult to deploy and manage—think ATM LANE, for instance—and the overall network was just really complicated.
Today, what we mostly do is emulate deterministic networks over packet-switched networks. This design allows traffic that does well in a packet-switched environment to run perfectly fine, and traffic that likes “some” deterministic properties, like voice, to work fine with a little effort on the part of the network designer and operator. Building something with all the properties of a genuinely deterministic network on top of a packet-switched network, however, is difficult.
To get there, you need a few things. First, you need some way to strictly control/steer flows, so the mix of packet and deterministic traffic is going allow you to meet deterministic requirements on every link through the network. Second, you need some excellent QoS mechanism that knows how to provide deterministic results while mixing in packet-switched traffic “underneath.” Third, you need to overprovision enough to take up any “slack” and variability, as well as to account for queuing and clock on/clock off delays through switching devices.
The truth is we can come pretty close—witness the ability of IP networks to carry video streaming and what looks like traditional voice today. Can it get better? I’m confident we can build systems that are ever closer to emulating a truly deterministic network on top of a packet-switched network—so long as we mind the tradeoffs and are willing to throw capacity and hardware at the problem.
Can we ever truly emulate packet-switched networks on top of deterministic networks? I don’t see how. It’s not just that we’ve tried and failed; it’s that the math just doesn’t ever seem to “work right” in some way or another.
So while the “new IP” proposal brings up an interesting problem—future applications may need more deterministic networking profiles—it doesn’t explain why we should believe either building a completely parallel deterministic network or trying to flip the stack to emulate a packet-switched network on top of a deterministic network, makes more sense than what we are doing today.
Looking at this from a problem/solution perspective helps clarify the situation, and produce a conclusion about which path is best, without even getting into specific protocols or implementations. Really understanding the problem you are trying to solve, even at an abstract level, and then working through all the possible solutions, even ones that might not have been invented yet (although I can promise you then have been invented), can help you get your mind around the engineering possibilities.
It is probably not how you’re accustomed to looking at network design, protocol selection, etc. But it’s one you should start using.
I think we can all agree networks have become too complex—and this complexity is a result of the network often becoming the “final dumping ground” of every problem that seems like it might impact more than one system, or everything no-one else can figure out how to solve. It’s rather humorous, in fact, to see a lot of server and application folks sitting around saying “this networking stuff is so complex—let’s design something better and simpler in our bespoke overlay…” and then falling into the same complexity traps as they start facing the real problems of policy and scale.
This complexity cannot be “automated away.” It can be smeared over with intent, but we’re going to find—soon enough—that smearing intent on top of complexity just makes for a dirty kitchen and a sub-standard meal.
While this is always “top of mind” in my world, what brings it to mind this particular week is a paper by Jen Rexford et al. (I know Jen isn’t on the lead position in the author list, but still…) called A Clean Slate 4D Approach to Network Control and Management. Of course, I can appreciate the paper in part because I agree with a lot of what’s being said here. For instance—
We believe the root cause of these problems lies in the control plane running on the network elements and the management plane that monitors and configures them. In this paper, we argue for revisiting the division of functionality and advocate an extreme design point that completely separates a network’s decision logic from the protocols that govern interaction of network elements.
In other words, we’ve not done our modularization homework very well—and our lack of focus on doing modularization right is adding a lot of unnecessary complexity to our world. The four planes proposed in the paper are decision, dissemination, discovery, and data. The decision plane drives network control, including reachability, load balancing, and access control. The dissemination plane “provides a robust and efficient communication substrate” across which the other planes can send information. The discovery plane “is responsible for discovering the physical components of the network,” giving each item an identifier, etc. The data plane carries packets edge-to-edge.
I do have some quibbles with this architecture, of course. To begin, I’m not certain the word “plane” is the right one here. Maybe “layers,” or something else that implies more of a modular concept with interactions, and less a “ships in the night” sort of affair. My more substantial disagreement is with the placement of “interface configuration” and where reachability is placed in the model.
Consider this: reachability and access control are, in a sense, two sides of the same coin. You learn where something is to make it reachable, and then you block access to it by hiding reachability from specific places in the network. There are two ways to control reachability—by hiding the destination, or by blocking traffic being sent to the destination. Each of these has positive and negative aspects.
But notice this paradox—the control plane cannot hide reachability towards something it does not know about. You must know about something to prevent someone from reaching it. While reachability and access control are two sides of the same coin, they are also opposites. Access control relies on reachability to do its job.
To solve this paradox, I would put reachability into discovery rather than decision. Discovery would then become the discovery of physical devices, paths, and reachability through the network. No policy would live here—discovery would just determine what exists. All the policy about what to expose about what exists would live within the decision plane.
While the paper implies this kind of system must wait for some day in the future to build a network using these principles, I think you can get pretty close today. My “ideal design” for a data center fabric right now is (modified) IS-IS or RIFT in the underlay and eVPN in the overlay, with a set of controllers sitting on top. Why?
IS-IS doesn’t rely on IP, so it can serve in a mostly pure discovery role, telling any upper layers where things are, and what is changing. eVPN can provide segmented reachability on top of this, as well as any other policies. Controllers can be used to manage eVPN configuration to implement intent. A separate controller can work with IS-IS on inventory and lifecycle of installed hardware. This creates a clean “break” between the infrastructure underlay and overlays, and pretty much eliminates any dependence the underlay has on the overlay. Replacing the overlay with a more SDN’ish solution (rather than BGP) is perfectly do-able and reasonable.
While not perfect, the link-state underlay/BGP overlay model comes pretty close to implementing what Jen and her co-authors are describing, only using protocols we have around today—although with some modification.
But the main point of this paper stands—a lot of the reason for the complexity we fact today is simply because we modularize using aggregation and summarization and call the job “done.” We aren’t thinking about the network as a system, but rather as a bunch of individual appliances we slap together into places, which we then connect through other places (or strings, like between tin cans).
Something to think about the next time you get that 2AM call.