The OSI Model: STOP IT!

The OSI model is perhaps the best-known—and most-loved—model in the networking world. It’s taught in every basic networking course, and just about every networking blog (other than this one) has an article explaining the model somewhere (for instance, here is one of the better examples).

The reality is, however, that I’ve been in the networking business for 30’ish years and I’ve never once used the OSI model for anything practical. I’ve used the model when writing books because just about every book on networking has to have a section on the OSI model. I’ve used the model when writing a paper comparing two different protocols, back in the multiprotocol days (VIP versus IPX versus IP), but we don’t have those kinds of arguments very often any longer.

So, we all learn the OSI model, and yet I don’t know of anyone who actually uses the OSI model in understanding how protocols work, or how to troubleshoot a network. There’s the “it’s a layer two problem” statement, and that’s about the end of its useful life, it seems.

Let me make a suggestion—learn, use, and teach the RINA model instead. Okay, so what is the RINA model? It is the Recursive InterNetwork Architecture, hence RINA. The observation John Day made when creating the RINA model is this: there are only four problems you need to solve to move data from one host to another. It just happens that those four problems need to be solved multiple times: across each physical link, between each pair of hosts, and then again between each pair of applications (at least—John Day argues there should be eight layers rather than seven, but that argument is outside the scope of this simple blog post).

The four problems are: transport (which I would call marshaling), multiplexing, error control, and flow control. Protocols that address these problems tend to solve two of the four rather than one; the pairings almost always seem to be marshaling with multiplexing, and error control with flow control.

On a single Ethernet hop, multiplexing and marshaling are generally solved by the physical Ethernet protocol, while error and flow control are generally solved by the data link protocol (something we don’t think much about any longer). Moving up a level, multiplexing and marshaling are generally solved by IP, while error and flow control are generally solved by TCP.
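
To make the host-to-host part of that concrete, here is a rough sketch (the structs are simplified and deliberately not wire-accurate—fields are omitted and the names are mine) of where those four problems live in the IP and TCP headers:

#include <stdint.h>

/* Host-to-host scope: IP provides marshaling (the packet format itself)
   and multiplexing (which host, and which protocol sits above).         */
struct ip_header_sketch {
    uint8_t  protocol;     /* multiplexing: which transport sits above    */
    uint16_t checksum;     /* (header-only) error detection               */
    uint32_t src_addr;     /* multiplexing: which host, network-wide      */
    uint32_t dst_addr;
};

/* Application-to-application scope: TCP provides error control
   (sequencing, acknowledgment, retransmission) and flow control
   (the receiver-advertised window).                                      */
struct tcp_header_sketch {
    uint16_t src_port;     /* which conversation on the host              */
    uint16_t dst_port;
    uint32_t seq;          /* error control: ordering and retransmission  */
    uint32_t ack;          /* error control: what has been received       */
    uint16_t window;       /* flow control: how much the receiver can take */
    uint16_t checksum;     /* error detection over the segment            */
};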

The model doesn’t perfectly describe the purpose of every protocol in the modern networking stack, but it comes much closer than the OSI model. Since it’s recursive, you can even insert layers to represent tunnels without a lot of hand-waving in the process, or “breaking” the model.

Why is the RINA model better for understanding protocols? Because it describes what the protocol is trying to accomplish, rather than where it lives in the stack, or what kinds of interfaces it provides. Why is the RINA model better for troubleshooting? Because if you know what isn’t working, it gives you a better general area to look in to find the problem.

My challenge for 2021 is, then—learn and use the RINA model. I (humbly) believe it will actually make you a better network engineer.

The Hedge 62: Jacob Hess and the Importance of History

At first glance, it would seem like the history of a technology would have little to do with teaching that technology. Jacob Hess of NexGenT joins us in this episode of the Hedge to help us understand why he always includes the history of a technology when teaching it—a conversation that broadened out into why learning history is important for all network engineers.

You can find the history of networking here.

Underhanded Code and Automation

So, software is eating the world—and you thought this was going to make things simpler, right? If you haven’t found the tradeoffs, you haven’t looked hard enough. I should trademark that or something! 🙂 While code quality and supply chain security are common concerns, there are a lot of little “side trails” organizations do not tend to think about. One such was recently covered in a paper on underhanded code, which is code designed to pass a standard review but which can be used to harm the system later on. For instance, you might see something like this at some spot—

if (buffer_size=REALLYLONGDECLAREDVARIABLENAMEHERE) {
  /* do some stuff here */
} /* end of if */

Can you spot what the problem might be? In C, = is different from ==. Which should it really be here? Even astute reviewers can easily miss this kind of detail—not least because it could be an intentional construction. Using a strongly typed language, like Rust, can help prevent this kind of thing (listen to this episode of the Hedge for more information on Rust), but nothing beats having really good code formatting rules, even if they are apparently arbitrary, for catching these things.
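
As a quick illustration (a sketch of my own—the constant and the function are invented for the example), compiler warnings catch exactly this pattern when they are actually turned on, which is one reason the “compiler warnings” advice in the list below matters:

#include <stdbool.h>

static const int REALLYLONGDECLAREDVARIABLENAMEHERE = 512;  /* illustrative */

bool buffer_is_full(int buffer_size) {
    /* gcc and clang, with -Wall (-Wparentheses), warn on the line below with
       something like "suggest parentheses around assignment used as truth
       value"—the warning exists precisely for the = versus == mistake       */
    if (buffer_size = REALLYLONGDECLAREDVARIABLENAMEHERE)
        return true;
    return false;
}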

The paper above lists these—

  • Use syntax highlighting and typefaces that clearly distinguish characters. You should be able to easily tell the difference between a lowercase l and a 1.
  • Require all comments to be on separate lines. This is actually pretty hard in C, however.
  • Prettify code into a standard format not under the attacker’s control.
  • Use compiler warnings and static analysis.
  • Forbid unneeded dangerous constructions.
  • Use runtime memory corruption detection.
  • Use fuzzing (see the sketch just after this list).
  • Watch your test coverage.
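
Along the lines of the last few items, here is a minimal sketch of my own (the toy parser is invented; the entry point is the standard libFuzzer one) showing how a small fuzz target plus a sanitizer covers both the fuzzing and runtime memory-corruption suggestions:

#include <stdint.h>
#include <stddef.h>

/* A toy "parser" standing in for whatever code your automation actually runs. */
static int count_separators(const uint8_t *buf, size_t len) {
    int count = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '=')
            count++;
    return count;
}

/* libFuzzer entry point—the fuzzer calls this repeatedly with generated
   inputs; build with: clang -g -fsanitize=fuzzer,address fuzz_target.c
   so AddressSanitizer catches any memory corruption an input triggers.   */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    count_separators(data, size);
    return 0;
}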

Not all of these are directly applicable for the network engineer dealing with automation, but they do provide some good pointers, or places to start. A few more…

Yoda assignments are named after Yoda’s habit of placing the verb before the subject—“succeed you will…” It’s not technically wrong in terms of grammar, but it is just hard enough to understand that it makes you listen carefully and think a bit harder. In software development, the variable taking the assignment should be on the left, and the thing being assigned should be on the right. Reversing these is a Yoda assignment; it’s technically correct, but it’s harder to read.
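
For a concrete picture (this example is mine, not from the paper—the constant name is illustrative), here is the same reversal applied to a comparison, where it is often called a Yoda condition; the one practical upside is that a mistyped = becomes a compile error rather than a silent bug:

#define MAX_BUFFER_SIZE 512          /* illustrative constant */

void handle(int buffer_size) {
    /* conventional order: easy to read, but a mistyped "=" still compiles */
    if (buffer_size == MAX_BUFFER_SIZE) {
        /* ... */
    }

    /* Yoda order: harder to read, but writing "=" by mistake here
       (MAX_BUFFER_SIZE = buffer_size) will not compile at all        */
    if (MAX_BUFFER_SIZE == buffer_size) {
        /* ... */
    }
}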

Arbitrary standardization is useful when there are many options that ultimately result in the same outcome. Don’t let options proliferate just because you can.

Use macros!
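
As a small illustration of my own (the macro name and the check are invented for the example), a macro lets you write an error-prone pattern once, review it once, and reuse it everywhere:

#include <stdio.h>

/* Write the bounds check once, in one reviewable place, rather than
   re-typing (and possibly mistyping) it at every call site.          */
#define IN_RANGE(val, min, max) ((val) >= (min) && (val) <= (max))

int main(void) {
    int buffer_size = 256;

    if (IN_RANGE(buffer_size, 0, 512))
        printf("buffer size ok\n");

    return 0;
}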

There are probably plenty more, but this is an area where we really are not paying attention right now.

Random Thoughts

This week is very busy for me, so rather than writing a single long post, I’m throwing together some things that have been sitting in my pile to write about for a long while.

From Dalton Sweeny:

A physicist loses half the value of their physics knowledge in just four years whereas an English professor would take over 25 years to lose half the value of the knowledge they had at the beginning of their career. . . Software engineers with a traditional computer science background learn things that never expire with age: data structures, algorithms, compilers, distributed systems, etc. But most of us don’t work with these concepts directly. Abstractions and frameworks are built on top of these well studied ideas so we don’t have to get into the nitty-gritty details on the job (at least most of the time).

This is precisely the way network engineering is. There is value in the kinds of knowledge that expire, such as individual product lines, etc.—but the closer you are to the configuration, the more ephemeral the knowledge is. This is one of the central points of rule 11 is your friend. Learn the foundational things that make learning the ephemeral things easier. There are only four problems (really) in moving data from one place to another. There are only around four solutions for each of those problems. Each of those solutions is bounded by a small set (again, about four for each) of sub-solutions, or ways of implementing the solution, etc.

I’m going to spend some time talking about this in the Livelesson I’m currently recording, so watch this space for an announcement sometime early next year about publication.

From Ivan P:

What I’m pointing out in this rant is the reverse reasoning along the lines “vendor X is doing something, which confirms there’s a real market need for it”. I’ve been in IT too long, and seen how the startup/VC sausage is made, to believe that fairy tale… and even when it’s true, it doesn’t necessarily imply that you need whatever vendor X is selling.

There are two ways to look at this. Either vendors should lead the market in building solutions, or they should follow whatever the customer wants. From my perspective, one of the problems we have right now is that everything is a massive mish-mash of these two things. The operator’s design team thinks of a neat way to do X, and then promises the account team a big check if it’s implemented. It doesn’t matter that X could be solved some other way that might be simpler, etc.—all that matters is the check. In this case, the vendor stops challenging the customer to build things better, and starts just acting like a commodity provider, rather than an innovative partner.

The interaction between the customer and the vendor needs to be more push-pull than it currently is—right now, it seems like either the operator simply dictates terms to the vendor, or the vendor pretty much dictates architecture to the operator. We need to find a middle ground. The vendor does need to have a solid solution architecture, but the architecture needs to be flexible, as well, and the blocks used to build that architecture need to be usable in ways not anticipated by the vendor’s design folks.

On the other hand, we need to stop chasing features. This isn’t just true of vendors, this is true of operators as well. You get feature lists because that’s what you ask for. Often, operators ask for feature lists because that’s the easiest thing to measure, or because they already have a completely screwed up design they are trying to brownfield around. The worst is—“we have this brownfield that we just can’t get rid of, so we want to build yet another overlay on top, which will make everything simpler.” After about the twentieth overlay a system crash becomes a matter of when rather than if.

Everyone Must Learn to Code

The word on the street is that everyone—especially network engineers—must learn to code. A conversation with a friend and an article passing through my RSS reader brought this to mind once again—so once more into the breach. Part of the problem here is that we seem to have a knack for asking the wrong question. When we look at network engineer skill sets, we often think about the ability to configure a protocol or set of features, and then the ability to quickly troubleshoot those protocols or features using a set of commands or techniques.

This is, in some sense, what various certifications have taught us—we have reached the expert level when we can configure a network quickly, or when we can prove we understand a product line. There is, by the way, a point of truth in this. If you claim your expertise is with a particular vendor’s gear, then it is true that you must be able to configure and troubleshoot on that vendor’s gear to be an expert. There is also a problem of how to test for networking skills without actually implementing something, and how to implement things without actually configuring them. This is a problem we are discussing in the new “certification” I’ve been working on, as well.

This is also, in some sense, what the hiring processes we use have taught us. Computers like to classify things in clear and definite ways. The only clear and definite way to classify networking skills is by asking questions like “what protocols do you understand how to configure and troubleshoot?” It is, it seems, nearly impossible to test design or communication skills in a way that can be easily placed on a resume.

Coding, I think, is one of those skills that is easy to appear to measure accurately, and it’s also something the entire world insists is the “coming thing.” No coding skills, no job. So it’s easy to ask the easy question—what languages do you know, how many lines of code have you written, etc. But again, this is the wrong question (or these are the wrong questions).

What is the right question? In terms of coding skills, more along the lines of something like, “do you know how to build and use tools to solve business problems?” I phrase it this way because the one thing I have noticed about every really good coder I have known is they all spend as much time building tools as they do building shipping products. They build tools to test their code, or to modify the code they’ve already written en masse, etc. In fact, the excellent coders I know treat functions like tools—if they have to drive a nail twice, they stop and create a hammer rather than repeating the exercise with some other tool.

So why is coding such an important skill to gain and maintain for the network engineer? This paragraph seems to sum it up nicely for me—

“Coding is not the fundamental skill,” writes startup founder and ex-Microsoft program manager Chris Granger. What matters, he argues, is being able to model problems and use computers to solve them. “We don’t want a generation of people forced to care about Unicode and UI toolkits. We want a generation of writers, biologists, and accountants that can leverage computers.”

It’s not the coding that matters, it’s “being able to model problems and use computers to solve them.” This is the essence of tool building or engineering—seeing the problem, understanding the problem, and then thinking through (sometimes by trial and error) how to build a tool that will solve the problem in a consistent, easy-to-manage way. I fear that network engineers are simply taking their configuration-centered mindset and automating it. We seem to end up asking “how do I solve the problem of making the configuration of this network faster,” rather than asking “what business problem am I trying to solve?”

To make effective use of the coding skills we’re telling everyone to learn, we need to go back to basics and understand the problems we’re trying to solve—and the set of possible solutions we can use to solve those problems. Seen this way, the routing protocol becomes “just another tool,” just like a function call, that can be used to solve a specific set of problems—instead of a set of configuration lines that we invoke like a magic incantation to make things happen.

Coding skills are important—but they require the right mindset if we’re going to really gain the sorts of efficiencies we think are possible.

The White Board and the Simulation

In the argument between OSPF and BGP in the data center fabric over at Justin’s blog, I am decidedly in the camp of IS-IS. Rather than lay my reasons out here, however (a topic for another blog post?), I want to focus on something else Justin said that I think is incredibly important for network engineers to understand.

I think whiteboards are the most important tool for network design currently available, which makes me sad. I wish that wasn’t true, I want much better tools. I can’t even tell you the number of disasters averted by 2-3 great network engineers arguing over a whiteboard.

I remember—way back—when I was working on the problems around making a link-state protocol work well in a Mobile Ad Hoc Network (MANET), we had two competing solutions presented to the IETF. The first solution was primarily based on whiteboarding through various options and coming up with one that should reduce flooding to an acceptable level. The second was less optimal on the whiteboard but supported by simulations showing it should reduce flooding more effectively.

Which solution “won”? I don’t know what “winning” might mean here, but the solution designed on the whiteboard has been widely deployed and is now showing up in other places—take a look at distoptflood and the flooding optimizations in RIFT. Both look a lot like what we designed all those years ago to make OSPF work in a MANET.

Does this mean simulations, labbing, and testing are useless? No.

Each of these tools brings a different set of strengths to the table when trying to solve a problem. For instance, there is no way to optimize the flooding in a protocol unless you really know how flooding works, what information you have available, where things might fail, what features are designed into the protocol to prevent or work around failures, etc. These things are what you need to put together and understand on a whiteboard.

You cannot think of every possible situation that needs to be simulated, nor can you simulate every possible order of operation, or every possible timing problem that might occur. If you know the protocol, however, you can cover most of this ground and design in fail-safes. Simulations are necessary, but not sufficient, for network and protocol design.

On the other side of the coin, Justin points out the stack they were using was known to have a weak BGP implementation and a strong OSPF implementation. This, and other scale and timing issues, will only show up in a simulation. For instance, you cannot answer the question of how large the LSDB can become before the processor keels over and dies by standing at a whiteboard—it’s just not possible. The whiteboard is necessary, but not sufficient, for network and protocol design.

The danger is that we attempt to replace one with the other—that we either ignore the value of simulation because it’s hard (or even impossible), or we ignore the power of the whiteboard because we don’t understand the protocols well enough to stand at the whiteboard and argue. Every team needs to have at least one person who can “see” how the network will converge just because they understand the protocol. And every team needs to have at least one person who can throw together a simulation and show how the network will converge in the same situation.

Whiteboards and simulations are both crucial tools. Learn the protocols and design well enough to use the one and find or build the tools needed to use the other. Missing either one is going to leave a blind spot in your abilities as an engineer.

Which one are you missing in your skill set right now? If you know the answer to that question, then you know at least one thing you need to learn “next.”

Deterministic Networking and New IP

For those not following the current state of the ITU, a proposal has been put forward to (pretty much) reorganize the standards body around “New IP.” Don’t be confused by the name—it’s exactly what it sounds like: a proposal for an entirely new set of transport protocols to replace the current IPv4/IPv6/TCP/QUIC/routing protocol stack that nearly 100% of the networks in operation today run on. Ignoring, for the moment, the problem of replacing the entire IP infrastructure, what can we learn from this proposal?

What I’d like to focus on is deterministic networking. Way back in the days when I was in the USAF, one of the various projects I worked on was called PCI. The PCI network was a new system designed to unify the personnel processing across the entire USAF, so there were systems (Z100s, 200s, and 250s) to be installed in every location across the base where personnel actions were managed. Since the only wiring we had on base at the time was an old Strowger mainframe, mechanical crossbars at a dozen or so BDFs, and varying sizes of punch-downs at BDFs and IDFs, everything for this system needed to be punched- or wrapped-down as physical circuits.

It was hard to get anything like real bandwidth over paper-wrapped cables with lead shielding that had been installed in the 1960s (or before??), wrap-downs, and 66 blocks. The max we could get in terms of bandwidth was about a T1, and often less. We needed more bandwidth than this, so we installed inverse multiplexers that would combine the bandwidth across multiple physical circuits to something large enough for the PCI system to run on.

The problem we ran into with these inverse multiplexers was they would sometimes fall out of synchronization, meaning frames would be delivered out of order—one of the two cardinal violations of deterministic networks. Since PCI was designed around a purely deterministic networking model, these failures caused havoc with HR. Not a good thing.

The second and third cardinal rules of deterministic networking are that there will be no jitter, and the network will deliver all packets accepted by the network. To make these rules work, there must be some form of entrance gating (circuit setup, generally speaking, with busy signals), fixed packet sizes, and strict quality of service.

In contrast, the rules of packet-switched (non-deterministic) networking are: all packets are accepted, packets can be (almost) any size, there is no guarantee any particular packet will be delivered, and there’s no way to know what the jitter and delay on delivering any particular packet might be.

Some kinds of payloads just need deterministic semantics, while others do just fine with packet-switched semantics. The problem, then, is how to build a single network that supports both kinds of semantics. Solving this problem is where thinking through the situation using a problem-solution mindset can help you, as a network engineer (or protocol designer, or software creator), understand what can be done, what cannot be done, and what the limitations are going to be no matter what solution you choose.

There are, at base, only three solutions to this problem. The first is to build a network that supports both deterministic and packet-switched traffic from the control planes to the switching path. The complexity of doing something like this would ultimately outweigh any possible benefit, so let’s leave that solution aside. The second is to emulate a deterministic network on top of a packet-switched network, and the third is to emulate a packet-switched network on top of a deterministic network.

Consider the last option first—emulating a packet-switched network on top of a deterministic network. For those who are old enough, this is IP-over-ATM. It didn’t work. The inefficiencies of trying to stuff variably sized packets into the fixed-size frames required to create a deterministic network were so significant that it just … didn’t work. The control planes were difficult to deploy and manage—think ATM LANE, for instance—and the overall network was just really complicated.

Today, what we mostly do is emulate deterministic networks over packet-switched networks. This design allows traffic that does well in a packet-switched environment to run perfectly fine, and traffic that likes “some” deterministic properties, like voice, to work fine with a little effort on the part of the network designer and operator. Building something with all the properties of a genuinely deterministic network on top of a packet-switched network, however, is difficult.

To get there, you need a few things. First, you need some way to strictly control and steer flows, so the mix of packet-switched and deterministic traffic is going to allow you to meet deterministic requirements on every link through the network. Second, you need an excellent QoS mechanism that knows how to provide deterministic results while mixing in packet-switched traffic “underneath.” Third, you need to overprovision enough to take up any “slack” and variability, as well as to account for queuing and clock-on/clock-off delays through switching devices.
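
To make that a little more concrete, here is a very small sketch (entirely my own—the queue model and numbers are invented for illustration, not part of any proposal) of the first two pieces: an admission gate for the deterministic class and a strict-priority dequeue over a shared link:

#include <stdio.h>
#include <stdbool.h>

#define LINK_SLOTS_PER_CYCLE 10   /* transmit opportunities on the shared link      */
#define DET_SLOTS_PER_CYCLE   4   /* capacity reserved for the deterministic class  */

/* Entrance gating: deterministic packets beyond the reserved budget get a
   "busy signal" at the edge; best-effort is always queued and takes its
   chances under congestion instead.                                        */
static bool admit(bool deterministic, int det_admitted) {
    if (deterministic && det_admitted >= DET_SLOTS_PER_CYCLE)
        return false;
    return true;
}

int main(void) {
    int det_queue = 0, be_queue = 0, det_admitted = 0;

    /* offer six deterministic and eight best-effort packets this cycle */
    for (int i = 0; i < 6; i++)
        if (admit(true, det_admitted)) { det_queue++; det_admitted++; }
    for (int i = 0; i < 8; i++)
        if (admit(false, det_admitted)) be_queue++;

    /* strict priority: admitted deterministic packets always transmit first,
       so their delay and jitter stay bounded by the reserved budget          */
    for (int slot = 0; slot < LINK_SLOTS_PER_CYCLE; slot++) {
        if (det_queue > 0)     { det_queue--; printf("slot %d: deterministic\n", slot); }
        else if (be_queue > 0) { be_queue--;  printf("slot %d: best effort\n", slot); }
    }
    return 0;
}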

The truth is we can come pretty close—witness the ability of IP networks to carry video streaming and what looks like traditional voice today. Can it get better? I’m confident we can build systems that are ever closer to emulating a truly deterministic network on top of a packet-switched network—so long as we mind the tradeoffs and are willing to throw capacity and hardware at the problem.

Can we ever truly emulate packet-switched networks on top of deterministic networks? I don’t see how. It’s not just that we’ve tried and failed; it’s that the math just doesn’t ever seem to “work right” in some way or another.

So while the “new IP” proposal brings up an interesting problem—future applications may need more deterministic networking profiles—it doesn’t explain why we should believe that either building a completely parallel deterministic network, or trying to flip the stack and emulate a packet-switched network on top of a deterministic network, makes more sense than what we are doing today.

Looking at this from a problem/solution perspective helps clarify the situation, and produce a conclusion about which path is best, without even getting into specific protocols or implementations. Really understanding the problem you are trying to solve, even at an abstract level, and then working through all the possible solutions, even ones that might not have been invented yet (although I can promise you they have been invented), can help you get your mind around the engineering possibilities.

It is probably not how you’re accustomed to looking at network design, protocol selection, and the like—but it’s a perspective you should start using.