Technologies that Didn’t: Network Operating Systems
For those with a long memory—no, even longer than that—there were once things called Network Operating Systems (NOSes). These were not the kind of NOS we have today, like Cisco IOS Software, Arista EOS, or even SONiC. Rather, these were designed for servers. The most common example was Novell’s NetWare. These operating systems were the “bread and butter” of the networking world for many years. I was a Certified NetWare Engineer (CNE) on version 4.0, and then 4.11, before I moved into the routing and switching world. I also deployed Banyan’s VINES, IBM’s OS/2, and a much simpler system called LANtastic, among others.
What were these pieces of software? They were largely built around providing a complete environment for the network user. These systems began with file sharing, which required a small driver to be installed on each host accessing the file share. This small driver was actually a network stack for a proprietary set of protocols. For Vines, this was VIP; for NetWare, it was IPX. Over time, these systems began to include email, and then, as a natural outgrowth of file sharing and email, directory services. For some time, there was a serious race on to push ever more features into these network operating systems. For instance, a Vines server could not only act as an email server, a file server, and a directory server, it could also act as a router, connecting two Ethernet segments and pushing traffic between them.
What happened? Why and how did these kinds of systems disappear—almost overnight, it seems? After all, they provided a lot of very interesting services. You could use one of these systems as a corporate directory, adding each person’s contact information directly into the system itself. Once the person was there, you could assign them rights to file shares, individual files, and even services running on one of the servers. For instance, you could build an application on a framework within Vines that would run across multiple Vines servers—the distribution of the data and the application were all handled in the Vines operating system itself, so long as you built it to their framework—and then simply give people access to it. Lotus Notes, which I understand is still in use today, was an overlay service of the same style. You didn’t need to worry about access control, the difference between authentication and authorization, etc.; these were all built into the system.
Why don’t we see these in widespread use today? The “official” reason, if there is such a thing, is that the standardization of the IP protocol stack, and its widespread deployment, caused all of these operating systems to be replaced by a federation of other protocols and applications. For instance, FTP opened up the ability to upload and download files across an IP network, and SMTP standardized the various email clients so email gateways were no longer needed.
A more unofficial answer might be this: these systems tried to do too much. Rather than being a series of smaller systems, each of which solved a particular problem, each of these systems tried to solve every problem, from access control to routing. These systems became bloated and difficult to operate over time. The resource tree in NetWare 4.11, which was grounded in X.500, was a thing of beauty, if you like staring at Mandelbrot patterns. If there was some access control problem, it could take hours to work through the various layers of permissions and how each was being inherited from the level above.
Further, these large scale, monolithic servers eventually could not keep up with the smaller tools that were being iteratively improved in the IP and OSI protocol suites. As networks grew, and routing became more important, these operating systems struggled to keep up with complex wide area networks.
Network operating systems are another story of complex, multifaceted, monolithic solutions to a lot of different problems. As with all such solutions, smaller, simpler systems simply overrun the capabilities of the monolithic systems through quick iteration of a smaller problem space. While these systems started out simple, they quickly took on too much, and ended up being difficult to deploy and maintain.
Hints and Principles: Applied to Networks
While software design is not the same as network design, there is enough overlap for network designers to learn from software designers. A recent paper published by Butler Lampson, updating a paper he wrote in 1983, is a perfect illustration of this principle. The paper is called Hints and Principles for Computer System Design. I’m not going to write a full review here–you should really go read the paper for yourself–but rather just point out some useful bits of the paper.
The first really useful point of this paper is that Lampson breaks down the entire field of software design into three basic questions: What, How, and When (or Who)? Each of these corresponds to the goals, techniques, and processes used to design and develop software. These same questions and answers apply to network design–if you are missing one of these three areas, then you are probably missing some important set of questions you have not answered yet. Each of these is also represented by an acronym: what? is STEADY, how? is AID, and when? is ART. Let’s look at a couple of these in a little more detail to see how Lampson’s system works.
STEADY stands for simple, timely, efficient, adaptable, dependable, and yummy. Simple is just what it sounds like–reduce complexity. I’m not entirely on board with Lampson’s description of simplicity, which seems to focus on abstraction–abstraction is one useful tool, but anyone who reads my work regularly knows I’m rather more careful about abstraction than most because it involves often-unexamined tradeoffs. Timely primarily relates to “is there a market for this?” in software design; for networks it might be better put as “does the business need this now or later?” Efficient is one of those tradeoffs involved in abstraction–what I might call one of the various ways of optimizing a system. Adaptable means just what it sounds like–are you creating technical debt that must be resolved later? Dependable could be translated to resilience in network design, but it would also relate to many aspects of security, and even the jitter and delay elements in application support.
Yummy is one many network engineers will not be familiar with, but it is worth considering. If I’m reading Lampson right here, another way to say this might be “easy to consume.” Why do you want your customers to be able to consume the network easily? Because you do not want them running off and using the cloud (for instance) because they find committing and understanding resources in your network so difficult. We have, for far too long, assumed that “easy to consume” in the network design world means “just plug it into the wall.” It’s not that simple.
The second one, AID, stands for approximate, incremental, and divide & conquer. These are, again, easily adaptable to network design. You don’t need to make the design perfect the first time. In fact, one thing drilled into my head as a young artist was that the perfect is the enemy of the good–it’s better to get it approximately right, right now, than perfectly right ten years down the road (when no one cares any longer). Incremental speaks to modularization, scale-out, and lifecycle management, for instance.
While not every principle here can be applied, a lot of them can. Having them listed out in an easy-to-remember format like this is a great design aid–learn these, and use them.
Underhanded Code and Automation
So, software is eating the world—and you thought this was going to make things simpler, right? If you haven’t found the tradeoffs, you haven’t looked hard enough. I should trademark that or something! 🙂 While code quality and the supply chain are common concerns, there are a lot of little “side trails” organizations do not tend to think about. One such was recently covered in a paper on underhanded code, which is code designed to pass a standard review but which can be used to harm the system later on. For instance, you might see something like this at some spot—
```c
if (buffer_size=REALLYLONGDECLAREDVARIABLENAMEHERE) {
    /* do some stuff here */
} /* end of if */
```
Can you spot what the problem might be? In C, the = (assignment) is different from the == (comparison). Which should it really be here? Even astute reviewers can easily miss this kind of detail—not least because it could be an intentional construction. Using a strongly typed language, like Rust, can help prevent this kind of thing (listen to this episode of the Hedge for more information on Rust), but nothing beats having really good code formatting rules, even if they are apparently arbitrary, for catching these things.
The paper above lists these—
- Use syntax highlighting and typefaces that clearly distinguish characters. You should be able to easily tell the difference between a lowercase l and a 1.
- Require all comments to be on separate lines. This is actually pretty hard in C, however.
- Prettify code into a standard format not under the attacker’s control.
- Use compiler warnings and static analysis.
- Forbid unneeded dangerous constructions.
- Use runtime memory corruption detection.
- Use fuzzing.
- Watch your test coverage.
Not all of these are directly applicable for the network engineer dealing with automation, but they do provide some good pointers, or places to start. A few more…
Yoda conditions (sometimes called Yoda assignments) are named after Yoda’s habit of inverting normal word order—”succeed you will…” It’s not technically wrong in terms of grammar, but it is just hard enough to understand that it makes you listen carefully and think a bit harder. In software development, the convention is to put the variable on the left of a comparison and the constant on the right. Reversing these is a Yoda condition; it’s technically correct (and it turns an accidental = into a compile error), but it’s harder to read.
Arbitrary standardization is useful when there are many options that ultimately result in the same outcome. Don’t let options proliferate just because you can.
Use macros!
There are probably plenty more, but this is an area where we really are not paying attention right now.
Link State in DC Fabrics

If you don’t normally read IPJ, you should. Melchoir and I have an article up in the latest edition on link state in DC fabrics.
You Cannot Increase the Speed of Light (RFC1925 Rule 2)
According to RFC1925, the second fundamental truth of networking is: No matter how hard you push and no matter what the priority, you can’t increase the speed of light.
However early in the world of network engineering this problem was first observed (see, for instance, Tanenbaum’s “station wagon example” in Computer Networks), human impatience is forever trying to overcome the limitations of the physical world, and push more data down the pipe than mother nature intended (or Shannon’s theory allows).
One attempt at solving this problem is the description of an infinitely fat pipe (helpfully called an “infan(t)”) described in RFC5984. While packets would still need to be clocked onto such a network, incurring serialization delay, the ability to clock an infinite number of packets onto the network at the same moment in time would represent a massive gain in a network’s ability, potentially reaching speeds faster than the speed of light. The authors of RFC5984 describe several attempts to build such a network, including black fiber, on which the lack of light implies data transmission. This is problematic, however, because a lack of information can be interpreted differently depending on the context. A pregnant pause has far different meaning than a shocked pause, for instance, or just a plain pause.
The team experimenting with faster-than-light communication also tried locking netcats up in boxes, but this seemed to work and not work at the same time. Finally, the researchers settled on ESP-based forwarding, in which two people with a telepathic link transmit data over long distances. They compute the delay of such communication at around 350ms, regardless of the distance involved. This is clearly a potentially faster-than-light communication medium.
Another plausible option for building infinitely fat pipes is to broadcast everything. If you could reach an entire region in some way at once, it might be possible to build a full mesh of hosts, each transmitting to every other host in the region at the same time, ultimately constituting an infinitely fat pipe. Such a system is described in RFC6217, which describes the transmission of broadcast packets across entire regions using air as a medium. This kind of work is a logical extension of the stretched Ethernet segments often used between widely separated data centers and campuses, only using a more easily accessed medium (the air). The authors of this RFC note the many efficiencies gained from using broadcast only transmission modes, such as not needing destination addresses, the TCP three-way handshake process, and acknowledgements (which reportedly consume an inordinate amount of bandwidth).
Foreseeing the time when faster than speed-of-light networking would be possible, R. Hinden wrote a document detailing some of the design considerations for such networks which was published as RFC6921. This document is primarily concerned with the ability of the TCP three-way handshake to support an environment where the network’s speed of transmission is so much faster than the speed at which packets are processed or clocked onto the network that an acknowledgement is received before the original packet is transmitted. R. Hinden suggests that it might be possible to use packet drops in normal networks to emulate this behavior, and find some way to solve it in case faster than speed-of-light networks become generally available—such as the ESP network described in RFC5984.
More recent, and realistic, work in faster than speed-of-light networking has been undertaken by the proposed Quantum Networking Research Group in the IRTF. You can read the proposed architecture for a quantum Internet here.
Reducing Complexity through Interaction Surfaces
A recent paper on network control and management (which includes Jennifer Rexford on the author list—anything with Jennifer on the author list is worth reading) proposes a clean slate 4D approach to solving much of the complexity we encounter in modern networks. While the paper is interesting, it’s very unlikely we will ever see a clean slate design like the one described, not least because there will always be differences of opinion about the proper splits—what should go where.
There is one section of the paper that eloquently speaks to current architecture, however. The authors describe a situation where routing and packet filters are used together to prevent one set of hosts from reaching another set of hosts. Changes in the network, however, cause the packet filters to be bypassed, opening up communications between these two sets of hosts.
This is exactly the problem we so often face in network engineering today—overlapping systems used to solve a single problem do not pay attention to the same signals or information to do their jobs. So here’s a thought about an obvious way to reduce the complexity of your network—try to use one tool to do one job. Before the days of automation, this was much harder to do. There was no easy way to distribute QoS configurations or access lists; in many cases, there was no way at all. Because of this, it made some kind of sense to use routing protocols as a sort of distributed database and policy engine to move filters and the like around.
Today, however, we have automation. Because of this, it makes more sense to use automation to manage as much data plane policy as you can, leaving the routing protocol to do its job—provide reachability across an ever-changing network. There are still things, like traffic steering and prefix distribution rules, which should stay inside routing. But when you put routing filters in place to solve a data plane problem, it might be worth thinking about whether that is the right thing to do any longer.
Automation, in this case, can change everything.
Random Thoughts

This week is very busy for me, so rather than writing a single long post, I’m throwing together some things that have been sitting in my pile to write about for a long while.
From Dalton Sweeny:
This is precisely the way network engineering is. There is value in the kinds of knowledge that expire, such as individual product lines, etc.—but the closer you are to the configuration, the more ephemeral the knowledge is. This is the entire point of rule 11 is your friend. Learn the foundational things that make learning the ephemeral things easier. There are only four problems (really) in moving data from one place to another. There are only around four solutions for each of those problems. Each of those solutions is bounded into a small set (again, about four for each) of sub-solutions, or ways of implementing the solution, and so on.
I’m going to spend some time talking about this in the LiveLesson I’m currently recording, so watch this space for an announcement sometime early next year about publication.
From Ivan P:
There are two ways to look at this. Either vendors should lead the market in building solutions, or they should follow whatever the customer wants. From my perspective, one of the problems we have right now is everything is a massive mish-mash of these two things. The operator’s design team thinks of a neat way to do X, and then promises the account team a big check if it’s implemented. It doesn’t matter that X could be solved some other way that might be simpler, etc.—all that matters is the check. In this case, the vendor stops challenging the customer to build things better, and starts just acting like a commodity provider, rather than an innovative partner.
The interaction between the customer and the vendor needs to be more push-pull than it currently is—right now, it seems like either the operator simply dictates terms to the vendor, or the vendor pretty much dictates architecture to the operator. We need to find a middle ground. The vendor does need to have a solid solution architecture, but the architecture needs to be flexible, as well, and the blocks used to build that architecture need to be usable in ways not anticipated by the vendor’s design folks.
On the other hand, we need to stop chasing features. This isn’t just true of vendors, this is true of operators as well. You get feature lists because that’s what you ask for. Often, operators ask for feature lists because that’s the easiest thing to measure, or because they already have a completely screwed up design they are trying to brownfield around. The worst is—“we have this brownfield that we just can’t get rid of, so we want to build yet another overlay on top, which will make everything simpler.” After about the twentieth overlay a system crash becomes a matter of when rather than if.
