Hints and Principles: Applied to Networks

While software design is not the same as network design, there is enough overlap for network designers to learn from software designers. A recent paper published by Butler Lampson, updating a paper he wrote in 1983, is a perfect illustration of this principle. The paper is caleld Hints and Principles for Computer System Design. I’m not going to write a full review here–you should really go read the paper for yourself–but rather just point out some useful bits of the paper.

The first really useful point of this paper is Lampson breaks down the entire field of software design into three basic questions: What, How, and When (or who)? Each of these corresponds to the goals, techniques, and processes used to design and develop software. These same questions and answers apply to network design–if you are missing one of these three areas, then you are probably missing some important set of questions you have not answered yet. Each of these is also represented by an acronym: what? is STEADY, how? is AID, and when? is ART. Let’s look at a couple of these in a little more detail to see how Lampson’s system works.

STEADY stands for simple, timely, efficient, adaptable, dependable, and yummy. Simple is just what it sounds like –reduce complexity. I’m not entirely on board with Lampson’s description of simplicity, which seems to focus on abstraction–abstraction is one useful tool, but anyone who reads my work regularly knows I’m rather more careful about abstraction than most because it involves often-unexamined tradeoffs. Timely primarily relates to “is there a market for this,” in software design; for networks it might be better put as “does the business need this now or later?” Efficient is one of those tradeoffs involved in abstraction–what I might call one of the various ways of optimizing a system. Adaptable means just what it sounds like–are you creating technial debt that must be resolved later? Dependable could be translated to resilience in network design, but it would also relate to many aspects of security, and even the jitter and delay elements in application support.

Yummy is one many network engineers will not be familiar with, but is worth considering. If I’m reading Lampson right here, another way to say this might be “easy to consume.” Why do you want your customers to be able to consume the network easily? Because you do not want them running off and using the cloud (for instance) because they find committing and understanding resources in your network so difficult. We have, for far to long, assumed that “easy to consume” in the network design world means “just plug it into the wall.” It’s not that simple.

The second one, AID, stands for approximate, incremental, and divide & conquer. These are, again, easily adaptable to network design. You don’t need to make the design perfect the first time. In fact, as a young artist one thing that was drilled into my head was that the perfect was the enemy of the good–it’s better to get it approximately right, right now, than perfectly right ten years down the road (when no-one cares any longer). Incremental speaks to modularization, scale-out, and lifecycle management, for instance.

While not every principle here can be applied, a lot of them can. Having them listed out in an easy-to-remember format like this is a great design aid–learn these, and use them.

Underhanded Code and Automation

So, software is eating the world—and you thought this was going to make things simpler, right? If you haven’t found the tradeoffs, you haven’t looked hard enough. I should trademark that or something! 🙂 While a lot of folks are thinking about code quality and supply chain are common concerns, there are a lot of little “side trails” organizations do not tend to think about. One such was recently covered in a paper on underhanded code, which is code designed to pass a standard review which be used to harm the system later on. For instance, you might see at some spot—

if (buffer_size=REALLYLONGDECLAREDVARIABLENAMEHERE) {
/* do some stuff here */
} /* end of if */

Can you spot what the problem might be? In C, the = is different than the ==. Which should it really be here? Even astute reviewers can easily miss this kind of detail—not least because it could be an intentional construction. Using a strongly typed language can help prevent this kind of thing, like Rust (listen to this episode of the Hedge for more information on Rust), but nothing beats having really good code formatting rules, even if they are apparently arbitrary, for catching these things.

The paper above lists these—

  • Use syntax highlighting and typefaces that clearly distinguish characters. You should be able to easily tell the difference between a lowercase l and a 1.
  • Require all comments to be on separate lines. This is actually pretty hard in C, however.
  • Prettify code into a standard format not under the attacker’s control.
  • Use compiler warnings in static analysis.
  • Forbid unneeded dangerous constructions
  • Use runtime memory corruption detection
  • Use fuzzing
  • Watch your test coverage

Not all of these are directly applicable for the network engineer dealing with automation, but they do provide some good pointers, or places to start. A few more…

Yoda assignments are named after Yoda’s constant placement of the subject after the verb (or in a split infinitive)—”succeed you will…” It’s not technically wrong in terms of grammar, but it is just hard enough to understand that it makes you listen carefully and think a bit harder. In software development, the variable taking the assignment should be on the left, and the thing being assigned should be on the right. Reversing these is a Yoda assignment; it’s technically correct, but it’s harder to read.

Arbitrary standardization is useful when there are many options that ultimately result in the same outcome. Don’t let options proliferate just because you can.

Use macros!

There are probably plenty more, but this is an area where we really are not paying attention right now.

Link State in DC Fabrics

If you don’t normally read IPJ, you should. Melchoir and I have an article up in the latest edition on link state in DC fabrics.

To make a case for linkstate protocols in DC fabric underlays, an extensive examination of the positive and negative aspects of BGP—and the other available protocols—is essential. Ultimately, it is up to individual operators to decide which protocol is “the best” for their application, a decision based on business and operational—as well as technical—reasons.

Read the whole thing here.

The Hedge 55: Nick Carter and Flock Networks

Nick Carter joins Tom and I to discuss Flock Networks. What is Flock Networks?

The Flock Networks IP routing suite is designed for unparalleled performance and massive scale. It has been implemented from scratch so it can excel on modern hardware. Each component of the IP routing suite is able to run in parallel, without being slowed down or interrupted by any other component.

download

You Cannot Increase the Speed of Light (RFC1925 Rule 2)

According to RFC1925, the second fundamental truth of networking is: No matter how hard you push and no matter what the priority, you can’t increase the speed of light.

However early in the world of network engineering this problem was first observed (see, for instance, Tanenbaum’s “station wagon example” in Computer Networks), human impatience is forever trying to overcome the limitations of the physical world, and push more data down the pipe than mother nature intended (or Shannon’s theory allows).

One attempt at solving this problem is the description of an infinitely fat pipe (helpfully called an “infan(t)”) described in RFC5984. While packets would still need to be clocked onto such a network, incurring serialization delay, the ability to clock an infinite number of packets onto the network at the same moment in time would represent a massive gain in a network’s ability, potentially reaching speeds faster than the speed of light. The authors of RFC5984 describe several attempts to build such a network, including black fiber, on which the lack of light implies data transmission. This is problematic, however, because a lack of information can be interpreted differently depending on the context. A pregnant pause has far different meaning than a shocked pause, for instance, or just a plain pause.

The team experimenting with faster than light communication also tried locking netcats up in boxes, but this seemed to work and not work at the same time. Finally, the researchers settled on ESP based forwarding, in which two people with a telepathic link transmit data over long distances. They compute the delay of such communication at around 350ms, regardless of the distance involved. This is clearly a potential faster than speed-of-light communication medium.

Another plausible option for building infinitely fat pipes is to broadcast everything. If you could reach an entire region in some way at once, it might be possible to build a full mesh of hosts, each transmitting to every other host in the region at the same time, ultimately constituting an infinitely fat pipe. Such a system is described in RFC6217, which describes the transmission of broadcast packets across entire regions using air as a medium. This kind of work is a logical extension of the stretched Ethernet segments often used between widely separated data centers and campuses, only using a more easily accessed medium (the air). The authors of this RFC note the many efficiencies gained from using broadcast only transmission modes, such as not needing destination addresses, the TCP three-way handshake process, and acknowledgements (which reportedly consume an inordinate amount of bandwidth).

Foreseeing the time when faster than speed-of-light networking would be possible, R. Hinden wrote a document detailing some of the design considerations for such networks which was published as RFC6921. This document is primarily concerned with the ability of the TCP three-way handshake to support an environment where the network’s speed of transmission is so much faster than the speed at which packets are processed or clocked onto the network that an acknowledgement is received before the original packet is transmitted. R. Hinden suggests that it might be possible to use packet drops in normal networks to emulate this behavior, and find some way to solve it in case faster than speed-of-light networks become generally available—such as the ESP network described in RFC5984.

More recent, and realistic, work in faster than speed-of-light networking has been undertaken by the proposed Quantum Networking Research Group in the IRTF. You can read the proposed architecture for a quantum Internet here.

Reducing Complexity through Interaction Surfaces

A recent paper on network control and management (which includes Jennifer Rexford on the author list—anything with Jennifer on the author list is worth reading) proposes a clean slate 4d approach to solving much of the complexity we encounter in modern networks. While the paper is interesting, it’s very unlikely we will ever see a clean slate design like the one described, not least because there will always be differences between what the proper splits are—what should go where.

There is one section of the paper that eloquently speaks to current architecture, however. The authors describe a situation where routing and packet filters are used together to prevent one set of hosts from reaching another set of hosts. Changes in the network, however, cause the packet filters to be bypassed, opening up communications between these two sets of hosts.

This is exactly the problem we so often face in network engineering today—overlapping systems used to solve a single problem do not pay attention to the same signals or information to do their jobs. So here’s a thought about an obvious way to reduce the complexity of your network—try to use one tool to do one job. Before the days of automation, this was much harder to do. There was no way to distribute QoS configurations, for instance, or access lists, much less what might be considered an “easy way.” Because of this, it made some kind of sense to use routing protocols as a sort of distributed database and policy engine to move filters and the like around.

Today, however, we have automation. Because of this, it makes more sense to use automation to manage as much data plane policy as you can, leaving the routing protocol to do its job—provide reachability across an ever-changing network. There are still things, like traffic steering and prefix distribution rules, which should stay inside routing. But when you put routing filters in place to solve a data plane problem, it might be worth thinking about whether that is the right thing to do any longer.

Automation, in this case, can change everything.