If you haven’t found the tradeoff…

This week, I ran into an interesting article over at Free Code Camp about design tradeoffs. I’ll wait for a moment if you want to go read the entire article to get the context of the piece… But this is the quote I’m most interested in:

Just like how every action has an equal and opposite reaction, each “positive” design decision necessarily creates a “negative” compromise. Insofar as designs necessarily create compromises, those compromises are very much intentional. (And in the same vein, unintentional compromises are a sign of bad design.)

In other words, design is about making tradeoffs. If you think you’ve found a design with no tradeoffs, well… Guess what? You’ve not looked hard enough. This is something I say often enough, of course, so what’s the point? The point is this: We still don’t really think about this in network design. This shows up in many different places; it’s worth taking a look at just a few.

Hardware is probably the place where network engineers are most conscious of design tradeoffs. Even so, we still tend to think sticking a chassis in a rack is a "future proof, requirements proof" solution to all our network design woes. With a chassis, of course, we can always expand network capacity with minimal fuss and muss, and since the blades can be replaced, the life cycle of the chassis should be much, much longer than that of any fixed configuration unit. As for port count, it seems like it should always be easier to replace line cards than to replace or add a box to get more ports or higher speeds.

Cross posted at CircleID

But is either of these really true? While it might "just make sense" that a chassis box will last longer than a fixed configuration box, is there real data to back this up? Is it really a lower risk operation to replace the line cards in a chassis (including the brains!) with a new set, rather than building (scaling) out? And what about complexity—is it better to eat the complexity in the chassis, or the complexity in the network? To push the complexity into the network device, or into the network design? There are actually plenty of tradeoffs to consider here, as it turns out—it just sometimes takes a little out-of-the-box thinking to find them.

What about software? Network engineers tend not to think about tradeoffs here. After all, software is just that "stuff" you get when you buy hardware. It's something you cannot touch, which means you are better off buying software with every feature you think you might ever need. There's no harm in this, right? The vendor is doing all the testing, and all the work of making certain every feature they include works correctly right out of the box, so just throw the kitchen sink in there, too.

Or maybe not. My lesson here came through an experience in Cisco TAC. My pager went off one morning at 2AM because an image designed to test a feature in EIGRP had failed in production. The crash traced back to some old X.25 code. The customer didn’t even have X.25 enabled anyplace in their network. The truth is that when software systems get large enough, and complex enough, the laws of leaky abstractions, large numbers, and unintended consequences take over. Software defined is not a panacea for every design problem in the world.

What about routing protocols? The standards communities seem focused on creating and maintaining a small handful of routing protocols, each of which is capable of doing everything. After all, who wants to deploy a routing protocol only to discover, a few years later, that it cannot handle some task that you really need done? Again, maybe not. BGP itself is becoming a complex ecosystem with a lot of interlocking parts and pieces. What started as a simple idea has become more complex over time, and we now have engineers who (seriously) only know one routing protocol—because there is enough to know in this one protocol to spend a lifetime learning it.

In all these situations we have tried to build a universal where a particular would do just fine. There is another side to this pendulum, of course—the custom built network of snowflakes. But the way to prevent snowflakes is not to build giant systems that seem to have every feature anyone can ever imagine.

The way to prevent landing on either extreme—in a world where every device is capable of anything, but cannot be understood by even the smartest of engineers, and a world where every device is uniquely designed to fit its environment, and no other device will do—is to consider the tradeoffs.

If you haven’t found the tradeoffs, you haven’t looked hard enough.

A corollary rule, returning to the article that started this rant, is this: unintentional compromises are a sign of bad design.

IS-IS Multi Instance: RFC8202

One of the nice things about IS-IS is the ability to run IPv4 and IPv6 in the same protocol, over a single instance. So long as the two topologies are congruent, deploying v6 as dual stack is very simple. But what if your topologies are not congruent? The figure below illustrates the difference.

In this network, there are two topologies, and each topology has a different set of level 1/level 2 flooding domain boundaries. If topology 1 is running IPv4, and topology 2 is running IPv6, it is difficult to describe such a pair of topologies with "standard" IS-IS. The flooding process assumes the flooding domain boundaries are on the same intermediate systems, or that the two topologies are congruent.

One way to solve this problem today is IS-IS multi-topology, which allows the IPv4 and IPv6 routing information to be carried in separate TLVs, building two different Link State Databases (LSDBs), so each IS can compute a different Shortest Path Tree (SPT): one for IPv4, and another for IPv6. Some engineers find the concept of multi-topology confusing, and it seems like overkill for some use cases. For instance, perhaps you do not care about incongruent topologies, but you do care about carrying IPv6 in a separate instance of IS-IS just to see how it all works, with the eventual goal of combining IPv4 and IPv6 into a single instance of IS-IS. Or perhaps there is some new feature you want to try on a production network in a different instance of IS-IS, without impacting the IS-IS instance that provides the production routing information. There are a number of use cases, of course—you can probably imagine a few.

How can these kinds of use cases be solved in IS-IS? In EIGRP, for instance, the Autonomous System (AS) number is used as the EIGRP protocol port number on the wire, allowing multiple EIGRP processes to run in parallel on the same link (even though this capability has never been implemented, as far as I know). Some sort of parallel capability would need to be created for IS-IS; this is what RFC8202, IS-IS Multi-Instance, provides. Not only does RFC8202 provide this capability, it also illustrates an interesting use case for using TLVs in protocol design, rather than fixed length fields.

For readers not familiar with these concepts, fixed length field protocols marshal data into fields of a fixed length, each of which represents a single piece of information. The metadata required to interpret the data is carried entirely in the protocol specification; the protocol itself carries no information about the information, only the information itself. The positive attribute of fixed length fields is that the amount of information carried on the wire is minimized. The negative is that any change in the protocol requires deploying a new version throughout the network; it is difficult to "ignore" bits that are carried without introducing failures. Further, in a fixed length field format, as new information is pushed into the protocol, either new packet formats must be created and handled, or the length of any given packet must be increased.
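A minimal sketch may make this concrete. The header layout below is entirely hypothetical (the field names and sizes are invented for illustration); the point is that the layout lives only in the specification, so every sender and receiver must agree on it exactly, and adding even one field means deploying a new format everywhere.

```python
import struct

# Hypothetical fixed length header, defined entirely out-of-band:
# 1 byte version | 1 byte type | 2 byte metric | 4 byte router ID
HEADER_FORMAT = "!BBHI"  # network byte order
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes

def parse_fixed(packet: bytes) -> dict:
    """Every implementation must hard-code HEADER_FORMAT; the wire
    carries no hints about what each field means or how long it is."""
    version, ptype, metric, router_id = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE])
    return {"version": version, "type": ptype,
            "metric": metric, "router_id": router_id}

packet = struct.pack(HEADER_FORMAT, 1, 2, 100, 0x0A000001)
print(parse_fixed(packet))
```

Note that the receiver cannot skip a field it does not understand: without the out-of-band specification, the bytes are just an opaque blob.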

Type/Length/Value (TLV) formats still define the kinds of information in the specification, but they carry information about the kind of information being carried, and its size, in the protocol itself. This means the packet format is larger, but the protocol is more flexible.
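The flexibility comes from the length field: a receiver that does not recognize a type can still skip over it and keep parsing. The sketch below uses an invented two-entry type registry (the type numbers and names are assumptions for illustration, not real IS-IS TLV codes), with one byte each for type and length, to show the general walk:

```python
def parse_tlvs(data: bytes) -> dict:
    """Walk a buffer of Type/Length/Value triples. Unknown types are
    skipped using the length field, so an old receiver tolerates TLVs
    defined after it was deployed."""
    KNOWN = {1: "hostname", 7: "instance-id"}  # hypothetical registry
    offset, fields = 0, {}
    while offset + 2 <= len(data):
        tlv_type = data[offset]
        length = data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if tlv_type in KNOWN:
            fields[KNOWN[tlv_type]] = value
        # else: silently skip -- the length tells us how far to jump
        offset += 2 + length
    return fields

# One known TLV (type 1) followed by one unknown TLV (type 99)
buf = bytes([1, 3]) + b"abc" + bytes([99, 2]) + b"zz"
print(parse_tlvs(buf))  # the unknown type 99 is skipped
```

Compare this with the fixed length case: here the wire itself tells the parser how to step over anything it does not understand, at the cost of two extra bytes per field.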

In the case of RFC8202, adding this kind of multi-instance capability to a fixed length field formatted protocol would require a shift in the packet format. In a TLV based protocol, like IS-IS, new features can be added by adding a new TLV; this is precisely what RFC8202 does. To provide multi-instance capability, RFC8202 adds a new multi-instance TLV to the IS-IS PDU, which is the "outer packet format" used to carry every other kind of IS-IS information, including hellos, link state information, etc. This new TLV carries an instance ID, which differentiates each instance of IS-IS.

The instance IDs must be configured to match on each IS in order to build adjacencies. Point to point and broadcast operation work the same as in "standard" IS-IS, including Designated Intermediate System operation for each instance, etc. Each instance maintains a separate LSDB on each IS, and a separate SPT is computed across each of these LSDBs. The other key factor is implementing multiple routing tables, and then finding some way to route traffic using the correct routing table. In the case of IPv4 and IPv6, this is fairly simple to sort out, but it would be more complex in other cases.
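The per-instance separation can be sketched as follows. This is not an implementation of RFC8202 (the class and function names are invented, and the real mechanism is an Instance Identifier TLV inside the PDU); it only illustrates the rule that a hello is accepted, and an adjacency formed, only by a local instance whose instance ID matches the one received:

```python
class ISISInstance:
    """Hypothetical sketch: each instance keeps its own state."""
    def __init__(self, instance_id: int):
        self.instance_id = instance_id
        self.lsdb = {}            # separate link state database
        self.adjacencies = set()  # separate adjacency set

def process_hello(instances, received_instance_id: int, neighbor: str) -> bool:
    """Accept the hello only on the instance whose ID matches the
    instance ID carried in the received PDU; otherwise ignore it."""
    for inst in instances:
        if inst.instance_id == received_instance_id:
            inst.adjacencies.add(neighbor)
            return True
    return False  # no matching local instance: no adjacency forms

instances = [ISISInstance(0), ISISInstance(42)]
process_hello(instances, 42, "router-b")  # matches instance 42
process_hello(instances, 7, "router-c")   # dropped: no instance 7 here
```

Because each `ISISInstance` carries its own LSDB and adjacency set, an SPF run in one instance never sees state from another, which is the isolation the multi-instance use cases above rely on.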

RFC8202 adds a new and interesting capability to IS-IS. It may take some time for vendors to implement and deploy this new capability, but it should make IS-IS more flexible in real world situations where multiple interior gateway protocol instances must run on a single network.

History of Networking: Radia Perlman and Spanning Tree

On this episode of the History of Networking over at the Network Collective, we interviewed Radia Perlman about the origin of Spanning Tree. She is really delightful, and we plan on bringing her back on in the future to talk about other topics in the history of networking technology.

Forthcoming: Computer Networking Problems and Solutions

The new book should be out around the 29th of December, give or take a few days. For readers interested in what Ethan and I (and Ryan, and Pete Welcher, and Jordan Martin, and Nick Russo, and… the entire list is in the front matter) have put together, the general idea is essentially grounded in RFC1925, rule 11. There is really only a moderately sized set of problems a computer system needs to solve in order to carry data from one application to another. For instance, in order to transport data across a network, you need to somehow format the data so everyone can agree on how to write and read it, ensure the data is carried without errors, ensure neither the sender nor the receiver overruns or underruns the other, and find some way to allow multiple applications (hosts, etc.) to talk over the same media. These four problems have proper names, of course: marshaling, which involves dictionaries and grammars; error control; flow control; and multiplexing. So the first step in understanding network engineering is to figure out what the problems are, and how to break them apart.

Once you understand the problems, then you can start thinking about solutions. As it turns out, across the entire history of network engineering, there are only a handful of useful solutions for each of these problems. There are, for instance, only two ways to solve the error control problem: you can either carry enough information to correct any errors on the fly, or you can detect errors, discard the data, and (possibly) request a retransmission. There are a lot of different ways to implement either one of these solutions, but that is all implementation detail.
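The second solution, detect-and-discard, can be sketched in a few lines. The framing here is invented for illustration (a CRC32 appended to the payload, which is how many real link layers detect errors, though the exact layout varies); the key point is that the receiver only detects corruption, it never tries to repair it:

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a CRC32 so the receiver can detect (not correct) errors."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def receive(wire: bytes):
    """Detect-and-discard: on a bad CRC, drop the frame entirely and
    rely on the sender to retransmit; no repair is attempted."""
    payload, crc = wire[:-4], int.from_bytes(wire[-4:], "big")
    if zlib.crc32(payload) != crc:
        return None  # discard; a retransmission request would go here
    return payload

good = frame(b"hello")
corrupted = bytes([good[0] ^ 0xFF]) + good[1:]  # flip bits in transit
print(receive(good))       # b'hello'
print(receive(corrupted))  # None -- the frame is simply thrown away
```

The alternative solution, forward error correction, would instead send redundant information (parity or coding bits) sized so the receiver can reconstruct the original data without a round trip; the tradeoff is extra bytes on every frame versus latency on the occasional bad one.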

For each solution, an implementation is then chosen and considered in some detail (sometimes more, sometimes less). The idea is not to provide a definitive guide to every solution, but rather to give the reader a sense of how the solutions outlined have been implemented in either past or current protocols.

Overall, then, the book teaches a simple workflow:

  • What problems are being solved here?
  • What is the range of possible solutions?
  • Which solution am I looking at in this particular case, and what are the common tradeoffs for this solution?

The result is intended to teach readers how to think about these problems as much as what to think. Of course the book is not going to follow this model perfectly, nor is it going to cover every possible technology. And I'm certain there are many problems, and some solutions, we have either missed or simply left out because of space and time constraints.

Is this a new introduction? I have called it this in the past, but people who have read it say calling this an introduction is a mis-characterization. Rather, this book is more about the fundamentals you need to work as an effective network engineer. It is about finding the patterns, and then using the patterns to ask the right questions.

Some have asked me: who do you think will really read this book? While the book is designed to be usable as a college textbook, hopefully the style of writing, the level of explanation, and the range of material will be attractive to every network engineer out there. There will always be things you didn't learn about, and yet you need to know the fundamentals. Knowing the fundamentals this way should allow you to understand each space well enough to either develop a lot more knowledge quickly, or to simply know when something does not sound right.

There are four sections in the book. The first covers data transport, an area I have not written about before. The second covers the control plane; this is, I would guess, the deepest part of the book technically, at least partially because this is where I spend most of my time. The third section considers different aspects of network design. The final section moves away from the problem/solution/example format and discusses current trends in network engineering. The total page count is probably close to 600, but I've not seen the proofs, so I'm not entirely certain.

The goal here is not to become a millionaire (although my wife would really like that!). Rather, it is to present a new way of thinking about the problem: a way that can be helpful in managing the firehose of information network engineers deal with, and a way that will help engineers truly become engineers.

Again, look for a publication date around the 29th of December, give or take a few days. I am certain it will not roll over into the new year, unless some major disaster strikes, but it might be published a few days earlier than the currently planned date. I’ll post here when it is available, so you can just watch this space if you’d like.

Administravia 20171009

Where’s Russ?

This is my second week of PhD seminars this fall—the only time in this program I intend to take two seminars back to back. One of the two was, in fact, very deep philosophy, so I was pretty taxed trying to pull the material together.

At the same time, the book has passed through technical review, and is now in author review. I hope it will soon be in proofs. The combination of these two things, the book and the PhD work, along with multiple other things, is what caused me to call a pause in blogging for these two weeks. The date to watch is the 29th of December. It might be released earlier, but it is hard to tell right now. I will do a post a little later this week describing the book for those who are interested.

Tonight (Monday) I will be recording a new Network Collective show on the Intermediate System to Intermediate System (IS-IS) protocol, and we have a long list of History of Networking guests to bring on. The history material has turned out to be absolutely fascinating; I am thankful we have the connections available, and the recording venue, and someone to do the hard work of editing and posting this material. This is the kind of stuff that will help generations of network engineers to understand their craft.

This week, then, I will be back to blogging. Overall, I intend to be a little lighter on content here than I have been. I am working on several other projects that I would like to get moving in the next six months to a year, and I often think I am just pushing too much content in this one place. Writing is a habit for me; sometimes it is just too easy to push content here, rather than finding some other place it could be useful. Whenever I post someplace else, of course, I try to build a backlink to that content, so you can follow everything I am writing.

So, expect around ten posts per week from this point forward, perhaps twelve. Up until now, I have been posting at least twelve, and often fifteen, times per week.

Now, back to our regularly scheduled content.