network design

Giving the Monkey a Smaller Club

Over at the ACM blog, there is a terrific article about software design that has direct application to network design and architecture.

The problem is that once you give a monkey a club, he is going to hit you with it if you try to take it away from him.

What do monkeys and clubs have to do with software or network design? The primary point of interaction is security. The club you intend to make your network operator’s life easier is also a club an attacker can use to break into your network, or damage its operation. Clubs are just that way. If you think of the collection of tools as not just tools, but also as an attack surface, you can immediately see the correlation between the available tools and the attack surface. One way to increase security is to reduce the attack surface, and one way to reduce the attack surface is to reduce the number of tools, or the size of the club.

The best way to reduce the attack surface of a piece of software is to remove any unnecessary code.

Consider this: the components of any network are actually made up of code. So to translate this to the network engineering world, you can say:

The best way to reduce the attack surface of a network is to remove any unnecessary components.

What kinds of components? Routing protocols, transport protocols, and quality of service mechanisms come immediately to mind, but the number and kind of overlays, the number and kind of virtual networks might be further examples.

There is another issue here that is not security related specifically, but rather resilience related. When you think about network failures, you probably think of bugs in the code, failed connectors, failed hardware, and other such causes. The reality is far different, however—the primary cause of network failures in real life is probably user error in the form of misconfiguration (or misconfiguration spread across a thousand routers through the wonders of DevOps!). The Mean Time Between Mistakes (MTBM) is a much larger deal than most realize. Giving the operator too many knobs to solve a single problem is the equivalent of giving the monkey a club.

Simplicity in network design has many advantages—including giving the monkey a smaller club.

Responding to Readers: Automated Design?

Deepak responded to my video on network commoditization with a question:

What’s your thoughts on how Network Design itself can be Automated and validated. Also from Intent based Networking at some stage Network should re-look into itself and adjust to meet design goals or best practices or alternatively suggest the design itself in green field situation for example. APSTRA seems to be moving into this direction.

The answer to this question, as always, is—how many balloons fit in a bag? 🙂 I think it depends on what you mean when you use the term design. If we are talking about the overlay, or traffic engineering, or even quality of service, I think we will see a rising trend towards using machine learning in network environments to help solve those problems. I am not convinced machine learning can solve these problems, in the sense of leaving humans out of the loop, but humans could set the parameters up, let the neural network learn the flows, and then let the machine adjust things over time. I tend to think this kind of work will be pretty narrow for a long time to come.

There will be stumbling blocks here that need to be solved. For instance, if you introduce a new application into the network, do you need to re-teach the machine learning network? Or can you somehow make some adjustments? Or are you willing to let the new application underperform while the neural network adjusts? There are no clear answers to these questions, and yet we are going to need clear answers to them before we can really start counting on machine learning in this way.

If, on the other hand, you think of design as figuring out what the network topology should look like in the first place, or what kind of bandwidth you might need to build into the physical topology and where, I think machine learning can provide hints, but it is not going to be able to “design” a network in this way. There is too much intent involved here. For instance, in your original question, you noted the network can “look into itself” and “make adjustments” to better “meet the original design goals.” I’m not certain those “original design goals” are ever going to come from machine learning.

If this sounds like a wishy-washy answer, that’s because it is, in the end… It is always hard to make predictions of this kind; I’m just working off of what I know of machine learning today, compared to what I understand of the multi-variable problem of network design, which is then mushed into the almost infinite possibilities of business requirements.

What Kind of Design?

In this short video I work through two kinds of design, or two different ways of designing a network. Which kind of designer are you? Do you see one as better than the other? Which would you prefer to do, and which are you doing right now?

What is a Failure Domain?

“No, I wouldn’t do that, it will make the failure domain too large…”
“We need to divide this failure domain up…”

Okay, great—we all know we need to use failure domains, because without them our networks will be unstable, too complex, and all that stuff, right? But what, precisely, is a failure domain? It seems to have something to do with aggregation, because just about every network design book in the world says things like, “aggregating routes breaks up failure domains.” It also seems to have something to do with flooding domains in link state protocols, because we’re often informed that you need to put in flooding domain boundaries to break up large failure domains. Maybe these two things contain a clue: what is common between flooding domain boundaries and aggregating reachability information?

Hiding information.

But how does hiding information create failure domain boundaries?


If Router B is aggregating 2001:db8:0:1::/64 and 2001:db8:0:2::/64 to 2001:db8::/61, then changes in the more specific routes will be hidden from Router A. This hiding of information means a failure of one of these two more specific routes does not cause Router A to recalculate what it knows about reachability in the network. Hence a failure at 2001:db8:0:1::/64 doesn’t impact Router A, which means Router A is in a different failure domain than 2001:db8:0:1::/64. Based on this, we can venture a simple definition:

A failure domain is any group of devices that will share state when the network topology changes.
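To make the information hiding concrete, here is a minimal Python sketch using the standard `ipaddress` module. The function `advertised_to_a` is a hypothetical stand-in for Router B's aggregation policy; it assumes the aggregate is advertised as long as at least one covered route is still reachable.

```python
import ipaddress

# Prefixes taken from the example in the text.
AGGREGATE = ipaddress.ip_network("2001:db8::/61")
SPECIFIC_1 = ipaddress.ip_network("2001:db8:0:1::/64")
SPECIFIC_2 = ipaddress.ip_network("2001:db8:0:2::/64")


def advertised_to_a(reachable):
    """Hypothetical model of Router B: advertise only the aggregate,
    and only while at least one covered more specific route is up."""
    covered = [net for net in reachable if net.subnet_of(AGGREGATE)]
    return {AGGREGATE} if covered else set()


# What Router A hears before and after 2001:db8:0:1::/64 fails.
before = advertised_to_a({SPECIFIC_1, SPECIFIC_2})
after = advertised_to_a({SPECIFIC_2})

# Router A only needs to recalculate if what it hears has changed.
router_a_recalculates = before != after
print(router_a_recalculates)
```

Because the set of routes Router A receives is identical in both cases, the failure stays inside the domain behind Router B.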

This definition doesn’t seem to work all the time, though. For example, what if the metric of the 2001:db8::/61 aggregate at Router B depends on the highest-cost more specific route among the routes covered (or hidden)? If the aggregate metric is taken from the 2001:db8:0:1::/64 route attached to Router C, then when that link fails, the aggregate cost will also change, and Router A will need to recalculate reachability. This situation, however, doesn’t change our definition of what a failure domain is; it just alerts us that failure domains can “leak” information if they’re not constructed carefully. In fact, we can trace this back to the law of leaky abstractions: hiding information is just a form of abstraction, and all abstractions leak information in some way to at least one other subsystem within the larger system.
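The leak can be sketched in a few lines of Python. The metric values (30 and 10) and the function `aggregate_metric` are assumptions made up for illustration; the only thing taken from the text is the policy that the aggregate inherits the highest cost among the routes it covers.

```python
def aggregate_metric(component_metrics):
    """Assumed policy: the aggregate takes the highest metric among the
    more specific routes that are still reachable (None if none are)."""
    if not component_metrics:
        return None
    return max(component_metrics.values())


# Hypothetical metrics for the two more specific routes behind Router B.
components = {"2001:db8:0:1::/64": 30, "2001:db8:0:2::/64": 10}

metric_before = aggregate_metric(components)

# The higher-cost route, 2001:db8:0:1::/64, fails.
remaining = {p: m for p, m in components.items() if p != "2001:db8:0:1::/64"}
metric_after = aggregate_metric(remaining)

# The prefix Router A sees is unchanged, but its metric is not,
# so the failure leaks through the aggregation boundary.
leaks_to_router_a = metric_before != metric_after
print(metric_before, metric_after, leaks_to_router_a)
```

If the lower-cost route had failed instead, the aggregate metric would not have moved, which is why this kind of leak is easy to miss in testing.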

Another, harder, example, might be that of the flooding domain boundary in a link state protocol. Assume, for a moment, that Router A is in Level 2, Routers C and D are in Level 1, and Router B is in both Level 1 and Level 2. Further assume no route aggregation is taking place. What will happen when 2001:db8:0:1::/64 fails? As Router B is advertising 2001:db8:0:1::/64 as if it were directly connected, Router A will see the destination disappear, but it will not see the network topology change. The state of the topology seems to be in one failure domain, while the state of reachability seems to be in another, overlapping, failure domain. This appearance is, in fact, a reflection of reality. Failure domains can—and do—overlap in this way all the time. There’s nothing wrong with overlapping failure domains, so long as you recognize they exist, and therefore you actually look (and plan) for them.

Finally, consider what happens if some link attached to Router A fails. Unless routes are being intentionally leaked into the Level 1 flooding domain at Router B, Router C won’t see any changes to the network, either in topology or reachability. After all, Router C is just depending on Router B’s attached bit to build a default route it uses to reach any destination outside the local flooding domain. This means failure domains can be asymmetric. What breaks a failure domain for one router doesn’t always break it for another. Again, this is okay, so long as you’re aware of this situation, and recognize it when and where it happens.

So given these caveats, the definition of a failure domain above seems to work well. We can refine it a little, but the general idea of a failure domain as a set of devices that will (or must) react to a change in the state of the network is a good place to start.

The Design Mindset (5)

So far, in our investigation of the design mindset, we’ve—

We also considered the problem of interaction surfaces in some detail along the way. This week I want to wrap this little series up by considering the final step in design, act. Yes, you finally get to actually buy some stuff, rack it up, cable it, and then get to the fine joys of configuring it all up to see if it works. But before you do… A couple of points to consider.

It’s important, when acting, to do more than just, well, act. It’s right at this point that it’s important to be metacognitive, to think about what we’re thinking about. Or, perhaps, to consider the process of what we’re doing as much as actually doing it. To give you two specific instances…

First, when you’re out there configuring all that new stuff you’ve been unpacking, racking/stacking, and cabling, are you thinking about how to automate what you’re doing? If you have to do it more than once, then it’s probably a candidate for at least thinking about automating. If you have to do it several hundred times, then you should have spent that time automating it in the first place. But don’t just think automation; there’s nothing wrong with modifying your environment to make your production faster and more efficient. I have sets of customized tool sets, macros, and work flows I’ve built in common software like MS Word and Corel Draw that I’ve used, modified, and carried from version to version over the years. It might take me several hours to build a new ribbon in a word processor, or write a short script that does something simple and specific, but spending that time, more often than not, pays itself back many times over as I move through getting things done.

In other words, there is more to acting than just acting. You need to observe what you’re doing, describe it as a process, and then treat it as a process. As Deming once said—If you can’t describe what you are doing as a process, you don’t know what you’re doing.

Second, are you really thinking about what you’ll need to measure for the next round of observation? This is a huge problem in our data driven world—

Perhaps the greatest challenge facing the big data world is the recognition that data analysis is not the same thing as question answering.

Being data driven is important, but we can get so lost in doing what we’re doing that we forget what we actually set out to do. We get caught up in the school of fish, and lose sight of the porpoise. Remember this: when you’re acting, always think about what you’re going to be doing next, which is observing. The more you work on being able to observe, the more you should think about what you’re going to need to observe, and why.

The Design Mindset (4)—Interaction Surfaces

Before talking about the final point in the network design mindset, act, I wanted to answer an excellent question from the comments on the last post in this series: what is a surface?

The concept of interaction surfaces is difficult to grasp primarily because it covers such a wide array of ideas. Let me try to clarify by giving a specific example. Assume you have a single function that—

  • Accepts two numbers as input
  • Adds them
  • Multiplies the resulting sum by 100
  • Returns the result

This single function can be considered a subsystem in some larger system. Now assume you break this single function into two functions, one of which does the addition, and the other of which does the multiplication. You’ve created two simpler functions (each one only does one thing), but you’ve created an interaction surface between the two functions—you’ve created two interacting subsystems within the system where there only used to be one. This is a really simple example, I know, but consider a few more that might help.
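The split described above can be written out directly. This is only an illustrative sketch; the function names are made up for the example.

```python
def add_and_scale(a, b):
    """One subsystem: adds two numbers and multiplies the sum by 100."""
    return (a + b) * 100


def add(a, b):
    """First of two simpler subsystems: only adds."""
    return a + b


def scale(total):
    """Second subsystem: only multiplies by 100.
    It silently depends on add() handing it a single number."""
    return total * 100


def add_then_scale(a, b):
    """The interaction surface is the value passed from add() to scale()."""
    return scale(add(a, b))


print(add_and_scale(2, 3), add_then_scale(2, 3))
```

Both versions return the same result, but in the second one a change to what `add` returns (a rounded value, a tuple, a different unit) now has to be coordinated with `scale`. That coordination is the interaction surface.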

  • The routing information carried in OSPF is split up into external routes being carried in BGP, and internal routes being carried in OSPF. You’ve gone from one system with more state to two systems with less state, but you’ve created an interaction surface between the two protocols—they must now work together to build a complete forwarding table.
  • A single set of hosts with different access policies are split onto multiple virtual topologies on the same physical network. You’ve simplified the amount of state in filtering, but you’ve created an interaction surface between the two virtual topologies, between the two topologies and the control plane, and you’ve exposed new shared risk groups where a single physical failure can cause multiple logical ones. Hence you’ve traded state in one control plane for interaction surfaces between multiple control planes.

Even two routers communicating within a single control plane can be considered an interaction surface. This breadth of definition is what makes it so very difficult to define what an interaction surface is. To understand how interaction surfaces cause technical debt, I want to point you to a recent paper on machine learning and technical debt.

In this paper, we focus on the system-level interaction between machine learning code and larger systems as an area where hidden technical debt may rapidly accumulate. At a system-level, a machine learning model may subtly erode abstraction boundaries. It may be tempting to re-use input signals in ways that create unintended tight coupling of otherwise disjoint systems. Machine learning packages may often be treated as black boxes, resulting in large masses of “glue code” or calibration layers that can lock in assumptions. Changes in the external world may make models or input signals change behavior in unintended ways, ratcheting up maintenance cost and the burden of any debt. Even monitoring that the system as a whole is operating as intended may be difficult without careful design.

Most systems are designed for a specific “world,” or set of circumstances at a specific point in time. As this “world” changes (over time), subsystems are sheared off and replaced, requirements are changed for each individual subsystem, and external interfaces the original designer counted on are changed and/or replaced to meet updated requirements.

Interaction surfaces aren’t a bad thing; they help us divide and conquer in any given problem space, from modeling to implementation. At the same time, interaction surfaces are all too easy to introduce without thought, hence their deep connection to technical debt.

If you want to learn more about interaction surfaces, you should really pick up my book on network complexity, here.

Next time, I’ll (hopefully) finish this series on the design mindset.

The CORD Architecture

Edge provider networks, supporting DSL, voice, and other services to consumers and small businesses, tend to be more heavily bound by vendor specific equipment and hardware centric standards. These networks are built around the more closed telephone standards, rather than the more open internetworking standards, and hence they tend to be more expensive to operate and manage. As one friend said about a company that supplies this equipment, “they just print money.” The large edge providers, such as AT&T and Verizon, however, are not endless pools of money. These providers are in a squeeze between the content providers, who are taking large shares of revenue, and consumers, who are always looking for a lower price option for communications and new and interesting services.

If this seems like an area that’s ripe for virtualization to you, you’re not alone. AT&T has been working on a project called CORD for a few years in this area; they published a series of papers on the topic that make for interesting reading:

On the last site, there is an actual reference implementation document that walks through much of the hardware they’ve selected. The documents certainly push every “modern” idea in the stack, including OpenStack, OpenFlow, Docker containers, and commodity/white box hardware.

My impressions?

First, I’m not convinced OpenFlow is going to represent the best set of tradeoffs possible at scale, even if it can truly scale to tens of thousands of devices. No matter how magical centralizing the control plane might seem in terms of simplicity and ease of management, the control plane is, and always will be, akin to a database, and hence will be subject to the rules of the CAP theorem. Telco operators are, of course, still more comfortable in the centralized management end of things, so they might be willing (and potentially even able) to make the trade offs required to centralize the control plane. This isn’t going to set a wide pattern for the rest of the world, where a hybrid model of some kind is still going to be a better fit.

Second, nothing in the paper discusses the problems of hardware abstraction and common management among the various white boxes. If there’s one thing I’ve seen up close and personal since moving to a hyper scaler, it’s that one of the more difficult problems to face in the wild is that you either lose performance with a single common interface across a range of chipsets, or you need to find a way to manage the multiple chipset interfaces, including having a plan for future changes. There is a practical limit to the number of chipsets you can support either way, and a practical limit on the number of devices you can run an open software package on efficiently. These are just the realities of life intruding on the whole “white box” game—you’re moving from buying everything from Compaq to caring about who makes the chipset inside. I don’t know if this piece of the puzzle is being glossed over, or if they’ve already faced this reality in the hardware reference platform choice (how much of the hardware platform choice is being guided by this problem).

Third, I wonder how much efficiency in processing and network utilization they’re giving up to get rid of these racks of proprietary equipment. Again, there’s little mention of the problem in the papers I’ve read so far, but clearly there’s going to be some additional bandwidth usage and trombone routing across these fabrics. What is the impact on services, quality of service, and other “stuff?” It would be interesting to see how these questions are worked out in real deployments.

Finally, the information provided in the papers all points to a small spine & leaf at the pod level. You can be certain these are being pulled onto larger spine and leaf fabrics in local points of presence, data centers, or foglets (whatever you want to call the things any more), but there’s little mention of the overall network architecture in the public information I’ve seen. Providers will be providers, after all; the network architecture, overall, is still considered a fairly strategic piece of information.

Nonetheless, if you want a broad idea of how NFV, white box, and other interesting ideas are proposed to play out in the world of large scale edge providers, this is an interesting area to read in.

The Design Mindset (3)

So you’ve spent time asking what, observing the network as a system, and considering what has actually been done in the past. And you’ve spent time asking why, trying to figure out the purpose (or lack of purpose) behind the configuration and design choices made in the past. You’ve followed the design mindset to this point, so now you can jump in and make like a wrecking ball (or a bull in a china shop), changing things so they’re better, and the new requirements you have can fit right in. Right?


As an example, I want to take you back to another part of a story I told here about my early days in the networking world. Before losing the war over Banyan Vines, I actually encountered an obstacle that should have been telling—but I was too much of a noob at the time to recognize it for the warning it really was. At the time, I had written a short paper comparing Vines to Netware; the paper was, perhaps, ten pages long, and I thought it did a pretty good job of comparing the two network operating systems. Heck, I’d even put together a page showing how Vines was a better fit with the (then mandated) OSI model (don’t ask how you mandate a model, it’s an even longer story).

I proudly ran off 20 or 30 copies of my paper, and passed them around to various folks. At a follow-up meeting, one of the more experienced folks said: “This is a great paper, if you’re just trying to justify your choice. If you’re actually trying to compare the two solutions, however, it’s not so great.” Ouch.

The problem is this is precisely what we do most of the time. Just like the “what versus why” in the first two steps, we tend to have solutions we’re comfortable with, we read about in that rag in the airport lounge, we heard about at some conference, or we really want to learn because it’s new and shiny (or it helps us study for a certification—I once worked on a network where the admins changed routing protocols because they wanted to study for the CCIE). We all worry about our skill set going stale, and we’ve all thought, at one time or another, that if we don’t deploy this new technology, we won’t have the skill set we need to find that next job. The fear of being left out drives far too many of our decisions.

How do you prevent yourself from going down one of these side alleys and making a bad decision? Even worse, how do you prevent incurring technical debt by reaching too far in the other direction—not being bold about getting rid of the old clutter that’s ossified into your network over the years and replacing it with newer ideas?

I’m going to suggest, as always, that we return to the complexity model, carefully asking about each side of the triangle to find the answers we need to decide what we should do. Specifically, ask—

  • What state (the amount of state and the speed at which it changes) will this new technology add to the network overall, and to already existing systems in the network?
  • What is being optimized for (here you need to go back to the business drivers), and why?
  • What new interaction surfaces am I creating, or which interaction surfaces am I making deeper? Where am I increasing the dependencies between two existing systems, and where am I adding new ones? Make certain you look for leaky abstractions here, as all abstractions leak in some way.

Remembering this simple rule of thumb might help: If you’ve not found the tradeoffs, you’ve not looked hard enough.

The Design Mindset (2)

In a comment from last week’s post on the design mindset, which focuses on asking what through observation, Alan asked why I don’t focus on business drivers, or intent, first. This is a great question. Let me give you three answers before we actually move on to asking why?

Why can yuor barin raed tihs? Because your mind has a natural ability to recognize patterns and “unscramble” them. In reality, what you’re doing is seeing something that looks similar to what you’ve seen before, inferring that’s what is meant now, and putting the two together in a way you can understand. It’s pattern recognition at its finest; you’re already a master at this, even if you think you’re not. This is an important skill for assessing the world and reacting in (near) real time; if we didn’t have this skill, we wouldn’t be able to tolerate the information inflow we actually receive on a daily basis.

The danger is, of course, that you’re going to see a pattern you think you recognize and skip to the next thing to look at without realizing that you’ve mismatched the pattern. These pattern mismatches can be dangerous in the real world—like the time I bumped against an engine part that was so hot it felt cool, leaving me with a permanent scar on my leg. So the point of “observe first” is to deal with reality as it is on the ground, rather than seeing the pattern, inferring the intent, and moving on to the “next thing.”

Once you’ve observed, it’s time to try and understand why. When you’re asking why, you don’t ever want to stop with the obvious answer. Instead, you want to be like the pesky eight year old who’s discovered that “why” is the ultimate question to drive your parents nuts.

“Why is this aggregation configured here?”
“Because we needed to break up the failure domain.”
“Why did you need to break up the failure domain just here?”
“Because we thought it was too big.”
“Why did you think the failure domain was too big?”
“Because we had a convergence problem once around that area.”
“Why did you care about the speed at which the network converges?”
“Because we have this application, you see… And if you don’t stop asking why, I’m going to slap you silly!”

Why is a multilevel question; ultimately you want to get back to the actual business driver for any particular item of configuration. In the end, if you can’t connect a configuration to a business driver (and don’t settle for, “it’s a best common practice,” by the way), then you need to set that bit of configuration or reality aside in a special pool to be considered later. Using this process, you’re likely to find a lot of stuff that might not need to be there. By making the connections, you might be able to find another way to look at the problem that will help you radically simplify the design.

What’s often hiding behind the why that can’t be connected to a specific business driver is either “because we could” or “because we know that technology.” The time that I worked through converting a network from OSPF to IS-IS because several folks on the networking staff were studying for the CCIE comes to mind…

The complexity model can, as always, help guide your why questions, specifically focusing on optimization, as this is where you’re most likely going to match network design with actual business requirements. Within the complexity model, of course, you’re going to be trading off optimization against state and surface, so the process is going to look something like this most of the time:


Business drivers often lead to primarily optimization requirements, to which the designer can respond either by increasing the amount or speed of state in the network, or by adding overlays and other systems, which in turn increases the surfaces in the network. At some point, someone cries “uncle!,” and says, “it’s time to reduce complexity here, because this network is eating our OPEX!” This is where really understanding why starts to prove useful, because it allows you to start seeing where optimization can be realistically traded off against simplicity by rethinking the relationship between optimization, state, and surface.

We’ll consider this more deeply when we get to the decision phase in the next post.

The Design Mindset (1)

How does a network designer, well, actually design something? What process do you use as a designer to get from initial contact with a problem to building a new design to deploying a solution? What is the design mindset? I’ve been asking myself just this question these last few months, going through old documentation to see if I can find a pattern in my own thinking that I could outline in a way that’s more definite than just “follow my example.” What I discovered is my old friends the OODA loop and the complexity model are often in operation.

So, forthwith, a way to grab hold of a designer mindset, played out in an unknown number of posts.

Begin with observe. Observation is the step we often skip, because we’ve either worked on the network for so long “we don’t need to,” or we’re “so experienced we know what to look for.” This is dangerous. Let me give you an example.

A long time ago, in a small shire on the borders of reality (it seems now), I worked on a piece of equipment we called the funnyman. Specifically, this was the FNM-1, which was used to detect runway visibility range (the distance from which you could see a light set to a specific intensity, or brightness). I don’t want to dig too much into how this worked (it did involve drum memory, though, which the RVR-400, a similar unit, replaced with diode memory, so-to-speak; the first tech order in the series is here), but there was a reason we called it the funnyman. You would trace through any given circuit (discrete component digital logic, yay!), until you came to an inverter. “Well,” you’d say to yourself, “I know what an inverter does, so I know what the signal should look like on the other end of this path.” So you check, and find it’s not. How did that happen? Because, grasshopper, you didn’t really observe. If you trace the circuit a bit farther, you’d find a second inverter (and sometimes even a third, something we called the brother-in-law effect, because we figured the brother-in-law of the guy who designed this thing must have supplied the components).

But what should the designer looking into a design problem observe? That’s precisely the right question: what? Good questions might be something like—

  • What protocol has been deployed?
  • Where is information being hidden?
  • Where are the failure domains?
  • Where is redundancy introduced in the network?
  • Where are there failure points?
  • What quality of service is applied where?

Do not, at this stage in the game, even think about asking why any of this is. You need to clearly separate “what” from “why;” once you start down the “why” question, you’re going to get lost in a narrow silo of thought, which is going to cut your observation short, and hence you’ll miss the second (or third) inverter in that circuit. It’s very important to truly observe what has been done carefully, and as fully as possible.

Let me give you a hint about what to ask “what” about: the complexity model.

You should specifically ask—

  • What is generating the state?
  • How fast is the state changing?
  • What are the interaction surfaces?
  • What is being optimized for?

If you ask about each of the three items in the complexity model—state, surface, optimization—it will help you uncover much more of the what you need to be observing. The first step in the design mindset, then, is to observe and ask what.

P.S. If you really want to get your CCDE, you need to read this series of posts.

Slicing and Dicing Flooding Domains (2)

The first post in this series is here.

Finally, let’s consider the first issue, the SPF run time. First, if you’ve been keeping track of the SPF run time in several locations throughout your network (you have been, right? Right?!? This should be a regular part of your documentation!), then you’ll know when there’s a big jump. But a big jump without a big change in some corresponding network design parameter (size of the network, etc.), isn’t a good reason to break up a flooding domain. Rather, it’s a good reason to go find out why the SPF run time changed, which means a good session of troubleshooting what’s probably an esoteric problem someplace.

Assume, however, that we’re not talking about a big jump. Rather, the SPF run time has been increasing over time, or you’re just looking at a particular network without any past history. My rule of thumb is to start really asking questions when the SPF run time gets to around 100ms. I don’t know where that number came from—it’s a “seat of the pants thing,” I suppose. Most networks today seem to run SPF in less than 10ms, though I’ve seen a few that seem to run around 30ms, so 100ms seems excessive. I know a lot of people do lots of fancy calculations here (the speed of the processor and the percentage of processor used for other things and the SPF run time and…), but I’m not one for doing fancy stuff when a simple rule of thumb seems to work to alert me to problems going into a situation.
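The two cases above (a sudden jump versus a run time that has crept up to the rule-of-thumb level) can be sketched as a small check over recorded SPF run times. This is only an illustration: the function name, the status strings, and the `jump_factor` of 3 are assumptions made for the example; only the 100ms threshold comes from the rule of thumb above.

```python
def assess_spf_run_times(samples_ms, jump_factor=3.0, threshold_ms=100.0):
    """Classify a history of SPF run times (oldest first, in milliseconds).

    jump_factor is an assumed definition of a "big jump": the latest sample
    is at least this many times the previous one.
    """
    if not samples_ms:
        return "no data"

    latest = samples_ms[-1]

    # A big jump calls for troubleshooting, not for splitting the domain.
    if len(samples_ms) >= 2:
        previous = samples_ms[-2]
        if previous > 0 and latest / previous >= jump_factor:
            return "sudden jump: troubleshoot"

    # Gradual growth (or no history) that reaches the rule-of-thumb level.
    if latest >= threshold_ms:
        return "at threshold: start asking questions"

    return "ok"
```

For instance, a history of 8, 9, 10 comes back as "ok", 10 followed by 60 is flagged as a sudden jump, and 70, 85, 100 reaches the threshold without a jump.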

But before reaching for my flooding domain slicing tools because of a 100ms SPF run time, I’m going to try and bring the time down in other ways.

First, I’m going to make certain incremental and partial SPF are enabled. There’s little to no cost here, so just do it. Second, I’m going to look at using exponential timers to batch up large numbers of changes. Third, I’m going to make certain I’m removing all the information I can from the link state database—see the answer to the third question on the LSDB size, above.

Once all of this is done, keeping in mind that you need to consider the trade-offs (if you don’t see the trade-offs, you’re not looking hard enough), I would consider splitting the flooding domain. If it sounds like I would never split a flooding domain for purely performance or technical reasons, you’ve come to the right conclusion on reading these two posts.

All that said, let me tell you the real reasons I would split a flooding domain.

First, just to make my life easier when troubleshooting the network. The router has a much larger capacity for looking through screens full of link state information than I do. At 2AM, when the network is down, any little advantage I can give myself to troubleshoot the network faster is worth considering.

Second, again, to make my life easier in the troubleshooting process. Go back and think about the OODA loop. Where can I observe the network to best understand what’s going on? If you thought, “at the flooding domain boundary,” you earn a gold star. You can pick it up at the local office supply store.

Third, to break apart the network in case of a real failure—to provide a “firewall” (in the original sense of the word, rather than the appliance sense) to keep one part of the network from going down when another part falls apart.

Finally, to provide a “choke point” where you can implement policy.

So in the end—you shouldn’t build the world’s largest flooding domain just because you can, and you shouldn’t build a ton of tiny flooding domains just because you can. The technical reasons for slicing and dicing a flooding domain aren’t really that strong, but don’t discount using flooding domains on a more practical level.

Slicing and Dicing Flooding Domains (1)

This week two different folks have asked me about when and where I would split up a flooding domain (IS-IS) or area (OSPF); I figured a question asked twice in one week is worth a blog post, so here we are…

Before I start on the technical reasons, I’m going to say something that might surprise long time readers: there is rarely any technical reason to split a single flooding domain into multiple flooding domains. That said, I’ll go through the technical reasons anyway.

There are really three things to think about when considering how a flooding domain is performing:

  • SPF run time
  • flooding frequency
  • LSDB size

Let’s look at the third issue first, the database size. This is theoretically an issue, but it’s really only an issue if you have a lot of nodes and routes. I can’t ever recall bumping up against this problem, but what if I did? I’d start by taking the transit links out of the database entirely—for instance, by configuring all the interfaces that face actual host devices as passive interfaces (which you should be doing anyway!), and configuring IS-IS to advertise just the passive interfaces. You can pull similar tricks in OSPF. Another trick here is to make certain point-to-point Ethernet links aren’t electing a DIS or DR; this just clogs the database up with meaningless information.

The second issue, the flooding frequency, is more interesting. Before I split a flooding domain because there is “too much flooding,” I would want to look at several things to make certain I’m not doing a lot of work for nothing. Specifically, I would want to look at:

  • Why am I getting all these LSAs/LSPs? A lot of flooding means a lot of changes, which generally means instability someplace or another. I would either want to be able to justify the instability or stop it, rather than splitting a flooding domain to react to it. Techniques I would look at here include interface dampening (if it’s available) and roping off a flapping network behind a nailed up redistributed route of some sort.
  • If the rate of flooding can only be controlled to some degree, or it’s valid, then I would want to look at how I can configure the network to control the flooding in a way that makes sense. Specifically, I’m going to look at using exponential backoff to manage bursts of flooding events while keeping my convergence time down as much as I can, and I’m going to consider my LSP generation intervals to make certain I account for bursts of changes on a single intermediate system. This is where we get into tradeoffs, however—at some point you need to ask if tuning the timers is easier/simpler than breaking the flooding domain into two flooding domains, particularly if you can isolate the bursty parts of the network from the more stable parts.

There are probably few networks in the world where tuning flooding will not hold the rate of flooding down to a reasonable level.
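To see why exponential backoff absorbs bursts while keeping the first reaction fast, here is a minimal sketch of the idea. It assumes a generic three-value timer (an initial wait, a second wait that doubles on each further event, and a maximum); the function and parameter names are made up for the example, real implementations differ in the details, and the reset after a quiet period is not modeled.

```python
def backoff_waits(initial_ms, increment_ms, max_ms, events):
    """Return the wait applied before reacting to each back-to-back event.

    The first event is handled after initial_ms. Each later event waits
    twice as long as the one before it, starting at increment_ms and
    capped at max_ms.
    """
    waits = []
    for n in range(events):
        if n == 0:
            wait = initial_ms
        else:
            wait = min(increment_ms * 2 ** (n - 1), max_ms)
        waits.append(wait)
    return waits


# Seven events in a row with assumed values of 50ms, 200ms and 5000ms.
print(backoff_waits(50, 200, 5000, 7))
# [50, 200, 400, 800, 1600, 3200, 5000]
```

A single change is handled after the short initial wait, so convergence stays fast, while a burst is pushed out towards the maximum, so the router does less work while the network is unstable.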

Continued next week…


Liskov Substitution and Modularity in Network Design

Furthering the thoughts I’ve put into the forthcoming book on network complexity…

One of the hardest things for designers to wrap their heads around is the concept of unintended consequences. One of the definitional points of complexity in any design is the problem of “push button on right side, weird thing happens over on the left side, and there’s no apparent connection between the two.” This is often just a result of the complexity problem in its base form — the unsolvable triangle (fast/cheap/quality — choose two). The problem is that we often don’t see the third leg of the triangle.

The Liskov substitution principle is one of the mechanisms coders use to manage complexity in object oriented design. The general idea is this: suppose I build an object that describes rectangles. This object can hold the width and the height of the rectangle, and it can return the area of the rectangle. Now, assume I build another object called “square” that inherits from the rectangle object and overrides it to force the width and height to be the same (a square is a type of rectangle that has all equal sides, after all). This all seems perfectly normal, right?

Now let’s say I do this:

  • declare a new square object
  • set the width to 10
  • set the height to 5
  • read the area

What’s the answer going to be? Most likely 25 — because the order of operations set the height after the width, and internally the object sets the width and height to be equal, so the last value input into either field wins.
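The steps above can be sketched in a few lines of Python. The class and method names here (`Rectangle`, `Square`, `set_width`, `set_height`, `area`, `set_then_read_area`) are chosen for the example and do not come from any particular library.

```python
class Rectangle:
    """Holds a width and a height, and returns the area."""

    def __init__(self):
        self.width = 0
        self.height = 0

    def set_width(self, value):
        self.width = value

    def set_height(self, value):
        self.height = value

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """Forces width and height to be equal, so the last value set wins."""

    def set_width(self, value):
        self.width = value
        self.height = value

    def set_height(self, value):
        self.width = value
        self.height = value


def set_then_read_area(shape):
    # Written against Rectangle: set the width, set the height, read the area.
    shape.set_width(10)
    shape.set_height(5)
    return shape.area()


print(set_then_read_area(Rectangle()))  # 50
print(set_then_read_area(Square()))     # 25
```

The same four steps give 50 for a rectangle and 25 for a square, even though the caller was written with a rectangle in mind and was handed something that claims to be one.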

What’s the problem? Isn’t this what I should expect? The confusion is this: the square class is based on the rectangle class, so which behavior wins? The result is pushing a button over here, and ending up with an unexpected result over there. Taking this one step further, what if you modified the rectangle class to include depth, and then added a function that returns volume? A user might expect the square class to represent a perfectly formed cube (all sides equal), based on its behavior in the past, but that’s not what is going to happen. The solution, from a coding perspective, is to build a new class that underlies both the square and the rectangle; that is, to find a more fundamental construct, and use that as a foundation.
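One way to sketch that fix in Python is a shared base class that promises only what every shape can honor. The names here (`Shape`, `Rectangle`, `Square`, `total_area`) are illustrative assumptions, and this is one possible arrangement rather than the only one.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """The more fundamental construct: anything that can report an area."""

    @abstractmethod
    def area(self):
        ...


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Shape):
    # A sibling of Rectangle, not a child: there is no separate width and
    # height to set, so there is nothing for a caller to be surprised by.
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


def total_area(shapes):
    # Depends only on what Shape promises, so any Shape can be substituted.
    return sum(shape.area() for shape in shapes)


print(total_area([Rectangle(10, 5), Square(5)]))  # 75
```

Code written against `Shape` keeps working no matter which shape it is handed, because it never relies on behavior that only some shapes can provide.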

In general, you want to find a foundation which will not change no matter what you build on it — in other words, you want to find a foundation that, when substituted for another foundation in the future, will not modify the objects sitting on top of the foundation.

Hopefully, you’ve tracked me this far. I know this is a bit abstract, but it comes back to network design in an important way. The simplest place to see this is in the data center, where you have an underlay and an overlay. To apply Liskov’s substitution principle here, you could say, “I want to build a physical underlay that will allow me to change it in the future without impacting the overlay.” Or, “I want to be able to change the overlay without impacting how the applications run on the fabric.” Now — take this concept and apply it to the entire network, wide area to data center fabric.

You should always strive to build a physical infrastructure that can be replaced without impacting the control plane. You should also strive to build a control plane that can be replaced without impacting the operation of the applications running on the network. Just like you should be able to replace the physical layer under IP, and not impact the operation of TCP on top in any meaningful way.
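The same idea can be sketched in code, with the caveat that a real fabric is far messier than a toy like this. Everything here is hypothetical: `Underlay`, `LeafSpineFabric`, `RingFabric`, `Overlay`, and the `deliver` and `send` methods are invented for the example. The point is only that the overlay is written against a narrow interface, so the underlay can be swapped without touching it.

```python
from typing import Protocol


class Underlay(Protocol):
    """The narrow interaction surface the overlay is allowed to rely on."""

    def deliver(self, src: str, dst: str, payload: bytes) -> bytes:
        ...


class LeafSpineFabric:
    def deliver(self, src: str, dst: str, payload: bytes) -> bytes:
        # Path selection would happen here; the payload arrives unchanged.
        return payload


class RingFabric:
    def deliver(self, src: str, dst: str, payload: bytes) -> bytes:
        # A different topology behind the same promise.
        return payload


class Overlay:
    def __init__(self, underlay: Underlay):
        self.underlay = underlay

    def send(self, tenant: str, src: str, dst: str, payload: bytes) -> bytes:
        # Tag the payload with the tenant, then hand it to whatever
        # underlay was supplied.
        tagged = tenant.encode() + b"|" + payload
        return self.underlay.deliver(src, dst, tagged)


old = Overlay(LeafSpineFabric()).send("red", "a", "b", b"hi")
new = Overlay(RingFabric()).send("red", "a", "b", b"hi")
print(old == new)  # True
```

If the overlay started reaching into fabric-specific details, the interaction surface would deepen and the substitution would stop being free.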

Now, the real world is always messier than the virtual worlds we build in our minds. Abstractions are always going to leak, and the interaction surface between any pair of underlying and overlying layers is always going to be deeper and broader than you think when you first look at the problem. None of this negates the end goals, however. Keep the interaction surfaces in a design shallow and narrow, and think through “what happens if I replace this piece with a new one later on?”

Hierarchical and modular design, by the way, already operate on these sorts of principles (in theory). They’re just rules of thumb, or design patterns, laid on top of the more foundational concepts. The closer we get to the foundational principles in play, the more we can take this sort of thinking and apply it along every interaction surface in a design, and the more we can move from black art to science in designing networks that work.

Architect or Designer?

Are you an architect or designer? What’s the difference? A reader asked this last week in email; here is my (probably) less than perfect response.

First, we have to dispense with this objection — network people aren’t “architects” in the first place. Nor are they “engineers.” Okay, so… A challenge: what else would you call someone who designs and builds things? When someone says, “You’re not a real architect, because you don’t build buildings, and you’re not held responsible for your work,” I tend to reply, “Why are you talking to me if I don’t exist?”

I’ve probably spent a lot more time than most people thinking about what the difference between design and architecture is, as it was a major issue when the CCDE and CCAr were split into two certifications (long ugly story — but then again, whenever marketing is involved, it normally is). With the help of some psychos (psychometricians, actually, but saying you worked with psychos for seven years to develop certification just sounds cooler somehow), we came up with some differentiators that I think are useful.

The difference is in focus, not task — the designer focuses on a solution to a narrower engineering problem, the architect focuses on a solution set to solve a business problem. To put it a different way, network designers focus on the seam between technologies, while network architects focus on the seam between people (or business) and technology.

To give some practical examples…

The network designer pushes back on technology choices; the network architect pushes back on the business itself. To put this another way, the network designer tries to understand what the business is trying to do; the network architect tries to get in front of where the business is going. This is why the CCDE is a practical in a lab, but the CCAr is an actual encounter with “a customer.” Candidates who don’t push back aren’t supposed to pass the CCAr.

The network designer looks at a technology toolbox and asks, “what fits?” The network architect looks at the underlying problem space, and says, “what sort of tool would fit here?” Having figured out the shape of the tool, the architect can then go out and find the tool, rather than getting trapped in the current toolbox. For instance, I expect a designer to know when to implement EIGRP and when to implement IS-IS. I expect an architect to look at the problem and say, “this requires a control plane with characteristic x,” then to go find out which solutions have that characteristic, rather than assuming a tool he already knows is the right one.

The network designer can explain the idea, the network architect can sell it.

The network designer asks, “how can I build a fabric with 5 9’s of reliability?” The network architect says, “based on these requirements, we need a fabric with 5 9’s of reliability,” or even, “I know what you read in a magazine on a plane, or what you received from someone in email, but we really don’t need 5 9’s of reliability, and here’s why…”
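It helps to put a number on what “5 9’s” actually asks for before deciding whether the business needs it. The arithmetic below is a small sketch; the function name is made up for the example, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines):
    """Yearly downtime budget for an availability of `nines` nines.

    Five nines means 99.999% available, or an unavailability of 10^-5.
    """
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for nines in (3, 4, 5):
    minutes = allowed_downtime_minutes(nines)
    print(f"{nines} nines: about {minutes:.2f} minutes of downtime per year")
```

Five nines works out to a little over five minutes of downtime per year, while three nines allows nearly nine hours. That gap is the conversation to have with the business.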

When looking for an architecture, I’m looking for goals that can be and are broken into smaller parts, and why those goals are important to the business. I don’t want to solve the goals, I want to know what they are. When looking for a design, I’m looking for how to solve the goals.

In reality, most network architects can do design — they wouldn’t be very useful if they couldn’t. But if you want to move from being a designer to being an architect, you need to learn to push back and to challenge, to see the shape of a technology that’s needed rather than just choosing something you know, and you need to ask why rather than how.

To truly cross into the architecture world, the designer needs to cross the boundary between technology and people.