DESIGN – rule 11 reader

Hedge 265: Out of Band Networks

Russ — Fri, 04 Apr 2025 16:32:36 +0000

Out of band management networks were once more common than they are today. Should we go back to building out of band management networks? Should out of band management networks be virtual or physical? How can we sell out of band management networks to the folks paying the bills? Daryll Swer joins Tom Ammon and Russ White to discuss the importance of OOB management.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-265.mp3

download

Architecture and Process

Russ — Fri, 12 Apr 2024 14:37:52 +0000

Driving through some rural areas east of where I live, I noticed a lot of collections of buildings strung together being used as homes. The process seems to start when someone takes a travel trailer, places it on blocks (a foundation of sorts) and builds a spacious deck just outside the door. Over time, the deck is covered, then screened, then walled, becoming a room.

Once the deck becomes a room, a new deck is built, and the process begins anew. At some point, the occupants decide they need a place to store some sort of equipment, so they build a shed. Later, the shed is connected to the deck, the whole thing becomes an extension of the living space, and a new shed is built.

These … interesting … places to live are homes to the people who live in them. They are often, I assume, even happy homes.

But they are not houses in the proper sense of the word. There is no unifying theme, no thought of how traffic should flow and how people should live. They are a lot like the paths crisscrossing a campus—built where the grass died.

Our networks are like these homes—they are not houses so much as historical records of every new idea and vendor marketing drive. There is no architecture, there are many architectures strung together with a set of tightly wound and closely followed processes.

We need to support some new application or service? Throw a new overlay on top. There was a massive failure last night? Let’s spend hours closely examining our process and find some way to prevent the failure by adding a few new steps.

We never ask if our goals are realistic because we don’t have any goal beyond: “Let’s solve this problem right now.” We never ask if there is some future goal might be better served by using this solution or that—the future will take care of itself.

Why do we fail to attend to architecture?

Architecture is hard, and we often fail to correctly anticipate the future. This perceives architecture as a detailed plan—but there’s no reason it should be. An architecture can be a rough, and slow-changing, outline of how the network is laid out, a set of services the network supports, and a set of technologies the network will use to support those services. An architecture recognizes and defines limits as well as capabilities.

Processes are comforting. When things fail, we can always take comfort in saying: “I followed the process!”

We live in a culture of now. All problems take two hours, two days, two weeks, or too long. There is no history, there is no future, there is only an ever-present now. If I cannot have it now, it is not worth having at all.

These problems are hard to solve because they are cultural rather than technical—and the network engineering world has a strong bias towards “don’t tell me how it works, tell me how to configure it.” We present this as a problem-solving mentality, even though it causes more problems than it solves.

We need to rebalance the way we think about architectures and processes—perhaps we would get better results by combining lightweight architectures with lightweight processes, instead of relying on heavy processes with no architecture to build maintainable networks and sustainable lives.

Simple or Complex?

Russ — Tue, 19 Sep 2023 19:00:22 +0000

A few weeks ago, Daniel posted a piece about using different underlay and overlay protocols in a data center fabric. He says:

There is nothing wrong with running BGP in the overlay but I oppose to the argument of it being simpler.

One of the major problems we often face in network engineering—and engineering more broadly—is confusing that which is simple with that which has lower complexity. Simpler things are not always less complex. Let me give you a few examples, all of which are going to be controversial.

When OSPF was first created, it was designed to be a simpler and more efficient form of IS-IS. Instead of using TLVs to encode data, OSPF used fixed-length fields. To process the contents of a TLV, you need to build a case/switch construction where each possible type a separate bit of code. You must count off the correct length for the type of data, or (worse) read a length field and count out where you are in the stream.

Fixed-length fields are just much easier to process. You build a structure matching the layout of the fixed-length fields in memory, then point this structure at the packet contents in-memory. From there, you can just use the structure’s contents to directly access the data.

Over time, however, as new requirements have been pushed into IGPs, OSPF has become much more complex while IS-IS has remained relatively constant (in terms of complexity). IS-IS went through a bit of a mess when transitioning from narrow to wide metrics, but otherwise the IS-IS we use today is the same protocol we used when I first started working on networks (back in the early 1990s).

OSPF’s simplicity, in the end, did not translate into a less complex protocol.

Another example is the way we transport data in BGP. A lot of people do not know that BGP’s original design allowed for carrying information other than straight reachability in the protocol. BGP speakers can negotiate multiple sessions, with each session carrying a different kind of information. Rather than using this mechanism, however, BGP has consistently been extended using address families—because it is simpler to create a new address family than it is to define a new kind of data parallel with address families.

This has resulted in AFs that are all over the place, magic numbers, and all sorts of complexity. The AF solution is simpler, but ultimately more complex.

Returning to Daniel’s example, running a single protocol for underlay and overlay is simpler, while running two different protocols is less simple. However, I’ve observed—many times—that running different protocols for underlay and overlay is less complex.

Why? Daniel mentions a couple of reasons, such as each protocol has a separate purpose, and we’re pushing features into BGP to make it serve the role of an IGP (which is, in the end, going to cause some major outages—if it hasn’t already).

Consider this: is it easier to troubleshoot infrastructure reachability separately from vrf reachability? The answer is obvious—yes! What about security? Is it easier to secure a fabric when the underlay never touches any attached workload? Again—yes!

We get this tradeoff wrong all the time. A lot of times this is because we are afraid of what we do not know. Ten years ago I struggled to convince large operators to run BGP in their networks. Today no-one runs anyone other than BGP—and they all say “but we don’t have anyone who knows OSPF or IS-IS.” I’ve no idea what happened to old-fashioned network engineering. Do people really only have one “protocol slot” in their brains? Can people really only ever learn one protocol?

Or maybe we’ve become so fixated on learning features that we no longer no protocols?

I don’t know the answer to these questions, but I will say this—over the years I’ve learned that simpler is not always less complex.

Hedge 144: IPv6 Lessons Learned

Russ — Thu, 25 Aug 2022 17:00:58 +0000

We don’t often do a post-mortem on the development and deployment of new protocols … but here at the Hedge we’re going to brave these deep waters to discuss some of the lessons we can learn from the development and deployment of IPv6, especially as they apply to design and deployment cycles in the “average network” (if there is such at thing). Join us as James Harr, Tom Ammon, and Russ White consider the lessons we can learn from IPv6’s checkered history.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-144.mp3

download

Route Servers and Loops

Russ — Tue, 16 Aug 2022 17:00:15 +0000

From the question pile: Route servers (as opposed to route reflectors) don’t change anything about a BGP route when re-advertising it to a peer, whether iBGP or eBGP. Why don’t route servers cause routing loops (or other problems) in a BGP network?

Route servers are often used by Internet Exchange Points (IXPs) to distribute routes between connected BGP speakers. BGP route servers

Don’t change anything about a received BGP route when advertising the route to its peers (other BGP speakers)
Don’t install routes received through BGP into the local routing table

Shouldn’t using route servers in a network—pontentially, at least—cause routing loops or other BGP routing issues? Maybe a practical example will help.

Assume b, e, and s are all route servers in their respective networks. Starting at the far left, a receives some route, 101::/64, and sends it on to b,, which then sends the unmodified route to c. When c receives traffic destined to 101::/64, what will happen? Regardless of whether these routers are running iBGP or eBGP, b will not change the next hop, so when c receives the route, a is still the next hop. If there’s no underlying routing protocol, c won’t know how to reach A, so it will ignore the route and drop the traffic. Even if there is an underlying routing protocol, c’s route to 101::/64’s route passes through b, and b isn’t installing any routing information learned from BGP into its local routing table (because it’s a route server). b is going to drop traffic destined to 101::/64.

We can solve this simple problem by adding a new link between the two clients of the route server, as shown in the center diagram. Here, d sends 101::/64 to e, which then sends the unchanged route to g. Since g has a direct connection to d, we can assume g will send traffic destined to 101::/64 directly to d, where it will be forwarded to the destination. Why wouldn’t d and g peer directly instead of counting on e to carry routes between them? In most cases this kind of indirect peering is done to increase network scale. If there are thousand routes like d and g, it will be simpler for them all to peer to e than to build a full mesh of connections.

Why not use a route reflector rather than a route server in this situation? Route reflectors can only be used to carry routes between iBGP peers. If d, e, and g are all in different autonomous systems, route reflectors cannot be used to solve this problem.

But this brings us back to the original question—route reflectors use the cluster list to prevent loops within an AS (the cluster list is similar in form and function to the AS path carried between autonomous systems, but it uses router ID’s rather than AS numbers to describe the path)?

If you have multiple route servers connected to one another you can, in fact, form routing loops.

In this network, a is sending 101::/64 to b, which is then sending the route, unmodified, to e. Because of some local policy, e is choosing the path through a, which means e forwards traffic destined to 101::/64 to c. At the same time, e is advertising 101::/64 to b, which is then sending the route (unmodified) to a, and a is choosing the path through c. In this case, a permanent (persistent) routing loop is formed through the control plane, primarily because no single BGP speaker has a complete view of the topology. The two route servers, by hiding the real path to 101::/64, makes is possible to form a routing loop.

The deploy route servers without forming these kinds of loops—

BGP speakers learning routes from route servers should be directly connected—there should not be destinations reachable via some “hidden” intermediate hop
Route servers should send all the routes they learn from clients; they should not use bestpath to choose which routes to send to clients

These restrictions prevent routing loops from forming when deploying route servers—but they also restrict the use of route servers to situations like carrying routes between BGP speakers connected to a single fabric.

Cisco filed a patent some time back describing a method to prevent routing loops when using BGP route servers; it makes interesting reading for folks who want to dive a little deeper.

RFC9199: Lessons in Large-scale Service Deployment

Russ — Mon, 08 Aug 2022 18:51:55 +0000

While RFC9199 (are we really in the 9000’s?) is targeted at large-scale DNS deployments–specifically root zone operators–so it might seem the average operator won’t find a lot of value here.

This is, however, far from the truth. Every lesson we’ve learned in deploying large-scale DNS root servers applies to any other large-scale user-facing service. Internally deployed DNS recursive servers are an obvious instance, but the lessons here might well apply to a scheduling, banking, or any other multi-user application accessed from a lot of places by a lot of different users. There are some unique points in DNS, such as the relatively slower pace of database synchronization across nodes, but the network-side lessons can still be useful for a lot of applications.

What are those lessons?

First, using anycast dramatically improves performance for these kinds of services. For those who aren’t familiar with the concept, anycase turns an IP address into a service identifier. Any host with a copy (or instance) or a given service advertises the same address, causing the routing table to choose the (topologically) closest instance of the service. If you’re using anycast, traffic destined to your service will automatically be forwarded to the closest server running the application, providing a kind of load sharing among multiple instances through routing. If there are instances in New York, California, France, and Taipei, traffic from users in North Carolina will be routed to New York and traffic from users in Singapore will be routed to Taipei.

You can think of an anycast address something like a cell tower; users within a certain desintance will be “captured” by a particular instance. The more copies of the service you deploy, the smaller the geographic region the service will support. Hence you can control the number of users using a particular copy of the service by controlling the number and location of service copies.

To understand where and how to deploy service instances, create anycast catchment maps. Again, just like a wifi signal coverage map, or a cellphone tower coverage map, it’s important to understand which users will be directed to which instances. Using a catchment map will help you decide where new instances need to be deployed, which instances need the fastest links and hardware, etc. The RIPE ATLAS pobes and looking glass servers are good ways to start building such a map. If the application supports a large number of users, you might be able to convince the application developer to include some sort of geographic information in requests to help build these maps.

Third, when deploying service instance, pay as much attention to routing and connectivity as you do the number of instances deployed. As the authors note, sometimes eight instances will provide the same level of service as several thousand instances. The connectivity available into each instance of the service–bandwidth, delay, availability, etc.–still has a huge impact on service speed.

Fourth, reduce the speed at which the database needs to be synchronized where possible. Not every piece of information needs to be synchronized at the same rate. The less data being synchronized, the more consistent the view from multiple users is going to be.

RFC9199 is well worth reading, even for the average network engineer.

Hedge 134: Ten Things

Russ — Wed, 15 Jun 2022 11:51:03 +0000

One of the many reasons engineers should work for a vendor, consulting company, or someone other than a single network operator at some point in their career is to develop a larger view of network operations. What are common ways of doing things? What are uncommon ways? In what ways is every network broken? Over time, if you see enough networks, you start seeing common themes and ideas. Just like history, networks might not always be the same, but the problems we all encounter often rhyme. Ken Celenza joins Tom Ammon, Eyvonne Sharp, and Russ White to discuss these common traits—ten things I know about your network.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-134.mp3

download

Revisiting BGP Convergence

Russ — Mon, 06 Jun 2022 17:00:16 +0000

My video on BGP convergence elicited a lot of . . . feedback, mainly concerning the difference between convergence in a data center fabric and convergence in the DFZ. Let’s begin here—BGP hunt and the impact of the MRAI are very real in the DFZ. Withdrawing a route can take several minutes.

What about the much more controlled environment of a data center fabric?

Several folks pointed out that the MRAI is often set to 0 in DC fabrics (and many implementations by default). Further, almost all implementations will use an MRAI of 0 for the first received update, holding the second and subsequent advertisements by the MRAI. Several folks also pointed out that all the paths through a DC fabric are the same length, so the second part of the equation is also very small.

These are good points—how do they impact BGP convergence? Let’s use the network below, a small slice of a five-stage butterfly fabric, to think it through. Assume every router is in a different AS, so all the peering sessions are eBGP.

Start with A losing its connection to 101::/64—

T1: A withdraws its route from B and C
T2: B withdraws its route from D and E, C withdraws its route from F and G
T3: D and E withdraw their routes from H, F and G withdraw their routes from K
T4: H and K withdraw their routes from L

Note that L cannot receive one withdraw to remove the route from its local table; it must receive withdraws from both H and K. There’s no way at L to tell whether a withdraw from H means 101::/64 is no longer reachable at all or it is no longer reachable through H. For path-vector protocols, like distance-vector, the neighbor through each path must be considered independently.

What does an MRAI of 0 do? Each of the routers in the network will process the withdraw as soon as they receive it and send a withdraw to their peers as soon as they’re done processing it. The process still takes the same number of steps but each step is much faster.

What is the impact of all the paths’ equal length? So long as every router processes the withdraw at around the same speed, there is no hunt. If H and K send their withdraws simultaneously, L should receive them simultaneously and remove the route to 101::/64 from its table rather than switching from one path to the other. Even if they send their withdraws at different times, L removes entries from its ECMP table until it receives the last withdrawal.

If MRAI slows down convergence, why set it to anything other than 0? Because it’s improbable that every router in the network will process each withdraw simultaneously.

Before 101::/64 is withdrawn, H will be using the paths through D and E for ECMP, but it is only going to be advertising one of these two routes to L—say the path through E. When B sends withdraws to D and E, assume E processes the withdraw just a little faster than D. When H receives D’s withdraw, it will send an implicit withdraw to L, updating the AS path to include D rather than E. A few moments later, D sends a withdraw. H processes this withdraw and sends a withdraw to L.

L has received one implicit withdraw and one withdraw from H because of processing time differentials. In a larger fabric, with a much larger fan-out, the likelihood of differences in timing is much higher and spread across a broader range of possibilities. You can (generally) expect H to send about half as many implicit withdraws as it has paths towards the destination before sending an actual withdraw. If there are eight paths between B and H, H would likely send 3 or 4 implicit withdraws before sending a withdraw.

What if the MRAI were set to 1 second at H? H would receive E’s withdrawal and set the MRAI timer. Assuming D’s withdraw arrives within that 1-second MRAI, H will receive D’s withdraw, squash the implicit withdraw, and send a single withdraw to L instead. Setting the MRAI to something other than 0 reduces the number of updates and reduces processing.

Setting the MRAI to 1 second, and forcing it to trigger across all updates, might improve convergence time—or not. Without experimenting with setting the MRAI to different values at different places in a real network, it is hard to know. Replacing the routers, link speeds, changing processor load, and increasing memory can all have an impact on the “best” settings for optimal convergence.

the bottom line

There will be no hunt in BGP convergence in a network with multiple equal-length/equal-cost paths. This is what we should expect. Because the maximum path length minus the best (current) path length will always be 0, the network will converge as quickly as each router can process and advertise withdraws, bounded by the MRAI.

Setting the MRAI to 0 improves convergence speed at the cost of additional updates, especially in wide fan-out data center fabrics. It’s hard to know whether setting the MRAI to 0 or 1 will give you better convergence speeds; you have to try it to see.

I still think we should be moving away from BGP as our underlay protocol in all but the largest data center fabrics. IGPs (like IS-IS and RIFT) will converge more quickly, are easier to configure and manage, and using different protocols for the underlay and overlay breaks up failure and security domains in useful ways. I know I’m tilting at a windmill on this point, but still …

BGP Policies (Part 2)

Russ — Mon, 14 Mar 2022 17:00:50 +0000

At the most basic level, there are only three BGP policies: pushing traffic through a specific exit point; pulling traffic through a specific entry point; preventing a remote AS (more than one AS hop away) from transiting your AS to reach a specific destination. In this series I’m going to discuss different reasons for these kinds of policies, and different ways to implement them in interdomain BGP.

There are many reasons an operator might want to select which neighboring AS through which to send traffic towards a given reachable destination (for instance, 100::/64). Each of these examples assumes the AS in question has learned multiple paths towards 100::/64, one from each peer, and must choose one of the two available paths to forward along.

In the following network—

From AS65004’s perspective…

Transit providers primarily choose the most optimal exit from their AS to reduce the amount of peering settlement they are paying by using and maintaining settlement-free peering where possible and reducing the amount of time and distance traffic is carried through their network (through hot potato routing, discussed in more detail below).
If, for instance, AS65004 has a paid peering relationship with AS65002, and a contract with AS65003 which is settlement-free so long as the traffic between AS65004 and AS65003 is roughly symmetric. AS65004 has two roughly equal-cost paths (both have the same AS Path length) towards 100::/64. In this situation, AS 65004 is going to direct traffic towards AS65003 to maintain symmetrical traffic flows and direct any remaining traffic towards AS65002.

This kind of balancing is normally done through a controller or network management system that monitors the balance of traffic with AS65003, adjusting the preference of sets of routes to attain the correct balance with AS65003 while reducing the costs of using the link to AS65002 to the minimum possible.

From AS65005’s perspective…

AS65005 can either send traffic originating in AS65001, received from AS65002, and destined to AS65006, to either AS65004—a peer—or AS65006—a customer. The internal path between the entry point for this traffic is longer if the traffic is carried to AS65006, and shorter if the traffic is carried to AS65004. These longer and shorter paths give rise to the concepts of hot and cold potato routing.

If AS65006 is paying AS65005 for transit, AS65005 would normally carry traffic across the longer path to its border with AS65006. This is cold potato routing. AS65005’s reason for choosing this option is to maximize revenue from the customer. First, as the link between AS65005 and AS65006 becomes busier, AS65006 is likely to upgrade the link, generating additional revenue for AS65005. Even if the traffic level is not increasing, steady traffic flow encourages the customer to maintain the link, which protects revenue. Second, AS65005 can control the quality-of-service AS65006 receives by keeping the traffic within its network for as long as possible, improving the customer’s perception of the service they are receiving.
Cold potato routing is normally implemented by setting the preference on routes learned from customers, so these routes are preferred over all routes learned from peers.
If AS6006 is not paying AS65005 for transit, it is to AS65005’s advantage to carry the traffic as short a distance as possible. In this case, although AS65005 is directly connected to AS65006, and the destination is in AS65006, AS65005 will choose to direct the traffic towards its border with AS65004 (because there is a valid route learned for this reachable destination from AS65004).

This is hot potato routing—like the kids’ game, you want to hold on to the traffic for as short an amount of time as possible. Hot potato routing is normally implemented by setting the preference on routes to the same and relying on the IGP metric component of the BGP bestpath decision process to find the closest exit point.

Next week I’ll continue this series on BGP interdomain policies… feel free to leave a comment if you think I’ve explained something incorrectly, etc.

BGP Policies (part 1)

Russ — Mon, 07 Mar 2022 18:00:43 +0000

In the following network—

Examining this from AS65006’s Perspective …

Assuming AS65006 is an edge operator (commonly called enterprise, but generally just originating and terminating traffic, and never transiting traffic), there are several reasons the operator may prefer one exit point (through an upstream provider), including:

An automated system may determine AS65004 has some sort of brownout; in this case, the operator at 65006 has configured the system to prefer the exit through AS65005
The traffic destined to 100::/64 may require a class of service (such as video transport) AS65004 cannot support (for instance, because the link between AS65006 and 65005 has low bandwidth, high delay, or high jitter)

The most common way this kind of policy would be implemented is by setting the BGP LOCAL_PREFERENCE (called preference throughout the rest of this document) on routes learned from AS65005 higher than the preference on routes learned from AS65004.

Another common case is AS65006 would prefer to send traffic to AS65005 only when the destination is in an AS directly connected to AS65005 itself, while sending all other traffic through AS65004. This is common when a one provider has good local and poor global coverage, while the other provider has good global but poor local coverage.

For instance, if AS65006 is in a somewhat isolated part of the world, such as some parts of the South Pacific or Central America, there may be a local provider, such as AS65004, that has solid connectivity to most of the other edge operators in the local geographic region but charges a high cost for transiting to the rest of the global Internet. A second provider, such as AS65005, charges less to reach destinations beyond the local geographic region but is relatively expensive to use when sending traffic to other edge operators within the local region.

Preference, by itself, would be difficult to use in this case, because the operator would need to distinguish between geographically local and geographically distant routes. To implement this kind of policy, the operator would accept partial routes from the geographically local provider (AS65004 in this case) and set a high preference on these routes. Partial routes are typically those the local provider learns only from other directly connected autonomous systems, and hence would only include operators in the local geographic region. The operator would then accept full routes, or the entire Internet global routing table, from the second provider (AS65005 in this case) and set a lower preference.

An alternative way to implement geographic preference is using communities. Many transit providers mark individual reachable destinations with information about where the route originated. NTT, for instance, describes their geographic marking here. An operator can create filters using regular expressions to change the preference of a route based on its geographic origin.

This is not a common way to solve the problem because the filtering rules involved can become complex—but it might be deployed if local providers do not offer partial routes for some reason.

Another alternate to implement geographic preference is to use a regular expression filter to set the preference for each reachable destination based on the length of the AS Path. Theoretically, routes originating within the local region should have an AS Path of one or two hops, while those originating outside a region should have longer AS Paths.
This generally does not work for two reasons. First, the average length of an AS Path (after prepending is factored out) is about 4 hops in the entire global Internet—and it is easy to reach four hops even within a local region in some situations. Second, many operators prepend the AS Path to manage inbound entry point preference; these prepended hops must be factored out to use this method.

Next week I’ll continue this series on BGP interdomain policies… feel free to leave a comment if you think I’ve explained something incorrectly, etc.

Quality is (too often) the missing ingredient

Russ — Mon, 03 Jan 2022 18:00:32 +0000

Software Eats the World?

I’m told software is going to eat the world very soon now. Everything already is, or will be, software based. To some folks, this sounds completely wonderful, but—leaving aside the privacy issues—I still see an elephant in the room with this vision of the future.

Quality.

Let me give you some recent examples.

First, ceiling fans. Modern ceiling fans, in case you didn’t know, don’t rely on the wall switch and pull chains. Instead, they rely on remote controls. This is brilliant—you can dim the light, change the speed of the fan, etc., from a remote control. No unsightly chains hanging from the ceiling.

Well, it’s brilliant so long as it works. I’ve replaced three of the four ceiling fans in my house. Two of the remote controls have somehow attached themselves to two of the three fans. It’s impossible to control one of the fans without also controlling the other. They sometimes get into this entertaining mode where turning one fan off turns the other one on.

For the third one—the one hanging from a 13-foot ceiling—the remote control sometimes operates one of the other fans, and sometimes the fan its supposed to operate. Most of the time it doesn’t seem to do much of anything.

The fan manufacturer—a large, well-known company—mentions this situation in their instructions and points to a FAQ that doesn’t exist. Searching around online I found instructions for solving this problem that involve unwiring the fans and repeating a set of steps 12 times for each fan to correct the situation. These instructions, needless to say, don’t work.

There is no way to reset the remote, nor the connection between the remote and the fan. There is no way to manually select some dip switch so the remote has a specific fan it talks to. Just some mystical software that’s supposed to work (but doesn’t) and no real instructions on how to resolve the problem. The result will be a multi-hour wait on a customer support line, spending hours of my time to sort the problem out, and the joy of climbing (tall) ladders to unwire and wire ceiling fans in four different rooms.

Thinking through possible problems and building software interfaces that take those situations into account … might be a bit more important than we think they are if software is really going to eat the world.

Second, the retailer’s web site—a large retailer with thousands of physical stores across the United States. Twice I’ve ordered from this site, asking to have the item held in the local store so I can pick it up. The site won’t let you order the item for store pickup unless they have it in stock.

The first time they called me to say they couldn’t find the item I ordered, but they found a “newer model” that was a lot less expensive. It was a lot less expensive because it wasn’t the same item. They never did find the item I originally ordered.

The second time they called me to say they couldn’t find the item I ordered. I asked if they could just ship the item to my house when it’s back in stock. “I’m sorry, our system doesn’t allow us to do that …” Several hours later, they called back to tell me they found it, but they cannot reinstate my order—I must place a new order.

Again, software quality strikes … what should be a simple process just isn’t. There will always be mismatches between the state in software and the state in the real world—but design the system so it’s possible to adapt when this happens, rather than shutting down the process and starting over.

Third, I own a car that has all the “bells and whistles,” including an adaptive cruise control system. There are certain situations, however, where this adaptive control does the wrong thing, producing potentially dangerous results. There is no way to set the car to use the non-adaptive cruise control permanently (I called and waited on the phone for several hours to discover this). You can set the non-adaptive cruise control on a per-use basis by going through set of menus to change the settings … while driving.

Software quality anyone?

Software eats the world might be someone’s ultimate dream—but I suspect that software quality will always be the fly in the ointment. People are not perfect (even in crowds); software is created by people; hence software will always suffer from quality problems.

Maybe a little humility about our ability to make things as complex as we might like because “we can always have software do that bit” would be a good thing—even in the networking world.

Thoughts on Auto Disaggregation and Complexity

Russ — Mon, 15 Nov 2021 18:00:54 +0000

Way in the past, the EIGRP team (including me) had an interesting idea–why not aggregate routes automatically as much as possible, along classless bounds, and then deaggregate routes when we could detect some failure was causing a routing black hole? To understand this concept better, consider the network below.

In this network, B and C are connected to four different routers, each of which is advertising a different subnet. In turn, B and C are aggregating these four routes into 2001:db8:3e8:10::/60, and advertising this aggregate towards A. From a control plane state perspective, this is a major win. The obvious gain is that the amount of state is reduced from four routes to one. The less obvious gain is A doesn’t need to know about any changes in the state for the four destinations aggregated into the /60. Depending on how often these links change state, the reduction in the rate of change is, perhaps, more important than the reduction in the amount of control plane state.

We always know there will be a tradeoff when reducing state; what is the tradeoff here? If C somehow loses its connection to one of the four routers, say the router advertising 11::/64, C’s 10::/60 aggregate will not change. Since A thinks C still has a route to every subnet within 10::/60, it will continue sending traffic destined to addresses in the 11::/64 towards both B and C. C will not have a route towards these destinations, so it will drop the traffic.

We have a routing black hole.

for more information on aggregation in networks, take a look at my livelesson on abstraction in computer networks

This much is pretty simple. The harder part is figuring out to eliminate this routing black hole. Our first choice is to just not aggregate these routes. While you might be cringing right now, this isn’t such a bad option in many networks. We often underestimate the amount of state and the speed of state change modern routing protocols running on modern processors can support. I’ve seen networks running IS-IS in a single flooding domain with tens of thousands of routes and thousands of nodes running “in the wild.” I’ve seen IS-IS networks with thousands of nodes and hundreds of thousands of routes running in lab environments. These networks still converge.

But what if we really think we need to reduce the amount and speed of state, so we really need to aggregate these routes?

One solution that has been proposed a number of times through the years is auto disaggregation.

In this case, suppose D somehow realizes C cannot reach one of the components of a shared aggregate route. D could simply stop advertising the aggregate, advertising each of the components instead. The question here might be: is this a good idea? Looking at this from the perspective of the SOS triad, the aggregation replaced four routes with a single route. In the auto disaggregation case, the single route change is replaced by four route changes. The amount of state is variable, and in some cases the rate of change in state is actually higher than without the aggregation.

So…

I don’t hold that auto disaggregation is either good nor bad—it just presents a different set of challenges to the network designer. Instead of designing for average rates of change and given table sizes, you can count on much smaller tables, but you might find there are times when the rate of change is dramatically higher than you expect. A good question to ask, before deploying this kind of technology, might be: can I forsee a chain of events that will cause a high enough rate of state change that auto disaggregation is actually more destabilizing than just not summarizing at all in this network?

A real danger with auto disaggregation, by the way, is using summarization to dramatically reduce table sizes without understanding how a goldilocks failure (what we used to call in telco a mother’s day event, or perhaps a black swan) can cascade into widespread failures. If you’re counting on particular devices in your network only have a dozen or two dozen table entries, but just the right set of failures can cause them to have several thousand entries because of auto disaggregation, what kinds of failures modes should you anticipate? Can you anticipate or mitigate this kind of problem?

The idea of automatically summarizing and disaggregating routes is an interesting study in complexity, state, and optimization. It’s a good brain exercise in thinking through what-if situations, and carefully thinking about when and where to deploy this kind of thing.

What do you think about this idea? When would you deploy it, where, and why? When and where would you be cautious about deploying this kind of technology?

Hedge 108: In Defense of Boring Technology with Andrew Wertkin

Russ — Thu, 11 Nov 2021 18:00:06 +0000

Engineers (and marketing folks) love new technology. Watching an engineer learn or unwrap some new technology is like watching a dog chase a squirrel—the point is not to catch the squirrel, it’s just that the chase is really fun. Join Andrew Wertkin (from BlueCat Networks), Tom Ammon, and Russ White as we discuss the importance of simple, boring technologies, and moderating our love of the new.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-108.mp3

download

Hedge 105: Johan Gustawsson and Changing Provider Architectures

Russ — Wed, 20 Oct 2021 21:10:23 +0000

Many service providers have the feeling that they “didn’t do anything wrong, but somehow we still lost.” How are providers reacting to the massive changes in the networking field, and how are they trying to regain their footing so they can move into the coming decades better positioned to compete? Join Johan Gustawsson, Tom Ammon, and Russ White as we discuss the impact of merchant silicon and changing applications on the architecture of service providers.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-105.mp3

download

You can read Johan’s post on this topic here.

Thoughts on the Collapsed Spine

Russ — Tue, 21 Sep 2021 17:00:25 +0000

One of the designs I’ve been encountering a lot of recently is a “collapsed spine” data center network, as shown in the illustration below.

In this design, and B are spine routers, while C-F are top of rack switches. The terminology is important here, because C-F are just switches—they don’t route packets. When G sends a packet to H, the packet is switched by C to A, which then routes the packet towards F, which then switches the packet towards H. C and F do not perform an IP lookup, just a MAC address lookup. A and B are responsible for setting the correct next hop MAC address to forward packets through F to H.
What are the positive aspects of this design? Primarily that all processing is handled on the two spine routers—the top of rack switches don’t need to keep any sort of routing table, nor do any IP lookups. This means you can use very inexpensive devices for your ToR. In brownfield deployments, so long as the existing ToR devices can switch based on MAC addresses, existing hardware can be used.

This design also centralizes almost all aspects of network configuration and management on the spine routers. There is little (if anything) configured on the ToR devices.
What about negative aspects? After all, if you haven’t found the tradeoffs, you haven’t looked hard enough. What are they here?

First, I’m struggling to call this a “fabric” at all—it’s more of a mash-up between a traditional two-layer hierarchical design with a routed core and switched access. Two of the points behind a fabric are the fabric doesn’t have any intelligence (all ports are undifferentiated Ethernet) and all the devices in the fabric are the same.

I suppose you could say the topology itself makes it more “fabric-like” than “network-like,” but we’re squinting a bit either way.

The second downside of this design is that it impacts the scaling properties of the fabric. This design assumes you’ll have larger/more intelligent devices in the spine, and smaller/less intelligent devices in the ToR. One of my consistent goals in designing fabrics has always been to push as close to single-sku as possible—use the same device in every position in the fabric. This greatly simplifies instrumentation, troubleshooting, and supply chain management.

One of the primary points of moving from a network in the more traditional sense to a “true fabric” is to radically simplify the network—this design doesn’t seem like it’s as “simple,” on the network side of things, as it could be. Again, something of a “mash-up” of a simpler fabric and a more traditional two-layer hierarchical routed/switched network.

Scale-out is problematic in this design, as well. You’d need to continue pushing cheap/low-intelligence switches along the edge, and adding larger devices in the spine to make this work over time. At some point, say when you have eight or sixteen spines, you’d be managing just as much configuration—and configuration that’s necessarily more complex because you’re essentially managing remote ports rather than local ones—as you would by just moving routing down to the ToR devices. There’s some scale point here with this design where it’s adding overhead and unnecessary complexity to save a bit of money on ToR switches.

When making the choice between OPEX and CAPEX, we should all know which one to pick.

Where would I use this kind of design? Probably in a smaller network (small enough not to use chassis devices in the spine) which will never need to be scaled out. I might use it as a transition mechanism to a full fabric at some point in the future, but I would want a well-designed planned to transition—and I would want it written in stone that this would not be scaled in the future beyond a specific point.

There’s nothing more permanent in the world than temporary government programs and temporary network designs.
If anyone has other thoughts on this design, please leave them in the comments below.

Russ’ Rules of Network Design

Russ — Tue, 14 Sep 2021 17:00:54 +0000

We have the twelve truths of networking, and possibly Akin’s Laws, but is there a set of rules for network design? I couldn’t find one, so I decided to create one, containing 18 laws I’ve listed below.

Russ’ Rules of Network Design

If you haven’t found the tradeoffs, you haven’t looked hard enough.
Design is an iterative process. You probably need one more iteration than you’ve done to get it right.
A design isn’t finished when everything needed is added, it’s finished when everything possible is taken away.
Good design isn’t making it work, it’s making it fail gracefully.
Effective, elegant, efficient. All other orders are incorrect.
Don’t fix blame; fix problems.
Local and global optimization are mutually exclusive.
Reducing state always reduces optimization someplace.
Reducing state always creates interaction surfaces; shallow and narrow interaction surfaces are better than deep and broad ones.
The easiest place to improve or screw up a design is at the interaction surfaces.
The optimum is almost always in the middle someplace; eschew extremes.
Sometimes its just better to start over.
There are a handful of right solutions; there is an infinite array of wrong ones.
You are not immensely smarter than anyone else in networking.
A bad design with a good presentation is doomed eventually; a good design with a bad presentation is doomed immediately.
You can only know your part of the system and a little bit about the parts around your part. The rest is rumor and pop psychology.
To most questions the correct initial answer should be “how many balloons fit in a bag?”
Virtual environments still have hard physical limits.

You can find a handy printable version here.

The Grass is Always Greener

Russ — Mon, 09 Aug 2021 18:10:15 +0000

This last week I was talking to someone at a small startup that intends to eliminate all the complex routing from campus networks. In the past, when reading blog posts about Kubernetes, I’ve read about how it was designed to eliminate routing protocols because “routing protocols are so complex.”

Color me skeptical.

There are two reasons for complexity in a design. The first is you’re solving a hard problem. The second is you’ve made bad design choices in the past, and you’re pasting complexity on top to solve some perceived problem (whether perceived or real).

The problem with all this talk about building something that’s “less complex” is people tend to see complexity of the first kind and think, “we can get rid of that complexity if we start over.” Failing to understand the past before building the future is a recipe for repeated failures of the same kind. Building a network without a distributed routing protocol hasn’t been tried before either, right? Well, yes, it has … We either forget how it turned out, or we say “well, that’s not the same thing I’m talking about here” (just like “real socialism hasn’t ever been tried”).

Even worse, they think they get rid of second and third kinds of complexity by starting over, or getting the humans out of the decision-making loop, or focusing on the data. Our modern penchant for relying “the data,” without ever thinking about the source of the data or how the data has been shaped and interpreted, is truly breathtaking.

They look over the horizon, see an unspoiled field, and think “the grass really is greener on the other side.”

Get rid of all those complex dynamic routing protocols … get rid of all those humans making decisions, so the decisions are “data driven” … and everything will be so much better.

Adding complexity to solve hard real-world problems is just the way things are, and they will always be, so the first reason for complexity will always be with us. People make mistakes, don’t see into the future perfectly, or just don’t have a perfect understanding of the system (technical debt), so the second kind of complexity will always be with us. You can’t “fix” people—God save us from those who think they can. The grass isn’t always greener—it just always looks that way.

What’s the practical upshot? Networks are always going to be complex. It’s just the nature of the problem being solved.

We add complexity because we fail to ask the right questions, we don’t understand the system, or we fail to do good design. The solution isn’t to seek out a greener field “out there,” but rather to make the field we currently live in greener by asking the right questions and reducing complexity through good design. Sometimes you might even need to start over with a new network … but when you start thinking about starting over with a newly designed set of protocols because the old ones are “too complex,” you need to ask how those old ones got that way, and how you’re going to stop the new ones from getting to the same place.

The grass is always greener because you looking at it through green-colored lenses just as the new grass is in its full flush, and before the weeds have had a chance to take over.

Learn how old things worked before you fall for some new “modern wonder” that’s going to solve every problem. The complexity in old things will show you where you can expect to find complexity grow up in new things.

NATs, PATs, and Network Hygiene

Russ — Tue, 13 Jul 2021 17:00:14 +0000

While reading a research paper on address spoofing from 2019, I ran into this on NAT (really PAT) failures—

In the first failure mode, the NAT simply forwards the packets with the spoofed source address (the victim) intact … In the second failure mode, the NAT rewrites the source address to the NAT’s publicly routable address, and forwards the packet to the amplifier. When the server replies, the NAT system does the inverse translation of the source address, expecting to deliver the packet to an internal system. However, because the mapping is between two routable addresses external to the NAT, the packet is routed by the NAT towards the victim.

The authors state 49% of the NATs they discovered in their investigation of spoofed addresses fail in one of these two ways. From what I remember way back when the first NAT/PAT device (the PIX) was deployed in the real world (I worked in TAC at the time), there was a lot of discussion about what a firewall should do with packets sourced from addresses not indicated anywhere.

If I have an access list including 192.168.1.0/24, and I get a packet sourced from 192.168.2.24, what should the NAT do? Should it forward the packet, assuming it’s from some valid public IP space? Or should it block the packet because there’s no policy covering this source address?

This is similar to the discussion about whether BGP speakers should send routes to an external peer if there is no policy configured. The IETF (though not all vendors) eventually came to the conclusion that BGP speakers should not advertise to external peers without some form of policy configured.

My instinct is the NATs here are doing the right thing—these packets should be forwarded—but network operators should be aware of this failure mode and configure their intentions explicitly. I suspect most operators don’t realize this is the way most NAT implementations work, and hence they aren’t explicitly filtering source addresses that don’t fall within the source translation pool.

In the real world, there should also be a box just outside the NATing device that’s running unicast reverse path forwarding checks. This would resolve these sorts of spoofed packets from being forwarding into the DFZ—but uRPF is rarely implemented by edge providers, and most edge connected operators (enterprises) don’t think about the importance of uRPF to their security.

All this to say—if you’re running a NAT or PAT, make certain you understand how it works. Filters are tricky in the best of circumstances. NAT and PATs just make filters trickier.

The Hedge 85: Terry Slattery and the ROI of Automation

Russ — Wed, 26 May 2021 17:37:57 +0000

It’s easy to assume automation can solve anything and that it’s cheap to deploy—that there are a lot of upsides to automation, and no downsides. In this episode of the Hedge, Terry Slattery joins Tom Ammon and Russ White to discuss something we don’t often talk about, the Return on Investment (ROI) of automation.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-085.mp3

download

Is it really the best just because its the most common?

Russ — Mon, 24 May 2021 17:00:38 +0000

I cannot count the number of times I’ve heard someone ask these two questions—

What are other people doing?
What is the best common practice?

While these questions have always bothered me, I could never really put my finger on why. I ran across a journal article recently that helped me understand a bit better. The root of the problem is this—what does best common mean, and how can following the best common produce a set of actions you can be confident will solve your problem?

Bellman and Oorschot say best common practice can mean this is widely implemented. The thinking seems to run something like this: the crowd’s collective wisdom will probably be better than my thinking… more sets of eyes will make for wiser or better decisions. Anyone who has studied the madness of crowds will immediately recognize the folly of this kind of state. Just because a lot of people agree it’s a good idea to jump off a cliff does not mean it is, in fact, a good idea to jump off a cliff.

Perhaps it means something closer to this is no worse than our competitors. If that’s the meaning, though, it’s a pretty cynical result. It’s saying, “I don’t mind condemning myself to mediocrity so long as I see everyone else doing the same thing.” It doesn’t sound like much of a way to grow a business.

The authors do provide their definition—

For a given desired outcome, a “best practice” is a means intended to achieve that outcome, and that is considered to be at least as “good” as the best of other broadly considered means to achieve that same outcome.

The thinking seems to run something like this—it’s likely that everyone else has tried many different ways of doing this; that they have all settled on doing this, this way, means all those other methods are probably not as good as this one for some reason.

Does this work? There’s no way to tell without further investigation. How many of the other folks doing “this” spent serious time trying alternatives, and how many just decided the cheapest way was the best no matter how poor the result might be? In fact, how can we know what the results of doing things “this way” have in all those other networks? Where would we find this kind of information?

In the end, I can’t ever make much sense out of the question, “what is everyone else doing?” Discovering what everyone else is doing might help me eliminate possibilities (that didn’t work for them, so I certainly don’t want to try it), or it might help me understand the positive and negative attributes of a given solution. Still, I don’t understand why “common” should infer “best.”

The best solution for this situation is simply going to be the best solution. Feel free to draw on many sources, but don’t let other people determine what you should be doing.

The Effectiveness of AS Path Prepending (2)

Russ — Mon, 17 May 2021 17:00:02 +0000

Last week I began discussing why AS Path Prepend doesn’t always affect traffic the way we think it will. Two other observations from the research paper I’m working off of are:

Adding two prepends will move more traffic than adding a single prepend
It’s not possible to move traffic incrementally by prepending; when it works, prepending will end up moving most of the traffic from one inbound path to another

A slightly more complex network will help explain these two observations.

Assume AS65000 would like to control the inbound path for 100::/64. I’ve added a link between AS65001 and 65002 here, but we will still find prepending a single AS to the path won’t make much difference in the path used to reach 100::/64. Why?

Because most providers will have a local policy configured—using local preference—that causes them to choose any available customer connection over other paths. AS65001, on receiving the route to 100::/64 from AS65000, will set the local preference so it will prefer this route over any other route, including the one learned from AS65002. So while the cause is a little different in this case than the situation covered in the first post, the result is the same.

We can, of course, prepend twice onto the AS Path rather than once. What impact would that have here? It still won’t impact the traffic originating in 65005 because AS65001 is the only path available towards 100::64 from their perspective. Prepending cannot change anything if there’s only one path.

However, if most of the traffic destined to 100::/64 coming from AS65006, 7, and 8 rather than from AS65005, prepending two times will allow AS65000 to shift the traffic from the path through AS65002 to the path through AS65001. This example might seem a little contrived. Still, it’s pretty similar to networks that have one connection to some local provider (a cable company or something similar) and one connection to a more prominent national or international provider. Any time you are connected to two different providers who have different ranges of connectivity, prepending two autonomous systems on the AS Path will probably be able to shift traffic from one inbound link to another.

What about prepending more than two hops to the AS Path? Each additional prepend going to shift smaller amounts of traffic. It makes sense that increasing the number of prepends doesn’t shift much more because the further away you get from the edge of the Internet, the more fully connected the autonomous systems are, and the more likely you are to run into some other policy that will override the AS Path in determining the best path. The average length of the AS Path in the Internet is around four; prepending more than this normally won’t have much of an effect on traffic flow

The second question above can also be answered by looking at this network. Why can’t you shift traffic incrementally by prepending onto the AS Path? Because the connectivity close to the edge is probably not meshy enough. You can’t shift over just the traffic from one AS or another; you can only shift traffic from the entire set of autonomous systems behind your upstream from one inbound link to another. You can adjust traffic on a per-prefix basis, however, which can be useful for balancing between two inbound links.

What can you do to control inbound traffic with more certainty? Take a look at this older post for thoughts on using communities and de-aggregation to steer traffic.

The Effectiveness of AS Path Prepending (1)

Russ — Mon, 10 May 2021 17:00:53 +0000

Just about everyone prepends AS’ to shift inbound traffic from one provider to another—but does this really work? First, a short review on prepending, and then a look at some recent research in this area.

What is prepending meant to do?

Looking at this network diagram, the idea is for AS6500 (each router is in its own AS) to steer traffic through AS65001, rather than AS65002, for 100::/64. The most common method to trying to accomplish this is AS65000 can prepend its own AS number on the AS Path Multiple times. Increasing the length of the AS Path will, in theory, cause a route to be less preferred.

In this case, suppose AS65000 prepends its own AS number on the AS Path once before advertising the route towards AS65001, and not towards AS65002. Assuming there is no link between AS65001 and AS65002, what would we expect to happen? What we would expect is AS65001 will receive one route towards 100::/64 with an AS Path of 2 and use this route. AS65002 will, likewise, receive one route towards 100::/64 with an AS Path of 1 and use this route.

AS65003, however, will receive two routes towards 100::/64, one with an AS Path of 3 through AS65001, and one with an AS Path of 2 through AS65002. All other things being equal (local preference, etc.), AS65003 will prefer the route with the shorter AS Path through AS65002, and select that path to reach 100::/64. AS65004 will only receive one path towards 100::/64, the one through AS65002, because AS65003 will only advertise its best path to AS65004.

The obvious question—how much good does this really do? The only impact on the best path is two hops away, as AS65003, and beyond. The route chosen by AS65001 and AS65002 will not be affected by the prepending.

A recent paper found—

We observe that the effectiveness of prepending can strongly depend on the location (for around 20% of cases, ASPP has moved no targets, while for another 20% , it moved almost all targets).

You might expect As Path prepending to have a much more consistent effect on inbound traffic. Why doesn’t it?

What might not be obvious (the danger of simplified diagrams): if autonomous systems directly attached to AS65001 originate most of the traffic destined to 100::/64, no amount of prepending is going to make any difference in the inbound traffic flow. Assume AS5001 has a connection to some cloud service, AS65002 does not have a connection to the same cloud service, and 100::64 is a local server that communicates with this cloud service on a regular basis. Since AS65001 is the only AS transiting traffic from the cloud service to the server located on the 100::/64 subnet, and AS65001 only has one route to 100::/64, you are not going to be able to shift traffic off that single path no matter how many times you prepend.

The first rule of prepending is location matters. You have to know where the traffic you want to shift is originating, and whether or not it can be shifted.

In my next post on this topic, I’ll continue exploring AS path prepending more in light of the results of the research paper above.

Complexity Reduction?

Russ — Mon, 19 Apr 2021 17:00:31 +0000

Back in January, I ran into an interesting article called The many lies about reducing complexity:

Reducing complexity sells. Especially managers in IT are sensitive to it as complexity generally is their biggest headache. Hence, in IT, people are in a perennial fight to make the complexity bearable.

Gerben then discusses two ways we often try to reduce complexity. First, we try to simply reduce the number of applications we’re using. We see this all the time in the networking world—if we could only get to a single pane of glass, or reduce the number of management packages we use, or reduce the number of control planes (generally to one), or reduce the number of transport protocols … but reducing the number of protocols doesn’t necessarily reduce complexity. Instead, we can just end up with one very complex protocol. Would it really be simpler to push DNS and HTTP functionality into BGP so we can use a single protocol to do everything?

Second, we try to reduce complexity by hiding it. While this is sometimes effective, it can also lead to unacceptable tradeoffs in performance (we run into the state, optimization, surfaces triad here). It can also make the system more complex if we need to go back and leak information to regain optimal behavior. Think of the OSPF type 4, which just reinjects information lost in building an area summary, or even the complexity involved in the type7 to type 5 process required to create not-so-stubby areas.

It would seem, then, that you really can’t get rid of complexity. You can move it around, and sometimes you can effectively hide it, but you cannot get rid of it.

This is, to some extent, true. Complexity is a reaction to difficult environments, and networks are difficult environments.

Even so, there are ways to actually reduce complexity. The solution is not just hiding information because it’s messy, or munging things together because it requires fewer applications or protocols. You cannot eliminate complexity, but if you think about how information flows through a system you might be able to reduce the amount of complexity, and even create boundaries where state (hence complexity) can be more effectively hidden.

As an instance, I have argued elsewhere that building a DC fabric with distinct overlay and underlay protocols can actually create a simpler overall design than using a single protocol. Another instance might be to really think about where route aggregation takes place—is it really needed at all? Why? Is this the right place to aggregate routes? Is there any way I can change the network design to reduce state leaking through the abstraction?

The problem is there are no clear-cut rules for thinking about complexity in this way. There’s no rule of thumb, there’s no best practices. You just have to think through each individual situation and consider how, where, and why state flows, and then think through the state/optimization/surface tradeoffs for each possible way of reducing the complexity of the system. You have to take into account that local reductions in complexity can cause the overall system to be much more complex, as well, and eventually make the system brittle.

There’s no “pat” way to reduce complexity—that there is, is perhaps one of the biggest lies about complexity in the networking world.

The Hedge 79: Brooks Westbrook and the Data Driven Lens

Russ — Thu, 15 Apr 2021 17:00:37 +0000

Many networks are designed and operationally drive by the configuration and management of features supporting applications and use cases. For network engineering to catch up to the rest of the operational world, it needs to move rapidly towards data driven management based on a solid understanding of the underlying protocols and systems. Brooks Westbrook joins Tom Amman and Russ White to discuss the data driven lens in this episode of the Hedge.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-079.mp3

download

The Hedge 75: Mike Parks and the Remote Work Scramble

Russ — Wed, 17 Mar 2021 19:04:37 +0000

The international pandemic has sent companies scrambling to support lots of new remote workers, which has meant changes in processes, application development, application deployment, connectivity, and even support. Mike Parks joins Eyvonne Sharp and Russ White to discuss these changes on this episode of the Hedge.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-075.mp3

download

Complexity Bites Back

Russ — Mon, 15 Mar 2021 17:00:47 +0000

What percentage of business-impacting application outages are caused by networks? According to a recent survey by the Uptime Institute, about 30% of the 300 operators they surveyed, 29% have experienced network related outages in the last three years—the highest percentage of causes for IT failures across the period.

A secondary question on the survey attempted to “dig a little deeper” to understand the reasons for network failure; the chart below shows the result.

We can be almost certain the third-party failures, if the providers were queried, would break down along the same lines. Is there a pattern among the reasons for failure?

Configuration change—while this could be somewhat managed through automation, these kinds of failures are more generally the result of complexity. Firmware and software failures? The more complex the pieces of software, the more likely it is to have mission-impacting errors of some kind—so again, complexity related. Corrupted policies and routing tables are also complexity related. The only item among the top preventable causes that does not seem, at first, to relate directly to complexity is network overload and/or congestion problems. Many of these cases, however, might also be complexity related.

The Uptime Institute draws this same lesson, though through a slightly different process, saying: “Networks are complex not only technically, but also operationally.”

For years—decades, even—we have talked about the increasing complexity of networks, but we have done little about it. Yes, we have automated all the things, but automation can only carry us so far in covering complexity up. Automation also adds a large dop of complexity on top of the existing network—sometimes (not always, of course!) automating a complex system without making substantial efforts at simplification is just like trying to put a fire out with a can of gas (or, in one instance I actually saw, trying to put out an electrical fire with a can of soda, with the predictable trip to the local hospital.

We are (finally) starting to be “bit hard” by complexity problems in our networks—and I suspect this is the leading edge of the problem, rather than the trailing edge.

Maybe it’s time to realize making every protocol serve every purpose in the network wasn’t a good idea—we now have protocols that are so complex that they can only be correctly configured by machines, and then only when you narrow the use case enough to make the design parameters intelligible.

Maybe it’s time to realize optimizing for every edge use case wasn’t a good idea. Sometimes it’s just better to throw resources at the problem, rather than throwing state at the control plane to squeeze out just one more ounce of optimization.

Maybe it’s time to stop building networks around “whatever the application developer can dream up.” To start working as a team with the application developers to build a complete system that puts complexity where it most makes sense, and divides complexity from complexity, rather than just assuming “the network can do that.”

Maybe it’s time to stop thinking we can automate our way out of this.

Maybe it’s time to lay our superhero capes down and just start building simpler systems.

Rethinking BGP on the DC Fabric (part 5)

Russ — Mon, 01 Mar 2021 18:00:48 +0000

BGP is widely used as an IGP in the underlay of modern DC fabrics. This series argues this is not the best long-term solution to the problem of routing in fabrics because BGP is not ideal for this use case. This post will consider the potential harm we are doing to the larger Internet by pressing BGP into a role it was not originally designed to fulfill—an underlay protocol or an IGP.

My last post described the kinds of configuration required to make BGP work on a DC fabric—it turns out that the configuration of each BGP speaker on the fabric is close to unique. It is possible to automate configuring each speaker—but it would be better if we could get closer to autonomic operation.

To move BGP closer to autonomic operation in a DC fabric, there are several things we can do. First, we can allow a BGP speaker to peer with any other BGP speaker it receives an open message from—this is often called promiscuous mode. While each router in the fabric will still need to be configured with the right autonomous system, at least we won’t need to configure the correct peers on each router (including the remote AS).

Note, however, that using this kind of promiscuous peering does come with a set of tradeoffs (if you’re reading this blog, you know there will be tradeoffs). BGP speakers running in promiscuous mode open a large attack surface on the control plane of the network. We can close this attack surface by configuring authentication on all BGP speakers … but we are now adding complexity to reduce complexity. We could also reduce the scope of the attack surface by never permitting BGP to peer beyond a single hop, and then filtering all BGP packets at the fabric edge. Again, just a bit more complexity to manage—but remember that the road to highly fragile and complex systems is always paved with individual steps that never, on their own, seem to add “too much complexity.”

The second thing we can do to move BGP closer to autonomic operation is to advertise routes to every connected peer without any policy configured. This does, again, introduce some tradeoffs, particularly in the realm of security, but let’s leave that aside for the moment.

Assume we can create a version of BGP that has these modifications—it always accepts any peer from any other AS, and it advertises all routes without any policy configured. Put these features behind a single knob which also includes setting the MRAI to 0 or 1, tightens up the dampening parameters, and adjusts a few other things to make BGP work better in a DC fabric.

As an experiment, let’s enable this DC fabric knob on a BGP speaker at the edge of a dual-homed “enterprise customer.” What will happen?

The enterprise network will automatically peer to any speaker that sends an open message—a huge security hole on the open Internet—and it will advertise every route it learns even though there is no policy configured. This second issue—advertising routes with no policy configured—can cause the enterprise network to become a transit between two much larger provider networks, crashing out some small corner of the Internet.

This might seem like a trivial issue. After all, just don’t ever enable the DC fabric knob on an eBGP peering session upstream into the DFZ, or any other “real” internetwork. Sure, and just don’t ever hit the brakes when you mean to hit the accelerator, or the accelerator when you mean to hit the brakes. If I had a dime for every time we “just don’t ever make that mistake …” Well, I wouldn’t be blogging, I’d be relaxing in the sun someplace (okay, I’m not likely to ever stop working to sit around and “relax” all the time, but you get the picture anyway).

Maybe—just maybe—it would really be better overall to use two different protocols for IGP and EGP work. Maybe—just maybe—it’s better not to mix these two different kinds of functions in a single protocol. Not only is the single resulting protocol bound to be really complex (most BGP implementations are now over 100,000 lines of code, after all), but it will end up being really easy to make really bad mistakes.

No tool is omnicompetent. If you found a tool that was, in fact, omnicompetent, it would also be the most dangerous tool in your toolbox.

Rethinking BGP on the DC Fabric (part 4)

Russ — Mon, 22 Feb 2021 18:00:23 +0000

Before I continue, I want to remind you what the purpose of this little series of posts is. The point is not to convince you to never use BGP in the DC underlay ever again. There’s a lot of BGP deployed out there, and there are lot of tools that assume BGP in the underlay. I doubt any of that is going to change. The point is to make you stop and think!

Why are we deploying BGP in this way? Is this the right long-term solution? Should we, as a community, be rethinking our desire to use BGP for everything? Are we just “following the crowd” because … well … we think it’s what the “cool kids” are doing, or because “following the crowd” is what we always seem to do?

In my last post, I argued that BGP converges much more slowly than the other options available for the DC fabric underlay control plane. The pushback I received was two-fold. First, the overlay converges fast enough; the underlay convergence time does not really factor into overall convergence time. Second, there are ways to fix things.

If the first pushback is always true—the speed of the underlay control plane convergence does not matter—then why have an underlay control plane at all? Why not just use a single, merged, control plane for both underlay and overlay? Or … to be a little more shocking, if the speed at which the underlay control plane converges does not matter, why not just configure the entire underlay using … static routes?

The reason we use a dynamic underlay control plane is because we need this foundational connectivity for something. So long as we need this foundational connectivity for something, then that something is always going to be better if it is faster rather than slower.

The second pushback is more interesting. Essentially—because we work on virtual things rather than physical ones, just about anything can be adapted to serve any purpose. I can, for instance, replace BGP’s bestpath algorithm with Dijkstra’s SPF, and BGP’s packet format with a more straight-forward TLV format emulating a link-state protocol, and then say, “see, now BGP looks just like a link-state protocol … I made BGP work really well on a DC fabric.”

Yes, of course you can do these things. Somewhere along the way we became convinced that we are being really clever when we adapt a protocol to do something it wasn’t designed to do, but I’m not certain this is a good way of going about building reliable systems.

Okay, back to the point … the next reason we should rethink BGP on the DC fabric is because it is complex to configure when its being used as an IGP. In my last post, when discussing the configuration required to make BGP converge, I noted AS numbers and AS Path filters must be laid out in a very specific way, following where each device is located in the fabric. The MRAI must be taken down to some minimum on every device (either 0 or 1 second), and individual peers must be configured.

Further, if you are using a version of BGP that follows the IETF’s BCPs for the protocol, you must configure some sort of filter (generally a permit all) to get a BGP speaker to advertise anything to an eBGP peer. If you’re using iBGP, you need to configure route reflectors and tell BGP to advertise multiple paths.

There are two ways to solve this problem. First, you can automate all this configuration—of course! I am a huge fan of automation. It’s an important tool because it can make your network consistent and more secure.

But I’m also realistic enough to know that adding the complexity of an automation system on top of a too-complex system to make things simpler is probably not a really good idea. To give a visual example, consider the possibility of automatically wiping your mouth while eating soup.

Yes, automation can be taken too far. A good rule of thumb might be: automation works best on systems intentionally designed to be simple enough to automate. In this case, perhaps it would be simpler to just use a protocol more directly designed so solve the problem at hand, rather than trying to automate our way out of the problem.

Second, you can modify BGP to be a better fit for use as an IGP in various ways. This post has already run far too long, however, so … I’ll hold off on talking about this until the next post.

Rethinking BGP on the DC Fabric (part 3)

Russ — Mon, 15 Feb 2021 18:00:37 +0000

The fist post on this topic considered some basic definitions and the reasons why I am writing this series of posts. The second considered the convergence speed of BGP on a dense topology such as a DC fabric, and what mechanisms we normally use to improve BGP’s convergence speed. This post considers some of the objections to slow convergence speed—convergence speed is not important, and ECMP with high fanouts will take care of any convergence speed issues. The network below will be used for this discussion.

Two servers are connected to this five-stage butterfly: S1 and S2 Assume, for a moment, that some service is running on both S1 and S2. This service is configured in active-active mode, with all data synchronized between the servers. If some fabric device, such as C7, fails, traffic destined to either S1 or S2 across that device will be very quickly (within tens of milliseconds) rerouted through some other device, probably C6, to reach the same destination. This will happen no matter what routing protocol is being used in the underlay control plane—so why does BGP’s convergence speed matter? Further, if these services are running in the overlay, or they are designed to discover failed servers and adjust accordingly, it would seem like the speed at which the underlay converges just does not matter.

Consider, however, the case where the services running on S1 and S2 are both reachable through an eVPN overlay with tunnel tail-ends landing on the ToR switch through which each server connects to the fabric. Applications accessing these services, for this example, either access the service via a layer 2 MAC address or through a single (anycast) IP address representing the service, rather than any particular instance. To make all of this work, there would be one tunnel tail-end landing on A8, and another landing on E8.

Now what happens if A8 fails? For the duration of the underlay control plane convergence the tunnel tail-end at A8 will appear to be reachable to the overlay. Thus the overlay tunnel will remain up and carrying traffic to a black hole on one of the routers adjacent to A8. In the case of a service reachable via anycast, the application can react in one of two ways—it can fail out operations taking place during the underlay’s convergence, or it can wait. Remember that one second is an eternity in the world of customer-facing services, and that BGP can easily take up to one second to converge in this situation.

A rule of thumb for network design—it’s not the best-case that controls network performance, it’s the worst-case convergence.

The convergence speed of the underlay leaks through to the state of the overlay. The questions that should pop into your mind about right now is—can you be certain this kind of situation cannot happen in your current network, can you be certain it will never happen, and can you be certain this will not have an impact on application performance? I don’t see how the answer to those questions can be yes. The bottom line: convergence speed should be left out of the equation when building a DC fabric. There may be times when you control the applications, and hence can push the complexity of dealing with slow convergence to the application developers—but this seems like a vanishingly small number of cases. Further, is pushing solving for slow convergence to the application developer optimal?

My take on the argument that convergence speed doesn’t matter, then, is that it doesn’t hold up under deeper scrutiny.

as I noted when I started this series—I’m not arguing that we should rip BGP out of every DC fabric … instead, what I’m trying to do is to stir up a conversation and to get my readers to think more deeply about their design choices, and how those design choices work out in the real world

Rethinking BGP on the DC Fabric (part 2)

Russ — Mon, 08 Feb 2021 18:00:29 +0000

In my last post on this topic, I laid out the purpose of this series—to start a discussion about whether BGP is the ideal underlay control plane for a DC fabric—and gave some definitions. Here, I’d like to dive into the reasons to not use BGP as a DC fabric underlay control plane—and the first of these reasons is BGP converges very slowly and requires a lot of help to converge at all.

Examples abound. I’ve seen the results of two testbeds in the last several years where a DC fabric was configured with each router (switch, if you prefer) in a separate AS, and some number of routes pushed into the network. In both cases—one large-scale, the other a more moderately scaled network on physical hardware—BGP simply failed to converge. Why? A quick look at how BGP converges might help explain these results.

Assume we are watching the 110::/64 route (attached to A, on the left side of the diagram), at P. What happens when A loses it’s connection to 110::/64? Assuming every router in this diagram is in a different AS, and the AS path length is the only factor determining the best path at every router.

Watching the route to 110::/64 at P, you would see the route move from G to M as the best path, then from M to K, then from K to N, and then finally completely drop out of P’s table. This is called the hunt because BGP “hunts,” apparently trying every path from the current best path to the longest possible path before finally removing the route from the network entirely. BGP isn’t really “hunting;” this is just an artifact of the way BGP speakers receive, process, and send updates through the network.

If you consider a more complex topology, like a five-stage butterfly fabric, you will find there are many (very many) alternate longer-length paths available for BGP to hunt through on a withdraw. Withdrawing thousands of routes at the same time, combined with the impact of the hunt, can put BGP in a state where it simply never converges.

To get BGP to converge, various techniques must be used. For instance, placing all the routers in the spine so they are in the AS, configuring path filters at ToR switches so they are never used as a transit path, etc. Even when these techniques are used, however, BGP can still require a minute or so to perform a withdraw.

This means the BGP configuration cannot be the same on every device—it is determined by where the device is located—which harms repeatability, the BGP configuration must contain complex filters, and messing up the configuration can bring the entire fabric down.

There are several counters to the problem of slow convergence, and the complex configurations required to make BGP converge more quickly, but this post is pushing against its limit … so I’ll leave these until next time.

Rethinking BGP on the DC Fabric

Russ — Mon, 01 Feb 2021 18:00:47 +0000

Everyone uses BGP for DC underlays now because … well, just because everyone does. After all, there’s an RFC explaining the idea, every tool in the world supports BGP for the underlay, and every vendor out there recommends some form of BGP in their design documents.

I’m going to swim against the current for the moment and spend a couple of weeks here discussing the case against BGP as a DC underlay protocol. I’m not the only one swimming against this particular current, of course—there are at least three proposals in the IETF (more, if you count things that will probably never be deployed) proposing link-state alternatives to BGP. If BGP is so ideal for DC fabric underlays, then why are so many smart people (at least they seem to be smart) working on finding another solution?

But before I get into my reasoning, it’s probably best to define a few things.

In a properly design data center, there are at least three control planes. The first of these I’ll call the application overlay. This control plane generally runs host-to-host, providing routing between applications, containers, or virtual machines. Kubernetes networking would be an example of an application overlay control plane.

The second of these I’ll call the infrastructure overlay. This is generally going to be eVPN running BGP, most likely with VXLAN encapsulation, and potentially with segment routing for traffic steering support. This control plane will typically run on either workload supporting hosts, providing routing for the hypervisor or internal bridge, or on the Top of Rack (ToR) routers (switches, but who knows what “router” and “switch” even mean any longer?).

Now notice that not all networks will have both application and infrastructure overlays—many data center fabrics will have one or the other. It’s okay for a data center fabric to only have one of these two overlays—whether one or both are needed is really a matter of local application and business requirements. I also expect both of these to use either BGP or some form of controller-based control plane. BGP was originally designed to be an overlay control plane; it only makes sense to use it where an overlay is required.

I’ll call the third control plane the infrastructure underlay. This control plane provides reachability for the tunnel head- and tail-ends. Plain IPv4 or IPv6 transport is supported here; perhaps some might inject MPLS as well.

My argument, over the next couple of weeks, is BGP is not the best possible choice for the infrastructure underlay. What I’m not arguing is every network that runs BGP as the infrastructure underlay needs to be ripped out and replaced, or that BGP is an awful, horrible, no-good choice. I’m arguing there are very good reasons not to use BGP for the infrastructure underlay—that we need to start reconsidering our monolithic assumption that BGP is the “only” or “best” choice.

I’m out of words for this week; I’ll begin the argument proper in my next post… stay tuned.

Technical Debt (or Is Future Proofing Even a Good Idea?)

Russ — Mon, 09 Nov 2020 18:00:51 +0000

What, really, is “technical debt?” It’s tempting to say “anything legacy,” but then why do we need a new phrase to describe “legacy stuff?” Even the prejudice against legacy stuff isn’t all that rational when you think about it. Something that’s old might also just be well-tested, or well-worn but still serviceable. Let’s try another tack.

Technical debt, in the software world, can be defined as working on a piece of software for long periods of time by only adding features, and never refactoring or reorganizing the code to meet current conditions. The general idea is that as new features are added on top of the old, two things happen. First, the old stuff becomes a sort of opaque box that no-one understands. Second, the stuff being added to the old increasingly relies on public behavior that might be subject to unintended consequences or leaky abstractions.

To resolve this problem in the software world, software is “refactored.” In refactoring, every use of a public API is examined, including what information is being drawn out, or what the expected inputs and outputs are. The old code is then “discarded,” in a sense, and a new underlying function written that meets the requirements discovered in the existing codebase. The refactoring process allows the calling functions and called functions, the clients and the servers, to “move together.” As the overall understanding of the system changes, the system itself can change with that understanding. This brings the implementation closer to the understanding of the current engineering team.

So technical debt is really the mismatch between the understanding of the current engineering team imposed onto the implementation of a prior engineering team with a different understanding of the requirements (even if the older engineering team is the same people at a different time).

How can we apply this to networking? In the networking world, we actively work against refactoring by future proofing.

Future proofing: building a network that will never need to be replaced. Or, perhaps, never needing to go to your manager and say “I need to buy new hardware,” because the hardware you bought last week does not have the functionality you need any longer. This sounds great in theory, but the problem with theory is it often does not hold up when the glove hits the nose (every has a plan until they are punched in the face). The problem with future proofing is you can’t see the punch that’s coming until the future actually throws it.

If something is future proofed, it either cannot be refactored, or there will be massive resistence to the refactoring process.

Maybe its time we tried to change our view of things so we can refactor networks. What would refactoring a network look like? Maybe examining all the configurations within a particular module, figuring what they do, and then trying to figure out what application or requirement led to that particular bit of configuration. Or, one step higher, looking at every protocol or “feature” (whatever a feature might be), figuring out what purpose it might serve, and then intentionally setting about finding some other way, perhaps a simpler way, to provide that same service.

One key point in this process is to begin by refusing to look at the network as a set of appliances, and starting to see it as a system made up of hardware and software. Mentally disaggregating the software and hardware can allow you to see what can be changed and what cannot, and inject some flexibility into the network refactoring process.

When you refactor a bit of code, what you typically end up with a simpler piece of software that more closely matches the engineering team’s understanding of current requirements and conditions. Aren’t simplicity and coherence goals for operational networks, too?

If so, ehen was the last time you refactored your network?

Strong Reactions and Complexity

Russ — Mon, 02 Nov 2020 18:00:13 +0000

In the realm of network design—especially in the realm of security—we often react so strongly against a perceived threat, or so quickly to solve a perceived problem, that we fail to look for the tradeoffs. If you haven’t found the tradeoffs, you haven’t looked hard enough—or, as Dr. Little says, you have to ask what is gained and what is lost, rather than just what is gained. This failure to look at both sides often results in untold amounts of technical debt and complexity being dumped into network designs (and application implementations), causing outages and failures long after these decisions are made.

A 2018 paper on DDoS attacks, A First Joint Look at DoS Attacks and BGP Blackholing in the Wild provides a good example of causing more damage to an attack than the attack itself. Most networks are configured to allow the operator to quickly configure a remote triggered black hole (RTBH) using BGP. Most often, a community is attached to a BGP route that points the next-hop to a local discard route on each eBGP speaker. If used on the route advertising the destination of the attack—the service under attack—the result is the DDoS attack traffic no longer has a destination to flow to. If used on the route advertising the source of the DDoS attack traffic, the result is the DDoS traffic will no pass any reverse-path forwarding policies at the edge of the AS, and hence be dropped. Since most DDoS attacks are reflected, blocking the source traffic still prevents access to some service, generally DNS or something similar.

In either case, then, stopping the DDoS through an RTBH causes damage to services rather than just the attacker. Because of this, remote triggered black holes should really only be used in the most extreme cases, where no other DDoS mitigation strategy will work.

The authors of the Joint Look use publicly avaiable information to determine the answers to several questions. First, what scale of DDoS attacks are RTBHs used against? Second, how long after an attack begins is the RTBH triggered? Third, for how long is the RTBH left in place after the attack has been mitigated?

The answer to the first question should be—the RTBH is only used against the largest-scale attacks. The answer to the second question should be—the RTBH should be put in place very quickly after the attack is detected. The answer to the third question should be—the RTBH should be taken down as soon as the attack has stopped. The researchers found that RTBHs were most often used to mitigate the smallest of DDoS attacks, and almost never to mitigate larger ones. The authors also found that RTBHs were often left in place for hours after a DDoS attack had been mitigated. Both of these imply that current use of RTBH to mitigate DDoS attacks is counterproductive.

How many more design patterns do we follow that are simply counterproductive in the same way? This is not a matter of “following the data,” but rather one of really thinking through what it is you are trying to accomplish, and then how to accomplish that goal with the simplest set of tools available. Think through what it would mean to remove what you have put in, whether you really need to add another layer or protocol, how to minimize configuration, etc.

If you want your network to be less complex, examine the tradeoffs realistically.

Hints and Principles: Applied to Networks

Russ — Mon, 19 Oct 2020 19:00:22 +0000

While software design is not the same as network design, there is enough overlap for network designers to learn from software designers. A recent paper published by Butler Lampson, updating a paper he wrote in 1983, is a perfect illustration of this principle. The paper is caleld Hints and Principles for Computer System Design. I’m not going to write a full review here–you should really go read the paper for yourself–but rather just point out some useful bits of the paper.

The first really useful point of this paper is Lampson breaks down the entire field of software design into three basic questions: What, How, and When (or who)? Each of these corresponds to the goals, techniques, and processes used to design and develop software. These same questions and answers apply to network design–if you are missing one of these three areas, then you are probably missing some important set of questions you have not answered yet. Each of these is also represented by an acronym: what? is STEADY, how? is AID, and when? is ART. Let’s look at a couple of these in a little more detail to see how Lampson’s system works.

STEADY stands for simple, timely, efficient, adaptable, dependable, and yummy. Simple is just what it sounds like –reduce complexity. I’m not entirely on board with Lampson’s description of simplicity, which seems to focus on abstraction–abstraction is one useful tool, but anyone who reads my work regularly knows I’m rather more careful about abstraction than most because it involves often-unexamined tradeoffs. Timely primarily relates to “is there a market for this,” in software design; for networks it might be better put as “does the business need this now or later?” Efficient is one of those tradeoffs involved in abstraction–what I might call one of the various ways of optimizing a system. Adaptable means just what it sounds like–are you creating technial debt that must be resolved later? Dependable could be translated to resilience in network design, but it would also relate to many aspects of security, and even the jitter and delay elements in application support.

Yummy is one many network engineers will not be familiar with, but is worth considering. If I’m reading Lampson right here, another way to say this might be “easy to consume.” Why do you want your customers to be able to consume the network easily? Because you do not want them running off and using the cloud (for instance) because they find committing and understanding resources in your network so difficult. We have, for far to long, assumed that “easy to consume” in the network design world means “just plug it into the wall.” It’s not that simple.

The second one, AID, stands for approximate, incremental, and divide & conquer. These are, again, easily adaptable to network design. You don’t need to make the design perfect the first time. In fact, as a young artist one thing that was drilled into my head was that the perfect was the enemy of the good–it’s better to get it approximately right, right now, than perfectly right ten years down the road (when no-one cares any longer). Incremental speaks to modularization, scale-out, and lifecycle management, for instance.

While not every principle here can be applied, a lot of them can. Having them listed out in an easy-to-remember format like this is a great design aid–learn these, and use them.

Reducing Complexity through Interaction Surfaces

Russ — Mon, 05 Oct 2020 17:00:59 +0000

A recent paper on network control and management (which includes Jennifer Rexford on the author list—anything with Jennifer on the author list is worth reading) proposes a clean slate 4d approach to solving much of the complexity we encounter in modern networks. While the paper is interesting, it’s very unlikely we will ever see a clean slate design like the one described, not least because there will always be differences between what the proper splits are—what should go where.

There is one section of the paper that eloquently speaks to current architecture, however. The authors describe a situation where routing and packet filters are used together to prevent one set of hosts from reaching another set of hosts. Changes in the network, however, cause the packet filters to be bypassed, opening up communications between these two sets of hosts.

This is exactly the problem we so often face in network engineering today—overlapping systems used to solve a single problem do not pay attention to the same signals or information to do their jobs. So here’s a thought about an obvious way to reduce the complexity of your network—try to use one tool to do one job. Before the days of automation, this was much harder to do. There was no way to distribute QoS configurations, for instance, or access lists, much less what might be considered an “easy way.” Because of this, it made some kind of sense to use routing protocols as a sort of distributed database and policy engine to move filters and the like around.

Today, however, we have automation. Because of this, it makes more sense to use automation to manage as much data plane policy as you can, leaving the routing protocol to do its job—provide reachability across an ever-changing network. There are still things, like traffic steering and prefix distribution rules, which should stay inside routing. But when you put routing filters in place to solve a data plane problem, it might be worth thinking about whether that is the right thing to do any longer.

Automation, in this case, can change everything.

Random Thoughts

Russ — Mon, 28 Sep 2020 17:00:06 +0000

This week is very busy for me, so rather than writing a single long, post, I’m throwing together some things that have been sitting in my pile to write about for a long while.

From Dalton Sweeny:

A physicist loses half the value of their physics knowledge in just four years whereas an English professor would take over 25 years to lose half the value of the knowledge they had at the beginning of their career. . . Software engineers with a traditional computer science background learn things that never expire with age: data structures, algorithms, compilers, distributed systems, etc. But most of us don’t work with these concepts directly. Abstractions and frameworks are built on top of these well studied ideas so we don’t have to get into the nitty-gritty details on the job (at least most of the time).

This is precisely the way network engineering is. There is value in the kinds of knowledge that expire, such as individual product lines, etc.—but the closer you are to the configuration, the more ephemeral the knowledge is. This is one of the entire points of rule 11 is your friend. Learn the foundational things that make learning the ephemeral things easier. There are only four problems (really) in moving data from one place to another. There are only around four solutions for each of those problems. Each of those solutions is bounded into a small set (again, about four for each) sub-solutions, or ways of implementing the solution, etc.

I’m going to spend some time talking about this in the Livelesson I’m currently recording, so watch this space for an announcement sometime early next year about publication.

From Ivan P:

What I’m pointing out in this rant is the reverse reasoning along the lines “vendor X is doing something, which confirms there’s a real market need for it”. I’ve been in IT too long, and seen how the startup/VC sausage is made, to believe that fairy tale… and even when it’s true, it doesn’t necessarily imply that you need whatever vendor X is selling.

There are two ways to look at this. Either vendors should lead the market in building solutions, or they should follow whatever the customer wants. From my perspective, one of the problems we have right now is everything is a massive mish-mash of these two things. The operator’s design team thinks of a neat way to do X, and then promises the account team a big check if its implemented. It doesn’t matter that X could be solved some other way that might be simpler, etc.—all that matters is the check. In this case, the vendor stops challenging the customer to build things better, and starts just acting like a commodity provider, rather than an innovative partner.

The interaction between the customer and the vendor needs to be more push-pull than it currently is—right now, it seems like either the operator simply dictates terms to the vendor, or the vendor pretty much dictates architecture to the operator. We need to find a middle ground. The vendor does need to have a solid solution architecture, but the architecture needs to be flexible, as well, and the blocks used to build that architecture need to be usable in ways not anticipated by the vendor’s design folks.

On the other hand, we need to stop chasing features. This isn’t just true of vendors, this is true of operators as well. You get feature lists because that’s what you ask for. Often, operators ask for feature lists because that’s the easiest thing to measure, or because they already have a completely screwed up design they are trying to brownfield around. The worst is—“we have this brownfield that we just can’t get rid of, so we want to build yet another overlay on top, which will make everything simpler.” After about the twentieth overlay a system crash becomes a matter of when rather than if.

Everyone Must Learn to Code

Russ — Mon, 14 Sep 2020 17:00:10 +0000

The word on the street is that everyone—especially network engineers—must learn to code. A conversation with a friend and an article passing through my RSS reader brought this to mind once again—so once more into the breach. Part of the problem here is that we seem to have a knack for asking the wrong question. When we look at network engineer skill sets, we often think about the ability to configure a protocol or set of features, and then the ability to quickly troubleshoot those protocols or features using a set of commands or techniques.

This is, in some sense, what various certifications have taught us—we have reached the expert level when we can configure a network quickly, or when we can prove we understand a product line. There is, by the way, a point of truth in this. If you claim your expertise is with a particular vendor’s gear, then it is true that you must be able to configure and troubleshoot on that vendor’s gear to be an expert. There is also a problem of how to test for networking skills without actually implementing something, and how to implement things without actually configuring them. This is a problem we are discussing in the new “certification” I’ve been working on, as well.

This is also, in some sense, what the hiring processes we use have taught us. Computers like to classify things in clear and definite ways. The only clear and definite way to classify networking skills is by asking questions like “what protocols do you understand how to configure and troubleshoot?” It is, it seems, nearly impossible to test design or communication skills in a way that can be easily placed on a resume.

Coding, I think, is one of those skills that is easy to appear to measure accurately, and it’s also something the entire world insists is the “coming thing.” No coding skills, no job. So it’s easy to ask the easy question—what languages do you know, how many lines of code have you written, etc. But again, this is the wrong question (or these are the wrong questions).

What is the right question? In terms of coding skills, more along the lines of something like, “do you know how to build and use tools to solve business problems?” I phrase it this way because the one thing I have noticed about every really good coder I have known is they all spend as much time building tools as they do building shipping products. They build tools to test their code, or to modify the code they’ve already written en masse, etc. In fact, the excellent coders I know treat functions like tools—if they have to drive a nail twice, they stop and create a hammer rather than repeating the exercise with some other tool.

So why is coding such an important skill to gain and maintain for the network engineer? This paragraph seems to sum it up nicely for me—

“Coding is not the fundamental skill,” writes startup founder and ex-Microsoft program manager Chris Granger. What matters, he argues, is being able to model problems and use computers to solve them. ”We don’t want a generation of people forced to care about Unicode and UI toolkits. We want a generation of writers, biologists, and accountants that can leverage computers.”

It’s not the coding that matters, it’s “being able to model problems and use computers to solve them.” This is the essence of tool building or engineering—seeing the problem, understanding the problem, and then thinking through (sometimes by trial and error) how to build a tool that will solve the problem in a consistent, easy to manage way. I fear that network engineers are taking their attitude of configuring things and automating it to make the configuration and troubleshooting faster. We seem to end up asking “how do I solve the problem of making the configuration of this network faster,” rather than asking “what business problem am I trying to solve?”

To make effective use of the coding skills we’re telling everyone to learn, we need to go back to basics and understand the problems we’re trying to solve—and the set of possible solutions we can use to solve those problems. Seen this way, the routing protocol becomes “just another tool,” just like a function call, that can be used to solve a specific set of problems—instead of a set of configuration lines that we invoke like a magic incantation to make things happen.

Coding skills are important—but they require the right mindset if we’re going to really gain the sorts of efficiencies we think are possible.

The Hedge 51: Tim Fiola and pyNTM

Russ — Wed, 09 Sep 2020 17:00:25 +0000

Have you ever looked at your wide area network and wondered … what would the traffic flows look like if this link or that router failed? Traffic modeling of this kind is widely available in commercial tools, which means it’s been hard to play with these kinds of tools, learn how they work, and understand how they can be effective. There is, however, an open source alternative—pyNTM.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-051.mp3

download

Understandability

Russ — Mon, 07 Sep 2020 17:00:06 +0000

According to Maor Rudick, in a recent post over at Cloud Native, programming is 10% writing code and 90% understanding why it doesn’t work. This expresses the art of deploying network protocols, security, or anything that needs thought about where and how. I’m not just talking about the configuration, either—why was this filter deployed here rather than there? Why was this BGP community used rather than that one? Why was this aggregation range used rather than some other? Even in a fully automated world, the saying holds true.

So how can you improve the understandability of your network design? Maor defines understandability as “the dev who creates the software is to effortlessly … comprehend what is happening in it.” Continuing—“the more understandable a system is, the easier it becomes for the developers who created it to change it in a way that is safe and predictable.” What are the elements of understandability?

Documentation must be complete, clear, concise, and organized. The two primary failings I encounter in documentation are completeness and organization. Why something is done, when it was last changed, and why it was changed are often missing. The person making the change just assumes “I’ll remember this, or someone will figure it out.” You won’t, and they won’t. Concise is the “other side” of complete … Recording unsubstantial changes just adds information that won’t ever be needed. You have to balance between enough and too much, of course.

Organization is another entire problem in documentation—most people have a favorite way to organize things. When you get a team of people all organizing things based on their favorite way, you end up with a mess. Going back in time … I remember that just about everyone who was assigned to the METNAV shop began their time by re-organizing the tools. Each time the re-organization made things so much easier to find, and improved the MTTR for the airfield equipment we supported … After a while, you’d think someone would ask, “Does re-organizing all the tools every year really help? Or are you just making stuff up for new folks to do?”

Moving beyond documentation, what else can we do to make our networks more understandable?

First, we can focus on actually making networks simpler. I don’t mean just glossing things over with a pretty GUI, or automating thousands of lines of configuration using Python. I mean taking steps by using protocols that are simpler to run, require less configuration, and produce more information you can use for troubleshooting—choose something like IS-IS for your DC fabric underlay rather than BGP, unless you really have several hundred thousand of underlay destinations (hint, if you’ve properly separated “customer” routes in the overlay from “infrastructure” routes in the underlay, you shouldn’t have this kind of routing tangle in the underlay anyway).

What about having multiple protocols that do the same job? Do you really need three or four routing protocols, four or five tunneling protocols, and five or six … well, you get the idea. Reducing the sheer number of protocols running in your network can make a huge difference in the tooling troubleshooting time. What about having four or five kinds of boxes in your network that fulfill the same role? Okay—so maybe you have three DC fabrics, and you run each one using a different vendor. But is there is any reason to have three DC fabrics, each of which has a broad mix of equipment from five different vendors? I doubt it.

Second, you can think about what you would measure in the case of failure, how you would measure it, and put the basic piece in place in the design phase to do those measurements. Don’t wait until you need the data to figure out how to get at it, and what the performance results of trying to get it are going to be.

Third, you can think about where you put policy in your network. There is no “right” answer to this question, other than … be consistent. The first option is to put all your policy in one place—say, on the devices that connect the core to the aggregation, or the devices in the distribution layer. The second option is to always put the policy as close to the source or destination of the traffic impacted by the policy. In a DC fabric, you should always put policy and external connectivity in the T0 or ToR, never in the spine (it’s not a core, it’s a spine).

Maybe you have other ideas on how to improve understandability in networks … If you do, get in touch and let’s talk about it. I’m always looking for practical ways to make networks more understandable.

Unsolicited Multicast: Random Thoughts on the LFN White Paper

Russ — Mon, 17 Aug 2020 17:00:36 +0000

A short while back, the Linux Foundation (Networking), or LFN, published a white paper about the open source networking ecosystem. Rather than review the paper, or try to find a single theme, I decided to just write down “random thoughts” as I read through it. This is the (rather experimental) result.

The paper lists five goals of the project which can be reduced to three: reducing costs, increasing operator’s control over the network, and increasing security (by increasing code inspection). One interesting bit is the pairing of cost reduction with increasing control. Increasing control over a network generally means treating it less like an opaque box and more like a disaggregated set of components, each of which can be tuned in some way to improve the fit between network services, network performance, and business requirements. The less a network is an opaque box, however, the more time and effort required to manage it. This only makes sense—tuning a network to perform better requires time and talent, both of which cost money.

The offsetting point here is disaggregation and using open source can save money—although in my experience it never does. Again, running disaggregated software and hardware requires time and talent, both of which cost money. Intuitively, then, reducing costs and increasing control don’t “pair” together like this. Building a more tunable, flexible network might be possible at the same cost as building a network in some “more traditional way,” but “costs less” is generally the last thing on my mind when I’m thinking about disaggregation, open source, and more modular, systemic wholistic designs.

The bottom line—if you are going to try to sell disaggregation to your boss, do not play the “cost card.” Focus on the business side of things, instead. Another way of putting this—don’t reduce what you are selling to a commodity, or an item with no business value other than reducing costs. First, this is simply not true. Second, you’ll never meet a salesman who says, “what I’m selling you is a commodity, so I’m really just after getting it to you for cheaper.” Even toilet paper companies sell on quality. Networks are more important than toilet paper to the business, right?

A second interesting point— “the network control layer is where end-to-end complex network services are designed and executed.” I broadly agree with this statement, but … I have been pushing, for some time, the idea that there is not just one control plane. Part of the problem with network design is we tend to modularize topologically, using one form or another of hierarchy, quite nicely. What we do no do, however, is think about how to modularize vertically to create a true wholistic view of the system “as a whole.” We do create multiple layers of data planes (they’re called tunnels!), but we do not think about creating multiple layers of control planes.

Going way back in time, when transit providers first started scaling, BGP speakers would not advertise a route that was not also present in a local IGP table (not just the routing table, but the actual OSPF. EIGRP, or IS-IS table). This was called synchronization, and it was designed to ensure traffic being forwarded into the network from the edge had a real path through the network. Since BGP can “skip over” routers using multihop, it was far too easy to send traffic to a router that was not running BGP, and hence did not have forwarding information for the destination … even though some router just one or two hops away did because it was running BGP.

Operators soon discovered keeping their IGP and BGP tables synchronized simply isn’t possible—the IGPs just could not support the route counts required. This led to running BGP on every router in the AS, which led to confederations, then to route reflectors, then to MPLS and other tunneling mechanisms to reduce state in the network core.

Running BGP on every router cleanly separates internal or infrastructure routes from external or customer routes. This is an effective use of information hiding vertically, an overlay with some information, and an underlay with other information. This is what I’ve been pressing for in the DC fabric space for quite a while now … and its something the network engineering community needs to explore and flesh out more in many other spaces.

So, there it is—I could go on, but I think you’re probably already bored with my random thoughts.

Deterministic Networking and New IP

Russ — Mon, 10 Aug 2020 17:34:30 +0000

For those not following the current state of the ITU, a proposal has been put forward to (pretty much) reorganize the standards body around “New IP.” Don’t be confused by the name—it’s exactly what it sounds like, a proposal for an entirely new set of transport protocols to replace the current IPv4/IPv6/TCP/QUIC/routing protocol stack nearly 100% of the networks in operation today run on. Ignoring, for the moment, the problem of replacing the entire IP infrastructure, what can we learn from this proposal?

What I’d like to focus on is deterministic networking. Way back in the days when I was in the USAF, one of the various projects I worked on was called PCI. The PCI network was a new system designed to unify the personnel processing across the entire USAF, so there were systems (Z100s, 200s, and 250s) to be installed in every location across the base where personnel actions were managed. Since the only wiring we had on base at the time was an old Strowger mainframe, mechanical crossbars at a dozen or so BDFs, and varying sizes of punch-downs at BDFs and IDFs, everything for this system needed to be punched- or wrapped-down as physical circuits.

It was hard to get anything like real bandwidth over paper-wrapped cables with lead shielding that had been installed in the 1960s (or before??), wrap-downs, and 66 blocks. The max we could get in terms of bandwidth was about a T1, and often less. We needed more bandwidth than this, so we installed inverse multiplexers that would combine the bandwidth across multiple physical circuits to something large enough for the PCI system to run on.

The problem we ran into with these inverse multiplexers was they would sometimes fall out of synchronization, meaning frames would be delivered out of order—one of the two cardinal violations of deterministic networks. Since PCI was designed around a purely deterministic networking model, these failures caused havoc with HR. Not a good thing.

The second and third cardinal rules of deterministic networking are that there will be no jitter, and the network will deliver all packets accepted by the network. To make these rules work, there must be some form of entrance gating (circuit setup, generally speaking, with busy signals), fixed packet sizes, and strict quality of service.

In contrast, the rules of packet-switched (non-deterministic) networking are: all packets are accepted, packets can be (almost) any size, there is no guarantee any particular packet will be delivered, and there’s no way to know what the jitter and delay on delivering any particular packet might be.

Some kinds of payloads just need deterministic semantics, while others just need deterministic semantics. The problem, then, is how to build a single network that supports both kinds of semantics. Solving this problem is where thinking through the situation using a problem-solution mindset can help you, as a network engineer (or protocol designer, or software creator) understand what can be done, what cannot be done, and what the limitations are going to be no matter what solution you choose.

There are, at base, only three solutions to this problem. The first is to build a network that supports both deterministic and packet-switched traffic from the control planes to the switching path. The complexity of doing something like this would ultimately outweigh any possible benefit, so let’s leave that solution aside. The second is to emulate a deterministic network on top of a packet-switched network, and the third is to emulate a packet-switched network on top of a deterministic network. Consider the last option first—emulating a packet-switched network on top of a deterministic network. For those who are old enough, this is IP-over-ATM. It didn’t work. The inefficiencies of trying to stuff variably sized packets into the fixed-size frames required to create a deterministic network were so significant that it just … didn’t work. The control planes were difficult to deploy and manage—think ATM LANE, for instance—and the overall network was just really complicated.

Today, what we mostly do is emulate deterministic networks over packet-switched networks. This design allows traffic that does well in a packet-switched environment to run perfectly fine, and traffic that likes “some” deterministic properties, like voice, to work fine with a little effort on the part of the network designer and operator. Building something with all the properties of a genuinely deterministic network on top of a packet-switched network, however, is difficult.

To get there, you need a few things. First, you need some way to strictly control/steer flows, so the mix of packet and deterministic traffic is going allow you to meet deterministic requirements on every link through the network. Second, you need some excellent QoS mechanism that knows how to provide deterministic results while mixing in packet-switched traffic “underneath.” Third, you need to overprovision enough to take up any “slack” and variability, as well as to account for queuing and clock on/clock off delays through switching devices.

The truth is we can come pretty close—witness the ability of IP networks to carry video streaming and what looks like traditional voice today. Can it get better? I’m confident we can build systems that are ever closer to emulating a truly deterministic network on top of a packet-switched network—so long as we mind the tradeoffs and are willing to throw capacity and hardware at the problem.

Can we ever truly emulate packet-switched networks on top of deterministic networks? I don’t see how. It’s not just that we’ve tried and failed; it’s that the math just doesn’t ever seem to “work right” in some way or another.

So while the “new IP” proposal brings up an interesting problem—future applications may need more deterministic networking profiles—it doesn’t explain why we should believe either building a completely parallel deterministic network or trying to flip the stack to emulate a packet-switched network on top of a deterministic network, makes more sense than what we are doing today.

Looking at this from a problem/solution perspective helps clarify the situation, and produce a conclusion about which path is best, without even getting into specific protocols or implementations. Really understanding the problem you are trying to solve, even at an abstract level, and then working through all the possible solutions, even ones that might not have been invented yet (although I can promise you then have been invented), can help you get your mind around the engineering possibilities.

It is probably not how you’re accustomed to looking at network design, protocol selection, etc. But it’s one you should start using.

The 4D Network

Russ — Mon, 03 Aug 2020 17:00:29 +0000

I think we can all agree networks have become too complex—and this complexity is a result of the network often becoming the “final dumping ground” of every problem that seems like it might impact more than one system, or everything no-one else can figure out how to solve. It’s rather humorous, in fact, to see a lot of server and application folks sitting around saying “this networking stuff is so complex—let’s design something better and simpler in our bespoke overlay…” and then falling into the same complexity traps as they start facing the real problems of policy and scale.

This complexity cannot be “automated away.” It can be smeared over with intent, but we’re going to find—soon enough—that smearing intent on top of complexity just makes for a dirty kitchen and a sub-standard meal.

While this is always “top of mind” in my world, what brings it to mind this particular week is a paper by Jen Rexford et al. (I know Jen isn’t on the lead position in the author list, but still…) called A Clean Slate 4D Approach to Network Control and Management. Of course, I can appreciate the paper in part because I agree with a lot of what’s being said here. For instance—

We believe the root cause of these problems lies in the control plane running on the network elements and the management plane that monitors and configures them. In this paper, we argue for revisiting the division of functionality and advocate an extreme design point that completely separates a network’s decision logic from the protocols that govern interaction of network elements.

In other words, we’ve not done our modularization homework very well—and our lack of focus on doing modularization right is adding a lot of unnecessary complexity to our world. The four planes proposed in the paper are decision, dissemination, discovery, and data. The decision plane drives network control, including reachability, load balancing, and access control. The dissemination plane “provides a robust and efficient communication substrate” across which the other planes can send information. The discovery plane “is responsible for discovering the physical components of the network,” giving each item an identifier, etc. The data plane carries packets edge-to-edge.

I do have some quibbles with this architecture, of course. To begin, I’m not certain the word “plane” is the right one here. Maybe “layers,” or something else that implies more of a modular concept with interactions, and less a “ships in the night” sort of affair. My more substantial disagreement is with the placement of “interface configuration” and where reachability is placed in the model.

Consider this: reachability and access control are, in a sense, two sides of the same coin. You learn where something is to make it reachable, and then you block access to it by hiding reachability from specific places in the network. There are two ways to control reachability—by hiding the destination, or by blocking traffic being sent to the destination. Each of these has positive and negative aspects.

But notice this paradox—the control plane cannot hide reachability towards something it does not know about. You must know about something to prevent someone from reaching it. While reachability and access control are two sides of the same coin, they are also opposites. Access control relies on reachability to do its job.

To solve this paradox, I would put reachability into discovery rather than decision. Discovery would then become the discovery of physical devices, paths, and reachability through the network. No policy would live here—discovery would just determine what exists. All the policy about what to expose about what exists would live within the decision plane.

While the paper implies this kind of system must wait for some day in the future to build a network using these principles, I think you can get pretty close today. My “ideal design” for a data center fabric right now is (modified) IS-IS or RIFT in the underlay and eVPN in the overlay, with a set of controllers sitting on top. Why?

IS-IS doesn’t rely on IP, so it can serve in a mostly pure discovery role, telling any upper layers where things are, and what is changing. eVPN can provide segmented reachability on top of this, as well as any other policies. Controllers can be used to manage eVPN configuration to implement intent. A separate controller can work with IS-IS on inventory and lifecycle of installed hardware. This creates a clean “break” between the infrastructure underlay and overlays, and pretty much eliminates any dependence the underlay has on the overlay. Replacing the overlay with a more SDN’ish solution (rather than BGP) is perfectly do-able and reasonable.

While not perfect, the link-state underlay/BGP overlay model comes pretty close to implementing what Jen and her co-authors are describing, only using protocols we have around today—although with some modification.

But the main point of this paper stands—a lot of the reason for the complexity we fact today is simply because we modularize using aggregation and summarization and call the job “done.” We aren’t thinking about the network as a system, but rather as a bunch of individual appliances we slap together into places, which we then connect through other places (or strings, like between tin cans).

Something to think about the next time you get that 2AM call.

Smart Network or Dumb?

Russ — Mon, 27 Jul 2020 17:00:33 +0000

Should the network be dumb or smart? Network vendors have recently focused on making the network as smart as possible because there is a definite feeling that dumb networks are quickly becoming a commodity—and it’s hard to see where and how steep profit margins can be maintained in a commodifying market. Software vendors, on the other hand, have been encroaching on the network space by “building in” overlay network capabilities, especially in virtualization products. VMWare and Docker come immediately to mind; both are either able to, or working towards, running on a plain IP fabric, reducing the number of services provided by the network to a minimum level (of course, I’d have a lot more confidence in these overlay systems if they were a lot smarter about routing … but I’ll leave that alone for the moment).

How can this question be answered? One way is to think through what sorts of things need to be done in processing packets, and then think through where it makes most sense to do those things. Another way is to measure the accuracy or speed at which some of these “packet processing things” can be done so you can decide in a more empirical way. The paper I’m looking at today, by Anirudh et al., takes both of these paths in order to create a baseline “rule of thumb” about where to place packet processing functionality in a network.

Sivaraman, Anirudh, Thomas Mason, Aurojit Panda, Ravi Netravali, and Sai Anirudh Kondaveeti. “Network Architecture in the Age of Programmability.” ACM SIGCOMM Computer Communication Review 50, no. 1 (March 23, 2020): 38–44. https://doi.org/10.1145/3390251.3390257.

The authors consider six different “things” networks need to be able to do: measurement, resource management, deep packet inspection, network security, network virtualization, and application acceleration. The first of these they measure by setting introducing errors into a network and measuring the dropped packet rate using various edge and in-network measurement tools. What they found was in-network measurement has a lower error rate, particularly as time scales become shorter. For instance, Pingmesh, a packet loss measurement tool that runs on hosts, is useful for measuring packet loss in the minutes—but in-network telemetry can often measure packet loss in the seconds or milliseconds. They observe that in-network telemetry of all kinds (not just packet loss) appears to be more accurate when application performance is more important—so they argue telemetry largely belongs in the network.

Resource management, such as determining which path to take, or how quickly to transmit packets (setting the window size for TCP or QUIC, for instance), is traditionally performed entirely on hosts. The authors, however, note that effective resource management requires accurate telemetry information about flows, link utilization, etc.—and these things are best performed in-network rather than on hosts. For resource management, then, they prefer a hybrid edge/in-network approach.

The argue deep packet inspection and network virtualization are both best done at the edge, in hosts, because these are processor intensive tasks—often requiring more processing power and time than network devices have available. Finally, they argue network security should be located on the host, because the host has the fine-grained service information required to perform accurate filtering, etc.

Based on their arguments, the authors propose four rules of thumb. First, tasks that leverage data only available at the edge should run at the edge. Second, tasks that leverage data naturally found in the network should be run in the network. Third, tasks that require large amounts of processing power or memory should be run on the edge. Fourth, tasks that run at very short timescales should be run in the network.

I have, of course, some quibbles with their arguments … For instance, the argument that security should run on the edge, in hosts, assumes a somewhat binary view of security—all filters and security mechanisms should be “one place,” and nowhere else. A security posture that just moves “the firewall” from the edge of the network to the edge of the host, however, is going to (eventually) face the same vulnerabilities and issues, just spread out over a larger attack surface (every host instead of the entry to the network). Security shouldn’t work this way—the network and the host should work together to provide defense in depth.

The rules of thumb, however, seem to be pretty solid starting points for thinking about the problem. An alternate way of phrasing their result is through the principle of subsidiarity—decisions should be made as close as possible to the information required to make them. While this is really a concept that comes out of ethics and organizational management, it succinctly describes a good rule of thumb for network architecture.

To Route or Not?

Russ — Mon, 01 Jun 2020 19:00:13 +0000

When you are building a data center fabric, should you run a control plane all the way to the host? This is question I encounter more often as operators deploy eVPN-based spine-and-leaf fabrics in their data centers (for those who are actually deploying scale-out spine-and-leaf—I see a lot of people deploying hybrid sorts of networks designed as “mini-hierarchical” designs and just calling them spine-and-leaf fabrics, but this is probably a topic for another day). Three reasons are generally given for deploying the control plane all on the hosts attached to the fabric: faster down detection, load sharing, and traffic engineering. Let’s consider each of these in turn.

Faster Down Detection. There’s no simple way for ToR switches to determine when the connection to a host has failed, whether the host is single or dual-homed. Somehow the set of routes reachable through the host must be related to the interface state, or some underlying fast hello state (such as BFD), so that if a link fails the ToR knows to pull the correct set of routes from the routing table. It’s simpler to just let the host itself advertise the correct reachability information; when the link fails, the routing session will fail, and the correct routes will automatically be withdrawn.

Load Sharing. While this only applies to hosts with two connections into the fabric (dual-homed hosts), this is still an important use case. If a dual-homed host only has two default routes to work from, the host is blind to network conditions, and can only load share equally across the available paths. Equal load sharing, however, may not be ideal in all situations. If the host is running routing, it is possible to inject more intelligence into the load sharing between the upstream links.

Traffic Engineering. Or traffic shaping, steering, etc. In some cases, traffic engineering requires injecting a label or outer header onto the packet as it enters the fabric. In others, more specific routes might be sent along one path and not another to draw specific kinds of traffic through a more optimal route in the fabric. This kind of traffic engineering is only possible if the control plane is running on the host.

All these reasons are well and good, but they all assume something that should be of great interest to the network designer: which control plane are we talking about?

Most DC fabric designs I see today assume there is a single control plane running on the fabric—generally this single control plane is BGP, and it’s being used both to provide basic IP connectivity through the fabric (the infrastructure underlay control plane) and to provide tunneled overlay reachability (the infrastructure overlay control plane—generally eVPN).

This entangling of the infrastructure underlay and overlay has always seemed, to me, to be less than ideal. When I worked on large-scale transit provider networks in my more youthful days, we intentionally designed networks that separated customer routes from infrastructure routes. This created two separate failure and security domains in the network, as well as dividing the telemetry data in ways that allowed faster troubleshooting of common problems.

The same principles should apply in a DC fabric—after all, the workloads are essentially customers of the fabric, while the basic underlay connectivity counts as infrastructure. The simplest way to adopt this sort of division of labor is the same way large-scale transit providers did (and do)—use two different routing protocols for the underlay and overlay. For instance, IS-IS or RIFT for the underlay and eVPN using BGP for the overlay.

If you move to two layers of control plane, the question above becomes a bit more nuanced—should the overlay control plane run on the hosts? Should the underlay control plane run on the hosts?

For faster down detection—for those hosts that need faster down detection, BFD tied to IGP neighbor state can remove the correct nexthop from the local routing table at a ToR, causing the correct reachable destinations to be withdrawn. Alternatively, the host can run an instance of the overlay control plane, which allows it to advertise and withdraw “customer routes” directly. In neither case is the underlay control plane required to run on the host.

For load sharing and traffic engineering—if something like SRm6, or even other more traditional forms of traffic engineering, the information needed will be carried in the overlay rather than the underlay—so the underlay routing protocol does not need to run on the host.

On the other side of the coin, not running the underlay protocol on the host can help the overall network security posture. Assume a public facing host connected to the fabric is somehow pwned… If the host is running the underlay protocol, its pretty simple to DoS the entire fabric to take it down, or to inject incorrect routing information. If the overlay is configured correctly, however, only the virtual topology which the host has access to can be impacted by an attack—and if microsegmentation is deployed, that damage can be minimized as well.

From a complexity perspective, running the underlay control plane on the host dramatically increases the amount of state the host must maintain; there is no effective filter you can run to reduce state on the host without destroying some of the advantages gained by running the underlay control plane there. On the other hand, the ToR can be configured to filter routing information the host receives, controlling the amount of state the host needs to manage.

Control plane on the host or not? This is one of those questions where properly modularized and layered network design can make a big difference in what the right answer should be.

Ruminating on SOS

Russ — Mon, 25 May 2020 17:00:51 +0000

Many years ago I attended a presentation by Dave Meyers on network complexity—which set off an entire line of thinking about how we build networks that are just too complex. While it might be interesting to dive into our motivations for building networks that are just too complex, I starting thinking about how to classify and understand the complexity I was seeing in all the networks I touched. Of course, my primary interest is in how to build networks that are less complex, rather than just understanding complexity…

This led me to do a lot of reading, write some drafts, and then write a book. During this process, I ended coining what I call the complexity triad—State, Optimization, and Surface. If you read the book on complexity, you can see my views on what the triad consisted of changed through in the writing—I started out with volume (of state), speed (of state), and optimization. Somehow, though, interaction surfaces need to play a role in the complexity puzzle.

First, you create interaction surface when you modularize anything—and you modularize to control state (the scope to set apart failure domains, the speed and volume to enable scaling). Second, adding interaction surfaces adds complexity by creating places where information must be exchanged—which requires protocols and other things. Finally, reducing state through abstraction at an interaction surface is the primary cause of many forms of suboptimal behavior in a control plane, and causes unintended consequences. Since interaction surfaces are so closely tied to state and optimization, then, I added surfaces to the triad, and merged the two kinds of state into one, just state.

I have been thinking through the triad again in the last several weeks for various reasons, and I’m not certain it’s quite right still because I’m not convinced surfaces are really a tradeoff against state and optimization. It seems more accurate to say that state and optimization trade off through interaction surfaces. This does not make it any less of a triad, but it might mean I need to find a little different way to draw it. One way to illustrate it is as a system of moving parts, such as the illustration below.

If you think of the interaction surface between modules 1 and 2—two topological parts, or a virtual topology on top of a physical—then the abstraction is the amount of information allowed to pass between the two modules. For instance, in aggregation the length of the aggregated prefixes, or the aggregated prefix metrics, etc.

When you “turn the crank,” so-to-speak, you adjust the volume, speed (velocity), breadth, or depth of information being passed between the modules—either more or less information, faster or slower, in more places or fewer, or the reaction of the module receiving the state. Every time you turn the crank, however, there is not one reaction but many. Notices optimization 1 will turn in the opposite direction from optimization 2 in the diagram—so turning the crank for 1 to be more optimal will always result in 2 becoming less optimal. There are tens or hundreds of such interactions in any system, and it is impossible for any person to know or understand all of them.

For instance, if you aggregate hundreds of /64’s to tens of /60’s, you reduce the state and optimize by reducing the scope of the failure domain. On the other hand, because you have less specific routing information, traffic is (most likely) going to flow along less-than-optimal paths. If you “turn the crank” by aggregating those same hundreds of /64’s to a 0::0, you will have more “airtight” failure domains or modules, but less optimal traffic flow. Hence …

If you haven’t found the tradeoffs, you haven’t looked hard enough.

What understanding the SOS triad allows you, combined with a fundamental knowledge of how these things work, is to know where to look for the tradeoffs. Maybe it would be better to illustrate the SOS triad with surfaces at the bottom all the time, acting as a sort of fulcrum or balance point between state and optimization… Or maybe a completely different illustration would be better. This is something for me to think about more and figure out.

Complexity interacts with these interaction surfaces as well, of course—the more complex a system becomes, the more complex the interaction surface within the system become or the more of them you have. A key point in design of any kind is balancing the number of interaction surfaces with their complexity, depth, and breath—in other words, where should you modularize, what should each module contain, what sort of state passed between the modules, where does state pass between the modules, etc. Somehow, mentally, you have to factor in the unintended consequences of hiding information (the first corollary to Keith’s Law, in effect), and the law of leaky abstractions (all nontrivial abstractions leak).

This is a far different way of looking at networks and their design than what you learned in any random certification, and its probably not even something you will find in a college textbook. It is quite difficult to apply when you’re down in the configuration of individual devices. But it’s also the key to understanding networks as a system and beginning the process of thinking about where and how to modularize to create the simplest system to solve a given hard problem.

Going back to the beginning, then—one of the reasons we build such complex networks is we do not really think about how the modules fit together. Instead, we use rules-of-thumb and folk wisdom while we mumble about failure domains and “this won’t scale” under our breath. We are so focused on the individual gears becoming commodities that we fail to see the system and all its moving parts—or we somehow think “this is all so easy,” that we build very inefficient systems with brute-force resolutions, often resulting in mass failures that are hard to understand and resolve.

Sorry, there’s no clear point or lesson here… This is just what happens when I’ve been buried in dissertation work all day and suddenly realize I have not written a blog post for this week… But it should give you something to think about.

Understanding DC Fabric Complexity

Russ — Mon, 11 May 2020 17:00:34 +0000

When I think of complexity, I mostly consider transport protocols and control planes—probably because I have largely worked in these areas from the very beginning of my career in network engineering. Complexity, however, is present in every layer of the networking stack, all the way down to the physical. I recently ran across an interesting paper on complexity in another part of the network I had not really thought about before: the physical plant of a data center fabric.

Some researchers at USC and VMWare have thought about complexity in the physical infrastructure, however, and they wrote a rather interesting paper about it.

The paper begins by defining what complexity in the physical infrastructure of a DC fabric looks like. They focus on packaging, or the layout of the switches in the fabric, the bundles of cabling required to wire the topology, and the number and locations of patch panels required. The packaging and patch panels impact the length and complexity of the cable runs (whether optical or copper), which represents a base complexity for the entire topology.

The second thing they consider is the lifecycle of the physical fabric infrastructure. What steps are required to upgrade the fabric from a smaller configuration to a larger one? Or from a lower speed (higher oversubscription) to a higher speed (lower oversubscription)? The result is the ability to put a number on the overall complexity of each topology.

The first class of topologies they consider are spine-and-leaf, such as the Clos, Benes, and butterfly fabrics. They call all kinds of spine-and-leaf fabrics Clos fabrics. Spine-and-leaf fabrics, they note generally have very low cabling complexity because their symmetry encourages consistent bundling and hardware placement. They call the second kind of topology expander fabrics; the most common fabric in this class is the dragonfly. These topologies are more difficult to wire but simpler to scale out because they can be expanded largely by modifying just the edge of the fabric. Their analysis shows these classes of fabric rate equally on their complexity scale.

A side note they don’t consider in the paper—their complexity computation implies that if you are building a fabric with a somewhat fixed range of sizes, and you can preplan the location of spines leaving enough room for the maximum sized fabric on the first day, spine-and-leaf fabrics are less complex than the fancier topologies you might hear about from time to time. Since most data center fabrics do, in fact, fall into these kinds of constraints (given a good day one designer!), this seems to validate the widespread use of butterfly and Clos fabrics for most applications. This feels like a significant result for most common data center fabric designs.

Finally, they describe an interesting topology they call FatClique, which is an interesting blend of spine-and-lead and edge expander topologies; I’ve screen grabbed the image from the paper below.

Overall, it’s well worth spending the time to read the entire paper if you have an in-depth interest in fabric design.The way this topology is described feels very much like a Benes to me, or a butterfly where the fabric routers are replaced by fabrics (making a seven-stage fabric). It’s hard to tell how useful this topology would be in real deployments—but that researchers are looking into alternatives other than the venerable spine-and-leaf is interesting in its own right.

The Resilience Problem

Russ — Mon, 27 Apr 2020 17:00:12 +0000

…we have educated generations of computer scientists on the paradigm that analysis of algorithm only means analyzing their computational efficiency. As Wikipedia states: “In computer science, the analysis of algorithms is the process of finding the computational complexity of algorithms—the amount of time, storage, or other resources needed to execute them.” In other words, efficiency is the sole concern in the design of algorithms. … What about resilience? —Moshe Y. Vardi

This quote set me to thinking about how efficiency and resilience might interact, or trade off against one another, in networks. The most obvious extreme cases are two routers connected via a single long-haul link and the highly parallel data center fabrics we build today. Obviously adding a second long-haul link would improve resilience—but at what cost in terms of efficiency? Its also obvious highly meshed data center fabrics have plenty of resilience—and yet they still sometimes fail. Why?

These cases can be described as efficiency extremes. The single link between two distant points is extremely efficient at minimizing cost and complexity; there is only one link to pay for, only one pair of devices to configure, etc. The highly meshed data center fabric, on the other hand, is extremely efficient at rapidly carrying large amounts of data between vast numbers of interconnected devices (east/west traffic flows). Have these optimizations towards one goal resulted in tradeoffs in resilience?

Consider the case of the single long-haul link between two routers. In terms of the state/optimization/surfaces (SOS) tirade, this single pair of routers and single link minimize the amount of control plane state and the breadth of surfaces (there is only point at which the control plane and the physical network intersect, for instance). The tradeoff, however, is a single link failure causes all traffic through the network to stop flowing—the network completely fails to do the work its designed to do. To create resiliency, or rather add a second dimension of optimization to the network, a second link and a second pair of routers need to be added. Adding these, however, will increase the amount of state and the number of interaction surfaces in the network. Another way to put this is the overall system becomes more complex to solve a harder set of problems—inexpensive traffic flow versus minimal cost traffic flow with resilience.

The second case is a little harder to understand—we assume all those parallel links necessarily make the network more resilient. If this is the case, then why do data center fabrics ever fail? In fact, DC fabrics are plagued by one of the hardest kinds of failure to understand and repair—grey failures. Going back to the SOS triad, the massive number of parallel links and devices in a DC fabric, designed to optimize the network for carrying massive amounts of traffic, also add lots of state and interaction surfaces to the network. Increasing the amount of state and interaction surfaces should, in theory, reduce some other form of optimization—in this case resilience through overwhelmed control planes and grey failures.

In the case of a DC fabric, simplification can increase resilience. Since you cannot reduce the number of links and devices, you must think through how and where to abstract information to reduce state. Reducing state, in turn, is bound to reduce the efficiency of traffic flows through the network, so you immediately run into a domino effect of optimization tradeoffs. Highly turned optimization for traffic carrying causes a lack of optimization in resilience; optimizing for resilience reduces the optimization of traffic flow through the network. These kinds of chain reactions are common in the network engineering world. How can you optimize against grey failures? Perhaps simplifying design by using a single kind of optic, rather than having multiple kinds, or finding other ways to cope with the complexity in physical design.

Returning to the original quote—we often build a lot of resilience into network designs, so we do not face the same sorts of problems software designers and implementors do. Quite often the hyper-focus on resilience in network design is a result of a lack of resilience in software design—software designers have thrown the complexity of resilient design over the cubicle wall into the network operator’s lap. This clearly does not seem to be the most efficient way to handle things, as network are vastly more complex because of the absolute resilience they are expected to provide; looking at the software and network as a system might produce a more resilient, and yet simpler, system.

The key, in the meantime, is for network engineers to learn how to ply the tradeoffs, understanding precisely what their goals are—or what they are optimizing for—and how those optimizations trade off against one another.

Learning from Failure at Scale

Russ — Mon, 13 Apr 2020 17:00:54 +0000

One of the difficulties for the average network operator trying to understand their failure rates and reasons is they just don’t have enough devices, or enough incidents, to make informed observations. If you have a couple of dozen switches, it is often hard to understand how often software defects take a device down versus human error (Mean Time Between Mistakes, or MTBM). As networks become larger, however, more information becomes available, and more interesting observations can be made. A recent paper written in conjunction with Facebook uses information from Facebook’s data center fabrics to make some observations about the rate and severity of different kinds of failures—needless to say, the results are fairly interesting.

To produce the study, the authors took data from Facebook’s ticket logging system over 6 years, from 2011 through 2018. They used language-based systems to classify each event based on severity, kind of remediation, and root cause. Once the events were classified, the researchers plotted and tried to understand the results. For instance, table 2 lists the most common root causes of data center fabric incidents: 17% were maintenance, 13% misconfiguration, 13% hardware, and 12% software defects (bugs).

Given Facebook’s network is completely automated, with a full code review/canary process for validating changes before they are put into production, misconfiguration failures should lower than a manually operated network. That 13% of failures are still accounted for by misconfiguration shows even the best automation program cannot eliminate failures from misconfiguration. This number is also interesting because it implies networks without this degree of automation must have much higher failure rates due to misconfiguration. While the raw number of failures are not given, this seems to provide both an idea of how much improvement automation can create, as well as a sort of “cap” on how much improvement operators can expect by automating.

If misconfiguration causes 13% of all failures, and software defects cause 12%, then 25% of all failures are caused by human error. I don’t know of any other studies of this kind, but 25% sounds about right based on years of experience. Whether this 25% is spread across failures in vendor code and operator configuration, or across operator created code and operator configuration, the percentage of failure seems to remain about the same. It is not likely you can eliminate failures caused by human error, nor are you likely to drive it down more than a couple of percentage points.

Another interesting finding here is larger networks increase the time humans take to resolve incidents. As the size of the network scales up, the MTTR scales up with it. This is intuitive—larger networks tend to have more complex configurations, leading to more time spent trying to chase down and understand a problem. One thing the paper does not discuss, but might be interesting, is how modularization impacts these numbers. Intuitively, containing failures within a module (whether horizontally along topological lines or vertically through virtualization) should decrease the scope in which a network engineer needs to search to find a problem and resolve it. This is, on the other hand, likely to be offset somewhat by the increased complexity and reduction in visibility caused by segmentation—so it’s hard to determine what the overall effect of deeper segmentation in a network might be.

Overall, this is an interesting paper to parse through and understand—there are lots of great insights here for network operators at any scale.

Enterprise and Service Provider—Once more into the Windmill

Russ — Mon, 16 Mar 2020 17:00:32 +0000

There is no enterprise, there is no service provider—there are problems, and there are solutions. I’m certain everyone reading this blog, or listening to my podcasts, or listening to a presentation I’ve given, or following along in some live training or book I’ve created, has heard me say this. I’m also certain almost everyone has heard the objections to my argument—that hyperscaler’s problems are not your problems, the technologies and solutions providers user are fundamentally different than what enterprises require.

Let me try to recap some of the arguments I’ve heard used against my assertion.

The theory that enterprise and service provider networks require completely different technologies and implementations is often grounded in scale. Service provider networks are so large that they simply must use different solutions—solutions that you cannot apply to any network running at a smaller scale.

The problem with this line of thinking is it throws the baby out with the bathwater. Google is using automation to run their network? Well, then… you shouldn’t use automation because Google’s problems are not your problems. Microsoft is deploying 100g Ethernet over fiber? Then clearly enterprise networks should be using Token Ring or ARCnet because… Microsoft’s problems are not your problems.

The usual answer is—“I’m not saying we shouldn’t take good ideas when we see them, but we shouldn’t design networks the way someone else does just because.” I don’t see how this clarifies the solution, though—when is it a good idea or a bad one? What is our criterion to decide what to adopt and what not to adopt? Simply saying “X’s problems aren’t your problems” doesn’t really give me any actionable information—or at least I’m not getting it if it’s buried in there someplace.

Instead—maybe—just maybe—we are looking at this all wrong. Maybe there is some other way classify networks that will help us see the problem set better.

I don’t think networks are undifferentiated—I think the enterprise/service provider/hyerpscaler divide is not helpful to understand how different networks are … different, and how to correctly identify an environment and build to it. Reading a classic paper in software design this week—Programs, Life Cycles, and Laws of Software Evolution—brought all this to mind. In writing this paper, Meir Lehman was facing many of the same classification problems, just in software development rather than in building networks.

Rather than saying “enterprise software is different than service provider software”—an assertion absolutely no-one makes—or even “commercial software is different than private software, and developers working in these two areas cannot use the same tools and techniques,” Lehman posits there are three kinds of software systems. He calls these S-Programs, in which the problem and solution can be fully specified; P-Programs, in which the problem can be fully specified, but the program can only be partially specified because of complexity and scale; and E-Programs, where the program itself become part of the world it models. Lehman thinks most software will move towards S-Program status as time moves on—something that hasn’t happened (the reasons are out of scope for this already-too-long-blog-post).

But the classification is useful. For S-Programs, the inputs and outputs can be fully specified, full-on testing can take place before the software is deployed, and lifecycle management is largely about making the software more fully conform to its original conditions. Maybe there are S-Networks, too? Single-purpose networks which are aimed at fulfilling on well-defined thing, and only that thing. Lehman talks about learning how to breaking larger problems into smaller one so the S-Problems can be dealt with separately—is this anything different than separating out the basic problem of providing IP connectivity in a DC fabric underlay, or even providing basic IP connectivity in a transit or campus network, treating it as a separate module with fairly well design goals and measurements?

Lehman talks about P-Programs, where the problem is largely definable, but the solutions end up being more heuristic. Isn’t this similar to a traffic engineering overlay, where we largely know what the goals are, but we don’t necessarily know what specific solution is going to needed at any moment, and the complete set of solutions is just too large to initially calculate? What about E-Programs, where the software becomes a part of the world it models? Isn’t this like the intent-based stuff we’ve been talking about networking for going one 30 years now?

Looking at it another way, isn’t it possible that some networks are largely just S-Networks? And others are largely E-Networks? And that these classifications have nothing to do with whether the network is being built by what we call an “enterprise” or a “service provider?” Isn’t is possible that S-Networks should probably all use the same basic sort of structure and largely be classified as a “commodity,” while E-Networks will all be snowflakes, and largely classified as having high business importance?

Just like I don’t think the OSI model is particularly helpful in teaching and understanding networks any longer, I don’t find the enterprise/service/hyperscaler model very useful in building and operating networks. The service enterprise/service provider divide tends to artificially limit idea transfer when it wants to be transferred, and artificially “hype up” some networks while degrading others—largely based on perceptions of scale.

Scale != complexity. It’s not about service providers and enterprises. It doesn’t matter if Google’s problems are not your problems; borrowing from the hyperscale is not a “bad thing.” It’s just a “thing.” Think clearly about the problem set, understand the problem set, and borrow liberally. There is no such thing as a “service provider technology,” nor is there any such thing as an “enterprise technology.” There are problems, and there are solutions. To be an engineer is to connect the two.

The Hedge 24: Single Source of Truth

Russ — Wed, 26 Feb 2020 18:00:16 +0000

Tim Schreyack recently presented at NANOG on the topic of building a single source of truth for network automation. Tim joins Tom and Russ in a wide-ranging discussion about single sources of truth, changing the way we see the network, and the changing skills of network engineers.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-024.mp3

download