Is it really the best just because its the most common?

I cannot count the number of times I’ve heard someone ask these two questions—

  • What are other people doing?
  • What is the best common practice?

While these questions have always bothered me, I could never really put my finger on why. I ran across a journal article recently that helped me understand a bit better. The root of the problem is this—what does best common mean, and how can following the best common produce a set of actions you can be confident will solve your problem?

Bellman and Oorschot say best common practice can mean this is widely implemented. The thinking seems to run something like this: the crowd’s collective wisdom will probably be better than my thinking… more sets of eyes will make for wiser or better decisions. Anyone who has studied the madness of crowds will immediately recognize the folly of this kind of state. Just because a lot of people agree it’s a good idea to jump off a cliff does not mean it is, in fact, a good idea to jump off a cliff.

Perhaps it means something closer to this is no worse than our competitors. If that’s the meaning, though, it’s a pretty cynical result. It’s saying, “I don’t mind condemning myself to mediocrity so long as I see everyone else doing the same thing.” It doesn’t sound like much of a way to grow a business.

The authors do provide their definition—

For a given desired outcome, a “best practice” is a means intended to achieve that outcome, and that is considered to be at least as “good” as the best of other broadly considered means to achieve that same outcome.

The thinking seems to run something like this—it’s likely that everyone else has tried many different ways of doing this; that they have all settled on doing this, this way, means all those other methods are probably not as good as this one for some reason.

Does this work? There’s no way to tell without further investigation. How many of the other folks doing “this” spent serious time trying alternatives, and how many just decided the cheapest way was the best no matter how poor the result might be? In fact, how can we know what the results of doing things “this way” have in all those other networks? Where would we find this kind of information?

 

In the end, I can’t ever make much sense out of the question, “what is everyone else doing?” Discovering what everyone else is doing might help me eliminate possibilities (that didn’t work for them, so I certainly don’t want to try it), or it might help me understand the positive and negative attributes of a given solution. Still, I don’t understand why “common” should infer “best.”

The best solution for this situation is simply going to be the best solution. Feel free to draw on many sources, but don’t let other people determine what you should be doing.

The Hedge 84: David Brown and the Root of Trust

Many engineers just assume that secure hardware boot is, in fact, secure. How does this security work, and just how secure is it, though? David Brown joins Tom Ammon, Eyvonne Sharp, and Russ White on this episode of the Hedge to discuss the secure boot loader in some detail. For more information on the secure boot loader and IoT, see David’s presentation at the Open Source Summit.

download

The Effectiveness of AS Path Prepending (2)

Last week I began discussing why AS Path Prepend doesn’t always affect traffic the way we think it will. Two other observations from the research paper I’m working off of are:

  • Adding two prepends will move more traffic than adding a single prepend
  • It’s not possible to move traffic incrementally by prepending; when it works, prepending will end up moving most of the traffic from one inbound path to another

A slightly more complex network will help explain these two observations.

Assume AS65000 would like to control the inbound path for 100::/64. I’ve added a link between AS65001 and 65002 here, but we will still find prepending a single AS to the path won’t make much difference in the path used to reach 100::/64. Why?

Because most providers will have a local policy configured—using local preference—that causes them to choose any available customer connection over other paths. AS65001, on receiving the route to 100::/64 from AS65000, will set the local preference so it will prefer this route over any other route, including the one learned from AS65002. So while the cause is a little different in this case than the situation covered in the first post, the result is the same.

We can, of course, prepend twice onto the AS Path rather than once. What impact would that have here? It still won’t impact the traffic originating in 65005 because AS65001 is the only path available towards 100::64 from their perspective. Prepending cannot change anything if there’s only one path.

However, if most of the traffic destined to 100::/64 coming from AS65006, 7, and 8 rather than from AS65005, prepending two times will allow AS65000 to shift the traffic from the path through AS65002 to the path through AS65001. This example might seem a little contrived. Still, it’s pretty similar to networks that have one connection to some local provider (a cable company or something similar) and one connection to a more prominent national or international provider. Any time you are connected to two different providers who have different ranges of connectivity, prepending two autonomous systems on the AS Path will probably be able to shift traffic from one inbound link to another.

What about prepending more than two hops to the AS Path? Each additional prepend going to shift smaller amounts of traffic. It makes sense that increasing the number of prepends doesn’t shift much more because the further away you get from the edge of the Internet, the more fully connected the autonomous systems are, and the more likely you are to run into some other policy that will override the AS Path in determining the best path. The average length of the AS Path in the Internet is around four; prepending more than this normally won’t have much of an effect on traffic flow

The second question above can also be answered by looking at this network. Why can’t you shift traffic incrementally by prepending onto the AS Path? Because the connectivity close to the edge is probably not meshy enough. You can’t shift over just the traffic from one AS or another; you can only shift traffic from the entire set of autonomous systems behind your upstream from one inbound link to another. You can adjust traffic on a per-prefix basis, however, which can be useful for balancing between two inbound links.

What can you do to control inbound traffic with more certainty? Take a look at this older post for thoughts on using communities and de-aggregation to steer traffic.

The Hedge 83: Bruce Davie and the System View

Network engineers tend to look at the world through the lens of a single device—an individual appliance, sold by a vendor, with a well-developed CLI for configuration and maintenance. Networks, however, are the “odd person out” in the world of information technology. In the broader technology world, a stronger systems-oriented view is more common. In this episode of the Hedge, Bruce Davie joins Tom Ammon and Russ White to discuss a systems view of the world, as well as a new publishing model he’s working on, and some thoughts on the place of SDN.

download

You can find Bruce’s book, Computer Networks: A Systems Approach, here.

The Effectiveness of AS Path Prepending (1)

Just about everyone prepends AS’ to shift inbound traffic from one provider to another—but does this really work? First, a short review on prepending, and then a look at some recent research in this area.

What is prepending meant to do?

Looking at this network diagram, the idea is for AS6500 (each router is in its own AS) to steer traffic through AS65001, rather than AS65002, for 100::/64. The most common method to trying to accomplish this is AS65000 can prepend its own AS number on the AS Path Multiple times. Increasing the length of the AS Path will, in theory, cause a route to be less preferred.

In this case, suppose AS65000 prepends its own AS number on the AS Path once before advertising the route towards AS65001, and not towards AS65002. Assuming there is no link between AS65001 and AS65002, what would we expect to happen? What we would expect is AS65001 will receive one route towards 100::/64 with an AS Path of 2 and use this route. AS65002 will, likewise, receive one route towards 100::/64 with an AS Path of 1 and use this route.

AS65003, however, will receive two routes towards 100::/64, one with an AS Path of 3 through AS65001, and one with an AS Path of 2 through AS65002. All other things being equal (local preference, etc.), AS65003 will prefer the route with the shorter AS Path through AS65002, and select that path to reach 100::/64. AS65004 will only receive one path towards 100::/64, the one through AS65002, because AS65003 will only advertise its best path to AS65004.

The obvious question—how much good does this really do? The only impact on the best path is two hops away, as AS65003, and beyond. The route chosen by AS65001 and AS65002 will not be affected by the prepending.

A recent paper found—

We observe that the effectiveness of prepending can strongly depend on the location (for around 20% of cases, ASPP has moved no targets, while for another 20% , it moved almost all targets).

You might expect As Path prepending to have a much more consistent effect on inbound traffic. Why doesn’t it?

What might not be obvious (the danger of simplified diagrams): if autonomous systems directly attached to AS65001 originate most of the traffic destined to 100::/64, no amount of prepending is going to make any difference in the inbound traffic flow. Assume AS5001 has a connection to some cloud service, AS65002 does not have a connection to the same cloud service, and 100::64 is a local server that communicates with this cloud service on a regular basis. Since AS65001 is the only AS transiting traffic from the cloud service to the server located on the 100::/64 subnet, and AS65001 only has one route to 100::/64, you are not going to be able to shift traffic off that single path no matter how many times you prepend.

The first rule of prepending is location matters. You have to know where the traffic you want to shift is originating, and whether or not it can be shifted.

In my next post on this topic, I’ll continue exploring AS path prepending more in light of the results of the research paper above.

The Hedge 82: Jared Smith and Route Poisoning

Intentionally poisoning BGP routes in the Default-Free Zone (DFZ) would always be a bad thing, right? Actually, this is a fairly common method to steer traffic flows away from and through specific autonomous systems. How does this work, how common is it, and who does this? Jared Smith joins us on this episode of the Hedge to discuss the technique, and his research into how frequently it is used.

download

Ambiguity and complexity: once more into the breach

Recent research into the text of RFCs versus the security of the protocols described came to this conclusion—

While not conclusive, this suggests that there may be some correlation between the level of ambiguity in RFCs and subsequent implementation security flaws.

This should come as no surprise to network engineers—after all, complexity is the enemy of security. Beyond the novel ways the authors use to understand the shape of the world of RFCs (you should really read the paper; it’s really interesting), this desire to increase security by decreasing the ambiguity of specifications is fascinating. We often think that writing better specifications requires having better requirements, but down this path only lies despair.

Better requirements are the one thing a network engineer can never really hope for.

It’s not just that networks are often used as a sort of “complexity sink,” the place where every hard problem goes to be solved. It’s also the uncertainty of the environment in which the network must operate. What new application will be stuffed on top of the network this week? Will anyone tell the network folks about this new application, or just open a ticket when it doesn’t work right? What about all the changes developers are making to applications right now, and their impact on the network? There are link failures, software failures, hardware failures, and the mean time between mistakes. There is the pace of innovation (which I tend to think is a bit overblown—rule11, after all—we are often talking about new products rather than new ideas).

What the network is supposed to do—just provide IP transport between two devices—turns out to be hard. It’s hard because “just transporting packets” isn’t ever enough. These packets must be delivered consistently (jitter and drops) across an ever-changing landscape.

To this end—

[C]omplexity is most succinctly discussed in terms of functionality and its robustness. Specifically, we argue that complexity in highly organized systems arises primarily from design strategies intended to create robustness to uncertainty in their environments and component parts.

Uncertainty is the key word here. What can we do about all of this?

We can reduce uncertainty. There are three ways to reduce uncertainty. First, you can obfuscate it—this is harmful. Second, you can reduce the scope of the job at hand, throwing some of the uncertainty (and therefore complexity) over the cubicle way. This can be useful in some situations, but remember that the less work you’re doing, the less value you add. Beware of self-commodifying.

Finally, you can manage the uncertainty. This generally means using modularization intelligently to partition off problems into smaller sets. It’s easier to solve a set of well-scope problems with little uncertainty than to solve one big problem with unknowable uncertainty.

This might all sound great in theory, but how do we do this in real life? Where does the rubber hit the road? This is what Ethan and I tried to show in Problems and Solutions—how to understand the problems that need to be solved, and then how to solve each of those problems within a larger system. This is also what many parts of The Art of Network Architecture are about, and then again what Jeff and I wrote about in Navigating Network Complexity.

I know it often seems like it’s not worth learning the theory; it’s so much easier to focus on the day-to-day, the configuration of this device, or the shiny thing that vendor just created. It’s easier to assume that if I can just hide all the complexity behind intent or automation, I can get my weekends back.

The truth is that we’re paid to solve hard problems, and solving hard problems involves complexity. We can either try to cover that up, or we can learn to manage it.