Reaction: BGP convergence, divergence & the ‘net

Let’s have a little talk about BGP convergence.

We tend to make a number of assumptions about the Internet, and sometimes these assumptions don’t always stand up to critical analysis. . . . On the Internet anyone can communicate with anyone else – right? -via APNIC

Geoff Huston’s recent article on the reality of Internet connectivity—no, everyone cannot connect to everyone—prompted a range of reactions from various folks I know.

For instance, BGP is broken! After all, any routing protocol that can’t provide basic reachability to every attached destination must be broken, right? The problem with this statement is that it assumes BGP is, at its core, a routing protocol. To set the record straight, BGP is not a routing protocol in the traditional sense of the term. BGP is a system used to describe bilateral peering arrangements between independent parties in a way that provides loop free reachability information. The primary focus of BGP is not loop free reachability, but policy.

After all, BGP convergence is a big deal, right? Part of the problem here is that we use BGP as a routing protocol in some situations (for instance, on data center fabrics), so we have a hard time adjusting our thinking to the original peering policy based focus it was designed for. In the larger ‘net, it’s not a bug that some destinations are unreachable from some sources. It’s an expression of policy, and hence it’s a feature. There are certainly times when such policies are unintentional, but unintentional/unplanned policy is policy just the same as intentional/planned policy is.

We shouldn’t declare BGP broken for doing something it’s supposed to do.

There’s another point here, as well: Some networks never converge. And that’s okay. This is, perhaps, even harder for network engineers to get their heads around. I’ve spent twenty years making sure networks converge quickly, as loop free as possible, with as little chance for failure as possible, and using the least number of resources possible. But every network in the world doesn’t always have to converge to a single view of the topology and reachability. Really!

The problem here is the micro and macro views of the world. The ‘net doesn’t converge for two reasons.

First, there’s that pesky policy problem again. Policy, in the real world, never converges. There are always contradictory policies, and policies will often form bistable states. This is maddening, of course, to the mind of an engineer, but it’s just reality intruding on our little bubble. Bubbles are, after all, meant to be burst.

Second, there’s that whole CAP theorem thing in there someplace. Not many people understand the application of CAP to routing, so I’m stuffing a post or two on this onto my todo list, but just remember: you can have, at most, two of these three: a Consistent database, a database that is Accessible by every reader/user all the time, and a database that can be Partitioned. If you think about it, routing protocols are readable by every network device all the time, and they are partitioned among all the routers/intermediate systems in the network. Which means… they aren’t going to be consistent.

As in, if you feed a routing protocol enough changes often enough, it won’t ever converge—because its eventual consistency will always be catching up with reality. This is just the way the world is built—piling all the SDN unicorn magic in the world into routing isn’t going to solve this one, folks. On a network the size of the Internet, someone, somewhere, is always going to be changing something. This cripples BGP convergence; the ‘net never converges.
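To make the eventual-consistency point concrete, here’s a toy sketch (not any real protocol): routers hold copies of a routing database, updates propagate one hop per cycle, and the origin keeps changing faster than propagation can finish, so the network as a whole never reaches a single view.

```python
# A toy model of eventual consistency in a distributed routing database.
# Five routers form a line; an update entered at router 0 propagates one
# hop per "tick." If the origin changes faster than the propagation delay
# across the network, the far end is permanently out of date.
# All names here are illustrative, not any real protocol's behavior.

def tick(views):
    """Propagate each router's view one hop down the line."""
    # Copy from the left neighbor, rightmost first, so one tick = one hop.
    for i in range(len(views) - 1, 0, -1):
        views[i] = views[i - 1]

views = [0, 0, 0, 0, 0]    # five routers, all converged on version 0
for version in range(1, 4):
    views[0] = version      # the origin changes again...
    tick(views)             # ...but only one hop of propagation occurs

converged = all(v == views[0] for v in views)
print(views)       # [3, 3, 2, 1, 0]: the routers disagree about the database
print(converged)   # False: the network never caught up
```

Slow the changes down (or speed propagation up) and the line converges; keep feeding it changes and it never does, which is the point of the paragraph above.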

In the history of ideas, perhaps BGP shouldn’t float to the top as one of the most brilliant (and Tony and Yaakov would probably even agree with you)—but it has, on the other hand, been one of the most successful. It’s just tight enough to work often enough to rely on the connectivity as described, and it’s just loose enough to allow policies to be injected where they need to be. No such system is ever going to be “perfect.”

We could beat our heads against a wall trying, of course, but even virtual reality has physical limitations.

DoS’ing your mind: Controlling information inflow

Everyone wants your attention. No, seriously, they do. We’ve gone from a world where there were lots of readers and not much content, to a world where there is lots of content, and not many readers. There’s the latest game over here, the latest way to “get 20,000 readers,” over there, the way to “retire by the time you’re 32” over yonder, and “how to cure every known disease with this simple group of weird fruit from someplace you’ve never heard of (but you’ll certainly go find, and revel in the pictures of perfectly healthy inhabitants now),” niggling someplace at the back of your mind.

The insidious, distracting suck of the Internet has become seemingly inescapable. Calling us from our pockets, lurking behind work documents, it’s merely a click away. Studies have shown that each day we spend, on average, five and a half hours on digital media, and glance at our phones 221 times. -via connecting

Living this way isn’t healthy. It reduces your attention span, which in turn destroys your ability to get anything done, as well as destroying your mind. So we need to stop. “Squirrel” is funny, but you crash planes. “Shiny thing” is funny, but you end up mired in jellyfish (and, contrary to the cartoons, you don’t survive that one, either).

So how do we stop? Three hints.

Slack off. Tom, over at Networking Nerd, points out the importance of asynchronous communication (once again) in an excellent post about Slack. Sometimes you need to just sit and talk to someone. But sometimes you can’t. As Tom says:

Think back to the last time you responded to an email. How often did you start your response with “Sorry for the delay” or some version of that phrase? In today’s society, we’ve become accustomed to instant responses to things.

This means shutting down the IM, shutting down Slack for a bit, shutting down email, turning off your ringer, and just actually getting things done. This is also part of my theory of macro tasking. Controlling information inflow is as much about moving to asynchronous communications as it is anything else. And asynchronous means “I don’t have to answer right now,” as well as “I don’t expect them to answer right now, because I have other stuff I can work on while I wait.”

Yes, this is hard, which leads to the second suggestion.

Precommit. Set aside some amount of time that you’ll do nothing but work on a single project, or do a specific thing. You can’t set aside time for everything, and sometimes specific goals are more important than specific timeframes. For instance, I have two personal goals every day in the way of managing information. I write at least 2,000 words a day, and I read at least 75 pages a day. These aren’t static goals, of course; I’ve ramped them up over time, and I don’t meet them every day. I have larger goals, as well—for instance, I try to read 100 books a year (this is a new goal; last year it was 70 or so, but this year I’m trying to ramp up).

Odysseus had his men tie him to the mast of their ship until they were out of the sirens’ range. This is an example of “precommitment,” a self-control strategy that involves imposing a condition on some aspect of your behavior in advance. For example, an MIT study showed that paid proofreaders made fewer errors and turned in their work earlier when they chose to space out their deadlines (e.g., complete one assignment per week for a month), compared to when they had the same amount of time to work, but had only one deadline at the end of a month. -via connecting

Self discipline, as Connecting points out, isn’t so much about resisting temptation as it is avoiding it.

Vary your sources. This one might not seem so obvious, but you really do need to do more than rely on Google for your entire view of the world.

Think of the WorldWideWeb increasingly as the public and open façade of the Web, and Google’s Inner-net as Google’s private, and more closed regime of mostly-dominant, Google-controlled operating systems, platforms, apps, protocols, and APIs that collectively govern how most of the Web operates, largely out of public view. -via Somewhat Reasonable

You build a box for your brain by only consuming movies, or fiction books, or technical books, or Google searches, or a particular game. As I’ve said elsewhere:

You can’t really think outside the box in the real world. You can, however, get a bigger box in which to do your thinking.

Do these actually work? Yes. Do they work all the time, and immediately? No. They take time. Which is why you also must learn to be patient, to give yourself some slack, and to build up your virtue.

But if you don’t start someplace, it’s certainly never going to work. Ultimately, you are responsible for what enters your brain; controlling information inflow is just part of your job as a human being. Or, as I always tell my daughters: garbage in, garbage out. If you don’t learn to control the garbage in part, no one else is going to be able to help you control the garbage out part.

Securing BGP: A Case Study (4)

In part 1 of this series, I looked at the general problem of securing BGP, and ended by asking three questions. In part 2 and part 3, I considered the third question: what can we actually prove in a packet switched network? For this section, I want to return to the first question:

Should we focus on a centralized solution to this problem, or a distributed one?

There are, as you might expect, actually two different problems within this problem:

  • Assuming we’re using some sort of encryption to secure the information used in path validation, where do the keys come from? Should each AS build its own private/public key pairs, have anyone they want validate the keys, and then advertise them? Or should there be some central authority that countersigns keys, such as the Regional Internet Registries (RIRs), so everyone has a single trust root?
  • Should the information used to validate paths be distributed or stored in a somewhat centralized database? At the extreme ends of this answer are two possibilities: every eBGP speaker individually maintains a database of path validation information, just like they maintain reachability information; or there are a few servers (like the root DNS servers) which act as a source of truth, and which are synchronized somehow to each AS.

Let’s discuss the first problem first. Another way to phrase this question is: should we use a single root to provide trust for the cryptographic keys, or should we use something like a web of trust? The diagram below might be helpful in understanding these two options.

[Figure: bgp-sec-03a]

Assume, for a moment, that AS65000 has just received some information that purports to be from AS65002. How can AS65000 validate this information? Assuming AS65002 has signed this information with its private key, AS65000 needs to find AS65002’s public key and validate the signature. There are a number of ways AS65000 could find AS65002’s key—a URI or a distributed database of some type would do nicely—but how does AS65000 actually know that the public key it finds belongs to the real AS65002?

This is a matter of finding a third party that will say/validate/attest that this public key actually belongs to AS65002. Normally this is done by the third party signing the key with their private key, which can then be validated using their public key—but then you run into an infinite regress. You can’t go on checking everyone’s public key for all of eternity; you have to stop and trust someone someplace to tell you the truth about these keys. Where do you stop? The answer to this question is the key difference between a single set of trusted roots and a web of trust.

In the single set of roots (left side of the diagram), AS65000 would stop at either RIR1 or RIR2 because they are a single set of entities everyone participating in the system should trust. In the web of trust (right side of the diagram), you stop looking when you find someone you trust who trusts the person you’re receiving information from. If AS65000 trusts AS65001 to tell the truth about AS65002, then AS65001’s signature on AS65002’s certificate should be enough for AS65000 to trust the public key it retrieved for AS65002. Multiple steps through the web of trust are possible, as well. Assume AS65000 trusts AS65003, and AS65003 trusts AS65004 to tell the truth about AS65002. AS65000 might decide that because AS65003 trusts AS65004, it can trust AS65004, as well, and accept the public key for AS65002. This is a form of transitive trust, where you trust someone because you trust someone who trusts them.
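The transitive trust walk described above can be sketched as a search over “who vouches for whom.” Everything below is illustrative: the AS names, the flat trust table, and the hop limit are all assumptions, and a real web of trust exchanges signed certificates rather than consulting a shared dictionary.

```python
# A sketch of transitive trust lookup in a web-of-trust model. An entry
# in TRUSTS means "this AS has signed (vouches for) that AS's key." The
# table and AS numbers are hypothetical, chosen to match the scenario in
# the text: AS65000 trusts AS65001 and AS65003 directly.
from collections import deque

TRUSTS = {
    "AS65000": {"AS65001", "AS65003"},  # keys AS65000 has directly verified
    "AS65001": {"AS65002"},
    "AS65003": {"AS65004"},
    "AS65004": {"AS65002"},
}

def trust_path(verifier, subject, max_hops=4):
    """Return the shortest chain of vouches from verifier to subject, or None.
    Each extra hop is another transitive-trust step (and more validation work)."""
    queue = deque([[verifier]])
    while queue:
        path = queue.popleft()
        if path[-1] == subject:
            return path
        if len(path) > max_hops:
            continue
        for nxt in TRUSTS.get(path[-1], ()):
            if nxt not in path:
                queue.append(path + [nxt])
    return None

print(trust_path("AS65000", "AS65002"))
# ['AS65000', 'AS65001', 'AS65002'] — the one-step vouch wins over the
# longer AS65003 -> AS65004 -> AS65002 chain
```

Note that the longer chain through AS65003 and AS65004 also exists; whether a verifier should accept it is exactly the transitive-trust policy question the paragraph above raises.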

There are, of course, advantages and disadvantages to each system.

In the first system, trust is rooted in an entity that can be held financially responsible for telling the truth. This often means, however, that they will charge to use their service; this additional cost can be an effective barrier to entry as the services provided by the root entity become more complex, or as the value of the system anchored in this entity increases. The centralized repository of trust is also a risk—the certificate used by the single trusted entity is a high value target. Finally, the single trust entity is a political risk for those who operate within the system. A single contract mistake, or a single problem of any other sort, could cause the value of anyone operating a network (in this case) to immediately drop to nothing. While the single root provides an easy way to prevent abuse within the system, it’s also a single place from which to abuse those within the system.

In the second, trust is not rooted in any single entity, but rather in what might be called community consensus. This type of system is more resilient to attack, but it can also be abused through shell entities and bad actors. In operating a web of trust, you can quickly run into Byzantine failures that are difficult to resolve. Beyond this, the amount of processing required to validate a certificate will depend on the number of links in the transitive trust chain you have to cross before you find someone you trust to tell the truth.

Which one is better? It really comes down to business objectives rather than technical superiority. Which one better fits the kind of system being secured? Web of trust models tend to work better in smaller communities with personal relationships. Rooted trust models tend to provide more security in diverse environments where the participants are competitors.

I can see your eyes glazing over at this point, so I’ll leave the answer to the second question ’til next time.

Securing BGP: A Case Study (3)

To recap (or rather, as they used to say in old television shows, “last time on ‘net Work…”), this series is looking at BGP security as an exercise (or case study) in understanding how to approach engineering problems. We started this series by asking three questions, the third of which was:

What is it we can actually prove in a packet switched network?

From there, in part 2 of this series, we looked at this question more deeply, asking three “sub questions” designed to help us tease out the answer to this third question. Asking the right questions is a subtle, but crucial, part of learning how to deal with engineering problems of all sorts. Two of those questions can be summed up as:

  • Is the path through this peer going to pass through someone I don’t want it to pass through?
  • Is the path this peer is advertising a valid route to the destination?

Let’s quickly look at the first of these two to see why it’s not provable in the context of a packet switched network, using the network diagram below.

[Figure: bgp-sec-02]

When working with BGP at Internet scale, we tend to think of an autonomous system as one “thing”—we draw it that way on network diagrams, for instance, as I’ve been doing so far in this series. But the reality is far different. Autonomous systems are made up of those pesky little things called routers. In a packet switched network, it’s important to remember each router makes an independent forwarding decision. For instance, in this network, assume Router B is advertising some destination in AS65004, say 2001:db8:0:1::/64, to Router A with an AS Path of [65004,65002]. When Router A sends traffic to a host within :1::/64, then, it can assume the traffic will follow a path from AS65002 directly to AS65004—there won’t be any intermediate hops.

The problem is: this assumption is wrong. There are a number of reasons Router C might forward traffic to :1::/64 to Router D, and hence through AS65003, rather than to Router F. For instance, Router C might be a route reflector running add paths, which means Router B has multiple routes to the destination, but it’s only advertising one of the available paths to Router A. Or perhaps the :1::/64 route is actually an aggregate of two longer prefixes, and the destination Router A is forwarding traffic to has a longer prefix match through Router E. Or perhaps Router C just has a static route configured forwarding traffic along a different path than the AS is advertising.
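The aggregate-versus-longer-prefix case is easy to demonstrate with a longest-prefix-match lookup. The forwarding table below is a hypothetical sketch of Router C’s view, using Python’s standard ipaddress module; the /65 more-specific route and the next-hop labels are invented for illustration.

```python
# Longest-prefix match is one reason the data plane can diverge from the
# AS Path in an advertisement: the aggregate points one way, while a more
# specific route inside it points another. Prefixes and next hops are
# illustrative only.
import ipaddress

# Router C's forwarding table (a sketch): the /64 aggregate learned via
# BGP, plus a more specific /65 learned some other way.
fib = {
    ipaddress.ip_network("2001:db8:0:1::/64"): "Router F (toward AS65004)",
    ipaddress.ip_network("2001:db8:0:1:8000::/65"): "Router D (toward AS65003)",
}

def lookup(dst):
    """Pick the matching route with the longest prefix, as a router would."""
    matches = [net for net in fib if ipaddress.ip_address(dst) in net]
    return fib[max(matches, key=lambda net: net.prefixlen)] if matches else None

print(lookup("2001:db8:0:1::10"))       # only the /64 matches: via Router F
print(lookup("2001:db8:0:1:8000::10"))  # the /65 wins: traffic goes via AS65003
```

Two destinations inside the advertised :1::/64 take completely different exit paths, even though Router A sees a single advertisement with a single AS Path.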

Whatever the reason, packet switched networks just don’t work this way. The first assumption—that traffic forwarded based on a specific advertisement will follow the AS Path in that advertisement—is false. What of the second? It all depends on what you mean by the word “valid.” There are actually (as is often the case) two different questions embedded within this question:

  • Is there a physical path between the peer advertising the route and the reachable destination?
  • Does every AS along the path between the peer advertising the route and the reachable destination agree to forward traffic towards the advertised destination?

The first question could be proven by checking whether every AS along the AS Path claims to have a physical connection to the next. The second, however, is trickier. To see why, let’s switch things around a little. Assume AS65004 is advertising 2001:db8:0:1::/64 towards AS65003, but not towards AS65002. Assume, as well, that AS65003 is a customer of AS65004 and AS65002—in other words, AS65003 should not be transiting traffic to any destination. How could AS65000 know this?

First, AS65002 could filter at Router D, for instance, based on some prior knowledge, or some sort of information provided by AS65004.

Second, AS65004 could somehow signal AS65002 that AS65003 shouldn’t be transiting traffic (either at all, or for this one destination).
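The first option, filtering at Router D based on prior knowledge, might look something like the sketch below. The customer-cone table is exactly the “prior knowledge” in question, and is entirely assumed here; real filters are built from registry data, contracts, or signaled policy.

```python
# A sketch of filtering a customer's advertisements against prior
# knowledge of what that customer may legitimately send. A customer
# should only advertise routes it (or its own customers) originates;
# anything else is a leak. The CUSTOMER_CONE table is hypothetical,
# standing in for whatever out-of-band knowledge the filtering AS has.

CUSTOMER_CONE = {
    "AS65003": {"AS65003"},   # AS65003 has no customers of its own
}

def accept_from_customer(customer, as_path):
    """Reject a route whose AS path escapes the customer's cone (a leak)."""
    return all(asn in CUSTOMER_CONE[customer] for asn in as_path)

# AS65003 originating its own prefix: fine.
print(accept_from_customer("AS65003", ["AS65003"]))             # True
# AS65003 passing along a route learned from provider AS65004: a leak.
print(accept_from_customer("AS65003", ["AS65003", "AS65004"]))  # False
```

The sketch makes the two bullet points below concrete: the filter only works if the filtering AS actually applies it, and the cone table can only be built if someone is willing to share the underlying policy.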

We’ll explore the concept of signaling later in this series, when we start thinking about what sorts of solutions might be acceptable for the problem set we’re trying to solve. For now, it’s important to consider these two points:

  • All the signaling in the world from AS65004 isn’t going to help if AS65002 doesn’t pay attention to the signal.
  • If AS65004 is unwilling to tell AS65002 what its policy towards AS65003 is, there’s no way for anyone to enforce it.

In other words, you can’t enforce what you don’t know, and enforcement is based on a prior trust arrangement of some sort. These two crucial points should be listed in the set of requirements we’re building before we start considering solutions.

In my next post in this series, I want to back up to the original three questions we discovered and start thinking through what sorts of requirements we can decipher from them.

Cultivate questions

Imagine that you’re sitting in a room interviewing a potential candidate for a position on your team. It’s not too hard to imagine, right? It happens all the time. You know the next question I’m going to ask: what questions will you ask this candidate? I know a lot of people who have “set questions” they use to evaluate a candidate, such as “what is the OSPF type four for,” or “why do some states in the BGP peering session not have corresponding packets?” Since I’ve worked on certifications in the past (like the CCDE), I understand the value of these sorts of questions. They pinpoint the set and scope of the candidate’s knowledge, and they’re easy to grade. But is “easy to grade” what we should really be after?

Let me expand the scope a little: isn’t this the way we see our own careers? The engineer with the most bits of knowledge stuffed away when they die wins? I probably need to make a sign that says that, actually, just to highlight the humor of such a thought.

The problem is that this simply isn’t a good way to measure an engineer, including the engineer reading this post (you). For one thing, as Ethan so eloquently pointed out this week—

The future of IT is not compatible with a network that waits for a human to make a change in accordance with a complex process that takes weeks. And thus it is that the future of networking becomes important. Yes, we grumpy old network engineers know how to build networks in a reliable, predictable way. But that presumes a reliable, predictable demand from business that just isn’t so in many cases.

The question becomes: how do we cultivate this culture among network engineers? It’s nice enough to say, but what do I do? I’m going to make a simple suggestion. Perhaps, in fact, it’s too simple. But it’s worth a try.

Instead of cultivating knowledge, cultivate questions.

Let’s take my current series on securing BGP as an example. In part two of the series, from last week, I pointed out that it’s a long slog through the world of security for BGP. You have to ask a lot of questions, beginning with one that doesn’t even seem to make sense: what can I actually secure? Cultivating question asking is important because it helps us to actually feel our way around the problem at hand, understanding it better, and finding new ways to solve it.

Okay, so given we want to encourage engineers to ask more questions—that networks must change, now—and the path to changing networks is changing engineers, what do we do?

First, we need to rethink our certifications around cultivating questions. I think we did a pretty good job with the CCDE here, but the concept of asking if the candidate understands the right question to ask at any given phase of the process is an important skill to measure. I haven’t taken a CCIE lab since 1997, but I remember my proctor asking me if I knew what I was looking for at various times—he was trying to make certain I knew what questions to ask.

Second, we need to start thinking in models, rather than in technologies. I’ve written a lot about this; there’s an entire chapter on models in The Art of Network Architecture, and more on models in Navigating Network Complexity, but we really need to start thinking about why rather than how more often. Why do you think I talk about this stuff so often? It’s not because I don’t know the inner guts of IS-IS (I have an upcoming video series on this being published by Cisco Press), but because I think the ability to turn models and networks into questions is more important than knowing the guts of any particular protocol.

Third, we need to follow Ethan’s lead and start thinking about a broader set of skills and technology.

Finally, maybe—just maybe—we need to start setting up interviews so we can find out if the candidate knows the right questions, rather than focusing on the esoteric game, and whether or not they know all the right answers.

Securing BGP: A Case Study (2)

In part 1 of this series, I pointed out that there are three interesting questions we can ask about BGP security. The third question I outlined there was this: What is it we can actually prove in a packet switched network? This is the first question I want to dive into—this is a deep dive, so be prepared for a long series. 🙂 This question feels like it is actually asking three different things, what we might call “subquestions,” or perhaps “supporting points.” These three questions are:

  • If I send a packet to the peer I received this update from, will it actually reach the advertised destination?
  • If I send this information to this destination, will it actually reach the intended recipient?
  • If I send a packet to the peer I received this update from, will it pass through an adversary who is redirecting the traffic so they can observe it?

These are the things I can try to prove, or would like to know, in a packet switched network. Note that I want to intentionally focus on the data plane and then transfer these questions to the control plane (BGP). This is the crucial point to remember: If I start with the technical or engineering problem, I’m going to end up asking, and answering, the wrong questions.

This is typically what happens in engineering. For instance, in the world of BGP, the traditional path is to ask, “how can I secure the way BGP operates?” Another example might be, “this application needs these two servers connected via layer 2,” and then we deep dive into every potential way of providing this layer 2 connectivity, tying ourselves into knots with DCI, overlays, complex control planes, and all the rest. We never back off and say, “is this really the right question to ask?” But there is always more than one way to ask the question, and it’s important to try and find the question that draws your thinking outside the protocol.

Creative questioning is at least half of solving any problem.

Let’s process these three questions so we can take them out of the data plane and into the control plane. The first question, in BGP terms, seems to be asking something like: Is the path this peer is advertising a valid route to the destination? What do we mean by “valid?” We mean a path that will take this traffic to the destination I’m trying to reach.

The second question, in BGP terms, seems to be asking something like: How can I be certain the destination address hasn’t been hijacked, so the peer is advertising a route to a destination that isn’t the one I’m trying to reach (even though it has the same address)? This relates directly to the origin authentication problem in BGP; can I know that the actual owner of the route is the final destination of this route?

The third question, in BGP terms, seems to be asking something like: Is the path through this peer going to pass through someone I don’t want it to pass through? This third one is actually impossible to prove in real terms. We can go some way towards ensuring traffic doesn’t go through a “man in the middle,” but there’s no way, in a packet switched network, to actually be certain of this.

In my next post on this series, I want to continue looking at this line of thinking, making certain we really understand what we can prove in a packet switched network.

This post is the second in a series on what I consider to be a current and difficult design problem at Internet scale that involves just about every piece of the networking puzzle you can get in to—BGP security. This is designed to be a sort of case study around approaching design problems, not just at the protocol level, but at an engineering level. I will probably intersperse this series with other posts over the coming months.

Securing BGP: A Case Study (1)

What would it take to secure BGP? Let’s begin where any engineering problem should begin: what problem are we trying to solve?

[Figure: a small collection of autonomous systems]

In this network—in any collection of BGP autonomous systems—there are three sorts of problems that can occur at the AS level. For the purposes of this explanation, assume AS65000 is advertising 2001:db8:0:1::/64. While I’ve covered this ground before, it’s still useful to outline them:

  1. AS65001 could advertise 2001:db8:0:1::/64 as if it is locally attached. This is considered a false origination, or a hijacked route.
  2. AS65001 could advertise a route to 2001:db8:0:1::/64 with the AS path [65000,65001] to AS65003. This is another form of route hijacking, but instead of a direct hijack it’s a “one behind” attack. AS65001 doesn’t pretend to own the route in question, but rather to be connected to the AS that is originating the route.
  3. AS65000 could consider AS65003 a customer, or rather AS65003 might be purchasing Internet connectivity from AS65000. This would mean that any routes AS65000 advertises to AS65003 are not intended to be retransmitted back to AS65004. If, for instance, 2001:db8:0:1::/64, is advertised by AS65000 to AS65003, and AS65003 readvertises it to AS65004, AS65003 would be an unintentional transit AS in the path. This could either be intentional or a mistake, of course, but either way this is an incorrect traffic pattern that can be at the root of many problems. This is considered a route leak, and is fully described in this Internet draft.
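As a rough illustration of why these attacks differ in difficulty, here is a minimal origin-validation check, assuming a hand-built table of authorized origins (the role something like the RPKI plays in practice). It catches the first attack, but not the second: the “one behind” hijack forges a path whose origin looks correct.

```python
# A sketch of origin validation: check the originating AS at the end of
# the AS path against a table of who may originate each prefix. The
# AUTHORIZED_ORIGIN table is hand-built for illustration; in practice
# this data would come from a system such as the RPKI.

AUTHORIZED_ORIGIN = {
    "2001:db8:0:1::/64": "AS65000",
}

def validate_origin(prefix, as_path):
    """In BGP, the originating AS is the last element of the AS path."""
    expected = AUTHORIZED_ORIGIN.get(prefix)
    if expected is None:
        return "unknown"          # no authorization data for this prefix
    return "valid" if as_path[-1] == expected else "invalid"

# Legitimate advertisement originated by AS65000:
print(validate_origin("2001:db8:0:1::/64", ["AS65000"]))             # valid
# Attack 1: AS65001 originates the prefix itself; this is caught.
print(validate_origin("2001:db8:0:1::/64", ["AS65001"]))             # invalid
# Attack 2: AS65001 forges the path [65001, 65000]; the origin looks
# right, so origin validation alone misses the "one behind" hijack.
print(validate_origin("2001:db8:0:1::/64", ["AS65001", "AS65000"]))  # valid
```

Note also that the third problem, the route leak, involves a perfectly legitimate origin and a real path, so it is invisible to this kind of check entirely; that is part of what makes the questions below worth asking.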

There are a number of other possibilities, but these three will be enough to deal with for thinking through the problem and solution sets. Given these are the problems, it’s in the engineering mindset to jump directly to a solution. But before we do, let’s start with a set of questions. For instance:

  1. Should we focus on a centralized solution to this problem, or a distributed one? Then there are the in-between solutions that create a single database that’s synchronized among all the participating autonomous systems.
  2. Should we consider solutions that are carried within the control plane, within BGP itself, or outside? In other words, should every eBGP speaker in the system participate, or should there be some smaller set of devices participating?
  3. What is it we can actually prove in a packet switched network? This might seem like an odd question, but we are in a position where we are trying to manage traffic flows through the control plane—for instance, we are trying to prevent traffic between AS65004 and AS65000 from flowing through AS65003 in the route leak case. What, specifically, can we prove in such a case?

We’ll consider these questions, starting with the last one first, in the next post.

This post kicks off a series on what I consider to be a current and difficult design problem at Internet scale that involves just about every piece of the networking puzzle you can get in to—BGP security. This is designed to be a sort of case study around approaching design problems, not just at the protocol level, but at an engineering level. I will probably intersperse this series with other posts over the coming months.