Securing BGP: A Case Study (3)

To recap (or rather, as they used to say in old television shows, “last time on ‘net Work…”), this series is looking at BGP security as an exercise (or case study) in understanding how to approach engineering problems. We started this series by asking three questions, the third of which was:

What is it we can actually prove in a packet switched network?

From there, in part 2 of this series, we looked at this question more deeply, asking three “sub questions” that are designed to help us tease out the answer to this third question. Asking the right questions is a subtle, but crucial, part of learning how to deal with engineering problems of all sorts. Those questions can be summed up as:

  • Is the path through this peer going to pass through someone I don’t want it to pass through?
  • Is the path this peer is advertising a valid route to the destination?

Let’s quickly look at the first of these two to see why it’s not provable in the context of a packet switched network, using the network diagram below.

bgp-sec-02

When working with BGP at Internet scale, we tend to think of an autonomous system as one “thing”; we draw it that way on network diagrams, for instance, as I’ve been doing so far in this series. But the reality is far different. Autonomous systems are made up of those pesky little things called routers. In a packet switched network, it’s important to remember each router makes an independent forwarding decision. For instance, in this network, assume Router B is advertising some destination in AS65004, say 2001:db8:0:1::/64, to Router A with an AS Path of [65004,65002]. When Router A sends traffic to a host within :1::/64, then, it might assume the traffic will follow a path from AS65002 directly to AS65004, with no intermediate hops.

The problem is: this assumption is wrong. There are a number of reasons Router C might forward traffic to :1::/64 to Router D, and hence through AS65003, rather than to Router F. For instance, Router C might be a route reflector running add paths, which means Router B has multiple routes to the destination, but it’s only advertising one of the available paths to Router A. Or perhaps the :1::/64 route is actually an aggregate of two longer prefixes, and the destination Router A is forwarding traffic to has a longer prefix match through Router E. Or perhaps Router C just has a static route configured forwarding traffic along a different path than the AS is advertising.
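The aggregate case can be sketched in a few lines of Python. This is only an illustration: the forwarding table, the /65 prefix, and the next hop labels are my own assumptions, not taken from the diagram. The point is that Router C picks the longest matching prefix for each packet, whatever AS Path was advertised for the /64.

```python
import ipaddress

# Hypothetical forwarding table on Router C (all entries are assumptions).
# The /64 aggregate follows the advertised path; the longer /65 does not.
FIB = {
    ipaddress.ip_network("2001:db8:0:1::/64"): "Router F (direct path)",
    ipaddress.ip_network("2001:db8:0:1:8000::/65"): "Router D (through AS65003)",
}


def next_hop(destination: str) -> str:
    """Return the next hop for the longest prefix that contains destination."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in FIB if address in network]
    if not matches:
        raise LookupError(f"no route to {destination}")
    # Longest prefix match: the most specific route wins.
    best = max(matches, key=lambda network: network.prefixlen)
    return FIB[best]


print(next_hop("2001:db8:0:1::10"))       # covered only by the /64
print(next_hop("2001:db8:0:1:8000::10"))  # also covered by the /65
```

Both destinations sit inside the prefix Router A learned, yet only the first one follows the path Router A thinks it is using.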

Whatever the reason, packet switched networks just don’t work this way. The first option, that traffic forwarded based on a specific advertisement will follow the AS Path in that advertisement, is false. What of the second? It all depends on what you mean by the word “valid.” There are actually (as is often the case) two different questions embedded within this question:

  • Is there a physical path between the peer advertising the route and the reachable destination?
  • Does every AS along the path between the peer advertising the route and the reachable destination agree to forward traffic towards the advertised destination?

The first question could be proven if every AS along the AS Path claims to have a physical connection to its neighbors in the path. The second, however, is trickier. To see why, let’s switch things around a little. Assume AS65004 is advertising 2001:db8:0:1::/64 towards AS65003, but not towards AS65002. Assume, as well, that AS65003 is a customer of AS65004 and AS65002; in other words, AS65003 should not be transiting traffic to any destination. How could AS65000 know this?

First, AS65002 could filter at Router D, for instance, based on some prior knowledge, or some sort of information provided by AS65004.

Second, AS65004 could somehow signal AS65002 that AS65003 shouldn’t be transiting traffic (either at all, or for this one destination).

We’ll explore the concept of signaling later in this series, when we start thinking about what sorts of solutions might be acceptable for the problem set we’re trying to solve. For now, it’s important to consider these two points:

  • All the signaling in the world from AS65004 isn’t going to help if AS65002 doesn’t pay attention to the signal.
  • If AS65004 is unwilling to tell AS65002 what its policy towards AS65003 is, there’s no way for anyone to enforce it.

In other words, you can’t enforce what you don’t know, and enforcement is based on a prior trust arrangement of some sort. These two crucial points should be listed in the set of requirements we’re building before we start considering solutions.
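These two points can be made concrete with a small sketch. Everything here is hypothetical: the function name, the idea of a "no transit" set as the signal from AS65004, and the flag saying whether AS65002 honors it. AS Paths are written origin first, as they are elsewhere in this series.

```python
from typing import Optional, Sequence, Set


def accept_route(as_path: Sequence[int],
                 no_transit: Optional[Set[int]],
                 honor_signal: bool) -> bool:
    """Decide whether AS65002 accepts a route.

    as_path      -- AS Path as received, origin first (series convention)
    no_transit   -- hypothetical signal from the origin: ASes that must not
                    transit its routes; None means the policy was never shared
    honor_signal -- whether the receiving AS pays attention to the signal
    """
    if no_transit is None or not honor_signal:
        # You can't enforce what you don't know, and a signal that is
        # ignored enforces nothing.
        return True
    # Every AS after the origin is acting as transit for this route.
    transit = as_path[1:]
    return not any(asn in no_transit for asn in transit)


leaked = [65004, 65003]  # the route reached AS65002 through AS65003
print(accept_route(leaked, {65003}, honor_signal=True))   # policy known and honored
print(accept_route(leaked, {65003}, honor_signal=False))  # signal ignored
print(accept_route(leaked, None, honor_signal=True))      # policy never shared
```

Only the first call rejects the route; the other two accept it, one for each of the two points above.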

In my next post in this series, I want to back up to the original three questions we discovered and start thinking through what sorts of requirements we can decipher from them.

Cultivate questions

Imagine that you’re sitting in a room interviewing a potential candidate for a position on your team. It’s not too hard to imagine, right, because it happens all the time. You know the next question I’m going to ask: what questions will you ask this candidate? I know a lot of people who have “set questions” they use to evaluate a candidate, such as “what is the OSPF type four for,” or “why do some states in the BGP peering session not have corresponding packets?” Since I’ve worked on certifications in the past (like the CCDE), I understand the value of these sorts of questions. They pinpoint the set and scope of the candidate’s knowledge, and they’re easy to grade. But is easy to grade what we should really be after?

Let me expand the scope a little: isn’t this the way we see our own careers? The engineer with the most bits of knowledge stuffed away when they die wins? I probably need to make a sign that says that, actually, just to highlight the humor of such a thought.

The problem is it simply isn’t a good way to measure an engineer, including the engineer reading this post (you). For one thing, as Ethan so eloquently pointed out this week—

The future of IT is not compatible with a network that waits for a human to make a change in accordance with a complex process that takes weeks. And thus it is that the future of networking becomes important. Yes, we grumpy old network engineers know how to build networks in a reliable, predictable way. But that presumes a reliable, predictable demand from business that just isn’t so in many cases.

The question becomes: how do we cultivate this culture among network engineers? It’s nice enough to say, but what do I do? I’m going to make a simple suggestion. Perhaps, in fact, it’s too simple. But it’s worth a try.

Instead of cultivating knowledge, cultivate questions.

Let’s take my current series on securing BGP as an example. In part two of the series, from last week, I pointed out that it’s a long slog through the world of security for BGP. You have to ask a lot of questions, beginning with one that doesn’t even seem to make sense: what can I actually secure? Cultivating question asking is important because it helps us to actually feel our way around the problem at hand, understanding it better, and finding new ways to solve it.

Okay, so given we want to encourage engineers to ask more questions—that networks must change, now—and the path to changing networks is changing engineers, what do we do?

First, we need to rethink our certifications around cultivating questions. I think we did a pretty good job with the CCDE here, but the concept of asking if the candidate understands the right question to ask at any given phase of the process is an important skill to measure. I haven’t taken a CCIE lab since 1997, but I remember my proctor asking me if I knew what I was looking for at various times—he was trying to make certain I knew what questions to ask.

Second, we need to start thinking in models, rather than in technologies. I’ve written a lot about this; there’s an entire chapter on models in The Art of Network Architecture, and more on models in Navigating Network Complexity, but we really need to start thinking about why rather than how more often. Why do you think I talk about this stuff so often? It’s not because I don’t know the inner guts of IS-IS (I have an upcoming video series on this being published by Cisco Press), but because I think the ability to turn models and networks into questions is more important than knowing the guts of any particular protocol.

Third, we need to follow Ethan’s lead and start thinking about a broader set of skills and technology.

Finally, maybe—just maybe—we need to start setting up interviews so we can find out if the candidate knows the right questions, rather than focusing on the esoteric game, and whether or not they know all the right answers.

Securing BGP: A Case Study (2)

In part 1 of this series, I pointed out that there are three interesting questions we can ask about BGP security. The third question I outlined there was this: What is it we can actually prove in a packet switched network? This is the first question I want to dive into. This is a deep dive, so be prepared for a long series. 🙂 This question feels like it is actually asking three different things, what we might call “subquestions,” or perhaps “supporting points.” These three questions are:

  • If I send a packet to the peer I received this update from, will it actually reach the advertised destination?
  • If I send this information to this destination, will it actually reach the intended recipient?
  • If I send a packet to the peer I received this update from, will it pass through an adversary who is redirecting the traffic so they can observe it?

These are the things I can try to prove, or would like to know, in a packet switched network. Note that I want to intentionally focus on the data plane and then transfer these questions to the control plane (BGP). This is the crucial point to remember: If I start with the technical or engineering problem, I’m going to end up asking, and answering, the wrong questions.

This is typically what happens in engineering. For instance, in the world of BGP, the traditional path is to ask, “how can I secure the way BGP operates?” Another example might be, “this application needs these two servers connected via layer 2,” and then we deep dive into every potential way of providing this layer 2 connectivity, tying ourselves into knots with DCI, overlays, complex control planes, and all the rest. We never back off and say, “is this really the right question to ask?” But there is always more than one way to ask the question, and it’s important to try and find the question that draws out your thinking outside the protocol.

Creative questioning is at least half of solving any problem.

Let’s process these three questions so we can take them out of the data plane and into the control plane. The first question, in BGP terms, seems to be asking something like: Is the path this peer is advertising a valid route to the destination? What do we mean by “valid?” We mean a path that will take this traffic to the destination I’m trying to reach.

The second question, in BGP terms, seems to be asking something like: How can I be certain the destination address hasn’t been hijacked, so the peer is advertising a route to a destination that isn’t the one I’m trying to reach (even though it has the same address)? This relates directly to the origin authentication problem in BGP; can I know that the actual owner of the route is the final destination of this route?

The third question, in BGP terms, seems to be asking something like: Is the path through this peer going to pass through someone I don’t want it to pass through? This third one is actually impossible to prove in real terms. We can go some way towards ensuring traffic doesn’t go through a “man in the middle,” but there’s no way, in a packet switched network, to actually be certain of this.
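To make the second question (origin authentication) a little more concrete, here is a minimal sketch. It assumes a registry mapping prefixes to the AS allowed to originate them; that registry, its contents, and the function name are all hypothetical, and how such a registry could be built and trusted is a separate problem. AS Paths are written origin first.

```python
import ipaddress
from typing import Sequence

# Hypothetical registry: which AS is allowed to originate which prefix.
REGISTERED_OWNERS = {
    ipaddress.ip_network("2001:db8:0:1::/64"): 65000,
}


def origin_matches_owner(prefix: str, as_path: Sequence[int]) -> bool:
    """Check the claimed origin of an advertisement against the registry."""
    if not as_path:
        return False
    advertised = ipaddress.ip_network(prefix)
    for registered, owner in REGISTERED_OWNERS.items():
        same_family = advertised.version == registered.version
        # A more specific prefix carved out of a registered block must
        # still be originated by the registered owner.
        if same_family and advertised.subnet_of(registered):
            return as_path[0] == owner
    # Unregistered prefixes can't be verified, so treat them as unproven.
    return False


print(origin_matches_owner("2001:db8:0:1::/64", [65000, 65003]))
print(origin_matches_owner("2001:db8:0:1:8000::/65", [65001]))
```

Note what this does not show: a matching origin says nothing about whether the rest of the path is real, or whether traffic will actually follow it.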

In my next post in this series, I want to continue looking at this line of thinking, making certain we really understand what we can prove in a packet switched network.

This post is the second in a series on what I consider to be a current and difficult design problem at Internet scale that involves just about every piece of the networking puzzle you can get into: BGP security. This is designed to be a sort of case study around approaching design problems, not just at the protocol level, but at an engineering level. I will probably intersperse this series with other posts over the coming months.

Securing BGP: A Case Study (1)

What would it take to secure BGP? Let’s begin where any engineering problem should begin: what problem are we trying to solve?

A small collection of autonomous systems

In this network—in any collection of BGP autonomous systems—there are three sorts of problems that can occur at the AS level. For the purposes of this explanation, assume AS65000 is advertising 2001:db8:0:1::/64. While I’ve covered this ground before, it’s still useful to outline them:

  1. AS65001 could advertise 2001:db8:0:1::/64 as if it is locally attached. This is considered a false origination, or a hijacked route.
  2. AS65001 could advertise a route to 2001:db8:0:1::/64 with the AS path [65000,65001] to AS65003. This is another form of route hijacking, but instead of a direct hijack it’s a “one behind” attack. AS65001 doesn’t pretend to own the route in question, but rather to be connected to the AS that is originating the route.
  3. AS65000 could consider AS65003 a customer, or rather AS65003 might be purchasing Internet connectivity from AS65000. This would mean that any routes AS65000 advertises to AS65003 are not intended to be retransmitted back to AS65004. If, for instance, 2001:db8:0:1::/64 is advertised by AS65000 to AS65003, and AS65003 readvertises it to AS65004, AS65003 would be an unintended transit AS in the path. This could either be intentional or a mistake on the part of AS65003, of course, but either way this is an incorrect traffic pattern that can be at the root of many problems. This is considered a route leak, and is fully described in this Internet draft.
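The three cases above can be told apart in a short sketch, given a view of the network that no single AS actually has. The ground truth tables below (the owner, the set of real links, and the set of customer-only ASes) are assumptions made up for illustration, and AS Paths are written origin first, as in the list.

```python
from typing import Sequence

# Hypothetical ground truth, as seen by an all-knowing observer.
OWNER = 65000                         # AS that really originates the /64
LINKS = {                             # real adjacencies (assumed)
    frozenset((65000, 65003)),
    frozenset((65000, 65004)),
    frozenset((65001, 65003)),
    frozenset((65003, 65004)),
}
NON_TRANSIT = {65003}                 # customers that should never transit


def classify(as_path: Sequence[int]) -> str:
    """Classify an advertisement for 2001:db8:0:1::/64 by its AS Path."""
    if as_path[0] != OWNER:
        return "false origination"
    # Each adjacent pair in the path must be a link that really exists.
    for left, right in zip(as_path, as_path[1:]):
        if frozenset((left, right)) not in LINKS:
            return "one behind"
    # Any AS after the origin is transiting the route.
    if any(asn in NON_TRANSIT for asn in as_path[1:]):
        return "route leak"
    return "ok"


for path in ([65001], [65000, 65001], [65000, 65003], [65000]):
    print(path, "->", classify(path))
```

The hard part in a real network is not this logic, but getting trustworthy versions of those three tables to every AS that needs them.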

There are a number of other possibilities, but these three will be enough to deal with for thinking through the problem and solution sets. Given these are the problems, it’s in the engineering mindset to jump directly to a solution. But before we do, let’s start with a set of questions. For instance:

  1. Should we focus on a centralized solution to this problem, or a distributed one? Then there are the in-between solutions that create a single database that’s synchronized among all the participating autonomous systems.
  2. Should we consider solutions that are carried within the control plane, within BGP itself, or outside? In other words, should every eBGP speaker in the system participate, or should there be some smaller set of devices participating?
  3. What is it we can actually prove in a packet switched network? This might seem like an odd question, but we are in a position where we are trying to manage traffic flows through the control plane—for instance, we are trying to prevent traffic between AS65004 and AS65000 from flowing through AS65003 in the route leak case. What, specifically, can we prove in such a case?

We’ll consider these questions, starting with the last one first, in the next post.

This post kicks off a series on what I consider to be a current and difficult design problem at Internet scale that involves just about every piece of the networking puzzle you can get into: BGP security. This is designed to be a sort of case study around approaching design problems, not just at the protocol level, but at an engineering level. I will probably intersperse this series with other posts over the coming months.

Fear itself: Thinking through change and turmoil

Fair warning: this is going to be a controversial post, and it might be considered a bit “off topic.”

Maybe it’s just the time of year for fear. Or maybe it’s several conversations I’ve been involved in recently. Or maybe it’s the result of following over 150 blogs on a daily basis covering everything from religion to politics to technology to philosophy. Whatever it is, there’s one thing I’ve noticed recently.

We’re really afraid.

I don’t mean “concerned about what the future might hold,” but rather — it seems, at least sometimes — sinking into a state of fear bordering on the irrational. Sometimes it feels like the entire world is one long troubleshooting session in the worst designed network I’ve ever encountered. Let me turn to a few completely different areas to illustrate my point. Some of these are going to make people mad, so hold on to your hats — and hear me out before you jump all over me or shut down.

We’re afraid of what the future might hold for us as engineers and as people. Maybe this entire software defined thing is going to destroy my entire career. Maybe I’ll end up like a buggy whip maker a few years after the first car was built. Maybe the entire world is going to sink under the oceans as they rise due to man made global warming. Maybe we’re all going to be replaced by robots, leaving none of us anything to do for a living at all. Maybe we’re all going to eat GMO foods and die. Maybe I don’t have the right certifications, or maybe I have too many certifications. Maybe cell phones are going to give us all cancer.

Or maybe, just maybe, we’ve come too close to perfecting fear as the ideal motivator for selling just about everything from things to training to politics. Maybe the noise level has gotten so high that we won’t listen until it’s an existential crisis right now. Maybe we’re rushing from crisis to crisis like a boat out in a huge storm trying to stay above water and forgetting to ask where it is we’re going, which port we actually should call home.

Maybe it’s time to reassess, to find some strategy that will help us cope with all this information and all this fear. Some thoughts to that end.

First, ask what claim is actually being made. This might be painful, but learn logical syllogisms, and make it a habit to turn enthymemes into a proper syllogistic form so you can actually evaluate the claim. We’re too fast to accept straw men, too quick to dismiss with a casual wave of the hand, an appropriate bit of snark, and a quick dose of name calling. We’re too slow to listen and spend time really trying to understand. We’ve sown a world of 140 character snippets, and we’re reaping a whirlwind of thoughtlessness.

Second, ask what supports the claim. I don’t mean who supports the claim or why they support it. Stop asking about feelings and motives. Start asking about facts.

Third, ask why you might have any reason to doubt the claim. Intentionally fight against your confirmation bias and seek out the most credible sources you can find that disagree with the claim. Read them carefully, intentionally, and as honestly as you can.

Okay, you’ve done all of this, and you believe the claim is correct. Now is the time to jump to action, right? Wrong. In fact, the hard work has just begun.

First, ask what it is you can actually do about it. Second, find the tradeoffs, including who pays and how.

The climate of fear we live in particularly shuts down our ability to think about tradeoffs. When we’re afraid, we move to “there is no tradeoff,” “we need to do something about this,” and “anyone who disagrees is a moral monster” far too quickly. Engineers should know from long experience with real world systems there are always tradeoffs. If you’ve not found them, then you’re not looking — and if you’re not looking, then you’re not really engaged in thinking.

Let me try to take a personal example here. “What happens if my job ends tomorrow, because the technology I know goes away?” Well, you could run around like a turkey the day before Thanksgiving. I don’t know how useful that’s going to be, but it’s certainly entertaining, and, in some ways, actually satisfying.

Or you could process the question, ask if it’s true (it probably is on some level all the time), think about what you can do about it, and then focus on finding the tradeoffs so you can make a rational set of decisions about what actions to take in response. Maybe you should make it a practice to learn new skills on a regular basis? “But what if I bet wrong, and learn the wrong skills?” How is that better than not betting at all? Learning is, itself, a skill that takes regular practice.

We need to use the same process across the board. Before we casually cast aside anyone’s rights (or responsibilities) in the name of creating a “safer world,” before we radically alter our entire way of life to solve the fifteenth world crisis that has a celebrity “do something now” video attached, before we all collapse in despair at the collapse of our world and our careers, we need to make certain we ask the questions — what does this really mean, what are the facts supporting it, why should I doubt it, what can I really do about it, and what are the tradeoffs?

I don’t want to get into a long, drawn out, political discussion. That’s not what this blog is about. I’m not trying to make a political point, but rather a thinking point. Fear makes us treat one another like objects when we really need to listen to one another as people. We really need to learn to get past the fear our world seems to be drowning in. There are things we should rationally be afraid of. But there is also a sense in which fear removes our capacity to react rationally, and hence makes our nightmares into reality.

Why aren’t you teaching?

There is an old saw about teaching and teachers: “Those who can, do. Those who can’t, teach.” This seems to be a widely believed thought in the engineering world (though perhaps less in the network engineering world than many other parts of engineering) — but is it true? In fact, to go farther, does this type of thinking actually discourage individual engineers teaching, or training, in a more formal way in the networking world? Let me give you my experience.

What I’ve discovered across the years is something slightly different: if you can’t explain it to someone else in a way they can understand it, then you don’t really know it. There are few ways to put this into practice in the real world better than intentionally taking on the task of teaching others what you know. In fact, I’ve probably learned much more in the process of preparing to teach than I ever have in “just doing.” There is something about spending the time in thinking through how to explain something in a number of different ways that encourages understanding. To put it in other terms, teaching makes you really think about how something works.

Don’t get me wrong here — engineers shouldn’t lose their focus on doing. But we need to learn to blend doing with understanding in a way that we’ve not done well with up until now. We’ve often been so focused on the what that we forget about the why.

Given that one excellent way to develop the thinking skills, to exercise our why skills as well as our what skills, is to teach, why aren’t you teaching?

Is it that you don’t think you have the skills to teach? Is it that you don’t think you have the opportunity? Is it that you don’t think you have the knowledge?

All of these are excuses, rather than real reasons. You can always take the time to put together a basic course in networking for the people in your company. In fact, maybe the reason they don’t really understand your job is because you never explain the technology you work on. You can always take the time to teach your peers, or even the junior engineers on your team, or another team. There are local high schools that could use your time in the classroom teaching networking technology. Where else are new network engineers coming from, after all?

I’m also not saying you shouldn’t rely on professional education — after all, I still want you to buy my books. 🙂 But there’s something about building and giving a class that teaches things you just can’t learn many other places.

So — let me ask again — why aren’t you teaching?

Innovation and the Internet

Industries mature, of course. That they do so shouldn’t be surprising to anyone who’s watched the world for very long. The question is: do they mature in a way that places a few players at the “top,” leaving the rest to innovate along the edges? Or do they leave broad swaths of open space in which many players can compete and innovate? Through most of human history, the answer has been the first: industries, in the modern age, tend to ossify into a form where a few large players control most of the market, leaving the smaller players to innovate along the edges. When the major impetus in building a new company is to “get bought,” and the most common way for larger companies to innovate is by buying smaller companies (or doing “spin ins”), then you’ve reached a general point of stability that isn’t likely to change much.

Is the networking industry entering this “innovation free zone?” Or will the networking industry always be a market with more churn, and more innovation? There are signs in both directions.

For instance, there’s the idea that once technology reaches a certain level of capability, there’s just no reason for any further forward motion. Fifty years ago, if you had asked people what airplanes could do, and what they would look like, you would have gotten some wild feedback. Today, ask the same question, and you’ll likely get the same wild ideas. Things haven’t changed much in air travel (other than reductions in the amount of space in the cattle cars, it seems) because we’ve reached the point where new advances don’t bring much in the way of new benefits.

Another instance: there is a growing group of “old” companies with a lot of money, and they’re turning that money into political power. The one sure way to ensure stagnation is to get the government involved. A case in point here is LTE-U, which bids fair to turn the last mile upside down. It seems a number of large companies are using their lobbying mojo to make certain older carriers aren’t allowed to use unlicensed space. A lot of top flight engineers don’t seem to agree on the overall impact of allowing AT&T, for instance, to expand their wireless network on WiFi frequencies; much of the argument at the moment seems to come down to the political, rather than the engineering aspects of the problem. When lobbying takes over engineering, it’s a sure sign the industry is moving into an ossified state. Robotics are the new and exciting thing now; the Internet seems like a “given.”

On the other hand, routing is more interesting right now than it has been in a long time. Software Defined and cloud are taking over the world, it seems (though a few of us do try to inject a bit of sanity into the news stream every now and then). Over the top services, like SD-WAN, seem to be creating new value in spaces long thought completely ossified. In a somewhat virtual world (hardware still counts, but the intelligence tends to move into the overlay), there isn’t any apparent point at which you can say, “we’re done with this, let’s move to the next thing.”

It seems, to me, that we’re on a bit of a cusp, a turning point. Which way the industry goes depends, in some part, on the way the larger players go. Will they continue to turn to the government, using political muscle to solidify revenue streams? Or will they turn back to real innovation?

Let’s not lose sight of the role each of us, as individual network engineers, plays in the path from this point forward. The choice between the safe vendor bet and innovating even on a small scale, played out over the thousands of networks in the world, can make a huge difference. We tend to divide the world into small networks with boring problems and large networks with interesting problems. This is a false dichotomy: interesting problems are interesting problems, no matter what the network size. Interested people make for interesting solutions, and in turn, interesting innovation.

We need to realize that no matter how small it seems, we’re at a point where the small decisions, en masse, will make a big difference. What decisions will you make today?