TECH – Page 13 – rule 11 reader

Securing BGP: A Case Study (3)

To recap (or rather, as they used to say in old television shows, “last time on ‘net Work…”), this series is looking at BGP security as an exercise (or case study) in understanding how to approach engineering problems. We started this series by asking three questions, the third of which was:

What is it we can actually prove in a packet switched network?

From there, in part 2 of this series, we looked at this question more deeply, asking three “sub questions” that are designed to help us tease out the answer to this third question. Asking the right questions is a subtle, but crucial, part of learning how to deal with engineering problems of all sorts. Two of those questions can be summed up as:

Is the path through this peer going to pass through someone I don’t want it to pass through?
Is the path this peer is advertising a valid route to the destination?

Let’s quickly look at the first of these two to see why it’s not provable in the context of a packet switched network, using the network diagram below.

[Figure: bgp-sec-02, the network diagram referenced in the text]

When working with BGP at Internet scale, we tend to think of an autonomous system as one “thing”; we draw it that way on network diagrams, for instance, as I’ve been doing so far in this series. But the reality is far different. Autonomous systems are made up of those pesky little things called routers. In a packet switched network, it’s important to remember each router makes an independent forwarding decision. For instance, in this network, assume Router B is advertising some destination in AS65004, say 2001:db8:0:1::/64, to Router A with an AS Path of [65004,65002]. When Router A sends traffic to a host within the :1::/64 prefix, then, it can assume the traffic will follow a path from AS65002 directly to AS65004, with no intermediate hops.

The problem is: this assumption is wrong. There are a number of reasons Router C might forward traffic to :1::/64 to Router D, and hence through AS65003, rather than to Router F. For instance, Router C might be a route reflector running add paths, which means Router B has multiple routes to the destination, but it’s only advertising one of the available paths to Router A. Or perhaps the :1::/64 route is actually an aggregate of two longer prefixes, and the destination Router A is forwarding traffic to has a longer prefix match through Router E. Or perhaps Router C just has a static route configured forwarding traffic along a different path than the AS is advertising.
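
The aggregate case can be made concrete with a small longest-prefix-match sketch. Everything specific here is an assumption for illustration: the forwarding table, the way the /64 is split, and which next hop carries which prefix are hypothetical, and only the router names and the /64 come from the text above.

```python
import ipaddress

# Hypothetical forwarding table for Router C. Router A only ever hears about
# the /64; the longer /65 exists inside AS65002 and is never advertised to it.
ROUTER_C_FIB = {
    ipaddress.ip_network("2001:db8:0:1::/64"): "Router F",       # the advertised aggregate
    ipaddress.ip_network("2001:db8:0:1:8000::/65"): "Router E",  # assumed longer prefix
}


def next_hop(fib, destination):
    """Return the next hop of the longest prefix containing destination."""
    address = ipaddress.ip_address(destination)
    matches = [prefix for prefix in fib if address in prefix]
    if not matches:
        return None  # no route at all
    # The most specific match wins, regardless of what the AS Path claimed.
    return fib[max(matches, key=lambda prefix: prefix.prefixlen)]


for host in ("2001:db8:0:1::10", "2001:db8:0:1:8000::10"):
    print(host, "->", next_hop(ROUTER_C_FIB, host))
```

Both hosts sit inside the prefix Router A learned, yet Router C sends them in different directions. Nothing in the advertisement Router A received describes that local decision.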

Whatever the reason, packet switched networks just don’t work this way. The first option, that traffic forwarded based on a specific advertisement will follow the AS Path in that advertisement, is false. What of the second? It all depends on what you mean by the word “valid.” There are actually (as is often the case) two different questions embedded within this question:

Is there a physical path between the peer advertising the route and the reachable destination?
Does every AS along the path between the peer advertising the route and the reachable destination agree to forward traffic towards the advertised destination?

The first question could be proven by showing that every AS along the AS Path claims to have a physical connection to its neighbors in the path. The second, however, is trickier. To see why, let’s switch things around a little. Assume AS65004 is advertising 2001:db8:0:1::/64 towards AS65003, but not towards AS65002. Assume, as well, that AS65003 is a customer of AS65004 and AS65002; in other words, AS65003 should not be transiting traffic to any destination. How could AS65000 know this?

First, AS65002 could filter at Router D, for instance, based on some prior knowledge, or some sort of information provided by AS65004.

Second, AS65004 could somehow signal AS65002 that AS65003 shouldn’t be transiting traffic (either at all, or for this one destination).

We’ll explore the concept of signaling later in this series, when we start thinking about what sorts of solutions might be acceptable for the problem set we’re trying to solve. For now, it’s important to consider these two points:

All the signaling in the world from AS65004 isn’t going to help if AS65002 doesn’t pay attention to the signal.
If AS65004 is unwilling to tell AS65002 what its policy towards AS65003 is, there’s no way for anyone to enforce it.

In other words, you can’t enforce what you don’t know, and enforcement is based on a prior trust arrangement of some sort. These two crucial points should be listed in the set of requirements we’re building before we start considering solutions.
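
A minimal sketch of the receiving side shows why both points matter. The `may_transit` table, the `Verdict` names, and the `evaluate` function are hypothetical; they stand in for whatever prior knowledge or signal AS65002 might have, and they are not a description of any real BGP mechanism.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    UNKNOWN = "unknown"  # no policy was shared, so nothing can be enforced


def evaluate(as_path, may_transit):
    """Check a received AS Path against the transit policies we know about.

    as_path lists the origin first, matching the notation used in this post.
    may_transit maps an AS number to True or False; a missing entry means
    the policy was never shared with us.
    """
    verdict = Verdict.ACCEPT
    for asn in as_path[1:]:  # every AS after the origin is acting as transit
        allowed = may_transit.get(asn)
        if allowed is False:
            return Verdict.REJECT
        if allowed is None:
            verdict = Verdict.UNKNOWN
    return verdict


# AS65002 receives the route from AS65003 with the path [65004, 65003].
print(evaluate([65004, 65003], {65003: False}))  # policy was shared
print(evaluate([65004, 65003], {}))              # policy was never shared
```

With the policy in hand, the route through AS65003 is rejected; without it, the best AS65002 can do is say it doesn't know. And if AS65002 never runs a check like this, the signal is simply ignored.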

In my next post in this series, I want to back up to the original three questions we discovered and start thinking through what sorts of requirements we can decipher from them.

Posted in TECH, WRITTEN

Securing BGP: A Case Study (2)

In part 1 of this series, I pointed out that there are three interesting questions we can ask about BGP security. The third question I outlined there was this: What is it we can actually prove in a packet switched network? This is the first question I want to dive into; this is a deep dive, so be prepared for a long series. 🙂 This question feels like it is actually asking three different things, what we might call “subquestions,” or perhaps “supporting points.” These three questions are:

If I send a packet to the peer I received this update from, will it actually reach the advertised destination?
If I send this information to this destination, will it actually reach the intended recipient?
If I send a packet to the peer I received this update from, will it pass through an adversary who is redirecting the traffic so they can observe it?

These are the things I can try to prove, or would like to know, in a packet switched network. Note that I want to intentionally focus on the data plane and then transfer these questions to the control plane (BGP). This is the crucial point to remember: If I start with the technical or engineering problem, I’m going to end up asking, and answering, the wrong questions.

This is typically what happens in engineering. For instance, in the world of BGP, the traditional path is to ask, “how can I secure the way BGP operates?” Another example might be, “this application needs these two servers connected via layer 2,” and then we deep dive into every potential way of providing this layer 2 connectivity, tying ourselves into knots with DCI, overlays, complex control planes, and all the rest. We never back off and say, “is this really the right question to ask?” But there is always more than one way to ask the question, and it’s important to try and find the question that draws your thinking outside the protocol.

Creative questioning is at least half of solving any problem.

Let’s process these three questions so we can take them out of the data plane and into the control plane. The first question, in BGP terms, seems to be asking something like: Is the path this peer is advertising a valid route to the destination? What do we mean by “valid?” We mean a path that will take this traffic to the destination I’m trying to reach.

The second question, in BGP terms, seems to be asking something like: How can I be certain the destination address hasn’t been hijacked, so the peer is advertising a route to a destination that isn’t the one I’m trying to reach (even though it has the same address)? This relates directly to the origin authentication problem in BGP; can I know that the actual owner of the route is the final destination of this route?

The third question, in BGP terms, seems to be asking something like: Is the path through this peer going to pass through someone I don’t want it to pass through? This third one is actually impossible to prove in real terms. We can go some way towards ensuring traffic doesn’t go through a “man in the middle,” but there’s no way, in a packet switched network, to actually be certain of this.
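
Of the three, the second question is the one that reduces most directly to a lookup. The sketch below assumes a table of authorized origins already exists and can be trusted; the `AUTHORIZED_ORIGINS` table, its single entry, and the `origin_is_authorized` function are all hypothetical, and the AS Path is written origin first, which is an assumption about notation.

```python
import ipaddress

# Hypothetical registry: which AS is allowed to originate which prefix.
AUTHORIZED_ORIGINS = {
    ipaddress.ip_network("2001:db8:0:1::/64"): {65000},
}


def origin_is_authorized(prefix, as_path, registry=AUTHORIZED_ORIGINS):
    """Return True or False when the registry covers prefix, else None."""
    announced = ipaddress.ip_network(prefix)
    origin = as_path[0]  # origin listed first (an assumed convention)
    covering = [entry for entry in registry if announced.subnet_of(entry)]
    if not covering:
        return None  # nothing registered, so nothing can be proven
    return any(origin in registry[entry] for entry in covering)


print(origin_is_authorized("2001:db8:0:1::/64", [65000, 65001]))
print(origin_is_authorized("2001:db8:0:1::/64", [65001]))
print(origin_is_authorized("2001:db8:0:2::/64", [65001]))
```

A check like this says nothing about the first or third question. It only tests the claim at the origin end of the path, not where packets will actually go.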

In my next post in this series, I want to continue looking at this line of thinking, making certain we really understand what we can prove in a packet switched network.

This post is the second in a series on what I consider to be a current and difficult design problem at Internet scale that involves just about every piece of the networking puzzle you can get into: BGP security. This is designed to be a sort of case study around approaching design problems, not just at the protocol level, but at an engineering level. I will probably intersperse this series with other posts over the coming months.

Securing BGP: A Case Study (1)

What would it take to secure BGP? Let’s begin where any engineering problem should begin: what problem are we trying to solve?

In this network—in any collection of BGP autonomous systems—there are three sorts of problems that can occur at the AS level. For the purposes of this explanation, assume AS65000 is advertising 2001:db8:0:1::/64. While I’ve covered this ground before, it’s still useful to outline them:

AS65001 could advertise 2001:db8:0:1::/64 as if it is locally attached. This is considered a false origination, or a hijacked route.
AS65001 could advertise a route to 2001:db8:0:1::/64 with the AS path [65000,65001] to AS65003. This is another form of route hijacking, but instead of a direct hijack it’s a “one behind” attack. AS65001 doesn’t pretend to own the route in question, but rather to be connected to the AS that is originating the route.
AS65000 could consider AS65003 a customer, or rather AS65003 might be purchasing Internet connectivity from AS65000. This would mean that any routes AS65000 advertises to AS65003 are not intended to be readvertised to AS65004. If, for instance, 2001:db8:0:1::/64 is advertised by AS65000 to AS65003, and AS65003 readvertises it to AS65004, AS65003 would become an unintended transit AS in the path. The readvertisement could be either deliberate or a mistake, of course, but either way this is an incorrect traffic pattern that can be at the root of many problems. This is considered a route leak, and is fully described in this Internet draft.
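
To keep the three cases apart, it can help to see what an all-knowing observer would need in order to tell them from one another. The sketch below assumes perfect knowledge of the true origin, the real inter-AS links, and the customer relationships; the `LINKS` and `CUSTOMERS` sets are hypothetical reconstructions from the descriptions above, and no single AS actually has this view.

```python
TRUE_ORIGIN = 65000  # AS65000 originates 2001:db8:0:1::/64 in this example

# Assumed physical adjacencies, inferred from the three cases above.
LINKS = {
    frozenset((65000, 65003)),
    frozenset((65003, 65004)),
    frozenset((65001, 65003)),
}

# ASes that buy connectivity and should never appear as transit.
CUSTOMERS = {65003}


def classify(as_path):
    """Classify an advertised AS Path, written origin first as in the text."""
    if as_path[0] != TRUE_ORIGIN:
        return "false origination"
    for left, right in zip(as_path, as_path[1:]):
        if frozenset((left, right)) not in LINKS:
            return "one behind"  # the path claims a link that does not exist
    if any(asn in CUSTOMERS for asn in as_path[1:]):
        return "route leak"  # a customer AS is sitting in a transit position
    return "ok"


for path in ([65001], [65000, 65001], [65000, 65003], [65000]):
    print(path, "->", classify(path))
```

Each branch depends on a different piece of knowledge: who owns the prefix, who is really connected to whom, and who has agreed to carry traffic for whom.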

There are a number of other possibilities, but these three will be enough to deal with for thinking through the problem and solution sets. Given these are the problems, it’s in the engineering mindset to jump directly to a solution. But before we do, let’s start with a set of questions. For instance:

Should we focus on a centralized solution to this problem, or a distributed one? Then there are the in-between solutions that create a single database that’s synchronized among all the participating autonomous systems.
Should we consider solutions that are carried within the control plane, within BGP itself, or outside? In other words, should every eBGP speaker in the system participate, or should there be some smaller set of devices participating?
What is it we can actually prove in a packet switched network? This might seem like an odd question, but we are in a position where we are trying to manage traffic flows through the control plane—for instance, we are trying to prevent traffic between AS65004 and AS65000 from flowing through AS65003 in the route leak case. What, specifically, can we prove in such a case?

We’ll consider these questions, starting with the last one first, in the next post.

This post kicks off a series on what I consider to be a current and difficult design problem at Internet scale that involves just about every piece of the networking puzzle you can get into: BGP security. This is designed to be a sort of case study around approaching design problems, not just at the protocol level, but at an engineering level. I will probably intersperse this series with other posts over the coming months.

Engineering Lessons, IPv6 Edition

Yes, we really are going to reach a point where the RIRs will run out of IPv4 addresses. As this chart from Geoff’s blog shows —

Why am I thinking about this? Because I ran across a really good article by Geoff Huston over at potaroo about the state of the IPv4 address pool at APNIC. The article is a must read, so stop right here, right click on this link, open it in a new tab, read it, and then come back. I promise this blog isn’t going anyplace while you’re over on Geoff’s site. But my point isn’t to ring the alarm bells on the IPv4 situation. Rather, I’m more interested in how we got here in the first place. Specifically, why has it taken so long for the networking industry to adopt IPv6?

Inertia is a tempting answer, but I’m not certain I buy this as the sole reason for lack of deployment. IPv6 was developed some fifteen years ago; since then we’ve deployed tons of new protocols, tons of new networking gear, and lots of other things. Remember what a cell phone looked like fifteen years ago? In fact, if we had started fifteen years ago with simple dual mode devices, we could easily be fully deployed in IPv6 today. As it is, we’re really just starting now.

We didn’t see a need? Perhaps, but that’s difficult to maintain, as well. When IPv6 was originally developed (remember — fifteen years ago), we all knew there was an addressing problem. I suspect there’s another reason.

I suspect that IPv6, in its original form, tried to boil the ocean, and the result might have been too much change too fast for the networking community to handle in such a fundamental area of the stack. What engineering lessons might we draw from the long time scales around IPv6 deployment?

For those who weren’t in the industry those many years ago, there were several drivers behind IPv6 beyond just the need for more address space. For instance, the entire world exploded with “no more NATs.” In fact, many engineers, to this day, still dislike NATs, and see IPv6 as a “solution” to the NAT “problem.” Mailing lists roiled with long discussions about NAT, security by obscurity (still waiting for someone who strongly believes that obscurity is useless to step onto a modern battlefield with a state of the art armor system painted bright orange), and a thousand other topics. You see, ARP really isn’t all that efficient, so let’s do something a little different and create an entirely new neighbor discovery system. And then there’s that whole fragmentation issue we’ve been dealing with for IPv4 for all these years. And…

Part of the reason it’s taken so long to deploy IPv6, I think, is because it’s not just about expanding the address space. IPv6, for various reasons, has tried to address every potential failing ever found in IPv4.

Don’t miss my point here. The design and engineering decisions made for IPv6 are generally solid. But all of us, and I include myself here, tend to focus too much on building that practically perfect protocol, rather than building something that is “good enough,” along with stretchy spots where obvious changes can be made in the future.

In this specific case, we might have passed over one specific question too easily — how easy will this be to deploy in the real world? I’m not saying there weren’t discussions around this very topic, but the general answer was, “we have fifteen years to deploy this stuff.” And, yet… Here we are fifteen years later, and we’re still trying to convince people to deploy it. Maybe a bit of honest reflection might be useful just about now.

I’m not saying we shouldn’t deploy IPv6. Rather, I’m saying we should try and take a lesson from this — a lesson in engineering process. We needed, and need, IPv6. We probably didn’t need the NAT wars. We needed, and need, IPv6. But we probably didn’t need the wars over fragmentation.

What we, as engineers, tend to do is to build solutions that are complete, total, self contained, and practically perfect. What we, as engineers, should do is build platforms that are flexible, usable, and can support a lot of different needs. Being a perfectionist isn’t just something you say during the interview to that one dumb question about your greatest weakness. Sometimes you (we, really) do need to learn to stop what we’re doing, take a look around, and ask: why are we doing this?
