Lessons in Location and Identity through Remote Peering

We normally encounter four different kinds of addresses in an IP network; we tend to think about each of these as:

  • The MAC address identifies an interface on a physical or virtual wire
  • The IP address identifies an interface on a host
  • The DNS name identifies a host
  • The port number identifies an application or service running on the host

There are other address-like things, of course, such as the protocol number, a router ID, an MPLS label, etc. But let’s stick to these four for the moment. Looking through this list, the first thing you should notice is we often use the IP address as if it identified a host—which is generally not a good thing. There have been some efforts in the past to split the locator from the identifier, but the IP protocol suite was designed with a separate locator and identifier already: the IP address is the location and the DNS name is the identifier.

Even if you split up the locator and the identifier, however, the word locator is still quite ambiguous because we often equate the geographical and topological locations. In fact, old police procedural shows used to include scenes where a suspect was tracked down because they were using an IP address “assigned to them” in some other city… When the topic comes up this way, we can see the obvious flaw. In other situations, conflating the IP address with the location of the device is less obvious, and causes more subtle problems.

Consider, for instance, the concept of remote peering. Suppose you want to connect to a cloud provider who has a presence in an IXP that’s just a few hundred miles away. You calculate the costs of placing a router on the IX fabric, add it to the cost of bringing up a new circuit to the IX, and … well, there’s no way you are ever going to get that kind of budget approved. Looking around, though, you find there is a company that already has a router connected to the IX fabric you want to be on, and they offer a remote peering solution, which means they offer to build an Ethernet tunnel across the public Internet to your edge router. Once the tunnel is up, you can peer your local router to the cloud provider’s network using BGP. The cloud provider thinks you have a device physically connected to the local IX fabric, so all is well, right?

In a recent paper, a group of researchers looked at the combination of remote peering and anycast addresses. If you are not familiar with anycast addresses, the concept is simple: take a service which is replicated across multiple locations and advertise every instance of the service using a single IP address. This is clever because when you send packets to the IP address representing the service, you will always reach the closest instance of the service. So long as you have not played games with stretched Ethernet, that is.

In the paper, the researchers used various mechanisms to figure out where remote peering was taking place, and another to discover services being advertised using anycast (normally DNS or CDN services). Using the intersection of these two, they determined if remote peering was impacting the performance of any of these services. I shocked, shocked, to tell you the answer is yes. I would never have expected stretched Ethernet to have a negative impact on performance. 😊

To quote the paper directly:

…we found that 38% (126/332) of RTTs in traceroutes towards anycast pre￿xes potentially a￿ected by remote peering are larger than the average RTT of pre￿xes without remote peering. In these 126 traceroute probes, the average RTT towards pre￿xes potentially a￿ected by remote peering is 119.7 ms while the average RTT of the other pre￿xes is 84.7 ms.

The bottom line: “An average latency increase of 35.1 ms.” This is partially because the two different meanings of the word location come into play when you are interacting with services like CDNs and DNS. These services will always try to serve your requests from a physical location close to you. When you are using Ethernet stretched over IP, however, your topological location (where you connect to the network) and your geographical location (where you are physically located on the face of the Earth) can be radically different. Think about the mental dislocation when you call someone with an area code that is normally tied to an area of the west coast of the US, and yet you know they now live around London, say…

We could probably add in a bit of complexity to solve these problems, or (even better) just include your GPS coordinates in the IP header. After all, what’s the point of privacy? … 🙂 The bottom line is this: remote peering might a good idea when everything else fails, of course, but if you haven’t found the tradeoffs, you haven’t looked hard enough. It might be that application performance across a remote peering session is low enough that paying for the connection might turn out cheaper.

In the meantime, wake me up when we decide that stretching Ethernet over IP is never a good thing.

Research: Securing Linux with a Faster and Scalable IPtables

If you haven’t found the trade-offs, you haven’t looked hard enough.

A perfect illustration is the research paper under review, Securing Linux with a Faster and Scalable Iptables. Before diving into the paper, however, some background might be good. Consider the situation where you want to filter traffic being transmitted to and by a virtual workload of some kind, as shown below.

To move a packet from the user space into the kernel, the packet itself must be copied into some form of memory that processes on “both sides of the divide” can read, then the entire state of the process (memory, stack, program execution point, etc.) must be pushed into a local memory space (stack), and control transferred to the kernel. This all takes time and power, of course.

In the current implementation of packet filtering, netfilter performs the majority of filtering within the kernel, while iptables acts as a user frontend as well as performing some filtering actions in the user space. Packets being pushed from one interface to another must make the transition between the user space and the kernel twice. Interfaces like XDP aim to make the processing of packets faster by shortening the path from the virtual workload to the PHY chipset.

What if, instead of putting the functionality of iptables in the user space you could put it in the kernel space? This would make the process of switching packets through the device faster, because you would not need to pull packets out of the kernel into a user space process to perform filtering.

But there are trade-offs. According to the authors of this paper, there are three specific challenges that need to be addressed. First, users expect iptables filtering to take place in the user process. If a packet is transmitted between virtual workloads, the user expects any filtering to take place before the packet is pushed to the kernel to be carried across the bridge, and back out into user space to the second process, Second, a second process, contrack, checks the existence of a TCP connection, which iptables then uses to determine whether a packet that is set to drop because there no existing connection. This give iptables the ability to do stateful filtering. Third, classification of packets is very expensive; classifying packets could take too much processing power or memory to be done efficiently in the kernel.
To resolve these issues, the authors of this paper propose using an in-kernel virtual machine, or eBPF. They design an architecture which splits iptables into to pipelines, and ingress and egress, as shown in the illustration taken from the paper below.

As you can see, the result is… complex. Not only are there more components, with many more interaction surfaces, there is also the complexity of creating in-kernel virtual machines—remembering that virtual machines are designed to separate out processing and memory spaces to prevent cross-application data leakage and potential single points of failure.
That these problems are solvable is not in question—the authors describe how they solved each of the challenges they laid out. The question is: are the trade-offs worth it?

The bottom line: when you move filtering from the network to the host, you are not moving the problem from a place where it is less complex. You may make the network design itself less complex, and you may move filtering closer to the application, so some specific security problems are easier to solve, but the overall complexity of the system is going way up—particularly if you want a high performance solution.

IPv6 and Leaky Addresses

One of the recurring myths of IPv6 is its very large address space somehow confers a higher degree of security. The theory goes something like this: there is so much more of the IPv6 address space to test in order to find out what is connected to the network, it would take too long to scan the entire space looking for devices. The first problem with this myth is it simply is not true—it is quite possible to scan the entire IPv6 address space rather quickly, probing enough addresses to perform a tree-based search to find attached devices. The second problem is this assumes the only modes of attack available in IPv4 will directly carry across to IPv6. But every protocol has its own set of tradeoffs, and therefore its own set of attack surfaces.

Assume, for instance, you follow the “quick and easy” way of configuring IPv6 addresses on devices as they are deployed in your network. The usual process for building an IPv6 address for an interface is to take the prefix, learned from the advertisement of a locally attached router, and the MAC address of one of the locally attached interfaces, combining them into an IPv6 address (SLAAC). The size of the IPv6 address space proves very convenient here, as it allows the MAC address, which is presumably unique, to be used in building a (presumably unique) IPv6 address.

According to RFC7721, this process opens several new attack surfaces that did not exist in IPv4, primarily because the device has exposed more information about itself through the IPv6 address. First, the IPv6 address now contains at least some part of the OUI for the device. This OUI can be directly converted to a device manufacturer using web pages such as this one. In fact, in many situations you can determine where and when a device was manufactured, and often what class of device it is. This kind of information gives attackers an “inside track” on determining what kinds of attacks might be successful against the device.

Second, if the IPv6 address is calculated based on a local MAC address, the host bits of the IPv6 address of a host will remain the same regardless of where it is connected to the network. For instance, I may normally connect my laptop to a port in a desk in the Raleigh area. When I visit Sunnyvale, however, I will likely connect my laptop to a port in a conference room there. If I connect to the same web site from both locations, the site can infer I am using the same laptop from the host bits of the IPv6 address. Across time, an attacker can track my activities regardless of where I am physically located, allowing them to correlate my activities. Using the common lower bits, an attacker can also infer my location at any point in time.

Third, knowing what network adapters an organization is likely to use reduces the amount of raw address space that must be scanned to find active devices. If you know an organization uses Juniper routers, and you are trying to find all their routers in a data center or IX fabric, you don’t really need to scan the entire IPv6 address space. All you need to do is probe those addresses which would be formed using SLAAC with OUI’s formed from Juniper MAC addresses.

Beyond RFC7721, many devices also return their MAC address when responding to ICMPv6 probes in the time exceeded response. This directly exposes information about the host, so the attacker does not need to infer information from SLAAC-derived MAC addresses.

What can be done about these sorts of attacks?

The primary solution is to use semantically opaque identifiers when building IPv6 addresses using SLAAC—perhaps even using a cryptographic hash to create the base identifiers from which IPv6 addresses are created. The bottom line is, though, that you should examine the vendor documentation for each kind of system you deploy—especially infrastructure devices—as well as using packet capture tools to understand what kinds of information your IPv6 addresses may be leaking and how to prevent it.

 

The Hedge 12: Cyberinsecurity with Andrew Odlyzko

There is a rising tide of security breaches. There is an even faster rising tide of hysteria over the ostensible reason for these breaches, namely the deficient state of our information infrastructure. Yet the world is doing remarkably well overall, and has not suffered any of the oft-threatened giant digital catastrophes. Andrew Odlyzko joins Tom Ammon and I to talk about cyber insecurity.

download

The original paper referenced in this episode is here.

Simpler is Better… Right?

A few weeks ago, I was in the midst of a conversation about EVPNs, how they work, and the use cases for deploying them, when one of the participants exclaimed: “This is so complicated… why don’t we stick with the older way of doing things with multi-chassis link aggregation and virtual chassis device?” Sometimes it does seem like we create complex solutions when a simpler solution is already available. Since simpler is always better, why not just use them? After all, simpler solutions are easier to understand, which means they are easier to deploy and troubleshoot.

The problem is we too often forget the other side of the simplicity equation—complexity is required to solve hard problems and adapt to demanding environments. While complex systems can be fragile (primarily through ossification), simple solutions can flat out fail just because they can’t cope with changes in their environment.

As an example, consider MLAG. On the surface, MLAG is a useful technology. If you have a server that has two network interfaces but is running an application that only supports a single IP address as a service access point, MLAG is a neat and useful solution. Connect a single (presumably high reliability) server to two different upstream switches through two different network interface cards, which act like one logical Ethernet network. If one of the two network devices, optics, cables, etc., fails, the server still has network connectivity.

Neat.

But MLAG has well-known downsides, as well. There is a little problem with the physical locality of the cables and systems involved. If you have a service split among multiple services, MLAG is no longer useful. If the upstream switches are widely separated, then you have lots of cabling fun and various problems with jitter and delay to look forward to.

There is also the little problem of MLAG solutions being mostly proprietary. When something fails, your best hope is a clueful technical assistance engineer on the other end of a phone line. If you want to switch vendors for some reason, you have the fun of taking the entire server out of operation for a maintenance window to do the switch, along with the vendor lock-in in the case of failure, etc.

EVPN, with its ability to attach a single host through multiple virtual Ethernet connections across an IP network, is a lot more complex on the surface. But looks can be deceiving… In the case of EVPN, you see the complexity “upfront,” which means you can (largely) understand what it is doing and how it is doing it. There is no “MLAG black box;” useful in cases where you must troubleshoot.

And there will be cases where you will need to troubleshoot.

Further, because EVPN is standards-based and implemented by multiple vendors, you can switch over one virtual connection at a time; you can switch vendors without taking the server down. EVPN is a much more flexible solution overall, opening up possibilities around multihoming across different pods in a butterfly fabric, or allowing the use of default MAC addresses to reduce table sizes.
Virtual chassis systems can often solve some of the same problems as EVPN, but again—you are dealing with a black box. The black box will likely never scale to the same size, and cover the same use cases, as a community-built standard like EVPN.

The bottom line

Sometimes starting with a more complex set of base technologies will result in a simpler overall system. The right system will not avoid complexity, but rather reduce it where possible and contain it where it cannot be avoided. If you find your system is inflexible and difficult to manage, maybe its time to go back to the drawing board and start with something a little more complex, or where the complexity is “on the surface” rather than buried in an abstraction. The result might actually be simpler.

The Hedge 11: Roland Dobbins on Working Remotely

I failed to include the right categories the first time, so this didn’t make it into the podcast catcher feeds correctly…

Network engineering and operations are both “mental work” that can largely be done remotely—but working remote is not only great in many ways, it is also often fraught with problems. In this episode of the Hedge, Roland Dobbins joins Tom and Russ to discuss the ins and outs of working remote, including some strategies we have found effective at removing many of the negative aspects.

download