Middleboxes and the End-to-End Principle

The IP suite has always been loosely grounded in the end-to-end principle, originally defined in Saltzer, Reed, and Clark's "End-to-End Arguments in System Design," and quoted in RFC2775 as:

The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the endpoints of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible. … This principle has important consequences if we require applications to survive partial network failures. An end-to-end protocol design should not rely on the maintenance of state (i.e. information about the state of the end-to-end communication) inside the network.

How are the Internet, and (by extension) IP networks in general, doing with regard to the end-to-end principle? Perhaps the first notice of trouble in IETF documents is RFC2101, which argues the IPv4 address was originally both a locator and an identifier, and that the locator usage has become the primary one. This is much of the argument around LISP and many other areas of work, but I think RFC2101 misstates the case a bit. That the original point of an IP address is to describe a topological location in the network is easily argued from two concepts: aggregation and interface-level identification. A host can have many IP addresses (one per interface). Further, IP has always had the idea of a "subnet address," a single address that describes more than one host; this subnet address has always been carried as a single entry in routing. There have always been addresses that describe something smaller than a single host, and there have always been addresses that describe several hosts, so the idea of a single address describing a single host has never really been built in to IP at all.
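Both halves of this argument are easy to demonstrate in a few lines of code. Here is a minimal sketch using Python's standard ipaddress module; the interface names and the documentation-range addresses are purely illustrative:

```python
import ipaddress

# A subnet address describes many hosts at once: routing carries
# 192.0.2.0/24 as a single aggregate rather than hundreds of host routes.
subnet = ipaddress.ip_network("192.0.2.0/24")
print(subnet.num_addresses)  # 256

# An address describes an interface, not a host: one multihomed host
# holds one address per attached interface.
interfaces = {
    "eth0": ipaddress.ip_address("192.0.2.1"),
    "eth1": ipaddress.ip_address("198.51.100.1"),
}

# Neither address uniquely identifies "the host"; each locates a single
# attachment point, and only one falls inside the aggregate above.
for name, addr in interfaces.items():
    print(name, addr, addr in subnet)
```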

RFC2101 notes that the use of the IP address as both identifier and locator meant applications began relying on the IP address to remain constant very early on. This caused major problems later, as documented in the various drafts and RFCs around Network Address Translators (NATs).
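To make the problem concrete, here is a minimal sketch of the kind of per-flow state a NAT must keep; the table layout, field names, and addresses are my own illustration rather than anything defined in an RFC:

```python
# A toy NAT translation table. Outbound flows are rewritten to the
# NAT's public address, and the mapping is remembered so replies can
# be rewritten back. All names and addresses here are illustrative.
PUBLIC_ADDR = "203.0.113.1"

translations = {}  # (inside_addr, inside_port) -> outside_port
reverse = {}       # outside_port -> (inside_addr, inside_port)
next_port = 40000

def translate_outbound(src_addr, src_port):
    """Rewrite an outbound flow's source, creating state if needed."""
    global next_port
    key = (src_addr, src_port)
    if key not in translations:
        translations[key] = next_port
        reverse[next_port] = key
        next_port += 1
    return PUBLIC_ADDR, translations[key]

def translate_inbound(dst_port):
    """Map a reply back to the inside host, but only if the per-flow
    state still exists; lose the table and the session dies."""
    return reverse.get(dst_port)  # None means the flow is broken

print(translate_outbound("192.168.1.10", 51515))  # ('203.0.113.1', 40000)
print(translate_inbound(40000))                   # ('192.168.1.10', 51515)
```

Every middlebox that keeps this kind of application-specific state runs up against the "no state inside the network" clause of the end-to-end principle quoted above: the session lives or dies with a table neither endpoint can see or repair.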

RFC3234 bases much of its analysis on exactly this kind of state, the translations a NAT must keep in a local table, although it considers other kinds of middleboxes that do not rely on translation as well. Most of the middleboxes considered fail miserably at supporting transparent network connectivity because of the local, application-specific state they keep. An interesting new draft has been published, however, that pushes back just a little on the IETF's long line of documents describing the problems with middleboxes. Beneficial Functions of Middleboxes, a draft written by several engineers at Orange, discusses the many benefits of middleboxes; for instance:

  • Middleboxes can contribute to the measurement of packet loss, round trip times, packet reordering, and throughput bottlenecks
  • Middleboxes can aid in the detection and dispersion of Distributed Denial of Service (DDoS) attacks
  • Middleboxes can act as performance enhancing proxies and caches

So which is right? Should we side with the long line of arguments against middleboxes made throughout the years in the IETF, or should we consider the benefits of middleboxes, and take a view more along the lines of "yes, they add complexity, but the functionality is sometimes worth it"? Before answering the question, I'd like to point out something I find rather interesting.

The end-to-end rule only works in one direction.

This is never explicitly called out, but the end-to-end rule has always been phrased in terms of the higher layer "looking down" on the lower layer. The lower layer is never allowed to change anything about the packet in the end-to-end model: not the addresses, not the port numbers, not the fragmentation, and (in some documents) not even the Quality of Service (QoS) markings. The upper layer is supposed to be completely opaque to the lower layers. On the other hand, no mention is ever made of the reverse. Specifically, the question is never asked: what should the upper layers know about the functioning of the lower layers? Should the upper layer really be using a lower-layer address as an identifier?
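As a thought experiment, consider what happens when an application does use a lower-layer address as an identifier. This is a minimal sketch under assumed conditions (the addresses and user names are invented), showing the two classic failure modes: collision behind a NAT, and a renumbered host that looks like a stranger:

```python
# A sketch of why a locator makes a poor identifier: sessions keyed by
# source address collide or break as soon as addresses are rewritten.
sessions_by_addr = {}

def on_packet(src_addr, user):
    # The application assumes one address means one peer.
    sessions_by_addr[src_addr] = user

# Two distinct hosts behind the same NAT arrive with the same locator...
on_packet("203.0.113.1", "alice")
on_packet("203.0.113.1", "bob")
print(sessions_by_addr)  # {'203.0.113.1': 'bob'}; alice's session is gone

# ...while a single renumbered host looks like a complete stranger.
on_packet("198.51.100.7", "alice")
print(len(sessions_by_addr))  # 2 entries for what is really one peer
```

Keying the session on a higher-layer identifier, such as a name or a cryptographic key, survives anything the network does to the locator.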

What is primarily envisioned is a one-way abstraction: the upper layers can see everything in the lower layers, but the lower layers are supposed to treat the upper layers as an opaque blob. But how does this work, precisely? I suspect there is a corollary to the Law of Leaky Abstractions that goes something like this: any abstraction designed to leak in one direction will always, as a matter of course, leak in both directions.

I suspect we are in the mess we are in with regard to NATs and other middleboxes because we really do think we can make a one-way mirror, a one-way abstraction, actually work. I cannot think of any reason why such a one-way abstraction should work.

To return to the question: I think the answer must be that we need to learn to live with middleboxes. We can put some rules around them that might make them easier to deal with; perhaps something like:

  • Middleboxes would be a lot easier to deal with if we stopped trying to stretch layer 2 networks across the entire globe, and to the moon beyond
  • Middleboxes are a fact of life that application developers need to work with; stop using locators as identifiers in future protocol work
  • Network monitoring and management need to stop relying on IP addresses to uniquely identify devices (see the sketch just below)
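On that last point, here is a minimal sketch of what keying monitoring state on a stable identifier might look like; the device IDs, field names, and addresses are all illustrative assumptions, not any real product's data model:

```python
# Key the inventory on a stable identifier (here a serial number)
# and treat addresses as mutable attributes of the device.
inventory = {
    "SN-00421": {"hostname": "edge-router-1", "addresses": ["192.0.2.1"]},
}

def renumber(device_id, new_addresses):
    """Addresses change; the device's identity does not."""
    inventory[device_id]["addresses"] = new_addresses

renumber("SN-00421", ["198.51.100.1", "2001:db8::1"])
# The monitoring system still knows exactly which device this is.
print(inventory["SN-00421"])
```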

At any rate, whatever you think about this topic, you now have the main contours of the debate in the IETF in hand, along with pointers to the relevant documents.