We normally encounter four different kinds of addresses in an IP network; we tend to think about each of these as:
- The MAC address identifies an interface on a physical or virtual wire
- The IP address identifies an interface on a host
- The DNS name identifies a host
- The port number identifies an application or service running on the host
There are other address-like things, of course, such as the protocol number, a router ID, an MPLS label, etc. But let’s stick to these four for the moment. Looking through this list, the first thing you should notice is we often use the IP address as if it identified a host—which is generally not a good thing. There have been some efforts in the past to split the locator from the identifier, but the IP protocol suite was designed with a separate locator and identifier already: the IP address is the location and the DNS name is the identifier.
Even if you split up the locator and the identifier, however, the word locator is still quite ambiguous because we often equate the geographical and topological locations. In fact, old police procedural shows used to include scenes where a suspect was tracked down because they were using an IP address “assigned to them” in some other city… When the topic comes up this way, we can see the obvious flaw. In other situations, conflating the IP address with the location of the device is less obvious, and causes more subtle problems.
Consider, for instance, the concept of remote peering. Suppose you want to connect to a cloud provider who has a presence in an IXP that’s just a few hundred miles away. You calculate the costs of placing a router on the IX fabric, add it to the cost of bringing up a new circuit to the IX, and … well, there’s no way you are ever going to get that kind of budget approved. Looking around, though, you find there is a company that already has a router connected to the IX fabric you want to be on, and they offer a remote peering solution, which means they offer to build an Ethernet tunnel across the public Internet to your edge router. Once the tunnel is up, you can peer your local router to the cloud provider’s network using BGP. The cloud provider thinks you have a device physically connected to the local IX fabric, so all is well, right?
In a recent paper, a group of researchers looked at the combination of remote peering and anycast addresses. If you are not familiar with anycast addresses, the concept is simple: take a service which is replicated across multiple locations and advertise every instance of the service using a single IP address. This is clever because when you send packets to the IP address representing the service, you will always reach the closest instance of the service. So long as you have not played games with stretched Ethernet, that is.
In the paper, the researchers used various mechanisms to figure out where remote peering was taking place, and another to discover services being advertised using anycast (normally DNS or CDN services). Using the intersection of these two, they determined if remote peering was impacting the performance of any of these services. I shocked, shocked, to tell you the answer is yes. I would never have expected stretched Ethernet to have a negative impact on performance. 😊
To quote the paper directly:
…we found that 38% (126/332) of RTTs in traceroutes towards anycast prexes potentially aected by remote peering are larger than the average RTT of prexes without remote peering. In these 126 traceroute probes, the average RTT towards prexes potentially aected by remote peering is 119.7 ms while the average RTT of the other prexes is 84.7 ms.
The bottom line: “An average latency increase of 35.1 ms.” This is partially because the two different meanings of the word location come into play when you are interacting with services like CDNs and DNS. These services will always try to serve your requests from a physical location close to you. When you are using Ethernet stretched over IP, however, your topological location (where you connect to the network) and your geographical location (where you are physically located on the face of the Earth) can be radically different. Think about the mental dislocation when you call someone with an area code that is normally tied to an area of the west coast of the US, and yet you know they now live around London, say…
We could probably add in a bit of complexity to solve these problems, or (even better) just include your GPS coordinates in the IP header. After all, what’s the point of privacy? … 🙂 The bottom line is this: remote peering might a good idea when everything else fails, of course, but if you haven’t found the tradeoffs, you haven’t looked hard enough. It might be that application performance across a remote peering session is low enough that paying for the connection might turn out cheaper.
In the meantime, wake me up when we decide that stretching Ethernet over IP is never a good thing.
A few weeks ago, I was in the midst of a conversation about EVPNs, how they work, and the use cases for deploying them, when one of the participants exclaimed: “This is so complicated… why don’t we stick with the older way of doing things with multi-chassis link aggregation and virtual chassis device?” Sometimes it does seem like we create complex solutions when a simpler solution is already available. Since simpler is always better, why not just use them? After all, simpler solutions are easier to understand, which means they are easier to deploy and troubleshoot.
The problem is we too often forget the other side of the simplicity equation—complexity is required to solve hard problems and adapt to demanding environments. While complex systems can be fragile (primarily through ossification), simple solutions can flat out fail just because they can’t cope with changes in their environment.
As an example, consider MLAG. On the surface, MLAG is a useful technology. If you have a server that has two network interfaces but is running an application that only supports a single IP address as a service access point, MLAG is a neat and useful solution. Connect a single (presumably high reliability) server to two different upstream switches through two different network interface cards, which act like one logical Ethernet network. If one of the two network devices, optics, cables, etc., fails, the server still has network connectivity.
But MLAG has well-known downsides, as well. There is a little problem with the physical locality of the cables and systems involved. If you have a service split among multiple services, MLAG is no longer useful. If the upstream switches are widely separated, then you have lots of cabling fun and various problems with jitter and delay to look forward to.
There is also the little problem of MLAG solutions being mostly proprietary. When something fails, your best hope is a clueful technical assistance engineer on the other end of a phone line. If you want to switch vendors for some reason, you have the fun of taking the entire server out of operation for a maintenance window to do the switch, along with the vendor lock-in in the case of failure, etc.
EVPN, with its ability to attach a single host through multiple virtual Ethernet connections across an IP network, is a lot more complex on the surface. But looks can be deceiving… In the case of EVPN, you see the complexity “upfront,” which means you can (largely) understand what it is doing and how it is doing it. There is no “MLAG black box;” useful in cases where you must troubleshoot.
And there will be cases where you will need to troubleshoot.
Further, because EVPN is standards-based and implemented by multiple vendors, you can switch over one virtual connection at a time; you can switch vendors without taking the server down. EVPN is a much more flexible solution overall, opening up possibilities around multihoming across different pods in a butterfly fabric, or allowing the use of default MAC addresses to reduce table sizes.
Virtual chassis systems can often solve some of the same problems as EVPN, but again—you are dealing with a black box. The black box will likely never scale to the same size, and cover the same use cases, as a community-built standard like EVPN.
The bottom line
Sometimes starting with a more complex set of base technologies will result in a simpler overall system. The right system will not avoid complexity, but rather reduce it where possible and contain it where it cannot be avoided. If you find your system is inflexible and difficult to manage, maybe its time to go back to the drawing board and start with something a little more complex, or where the complexity is “on the surface” rather than buried in an abstraction. The result might actually be simpler.
One “sideways” place to look for value in the network is in a place that initially seems far away from infrastructure, data gravity. Data gravity is not something you might often think about directly when building or operating a network, but it is something you think about indirectly. For instance, speeds and feeds, quality of service, and convergence time are all three side effects, in one way or another, of data gravity.
As with all things in technology (and life), data gravity is not one thing, but two, one good and one bad—and there are tradeoffs. Because if you haven’t found the tradeoffs, you haven’t looked hard enough. All of this is, in turn, related to the CAP Theorem.
Data gravity is, first, a relationship between applications and data location. Suppose you have a set of applications that contain customer information, such as forms of payment, a record of past purchases, physical addresses, and email addresses. Now assume your business runs on three applications. The first analyzes past order information and physical location to try to predict potential interests in current products. The second fills in information on an order form when a customer logs into your company’s ecommerce site. The third is used by marketing to build and send email campaigns.
Think about where you would store this information physically, especially if you want the information used by all three applications to be as close to accurate as possible in near-real time. CAP theorem dictates that if you choose to partition the data, say by having one copy of the customer address information in an on-premises fabric for analysis and another in a public cloud service for the ecommerce front-end, you are you going to have to live with either locking one copy of the record or the other while it is being updated, or you are going to have to live with one of the two copies of the record being out-of-date while its being used (the database can either have a real time locking system or be eventually consistent). If you really want the data used by all three applications to be accurate in near-real time, the only solution you have is to run all three applications in the same physical infrastructure so there can be one unified copy of all three data.
This causes the first instance of data gravity. If you want the ordering front end to work well, you need to unify the data stores, and move them physically close to the application. If you want the other applications that interact with this same data to work well, you need to move them to the same physical infrastructure as the data they rely on. Data gravity, in this case, acts like a magnet; the more data that is stored in a location, the more applications will want to be in that same location. The more applications that run on a particular infrastructure, the more data will be stored there, as well. Data follows applications, and applications follow data. Data gravity, however, can often work against the best interests of the business. The public cloud is not the most efficient place to house every piece of data, nor is it the best place to do all processing.
Minimizing the impact of the CAP theorem by building and operating a network that efficiently moves data minimizes the impact of this kind of data gravity.
The second instance of data gravity is sometimes called Kai-Fu Lee’s Virtuous Cycle, or the KL-VC. Imagine the same set of applications, only now look at the analysis part of the system more closely. Clearly, if you knew more about your customers than their location, you could perform better analysis, know more about their preferences, and be able to target your advertising more carefully. This might (or might not—there is some disagreement over just how fine-grained advertising can be before it moves from useful to creepy and productive to counterproductive) lead directly to increased revenues flowing from the ecommerce site.
But obtaining, storing, and processing this data means moving this data, as well—a point not often considered when thinking about how rich of a shopping experience “we might be able to offer.” Again, the network provides the key to moving the data so it can be obtained, stored, and processed.
At first glance, this might all still seem like commodity stuff—it is still just bits on the wire that need to be moved. Deeper inspection, however, reveals that this simply is not true. Understanding where data lives and how applications need to use it, the tradeoffs of CAP against that data, and the changes in the way the company does business if that data could be moved more quickly and effectively. Understanding which data flow has real business impact, and when, is a gateway into understanding how to add business value.
If you could say to your business leaders: let’s talk about where data is today, where it needs to be tomorrow, and how we can build a network where we can consciously balance between network complexity, network cost, and application performance, would that change the game at all? What if you could say: let’s talk about building a network that provides minimal jitter and fixed delay for defined workloads, and yet allows data to be quickly tied to public clouds in a way that allows application developers to choose what is best for any particular use case?
Two things which seem to be universally true in the network engineering space right this moment. The first is that network engineers are convinced their jobs will not exist or there will only be network engineers “in the cloud” within the next five years. The second is a mad scramble to figure out how to add value to the business through the network. These two movements are, of course, mutually exclusive visions of the future. If there is absolutely no way to add value to a business through the network, then it only makes sense to outsource the whole mess to a utility-level provider.
The result, far too often, is for the folks working on the network to run around like they’ve been in the hot aisle so long that your hair is on fire. This result, however, somehow seems less than ideal.
I will suggest there are alternate solutions available if we just learn to think sideways and look for them. Burning hair is not a good look (unless it is an intentional part of some larger entertainment). What sort of sideways thinking am I looking for? Let’s begin by going back to basics by asking a question that might a bit dangerous to ask—do applications really add business value? They certainly seem to. After all, when you want to know or do something, you log into an application that either helps you find the answer or provides a way to get it done.
But wait—what underlies the application? Applications cannot run on thin air (although I did just read someplace that applications running on “the cloud” are the only ones that add business value, implying applications running on-premises do not). They must have data or information, in order to do their jobs (like producing reports, or allowing you to order something). In fact, one of the major problems developers face when switching from one application to handle a task to another one is figuring out how to transfer the data.
This seems to imply that data, rather than applications, is at the heart of the business. When I worked for a large enterprise, one of my favorite points to make in meetings was we are not a widget company… we are a data company. I normally got blank looks from both the IT and the business folks sitting in the room when I said this—but just because the folks in the room did not understand it does not mean it is not true.
What difference does this make? If the application is the center of the enterprise world, then the network is well and truly a commodity that can, and should, be replaced with the cheapest version possible. If, however, data is at the heart of what a business does, then the network and the application are em>equal partners in information technology. It is not that one is “more important” while the other is “less important;” rather, the network and the applications just do different things for and to one of the core assets of the business—information.
After all, we call it information technology, rather than application technology. There must be some reason “information” is in there—maybe it is because information is what really drives value in the business?
How does changing our perspective in this way help? After all, we are still “stuck” with a view of the network that is “just about moving data,” right? And moving data is just about exciting as moving, well… water through pipes, right?
No, not really.
Once information is the core, then the network and applications become “partners” in drawing value out of data in a way that adds value to the business. Applications and the network are but “fungible,” in that they can be replaced with something newer, more effective, better, etc., but neither is really more important than the other.
This post has gone on a bit long in just “setting the stage,” so I’ll continue this line of thought next week.
A post on Martin Fowler’s blog this week started me thinking about lock-in—building a system that only allows you to purchase components from a single vendor so long as the system is running. The point of Martin’s piece is that lock-in exists in all systems, even open source, and hence you need to look at lock-in as a set of tradeoffs, rather than always being a negative outcome. Given that lock-in is a tradeoff, and that lock-in can happen regardless of the systems you decide to deploy, I want to go back to one of the foundational points Martin makes in his post and think about avoiding lock-in a little differently than just choosing between open source and vendor-based solutions.
If cannot avoid lock-in either by choosing a vendor-based solution or by choosing open source, then you have two choices. The first is to just give up and live with the results of lock-in. In fact, I have worked with a lot of companies who have done just this—they have accepted that lock-in is just a part of building networks, that lock-in results in a good transfer of risk to the vendor from the operator, or that lock-in results in a system that is easier to deploy and manage.
Giving in to lock-in, though, does not seem like a good idea on the surface, because architecture is about creating opportunity. If you cannot avoid lock-in and yet lock-in is antithetical to good architecture, what are your other options?
First, you can deepen your understanding of where and how lock-in matters, and when it doesn’t. Architecture often fails in one of three ways. The first way is designing something that is so closely fit-to-purpose today that is inflexible against future requirements. Think of a pair of pants that fit perfectly today; if you gain a few pounds, you might have to start over with your body-covering pant solution after that upcoming holiday. The second way is designing something that is so flexible it is wasteful. Think of a pair of pants that are sized to fit you no matter what your body size might be. Perhaps you might try to create a pair of pants you can wear when you are two, but also when you are seventy-two? You could design such a thing, I’m certain, but the amount of material you’d need to use in creating it would probably involve enough waste that you might as well just go ahead and sew multiple sets of pants across your lifetime. The third way is in building something that is flexible, but only through ossification. Something which can only be changed by adding or multiplying, and never through subtraction or division. Think of a pair of pants that can be changed to any color by adding a thin layer of new material each time.
All three of these are overengineered in one sense or another. They are trying to solve problems that probably don’t exist, that may never exist, or that can only be solved by adding another layer on top. All three of these are also forms of lock-in which should be avoided.
In order to buy the right kind of pants, you have to know how long you expect the pants to fit, and you have to know how your body is going to change over that time period. This is where the architect needs to connect with the business and try to understand what kinds of changes are likely, and what those changes might mean for the network. This is also the place where the architect needs to feed some technical reality back into the business; most of the time business leaders don’t think about technology challenges because no-one has ever explained to them that networks and information technology systems are not infinitely pliable balls of wax.
Second, you can try to contain lock-in, and intentionally refuse to be locked-in above a modular level. For instance, say you build a data center fabric using a marketecture sold by a vendor. If that marketecture can be contained within a single data center fabric, and you have more than one fabric, then you can avoid some degree of lock-in by building your other fabrics with different solutions. If the marketecture doesn’t allow for this, then maybe you should just say “no.” Modular design is not just good for controlling the leakage of state between modules, it can also be good for controlling the leakage of lock-in.
Finally, you can consider proper modularization vertically in your network, from layer to layer in your network stack. Enforce IPv4 or IPv6 as a single “wasp waist,” in the network transport; don’t allow Ethernet to wander all over the place, across modules and layers, as if it were the universal transport. Enforce separate and underlay and overlay control planes. Enforce a single form of storing information (such as all YANG formatted data, or all JSON formatted data, etc.) in your network management systems.
This is what open standards are good for. It will be painful to build to these standards, but these standards represent a point of abstraction where you can split lock-in systems apart, preventing them from entangling with one another in a way that prevents ever replacing one component with another.
Lock-in is not something you can avoid. On the other hand, you can learn to accept it when you know it will not impact your future options, and contain it when you know it will.
I recently spoke at CHINOG on the business value of disaggregation, and participated in a panel on getting involved in the IETF. If you’re interested in these two talks, the videos are linked below.
Over at the Communications of the ACM, Micah Beck has an article up about the hourglass model. While the math is quite interesting, I want to focus on transferring the observations from the realm of protocol and software systems development to network design. Specifically, start with the concept and terminology, which is very useful. Taking a typical design, such as this—
The first key point made in the paper is this—
The thin waist of the hourglass is a narrow straw through which applications can draw upon the resources that are available in the less restricted lower layers of the stack.
A somewhat obvious point to be made here is applications can only use services available in the spanning layer, and the spanning layer can only build those services out of the capabilities of the supporting layers. If fewer applications need to be supported, or the applications deployed do not require a lot of “fancy services,” a weaker spanning layer can be deployed. Based on this, the paper observes—
The balance between more applications and more supports is achieved by first choosing the set of necessary applications N and then seeking a spanning layer sufficient for N that is as weak as possible. This scenario makes the choice of necessary applications N the most directly consequential element in the process of defining a spanning layer that meets the goals of the hourglass model.
Beck calls the weakest possible spanning layer to support a given set of applications the minimally sufficient spanning layer (MSSL). There is one thing that seems off about this definition, however—the correlation between the number of applications supported and the strength of the spanning layer. There are many cases where a network supports thousands of applications, and yet the network itself is quite simple. There are many other cases where a network supports just a few applications, and yet the network is very complex. It is not the number of applications that matter, it is the set of services the applications demand from the spanning layer.
Based on this, we can change the definition slightly: an MSSL is the weakest spanning layer that can provide the set of services required by the applications it supports. This might seem intuitive or obvious, but it is often useful to work these kinds of intuitive things out, so they can be expressed more precisely when needed.
First lesson: the primary driver in network complexity is application requirements. To make the network simpler, you must reduce the requirements applications place on the network.
There are, however, several counter-intuitive cases here. For instance, TCP is designed to emulate (or abstract) a circuit between two hosts—it creates what appears to be a flow controlled, error free channel with no drops on top of IP, which has no flow control, and drops packets. In this case, the spanning layer (IP), or the wasp waist, does not support the services the upper layer (the application) requires.
In order to make this work, TCP must add a lot of complexity that would normally be handled by one of the supporting layers—in fact, TCP might, in some cases, recreate capabilities available in one of the supporting layers, but hidden by the spanning layer. There are, as you might have guessed, tradeoffs in this neighborhood. Not only are the mechanisms TCP must use more complex that the ones some supporting layer might have used, TCP represents a leaky abstraction—the underlying connectionless service cannot be completely hidden.
Take another instance more directly related to network design. Suppose you aggregate routing information at every point where you possibly can. Or perhaps you are using BGP route reflectors to manage configuration complexity and route counts. In most cases, this will mean information is flowing through the network suboptimally. You can re-optimize the network, but not without introducing a lot of complexity. Further, you will probably always have some form of leaky abstraction to deal with when abstracting information out of the network.
Second lesson: be careful when stripping information out of the spanning layer in order to simplify the network. There will be tradeoffs, and sometimes you end up with more complexity than what you started with.
A second counter-intuitive case is that of adding complexity to the supporting layers in order to ultimately simplify the spanning layer. It seems, on the model presented in the paper, that adding more services to the spanning layer will always end up adding more complexity to the entire system. MPLS and Segment Routing (SR), however, show this is not always true. If you need traffic steering, for instance, it is easier to implement MPLS or SR in the support layer rather than trying to emulate their services at the application level.
Third lesson: sometimes adding complexity in a lower layer can simplify the entire system—although this might seem to be counter-intuitive from just examining the model.
The bottom line: complexity is driven by applications (top down), but understanding the full stack, and where interactions take place, can open up opportunities for simplifying the overall system. The key is thinking through all parts of the system carefully, using effective mental models to understand how they interact (interaction surfaces), and the considering the optimization tradeoffs you will make by shifting state to different places.