Deterministic Networking and New IP

For those not following the current state of the ITU, a proposal has been put forward to (pretty much) reorganize the standards body around “New IP.” Don’t be confused by the name—it’s exactly what it sounds like, a proposal for an entirely new set of transport protocols to replace the current IPv4/IPv6/TCP/QUIC/routing protocol stack nearly 100% of the networks in operation today run on. Ignoring, for the moment, the problem of replacing the entire IP infrastructure, what can we learn from this proposal?

What I’d like to focus on is deterministic networking. Way back in the days when I was in the USAF, one of the various projects I worked on was called PCI. The PCI network was a new system designed to unify the personnel processing across the entire USAF, so there were systems (Z100s, 200s, and 250s) to be installed in every location across the base where personnel actions were managed. Since the only wiring we had on base at the time was an old Strowger mainframe, mechanical crossbars at a dozen or so BDFs, and varying sizes of punch-downs at BDFs and IDFs, everything for this system needed to be punched- or wrapped-down as physical circuits.

It was hard to get anything like real bandwidth over paper-wrapped cables with lead shielding that had been installed in the 1960s (or before??), wrap-downs, and 66 blocks. The max we could get in terms of bandwidth was about a T1, and often less. We needed more bandwidth than this, so we installed inverse multiplexers that would combine the bandwidth across multiple physical circuits to something large enough for the PCI system to run on.

The problem we ran into with these inverse multiplexers was they would sometimes fall out of synchronization, meaning frames would be delivered out of order—one of the two cardinal violations of deterministic networks. Since PCI was designed around a purely deterministic networking model, these failures caused havoc with HR. Not a good thing.

The second and third cardinal rules of deterministic networking are that there will be no jitter, and the network will deliver all packets accepted by the network. To make these rules work, there must be some form of entrance gating (circuit setup, generally speaking, with busy signals), fixed packet sizes, and strict quality of service.

In contrast, the rules of packet-switched (non-deterministic) networking are: all packets are accepted, packets can be (almost) any size, there is no guarantee any particular packet will be delivered, and there’s no way to know what the jitter and delay on delivering any particular packet might be.

Some kinds of payloads just need deterministic semantics, while others just need deterministic semantics. The problem, then, is how to build a single network that supports both kinds of semantics. Solving this problem is where thinking through the situation using a problem-solution mindset can help you, as a network engineer (or protocol designer, or software creator) understand what can be done, what cannot be done, and what the limitations are going to be no matter what solution you choose.

There are, at base, only three solutions to this problem. The first is to build a network that supports both deterministic and packet-switched traffic from the control planes to the switching path. The complexity of doing something like this would ultimately outweigh any possible benefit, so let’s leave that solution aside. The second is to emulate a deterministic network on top of a packet-switched network, and the third is to emulate a packet-switched network on top of a deterministic network. Consider the last option first—emulating a packet-switched network on top of a deterministic network. For those who are old enough, this is IP-over-ATM. It didn’t work. The inefficiencies of trying to stuff variably sized packets into the fixed-size frames required to create a deterministic network were so significant that it just … didn’t work. The control planes were difficult to deploy and manage—think ATM LANE, for instance—and the overall network was just really complicated.

Today, what we mostly do is emulate deterministic networks over packet-switched networks. This design allows traffic that does well in a packet-switched environment to run perfectly fine, and traffic that likes “some” deterministic properties, like voice, to work fine with a little effort on the part of the network designer and operator. Building something with all the properties of a genuinely deterministic network on top of a packet-switched network, however, is difficult.

To get there, you need a few things. First, you need some way to strictly control/steer flows, so the mix of packet and deterministic traffic is going allow you to meet deterministic requirements on every link through the network. Second, you need some excellent QoS mechanism that knows how to provide deterministic results while mixing in packet-switched traffic “underneath.” Third, you need to overprovision enough to take up any “slack” and variability, as well as to account for queuing and clock on/clock off delays through switching devices.

The truth is we can come pretty close—witness the ability of IP networks to carry video streaming and what looks like traditional voice today. Can it get better? I’m confident we can build systems that are ever closer to emulating a truly deterministic network on top of a packet-switched network—so long as we mind the tradeoffs and are willing to throw capacity and hardware at the problem.

Can we ever truly emulate packet-switched networks on top of deterministic networks? I don’t see how. It’s not just that we’ve tried and failed; it’s that the math just doesn’t ever seem to “work right” in some way or another.

So while the “new IP” proposal brings up an interesting problem—future applications may need more deterministic networking profiles—it doesn’t explain why we should believe either building a completely parallel deterministic network or trying to flip the stack to emulate a packet-switched network on top of a deterministic network, makes more sense than what we are doing today.

Looking at this from a problem/solution perspective helps clarify the situation, and produce a conclusion about which path is best, without even getting into specific protocols or implementations. Really understanding the problem you are trying to solve, even at an abstract level, and then working through all the possible solutions, even ones that might not have been invented yet (although I can promise you then have been invented), can help you get your mind around the engineering possibilities.

It is probably not how you’re accustomed to looking at network design, protocol selection, etc. But it’s one you should start using.

The 4D Network

I think we can all agree networks have become too complex—and this complexity is a result of the network often becoming the “final dumping ground” of every problem that seems like it might impact more than one system, or everything no-one else can figure out how to solve. It’s rather humorous, in fact, to see a lot of server and application folks sitting around saying “this networking stuff is so complex—let’s design something better and simpler in our bespoke overlay…” and then falling into the same complexity traps as they start facing the real problems of policy and scale.

This complexity cannot be “automated away.” It can be smeared over with intent, but we’re going to find—soon enough—that smearing intent on top of complexity just makes for a dirty kitchen and a sub-standard meal.

While this is always “top of mind” in my world, what brings it to mind this particular week is a paper by Jen Rexford et al. (I know Jen isn’t on the lead position in the author list, but still…) called A Clean Slate 4D Approach to Network Control and Management. Of course, I can appreciate the paper in part because I agree with a lot of what’s being said here. For instance—

We believe the root cause of these problems lies in the control plane running on the network elements and the management plane that monitors and configures them. In this paper, we argue for revisiting the division of functionality and advocate an extreme design point that completely separates a network’s decision logic from the protocols that govern interaction of network elements.

In other words, we’ve not done our modularization homework very well—and our lack of focus on doing modularization right is adding a lot of unnecessary complexity to our world. The four planes proposed in the paper are decision, dissemination, discovery, and data. The decision plane drives network control, including reachability, load balancing, and access control. The dissemination plane “provides a robust and efficient communication substrate” across which the other planes can send information. The discovery plane “is responsible for discovering the physical components of the network,” giving each item an identifier, etc. The data plane carries packets edge-to-edge.

I do have some quibbles with this architecture, of course. To begin, I’m not certain the word “plane” is the right one here. Maybe “layers,” or something else that implies more of a modular concept with interactions, and less a “ships in the night” sort of affair. My more substantial disagreement is with the placement of “interface configuration” and where reachability is placed in the model.

Consider this: reachability and access control are, in a sense, two sides of the same coin. You learn where something is to make it reachable, and then you block access to it by hiding reachability from specific places in the network. There are two ways to control reachability—by hiding the destination, or by blocking traffic being sent to the destination. Each of these has positive and negative aspects.

But notice this paradox—the control plane cannot hide reachability towards something it does not know about. You must know about something to prevent someone from reaching it. While reachability and access control are two sides of the same coin, they are also opposites. Access control relies on reachability to do its job.

To solve this paradox, I would put reachability into discovery rather than decision. Discovery would then become the discovery of physical devices, paths, and reachability through the network. No policy would live here—discovery would just determine what exists. All the policy about what to expose about what exists would live within the decision plane.

While the paper implies this kind of system must wait for some day in the future to build a network using these principles, I think you can get pretty close today. My “ideal design” for a data center fabric right now is (modified) IS-IS or RIFT in the underlay and eVPN in the overlay, with a set of controllers sitting on top. Why?

IS-IS doesn’t rely on IP, so it can serve in a mostly pure discovery role, telling any upper layers where things are, and what is changing. eVPN can provide segmented reachability on top of this, as well as any other policies. Controllers can be used to manage eVPN configuration to implement intent. A separate controller can work with IS-IS on inventory and lifecycle of installed hardware. This creates a clean “break” between the infrastructure underlay and overlays, and pretty much eliminates any dependence the underlay has on the overlay. Replacing the overlay with a more SDN’ish solution (rather than BGP) is perfectly do-able and reasonable.

While not perfect, the link-state underlay/BGP overlay model comes pretty close to implementing what Jen and her co-authors are describing, only using protocols we have around today—although with some modification.

But the main point of this paper stands—a lot of the reason for the complexity we fact today is simply because we modularize using aggregation and summarization and call the job “done.” We aren’t thinking about the network as a system, but rather as a bunch of individual appliances we slap together into places, which we then connect through other places (or strings, like between tin cans).

Something to think about the next time you get that 2AM call.

Smart Network or Dumb?

Should the network be dumb or smart? Network vendors have recently focused on making the network as smart as possible because there is a definite feeling that dumb networks are quickly becoming a commodity—and it’s hard to see where and how steep profit margins can be maintained in a commodifying market. Software vendors, on the other hand, have been encroaching on the network space by “building in” overlay network capabilities, especially in virtualization products. VMWare and Docker come immediately to mind; both are either able to, or working towards, running on a plain IP fabric, reducing the number of services provided by the network to a minimum level (of course, I’d have a lot more confidence in these overlay systems if they were a lot smarter about routing … but I’ll leave that alone for the moment).

How can this question be answered? One way is to think through what sorts of things need to be done in processing packets, and then think through where it makes most sense to do those things. Another way is to measure the accuracy or speed at which some of these “packet processing things” can be done so you can decide in a more empirical way. The paper I’m looking at today, by Anirudh et al., takes both of these paths in order to create a baseline “rule of thumb” about where to place packet processing functionality in a network.

Sivaraman, Anirudh, Thomas Mason, Aurojit Panda, Ravi Netravali, and Sai Anirudh Kondaveeti. “Network Architecture in the Age of Programmability.” ACM SIGCOMM Computer Communication Review 50, no. 1 (March 23, 2020): 38–44. https://doi.org/10.1145/3390251.3390257.

The authors consider six different “things” networks need to be able to do: measurement, resource management, deep packet inspection, network security, network virtualization, and application acceleration. The first of these they measure by setting introducing errors into a network and measuring the dropped packet rate using various edge and in-network measurement tools. What they found was in-network measurement has a lower error rate, particularly as time scales become shorter. For instance, Pingmesh, a packet loss measurement tool that runs on hosts, is useful for measuring packet loss in the minutes—but in-network telemetry can often measure packet loss in the seconds or milliseconds. They observe that in-network telemetry of all kinds (not just packet loss) appears to be more accurate when application performance is more important—so they argue telemetry largely belongs in the network.

Resource management, such as determining which path to take, or how quickly to transmit packets (setting the window size for TCP or QUIC, for instance), is traditionally performed entirely on hosts. The authors, however, note that effective resource management requires accurate telemetry information about flows, link utilization, etc.—and these things are best performed in-network rather than on hosts. For resource management, then, they prefer a hybrid edge/in-network approach.

The argue deep packet inspection and network virtualization are both best done at the edge, in hosts, because these are processor intensive tasks—often requiring more processing power and time than network devices have available. Finally, they argue network security should be located on the host, because the host has the fine-grained service information required to perform accurate filtering, etc.

Based on their arguments, the authors propose four rules of thumb. First, tasks that leverage data only available at the edge should run at the edge. Second, tasks that leverage data naturally found in the network should be run in the network. Third, tasks that require large amounts of processing power or memory should be run on the edge. Fourth, tasks that run at very short timescales should be run in the network.

I have, of course, some quibbles with their arguments … For instance, the argument that security should run on the edge, in hosts, assumes a somewhat binary view of security—all filters and security mechanisms should be “one place,” and nowhere else. A security posture that just moves “the firewall” from the edge of the network to the edge of the host, however, is going to (eventually) face the same vulnerabilities and issues, just spread out over a larger attack surface (every host instead of the entry to the network). Security shouldn’t work this way—the network and the host should work together to provide defense in depth.

The rules of thumb, however, seem to be pretty solid starting points for thinking about the problem. An alternate way of phrasing their result is through the principle of subsidiarity—decisions should be made as close as possible to the information required to make them. While this is really a concept that comes out of ethics and organizational management, it succinctly describes a good rule of thumb for network architecture.

 

 

Tradeoffs Come in Threes

On a Spring 2019 walk in Beijing I saw two street sweepers at a sunny corner. They were beat-up looking and grizzled but probably younger than me. They’d paused work to smoke and talk. One told a story; the other’s eyes widened and then he laughed so hard he had to bend over, leaning on his broom. I suspect their jobs and pay were lousy and their lives constrained in ways I can’t imagine. But they had time to smoke a cigarette and crack a joke. You know what that’s called? Waste, inefficiency, a suboptimal outcome. Some of the brightest minds in our economy are earnestly engaged in stamping it out. They’re winning, but everyone’s losing. —Tim Bray

This, in a nutshell, is what is often wrong with our design thinking in the networking world today. We want things to be efficient, wringing the last little dollar, and the last little bit of bandwidth, out of everything.

This is also, however, a perfect example of the problem of triads and tradeoffs. In the case of the street sweeper, we might thing, “well, we could replace those folks sitting around smoking a cigarette and cracking jokes with a robot, making things much more efficient.” We might notice the impact on the street sweeper’s salaries—but after all, it’s a boring job, and they are better off doing something else anyway, right?

We’re actually pretty good at finding, and “solving” (for some meaning of “solving,” of course), these kinds of immediately obvious tradeoffs. It’s obvious the street sweepers are going to lose their jobs if we replace them with a robot. What might not be so obvious is the loss of the presence of a person on the street. That’s a pair of eyes who can see when a child is being taken by someone who’s not a family member, a pair of ears that can hear the rumble of a car that doesn’t belong in the neighborhood, a pair of hands that can help someone who’s fallen, etc.

This is why these kinds of tradeoffs always come in (at least) threes.

Let’s look at the street sweepers in terms of the SOS triad. Replacing the street sweepers with a robot or machine certainly increases optimization. According to the triad, though, increasing optimization in one area should result in some increase in complexity someplace, and some loss of optimization in other places.

What about surfaces? The robot must be managed, and it must interact with people and vehicles on the street—which means people and vehicles must also interact with the robot. Someone must build and maintain the robot, so there must be some sort of system, with a plethora of interaction surfaces, to make this all happen. So yes, there may be more efficiency, but there are now more interaction surfaces to deal with now, too. These interaction surfaces increase complexity.

What about state? In a sense, there isn’t much change in state other than moving it—purely in terms of sweeping the street, anyway. The sweeper and the robot must both understand when and how to sweep the street, etc., so the state doesn’t seem to change much here.

On the other hand, that extra set of eyes and ears, that extra mind, that is no longer on the street in a personal way represents a loss of state. The robot is an abstraction of the person who was there before, and abstraction always represents a loss of state in some way. Whether this loss of state decreases the optimal handling of local neighborhood emergencies is probably a non-trivial problem to consider.

The bottom line is this—when you go after efficiency, you need to think in terms of efficiency of what, rather than efficiency as a goal-in-itself. That’s because there is no such thing as “efficiency-in-itself,” there is only something you are making more efficient—and a lot of things you potentially making less efficient.

Automate your network, certainly, or even buy a system that solves “all the problems.” But remember there are tradeoffs—often a large number of tradeoffs you might not have thought about—and those tradeoffs have consequences.

It’s not “if you haven’t found the tradeoff, you haven’t looked hard enough…” Don’t stop at one. It’s “if you haven’t found the tradeoffs, you haven’t looked hard enough.” It’s a plural for a reason.

Zero Trust and the Cookie Metaphor

In old presentations on network security (watch this space; I’m working on a new security course for Ignition in the next six months or so), I would use a pair of chocolate chip cookies as an illustration for network security. In the old days, I’d opine, network security was like a cookie that was baked to be crunchy on the outside and gooey on the inside. Now-a-days, however, I’d say network security needs to be more like a store-bought cookie—crunchy all the way through. I always used this illustration to make a point about defense-in-depth. You cannot assume the thin crunchy security layer at the edge of your network—generally in the form of stateful packet filters and the like (okay, firewalls, but let’s leave the appliance world behind for a moment)—is what you really need.

There are such things as insider attacks, after all. Further, once someone breaks through the thin crunchy layer at the edge, you really don’t want them being able to move laterally through your network.

The United States National Institute of Standards and Technology (NIST) has released a draft paper describing Zero Trust Architecture, which addresses many of the same concerns as the cookie that’s crunchy all the way through—the lateral movement of attackers through your network, for instance.

The situation, however, has changed quite a bit since I used the cookie illustration. The problem is no longer that the inside of your network needs to be just as secure as the outside of your network, but rather that there is no “inside” to your network any longer. For this we need to add a third cookie—the kind you get in the soft-baked packages, or even in the jar (or roll) of cookie dough—these cookies are gooey all the way through.

To understand why this is… It used to be, way back when, we had a fairly standard Demilitarized Zone design.

 

 

For those unfamiliar with this design, D is configured to block traffic to C or A’s interfaces, and C is configured as a stateful filter and to block access to A’s addresses. If D is taken over, it should not have access to C or A; if C is taken over, it still should not have access to A. This provides a sort-of defense-in-depth.

Building this kind of DMZ, however, anticipates there will be at most a few ways into the network. These entries are choke points that give the network operator a place to look for anything “funny.”

Moving applications to the cloud, widespread remote work, and many other factors have rendered the “choke point/DMZ” model of security. There just isn’t a hard edge any longer to harden; just because someone is “inside” the topological bounds of your network does not mean they are authorized to be there, or to access data and applications.

The new solution is Zero Trust—moving authentication out to the endpoints. The crux of Zero Trust is to prevent unauthorized access to data or services on a per user, per device basis. There is still an “implied trust zone,” a topology within a sort of DMZ, where user traffic is trusted—but these are small areas with no user-controlled hosts.

If you want to understand Zero Trust beyond just the oft thrown around “microsegmentation,” this paper is well worth reading, as it explains the terminology and concepts in terms even network engineers can understand.

]

The Network is not Free: The Case of the Connected Toaster

Latency is a big deal for many modern applications, particularly in the realm of machine learning applied to problems like determining if someone standing at your door is a delivery person or a … robber out to grab all your smart toasters and big screen television. The problem is networks, particularly in the last mile don’t deal with latency very well. In fact, most of the network speeds and feeds available in anything outside urban areas kindof stinks. The example given by Bagchi et al. is this—

A fixed video sensor may generate 6Mbps of video 24/7, thus producing nearly 2TB of data per month—an amount unsustainable according to business practices for consumer connections, for example, Comcast’s data cap is at 1TB/month and Verizon Wireless throttles traffic over 26GB/month. For example, with DOCSIS 3.0, a widely deployed cable Internet technology, most U.S.-based cable systems deployed today support a maximum of 81Mbps aggregated over 500 home—just 0.16Mbps per home.

Bagchi, Saurabh, Muhammad-Bilal Siddiqui, Paul Wood, and Heng Zhang. “Dependability in Edge Computing.” Communications of the ACM 63, no. 1 (December 2019): 58–66. https://doi.org/10.1145/3362068.

The authors claim a lot of the problem here is just that edge networks have not been built out, but there is a reason these edge networks aren’t built out large enough to support pulling this kind of data load into a centrally located data center: the network isn’t free.

This is something so obvious to network engineers that it almost slips under our line of thinking unnoticed—except, of course, for the constant drive to make the network cost less money. For application developers, however, the network is just a virtual circuit data rides over… All the complexity of pulling fiber out to buildings or curbs, all the work of physically connecting things to the fiber, all the work of figuring out how to make routing scale, it’s all just abstracted away in a single QUIC or TCP session.

If you can’t bring the data to the compute, which is typically contained in some large-scale data center, then you have to bring the computing power to the data. The complexity of bringing the computing power to the data is applications, especially modern micro-services based applications optimized for large-scale, low latency data center fabrics, just aren’t written to be broken into components and spread all over the world.

Let’s consider the case of the smart toaster—the case used in the paper in hand. Imagine a toaster with little cameras to sense the toastiness of the bread, electronically controlled heating elements, an electronically controlled toast lifter, and some sort of really nice “bread storage and moving” system that can pull bread out of a reservoir, load them into the toaster, and make it all work. Imagine being able to get up in the morning to a fresh cup of coffee and a nice bagel fresh and hot just as you hit the kitchen…

But now let’s look at the complexity required to do such a thing. We must have local processing power and storage, along with some communication protocol that periodically uploads and downloads data to improve the toasting process. You have to have some sort of handling system that can learn about new kinds of bread and adapt to them automatically—this is going to require data, as well. You have to have a bread reservoir that will keep the bread fresh for a few days so you don’t have refill it constantly.

Will you save maybe five minutes every morning? Maybe.

Will you spend a lot of time getting this whole thing up and running? Definitely.

What will the MTBF be, precisely? What about the MTTR?

All to save five minutes in the morning? Of course the authors chose a trivial—perhaps even silly—example to use, just to illustrate the kinds of problems IoT devices combined with edge computing are going to encounter. But still … in five years you’re going to see advertisements for this smart toaster out there. There are toasters that already have a few of these features, and refrigerators that go far beyond this.

Sometimes we have to remember the cost of the network is telling us something—just because we can do a thing doesn’t mean we should. If the cost of the network forces us to consider the tradeoffs, that’s a good thing.

And remember that if your toaster makes your bread at the same time every morning, you have to adjust to the machine’s schedule, rather than the machine adjusting to yours…

Research: Off-Path TCP Attacks

I’s fnny, bt yu cn prbbly rd ths evn thgh evry wrd s mssng t lst ne lttr. This is because every effective language—or rather every communication system—carried enough information to reconstruct the original meaning even when bits are dropped. Over-the-wire protocols, like TCP, are no different—the protocol must carry enough information about the conversation (flow data) and the data being carried (metadata) to understand when something is wrong and error out or ask for a retransmission. These things, however, are a form of data exhaust; much like you can infer the tone, direction, and sometimes even the content of conversation just by watching the expressions, actions, and occasional word spoken by one of the participants, you can sometimes infer a lot about a conversation between two applications by looking at the amount and timing of data crossing the wire.

The paper under review today, Off-Path TCP Exploit, uses cleverly designed streams of packets and observations about the timing of packets in a TCP stream to construct an off-path TCP injection attack on wireless networks. Understanding the attack requires understanding the interaction between the collision avoidance used in wireless systems and TCP’s reaction to packets with a sequence number outside the current window.

Beginning with the TCP end of things—if a TCP packet is received with a window falling outside the current window, TCP implementations will send a duplicate of the last ACK it sent back to the transmitter. From the Wireless network side of things, only one talker can use the channel at a time. If a device begins transmitting a packet, and then hears another packet inbound, it should stop transmitting and wait some random amount of time before trying to transmit again. These two things can be combined to guess at the current window size.

Assume an attacker sends a packet to a victim which must be answered, such as a probe. Before the victim can answer, the attacker than sends a TCP segment which includes a sequence number the attacker thinks might be within the victim’s receive window, sourcing the packet from the IP address of some existing TCP session. Unless the IP address of some existing session is used in this step, the victim will not answer the TCP segment. Because the attacker is using a spoofed source address, it will not receive the ACK from this segment, so it must find some other way to infer if an ACK was sent by the victim.

How can the attacker infer this? After sending this TCP sequence, the attacker sends another probe of some kind to the victim which must be answered. If the TCP segment’s sequence number is outside the current window, the victim will attempt to send a copy of its previous ACK. If the attacker times things correctly, the victim will attempt to send this duplicate ACK while the attacker is transmitting the second probe packet; the two packets will collide, causing the victim to back off, slowing the receipt of the probe down a bit from the attacker’s perspective.

If the answer to the second probe is slower than the answer to the first probe, the attacker can infer the sequence number of the spoofed TCP segment is outside the current window. If the two probes are answered in close to the same time, the attacker can infer the sequence number of the spoofed TCP segment is within the current window.

Combining this information with several other well-known aspects of widely deployed TCP stacks, the researchers found they could reliably inject information into a TCP stream from an attacker. While these injections would still need to be shaped in some way to impact the operation of the application sending data over the TCP stream, the ability to inject TCP segments in this way is “halfway there” for the attacker.

There probably never will be a truly secure communication channel invented that does not involve encryption—the data required to support flow control and manage errors will always provide enough information to an attacker to find some clever way to break into the channel.