RESEARCH – rule 11 reader

Hedge 173: If Multicast is the answer, what was the question?

Russ — Thu, 06 Apr 2023 18:03:35 +0000

Multicast hasn’t ever really “gone viral” (in modern terms!) throughout the Internet—in fact, it’s not widely used even in networks supporting enterprises. Why not? Join Dirk Trossen, Russ White, and Tom Ammon as we discuss the many facets of multicast, and what the future holds.

Dirk’s paper on multicast can be found here.

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-173.mp3


Hedge 141: Improving WAN Router Performance

Russ — Wed, 03 Aug 2022 17:00:48 +0000

Wide area networks in large-scale cores tend to be performance choke-points—partially because of differentials between the traffic they’re receiving from data center fabrics, campuses, and other sources, and the availability of outbound bandwidth, and partially because these routers tend to be a focal point for policy implementation. Rachee Singh joins Tom Ammon, Jeff Tantsura, and Russ White to discuss “Shoofly, a tool for provisioning wide-area backbones that bypasses routers by keeping traffic in the optical domain for as long as possible.”

https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-141.mp3


Hedge 121: Computing in the Network with Marie-Jose Montpetit

Russ — Wed, 09 Mar 2022 21:05:46 +0000

Can computation be drawn into the network, rather than always being pushed to the edge of the network? Taking content distribution networks as a starting point, the COIN research group is looking at ways to make networks more content and computationally aware, bringing compute into the network itself. Join Alvaro Retana, Marie-Jose Montpetit, and Russ White, as we discuss the ongoing research around computing in the network.
https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-121.mp3

Hedge 109: Edward Lewis and the DNS Core

Russ — Thu, 18 Nov 2021 18:58:11 +0000

What is the “core” of the DNS system, and how has it changed across the years? Edward Lewis joins Tom Ammon and Russ White to discuss his research into what the “core” of the domain name system is and how it has changed—including the rise of the large cloud players to the core of the default free zone.
https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-109.mp3

Strong Reactions and Complexity

Russ — Mon, 02 Nov 2020 18:00:13 +0000

In the realm of network design—especially in the realm of security—we often react so strongly against a perceived threat, or so quickly to solve a perceived problem, that we fail to look for the tradeoffs. If you haven’t found the tradeoffs, you haven’t looked hard enough—or, as Dr. Little says, you have to ask what is gained and what is lost, rather than just what is gained. This failure to look at both sides often results in untold amounts of technical debt and complexity being dumped into network designs (and application implementations), causing outages and failures long after these decisions are made.

A 2018 paper on DDoS attacks, A First Joint Look at DoS Attacks and BGP Blackholing in the Wild, provides a good example of a mitigation causing more damage than the attack itself. Most networks are configured to allow the operator to quickly configure a remote triggered black hole (RTBH) using BGP. Most often, a community is attached to a BGP route that points the next-hop to a local discard route on each eBGP speaker. If used on the route advertising the destination of the attack—the service under attack—the result is the DDoS attack traffic no longer has a destination to flow to. If used on the route advertising the source of the DDoS attack traffic, the result is the DDoS traffic will not pass any reverse-path forwarding policies at the edge of the AS, and hence be dropped. Since most DDoS attacks are reflected, blocking the source traffic still prevents access to some service, generally DNS or something similar.

In either case, then, stopping the DDoS through an RTBH causes damage to services rather than just the attacker. Because of this, remote triggered black holes should really only be used in the most extreme cases, where no other DDoS mitigation strategy will work.
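The destination-based case can be sketched in a few lines of Python. This is a toy model rather than any vendor's configuration: the `Route` class, the `import_policy` and `forward` functions, and the addresses are all hypothetical, and the discard next hop of 192.0.2.1 is an assumption (a documentation address standing in for whatever each router statically routes to discard). The one value taken from a standard is 65535:666, the well-known BLACKHOLE community from RFC 7999.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network

BLACKHOLE = "65535:666"          # well-known BLACKHOLE community (RFC 7999)
DISCARD_NEXT_HOP = "192.0.2.1"   # assumed: statically routed to discard on every router


@dataclass
class Route:
    prefix: str
    next_hop: str
    communities: set = field(default_factory=set)


def import_policy(route):
    """Point any route tagged with the blackhole community at the discard next hop."""
    if BLACKHOLE in route.communities:
        return Route(route.prefix, DISCARD_NEXT_HOP, set(route.communities))
    return route


def forward(table, dst):
    """Longest-prefix match; return 'drop' when the best route points at discard."""
    matches = [r for r in table if ip_address(dst) in ip_network(r.prefix)]
    if not matches:
        return "drop"
    best = max(matches, key=lambda r: ip_network(r.prefix).prefixlen)
    return "drop" if best.next_hop == DISCARD_NEXT_HOP else best.next_hop


# Hypothetical table: a /24 for the site, plus a tagged /32 for the host under attack.
table = [import_policy(r) for r in [
    Route("203.0.113.0/24", "198.51.100.1"),
    Route("203.0.113.10/32", "198.51.100.1", {BLACKHOLE}),
]]
```

In this model `forward(table, "203.0.113.10")` drops every packet to the attacked host, legitimate or not, while `forward(table, "203.0.113.20")` still reaches the rest of the /24. The black hole finishes the attacker's job for that one address, which is the tradeoff described above.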

The authors of the Joint Look use publicly available information to determine the answers to several questions. First, what scale of DDoS attacks are RTBHs used against? Second, how long after an attack begins is the RTBH triggered? Third, for how long is the RTBH left in place after the attack has been mitigated?

The answer to the first question should be—the RTBH is only used against the largest-scale attacks. The answer to the second question should be—the RTBH should be put in place very quickly after the attack is detected. The answer to the third question should be—the RTBH should be taken down as soon as the attack has stopped. The researchers found that RTBHs were most often used to mitigate the smallest of DDoS attacks, and almost never to mitigate larger ones. The authors also found that RTBHs were often left in place for hours after a DDoS attack had been mitigated. Both of these imply that current use of RTBH to mitigate DDoS attacks is counterproductive.

How many more design patterns do we follow that are simply counterproductive in the same way? This is not a matter of “following the data,” but rather one of really thinking through what it is you are trying to accomplish, and then how to accomplish that goal with the simplest set of tools available. Think through what it would mean to remove what you have put in, whether you really need to add another layer or protocol, how to minimize configuration, etc.

If you want your network to be less complex, examine the tradeoffs realistically.

Reducing RPKI Single Point of Takedown Risk

Russ — Mon, 21 Sep 2020 17:00:30 +0000

The RPKI, for those who do not know, ties the origin AS to a prefix using a certificate (the Route Origin Authorization, or ROA) signed by a third party. The third party, in this case, is validating that the AS in the ROA is authorized to advertise the destination prefix in the ROA—if ROAs were self-signed, the security would be no better than simply advertising the prefix in BGP. Who should be able to sign these ROAs? The assigning authority makes the most sense—the Regional Internet Registries (RIRs), since they (should) know which company owns which set of AS numbers and prefixes.

The general idea makes sense—you should not accept routes from “just anyone,” as they might be advertising the route for any number of reasons. An operator could advertise routes to source spam or phishing emails, or some government agency might advertise a route to redirect traffic, or block access to some web site. But … if you haven’t found the tradeoffs, you haven’t looked hard enough. Security, in particular, is replete with tradeoffs.

Every time you deploy some new security mechanism, you create some new attack surface—sometimes more than one. Deploy a stateful packet filter to protect a server, and the device itself becomes a target of attack, including buffer overflows, phishing attacks to gain access to the device as a launch-point into the private network, and the holes you have to punch in the filters to allow services to work. What about the RPKI?

When the RPKI was first proposed, one of my various concerns was the creation of new attack surfaces. One specific attack surface is the control a single organization—the issuing RIR—has over the very existence of the operator. Suppose you start a new content provider. To get the new service up and running, you sign a contract with an RIR for some address space, sign a contract with some upstream provider (or providers), set up your servers and service, and start advertising routes. For whatever reason, your service goes viral, netting millions of users in a short span of time.

Now assume the RIR receives a complaint against your service for whatever reason—the reason for the complaint is not important. This places the RIR in the position of a prosecutor, defense attorney, and judge—the RIR must somehow figure out whether or not the charges are true, figure out whether or not taking action on the charges is warranted, and then take the action they’ve settled on.

In the case of a government agency (or a large criminal organization) making the complaint, there is probably going to be little the RIR can do other than simply revoke your certificate, pulling your service off-line.

Overnight your business is gone. You can drag the case through the court system, of course, but this can take years. In the meantime, you are losing users, other services are imitating what you built, and you have no money to pay the legal fees.

A true story—without the names. I once knew a man who worked for a satellite provider, let’s call them SATA. Now, SATA’s leadership decided they had no expertise in accounts receivable, and they were spending too much time on trying to collect overdue bills, so they outsourced the process. SATB, a competing service, decided to buy the firm SATA outsourced their accounts receivable to. You can imagine what happens next… The accounting firm worked as hard as it could to reduce the revenue SATA was receiving.

Of course, SATA sued the accounting firm, but before the case could make it to court, SATA ran out of money, laid off all their people, and shut their service down. SATA essentially went out of business. They won some money later, in court, but … whatever money they won was just given to the investors of various kinds to make up for losses. The business itself was gone, permanently.

Herein lies the danger of giving a single entity like an RIR, even if they are friendly, honest, etc., control over a critical resource.

A recent paper presented at the ANRW at APNIC caught my attention as a potential way to solve this problem. The idea is simple—just allow (or even require) multiple signatures on a ROA. To be more accurate, each authorizing party issues a “partial certificate;” if “enough” pieces of the certificate are found and valid, the route will be validated.

The question is—how many signatures (or parts of the signature, or partial attestations) should be enough? The authors of the paper suggest there should be a “Threshold Signature Module” that makes this decision. The attestations of the various signers are combined in the threshold module to produce a single signature that is then used to validate the route. This way the validation process on the router remains the same, which means the only real change in the overall RPKI system is the addition of the threshold module.

If one RIR—even the one that allocated the addresses you are using—revokes their attestation on your ROA, the remaining attestations should be enough to convince anyone receiving your route that it is still valid. Since there are five regions, you have at least five different choices to countersign your ROA. Each RIR is under the control of a different national government; hence organizations like governments (or criminals!) would need to work across multiple RIRs and through other government organizations to have a ROA completely revoked.
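To make the "enough attestations" idea concrete, here is a minimal sketch. It only counts attestations; it does not implement the cryptographic combination of partial signatures the paper describes. The function name `roa_is_valid` and the threshold of three are my own assumptions for illustration, not values from the paper.

```python
# The five Regional Internet Registries.
RIRS = {"AFRINIC", "APNIC", "ARIN", "LACNIC", "RIPE NCC"}


def roa_is_valid(attestations, threshold=3):
    """Return True when at least `threshold` RIRs currently attest to the ROA.

    `attestations` maps a signer name to True (valid, unrevoked attestation)
    or False (revoked or failed validation). The threshold is an assumed
    policy knob; signers that are not RIRs are ignored.
    """
    valid = {signer for signer, ok in attestations.items() if ok and signer in RIRS}
    return len(valid) >= threshold


# Hypothetical ROA countersigned by four RIRs.
signed = {"ARIN": True, "RIPE NCC": True, "APNIC": True, "LACNIC": True}

# The allocating RIR revokes its attestation; three others remain.
after_revocation = dict(signed, ARIN=False)
```

With these inputs the ROA still validates after a single revocation, and only fails once a second RIR also withdraws. Taking the route down requires pressure on more than one registry.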

An alternate solution here, one that follows the PGP model, might be to simply have the threshold signature module consider the number and source of ROAs using the existing model. Local policy could determine how to weight attestations from different RIRs, etc.

This multiple or “shared” attestation (or signature) idea seems like a neat way to work around one of (possibly the major) attack surfaces introduced by the RPKI system. If you are interested in Internet core routing security, you should take a read through the post linked above, and then watch the video.

Smart Network or Dumb?

Russ — Mon, 27 Jul 2020 17:00:33 +0000

Should the network be dumb or smart? Network vendors have recently focused on making the network as smart as possible because there is a definite feeling that dumb networks are quickly becoming a commodity—and it’s hard to see where and how steep profit margins can be maintained in a commodifying market. Software vendors, on the other hand, have been encroaching on the network space by “building in” overlay network capabilities, especially in virtualization products. VMWare and Docker come immediately to mind; both are either able to, or working towards, running on a plain IP fabric, reducing the number of services provided by the network to a minimum level (of course, I’d have a lot more confidence in these overlay systems if they were a lot smarter about routing … but I’ll leave that alone for the moment).

How can this question be answered? One way is to think through what sorts of things need to be done in processing packets, and then think through where it makes most sense to do those things. Another way is to measure the accuracy or speed at which some of these “packet processing things” can be done so you can decide in a more empirical way. The paper I’m looking at today, by Sivaraman et al., takes both of these paths in order to create a baseline “rule of thumb” about where to place packet processing functionality in a network.

Sivaraman, Anirudh, Thomas Mason, Aurojit Panda, Ravi Netravali, and Sai Anirudh Kondaveeti. “Network Architecture in the Age of Programmability.” ACM SIGCOMM Computer Communication Review 50, no. 1 (March 23, 2020): 38–44. https://doi.org/10.1145/3390251.3390257.

The authors consider six different “things” networks need to be able to do: measurement, resource management, deep packet inspection, network security, network virtualization, and application acceleration. The first of these they measure by introducing errors into a network and measuring the dropped packet rate using various edge and in-network measurement tools. What they found was in-network measurement has a lower error rate, particularly as time scales become shorter. For instance, Pingmesh, a packet loss measurement tool that runs on hosts, is useful for measuring packet loss in the minutes—but in-network telemetry can often measure packet loss in the seconds or milliseconds. They observe that in-network telemetry of all kinds (not just packet loss) appears to be more accurate when application performance is more important—so they argue telemetry largely belongs in the network.

Resource management, such as determining which path to take, or how quickly to transmit packets (setting the window size for TCP or QUIC, for instance), is traditionally performed entirely on hosts. The authors, however, note that effective resource management requires accurate telemetry information about flows, link utilization, etc.—and these things are best performed in-network rather than on hosts. For resource management, then, they prefer a hybrid edge/in-network approach.

They argue deep packet inspection and network virtualization are both best done at the edge, in hosts, because these are processor-intensive tasks—often requiring more processing power and time than network devices have available. Finally, they argue network security should be located on the host, because the host has the fine-grained service information required to perform accurate filtering, etc.

Based on their arguments, the authors propose four rules of thumb. First, tasks that leverage data only available at the edge should run at the edge. Second, tasks that leverage data naturally found in the network should be run in the network. Third, tasks that require large amounts of processing power or memory should be run on the edge. Fourth, tasks that run at very short timescales should be run in the network.
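The four rules can be written down as a small decision function. This is my own sketch, not something from the paper: the `Task` fields, the `placement` function, and especially the choice to call a task "hybrid" when the rules pull in different directions are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    needs_edge_data: bool = False      # rule 1: data only available at the edge
    needs_network_data: bool = False   # rule 2: data naturally found in the network
    compute_heavy: bool = False        # rule 3: lots of processing power or memory
    short_timescale: bool = False      # rule 4: runs at very short timescales


def placement(task):
    """Apply the four rules of thumb and report where the task should run."""
    votes = set()
    if task.needs_edge_data:
        votes.add("edge")
    if task.needs_network_data:
        votes.add("network")
    if task.compute_heavy:
        votes.add("edge")
    if task.short_timescale:
        votes.add("network")
    if not votes:
        return "either"   # no rule applies
    if len(votes) == 2:
        return "hybrid"   # assumed tie-break: rules disagree, so split the work
    return votes.pop()


# My reading of how three of the paper's tasks map onto the rules.
dpi = Task("deep packet inspection", compute_heavy=True)
telemetry = Task("measurement", needs_network_data=True, short_timescale=True)
resources = Task("resource management", needs_edge_data=True, needs_network_data=True)
```

Under these assumptions `placement` returns "edge" for deep packet inspection, "network" for measurement, and "hybrid" for resource management, which matches where the authors end up for each.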

I have, of course, some quibbles with their arguments … For instance, the argument that security should run on the edge, in hosts, assumes a somewhat binary view of security—all filters and security mechanisms should be “one place,” and nowhere else. A security posture that just moves “the firewall” from the edge of the network to the edge of the host, however, is going to (eventually) face the same vulnerabilities and issues, just spread out over a larger attack surface (every host instead of the entry to the network). Security shouldn’t work this way—the network and the host should work together to provide defense in depth.

The rules of thumb, however, seem to be pretty solid starting points for thinking about the problem. An alternate way of phrasing their result is through the principle of subsidiarity—decisions should be made as close as possible to the information required to make them. While this is really a concept that comes out of ethics and organizational management, it succinctly describes a good rule of thumb for network architecture.

The Network is not Free: The Case of the Connected Toaster

Russ — Mon, 29 Jun 2020 17:00:10 +0000

Latency is a big deal for many modern applications, particularly in the realm of machine learning applied to problems like determining if someone standing at your door is a delivery person or a … robber out to grab all your smart toasters and big screen television. The problem is networks, particularly in the last mile, don’t deal with latency very well. In fact, most of the network speeds and feeds available in anything outside urban areas kind of stink. The example given by Bagchi et al. is this—

A fixed video sensor may generate 6Mbps of video 24/7, thus producing nearly 2TB of data per month—an amount unsustainable according to business practices for consumer connections, for example, Comcast’s data cap is at 1TB/month and Verizon Wireless throttles traffic over 26GB/month. For example, with DOCSIS 3.0, a widely deployed cable Internet technology, most U.S.-based cable systems deployed today support a maximum of 81Mbps aggregated over 500 homes—just 0.16Mbps per home.

Bagchi, Saurabh, Muhammad-Bilal Siddiqui, Paul Wood, and Heng Zhang. “Dependability in Edge Computing.” Communications of the ACM 63, no. 1 (December 2019): 58–66. https://doi.org/10.1145/3362068.
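The numbers in the quote are easy to check. The sketch below assumes a 30-day month and decimal units (1 TB = 10^12 bytes); the function names are mine.

```python
def monthly_bytes(rate_mbps, days=30):
    """Bytes produced by a constant stream of rate_mbps megabits per second."""
    bytes_per_second = rate_mbps * 1_000_000 / 8
    return bytes_per_second * 86_400 * days


def per_home_mbps(aggregate_mbps, homes):
    """Even split of shared capacity across the homes on one cable segment."""
    return aggregate_mbps / homes


video_tb = monthly_bytes(6) / 1e12    # about 1.94 TB per month
share = per_home_mbps(81, 500)        # about 0.16 Mbps per home
```

So one 6Mbps camera produces roughly twice the 1TB cap on its own, and needs about 37 times the 0.16Mbps each home gets when the shared segment is divided evenly.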

The authors claim a lot of the problem here is just that edge networks have not been built out, but there is a reason these edge networks aren’t built out large enough to support pulling this kind of data load into a centrally located data center: the network isn’t free.

This is something so obvious to network engineers that it almost slips under our line of thinking unnoticed—except, of course, for the constant drive to make the network cost less money. For application developers, however, the network is just a virtual circuit data rides over… All the complexity of pulling fiber out to buildings or curbs, all the work of physically connecting things to the fiber, all the work of figuring out how to make routing scale, it’s all just abstracted away in a single QUIC or TCP session.

If you can’t bring the data to the compute, which is typically contained in some large-scale data center, then you have to bring the computing power to the data. The complication in bringing the computing power to the data is that applications, especially modern micro-services based applications optimized for large-scale, low latency data center fabrics, just aren’t written to be broken into components and spread all over the world.

Let’s consider the case of the smart toaster—the case used in the paper in hand. Imagine a toaster with little cameras to sense the toastiness of the bread, electronically controlled heating elements, an electronically controlled toast lifter, and some sort of really nice “bread storage and moving” system that can pull bread out of a reservoir, load them into the toaster, and make it all work. Imagine being able to get up in the morning to a fresh cup of coffee and a nice bagel fresh and hot just as you hit the kitchen…

But now let’s look at the complexity required to do such a thing. We must have local processing power and storage, along with some communication protocol that periodically uploads and downloads data to improve the toasting process. You have to have some sort of handling system that can learn about new kinds of bread and adapt to them automatically—this is going to require data, as well. You have to have a bread reservoir that will keep the bread fresh for a few days so you don’t have to refill it constantly.

Will you save maybe five minutes every morning? Maybe.

Will you spend a lot of time getting this whole thing up and running? Definitely.

What will the MTBF be, precisely? What about the MTTR?

All to save five minutes in the morning? Of course the authors chose a trivial—perhaps even silly—example to use, just to illustrate the kinds of problems IoT devices combined with edge computing are going to encounter. But still … in five years you’re going to see advertisements for this smart toaster out there. There are toasters that already have a few of these features, and refrigerators that go far beyond this.

Sometimes we have to remember the cost of the network is telling us something—just because we can do a thing doesn’t mean we should. If the cost of the network forces us to consider the tradeoffs, that’s a good thing.

And remember that if your toaster makes your bread at the same time every morning, you have to adjust to the machine’s schedule, rather than the machine adjusting to yours…

Research: Off-Path TCP Attacks

Russ — Mon, 22 Jun 2020 17:00:58 +0000

I’s fnny, bt yu cn prbbly rd ths evn thgh evry wrd s mssng t lst ne lttr. This is because every effective language—or rather every communication system—carries enough information to reconstruct the original meaning even when bits are dropped. Over-the-wire protocols, like TCP, are no different—the protocol must carry enough information about the conversation (flow data) and the data being carried (metadata) to understand when something is wrong and error out or ask for a retransmission. These things, however, are a form of data exhaust; much like you can infer the tone, direction, and sometimes even the content of conversation just by watching the expressions, actions, and occasional word spoken by one of the participants, you can sometimes infer a lot about a conversation between two applications by looking at the amount and timing of data crossing the wire.

The paper under review today, Off-Path TCP Exploit, uses cleverly designed streams of packets and observations about the timing of packets in a TCP stream to construct an off-path TCP injection attack on wireless networks. Understanding the attack requires understanding the interaction between the collision avoidance used in wireless systems and TCP’s reaction to packets with a sequence number outside the current window.

Beginning with the TCP end of things—if a TCP segment is received with a sequence number falling outside the current window, TCP implementations will send a duplicate of the last ACK they sent back to the transmitter. From the wireless network side of things, only one talker can use the channel at a time. If a device begins transmitting a packet, and then hears another packet inbound, it should stop transmitting and wait some random amount of time before trying to transmit again. These two things can be combined to guess at the current window.
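The receiver-side test is simple to state in code. This is a simplified sketch of the check on a segment's starting sequence number only, using the usual 32-bit wraparound arithmetic; real stacks also consider the segment length and other state. The names `rcv_nxt` and `rcv_wnd` follow the conventional TCP variable names, and `receiver_reaction` is a hypothetical helper for illustration.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap around


def in_receive_window(seq, rcv_nxt, rcv_wnd):
    """True when seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_SPACE < rcv_wnd


def receiver_reaction(seq, rcv_nxt, rcv_wnd):
    """What the text above says the victim does with an incoming segment."""
    if in_receive_window(seq, rcv_nxt, rcv_wnd):
        return "accept"
    return "duplicate ACK"
```

For example, with `rcv_nxt=90` and `rcv_wnd=20`, a segment with sequence number 100 is accepted while one with sequence number 200 triggers a duplicate ACK. The modulo keeps the comparison correct when the window straddles the wrap from 2^32 - 1 back to 0.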

Assume an attacker sends a packet to a victim which must be answered, such as a probe. Before the victim can answer, the attacker then sends a TCP segment which includes a sequence number the attacker thinks might be within the victim’s receive window, sourcing the packet from the IP address of some existing TCP session. Unless the IP address of some existing session is used in this step, the victim will not answer the TCP segment. Because the attacker is using a spoofed source address, it will not receive the ACK from this segment, so it must find some other way to infer if an ACK was sent by the victim.

How can the attacker infer this? After sending this TCP segment, the attacker sends another probe of some kind to the victim which must be answered. If the TCP segment’s sequence number is outside the current window, the victim will attempt to send a copy of its previous ACK. If the attacker times things correctly, the victim will attempt to send this duplicate ACK while the attacker is transmitting the second probe packet; the two packets will collide, causing the victim to back off, slowing the receipt of the probe down a bit from the attacker’s perspective.

If the answer to the second probe is slower than the answer to the first probe, the attacker can infer the sequence number of the spoofed TCP segment is outside the current window. If the two probes are answered in close to the same time, the attacker can infer the sequence number of the spoofed TCP segment is within the current window.
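The inference step boils down to comparing two response times. Here is a minimal sketch of that comparison only, with no packet generation; the function name, the use of medians over repeated trials, and the 25% slack are all my assumptions, not details from the paper.

```python
from statistics import median


def infer_in_window(first_probe_rtts, second_probe_rtts, slack=0.25):
    """Guess whether the spoofed segment's sequence number was in the window.

    first_probe_rtts:  response times (ms) for the probe sent before the spoofed segment
    second_probe_rtts: response times (ms) for the probe sent right after it
    slack:             assumed tolerance for normal jitter

    A clearly slower second probe suggests a duplicate ACK collided with it,
    meaning the guessed sequence number was outside the window.
    """
    baseline = median(first_probe_rtts)
    after_spoof = median(second_probe_rtts)
    return after_spoof <= baseline * (1 + slack)


# Hypothetical measurements, in milliseconds.
similar = infer_in_window([2.0, 2.1, 1.9], [2.0, 2.2, 2.1])   # True: likely in window
delayed = infer_in_window([2.0, 2.1, 1.9], [3.4, 3.0, 3.2])   # False: likely outside
```

Repeating the measurement and taking the median is one way to keep ordinary wireless jitter from being mistaken for a collision; the right tolerance would depend on the network being measured.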

Combining this information with several other well-known aspects of widely deployed TCP stacks, the researchers found they could reliably inject information into a TCP stream from an attacker. While these injections would still need to be shaped in some way to impact the operation of the application sending data over the TCP stream, the ability to inject TCP segments in this way is “halfway there” for the attacker.

There probably never will be a truly secure communication channel invented that does not involve encryption—the data required to support flow control and manage errors will always provide enough information to an attacker to find some clever way to break into the channel.

Is QUIC really Quicker?

Russ — Mon, 08 Jun 2020 17:00:43 +0000

QUIC is a relatively new data transport protocol developed by Google, and currently in line to become the default transport for the upcoming HTTP standard. Because of this, it behooves every network engineer to understand a little about this protocol, how it operates, and what impact it will have on the network. We did record a History of Networking episode on QUIC, if you want some background.

In a recent Communications of the ACM article, a group of researchers (Kakhki et al.) used a modified implementation of QUIC to measure its performance under different network conditions, directly comparing it to TCP’s performance under the same conditions. Since the current implementations of QUIC use the same congestion control as TCP—Cubic—the only differences in performance should be code tuning in estimating the round-trip time (RTT) for congestion control, QUIC’s ability to form a session in a single RTT, and QUIC’s ability to carry multiple streams in a single connection. The researchers asked two questions in this paper: how does QUIC interact with TCP flows on the same network, and does QUIC perform better than TCP in all situations, or only some?

To answer the first question, the authors tried running QUIC and TCP over the same network in different configurations, including single QUIC and TCP sessions, a single QUIC session with multiple TCP sessions, etc. In each case, they discovered that QUIC consumed about 50% of the bandwidth; if there were multiple TCP sessions, they would be starved for bandwidth when running in parallel with the QUIC session. For network folk, this means an application implemented using QUIC could well cause performance issues for other applications on the network—something to be aware of. This might mean it is best, if possible, to push QUIC-based applications into a separate virtual or physical topology with strict bandwidth controls if it causes other applications to perform poorly.
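A little arithmetic shows why this matters as the number of TCP sessions grows. The sketch below takes the roughly 50% figure from above and assumes, for illustration only, that the TCP sessions split whatever is left evenly; the function names are mine.

```python
def fair_share(tcp_flows):
    """Per-flow share if one QUIC flow and every TCP flow split the link evenly."""
    return 1 / (tcp_flows + 1)


def observed_tcp_share(tcp_flows, quic_share=0.5):
    """Per-TCP-flow share when QUIC holds quic_share of the link
    and the TCP flows divide the remainder evenly (an assumption)."""
    return (1 - quic_share) / tcp_flows
```

With one TCP session both functions return 0.5, so nothing looks wrong. With four TCP sessions an even split would give each flow 20% of the link, while each TCP session actually gets 12.5% and QUIC keeps half.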

Does QUIC’s ability to consume more bandwidth mean applications developed on top of it will perform better? According to the research in this paper, the answer is “how many balloons fit in a bag?” In other words, it all depends. QUIC does perform better when its multi-stream capability comes into play and the network is stable—for instance, when transferring variably sized objects (files) across a network with stable jitter and delay. In situations with high jitter or delay, however, TCP consistently outperforms QUIC.

TCP outperforming QUIC is a bit of a surprise in any situation; how is this possible? The researchers used information from their additional instrumentation to discover QUIC does not tolerate out-of-order packet delivery very well because of its fast packet retransmission implementation. Presumably, it should be possible to modify these parameters somewhat to make QUIC perform better.

This would still leave the second problem the researchers found with QUIC’s performance—a large difference between its performance on desktop and mobile platforms. The difference between these two comes down to where QUIC is implemented. Desktop devices (and/or servers) often have smart NICs which implement TCP in the ASIC to speed packet processing up. QUIC, because it runs in user space, only runs on the main processor (it seems hard to see how a user space application could run on a NIC—it would probably require a specialized card of some type, but I’ll have to think about this more). The result is that QUIC’s performance depends heavily on the speed of the processor. Since mobile devices have much slower processors, QUIC performs much more slowly on mobile devices.

QUIC is an interesting new transport protocol—one everyone involved in designing or operating networks is eventually going to encounter. This paper gives good insight into the “soul” of this new protocol.

Understanding DC Fabric Complexity

Russ — Mon, 11 May 2020 17:00:34 +0000

When I think of complexity, I mostly consider transport protocols and control planes—probably because I have largely worked in these areas from the very beginning of my career in network engineering. Complexity, however, is present in every layer of the networking stack, all the way down to the physical. I recently ran across an interesting paper on complexity in another part of the network I had not really thought about before: the physical plant of a data center fabric.

Some researchers at USC and VMWare have thought about complexity in the physical infrastructure, however, and they wrote a rather interesting paper about it.

The paper begins by defining what complexity in the physical infrastructure of a DC fabric looks like. They focus on packaging, or the layout of the switches in the fabric, the bundles of cabling required to wire the topology, and the number and locations of patch panels required. The packaging and patch panels impact the length and complexity of the cable runs (whether optical or copper), which represents a base complexity for the entire topology.

The second thing they consider is the lifecycle of the physical fabric infrastructure. What steps are required to upgrade the fabric from a smaller configuration to a larger one? Or from a lower speed (higher oversubscription) to a higher speed (lower oversubscription)? The result is the ability to put a number on the overall complexity of each topology.

The first class of topologies they consider are spine-and-leaf, such as the Clos, Benes, and butterfly fabrics. They call all kinds of spine-and-leaf fabrics Clos fabrics. Spine-and-leaf fabrics, they note, generally have very low cabling complexity because their symmetry encourages consistent bundling and hardware placement. They call the second kind of topology expander fabrics; the most common fabric in this class is the dragonfly. These topologies are more difficult to wire but simpler to scale out because they can be expanded largely by modifying just the edge of the fabric. Their analysis shows these classes of fabric rate equally on their complexity scale.

A side note they don’t consider in the paper—their complexity computation implies that if you are building a fabric with a somewhat fixed range of sizes, and you can preplan the location of spines leaving enough room for the maximum sized fabric on the first day, spine-and-leaf fabrics are less complex than the fancier topologies you might hear about from time to time. Since most data center fabrics do, in fact, fall into these kinds of constraints (given a good day one designer!), this seems to validate the widespread use of butterfly and Clos fabrics for most applications. This feels like a significant result for most common data center fabric designs.

Finally, they describe a topology they call FatClique, which is an interesting blend of spine-and-leaf and edge expander topologies; I’ve screen grabbed the image from the paper below.

The way this topology is described feels very much like a Benes to me, or a butterfly where the fabric routers are replaced by fabrics (making a seven-stage fabric). It’s hard to tell how useful this topology would be in real deployments, but the fact that researchers are looking into alternatives to the venerable spine-and-leaf is interesting in its own right. Overall, it’s well worth spending the time to read the entire paper if you have an in-depth interest in fabric design.

Understanding Internet Peering

Russ — Mon, 06 Apr 2020 17:00:41 +0000

The world of provider interconnection is a little … “mysterious” … even to those who work at transit providers. The decision of who to peer with, whether such peering should be paid, settlement-free, open, and where to peer is often cordoned off into a separate team (or set of teams) that don’t seem to leak a lot of information. A recent paper on current interconnection practices published in ACM SIGCOMM sheds some useful light into this corner of the Internet, and hence is useful for those just trying to understand how the Internet really works.

To write the paper, the authors sent requests to fill out a survey through a wide variety of places, including NOG mailing lists and blogs. They ended up receiving responses from all seven regions (based on the RIRs, who control and maintain Internet numbering resources like AS numbers and IP addresses), 70% from ISPs, 14% from content providers, and 7% from “Enterprise” and infrastructure operators. Each of these kinds of operators will have different interconnection needs. I would expect ISPs to engage in more settlement-free peering (with roughly equal traffic levels), content providers to engage in more open peering (settlement-free connections with unequal traffic levels), IXs to do mostly local peering (not between regions), and “enterprises” to engage mostly in paid peering. The survey also classified respondents by their regional footprint (how many regions they operate in) and size (how many customers they support).

The survey focused on three facets of interconnection: time required to form a connection, the reasons given for interconnecting, and parameters included in the peering agreement. These largely describe the status quo in peering: interconnections as they are practiced today. As might be expected, connections at IXs are the quickest to form. Since IXs are normally set up to enable peering, it makes sense that the preset processes and communications channels enabled by an IX would make the peering process a lot faster. According to the survey results, the most common timeframe to complete peering is days, with about a quarter taking weeks.

Apparently, the vast majority (99%!) of peering arrangements are by “handshake,” which means there is no legal contract behind them. This is one reason Network Operator Groups (NOGs) are so important (a topic of discussion in the Hedge 31, dropping next week); the peering workshops are vital in building and keeping the relationships behind most peering arrangements.

On-demand connectivity is a new trend in inter-AS peering. For instance, interxion recently worked with LINX and several other IXs to develop a standard set of APIs allowing operators to peer with one another in a standard way, often reducing the technical side of the peering process to minutes rather than hours (or even days). Companies are moving into this space, helping operators understand who they should peer with, and building pre-negotiated peering contracts with many operators. While current operators seem to be aware of these options, they do not seem to be using these kinds of services yet.

While this paper is interesting, it does leave many corners of the inter-AS peering world unexposed. For instance, I would like to know how correct my assumptions are about the kinds of peering used by each of the different classes of providers, and whether there are regional differences in the kinds of peering. While it’s interesting to survey the reasons providers pursue peering, it would also be useful to understand the process of making a peering determination more fully. What kinds of tools are available, and how are they used? These would be useful bits of information for an operator who only connects to the Internet, rather than being part of the Internet infrastructure (perhaps a “non-infrastructure operator,” rather than “enterprise”) in understanding how their choice of upstream provider can impact the performance of their applications and network.

Note: this is another useful, but slightly older, paper on the topic of peering.

The Hedge 27: New directions in network and computing systems

Russ — Wed, 18 Mar 2020 17:00:47 +0000

On this episode of the Hedge, Micah Beck joins us to discuss a paper he wrote recently considering a new model of compute, storage, and networking. Micah Beck is Associate Professor in computer science at the University of Tennessee, Knoxville, where he researches and publishes in the area of networking technologies, including the hourglass model and the end-to-end principle.

If you are interested in the paper we are discussing on this episode, or Micah’s other work, you can find it at his personal site.
https://media.blubrry.com/hedge/content.blubrry.com/hedge/hedge-027.mp3
download

Whither Cyber-Insurance?

Russ — Mon, 02 Mar 2020 18:00:34 +0000

Note: I’m off in the weeds a little this week thinking about cyber-insurance because of a paper that landed in one of my various feeds—while this isn’t something we often think about as network operators, it does impact the overall security of the systems we build.

When you go to the doctor for a yearly checkup, do you think about health or insurance? You probably think about health, but the practice of going to the doctor for regular checkups began because of large life insurance companies in the United States. These companies began using statistical methods to measure risk, or to build actuarial tables they could use to set the premiums properly. Originally, life insurance companies relied on the “hunches” of their salesmen, combined with some checking by people in the “back office,” to determine the correct premium. Over time, they developed networks of informers in local communities, such as doctors, lawyers, and even local politicians, who could describe the life of anyone in their area, providing the information the company needed to set premiums correctly.

Over time, however, statistical methods came into play, particularly relying on an initial visit with a doctor. The information these insurance companies gathered, however, gave them insight into what habits increased or decreased longevity—they decided they should use this information to help shape people’s lives so they would live longer, rather than just using it to discover the correct premiums. To gather more information, and to help people live better lives, life insurance companies started encouraging yearly doctor visits, even setting up non-profit organizations to support the doctors who gave these examinations. Thus was born the yearly doctor’s visit, the credit rating agencies, and a host of other things we take for granted in modern life.

You can read about the early history of life insurance and its impact on society in How Our Days Became Numbered.

What does any of this have to do with networks? Only this—we are in much the same position in the cyber-insurance market right now as the life insurance market in the late 1800s through the mid-1900s—insurance agents interview a company and make a “hunch bet” on how much to charge the company for cyber-insurance. Will cyber-insurance ever mature to the same point as life insurance? According to a recent research paper, the answer is “probably not.” Why not?

First, legal restrictions will not allow a solution such as the one imposed by payment processors. Second, there does not seem to be a lot of leverage in cyber-insurance premiums. The cost of increasing security is generally much higher than any possible premium discount, making it cheaper for companies just to pay the additional premium than to improve their security posture. Third, there is no real evidence tying the use of specific products to reductions in security breaches. Instead, network and data security tend to be tied to practices rather than products, making it harder for an insurer to precisely specify what a company can and should do to improve their posture.

Finally, the largest problem is measurement. What does it look like for a company to “go to the doctor” regularly? Does this mean regular penetration tests? Standardizing penetration tests is difficult, and it can be far too easy to counter pentests without improving the overall security posture. As with medical care in the “early days,” there is no way to know whether you have gathered enough information on the population to correctly understand the kinds of things that improve “health.” Further, there is no way to compel reporting (much less accurate reporting), nor is there any way to compel insurance companies to share the information they have about cyber incidents.

Will cyber-insurance exist as a “separate thing” in the future? The authors largely answer in the negative. The pressures of “race to the bottom,” providing maximal coverage with minimal costs (which they attribute to the structure of the cyber-insurance market), combined with lack of regulatory clarity and inaccurate measurements, will probably end up causing cyber-insurance to “fold into” other kinds of insurance.

Whether this is a positive or negative result is a matter of conjecture—the legacy of yearly doctor’s visits and public health campaigns is not universally “good,” after all.

Nines are not enough

Russ — Mon, 27 Jan 2020 18:00:34 +0000

How many 9’s is your network? How about your service provider’s? Now, to ask the not-so-obvious question: why do you care? Does the number of 9’s actually describe the reliability of the network? According to Jeffrey Mogul and John Wilkes, nines are not enough. The question is this: while this paper was written for commercial relationships and cloud providers, is it something you can apply to running your own network? Let’s dive into the meat of the paper and find out.

While 5 9’s is normally given as a form of Service Level Agreement (SLA), there are two other measures of reliability a network operator needs to consider—the Service Level Objective (SLO), and the Service Level Indicator (SLI). The SLO defines a set of expectations about the level of service; internal SLO’s define “trigger points” where actions should be taken to prevent an external SLO from failing. For instance, if the external SLO says no more than 2% of the traffic will be dropped on this link, the internal SLO might say if more than 1% of the traffic on this link is dropped, you need to act. The SLA, on the other hand, says if more than 2% of the traffic on this link is dropped, the operator will rebate (some amount) to the customer. The SLI says this is how I am going to measure the percentage of packets dropped on this link.
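The split between the three terms can be sketched in a few lines of Python. This is only an illustration, not something taken from the paper: the names `LinkCounters`, `drop_rate_sli`, and `evaluate` are hypothetical, and the 1% and 2% thresholds are simply the example figures used above.

```python
from dataclasses import dataclass

# Example figures from the text above; real values would come from a contract.
INTERNAL_SLO = 0.01  # act when more than 1% of the traffic is dropped
EXTERNAL_SLO = 0.02  # promise to the customer: no more than 2% dropped


@dataclass
class LinkCounters:
    """Hypothetical per-interval counters pulled from a telemetry system."""
    packets_sent: int
    packets_dropped: int


def drop_rate_sli(counters: LinkCounters) -> float:
    """The SLI: how the fraction of dropped packets is actually measured."""
    if counters.packets_sent == 0:
        return 0.0
    return counters.packets_dropped / counters.packets_sent


def evaluate(counters: LinkCounters) -> str:
    """Compare the measured SLI against the internal and external SLOs."""
    rate = drop_rate_sli(counters)
    if rate > EXTERNAL_SLO:
        return "sla-breach"  # the SLA consequence (a rebate) now applies
    if rate > INTERNAL_SLO:
        return "act-now"     # internal trigger point; external SLO still intact
    return "ok"


if __name__ == "__main__":
    for dropped in (5, 15, 25):
        print(dropped, evaluate(LinkCounters(packets_sent=1000, packets_dropped=dropped)))
```

Even in a toy like this, the hard questions sit inside `drop_rate_sli`: where the counters come from, and over what interval they are collected.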

Splitting these three concepts apart helps reveal what is wrong with the entire 5 9’s way of thinking, because it enables you to ask questions like—can my telemetry system measure and report on the amount of traffic dropped on this link? Across what interval should this SLI apply? If I combine all the SLI’s across my entire network, what does the monitoring system need to look like? Can I support the false positives likely to occur with such a monitoring system?

These questions might be obvious, of course, but there are more non-obvious ones, as well. For instance, how do my internal and external SLO’s correlate to my SLI’s? Measuring the amount of traffic dropped on a link is pretty simple (in theory). Measuring something like “this application will not perform at less than 50% capacity because of network traffic” is going to be much, much harder.

The point Mogul and Wilkes make in this paper is that we just need to rethink the way we write SLO’s and their resulting SLA’s to be more realistic—in particular, we need to think about whether or not the SLI’s we can actually measure and act on can cash the SLO and SLA checks we’re writing. This means we probably need to expose more, rather than less, of the complexity of the network itself—even though this cuts against the grain of the current move towards abstracting the network down to “ports and packets.” To some degree, the consumer of networking services is going to need to be more informed if we are to build realistic SLA’s that can be written and kept.

How does this apply to the “average enterprise network engineer?” At first glance, it might seem like this paper is strongly oriented towards service providers, since there are definite contracts, products, etc., in play. If you squint your eyes, though, you can see how this would apply to the rest of the world. The implicit promise you make to an application developer or owner that their application will, in fact, run on the network with little or no performance degradation is, after all, an SLO. Your yearly review examining how well the network has met the needs of the organization is an SLA of sorts.

The kind of thinking represented here, if applied within an organization, could turn the conversation about whether to out- or in-source on its head. Rather than talking about the 5 9’s some cloud provider is going to offer, it opens up discussions about how and what to measure, even within the cloud service, to understand the performance being offered, and how more specific and nuanced results can be measured against a fuller picture of value added.

This is a short paper—but well worth reading and considering.

Lessons in Location and Identity through Remote Peering

Russ — Mon, 02 Dec 2019 18:00:03 +0000

We normally encounter four different kinds of addresses in an IP network; we tend to think about each of these as:

The MAC address identifies an interface on a physical or virtual wire

The IP address identifies an interface on a host

The DNS name identifies a host

The port number identifies an application or service running on the host

There are other address-like things, of course, such as the protocol number, a router ID, an MPLS label, etc. But let’s stick to these four for the moment. Looking through this list, the first thing you should notice is we often use the IP address as if it identified a host—which is generally not a good thing. There have been some efforts in the past to split the locator from the identifier, but the IP protocol suite was designed with a separate locator and identifier already: the IP address is the location and the DNS name is the identifier.

Even if you split up the locator and the identifier, however, the word locator is still quite ambiguous because we often equate the geographical and topological locations. In fact, old police procedural shows used to include scenes where a suspect was tracked down because they were using an IP address “assigned to them” in some other city… When the topic comes up this way, we can see the obvious flaw. In other situations, conflating the IP address with the location of the device is less obvious, and causes more subtle problems.

Consider, for instance, the concept of remote peering. Suppose you want to connect to a cloud provider who has a presence in an IXP that’s just a few hundred miles away. You calculate the costs of placing a router on the IX fabric, add it to the cost of bringing up a new circuit to the IX, and … well, there’s no way you are ever going to get that kind of budget approved. Looking around, though, you find there is a company that already has a router connected to the IX fabric you want to be on, and they offer a remote peering solution, which means they offer to build an Ethernet tunnel across the public Internet to your edge router. Once the tunnel is up, you can peer your local router to the cloud provider’s network using BGP. The cloud provider thinks you have a device physically connected to the local IX fabric, so all is well, right?

In a recent paper, a group of researchers looked at the combination of remote peering and anycast addresses. If you are not familiar with anycast addresses, the concept is simple: take a service which is replicated across multiple locations and advertise every instance of the service using a single IP address. This is clever because when you send packets to the IP address representing the service, you will always reach the closest instance of the service. So long as you have not played games with stretched Ethernet, that is.

In the paper, the researchers used various mechanisms to figure out where remote peering was taking place, and another to discover services being advertised using anycast (normally DNS or CDN services). Using the intersection of these two, they determined if remote peering was impacting the performance of any of these services. I am shocked, shocked, to tell you the answer is yes. I would never have expected stretched Ethernet to have a negative impact on performance.

To quote the paper directly:

…we found that 38% (126/332) of RTTs in traceroutes towards anycast prefixes potentially affected by remote peering are larger than the average RTT of prefixes without remote peering. In these 126 traceroute probes, the average RTT towards prefixes potentially affected by remote peering is 119.7 ms while the average RTT of the other prefixes is 84.7 ms.

The bottom line: “An average latency increase of 35.1 ms.” This is partially because the two different meanings of the word location come into play when you are interacting with services like CDNs and DNS. These services will always try to serve your requests from a physical location close to you. When you are using Ethernet stretched over IP, however, your topological location (where you connect to the network) and your geographical location (where you are physically located on the face of the Earth) can be radically different. Think about the mental dislocation when you call someone with an area code that is normally tied to an area of the west coast of the US, and yet you know they now live around London, say…

We could probably add in a bit of complexity to solve these problems, or (even better) just include your GPS coordinates in the IP header. After all, what’s the point of privacy? … The bottom line is this: remote peering might be a good idea when everything else fails, of course, but if you haven’t found the tradeoffs, you haven’t looked hard enough. It might be that application performance across a remote peering session is low enough that paying for the connection might turn out cheaper.

In the meantime, wake me up when we decide that stretching Ethernet over IP is never a good thing.

Research: Securing Linux with a Faster and Scalable IPtables

Russ — Mon, 25 Nov 2019 19:51:16 +0000

If you haven’t found the trade-offs, you haven’t looked hard enough.

A perfect illustration is the research paper under review, Securing Linux with a Faster and Scalable Iptables. Before diving into the paper, however, some background might be good. Consider the situation where you want to filter traffic being transmitted to and by a virtual workload of some kind, as shown below.

To move a packet from the user space into the kernel, the packet itself must be copied into some form of memory that processes on “both sides of the divide” can read, then the entire state of the process (memory, stack, program execution point, etc.) must be pushed into a local memory space (stack), and control transferred to the kernel. This all takes time and power, of course.

In the current implementation of packet filtering, netfilter performs the majority of filtering within the kernel, while iptables acts as a user frontend as well as performing some filtering actions in the user space. Packets being pushed from one interface to another must make the transition between the user space and the kernel twice. Interfaces like XDP aim to make the processing of packets faster by shortening the path from the virtual workload to the PHY chipset.

What if, instead of putting the functionality of iptables in the user space you could put it in the kernel space? This would make the process of switching packets through the device faster, because you would not need to pull packets out of the kernel into a user space process to perform filtering.

But there are trade-offs. According to the authors of this paper, there are three specific challenges that need to be addressed. First, users expect iptables filtering to take place in the user process. If a packet is transmitted between virtual workloads, the user expects any filtering to take place before the packet is pushed to the kernel to be carried across the bridge, and back out into user space to the second process. Second, a separate process, conntrack, checks for the existence of a TCP connection, which iptables then uses to determine whether a packet should be dropped because there is no existing connection. This gives iptables the ability to do stateful filtering. Third, classification of packets is very expensive; classifying packets could take too much processing power or memory to be done efficiently in the kernel.
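
The idea behind the second challenge, stateful filtering, can be shown with a toy connection table. This is a deliberately simplified sketch and not how the kernel's connection tracking or the paper's design actually works (real tracking follows TCP state, timeouts, and much more); `Flow`, `ToyConnTracker`, and `filter_inbound` are hypothetical names invented for the illustration.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    """A packet's addressing, reduced to the fields this toy needs."""
    src: str
    sport: int
    dst: str
    dport: int


class ToyConnTracker:
    """Remembers which connections were opened from the inside."""

    def __init__(self) -> None:
        self._established: Set[Flow] = set()

    def note_outbound(self, flow: Flow) -> None:
        """Record a connection initiated by a local workload."""
        self._established.add(flow)

    def is_reply(self, flow: Flow) -> bool:
        """True if the packet is the reverse direction of a tracked connection."""
        reverse = Flow(flow.dst, flow.dport, flow.src, flow.sport)
        return reverse in self._established


def filter_inbound(tracker: ToyConnTracker, flow: Flow) -> str:
    """Stateful rule: accept replies to known connections, drop everything else."""
    return "accept" if tracker.is_reply(flow) else "drop"


if __name__ == "__main__":
    tracker = ToyConnTracker()
    tracker.note_outbound(Flow("10.0.0.5", 40000, "192.0.2.10", 443))
    print(filter_inbound(tracker, Flow("192.0.2.10", 443, "10.0.0.5", 40000)))
    print(filter_inbound(tracker, Flow("198.51.100.7", 443, "10.0.0.5", 40000)))
```

The filter cannot make its decision from the packet alone; it depends on state kept somewhere else, which is why the location of that state matters when the filter moves.
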
To resolve these issues, the authors of this paper propose using an in-kernel virtual machine, or eBPF. They design an architecture which splits iptables into two pipelines, an ingress and an egress, as shown in the illustration taken from the paper below.

As you can see, the result is… complex. Not only are there more components, with many more interaction surfaces, there is also the complexity of creating in-kernel virtual machines—remembering that virtual machines are designed to separate out processing and memory spaces to prevent cross-application data leakage and potential single points of failure.
That these problems are solvable is not in question—the authors describe how they solved each of the challenges they laid out. The question is: are the trade-offs worth it?

The bottom line: when you move filtering from the network to the host, you are not moving the problem to a place where it is less complex. You may make the network design itself less complex, and you may move filtering closer to the application, so some specific security problems are easier to solve, but the overall complexity of the system is going way up, particularly if you want a high performance solution.

IPv6 Backscatter and Address Space Scanning

Russ — Wed, 09 Oct 2019 17:00:39 +0000

Backscatter is often used to detect various kinds of attacks, but how does it work? The paper under review today, Who Knocks at the IPv6 Door, explains backscatter usage in IPv4, and examines how effectively this technique might be used to detect scanning of IPv6 addresses, as well. The best place to begin is with an explanation of backscatter itself; the following network diagram will be helpful—

Assume A is scanning the IPv4 address space for some reason, for instance to find some open port on a host, or as part of a DDoS attack. When A sends an unsolicited packet to C, a firewall (or some similar edge filtering device), C will attempt to discover the source of this packet. It could be there is some local policy set up allowing packets from A, or perhaps A is part of some domain none of the devices from C should be connecting to. In order to discover more, the firewall will perform a reverse lookup. To do this, C takes advantage of the PTR DNS record, looking up the IP address to see if there is an associated domain name (this is explained in more detail in my How the Internet Really Works webinar, which I give every six months or so). This reverse lookup generates what is called a backscatter; these backscatter events can be used to find hosts scanning the IP address space. Sometimes these scans are innocent, such as a web spider searching for HTML servers; other times, they could be a prelude to some sort of attack.
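To make the reverse lookup concrete, here is a small sketch of the DNS name a device like C would query for the PTR record of a source address. It only builds the query name and sends nothing on the wire. The helper name `ptr_query_name` is hypothetical, and the addresses come from the ranges reserved for documentation, not from the paper.

```python
import ipaddress


def ptr_query_name(address: str) -> str:
    """Return the DNS name queried for the PTR record of an IPv4 or IPv6 address."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    # IPv4: the octets are reversed under in-addr.arpa.
    print(ptr_query_name("192.0.2.10"))
    # IPv6: every hex nibble of the full address is reversed under ip6.arpa.
    print(ptr_query_name("2001:db8::1"))
```

Each of these queries is visible to the DNS servers that handle it, which is what makes backscatter observable in the first place.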

Kensuke Fukuda and John Heidemann. 2018. Who Knocks at the IPv6 Door?: Detecting IPv6 Scanning. In Proceedings of the Internet Measurement Conference 2018 (IMC ’18). ACM, New York, NY, USA, 231-237. DOI: https://doi.org/10.1145/3278532.3278553

Scanning the IPv6 address space is much more difficult because there are 2¹²⁸ addresses rather than 2³². The paper under review here is one of the first attempts to understand backscatter in the IPv6 address space, which can lead to a better understanding of the ways in which IPv6 scanners are optimizing their search through the larger address space, and also to begin understanding how backscatter can be used in IPv6 for many of the same purposes as it is in IPv4.
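A rough calculation shows the scale of the difference. The scan rate below is an arbitrary assumption chosen only for illustration (it is not a figure from the paper), and `seconds_to_scan` is a hypothetical helper.

```python
IPV4_BITS = 32
IPV6_BITS = 128
PROBES_PER_SECOND = 1_000_000  # assumed scan rate, for illustration only


def seconds_to_scan(bits: int, rate: int = PROBES_PER_SECOND) -> float:
    """Time to send one probe to every address in a space of 2**bits addresses."""
    return (2 ** bits) / rate


if __name__ == "__main__":
    ipv4_hours = seconds_to_scan(IPV4_BITS) / 3600
    ipv6_years = seconds_to_scan(IPV6_BITS) / (3600 * 24 * 365)
    print(f"IPv4: about {ipv4_hours:.1f} hours")
    print(f"IPv6: about {ipv6_years:.2e} years")
    print(f"IPv6 is 2**{IPV6_BITS - IPV4_BITS} times larger")
```

At that assumed rate an exhaustive IPv4 sweep takes a little over an hour, while an exhaustive IPv6 sweep is simply not possible, which is why IPv6 scanners have to find ways to narrow the search.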

The researchers begin by setting up a backscatter testbed across a subset of hosts for which IPv4 backscatter information is well-known. They developed a set of heuristics for identifying the kind of service or host performing the reverse DNS lookup, classifying them into major services, content delivery networks, mail servers, etc. They then examined the number of reverse DNS lookups requested versus the number of IP packets each received.

It turns out that about ten times as many backscatter incidents are reported for IPv4 as for IPv6, which either indicates that IPv6 hosts perform reverse lookup requests about ten times less often than IPv4 hosts, or IPv6 hosts are ten times less likely to be monitored for backscatter events. Either way, this result is not promising: it appears, on the surface, that IPv6 hosts will be less likely to cause backscatter events, or IPv6 backscatter events are ten times less likely to be reported. This could indicate that widespread deployment of IPv6 will make it harder to detect various kinds of attacks on the DFZ. A second result from this research is that using backscatter, the researchers determined IPv6 scanning is increasing over time; while the IPv6 space is not currently a prime target for attacks, it might become more so over time, if the scanning rate is any indicator.

The bottom line is that IPv6 hosts need to be monitored as closely as, or more closely than, IPv4 hosts for scanning events. The techniques used for scanning the IPv6 address space are not well understood at this time, either.

The Floating Point Fix

Russ — Mon, 15 Jul 2019 17:00:22 +0000

Floating point is not something many network engineers think about. In fact, when I first started digging into routing protocol implementations in the mid-1990’s, I discovered one of the tricks you needed to remember when trying to replicate the router’s metric calculation was to always round down. EIGRP, like most of the rest of Cisco’s IOS, was first written for processors that did not perform floating point operations. The silicon and processing time costs were just too high.

What brings all this to mind is a recent article on the problems with floating point performance over at The Next Platform by Michael Feldman. According to the article:

While most programmers use floating point indiscriminately anytime they want to do math with real numbers, because of certain limitations in how these numbers are represented, performance and accuracy often leave something to be desired.

For those who have not spent a lot of time in the coding world, a floating point number is one that has some number of digits after the decimal. While integers are fairly easy to represent and calculate over in the binary processors use, floating point numbers are much more difficult, because many decimal fractions cannot be represented exactly in binary. The number of bits you have available to represent the number makes a very large difference in accuracy. For instance, if you try to store the number 101.1 in a float, you will find the number stored is actually 101.099998. To get closer to 101.1, you need a double, which is twice as long as a float (and even a double holds only a very close approximation).
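You can see this for yourself with a few lines of Python. Python's own `float` is a 64-bit double, so the sketch below squeezes the value through a 32-bit float using the standard `struct` module; `as_float32` is a hypothetical helper name.

```python
import struct


def as_float32(value: float) -> float:
    """Round-trip a Python float (a 64-bit double) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


if __name__ == "__main__":
    stored = as_float32(101.1)
    print(f"{stored:.6f}")   # what a 32-bit float holds: 101.099998
    print(f"{101.1:.20f}")   # a 64-bit double is far closer, but still not exact
    print(struct.calcsize("<f"), struct.calcsize("<d"))  # sizes in bytes: 4 8
```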

Okay, this all might be fascinating, but who cares? Scientists, mathematicians, and … network engineers do, as a matter of fact. First, carrying around double floats to store numbers with higher precision means a lot more network traffic. Second, when you start looking at timestamps and large amounts of telemetry data, the efficiency and accuracy of number storage becomes a rather big deal.


Okay, so the current floating point storage format, called IEEE754, is inaccurate and rather inefficient. What should be done about this? According to the article, John Gustafson, a computer scientist, has been pushing for the adoption of a replacement called posits. Quoting the article once again:
It does this by using a denser representation of real numbers. So instead of the fixed-sized exponent and fixed-sized fraction used in IEEE floating point numbers, posits encode the exponent with a variable number of bits (a combination of regime bits and the exponent bits), such that fewer of them are needed, in most cases. That leaves more bits for the fraction component, thus more precision.
Did you catch why this is more efficient? Because it uses a variable length field. In other words, posits replaces a fixed field structure (like what was originally used in OSPFv2) with a variable length field (like what is used in IS-IS). While you must eat some space in the format to carry the length, the amount of "unused space" in current formats overwhelms the space wasted, resulting in an improvement in accuracy. Further, many numbers that require a double today can be carried in the size of a float. Not only does using a TLV format increase accuracy, it also increases efficiency.
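To see the variable-length field at work, here is a minimal sketch of a decoder for a tiny 8-bit posit. It follows the general layout the quote describes (a sign bit, a variable-length run of regime bits, optional exponent bits, then whatever is left for the fraction), but it is my own illustration rather than code from the article. The name `decode_posit` and the default of zero exponent bits are assumptions made to keep the example small.

```python
import math


def decode_posit(bits: int, nbits: int = 8, es: int = 0) -> float:
    """Decode an nbits-wide posit with es exponent bits into a Python float."""
    mask = (1 << nbits) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (nbits - 1):
        return math.nan  # the single "not a real" bit pattern
    negative = bool(bits >> (nbits - 1))
    if negative:
        bits = (-bits) & mask  # negative values are stored as two's complement
    # Everything after the sign bit, most significant bit first.
    body = [(bits >> i) & 1 for i in range(nbits - 2, -1, -1)]
    # Regime: a run of identical bits, ended by the opposite bit (or the end).
    run = 1
    while run < len(body) and body[run] == body[0]:
        run += 1
    k = run - 1 if body[0] == 1 else -run
    rest = body[run + 1:]  # skip the bit that terminates the regime
    exp_bits, frac_bits = rest[:es], rest[es:]
    exponent = 0
    for bit in exp_bits:
        exponent = (exponent << 1) | bit
    exponent <<= es - len(exp_bits)  # exponent bits pushed off the end count as 0
    fraction = sum(bit / (1 << (i + 1)) for i, bit in enumerate(frac_bits))
    value = 2.0 ** (k * (1 << es) + exponent) * (1.0 + fraction)
    return -value if negative else value


if __name__ == "__main__":
    for pattern in (0x20, 0x40, 0x50, 0x60, 0x7F, 0xC0):
        print(f"{pattern:08b} -> {decode_posit(pattern)}")
```

Patterns near 1.0 spend only two bits on the regime, leaving five bits of fraction, while the largest value (`0x7F`) spends every bit on the regime and has no fraction at all. That is the tradeoff in miniature: precision is concentrated where numbers are most often used.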
From the perspective of the State/Optimization/Surface (SOS) tradeoff, there should be some increase in complexity somewhere in the overall system—if you have not found the tradeoffs, you have not looked hard enough. Indeed, what we find is there is an increase in the amount of state being carried in the data channel itself; there is additional state, and additional code that knows how to deal with this new way of representing numbers.
It's always interesting to find situations in other information technology fields where discussions parallel to discussions in the networking world are taking place. Many times, you can see people encountering the same design tradeoffs we see in network engineering and protocol design.



Design Intelligence from the Hourglass Model
Russ — Mon, 08 Jul 2019 17:00:23 +0000
Over at the Communications of the ACM, Micah Beck has an article up about the hourglass model. While the math is quite interesting, I want to focus on transferring the observations from the realm of protocol and software systems development to network design. Specifically, start with the concept and terminology, which is very useful. Taking a typical design, such as this—

The first key point made in the paper is this—
The thin waist of the hourglass is a narrow straw through which applications can draw upon the resources that are available in the less restricted lower layers of the stack.
A somewhat obvious point to be made here is applications can only use services available in the spanning layer, and the spanning layer can only build those services out of the capabilities of the supporting layers. If fewer applications need to be supported, or the applications deployed do not require a lot of “fancy services,” a weaker spanning layer can be deployed. Based on this, the paper observes—
The balance between more applications and more supports is achieved by first choosing the set of necessary applications N and then seeking a spanning layer sufficient for N that is as weak as possible. This scenario makes the choice of necessary applications N the most directly consequential element in the process of defining a spanning layer that meets the goals of the hourglass model.
Beck calls the weakest possible spanning layer to support a given set of applications the minimally sufficient spanning layer (MSSL). There is one thing that seems off about this definition, however—the correlation between the number of applications supported and the strength of the spanning layer. There are many cases where a network supports thousands of applications, and yet the network itself is quite simple. There are many other cases where a network supports just a few applications, and yet the network is very complex. It is not the number of applications that matters, it is the set of services the applications demand from the spanning layer.
Based on this, we can change the definition slightly: an MSSL is the weakest spanning layer that can provide the set of services required by the applications it supports. This might seem intuitive or obvious, but it is often useful to work these kinds of intuitive things out, so they can be expressed more precisely when needed.
First lesson: the primary driver in network complexity is application requirements. To make the network simpler, you must reduce the requirements applications place on the network.
There are, however, several counter-intuitive cases here. For instance, TCP is designed to emulate (or abstract) a circuit between two hosts—it creates what appears to be a flow controlled, error free channel with no drops on top of IP, which has no flow control, and drops packets. In this case, the spanning layer (IP), or the wasp waist, does not support the services the upper layer (the application) requires.
In order to make this work, TCP must add a lot of complexity that would normally be handled by one of the supporting layers—in fact, TCP might, in some cases, recreate capabilities available in one of the supporting layers, but hidden by the spanning layer. There are, as you might have guessed, tradeoffs in this neighborhood. Not only are the mechanisms TCP must use more complex than the ones some supporting layer might have used, TCP represents a leaky abstraction—the underlying connectionless service cannot be completely hidden.
Take another instance more directly related to network design. Suppose you aggregate routing information at every point where you possibly can. Or perhaps you are using BGP route reflectors to manage configuration complexity and route counts. In most cases, this will mean information is flowing through the network suboptimally. You can re-optimize the network, but not without introducing a lot of complexity. Further, you will probably always have some form of leaky abstraction to deal with when abstracting information out of the network.
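To make the aggregation tradeoff concrete, here is a minimal sketch using Python's standard ipaddress module. The prefixes and metrics are hypothetical, chosen only for illustration. Two more-specific routes with different costs collapse into one summary, and the summary cannot say which half of the address space is the expensive one.

```python
import ipaddress

def aggregate(routes):
    """Collapse a dict of {network: metric} into summary networks.

    The per-route metrics are dropped on purpose: a summary route cannot
    carry the individual metrics of the routes it hides.
    """
    return list(ipaddress.collapse_addresses(routes.keys()))

# Hypothetical routes inside one module of the network, with their costs.
specifics = {
    ipaddress.ip_network("10.1.0.0/25"): 10,
    ipaddress.ip_network("10.1.0.128/25"): 50,
}

summary = aggregate(specifics)
print(summary)  # [IPv4Network('10.1.0.0/24')]
```

Anything outside the aggregation point now sees a single /24 and has to pick one entry point for both halves, which is where the suboptimal traffic flow comes from.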
Second lesson: be careful when stripping information out of the spanning layer in order to simplify the network. There will be tradeoffs, and sometimes you end up with more complexity than what you started with.
A second counter-intuitive case is that of adding complexity to the supporting layers in order to ultimately simplify the spanning layer. It seems, on the model presented in the paper, that adding more services to the spanning layer will always end up adding more complexity to the entire system. MPLS and Segment Routing (SR), however, show this is not always true. If you need traffic steering, for instance, it is easier to implement MPLS or SR in the support layer rather than trying to emulate their services at the application level.
Third lesson: sometimes adding complexity in a lower layer can simplify the entire system—although this might seem to be counter-intuitive from just examining the model.
The bottom line: complexity is driven by applications (top down), but understanding the full stack, and where interactions take place, can open up opportunities for simplifying the overall system. The key is thinking through all parts of the system carefully, using effective mental models to understand how they interact (interaction surfaces), and then considering the optimization tradeoffs you will make by shifting state to different places.



DORA, DevOps, and Lessons for Network Engineers
Russ — Mon, 01 Jul 2019 17:00:28 +0000
DevOps Research and Assessment (DORA) released their 2018 Accelerate report on the state of DevOps at the end of 2018; I’m a little behind in my reading, so I just got around to reading it, and trying to figure out how to apply their findings to the infrastructure (networking) side of the world.
DORA found organizations that outsource entire functions, such as building an entire module or service, tend to perform more poorly than organizations that outsource by integrating individual developers into existing internal teams (page 43). It is surprising companies still think outsourcing entire functions is a good idea, given the many years of experience the IT world has with the failures of this model. Outsourced components, it seems, too often become a bottleneck in the system, especially as contracts constrain your ability to react to real-world changes. Beyond this, outsourcing an entire function not only moves the work to an outside organization, but also the expertise. Once you have lost critical mass in an area, and any opportunity for employees to learn about that area, you lose control over that aspect of your system.
DORA also found a correlation between faster delivery of software and reduced Mean Time To Repair (MTTR) (page 19). On the surface, this makes sense. Shops that deliver software continuously are bound to have faster, more regularly exercised processes in place for developing, testing, and rolling out a change. Repairing a fault or failure requires change; anything that improves the speed of rolling out a change is going to drive MTTR down.
Organizations that emphasize monitoring and observability tended to perform better than others (page 55). This has major implications for network engineering, where telemetry and management are often “bolted on” as an afterthought, much like security. This is clearly not optimal, however—telemetry and network management need to be designed and operated like any other application. Data sources, stores, presentation, and analysis need to be segmented into separate services, so new services can be tried out on top of existing data, and new sources can feed into existing services. Network designers need to think about how telemetry will flow through the management system, including where and how it will originate, and what it will be used for.
These observations about faster delivery and observability should drive a new way of thinking about failure domains; while failure domains are often primarily thought of as reducing the “blast radius” when a router or link fails, they serve two much larger roles. First, failure domain boundaries are good places to gather telemetry because this is where information flows through some form of interaction surface between two modules. Information gathered at a failure domain boundary will not tend to change as often, and it will often represent the operational status of the entire module.
Second, well-placed failure domain boundaries can be used to stake out areas where “new things” can be put in operation with some degree of confidence. If a network has well-designed failure domain boundaries, it is much easier to deploy new software, hardware, and functionality in a controlled way. This enables a more agile view of network operations, including the ability to roll out changes incrementally through a canary process, and to use processes like chaos monkey to understand and correct unexpected failure modes.
Another interesting observation is the j-curve of adoption (page 3):

This j-curve shows the “tax” of building the underlying structures needed to move from a less automated state to a more automated one. Keith’s Law:
In a complex system, the cumulative effect of a large number of small optimizations is externally indistinguishable from a radical leap.
…operates in part because of this j-curve. Do not be discouraged if it seems to take a lot of work to make small amounts of progress in many stages of system development—the results will come later.
The bottom line: it might seem like a report about software development is too far outside the realm of network engineering to be useful—but the reality is network engineers can learn a lot about how to design, build, and operate a network from software engineers.



Research: Legal Barriers to RPKI Deployment
Russ — Wed, 09 Jan 2019 18:00:26 +0000
Much like most other problems in technology, securing the reachability (routing) information in the internet core is as much or more of a people problem than it is a technology problem. While BGP security can never be perfect (in an imperfect world, the quest for perfection is often the cause of a good solution’s failure), there are several solutions which could be used to provide the information network operators need to determine if they can trust a particular piece of routing information or not. For instance, graph overlays for path validation, or the RPKI system for origin validation. Solving the technical problem, however, only carries us a small way towards “solving the problem.”
Among the many ramifications of deploying a new system are the legal ones, which we do not often think about from a purely technology perspective. Assume, for a moment, that some authority were to publicly validate that some address, such as 2001:db8:3e8:1210::/64, belongs to a particular entity, say bigbank, and that the AS number of this same entity is 65000. On receiving an update from a BGP peer, if you note the route to x:1210::/64 ends in AS 65000, you might think you are safe in using this path to reach destinations located in bigbank’s network.
What if the route has been hijacked? What if the validator is wrong, and has misidentified—or been fooled into misidentifying—the connection between AS65000 and the x:1210::/64 route? What if, based on this information, critical financial information is transmitted to an end point which ultimately turns out to be an attacker, and this attacker uses this falsified routing information to steal millions (or billions) of dollars?
Yoo, Christopher S., and David A. Wishnick. 2019. “Lowering Legal Barriers to RPKI Adoption.” SSRN Scholarly Paper ID 3308619. Rochester, NY: Social Science Research Network. https://papers.ssrn.com/abstract=3309813.
Who is responsible? This legal question ultimately plays into the way numbering authorities allow the certificates they issue to be used. Numbering authorities—specifically ARIN, which is responsible for numbering throughout North America—do not want the RPKI data misused in a way that can leave them legally responsible for the results. Some background is helpful.
The RPKI data, in each region, is stored in a database; each RPKI object (essentially and loosely) contains an origin AS/IP address pair. These are signed using a private key and can be validated using the matching public key. Somehow the public key itself must be validated; ultimately, there is a chain, or hierarchy, of trust, leading to some sort of root. The trust anchor is described in a file called the Trust Anchor Locator, or TAL. ARIN wraps access to their TAL in a strong indemnification clause to protect themselves from the sort of situation described above (and others). Many companies, particularly in the United States, will not accept the legal contract involved without a thorough investigation of their own culpability in any given situation involving misrouting traffic, which ultimately means many companies will simply not use the data, and RPKI is not deployed.
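As a rough illustration of what a router does with this data once it has been validated, here is a minimal sketch of origin validation in Python. It assumes each validated object has been reduced to a prefix, a maximum prefix length, and an origin AS; the Roa type and validate_origin function are hypothetical names, and the three outcomes follow the usual valid / invalid / not-found classification. The prefix and AS number are the bigbank example from above.

```python
import ipaddress
from typing import NamedTuple, Union

class Roa(NamedTuple):
    """One validated origin authorization (hypothetical, simplified)."""
    prefix: Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
    max_length: int
    asn: int

def validate_origin(route_prefix, origin_asn, roas):
    """Classify a route as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        if route.version != roa.prefix.version:
            continue  # never compare IPv4 against IPv6
        if route.subnet_of(roa.prefix):
            covered = True  # some authorization covers this address space
            if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
                return "valid"
    # Covered by an authorization, but none matched: treat as invalid.
    return "invalid" if covered else "not-found"

roas = [Roa(ipaddress.ip_network("2001:db8:3e8:1210::/64"), 64, 65000)]

print(validate_origin("2001:db8:3e8:1210::/64", 65000, roas))  # valid
print(validate_origin("2001:db8:3e8:1210::/64", 65001, roas))  # invalid
print(validate_origin("2001:db8:ffff::/48", 65000, roas))      # not-found
```

Note that the check is only as good as the data behind it, which is exactly why the question of who is liable for wrong or misused data matters.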
The essential question the paper asks is: is this clause really necessary? The authors make several arguments towards removing the strict legal requirements around the use of the data in the TAL provided by ARIN. First, they argue the bounds of potential liability are uncertain, and will shift as the RPKI is more widely deployed. Second, they argue the situations where harm can come from use of the RPKI data need to be more carefully framed and understood, along with how these kinds of legal issues have played out in the past. To this end, the authors argue strict liability is not likely to be raised, and negligence liability can probably be mitigated. They offer an alternative mechanism using straight contract law to limit the liability to ARIN in situations where the RPKI data is misused or incorrect.
Whether this paper causes ARIN to rethink its legal position or not is yet to be seen. At the same time, while these kinds of discussions often leave network engineers flat-out bored, the implications for the Internet are important. This is an excellent example of an intersection between technology and policy, a realm network operators and engineers need to pay more attention to.



Research: BGP Routers and Parrots
Russ — Wed, 05 Dec 2018 16:00:02 +0000
The BGP specification suggests implementations should have three tables: the adj-rib-in, the loc-rib, and the adj-rib-out. The first of these three tables should contain the routes (NLRIs and attributes) transmitted by each of the speaker’s peers. The second table should contain the calculated best paths; these are the routes that will be (or are) installed in the local routing table and used to build a forwarding table. The third table contains the routes which have been sent to each peering speaker. Why three tables? Routing protocols standards are (sometimes—not always) written to provide the maximum clarity to how the protocol works to someone who is writing an implementation. Not every table or process described in the specification is implemented, or implemented the way it is described.
What happens when you implement things in a different way than the specification describes? In the case of BGP and the three RIBs, you can get duplicated BGP updates. What do parrots and BGP have in common describes two situations where the lack of an adj-rib-out can cause duplicate BGP updates to be sent.
David Hauweele, Bruno Quoitin, Cristel Pelsser, and Randy Bush. 2016. “What Do Parrots and BGP Routers Have in Common?” Computer Communications Review, July. http://ccracmsigcomm.info.ucl.ac.be/wp-content/uploads/2016/07/sigcomm-ccr-paper26.pdf.
The authors of this paper begin by observing BGP updates from a full feed off the default free zone. The configuration of the network, however, is designed to provide not only the feed from a BGP speaker, but also the routes received by a BGP speaker, as shown in the illustration below.

In this figure, all the labeled routers are in separate BGP autonomous systems, and the links represent physical connections as well as eBGP sessions. The three BGP updates received by D are stored in three different logs which are time stamped so they can be correlated. The researchers found two instances where duplicate BGP updates were received at D.
In the first case, the best path at C switches between A and B because of the Multiple Exit Discriminator (MED), but the remainder of the update remains the same. C, however, strips the MED before transmitting the route to D, so D simply sees what appears to be duplicate updates. In the second case, the next hop changes because of an implicit withdraw based on a route change for the previous best path. For instance, C might choose A as the best path, but then A implicitly withdraws its path, leaving the path through B as the best. When this occurs, C recalculates the best path and sends it to D; since the next hop is stripped when C advertises the new route to D, this appears to be a duplicate at D.
In both of these cases, if C had an adj-rib-out, it would find the duplicate advertisement and squash it. However, since C has no record of what it has sent to D in the past, it must send information about all local best path changes to D. While this might seem like a trivial amount of processing, these additional updates can add enough load during link flap situations to make a material difference in processor utilization or speed of convergence.
Why do implementors decide not to include an adj-rib-out in their implementations, or why, when one is provided, do operators disable the adj-rib-out? Primarily because the adj-rib-out consumes local memory; it is cheaper to push the work to a peer than it is to keep local state that might only rarely be used. This is a classic case of reducing the complexity of the local implementation by pushing additional state (and hence complexity) into the overall system. The authors of the paper suggest a better balance might be achieved if implementations kept a small cache of the most recent updates transmitted to an adjacent speaker; this would allow the implementation to reduce memory usage, while also allowing it to prevent repeating recent updates.
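The small-cache idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism from the paper or from any real BGP implementation: RecentUpdateCache and outbound_attributes are hypothetical names, routes are plain dictionaries, and the cache remembers the last advertisement sent for a bounded number of prefixes, evicting the least recently used.

```python
from collections import OrderedDict

def outbound_attributes(route):
    """Attributes as the peer will see them: MED and next hop are dropped
    here because, in the two cases described, they do not reach the peer."""
    return tuple(sorted((k, v) for k, v in route.items()
                        if k not in ("med", "next_hop")))

class RecentUpdateCache:
    """Bounded memory of the last update sent per prefix (hypothetical)."""

    def __init__(self, max_entries=1024):
        self.max_entries = max_entries
        self._sent = OrderedDict()

    def should_send(self, prefix, attributes):
        """Return False if this exact update was the last one sent."""
        if self._sent.get(prefix) == attributes:
            self._sent.move_to_end(prefix)  # keep hot prefixes cached
            return False
        self._sent[prefix] = attributes
        self._sent.move_to_end(prefix)
        if len(self._sent) > self.max_entries:
            self._sent.popitem(last=False)  # evict least recently used
        return True

cache = RecentUpdateCache(max_entries=2)
via_a = {"as_path": (65001, 65010), "med": 10, "next_hop": "192.0.2.1"}
via_b = {"as_path": (65001, 65010), "med": 5, "next_hop": "192.0.2.2"}

print(cache.should_send("203.0.113.0/24", outbound_attributes(via_a)))  # True
print(cache.should_send("203.0.113.0/24", outbound_attributes(via_b)))  # False
```

The tradeoff is visible in max_entries: a larger cache suppresses more duplicates but moves back toward the memory cost of a full adj-rib-out, while a smaller one lets duplicates through once a prefix has been evicted.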



CAA Records and Site Security
Russ — Mon, 19 Nov 2018 18:00:16 +0000
The little green lock—now being deprecated by some browsers—provides some level of comfort for many users when entering personal information on a web site. You probably know the little green lock means the traffic between the host and the site is encrypted, but you might not stop to ask the fundamental question of all cryptography: using what key? The quality of an encrypted connection is no better than the quality and source of the keys used to encrypt the data carried across the connection. If the key is compromised, then the entire encrypted session is useless.
So where does the key pair come from to encrypt the session between a host and a server? The session key used for symmetric cryptography on each session is obtained using the public key of the server (thus through asymmetric cryptography). How is the public key of the server obtained by the host? Here is where things get interesting.
The older way of doing things was to carry, in HTTP, a list of domains trusted to provide a public key for a particular server. The host would open a session with a server, which would then provide a list of domains where its public key could be found in the opening HTTP packets. The host would then find one of those hosts, and hence the server’s public key. From there, the host could create the correct nonce and other information to form a session key with the server. If you are quick on the security side, you might note a problem with this solution: if the HTTP session itself is somehow hijacked early in the setup process, a man-in-the-middle could substitute its own host list for the one the server provides. Once this substitution is done, the MITM could set up perfectly valid encrypted sessions with both the host and the server, funneling traffic between them. The MITM now has full access to the unencrypted data flowing through the session, even though the traffic is encrypted as it flows over the rest of the ‘net.
To solve this problem, a new method for finding the server’s public key was designed around 2010. In this method, the host requests the Certification Authority Authorization (CAA) record from the server’s DNS server. This record lists the domains who are authorized to provide a public key, or certificate, for the servers within a domain. Thus, if you purchase your certificates from BigCertProvider, you would list BigCertProvider’s domain in your CAA. The host can then find the correct DNS record, and retrieve the correct certificate from the DNS system. This cuts out the possibility of a MITM attacking the HTTP session during the initial setup phases. If DNSSEC is deployed, the DNS records should also be secured, preventing MITM attacks from that angle, as well.
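The authorization check itself can be sketched in Python. This is a simplified illustration, not a complete implementation: it assumes the CAA records for a domain have already been fetched and are available as text in the usual flags / tag / value form, it looks only at the issue tag, and it ignores wildcard handling, the critical flag, and searching parent domains. The function names and the bigcertprovider.example domain are hypothetical.

```python
def parse_caa(rdata):
    """Split one CAA record body such as '0 issue "ca.example"'."""
    flags, tag, value = rdata.split(None, 2)
    return int(flags), tag.lower(), value.strip('"')

def ca_is_authorized(caa_rdatas, ca_domain):
    """Return True if ca_domain may provide certificates for the domain."""
    issuers = []
    saw_issue = False
    for rdata in caa_rdatas:
        _flags, tag, value = parse_caa(rdata)
        if tag != "issue":
            continue  # other tags do not restrict who may issue
        saw_issue = True
        # Anything after ';' is parameters; an empty name authorizes nobody.
        issuer = value.split(";", 1)[0].strip().lower()
        if issuer:
            issuers.append(issuer)
    if not saw_issue:
        return True  # no issue restriction has been published
    return ca_domain.lower() in issuers

records = [
    '0 issue "bigcertprovider.example"',
    '0 iodef "mailto:security@example.com"',
]
print(ca_is_authorized(records, "bigcertprovider.example"))  # True
print(ca_is_authorized(records, "othercert.example"))        # False
```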
The paper under review today examines the deployment of CAA records in the wild, to determine how widely CAAs are deployed and used.
Scheitle, Quirin, Taejoong Chung, Jens Hiller, Oliver Gasser, Johannes Naab, Roland van Rijswijk-Deij, Oliver Hohlfeld, et al. 2018. “A First Look at Certification Authority Authorization (CAA).” SIGCOMM Comput. Commun. Rev. 48 (2): 10–23. https://doi.org/10.1145/3213232.3213235.
In this paper, a group of researchers put the CAA system to the test to see just how reliable the information is. In their first test, they attempted to request certificates that would cause the issuer to issue invalid certificates in some way; they found that many certificate providers will, in fact, issue such invalid certificates for various reasons. For instance, in one case, they discovered a defect in the provider’s software that allowed their automated system to issue invalid certificates.
In their second test, they examined the results of DNS queries to determine if DNS operators were supporting and returning CAA records. They discovered that very few certificate authorities deploy security controls on CAA lookups, leaving open the possibility of the lookups themselves being hijacked. Finally, they examined the deployment of CAA in the wild by web site operators. They found CAA is not widely deployed, with CAA records covering around 40,000 domains. DNSSEC and CAA deployment generally overlap, pointing to a small section of the global ‘net that is concerned about the security of their web sites.
Overall, the results of this study were not heartening for the overall security of the ‘net. While the HTTP based mechanism of discovering a server’s certificate is being deprecated, not many domains have started deploying the CAA infrastructure to replace it—in fact, only a small number of DNS providers support users entering a CAA record into their domain records.



Research: Measuring IP Liveness
Russ — Mon, 12 Nov 2018 18:00:50 +0000
Of the 4.2 billion IPv4 addresses available in the global space, how many are used—or rather, how many are “alive?” Given the increasing usage of IPv6, it might seem this is an unimportant question. Answering the question, however, resolves to another question that is actually more important: how can you determine whether or not an IP address is in use? This question might seem easy to answer: ping every address in the address space. This, however, turns out to be the wrong answer.
Scanning the Internet for Liveness. SIGCOMM Comput. Commun. Rev. 48, 2 (May 2018), 2-9. DOI: https://doi.org/10.1145/3213232.3213234
This answer is wrong because a substantial number of systems do not respond to ICMP requests. According to this paper, in fact, some 16% of the hosts they discovered that would respond to a TCP SYN, and another 2% that would respond to a UDP packet shaped to connect to a service, do not respond to ICMP requests. There are a number of possible reasons for this situation, including hosts being placed behind devices that block ICMP packets, hosts being configured not to respond to ICMP requests, or a server sitting behind a PAT or CGNAT device that only passes through service requests rather than all packets. 
The paper begins by building a taxonomy of liveness, describing the process they use to determine if an address is in use or not, as shown in the image replicated from the paper. 


One problem of note is that address usage can shift over time; between trying to use ICMP and a TCP SYN to determine if an IP address is in use, the device connected to that address can change. To limit the impact of this problem, the researchers sent each kind of liveness test to the same address close together in time. The authors then attempt to cross reference the liveness indicated using different techniques to an overall view of liveness for a particular address.
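The cross-referencing step can be sketched as follows. This is a minimal Python illustration, not the authors' tooling: ProbeResult, summarize_liveness, and missed_by_ping are hypothetical names, the addresses are documentation addresses, and the probe results are made up to show the shape of the computation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProbeResult:
    address: str
    protocol: str    # "icmp", "tcp", or "udp"
    responded: bool

def summarize_liveness(results):
    """Map each probed address to the set of protocols it answered."""
    by_address = {}
    for r in results:
        answered = by_address.setdefault(r.address, set())
        if r.responded:
            answered.add(r.protocol)
    return by_address

def missed_by_ping(summary):
    """Addresses that are alive by some probe but silent to ICMP."""
    return sorted(addr for addr, protos in summary.items()
                  if protos and "icmp" not in protos)

results = [
    ProbeResult("192.0.2.1", "icmp", True),
    ProbeResult("192.0.2.1", "tcp", True),
    ProbeResult("192.0.2.2", "icmp", False),
    ProbeResult("192.0.2.2", "tcp", True),
    ProbeResult("192.0.2.3", "icmp", False),
    ProbeResult("192.0.2.3", "udp", False),
]

summary = summarize_liveness(results)
print(missed_by_ping(summary))  # ['192.0.2.2']
```

An address counts as alive if any probe type gets an answer; the second function picks out the hosts a ping-only scan would have missed.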
The research resulted in a number of interesting observations, such as the 16% of hosts that respond to TCP SYN probes on some port, but do not respond to ICMP requests. The kinds of ICMP and TCP responses were also quite interesting; many TCP implementations do not seem compliant with the TCP specification in how they respond to a SYN request.
Along the way, the authors added new capabilities to ZMap which allow them to perform these measurements. The tool they used has a web based frontend, and can be accessed here. 
The results are interesting for network operators because they indicate the kinds of work required to find all the devices attached to a network using IP addresses—a mass ping utility is simply not enough. The tools developed here, and the lessons learned, can be added to the set of tools used by operators in all networks to better understand their IP address usage, and the shape of their networks.



BGP Hijacks: Two more papers consider the problem
Russ — Mon, 05 Nov 2018 18:00:32 +0000
The security of the global Default Free Zone (DFZ) has been a topic of much debate and concern for the last twenty years (or more). Two recent papers have brought this issue to the surface once again—it is worth looking at what these two papers add to the mix of what is known, and what solutions might be available. The first of these—
Demchak, Chris, and Yuval Shavitt. 2018. “China’s Maxim – Leave No Access Point Unexploited: The Hidden Story of China Telecom’s BGP Hijacking.” Military Cyber Affairs 3 (1). https://doi.org/10.5038/2378-0789.3.1.1050.
—traces the impact of Chinese “state actor” effects on BGP routing in recent years. 
cross posted to CircleID
Whether these are actual attacks, or mistakes from human error for various reasons generally cannot be known, but the potential, at least, for serious damage to companies and institutions relying on the DFZ is hard to overestimate. This paper lays out the basic problem, and then works through a number of BGP hijacks in recent years, showing how they misdirected traffic in ways that could have facilitated attacks, whether by mistake or intentionally. For instance, quoting from the paper—

Starting from February 2016 and for about 6 months, routes from Canada to Korean government sites were hijacked by China Telecom and routed through China.
On October 2016, traffic from several locations in the USA to a large Anglo-American bank headquarters in Milan, Italy was hijacked by China Telecom to China.
Traffic from Sweden and Norway to the Japanese network of a large American news organization was hijacked to China for about 6 weeks in April/May 2017.

What impact could such a traffic redirection have? If you can control the path of traffic while a TLS or SSL session is being set up, you can place your server in the middle as an observer. This can, in many situations, be avoided if DNSSEC is deployed to ensure the certificates used in setting up the TLS session are valid, but DNSSEC is not widely deployed, either. Another option is to simply gather encrypted traffic and either attempt to break the key, or use data analytics to understand what the flow is doing (a side channel attack).
What can be done about these kinds of problems? The “simplest”—and most naïve—answer is “let’s just secure BGP.” There are many, many problems with this solution. Some of them are highlighted in the second paper under review—
Bonaventure, Olivier. n.d. “A Survey among Network Operators on BGP Prefix Hijacking – Computer Communication Review.” Accessed November 3, 2018. https://ccronline.sigcomm.org/2018/ccr-january-2018/a-survey-among-network-operators-on-bgp-prefix-hijacking/.
—which illustrates the objections providers have to the many forms of BGP security that have been proposed to this point. The first is, of course, that it is expensive. The ROI of the systems proposed thus far is very low; the cost is high, and the benefit to the individual provider is rather low. There is both a race to perfection problem here, as well as a tragedy of the commons problem. The race to perfection problem is this: we will not design, nor push for the deployment of, any system which does not “solve the problem entirely.” This has been the mantra behind BGPSEC, for instance. But not only is BGPSEC expensive—I would say to the point of being impossible to deploy—it is also not perfect.
The second problem in the ROI space is the tragedy of the commons. I cannot do much to prevent other people from misusing my routes. All I can really do is stop myself and my neighbors from misusing other people’s routes. What incentive do I have to try to make the routing in my neighborhood better? The hope that everyone else will do the same. Thus, the only way to maintain the commons of the DFZ is for everyone to work together for the common good. This is difficult. Worse than herding cats.
A second point—not well understood in the security world—is this: a core point of DFZ routing is that when you hand your reachability information to someone else, you lose control over that reachability information. There have been a number of proposals to “solve” this problem, but it is a basic fact that if you cannot control the path traffic takes through your network, then you have no control over the profitability of your network. This tension can be seen in the results of the survey above. People want security, but they do not want to release the information needed to make security happen. Both realities are perfectly rational!
Part of the problem with the “more strict,” and hence (considered) “more perfect” security mechanisms proposed is simply this: they are not quiet enough. They expose far too much information. Even systems designed to prevent information leakage ultimately leak too much.
So… what do real solutions on the ground look like?
One option is for everyone to encrypt all traffic, all the time. This is a point of debate, however, as it also damages the ability of providers to optimize their networks. One point where the plumbing allegory for networking breaks down is this: all bits of water are the same. Not all bits on the wire are the same.
Another option is to rely less on the DFZ. We already seem to be heading in this direction, if Geoff Huston and other researchers are right. Is this a good thing, or a bad one? It is hard to tell from this angle, but a lot of people think it is a bad thing.
Perhaps we should revisit some of the proposed BGP security solutions, reshaping some of them into something that is more realistic and deployable? Perhaps—but the community is going to have to let go of the “but it’s not perfect” line of thinking, and start developing some practical, deployable solutions that don’t leak so much information.
Finally, there is a solution Leslie Daigle and I have been tilting at for a couple of years now. Finding a way to build a set of open source tools that will allow any operator or provider to quickly and cheaply build an internal system to check the routing information available in their neighborhood on the ‘net, and mix local policy with that information to do some bare bones work to make their neighborhood a little cleaner. This is a lot harder than “just build some software” for various reasons; the work is often difficult—as Leslie says, it is largely a matter of herding cats, rather than inventing new things.



Ossification and Fragmentation: The Once and Future ‘net
Russ — Mon, 29 Oct 2018 17:00:26 +0000
Mostafa Ammar, out of Georgia Tech (not my alma mater, but many of my engineering family are alumni there), recently posted an interesting paper titled The Service-Infrastructure Cycle, Ossification, and the Fragmentation of the Internet. I have argued elsewhere that we are seeing the fragmentation of the global Internet into multiple smaller pieces, primarily based on the centralization of content hosting combined with the rational economic decisions of the large-scale hosting services. The paper in hand takes a slightly different path to reach the same conclusion.
cross posted at CircleID
TL;DR

Networks are built based on a cycle of infrastructure modifications to support services
When new services are added, pressure builds to redesign the network to support these new services
Networks can ossify over time so they cannot be easily modified to support new services
This causes pressure, and eventually a more radical change, such as the fracturing of the network


 

The author begins by noting networks are designed to provide a set of services. Each design paradigm not only supports the services it was designed for, but also allows for some headroom, which allows users to deploy new, unanticipated services. Over time, as newer services are deployed, the requirements on the network change enough that the network must be redesigned.

This cycle, the service-infrastructure cycle, relies on a well-known process of deploying something that is “good enough,” which allows early feedback on what does and does not work, followed by quick refinement until the protocols and general design can support the services placed on the network. As an example, the author cites the deployment of unicast routing protocols. He marks the beginning of this process as 1962, when Prosser was first deployed, and the end as 1995, when BGPv4 was deployed. Across this time routing protocols were invented, deployed, and revised rapidly. Since around 1995, however (a period of over 20 years at this point), routing has not changed all that much. So there were around 35 years of rapid development, followed by what is now over 20 years of stability in the routing realm.
Ossification, for those not familiar with the term, is a form of hardening. Petrified wood is an ossified form of wood. An interesting property of petrified wood is that it is fragile; if you pound a piece of “natural” wood with a hammer, it dents, but does not shatter. Petrified, or ossified, wood shatters, like glass.
Multicast routing is held up as an opposite example. Based on experience with unicast routing, the designers of multicast attempted to “anticipate” the use cases, such that early iterations were clumsy, and failed to attain the kinds of deployment required to get the cycle of infrastructure and services started. Hence multicast routing has largely failed. In other words, multicast ossified too soon; the cycle of experience and experiment was cut short by the designers trying to anticipate use cases, rather than allowing them to grow over time.
Some further examples might be:

IETF drafts and RFCs were once short, and used few technical terms, in the sense of a term defined explicitly within the context of the RFC or system. Today RFCs are veritable books, and require a small dictionary to read.
BGP security, which is mentioned by the author as a victim of ossification, is actually another example of early ossification destroying the experiment/enhancement cycle. Early on, a group of researchers devised the “perfect” BGP security system (which is actually by no means perfect—it causes as many security problems as it resolves), and refused to budge once “perfection” had been reached. For the last twenty years, BGP security has not notably improved; the cycle of trying and changing things has been stopped this entire time.

There are weaknesses in this argument as well. It can be argued that widespread multicast failed because the content just wasn’t there when multicast was first considered; in fact, multicast content still is not what people really want. The first “killer app” for multicast was replacing broadcast television over the Internet. What has developed instead is video on demand; multicast is just not compelling when everyone is watching something different whenever they want to.
The solution to this problem is novel: break the Internet up. Or rather, allow it to break up. The creation of a single network from many networks was a major milestone in the world of networking, allowing the open creation of new applications. If the Internet were not ossified through business relationships and the impossibility of making major changes in the protocols and infrastructure, it would be possible to undertake radical changes to support new challenges.
The new challenges offered include IoT, the need for content providers to have greater control over the quality of data transmission, and the unique service demands of new applications, particularly gaming. The result has been the flattening of the Internet, followed by the emergence of bypass networks—ultimately leading to the fragmentation of the Internet into many different networks.
Is the author correct? It seems the Internet is, in fact, becoming a group of networks loosely connected through IXPs and some transit providers. What will the impact be on network engineers? One likely result is deeper specialization in sets of technologies—the “enterprise/provider” divide that had almost disappeared in the last ten years may well show up as a divide between different kinds of providers. For operators who run a network that indirectly supports some other business goal (what we might call “enterprise”), the result will be a wide array of different ways of thinking about networks, and an expansion of technologies.
But one lesson engineers can certainly take away is this: the concept of agile must reach beyond the coding realm, and into the networking realm. There must be room “built in” to experiment, deploy, and enhance technologies over time. This means accepting and managing risk rather than avoiding it, and having a deeper understanding of how networks work and why they work that way, rather than the blind focus on configuration and deployment we currently teach.



Research: Tail Attacks on Web Applications
Russ — Wed, 12 Sep 2018 17:00:26 +0000
When you think of a Distributed Denial of Service (DDoS) attack, you probably think about an attack which overflows the bandwidth available on a single link, or one which overflows the number of half open TCP sessions a device can have open at once, preventing the device from accepting more sessions. In all cases, a DoS or DDoS attack will involve a lot of traffic being pushed at a single device, or across a single link.
TL;DR

Denial of service attacks do not always require high volumes of traffic
An intelligent attacker can exploit the long tail of service queues deep in a web application to bring the service down
These kinds of attacks would be very difficult to detect


 
But if you look at an entire system, there are a lot of places where resources are scarce, and hence are places where resources could be consumed in a way that prevents services from operating correctly. Such attacks would not need to be distributed, because they could take much less traffic than is traditionally required to deny a service. These kinds of attacks are called tail attacks, because they attack the long tail of resource pools, where these pools are much thinner, and hence much easier to attack.
There are two probable reasons these kinds of attacks are not often seen in the wild. First, they require an in-depth knowledge of the system under attack. Most of these long tail attacks will take advantage of the interaction surface between two subsystems within the larger system. Each of these interaction surfaces can also be attack surfaces if an attacker can figure out how to access and take advantage of them. Second, these kinds of attacks are difficult to detect, because they do not require large amounts of traffic, or other unusual traffic flows, to launch.
The paper under review today, Tail Attacks on Web Applications, discusses a model for understanding and creating tail attacks in a multi-tier web application—the kind commonly used for any large-scale frontend service, such as ecommerce and social media.
Huasong Shan, Qingyang Wang, and Calton Pu. 2017. Tail Attacks on Web Applications. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS ’17). ACM, New York, NY, USA, 1725-1739. DOI: https://doi.org/10.1145/3133956.3133968
The figure below illustrates a basic service of this kind for those who are not familiar with it.

The typical application at scale will have at least three stages. The first stage will terminate the user’s session and render content; this is normally some form of modified web server. The second stage will gather information from various backend services (generally microservices), and pass the information required to build the page or portal to the rendering engine. The microservices, in turn, build individual parts of the page, and rely on various storage and other services to supply the information needed.
If you can find some way to clog up the queue at one of the storage nodes, you can cause every other service along the information path to wait on the prior service to fulfill its part of the job in hand. This can cause a cascading effect through the system, where a single node struggling because of full queues can cause an entire set of dependent nodes to become effectively unavailable, cascading to a larger set of nodes in the next layer up. For instance, in the network illustrated, if an attacker can somehow cause the queues at storage service 1 to fill up, even for a moment, this can cascade into a backlog of work at services 1 and 2, cascading into a backlog at the front-end service, ultimately slowing—or even shutting—the entire service down. The queues at storage service 1 may be the same size as every other queue in the system (although they are likely smaller, as they face internal, rather than external, services), but storage system 1 may be servicing many hundreds, perhaps thousands, of copies of services 1 and 2.
The queues at storage service 1, and all the other storage services in the system, represent a hidden bottleneck in the overall system. If an attacker can, for a few moments at a time, cause these internal, intra-application queues to fill up, the overall service can be made to slow down to the point of being almost unusable.
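To make the cascade concrete, here is a minimal sketch in Python. The dependency map is hypothetical (the wiring is an assumption made for illustration, not the topology from the paper); the function simply walks the map backwards to find every service that ends up waiting when one node stops draining its queue.

```python
from collections import deque

# Hypothetical topology: each service waits on the services listed for it.
# The names echo the text above; the exact wiring is an assumption.
DEPENDS_ON = {
    "front-end": ["service 1", "service 2", "service 3"],
    "service 1": ["storage 1"],
    "service 2": ["storage 1", "storage 2"],
    "service 3": ["storage 3"],
}


def stalled_by(node, depends_on):
    """Return every service left waiting when `node` stops draining its queue."""
    # Invert the map so we can ask "who calls this node?"
    callers = {}
    for caller, callees in depends_on.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)

    # Breadth-first walk up the call path from the stalled node.
    stalled, frontier = set(), deque([node])
    while frontier:
        current = frontier.popleft()
        for caller in callers.get(current, []):
            if caller not in stalled:
                stalled.add(caller)
                frontier.append(caller)
    return stalled


print(sorted(stalled_by("storage 1", DEPENDS_ON)))
print(sorted(stalled_by("storage 3", DEPENDS_ON)))
```

In this made-up layout, a stall at storage 1 reaches services 1 and 2 and then the front end, which is the pattern the paragraph above describes.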
How plausible is this kind of attack? The researchers modeled a three-stage system (most production systems have more than three stages) and examined the total queue path through the system. By examining the queue depths at each stage, they devised a way to fill the queues at the first stage in the system by sending millibursts of valid session requests to the rendering engine, or the user-facing piece of the application. Even if these millibursts are spread out across the edge of the application, so long as they are all the same kind of requests, and timed correctly, they can bring the entire system down. In the paper, the researchers go further and show that once you understand the architecture of one such system, it is possible to try different millibursts on a running system, causing the same DoS effect.
This kind of attack, because it is built out of legitimate traffic, and it can be spread across the entire public facing edge of an application, would be nearly impossible to detect or counter at the network edge. One possible counter to this kind of attack would be increasing capacity in the deeper stages of the application. This countermeasure could be expensive, as the data must be stored on a larger number of servers. Further, data synchronized across multiple systems will be subject to CAP limitations, which will ultimately limit the speed at which the application can run anyway. Operators could also consider fine grained monitoring, which increases the amount of telemetry that must be recovered from the network and processed, another form of monetary tradeoff.
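A toy model helps show why short bursts matter more than average volume. The sketch below is not the model from the paper; it is a single bounded FIFO queue with a fixed drain rate, and every parameter (rates, capacity, burst size, period) is an assumption chosen only for illustration.

```python
def simulate(ticks, burst_size=0, period=100, legit_rate=8,
             service_rate=10, capacity=100):
    """Single bounded FIFO queue, one tick at a time.

    Returns (delayed_fraction, dropped_fraction, attack_share), where the
    first two are fractions of legitimate requests and attack_share is the
    attacker's share of all requests sent. All parameters are illustrative.
    """
    queue_len = 0
    legit_total = delayed = dropped = attack_total = 0
    for tick in range(ticks):
        # Attack traffic: one short burst at the start of every period.
        if burst_size and tick % period == 0:
            attack_total += burst_size
            queue_len = min(capacity, queue_len + burst_size)
        # Legitimate requests join behind whatever is already queued.
        for _ in range(legit_rate):
            legit_total += 1
            if queue_len >= capacity:
                dropped += 1          # queue full, request is lost
            else:
                if queue_len >= service_rate:
                    delayed += 1      # cannot be served within this tick
                queue_len += 1
        # The server drains a fixed number of requests per tick.
        queue_len = max(0, queue_len - service_rate)
    attack_share = attack_total / (attack_total + legit_total)
    return delayed / legit_total, dropped / legit_total, attack_share


print("no attack: ", simulate(1000))
print("millibursts:", simulate(1000, burst_size=150))
```

With these made-up numbers the attacker sends well under a fifth of the total requests, and the average load stays below the drain rate, yet a large share of legitimate requests is pushed back by at least one tick because each burst takes a long time to drain.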
 



Research: DNSSEC in the Wild
Russ — Wed, 05 Sep 2018 17:00:43 +0000
The DNS system is, unfortunately, rife with holes like Swiss cheese; man-in-the-middle attacks can easily negate the operation of TLS and web site security. To resolve these problems, the IETF and the DNS community standardized a set of extensions to cryptographically sign all DNS records. These signatures rely on public/private key pairs that are transitively signed (forming a signature chain) from individual subdomains through the Top Level Domain (TLD). Now that these standards are in place, how heavily is DNSSEC being used in the wild? How much safer are we from man-in-the-middle attacks against TLS and other transport encryption mechanisms?
TL;DR

DNSSEC is enabled on most top level domains
However, DNSSEC is not widely used or deployed beyond these TLDs


 
Crossposted at CircleID
Three researchers published an article in Winter ;login; describing their research into answering this question (membership and login required to read the original article). The result? While more than 90% of the TLDs in DNS are DNSSEC enabled, DNSSEC is still not widely deployed or used. To make matters worse, where it is deployed, it isn’t well deployed. The article mentions two specific problems that appear to plague DNSSEC implementations.
First, on the server side, a number of domains deploy either weak or expired keys. An easily compromised key is often worse than having no key at all; there is no way to tell the difference between a key that has or has not been compromised. A weak key that has been compromised does not just impact the domain in question, either. If the weakly protected domain has subdomains, or its key is used to validate other domains in any way, the entire chain of trust through the weak key is compromised. Beyond this, there is a threshold over which a system cannot pass without the entire system, itself, losing the trust of its users. If 30% of the keys returned in DNS are compromised, for instance, most users would probably stop trusting any DNSSEC signed information. While expired keys are more obvious than weak keys, relying on expired keys still works against user trust in the system.
Second, DNSSEC is complex. The net result of a complex protocol combined with low deployment and demand on the server side is poor client implementations. Many implementations, according to the research in this paper, simply ignore failures in the certification validation process. Some of the key findings of the paper are:
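The "weakest link" point can be sketched in a few lines of Python. This is an illustration only: the `ZoneKey` record, the `weak` flag, and the zone names are hypothetical stand-ins, not DNSSEC data structures, and no real cryptographic validation is performed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ZoneKey:
    zone: str       # zone the key signs for (hypothetical names below)
    weak: bool      # stand-in for "easily compromised"; not a real DNSSEC field
    expires: date


def split_chain(chain, today):
    """Split a parent-to-child key chain into (trusted, untrusted) zone names.

    Everything at or below the first weak or expired key is untrusted,
    because trust in each child rests on the key above it.
    """
    for index, key in enumerate(chain):
        if key.weak or key.expires < today:
            return ([k.zone for k in chain[:index]],
                    [k.zone for k in chain[index:]])
    return [k.zone for k in chain], []


chain = [
    ZoneKey("example", weak=False, expires=date(2030, 1, 1)),
    ZoneKey("corp.example", weak=True, expires=date(2030, 1, 1)),
    ZoneKey("www.corp.example", weak=False, expires=date(2030, 1, 1)),
]
trusted, untrusted = split_chain(chain, today=date(2018, 9, 5))
print("trusted:  ", trusted)
print("untrusted:", untrusted)
```

Note that the lowest zone in this example has a perfectly good key of its own; it is untrusted only because the key above it is weak.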

One-third of the DNSSEC enabled domains produce responses that cannot be validated
While TLD operators widely support DNSSEC, registrars who run authoritative servers rarely support DNSSEC; thus the chain of trust often fails at the first hop in the resolution process beyond the TLD
Only 12% of the resolvers that request DNSSEC records in the query process validate them

To discover the deployment of DNSSEC, the researchers built an authoritative DNS server and a web server to host a few files. They configured subdomains on the authoritative server; some subdomains were configured correctly, while others were configured incorrectly (a certificate was missing, expired, malformed, etc.). By examining DNS requests for the subdomains they configured, they could determine which DNS resolvers were using the included DNSSEC information, and which were not.
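One way the two logs might be combined is sketched below in Python. The record layout, the field names, and the decision rule are all assumptions made for illustration; the article does not spell out the researchers' exact method. The idea assumed here is that a validating resolver should refuse to answer for a deliberately broken subdomain, so a file hosted there should never be fetched through it.

```python
from collections import defaultdict


def classify_resolvers(observations):
    """Group hypothetical log rows by resolver and label each resolver.

    Each row is a dict with these assumed fields:
      resolver          address of the resolver seen at the authoritative server
      signed_correctly  True if the queried subdomain was configured correctly
      requested_dnssec  True if the query asked for DNSSEC records
      file_fetched      True if the file on that subdomain was then fetched
    """
    by_resolver = defaultdict(list)
    for row in observations:
        by_resolver[row["resolver"]].append(row)

    labels = {}
    for resolver, rows in by_resolver.items():
        broken = [r for r in rows if not r["signed_correctly"]]
        if not any(r["requested_dnssec"] for r in rows):
            labels[resolver] = "does not request DNSSEC"
        elif not broken:
            labels[resolver] = "unknown"  # never tested against a broken zone
        elif any(r["file_fetched"] for r in broken):
            labels[resolver] = "requests but does not validate"
        else:
            labels[resolver] = "validates"
    return labels


# Made-up observations using documentation address space.
rows = [
    {"resolver": "192.0.2.1", "signed_correctly": False,
     "requested_dnssec": True, "file_fetched": True},
    {"resolver": "192.0.2.2", "signed_correctly": False,
     "requested_dnssec": True, "file_fetched": False},
    {"resolver": "192.0.2.2", "signed_correctly": True,
     "requested_dnssec": True, "file_fetched": True},
    {"resolver": "192.0.2.3", "signed_correctly": False,
     "requested_dnssec": False, "file_fetched": True},
]
labels = classify_resolvers(rows)
for resolver, label in sorted(labels.items()):
    print(resolver, "->", label)
```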
Based on their results, the authors of this paper make some specific recommendations, such as enabling DNSSEC on all resolvers, including the recursive servers your company probably operates for internal and external use. Owners of domain names should also ask their registrars to support DNSSEC on their authoritative servers.
Ultimately, it is up to the community of operators and users to make DNSSEC a reality in the ‘net.