Weekend Reads 071318: Nice to Haves

I had about four hours of highway driving yesterday. Even though I probably could’ve navigated it on my own, I opted to use Apple Maps, which is integrated with my car’s Apple CarPlay “infotainment center.” It was nice. It told me how many miles I had remaining and my expected time of arrival. But it wasn’t a life changer. @The Old Reader

More than ever before Internet users are now interacting with people living/working in other economies. And as a result of these interactions, there are an increasing number of ‘legal contracts’ (intentional or not). Internet policy researchers and academics debate about the changing landscape and the boundaries of the international and domestic laws, without conclusive agreements. —Yeseul Kim @APNIC

The plague that is Spectre continues to evolve and adapt, showing up in two new variants this week dubbed Spectre 1.1 and Spectre 1.2 that follow the original Spectre’s playbook while expanding on the ways they can do damage. —Curtis Franklin Jr. @Dark Reading

These vast routing events that are propagated globally already provide a hint that some ISPs do not set filters at all, or there are vastly malformed AS-SETs. We decided to measure the number of filters that were already bypassed by routing anomalies. To do so, we checked the way route leaks were propagated: if a route leak is received from a customer link and it does not belong to the customer cone then IRR filters were malformed. —Alexander Azimov @APNIC

Recently, a CEO of a roaring unicorn in Silicon Valley drew my attention to the following: “If you compare Amazon’s stock price over the recent years against the cost of housing and the rise of homelessness in Seattle, the progression is identical…” —Frederic Filloux @MondayNote

Why do many problems in life seem to stubbornly stick around, no matter how hard people work to fix them? It turns out that a quirk in the way human brains process information means that when something becomes rare, we sometimes see it in more places than ever. —David Levari @The Conversation

Two web-based attacks against IoT devices made the rounds this week. Researchers Craig Young and Brannon Dorsey showed that a well known attack technique called “DNS rebinding” can be used to control your smart thermostat, detect your home address or extract unique identifiers from your IoT devices. —Gunes Acar

Recent BGP Peering Enhancements

BGP is one of the foundational protocols that make the Internet “go;” as such, it is a complex, intertwined system of different kinds of functionality bundled into a single set of TLVs, attributes, and other mechanisms. Because it is so widely used, BGP tends to gain new capabilities on a regular basis, making the Interdomain Routing (IDR) working group in the Internet Engineering Task Force (IETF) consistently one of the busiest, and hence one of the hardest to keep up with. In this post, I’m going to spend a little time on one area where a lot of work has been taking place: the building and maintenance of peering relationships between BGP speakers.

The first draft to consider is Mitigating the Negative Impact of Maintenance through BGP Session Culling, which is a draft in an operations working group, rather than the IDR working group, and does not make any changes to the operation of BGP. Rather, this draft considers how BGP sessions should be torn down so traffic is properly drained and the peering shutdown has the smallest possible effect. The normal way of shutting down a link for maintenance would be for administrators to shut down BGP on the link, wait for traffic to subside, and then take the link down for maintenance. However, many operators simply do not have the time or capability to undertake scheduled shutdowns of BGP speakers. To resolve this problem, graceful shutdown capability was added to BGP in RFC8326. Not all implementations support graceful shutdown, however, so this draft suggests an alternate way to shut down BGP sessions, allowing traffic to drain, before a link is shut down: use filtering on the link itself to block BGP traffic, which will cause any existing BGP sessions across the link to fail once their hold timers expire. Once these sessions have failed, traffic will drain off the link, allowing it to be safely shut down for maintenance. The draft discusses various timing issues in using this technique to reduce the impact of link removal due to maintenance (or other reasons).
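The two pieces of the culling technique, a filter that drops only BGP on the link and a wait long enough for the sessions to time out and traffic to drain, can be sketched as below. This is a minimal sketch under stated assumptions, not the draft’s procedure: the `Packet` type, the helper names, and the drain margin are hypothetical, and the 90-second hold time in the example is only a commonly used value, since the real figure is whatever each session negotiated.

```python
from dataclasses import dataclass

TCP = 6          # IP protocol number for TCP
BGP_PORT = 179   # BGP runs over TCP port 179


@dataclass(frozen=True)
class Packet:
    """Hypothetical, minimal view of a packet crossing the link."""
    ip_proto: int
    src_port: int
    dst_port: int


def culling_filter_drops(pkt: Packet) -> bool:
    """Drop BGP control traffic in either direction; forward everything else."""
    return pkt.ip_proto == TCP and BGP_PORT in (pkt.src_port, pkt.dst_port)


def earliest_maintenance_time(filter_applied_at: float,
                              hold_time: float,
                              drain_margin: float) -> float:
    """Worst-case time (in seconds) after which the link can be taken down.

    With keepalives blocked, each session fails no later than one hold time
    after the filter goes in; the drain margin (an assumed value) covers
    reconvergence and traffic moving onto other paths.
    """
    return filter_applied_at + hold_time + drain_margin


if __name__ == "__main__":
    print(culling_filter_drops(Packet(TCP, 51000, BGP_PORT)))  # True: BGP session traffic
    print(culling_filter_drops(Packet(TCP, 51000, 443)))       # False: customer traffic keeps flowing
    print(earliest_maintenance_time(0.0, hold_time=90.0, drain_margin=30.0))  # 120.0
```

A real filter would likely also match the peering addresses, so that multihop BGP sessions that merely cross the link are not cut; this sketch ignores addresses entirely.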

Graceful shutdown, itself, is also in line to receive some new capabilities through Extended BGP Administrative Shutdown Communication. This draft is rather short, as it simply allows an operator to send a short freeform text message along with the NOTIFICATION that shuts a BGP session down. This message can be printed on the console, or saved to syslog, to give an operator more information about why a particular BGP session has been shut down, whether it is coming back up again, how long the shutdown is expected to last, etc.
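For a sense of how small the change is on the wire, here is a sketch of building and reading such a message. It assumes the layout from RFC 8203, which the extended draft builds on: a Cease NOTIFICATION (error code 6) with the Administrative Shutdown or Administrative Reset subcode, whose data field is one length octet followed by a UTF-8 string. The 255-octet limit reflects the extended draft (RFC 8203 allowed 128 octets); the function names are hypothetical.

```python
import struct

MARKER = b"\xff" * 16        # BGP header marker: all ones
NOTIFICATION = 3             # BGP message type
CEASE = 6                    # NOTIFICATION error code
ADMIN_SHUTDOWN = 2           # Cease subcodes that may carry a message
ADMIN_RESET = 4
MAX_COMMUNICATION = 255      # octet limit in the extended draft


def shutdown_notification(message: str) -> bytes:
    """Build a Cease/Administrative Shutdown NOTIFICATION carrying a UTF-8 message."""
    text = message.encode("utf-8")
    if len(text) > MAX_COMMUNICATION:
        raise ValueError("shutdown communication is too long")
    # Error code, subcode, then the data field: one length octet plus the text.
    body = struct.pack("!BBB", CEASE, ADMIN_SHUTDOWN, len(text)) + text
    # Header: marker, total message length (2 octets), message type (1 octet).
    header = MARKER + struct.pack("!HB", len(MARKER) + 3 + len(body), NOTIFICATION)
    return header + body


def read_shutdown_communication(msg: bytes) -> str:
    """Recover the operator's message from such a NOTIFICATION."""
    length, msg_type = struct.unpack("!HB", msg[16:19])
    if length != len(msg) or msg_type != NOTIFICATION:
        raise ValueError("not a well-formed NOTIFICATION")
    if msg[19] != CEASE or msg[20] not in (ADMIN_SHUTDOWN, ADMIN_RESET):
        raise ValueError("no shutdown communication in this NOTIFICATION")
    text_len = msg[21]
    return msg[22:22 + text_len].decode("utf-8")
```

For example, `read_shutdown_communication(shutdown_notification("maintenance, back by 02:00 UTC"))` returns the original string, which is what the receiving speaker would log.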

Graceful Restart (GR) is a long-available feature in many BGP implementations that aims to prevent the disruption of traffic flow. Its original purpose was to handle a route processor restart in a router whose line cards could continue forwarding traffic based on their local forwarding tables (the FIB), including cases where one route processor fails and the router switches to a backup route processor in the same chassis. Over time, GR began to be applied to NOTIFICATION messages in BGP. For instance, if a BGP speaker receives a malformed message, it is required (by the BGP RFCs) to send a NOTIFICATION, which will cause the BGP session to be torn down and restarted. GR has been adapted to these situations, so traffic flow is either not impacted, or only minimally impacted, by the NOTIFICATION and session restart process. The same processing takes place when the BGP hold timer expires.
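The route-retention idea behind GR can be sketched as a small table: when the session drops, the helper keeps its routes (and so keeps forwarding) but marks them stale; routes the peer re-advertises become fresh again, and whatever is still stale when the peer signals it has finished resending (the End-of-RIB marker in RFC 4724) is removed. This is a simplified sketch, assuming a single address family and ignoring the restart and stale-path timers; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GracefulRestartRib:
    """Routes learned from one peer, as retained by the helper during GR (sketch)."""
    routes: dict[str, str] = field(default_factory=dict)  # prefix -> next hop
    stale: set[str] = field(default_factory=set)

    def session_lost(self) -> None:
        # Keep forwarding on the existing entries, but mark every one stale.
        self.stale = set(self.routes)

    def update(self, prefix: str, next_hop: str) -> None:
        # A route the peer (re)advertises is fresh.
        self.routes[prefix] = next_hop
        self.stale.discard(prefix)

    def end_of_rib(self) -> list[str]:
        # The peer has resent everything; anything still stale is gone.
        removed = sorted(self.stale)
        for prefix in removed:
            del self.routes[prefix]
        self.stale.clear()
        return removed
```

Between `session_lost()` and `end_of_rib()` every prefix stays in `routes`, which is the point: the FIB keeps forwarding while the session is re-established.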

The problem is that only one of the two speakers in a restarting pair will normally retain its local forwarding information. The sending speaker will normally flush its local routing tables, and with them its local forwarding tables, when it sends a BGP NOTIFICATION. Notification Message support for BGP Graceful Restart changes this processing, allowing both speakers to enter “receiving speaker” mode, so both retain their local forwarding information. A signal is also provided so the sending speaker can indicate, when needed, that the session should be hard reset rather than gracefully reset.
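The decision each speaker makes can be sketched as a single function that mirrors the behavior described above; it is not a full GR state machine. The parameter names are hypothetical, and the Hard Reset value (Cease subcode 9) is my understanding of the code point the draft registers.

```python
CEASE = 6        # NOTIFICATION error code
HARD_RESET = 9   # Cease subcode for "Hard Reset" (assumed code point)


def keeps_forwarding_state(is_sender: bool,
                           both_support_extension: bool,
                           error_code: int,
                           subcode: int) -> bool:
    """Does this speaker keep its forwarding entries after a NOTIFICATION?"""
    if both_support_extension:
        # Both speakers behave as the "receiving speaker" unless the sender
        # explicitly asked for a hard reset.
        hard_reset = error_code == CEASE and subcode == HARD_RESET
        return not hard_reset
    # Without the extension, only the receiving side retains its state;
    # the sender flushes its routing and forwarding tables.
    return not is_sender
```

For a malformed UPDATE (error code 3), the sender keeps its state only when both speakers support the extension; a Cease with the Hard Reset subcode makes both sides flush.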

Finally, BGP allows speakers to send a route with a next hop other than themselves; this is called a third party next hop, and is illustrated in the figure below.

In this network, router C’s best path to 2001:db8:3e8:100::/64 might be through A, but the operator may prefer this traffic pass through B. While it is possible to change the preferences so C chooses the path through B, there are some situations where it is better for A to advertise B as the next hop towards the destination (for instance, a route server would not normally advertise itself as the next hop towards a destination). The problem is that B might not have the same capabilities as A. If B, for instance, cannot forward IPv6 traffic, the situation shown in the illustration would clearly not work.

To resolve this, BGP Next-Hop Dependent Capabilities allows a speaker to advertise the capabilities of these alternate next hops to its BGP peers.
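The check this enables on C can be sketched as follows: before using a third-party next hop, C confirms that the advertiser has vouched for the capability the route needs, and otherwise falls back to a less preferred path. This is a sketch of the idea only; the draft defines its own attribute encoding, which is not reproduced here, so the `Route` fields, the capability strings, and the fallback policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    advertised_by: str
    # Capabilities the advertiser claims for a third-party next hop
    # (hypothetical encoding: a set of strings).
    next_hop_caps: frozenset = frozenset()


def next_hop_usable(route: Route, required: str) -> bool:
    if route.next_hop == route.advertised_by:
        # Not a third-party next hop: assume the capabilities negotiated on
        # the session with the advertiser already cover it.
        return True
    return required in route.next_hop_caps


def pick_route(candidates: list[Route], required: str) -> Optional[Route]:
    """Return the most preferred route whose next hop can actually forward."""
    for route in candidates:  # candidates are assumed sorted best-first
        if next_hop_usable(route, required):
            return route
    return None
```

Using the names from the figure, a route for 2001:db8:3e8:100::/64 with next hop B, advertised by A with only an IPv4 capability for B, is skipped when IPv6 forwarding is required, and C falls back to the path through A.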

Complexity Sells

According to Roman philosophers, simplicity is the hallmark of truth. And yet, networks have become ever more complex over time. Why is this? Because complexity sells. In this short take, I talk about why complexity sells, and some of the mental habits you can use to overcome our natural tendency to prefer the complex.

Weekend Reads 070618

Our security analysis of the mobile communication standard LTE (Long-Term Evolution, also known as 4G) on the data link layer (so called layer two) has uncovered three novel attack vectors that enable different attacks against the protocol. —David Rupprecht, Katharina Kohls, Thorsten Holz, and Christina Pöpper

To be fair, the tech sector has been the United States’ economic pride and joy in recent decades, a seemingly endless wellspring of innovation. The speed and power of Google’s search engine is breathtaking, putting extraordinary knowledge at our fingertips. Internet telephony allows friends, relatives, and co-workers to interact face to face from halfway around the world, at very modest cost. Yet, despite all this innovation, the pace of productivity growth in the broader economy remains lackluster. —Kenneth Rogoff @MarketWatch

The world of scholarly communication is broken. Giant, corporate publishers with racketeering business practices and profit margins that exceed Apple’s treat life-saving research as a private commodity to be sold at exorbitant profits. Only around 25 per cent of the global corpus of research knowledge is ‘open access’, or accessible to the public for free and without subscription, which is a real impediment to resolving major problems, such as the United Nations’ Sustainable Development Goals. —John Tennant @Intellectual Takeout

It’s become increasingly impossible to talk about spectrum policy without getting into the fight over whether 5G is a miracle technology that will end poverty, war and disease or an evil marketing scam by wireless carriers to extort concessions in exchange for magic beans. @Wetmachine

Research: P Fat Trees

Link speeds in data center fabrics continue to climb, with 10g, 25g, 40g, and 100g widely available, and 400g promised in just a few short years. What isn’t so obvious is how these higher speeds are being reached. A 100g link, for instance, is really four 25g links bundled as a single link at the physical layer. If the optics are increasing in speed, and the processors are increasing in their ability to switch traffic, why are these higher speed links being built this way? According to the paper under investigation today, the reason is the speed of the chips that serialize traffic onto, and deserialize traffic off of, the optical medium. Development of the complementary metal-oxide-semiconductor (CMOS) chips required to build ever faster optical interfaces seems to have stalled at around 25g, which means faster speeds must be achieved by bundling multiple lower speed links.

Mellette, William M., Alex C. Snoeren, and George Porter. “P-FatTree: A Multi-Channel Datacenter Network Topology.” In Proceedings of the 15th ACM Workshop on Hot Topics in Networks, 78–84. HotNets ’16. New York, NY, USA: ACM, 2016. https://doi.org/10.1145/3005745.3005746.
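The arithmetic behind the bundling argument is simple: if serializer/deserializer lanes top out at around 25g, a link of any higher speed needs speed ÷ 25 lanes, rounded up. A sketch, assuming a uniform 25g lane rate as the paper’s argument suggests; real optics vary (40g links, for example, are commonly four 10g lanes), so the function name and default are illustrative only.

```python
def serdes_lanes(link_gbps: int, lane_gbps: int = 25) -> int:
    """Number of SerDes lanes needed to build a link of the given speed."""
    return -(-link_gbps // lane_gbps)  # ceiling division on integers


if __name__ == "__main__":
    for speed in (25, 100, 400):
        print(f"{speed}g -> {serdes_lanes(speed)} lanes")
```

Under this assumption, 100g needs four lanes and 400g needs sixteen, which is why a single “link” at those speeds is really a bundle.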

The authors then point out that many data center operators have moved towards some form of chassis device in order to reduce the costs of cabling and optics. Chassis devices most often use some form of spine and leaf internally to switch traffic between the input and output ports across a short-run copper fabric, resulting in a switching path within the chassis router that looks something like the following figure.

The internal spine and leaf connecting the switching ASICs is one of the main reasons data center operators move away from chassis devices: the number of hops through the network becomes unstable with the addition of these internal fabrics, backpressure and quality of service are essentially unmanageable across the internal fabric on most devices, and there is little in the way of traffic analysis that can be done on it. The authors do not address these problems, however.

Rather, they address the added set of switching ASICs in the spine layer of the internal spine and leaf network. As it turns out, the switching ASICs themselves are major consumers of power, and major generators of heat, in switches. The authors argue that removing this internal spine layer would greatly reduce the amount of power required in a fabric, as well as the amount of heat generated.

To do this, they propose unbundling the links attached to each SerDes CMOS chip, exposing them as individual links to the control plane. This would allow the switching path to be shortened to something like the figure below.

Exposing the unbundled links to the external control plane allows each stage of the internal fabric to be treated as another hop in the network, and hence for “normal” ECMP to choose the path through the chassis fabric.
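“Normal” ECMP here means each router hashes a flow’s header fields and uses the result to pick one of several equal-cost next hops, so the packets of one flow stay on one path while different flows spread across all of them. A minimal sketch, assuming a five-tuple key; SHA-256 stands in for the much cheaper hash functions switching ASICs actually use (Python’s built-in `hash()` is avoided because it is randomized per process for strings), and the link names are made up.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def ecmp_next_hop(flow: FiveTuple, next_hops: list[str]) -> str:
    """Pick one equal-cost next hop per flow, deterministically."""
    key = "|".join(str(field) for field in flow).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]
```

With the internal links exposed, the links to each chassis-internal stage would simply appear in `next_hops`, and the same per-flow choice spreads traffic across them.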

The authors suggest the four unbundled links attached to a single switching ASIC can each be treated as a member of a different “switching plane,” which, in effect, creates four virtual topologies across the fabric, each carrying one quarter of the total fabric bandwidth. Each virtual topology could run its own control plane, producing four somewhat redundant networks, along with the ability to steer traffic onto any given plane at the edge of the network for traffic engineering, policy separation, or any other purpose. The result is a fabric that is more flexible in use, retains a fixed hop count, and reduces the ASIC count in the fabric by around one third.
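Steering at the edge could be as simple as a table mapping traffic classes to planes, with each plane carrying a quarter of a host’s bandwidth. A sketch, assuming an even four-way split; the traffic classes, the default plane, and the policy table are hypothetical, not something the paper specifies.

```python
PLANES = 4


def per_plane_gbps(host_link_gbps: float, planes: int = PLANES) -> float:
    """Bandwidth of each plane when a host link is split evenly across planes."""
    return host_link_gbps / planes


def plane_for(traffic_class: str, policy: dict[str, int], default_plane: int = 0) -> int:
    """Edge policy: map a traffic class onto one of the switching planes."""
    plane = policy.get(traffic_class, default_plane)
    if not 0 <= plane < PLANES:
        raise ValueError(f"plane {plane} does not exist")
    return plane
```

With a policy such as `{"storage": 1, "backup": 3}`, storage traffic lands on plane 1, anything unlisted stays on plane 0, and a 100g host link gives each plane 25g.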

This is an interesting concept, but it would require an entire fabric to be built this way from the ground up; there is little chance of a brownfield deployment of this kind of design. One tradeoff in this kind of design would be the additional control plane state, including assigning four addresses to each host (although this might be mitigated by the clever use of anycast), maintaining four control planes, etc. Another design tradeoff would be the shared risk link groups created by splitting a single optical fiber and ASIC into four circuits; these aren’t exactly “virtual circuits,” but they share many of the same characteristics.
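The shared-risk point can be made concrete: if all four lanes of a host’s connection ride one fiber (or one ASIC), the four planes are only independent until that component fails. A small sketch, assuming a simple mapping from each plane to the shared risk its lane depends on; the names are made up.

```python
def surviving_planes(lane_risk: dict[int, str], failed: set[str]) -> set[int]:
    """Planes a host can still reach after the given shared risks fail.

    lane_risk maps each plane to the shared risk (fiber, optic, ASIC)
    that the host's lane into that plane depends on.
    """
    return {plane for plane, risk in lane_risk.items() if risk not in failed}
```

If every lane maps to `"fiber-1"`, cutting that fiber leaves an empty set: four control planes, but one point of failure.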


May 2018