Weekend Reads 052518

Without adtech, the EU’s GDPR (General Data Protection Regulation) would never have happened. But the GDPR did happen, and as a result websites all over the world are suddenly posting notices about their changed privacy policies, use of cookies, and opt-in choices for “relevant” or “interest-based” (translation: tracking-based) advertising. Email lists are doing the same kinds of things. —Doc Searls @Doc Searls Weblog

A newly uncovered form of DDoS attack takes advantage of a well-known, yet still exploitable, security vulnerability in the Universal Plug and Play (UPnP) networking protocol to allow attackers to bypass common methods for detecting their actions. —Danny Palmer @ZDNet

Today, that’s coming in the form of imperceptible musical signals that can be used to take control of smart devices like Amazon’s Alexa or Apple’s Siri to unlock doors, send money, or any of the other things that we give these wicked machines the authority to do. That’s according to a New York Times report, which says researchers in China and the United States have proven that they’re able to “send hidden commands” to smart devices that are “undetectable to the human ear” simply by playing music. —Sam Barsanti @The A.V. Club

In a paper we recently presented at the Passive and Active Measurement Conference 2018 [PDF 652 KB], we analyzed the certificate ecosystem using CT logs. To perform this analysis we downloaded 600 million certificates from 30 CT logs. This vast certificate set gives us insight into the ecosystem itself and allows us to analyze various certificate characteristics. —Oliver Gasser @APNIC

With cybercrime skyrocketing over the past two decades, companies that do business online — whether retailers, banks, or insurance companies — have devoted increasing resources to improving security and combatting Internet fraud. But sophisticated fraudsters do not limit themselves to the online channel, and many organizations have been slow to adopt effective measures to mitigate the risk of fraud carried out through other channels, such as customer contact centers. In many ways, the phone channel has become the weak link. —Patrick Cox @Dark Reading

Just a few years after Bitcoin emerged, startups began racing to build ASICs for mining the currency. Nearly all of those companies have gone belly-up, however—except Bitmain. The company is estimated to control more than 70 percent of the market for Bitcoin-mining hardware. It also uses its hardware to mine bitcoins for itself. A lot of bitcoins: according to Blockchain.info, Bitmain-affiliated mining pools make up more than 40 percent of the computing power available for Bitcoin mining. —Mike Orcutt @Technology Review

Short Take: Security as a Tradeoff

We often treat security as an absolute: “that which must be done, and done perfectly, or is of no value at all.” It’s time to take this myth head on and think about how we should actually approach security.

The Network Collective: State of the Podcast

In this edition of the Network Collective, Eyvonne, Jordan, and I talk about where the ‘cast has been, and share some thoughts on where it is going. While we like technology as much as anyone else, the NC is really all about community.

In particular, we discuss the upcoming subscription service. We have a lot of new, exciting material being recorded exclusively for the subscription service, focused on the skills needed to be a better engineer. For instance, we’ve started a series on communication that does not take the standard line, but instead looks at how to communicate from the perspective of our experience living on every possible side of the network engineering world, and developing and delivering every possible kind of content. We also have our first Q&A guest lined up, and a lot of fantastic material from Rachel Traylor is already being recorded. All of this is designed to push your career forward in a way that includes technology, but goes beyond technical skills as well.

Research: Robustness in Complex Systems

While the network engineering world tends to use the word resilience to describe a system that can support rapid change in the real world, computer science often uses the word robustness. What makes a system robust or resilient? Ask a network engineer this question, and the most likely answer you will get is something like “there is no single point of failure.” This common answer, however, does not go far enough in describing resilience. For instance, it is at least sometimes the case that adding more redundancy to a network can actually harm the mean time to repair (MTTR). A simple example: adding more links in parallel can cause the control plane to converge more slowly; at some point, the increase in convergence time can offset the gain in path availability.
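To make this concrete, here is a back-of-the-envelope model in Python. It is purely my own illustration, not anything from the post or Gribble’s paper, and every constant in it is an invented assumption: each failure event is charged the convergence time, which grows with the number of parallel links, and this cost is weighed against the availability those links add.

```python
# Toy model: effective yearly downtime as parallel links are added.
# All constants are illustrative assumptions, not measured values.

LINK_AVAILABILITY = 0.999    # assumed availability of each link
FAILURES_PER_YEAR = 5        # assumed failure events per year
BASE_CONVERGE_SEC = 1.0      # assumed convergence time with one link
EXTRA_CONVERGE_SEC = 2.0     # assumed extra convergence time per added link
SECONDS_PER_YEAR = 365 * 24 * 3600

def effective_downtime(links: int) -> float:
    """Residual downtime when every parallel link is down, plus the
    convergence time paid on each failure event."""
    all_down = (1 - LINK_AVAILABILITY) ** links
    path_downtime = all_down * SECONDS_PER_YEAR
    converge_cost = FAILURES_PER_YEAR * (
        BASE_CONVERGE_SEC + EXTRA_CONVERGE_SEC * (links - 1))
    return path_downtime + converge_cost

for n in range(1, 6):
    print(f"{n} links: {effective_downtime(n):10.2f} sec/year")
```

With these invented numbers, downtime bottoms out at three links and then climbs again as convergence cost dominates. The exact figures are meaningless; the point is that the curve has a bottom.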

In other cases, automating the response to a change in the network can harm MTTR. For instance, we often nail a static route up and redistribute that, rather than redistributing live routing information between protocols. Experience shows that sometimes not reacting automatically is better than reacting automatically.

This post will look at a paper that examines robustness more deeply: “Robustness in Complex Systems,” by Steven Gribble. While this is an older paper—it was written in 2000—it remains a worthwhile read for its lessons in distributed system design. The paper is based on the deployment of a cluster-based Distributed Data Structure (DDS); the easiest way for readers to think of this is as a distributed database. Several problems discovered while building and deploying this DDS are considered, including—

  • A problem with garbage collection, which involved timeouts. The system was designed to allocate memory as needed to form and synchronize records. After a record had been synchronized, any unneeded memory would be released, but it would not be immediately reallocated by some other process. Rather, a garbage collection routine would coalesce memory into larger blocks where possible, rearranging items and placing memory back into available pools. This process depends on a timer. What the developers discovered is that their initial “guess” at a good timer value was ultimately an order of magnitude too small, causing some of the nodes to “fall behind” other nodes in their synchronization. Once a node fell behind, the other nodes in the system were required to “take up the slack,” causing them to fail at some point in the future. This kind of cascading failure, triggered by a simple timer setting, is common in distributed systems.
  • A problem with a leaky abstraction, where TCP behavior leaked up into the application. The system was designed to attempt to connect on TCP, and used fairly standard timeouts for building TCP connections. However, a firewall in the network was set to disallow inbound TCP sessions. Another process connecting on TCP had to pass through this firewall; the connection could never complete, which in turn caused the TCP stack on the nodes to block for 15 minutes (see the sketch after this list). This interaction of different components caused nodes to fall out of availability for long periods of time.
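One general defense against this class of failure is to bound connection attempts explicitly rather than trusting the stack’s defaults. The Python sketch below is a hedged illustration of that idea only; it is not the DDS code from the paper, and the three-second timeout is an arbitrary assumption.

```python
import socket

def connect_with_timeout(host: str, port: int, timeout_sec: float = 3.0):
    """Attempt a TCP connection, failing fast if the peer is filtered.

    A firewall that silently drops packets can leave a default connect
    hanging for minutes; an explicit timeout turns that hang into a
    quick, handleable error.
    """
    try:
        # create_connection applies the timeout to the connect itself
        return socket.create_connection((host, port), timeout=timeout_sec)
    except OSError:  # covers timeouts, refusals, and unreachable hosts
        # Let the caller mark the peer unavailable and retry later,
        # rather than blocking the node for the length of the outage.
        return None
```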

Gribble draws several lessons from these and other outages in the system.

First, he states that for a system to be truly robust, it must use some form of admission control. The load on the system, in other words, must somehow be controlled so the system cannot be given more work than it can support. This has been a contentious issue in network engineering. While circuit-switched networks can control the amount of work offered to the network (hence a Clos can be non-blocking in a circuit-switched network), admission control in a packet-switched network is almost impossible. The best you can do is some form of Quality of Service marking and dropping, such as traffic shaping or traffic policing, along the edge. This does highlight the importance of such controls, however.
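To make the edge-policing idea concrete, here is a minimal token-bucket policer in Python. This is a sketch under assumptions, not a real forwarding-path implementation: the rate and burst figures are invented, and `TokenBucket` is a name of my own choosing.

```python
import time

class TokenBucket:
    """Admit a packet only if enough tokens have accumulated;
    otherwise police it (drop or mark)."""

    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes      # start with a full bucket
        self.last = time.monotonic()

    def allow(self, packet_bytes: int) -> bool:
        now = time.monotonic()
        # refill for the elapsed time, capped at the burst size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False                   # over contract: police the packet

# Toy usage: roughly 1 Mbps with a 10 KB burst allowance.
bucket = TokenBucket(rate_bytes_per_sec=125_000, burst_bytes=10_000)
print(bucket.allow(1500))              # True while the burst lasts
```

The property worth noticing is that work beyond the contracted rate is refused at the edge, which is about as close to Gribble’s admission control as a packet-switched network usually gets.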

Second, he states that systems must be systematically overprovisioned. This comes back to the point about redundant links. The caution above, however, still applies; systematic overprovisioning needs to be balanced against other tools to build a robust system. Far too often, overprovisioning is treated as “the only tool in the toolbox.”

Third, he states introspection must be built into the system. The system must be designed to be monitorable from its inception. In network engineering, this kind of thinking is far too often reduced to “everything must be measurable.” This does not go far enough. Network engineers need to think about not only how to measure, but also what they expect normal to look like, and how to tell when “normal” is no longer “normal.” The system must also be designed within limits. Far too often, we just build “as large as possible” and run it to see what happens.
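As one hedged sketch of what knowing “normal” might mean in practice, the Python below keeps a rolling baseline of a metric and flags samples that stray several deviations from it. The window size, warm-up count, and threshold are all assumptions that would need tuning against a real network.

```python
from collections import deque
import statistics

class BaselineMonitor:
    """Track a rolling baseline of a metric and flag values that leave it."""

    def __init__(self, window: int = 60, threshold_sigmas: float = 3.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold_sigmas

    def observe(self, value: float) -> bool:
        """Return True if the value looks anomalous against the baseline."""
        anomalous = False
        if len(self.samples) >= 10:    # wait for some history first
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples) or 1e-9
            anomalous = abs(value - mean) > self.threshold * stdev
        self.samples.append(value)
        return anomalous

# Toy usage: steady samples establish "normal"; the spike is flagged.
mon = BaselineMonitor()
for sample in [42, 40, 43, 41, 44, 42, 40, 43, 41, 42, 95]:
    if mon.observe(sample):
        print(f"anomaly: {sample}")
```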

Fourth, Gribble says adaptivity must be provided through a closed control loop. This is what we see in routing protocols: the control plane reacts to topology changes in a specific way, or rather within a specific state machine. Learning this part of the network is a crucial, but often skimmed-over, part of network engineering.
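At its simplest, a closed loop means the next action is computed from the currently observed state rather than from a fixed schedule. The tiny Python sketch below shows just that shape; the gain, target, and queue-depth scenario are my own invented illustration, not anything from the paper or a real routing protocol.

```python
def control_loop(measure, actuate, target: float,
                 gain: float = 0.5, steps: int = 100):
    """Measure, compare to the target, apply a proportional
    correction, and repeat: the reaction is always driven by the
    observed state (closed loop), never by a precomputed plan."""
    for _ in range(steps):
        error = target - measure()
        actuate(gain * error)

# Toy usage: steer a queue depth toward 50.
state = {"queue": 90.0}
control_loop(
    measure=lambda: state["queue"],
    actuate=lambda delta: state.update(queue=state["queue"] + delta),
    target=50.0,
)
print(round(state["queue"], 2))  # settles at 50.0
```

A routing protocol’s state machine is a much more elaborate instance of the same loop: observe the topology, compare it with the current best paths, and adjust forwarding state until the two agree.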

This is an excellent paper, well worth reading for those who are interested in classic work around the area of robustness and distributed systems.

Weekend Reads 051818: Botnets and Throwhammer

The Facebook freak-out provides an outlet for fears regarding the digital environment we inhabit. A few companies control most channels of information. The gadgets that we use for convenience and entertainment also create the mechanisms for near-total surveillance, from tracking devices in our pockets to wiretaps in our homes—hi, Alexa! Someone besides Santa is watching and knows whether you have been naughty or nice. —Nathanael Blake @Public Discourse

Within just 10 days of the disclosure of two critical vulnerabilities in GPON routers, at least five botnet families have been found exploiting the flaws to build an army of a million devices. Security researchers from China-based cybersecurity firm Qihoo 360 Netlab have spotted five botnet families, including Mettle, Muhstik, Mirai, Hajime, and Satori, making use of the GPON exploit in the wild. —Swati Khandelwal @The Hacker News

Exploitation of the Rowhammer attack just got easier. Dubbed ‘Throwhammer,’ the newly discovered technique could allow attackers to launch a Rowhammer attack on targeted systems just by sending specially crafted packets to vulnerable network cards over the local area network. Known since 2012, Rowhammer is a severe issue with recent-generation dynamic random access memory (DRAM) chips, in which repeatedly accessing a row of memory can cause “bit flipping” in an adjacent row, allowing anyone to change the contents of computer memory. —Mohit Kumar @The Hacker News

