Papers

  • Science and Complexity

    Complexity theory, in many ways, has its origin in this classic paper from 1948.

    Local Copy
    Public Copy

  • On the criteria to be used in decomposing systems into modules

    This paper discusses modularization as a mechanism for improving the flexibility and comprehensibility of a system while allowing the shortening of its development time. The effectiveness of a "modularization" is dependent upon the criteria used in dividing the system into modules. Two system design problems are presented, and for each, both a conventional and unconventional decomposition are described. It is shown that the unconventional decompositions have distinct advantages for the goals outlined. The criteria used in arriving at the decompositions are discussed. The unconventional decomposition, if implemented with the conventional assumption that a module consists of one or more subroutines, will be less efficient in most cases. An alternative approach to implementation which does not have this effect is sketched.

    Local Copy
    Public Copy

  • End-to-end Arguments in System Design

    This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called the end-to-end argument, suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level. Examples discussed in the paper include bit error recovery, security using encryption, duplicate message suppression, recovery from system crashes, and delivery acknowledgement. Low level mechanisms to support these functions are justified only as performance enhancements.

    Local Copy
    Public Copy

  • Hints for Computer System Design

    Studying the design and implementation of a number of computer systems has led to some general hints for system design. They are described here and illustrated by many examples, ranging from hardware such as the Alto and the Dorado to application programs such as Bravo and Star.

    Local Copy
    Public Copy

  • No Silver Bullet: Essence and Accidents of Software Engineering

    All software construction involves essential tasks, the fashioning of the complex conceptual structures that compose the abstract software entity, and accidental tasks, the representation of these abstract entities in programming languages and the mapping of these onto machine languages within space and speed constraints. Most of the big past gains in software productivity have come from removing artificial barriers that have made the accidental tasks inordinately hard, such as severe hardware constraints, awkward programming languages, lack of machine time. How much of what software engineers now do is still devoted to the accidental, as opposed to the essential? Unless it is more than 9/10 of all effort, shrinking all the accidental activities to zero time will not give an order of magnitude improvement.

    Local Copy
    Public Copy
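    Brooks's nine-tenths threshold is Amdahl-style arithmetic, and it is easy to check. A minimal sketch (the function name is mine, not Brooks's): if a fraction f of total effort is accidental, then eliminating it entirely bounds the overall speedup at 1/(1-f).

```python
def max_speedup(accidental_fraction):
    """Upper bound on overall productivity gain if all accidental effort
    (the given fraction of total effort) were reduced to zero time.
    Analogous to Amdahl's law with the accidental part fully parallelized."""
    return 1.0 / (1.0 - accidental_fraction)

# Even if 80% of effort were accidental, eliminating all of it yields
# only about a 5x gain; an order of magnitude needs f > 9/10.
print(max_speedup(0.8))
print(max_speedup(0.9))
```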

  • Robustness in Complex Systems

    This paper argues that a common design paradigm for systems is fundamentally flawed, resulting in unstable, unpredictable behavior as the complexity of the system grows. In this flawed paradigm, designers carefully attempt to predict the operating environment and failure modes of the system in order to design its basic operational mechanisms. However, as a system grows in complexity, the diffuse coupling between the components in the system inevitably leads to the butterfly effect, in which small perturbations can result in large changes in behavior. We explore this in the context of distributed data structures, a scalable, cluster-based storage server. We then consider a number of design techniques that help a system to be robust in the face of the unexpected, including overprovisioning, admission control, introspection, and adaptivity through closed control loops. Ultimately, however, all complex systems eventually must contend with the unpredictable. Because of this, we believe systems should be designed to cope with failure gracefully.

    Local Copy
    Public Copy

  • On Understanding Software Agility: A Social Complexity Point Of View

    Over the last decade, the field of so-called Agile software development has grown to be a major force in the socio-economic arena of delivering quality software on time, on budget, and on spec. The acceleration in changing needs brought on by the rise in popularity of the Internet has helped push Agile practices far beyond their original boundaries, and possibly into domains where their application is not the optimal solution to the problems at hand. The question of where Agile software development practices and techniques make sense, and where they are out of place, is a valid one. It can be addressed by looking at software development as a complex endeavour, and using tools and techniques from the Cynefin method and other models of social complexity.

    Local Copy
    Public Copy

  • High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities
  • The Cloud Begins With Coal: Big Data, Big Networks, Big Infrastructure, and Big Power

    The information economy is a blue-whale economy with its energy uses mostly out of sight. Based on a mid-range estimate, the world's Information-Communications-Technologies (ICT) ecosystem uses about 1,500 TWh of electricity annually, equal to all the electric generation of Japan and Germany combined - as much electricity as was used for global illumination in 1985. The ICT ecosystem now approaches 10% of world electricity generation. Or in other energy terms - the zettabyte era already uses about 50% more energy than global aviation.

    Local Copy
    Public Copy

  • The Network is Reliable

    "The network is reliable" tops Peter Deutsch's classic list, "Eight fallacies of distributed computing" (https://blogs.oracle.com/jag/resource/Fallacies.html), "all [of which] prove to be false in the long run and all [of which] cause big trouble and painful learning experiences." Accounting for and understanding the implications of network behavior is key to designing robust distributed programs—in fact, six of Deutsch's "fallacies" directly pertain to limitations on networked communications.

    Local Copy
    Public Copy

  • Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network

    We present our approach for overcoming the cost, operational complexity, and limited scale endemic to datacenter networks a decade ago. Three themes unify the five generations of datacenter networks detailed in this paper. First, multi-stage Clos topologies built from commodity switch silicon can support cost-effective deployment of building-scale networks. Second, much of the general, but complex, decentralized network routing and management protocols supporting arbitrary deployment scenarios were overkill for single-operator, pre-planned datacenter networks. We built a centralized control mechanism based on a global configuration pushed to all datacenter switches. Third, modular hardware design coupled with simple, robust software allowed our design to also support inter-cluster and wide-area networks. Our datacenter networks run at dozens of sites across the planet, scaling in capacity by 100x over ten years to more than 1 Pbps of bisection bandwidth.

    Local Copy
    Public Copy
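    The scaling appeal of multi-stage Clos topologies is easy to quantify. As an illustration (using the classic k-ary fat-tree of Al-Fares et al., one common instance of the commodity-switch Clos designs the paper describes; Jupiter's actual building blocks differ), host capacity grows cubically with switch radix:

```python
def fat_tree_capacity(k):
    """Host and switch counts for a classic 3-stage k-ary fat-tree Clos
    built entirely from k-port commodity switches (Al-Fares et al.).
    Not Jupiter's exact topology; a standard illustrative instance."""
    assert k % 2 == 0, "radix must be even"
    hosts = k ** 3 // 4            # k pods * (k/2)^2 hosts per pod
    switches = 5 * k * k // 4      # (k/2)^2 core + k^2 pod switches
    return hosts, switches

# With 48-port switches, a full fat-tree reaches 27,648 hosts.
print(fat_tree_capacity(48))  # (27648, 2880)
```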

  • Hey, You Have Given Me Too Many Knobs!

    Configuration problems are not only prevalent, but also severely impair the reliability of today's system software. One fundamental reason is the ever-increasing complexity of configuration, reflected by the large number of configuration parameters ("knobs"). With hundreds of knobs, configuring system software to ensure high reliability and performance becomes a daunting, error-prone task. This paper makes a first step in understanding a fundamental question of configuration design: "do users really need so many knobs?"

    ACM Digital Library

  • Quantum algorithms: an overview

    Quantum computers are designed to outperform standard computers by running quantum algorithms. Areas in which quantum algorithms can be applied include cryptography, search and optimisation, simulation of quantum systems, and solving large systems of linear equations. Here we briefly survey some known quantum algorithms, with an emphasis on a broad overview of their applications rather than their technical details. We include a discussion of recent developments and near-term applications of quantum algorithms.

    Local Copy
    Public Copy

  • A Universal Approach to Data Center Network Design

    This paper proposes an approach to the design of large-scale general-purpose data center networks based on the notions of volume and area universality introduced by Leiserson in the 1980's in the context of VLSI design. In particular, we suggest that the principal goal of the network designer should be to build a single network that is provably competitive, for any application, with any network that can be built for the same amount of money. After describing our approach, we survey the technology choices available to network designers today, and examine several existing commercial data center networks. In the most recent of these networks resources are allocated roughly as we suggest in this paper.

    ACM Digital Library

  • Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure

    Maintaining the highest levels of availability for content providers is challenging in the face of scale, network evolution, and complexity. Little, however, is known about the network failures large content providers are susceptible to, and what mechanisms they employ to ensure high availability. From a detailed analysis of over 100 high-impact failure events within Google’s network, encompassing many data centers and two WANs, we quantify several dimensions of availability failures. We find that failures are evenly distributed across different network types and across data, control, and management planes, but that a large number of failures happen when a network management operation is in progress within the network. We discuss some of these failures in detail, and also describe our design principles for high availability motivated by these failures. These include using defense in depth, maintaining consistency across planes, failing open on large failures, carefully preventing and avoiding failures, and assessing root cause quickly. Our findings suggest that, as networks become more complicated, failures lurk everywhere, and, counter-intuitively, continuous incremental evolution of the network can, when applied together with our design principles, result in a more robust network.

    Local Copy
    Public Copy

  • Loop-Free Alternates with Loop Detection for Fast Reroute in Software-Defined Carrier and Data Center Networks

    Loop-Free Alternates (LFAs) are a local fast-reroute mechanism defined for IP networks. They are simple but suffer from two drawbacks. Firstly, some flows cannot be protected due to missing LFAs, i.e., this concept does not provide full protection coverage, which depends on network topology. Secondly, some LFAs cause loops in case of node or multiple failures. Avoiding those LFAs decreases the protection coverage even further. In this work, we propose to apply LFAs to OpenFlow-based networks. We suggest a method for loop detection so that loops can be avoided without decreasing protection coverage. We propose an implementation with OpenFlow that requires only a single additional flow rule per switch.

    Local Copy
    Public Copy
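    The LFA condition itself is a single inequality over shortest-path distances (RFC 5286): a neighbor N of source S is a loop-free alternate for destination D if dist(N, D) < dist(N, S) + dist(S, D), i.e., N's own shortest path to D does not loop back through S. A small sketch of the check (the graph encoding and names are mine, not the paper's):

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path distances from src in a weighted directed graph
    given as {node: {neighbor: cost}}."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def is_lfa(adj, s, n, d):
    """RFC 5286 loop-free condition: neighbor n of s protects
    destination d if dist(n, d) < dist(n, s) + dist(s, d)."""
    dn = dijkstra(adj, n)
    ds = dijkstra(adj, s)
    return dn[d] < dn[s] + ds[d]

# Hypothetical triangle topology, all link costs 1.
adj = {
    "S": {"N": 1, "D": 1},
    "N": {"S": 1, "D": 1},
    "D": {"S": 1, "N": 1},
}
print(is_lfa(adj, "S", "N", "D"))  # True: dist(N,D)=1 < 1+1
```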

  • Towards an Open, Disaggregated Network Operating System

    This white paper provides an overview of AT&T's vision for an Open Architecture for a Disaggregated Network Operating System (dNOS). Our goal is to start an industry discussion on technical feasibility, build interest in participating in the formulation of technical detail, and determine suitable vehicles (standards bodies, open source efforts, consortia, etc.) for common specification and architectural realization.

    Local Copy
    Public Copy

  • Knowledge-Defined Networking

    The research community has considered in the past the application of Artificial Intelligence (AI) techniques to control and operate networks. A notable example is the Knowledge Plane proposed by D. Clark et al. However, such techniques have not been extensively prototyped or deployed in the field yet. In this paper, we explore the reasons for the lack of adoption and posit that the rise of two recent paradigms: Software-Defined Networking (SDN) and Network Analytics (NA), will facilitate the adoption of AI techniques in the context of network operation and control. We describe a new paradigm that accommodates and exploits SDN, NA, and AI, and provide use-cases that illustrate its applicability and benefits. We also present simple experimental results that support, for some relevant use-cases, its feasibility. We refer to this new paradigm as Knowledge-Defined Networking (KDN).

    Local Copy
    Public Copy

  • Internet of Pwnable Things: Challenges in Embedded Binary Security

    Embedded systems are everywhere, from consumer electronics to critical infrastructure, and with the rise of the Internet of Things (IoT), such systems are set to proliferate throughout all aspects of everyday life. Due to their ubiquitous and often critical nature, such systems have myriad security and privacy concerns, but proper attention to these aspects in the embedded world is often sorely lacking. In this article I will discuss how embedded binary security in particular tends to lag behind what is commonly expected of modern general purpose systems, why bridging this gap is non-trivial, and offer some suggestions for promising defensive research directions.

    Local Copy
    Public Copy

  • The QUIC Transport Protocol: Design and Internet-Scale Deployment

    We present our experience with QUIC, an encrypted, multiplexed, and low-latency transport protocol designed from the ground up to improve transport performance for HTTPS traffic and to enable rapid deployment and continued evolution of transport mechanisms. QUIC has been globally deployed at Google on thousands of servers and is used to serve traffic to a range of clients including a widely-used web browser (Chrome) and a popular mobile video streaming app (YouTube). We estimate that 7% of Internet traffic is now QUIC. We describe our motivations for developing a new transport, the principles that guided our design, the Internet-scale process that we used to perform iterative experiments on QUIC, performance improvements seen by our various services, and our experience deploying QUIC globally. We also share lessons about transport design and the Internet ecosystem that we learned from our deployment.

    ACM Digital Library

    Local Copy
    Public Copy

  • IPv6 Router Advertisement Flags, RDNSS and DHCPv6 Conflicting Configurations

    The IPv6 world is a complicated one; there is no doubt about that. The numerous new features and possibilities added to the protocol, either from its very first specification, RFC 2460 [1], to its later enhancements, certainly increase flexibility and capabilities, but they may also introduce operational issues, at least in "contradicting scenarios", which may even lead to security implications.

    Local Copy
    Public Copy

  • Security, Moore's law, and the anomaly of cheap complexity

    A keynote talk discussing the relationship between agile development, the increasing availability of general-purpose processors, complexity, and security.

    Local Copy
    Public Copy

    Recording
  • On Purpose and by Necessity: Compliance under the GDPR

    The European General Data Protection Regulation (GDPR) gives primacy to purpose: Data may be collected and stored only when (i) end-users have consented, often explicitly, to the purposes for which that data is collected, and (ii) the collected data is actually necessary for achieving these purposes. This development in data protection regulations begets the question: how do we audit a computer system’s adherence to a purpose? We propose an approach that identifies a purpose with a business process, and show how formal models of interprocess communication can be used to audit or even derive privacy policies. Based on this insight, we propose a methodology for auditing GDPR compliance. Moreover, we show how given a simple interprocess dataflow model, aspects of GDPR compliance can be determined algorithmically.

    Local Copy
    Public Copy

  • Decrypting the Encryption Debate: A Framework for Decision Makers

    Government access to the plaintext of encrypted communications and stored data presents difficult, important, and controversial issues that reveal conflicting values within the government and society at large. The debate over efforts to ensure such access is highly polarized. Critics of government access, even as they acknowledge the importance of effective law enforcement, cite legal and practical objections, including risks to security, privacy and civil liberties, and U.S. commercial interests. Government officials acknowledge the value of encryption to protect privacy and confidential information but also express the need to be able to access information relevant to investigations when properly authorized.

    Local Copy
    Public Copy

  • Policy Approaches to the Encryption Debate

    A fierce debate has been ongoing for many years over strong computer encryption of communications and data, which can both deliver security and privacy for individuals but also make it difficult for the intelligence and law enforcement communities to perform their surveillance and investigative duties. In particular, the question of whether encryption systems should be required to have a “backdoor” to give the government special access to encrypted information remains divisive.

    Local Copy
    Public Copy

  • The Service-Infrastructure Cycle, Ossification, and the Fragmentation of the Internet

    In this article I will first argue that a Service-Infrastructure Cycle is fundamental to networking evolution. Networks are built to accommodate certain services at an expected scale. New applications and/or a significant increase in scale require a rethinking of network mechanisms which results in new deployments. Four decades' worth of iterations of this process have yielded the Internet as we know it today, a common and shared global networking infrastructure that delivers almost all services. I will further argue, using brief historical case studies, that success of network mechanism deployments often hinges on whether or not mechanism evolution follows the iterations of this Cycle. Many have observed that this network, the Internet, has become ossified and unable to change in response to new demands. In other words, after decades of operation, the Service-Infrastructure Cycle has become stuck. However, novel service requirements and scale increases continue to exert significant pressure on this ossified infrastructure. The result, I will conjecture, will be a fragmentation, the beginnings of which are evident today, that will ultimately fundamentally change the character of the network infrastructure. By ushering in a ManyNets world, this fragmentation will lubricate the Service-Infrastructure Cycle so that it can continue to govern the evolution of networking. I conclude this article with a brief discussion of the possible implications of this emerging ManyNets world on networking research.

    Local Copy
    Public Copy

  • The Emperor's Old Clothes

    The last section of our inquiry into the failure dealt with the criteria of quality of software. "In the recent struggle to deliver any software at all, the first casualty has been consideration of the quality of the software delivered. The quality of software is measured by a number of totally incompatible criteria, which must be carefully balanced in the design and implementation of every program." We then made a list of no less than seventeen criteria which has been published in a guest editorial in Volume 2 of the journal, Software Practice and Experience.

    Public Copy

  • A Quick Method for Finding Shortest Pairs of Disjoint Paths

    Let G be a directed graph containing n vertices, one of which is a distinguished source s, and m edges, each with a non-negative cost. We consider the problem of finding, for each possible sink vertex v, a pair of edge-disjoint paths from s to v of minimum total edge cost. Suurballe has given an O(n² log n)-time algorithm for this problem. We give an implementation of Suurballe's algorithm that runs in O(m log_(1+m/n) n) time and O(m) space. Our algorithm builds an implicit representation of the n pairs of paths; given this representation, the time necessary to explicitly construct the pair of paths for any given sink is O(1) per edge on the paths.

    Public Copy
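    Suurballe's speed comes from clever use of Dijkstra with distance-based edge reweighting, but the underlying problem is simply a min-cost flow of value 2 with unit edge capacities. A brute-force sketch of that formulation on a toy graph (names and structure are mine, not the paper's, and this does not achieve the paper's running time):

```python
from collections import defaultdict

def shortest_disjoint_pair_cost(edges, s, t):
    """Minimum total cost of two edge-disjoint s->t paths, computed as a
    min-cost flow of value 2 via successive shortest paths with
    Bellman-Ford (handles negative residual costs on backward edges).
    edges: list of (u, v, cost) directed edges with non-negative costs.
    Returns the total cost, or None if two disjoint paths do not exist."""
    graph = defaultdict(list)  # node -> list of [to, cap, cost, rev_index]
    def add_edge(u, v, cost):
        graph[u].append([v, 1, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
    for u, v, c in edges:
        add_edge(u, v, c)

    total = 0
    for _ in range(2):                       # push two units of flow
        dist = {s: 0}
        parent = {}                          # node -> (prev_node, edge_ref)
        for _ in range(len(graph)):          # Bellman-Ford relaxation rounds
            updated = False
            for u in list(dist):
                for e in graph[u]:
                    v, cap, cost, _ = e
                    if cap > 0 and dist[u] + cost < dist.get(v, float("inf")):
                        dist[v] = dist[u] + cost
                        parent[v] = (u, e)
                        updated = True
            if not updated:
                break
        if t not in dist:
            return None                      # no second disjoint path
        total += dist[t]
        v = t
        while v != s:                        # augment along the found path
            u, e = parent[v]
            e[1] -= 1
            graph[v][e[3]][1] += 1
            v = u
    return total

edges = [("s", "a", 1), ("a", "t", 1), ("s", "b", 2), ("b", "t", 2)]
print(shortest_disjoint_pair_cost(edges, "s", "t"))  # 6 = (s-a-t) + (s-b-t)
```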

  • Loop-Free Routing Using Diffusing Computations

    The DUAL algorithm, on which EIGRP and Babel are based.

    A family of distributed algorithms for the dynamic computation of the shortest paths in a computer network or internet is presented, validated, and analyzed. According to these algorithms, each node maintains a vector with its distance to every other node. Update messages from a node are sent only to its neighbors; each such message contains a distance vector of one or more entries, and each entry specifies the length of the selected path to a network destination, as well as an indication of whether the entry constitutes an update, a query, or a reply to a previous query.

    IEEE Public Copy
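    The per-node distance vectors described above can be sketched with plain synchronous Bellman-Ford updates. Note this sketch deliberately omits what the paper actually contributes - the feasibility conditions and diffusing computations (queries/replies) that keep DUAL loop-free during topology changes; the topology and names are mine:

```python
def distance_vector(links, rounds=10):
    """Synchronous distance-vector computation: each node keeps a vector
    of distances to every destination and repeatedly relaxes it using its
    neighbors' vectors. Plain Bellman-Ford only; no DUAL feasibility
    conditions or diffusing computations.
    links: {node: {neighbor: cost}} with symmetric costs."""
    nodes = set(links)
    vec = {u: {v: (0 if u == v else float("inf")) for v in nodes}
           for u in nodes}
    for _ in range(rounds):
        changed = False
        for u in nodes:
            for n, c in links[u].items():
                for d in nodes:
                    if c + vec[n][d] < vec[u][d]:
                        vec[u][d] = c + vec[n][d]  # better path via neighbor n
                        changed = True
        if not changed:
            break                              # vectors have converged
    return vec

# Hypothetical four-node topology.
links = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 7},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 7, "C": 1},
}
print(distance_vector(links)["A"]["D"])  # 4, via A-B-C-D
```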

  • On Compact Routing for the Internet

    The Internet’s routing system is facing stresses due to its poor fundamental scaling properties. Compact routing is a research field that studies fundamental limits of routing scalability and designs algorithms that try to meet these limits. In particular, compact routing research shows that shortest-path routing, forming a core of traditional routing algorithms, cannot guarantee routing table (RT) sizes that on all network topologies grow slower than linearly as functions of the network size. However, there are plenty of compact routing schemes that relax the shortest-path requirement and allow for improved, sublinear RT size scaling that is mathematically provable for all static network topologies. In particular, there exist compact routing schemes designed for grids, trees, and Internet-like topologies that offer RT sizes that scale logarithmically with the network size.

    Public Copy

  • Contrasting Views of Complexity and Their Implications For Network-Centric Infrastructures

    There exists a widely recognized need to better understand and manage complex “systems of systems,” ranging from biology, ecology, and medicine to network-centric technologies. This is motivating the search for universal laws of highly evolved systems and driving demand for new mathematics and methods that are consistent, integrative, and predictive. However, the theoretical frameworks available today are not merely fragmented but sometimes contradictory and incompatible. We argue that complexity arises in highly evolved biological and technological systems primarily to provide mechanisms to create robustness. However, this complexity itself can be a source of new fragility, leading to “robust yet fragile” tradeoffs in system design. We focus on the role of robustness and architecture in networked infrastructures, and we highlight recent advances in the theory of distributed control driven by network technologies. This view of complexity in highly organized technological and biological systems is fundamentally different from the dominant perspective in the mainstream sciences, which downplays function, constraints, and tradeoffs, and tends to minimize the role of organization and design.

    Public Copy

  • Tail Attacks on Web Applications

    As Distributed Denial-of-Service (DDoS) attacks have extended to the application layer in recent years, researchers have paid much attention to these new variants due to their low-volume and intermittent pattern with a higher level of stealthiness, which invalidates state-of-the-art DDoS detection/defense mechanisms. We describe a new type of low-volume application-layer DDoS attack--Tail Attacks on Web Applications. Such an attack exploits a newly identified system vulnerability of n-tier web applications (millibottlenecks with sub-second duration and resource contention with strong dependencies among distributed nodes) with the goal of causing the long-tail latency problem of the target web application (e.g., 95th percentile response time > 1 second) and damaging the long-term business of the service provider, while all the system resources are far from saturation, making it difficult to trace the cause of performance degradation.

    ACM Digital Library

  • An Internet-Wide Analysis of Traffic Policing

    Large flows like videos consume significant bandwidth. Some ISPs actively manage these high volume flows with techniques like policing, which enforces a flow rate by dropping excess traffic. While the existence of policing is well known, our contribution is an Internet-wide study quantifying its prevalence and impact on video quality metrics. We developed a heuristic to identify policing from server-side traces and built a pipeline to deploy it at scale on traces from a large online content provider, collected from hundreds of servers worldwide. Using a dataset of 270 billion packets served to 28,400 client ASes, we find that, depending on region, up to 7% of lossy transfers are policed. Loss rates are on average six times higher when a trace is policed, and it impacts video playback quality. We show that alternatives to policing, like pacing and shaping, can achieve traffic management goals while avoiding the deleterious effects of policing.

    Public Copy
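    The policing behavior the paper measures - enforcing a flow rate by dropping excess traffic rather than queueing it as a shaper would - is typically a token bucket. A toy sketch (rates, sizes, and class name are mine, purely illustrative):

```python
class TokenBucketPolicer:
    """Toy token-bucket policer: packets conforming to the configured
    rate (bytes/sec) with a burst allowance pass; excess packets are
    dropped immediately, unlike a shaper, which would queue and pace
    them. Timestamps are in seconds."""
    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes   # bucket starts full
        self.last = 0.0

    def allow(self, now, pkt_bytes):
        # Refill tokens for elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes
            return True             # conforming: forwarded
        return False                # excess: dropped (policed)

# 10 kB/s rate, 5 kB burst: a back-to-back burst of ten 1,500-byte
# packets at t=0 loses its tail once the bucket empties.
p = TokenBucketPolicer(10_000, 5_000)
sent = [p.allow(0.0, 1_500) for _ in range(10)]
print(sent.count(True))  # 3 packets fit within the 5 kB burst
```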

  • BBR: Congestion-Based Congestion Control

    By all accounts, today's Internet is not moving data as well as it should. Most of the world's cellular users experience delays of seconds to minutes; public Wi-Fi in airports and conference venues is often worse. Physics and climate researchers need to exchange petabytes of data with global collaborators but find their carefully engineered multi-Gbps infrastructure often delivers at only a few Mbps over intercontinental distances.

    ACM Digital Library

  • P-FatTree: A multi-channel datacenter network topology

    The bandwidth and latency requirements of next-generation datacenter networks stress the limits of CMOS manufacturing. A key trend in their design will be a move from single-channel links and switches to multi-channel links and switches. Today's network topologies erase this distinction, providing the illusion of a unified network fabric. In this work we propose P-FatTree, which is a FatTree topology designed specifically for the future multi-channel reality. P-FatTree requires fewer switch chips and as a result has lower cost, power consumption, and latency than existing approaches. Furthermore, by embracing the parallel nature of the network itself, it enables compelling new ways to better manage and deliver application traffic.

    ACM Digital Library