The Network Sized Holes in Serverless

Until about 2017, the cloud was going to replace all on-premises data centers. As it turns out, however, the cloud has not replaced all on-premises data centers. Why not? Based on the paper under review, one potential answer is because containers in the cloud are still too much like “serverfull” computing. Developers must still create and manage what appear to be virtual machines, including:

  • Machine level redundancy, including georedundancy
  • Load balancing and request routing
  • Scaling up and down based on load
  • Monitoring and logging
  • System upgrades and security
  • Migration to new instances

Serverless solves these problems by placing applications directly onto the cloud, or rather a set of libraries within the cloud.

Jonas, Eric, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, et al. “Cloud Programming Simplified: A Berkeley View on Serverless Computing.” ArXiv:1902.03383 [Cs], February 9, 2019.

The authors define serverless by contrasting it with serverfull computing. While software is run based on an event in serverless, software runs until stopped in a cloud environment. While an application does not have a maximum run time in a serverfull environment, there is some maximum set by the provider in a serverless environment. The server instance, operating system, and libraries are all chosen by the user in a serverfull environment, but they are chosen by the provider in a serverless environment. The serverless environment is a higher-level abstraction of compute and storage resources than a cloud instance (or an on-premises solution, even a private cloud).

These differences add up to faster application development in a serverless environment; application developers are completely freed from any system administration tasks to focus entirely on developing and deploying useful software. This should, in theory, free application developers to focus on solving business problems, rather than worrying about any of the infrastructure. Two key points the authors point out in the serverless realm are the complex software techniques used to bring serverless processes up quickly (such as preloading and holding the VM instances that back services), and the security isolation provided through VM level separation.

The authors provide a section on challenges in serverless environments and the workarounds to these challenges. For instance, one problem with real-time video compression is the object store used to communicate between processes running on a serverless infrastructure is too slow to support fine-grained communication, while the functions are too course-grained to support some of the required tasks. To solve this problem, they propose using function-to-function communication, which moves the object store out of the process. This provides dramatic processing speedups, as well as reducing the costs of serverless to a fraction of a cloud instance.

One of the challenges discussed here is the problem of communication patterns, including broadcast, aggregation, and shuffle. Each of these, of course rely on the underlying network to transport data between the compute nodes on which serverless functions are running. Since the serverless user cannot determine where a particular function will run, the performance of the underlying transport is—of course–quite variable. The authors say: “Since the application cannot control the location of the cloud functions, a serverless computing application may need to send two and four orders of magnitude more data than an equivalent VM-based solution.”

And this is where the network sized hole in serverless comes into play. It is common fare today to say the network is “just a commodity.” Speeds are feeds are so high, and so easy to build, that we do not need to worry about building software that knows how to use a network efficiently, or even understands the network at all. That matching network to software requirements is a thing of the past—bandwidth is all a commodity now.

The law of leaky abstractions, however, will always have its say—a corollary here is higher level abstractions will always have larger and more consequential leaks. The solutions offered to each of the challenges listed in the paper are all, in fact, resolved by introducing layering violations which allow the developer to “work around” an inefficiency at some lower layer in the abstraction. Ultimately, such work arounds will compound into massive technical debt, and some “next new thing” will come along to “solve the problems.”

Moving data ultimately still takes time, still takes energy; the network still (often) needs to be tuned to the data being moved. Serverless is a great technology for some solutions—but there is ultimately no way to abstract out the hard work of building an entire system tuned to do a particular task and do it well. When you face abstraction, you should always ask: what is gained, and what is lost?