The Network-Sized Holes in Serverless

Until about 2017, the cloud was going to replace all on-premises data centers. As it turns out, however, it has not. Why not? Based on the paper under review, one potential answer is that containers in the cloud are still too much like “serverfull” computing. Developers must still create and manage what appear to be virtual machines, including:

  • Machine level redundancy, including georedundancy
  • Load balancing and request routing
  • Scaling up and down based on load
  • Monitoring and logging
  • System upgrades and security
  • Migration to new instances

Serverless solves these problems by placing applications directly onto the cloud, or rather onto a set of libraries within the cloud.

Jonas, Eric, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, et al. “Cloud Programming Simplified: A Berkeley View on Serverless Computing.” arXiv:1902.03383 [cs], February 9, 2019. http://arxiv.org/abs/1902.03383.

The authors define serverless by contrasting it with serverfull computing. In a serverless environment, software runs in response to an event; in a serverfull environment, software runs until it is stopped. An application has no maximum run time in a serverfull environment, while the provider imposes some maximum in a serverless environment. The server instance, operating system, and libraries are all chosen by the user in a serverfull environment, but by the provider in a serverless environment. In short, the serverless environment is a higher-level abstraction of compute and storage resources than a cloud instance (or an on-premises solution, even a private cloud).
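To make the contrast concrete, here is a minimal sketch (mine, not the paper’s) of the two models in Python. The handler(event, context) signature follows the common cloud-function convention; everything else is an illustrative assumption:

```python
# Serverfull: the developer owns a long-running process and its lifecycle.
import http.server

class EchoHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello from a server that runs until stopped\n")

def run_serverfull():
    # Runs until explicitly stopped; redundancy, scaling, monitoring,
    # and upgrades are all the operator's problem.
    http.server.HTTPServer(("", 8080), EchoHandler).serve_forever()

# Serverless: the developer supplies only an event handler. The provider
# decides where and when it runs, and enforces a maximum run time.
def handler(event, context):
    return {"statusCode": 200, "body": "hello from a function"}
```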

These differences add up to faster application development in a serverless environment: application developers are freed from system administration tasks and can, in theory, focus entirely on developing and deploying software that solves business problems, rather than worrying about the infrastructure. Two key points the authors highlight in the serverless realm are the complex software techniques used to bring serverless processes up quickly (such as preloading and holding the VM instances that back services), and the security isolation provided through VM-level separation.

The authors provide a section on challenges in serverless environments and the workarounds to those challenges. For instance, one problem with real-time video compression is that the object store used to communicate between processes running on serverless infrastructure is too slow to support fine-grained communication, while the functions themselves are too coarse-grained to support some of the required tasks. To solve this problem, they propose function-to-function communication, which takes the object store out of the communication path. This provides dramatic processing speedups, and reduces the cost of the serverless solution to a fraction of that of a cloud instance.

One of the challenges discussed here is the problem of communication patterns, including broadcast, aggregation, and shuffle. Each of these, of course, relies on the underlying network to transport data between the compute nodes on which serverless functions are running. Since the serverless user cannot control where a particular function will run, the performance of the underlying transport is quite variable. The authors say: “Since the application cannot control the location of the cloud functions, a serverless computing application may need to send two to four orders of magnitude more data than an equivalent VM-based solution.”

And this is where the network-sized hole in serverless comes into play. It is common fare today to say the network is “just a commodity.” Speeds and feeds are so high, and so easy to build, that we supposedly do not need to worry about building software that uses the network efficiently, or even understands the network at all; matching the network to software requirements is a thing of the past, because bandwidth is now a commodity.

The law of leaky abstractions, however, will always have its say; a corollary here is that higher-level abstractions will always have larger and more consequential leaks. The solutions offered to each of the challenges listed in the paper are all, in fact, layering violations that allow the developer to “work around” an inefficiency at some lower layer of the abstraction. Ultimately, such workarounds will compound into massive technical debt, and some “next new thing” will come along to “solve the problems.”

Moving data still takes time and still takes energy; the network still (often) needs to be tuned to the data being moved. Serverless is a great technology for some solutions, but there is ultimately no way to abstract away the hard work of building an entire system tuned to do a particular task and do it well. When you face an abstraction, you should always ask: what is gained, and what is lost?

Weekend Reads 031519

Most medium to large companies now run A/B tests and new feature experiments on segments of their user base. They are a great way to check whether a feature will have long-term success, and to get observable metrics on the repercussions of their changes. —JonLuca DeCaro

There are myriad theories as to why software remains insecure after we’ve spent decades trying to solve the problem. —Daniel Miessler

Ask yourself this: do you find yourself becoming outraged or saying “ho-hum” every time you hear about the latest record data breach? Society seems to be agreeing with the latter answer. —Roger A. Grimes

Why is privacy so hard? Why is it that, after so much negative press about it, we are still being constantly tracked on the web and on our smartphones? —Jason Hong

As you can see in the video, a malicious website that looks like Airbnb prompts users to authenticate using Facebook login, but upon clicking, the page displays a fake tab switching animation video aimed to trick users into thinking that their browsers are behaving normally. —Mohit Kumar

The hyperscalers and cloud builders of the world build things that often look and feel like supercomputers if you squint your eyes a little, but if you look closely, you can often see some pretty big differences. —Timothy Prickett Morgan

MWC 2019 was billed as the year that 5G left the slides and started to hit the ground with real equipment. However, my impression is that this is still a slow burn. —David Stokes

70 service providers (mobile, fixed, cable, and wholesale) from around the world provided inputs to the questions posed by ACG Research. All of the participants reported being knowledgeable of their organizations’ 5G plans and decisions, many of them holding senior positions within the organization. —Sigal Biran-Nagar

With the explosion of both structured and unstructured data coming on the heels of smartphones and IoT devices comes the need to be able to work with massive amounts of data, mine it and make it accessible. —TC Currie

I’d like to take this opportunity to outline some of the often misunderstood nuances of the open source world — and offer some food for thought for how we can continue enabling and sustaining open source, a movement that has largely changed the world for the better. —Sharone Revah Zitzman

In fact, quite a few use feedback and metrics synonymously, where they present feedback from teams or customers as a bunch of numbers or a graphical representation of those numbers. —Ranjith Varakantam

The sentiment is noble, and, as a private entity, Pinterest can block search results as it sees fit. But the move is part of a larger trend in social-media content policing, and it is a silly one. —Abe Greenwald

Facebook is unlikely to ever own a media production company, just as Airbnb and Uber will not soon own a hotel or a physical taxi company. But if they can, they’ll own every square foot of demand that feeds those industries. —Antonio Garcia Martinez

On the ‘net: Directory Services

Have you ever decided you needed to contact someone in another area of the world, only to discover you do not have their email address, nor any way to find their email address? You could go to your favorite social media site, of course, but what if they are not on any form of social media? Or what if they do not put their contact information there? Or, perhaps, this little situation happened before the rise of social media? (GASP! from some of the younger readers.)

Bridging the Gap

Mike Bushong and Denise Donohue join Eyvonne, Jordan, and me to discuss the gap between network engineering and “the business,” and give us some thoughts on bridging it.

Outro Music:
“Danger Storm” by Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/

Research: Practical Challenge-Response for DNS

Because the speed of DNS is so important to the performance of any connection on the ‘net, a lot of thought goes into making DNS servers fast, including optimized software that can respond to queries in milliseconds and high-bandwidth links connecting DNS servers to the ‘net. To set the stage for massive DDoS attacks based in the DNS system, add a third point: DNS responses tend to be much larger than DNS queries. In fact, a carefully crafted DNS response can be many times larger than the query.

To use a DNS server as an amplifier in a DDoS attack, then, the attacker sends a query to some number of publicly accessible DNS servers, spoofing the source address of the query so it appears to come from the system to be attacked. If the DNS query is carefully crafted, the attacker can send small packets that cause a number of DNS servers to direct large responses at a single IP address, flooding the system under attack with traffic.
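As a rough illustration of why response size matters, consider the bandwidth amplification factor: the ratio of response bytes to query bytes. The sizes in this sketch are ballpark assumptions of mine, not measurements from the paper:

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Bandwidth amplification factor: response size over query size."""
    return response_bytes / query_bytes

# A small DNS query is on the order of 60-70 bytes on the wire, while a
# crafted response (large TXT or DNSKEY records, EDNS0 enabled) can run
# to several kilobytes. Both numbers here are illustrative assumptions.
query = 64
response = 3000
print(f"attacker sends {query} bytes, victim receives {response} bytes")
print(f"amplification: {amplification_factor(query, response):.1f}x")
```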

Rami Al-Dalky, Michael Rabinovich, and Mark Allman. 2018. Practical Challenge-Response for DNS. In Proceedings of the Applied Networking Research Workshop (ANRW ’18). ACM, New York, NY, USA, 74-74. DOI: https://doi.org/10.1145/3232755.3232765

Carrying DNS over TCP is one way to try to resolve this problem, because TCP requires a three-way handshake. When the attacker sends a request with a spoofed source address, the server attempts to build a TCP session with the system that owns the spoofed address, which will fail. A key point: the TCP three-way handshake packets are much smaller than most DNS responses, which means the attacker’s packet stream is not being amplified (in size) by the DNS server.

DNS over TCP is problematic, however. For instance, many DNS resolvers cannot reach an authoritative DNS server using TCP because of stateful packet filters, network address translators, and other processes that either modify or block TCP sessions in the network. What about DNSSEC? It does not prevent the misuse of a DNS server; it only validates the records contained in the DNS database. DNSSEC just means the attacker can send even larger (but cryptographically validated) DNS records toward an unsuspecting system.

Another option is to create a challenge-response system much like the TCP handshake, but embed it in DNS itself. The most obvious place to embed such a challenge-response system is in CNAME records. Assume a recursive DNS server requests a particular record; an authoritative server can respond with a CNAME record, effectively telling the recursive server to ask someone else. When the recursive server sends the second query, presumably to a different server, it includes the information from the first response, giving the second server the context of its request.

To build a challenge-response system, the authoritative server sends back a CNAME telling the recursive server to contact the very same authoritative server. To make the handshake effective, the source IP address of the querying recursive DNS server is encoded into the CNAME response. When the authoritative server receives the second query, it can check the source address encoded in the second resolution request against the source address of the packet carrying the new query. If they do not match, the authoritative server can drop the second request; the three-way handshake has failed.
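A minimal sketch of this encoding may help. The label format here (the querier’s address embedded as the leftmost label of the CNAME target) is my assumption for illustration, not the exact scheme from the paper:

```python
def make_challenge(qname: str, src_ip: str) -> str:
    """Build a CNAME target that embeds the querier's source address."""
    # Encode dots as dashes so the address fits in a single DNS label.
    token = src_ip.replace(".", "-")
    return f"{token}.{qname}"

def validate_challenge(cname_qname: str, pkt_src_ip: str) -> bool:
    """On the follow-up query, check the embedded address against the
    actual source address of the packet carrying the query."""
    token = cname_qname.split(".", 1)[0]
    return token == pkt_src_ip.replace(".", "-")

# First query from 192.0.2.10 -> server answers with a challenge CNAME.
challenge = make_challenge("example.com", "192.0.2.10")  # '192-0-2-10.example.com'
# Follow-up query arrives; if it comes from the same address, answer it.
assert validate_challenge(challenge, "192.0.2.10")       # handshake succeeds
assert not validate_challenge(challenge, "203.0.113.7")  # spoofed or different source
```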

If the source of the original request is spoofed, this causes the victim to receive a CNAME response telling it to ask again for the answer—which the victim will never respond to, because it did not send the original request. Since CNAME responses are small, this tactic removes the amplification the attacker is hoping for.

There is one problem with this solution, however: DNS resolvers are often pooled behind a single anycast address. Consider a resolving DNS server pool with two servers labeled A and B. Server A receives a DNS request from a host, and finding it has no cache entry for the destination, recursively sends a request to an authoritative server. The authoritative server, in turn, sends a challenge to the IP address of server A. This address, however, is an anycast address assigned to the entire pool of recursive servers. For whatever reason, the challenge—a CNAME response asking the recursive server to ask at a different location—is directed to B.

If the DNS software is set up correctly, B will respond to the request. However, this response will be sourced from B’s IP address, rather than A’s. Remember the source of the original query is encoded in the CNAME response from the responding server. Since the address encoded in the follow-on query will not match B’s address, the authoritative server will drop the request.

To solve this problem, the authors suggest a chained response. Rather than dropping a request whose encoded source address does not match, the authoritative server adds the new source address to the encoding and sends another challenge in the form of a CNAME response. Assuming there are only two servers in the pool, the next query, carrying the list of IP addresses encoded in the CNAME response, will necessarily match one of the two available source addresses, and the authoritative server can respond with the correct information.
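Extending the earlier sketch, the chain can be represented as a list of embedded labels; again, the exact encoding is an assumption of mine for illustration:

```python
def extend_challenge(cname_qname: str, new_src_ip: str) -> str:
    """Add another observed source address to the front of the chain."""
    return f"{new_src_ip.replace('.', '-')}.{cname_qname}"

def validate_chain(cname_qname: str, pkt_src_ip: str, qname: str) -> bool:
    """Answer if the packet's source matches ANY address seen so far."""
    chain = cname_qname[: -len(qname) - 1]          # strip the original name
    seen = chain.split(".") if chain else []
    return pkt_src_ip.replace(".", "-") in seen

# Server A (192.0.2.10) asks; the follow-up arrives from B (192.0.2.11).
c1 = extend_challenge("example.com", "192.0.2.10")  # challenge to A
c2 = extend_challenge(c1, "192.0.2.11")             # re-challenge, chain grows
assert validate_chain(c2, "192.0.2.10", "example.com")  # A retries: matches
assert validate_chain(c2, "192.0.2.11", "example.com")  # B retries: matches
```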

What if the pool of recursive servers is very large, on the order of hundreds or thousands of servers? While one or two “round trips” in the form of a three-way handshake might not have much of a performance impact, thousands could be a problem. To resolve this issue, the authors suggest taking advantage of a simple observation: once the challenge responses sent back to the requester are no larger than the queries themselves, any amplification gain an attacker might exploit has been erased. Once the CNAME response grows to the same size as the DNS query (by adding the source addresses observed during the handshake process), the server should just answer the query. This (generally) reduces the number of round trips to three or four before the exchange no longer generates more data than the attacker could send to the victim directly, and dramatically improves the performance of the scheme.
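In the sketches above, this stop condition could be expressed as a size check before issuing another challenge; this is an illustrative assumption rather than the paper’s exact rule:

```python
def should_answer(query_size: int, challenge_size: int) -> bool:
    """Stop challenging once the challenge is as large as the query:
    at that point, reflection no longer amplifies the attacker's traffic."""
    return challenge_size >= query_size

# If the incoming query was 80 bytes and the next CNAME challenge would
# be 96 bytes, just answer; the amplification gain is already gone.
print(should_answer(query_size=80, challenge_size=96))  # True
```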

I was left with one question after reading this paper: there are carefully crafted DNS queries that can cause very large, multipacket responses, and these are not mentioned at all in the paper. This seems like an area that needs to be considered and researched more deeply. Overall, however, this seems like an effective system to reduce or eliminate the use of authoritative DNS servers as reflectors in DDoS attacks.