CAA Records and Site Security

The little green lock—now being deprecated by some browsers—provides some level of comfort for many users when entering personal information on a web site. You probably know the little green lock means the traffic between the host and the site is encrypted, but you might not stop to ask the fundamental question of all cryptography: using what key? The quality of an encrypted connection is no better than the quality and source of the keys used to encrypt the data carried across the connection. If the key is compromised, then entire encrypted session is useless.

So where does the key pair come from to encrypt the session between a host and a server? The session key used for symmetric cryptography on each session is obtained using the public key of the server (thus through asymmetric cryptography). How is the public key of the server obtained by the host? Here is where things get interesting.

The older way of doing things was for a list of domains who were trusted to provide a public key for a particular server was carried in HTTP. The host would open a session with a server, which would then provide a list of domains where its public key could be found in the opening HTTP packets. The host would then find one of those hosts, and hence the server’s public key. From there, the host could create the correct nonce and other information to form a session key with the server. If you are quick on the security side, you might note a problem with this solution: if the HTTP session itself is somehow hijacked early in the setup process, a man-in-the-middle could substitute its own host list for the one the server provides. Once this substitution is done, the MITM could set up perfectly valid encrypted sessions with both the host and the server, funneling traffic between them. The MITM now has full access to the unencrypted data flowing through the session, even though the traffic is encrypted as it flows over the rest of the ‘net.

To solve this problem, a new method for finding the server’s public key was designed around 2010. In this method, the host requests the Certificate Authority Authorization (CAA) record from the server’s DNS server. This record lists the domains who are authorized to provide a public key, or certificate, for the servers within a domain. Thus, if you purchase your certificates from BigCertProvider, you would list BigCertProvider’s domain in your CAA. The host can then find the correct DNS record, and retrieve the correct certificate from the DNS system. This cuts out the possibility of a MITM attacking the HTTP session during the initial setup phases. If DNSSEC is deployed, the DNS records should also be secured, preventing MITM attacks from that angle, as well.

The paper under review today examines the deployment of CAA records in the wild, to determine how widely CAAs are deployed and used.

Scheitle, Quirin, Taejoong Chung, Jens Hiller, Oliver Gasser, Johannes Naab, Roland van Rijswijk-Deij, Oliver Hohlfeld, et al. 2018. “A First Look at Certification Authority Authorization (CAA).” SIGCOMM Comput. Commun. Rev. 48 (2): 10–23. https://doi.org/10.1145/3213232.3213235.

In this paper, a group of researchers put the CAA system to the test to see just how reliable the information is. In their first test, they attempted to request certificates that would cause the issuer to issue invalid certificates in some way; they found that many certificate providers will, in fact, issue such invalid certificates for various reasons. For instance, in one case, they discovered a defect in the provider’s software that allowed their automated system to issue invalid certificates.

In their second test, they examined the results of DNS queries to determine if DNS operators were supporting and returning CAA certificates. They discovered that very few certificate authorities deploy security controls on CAA lookups, leaving open the possibility of the lookups themselves being hijacked. Finally, they examine the deployment of CAA in the wild by web site operators. They found CAA is not widely deployed, with CAA records covering around 40,000 domains. DNSSEC and CAA deployment generally overlap, pointing to a small section of the global ‘net that is concerned about the security of their web sites.

Overall, the results of this study were not heartening for the overall security of the ‘net. While the HTTP based mechanism of discovering a server’s certificate is being deprecated, not many domains have started deploying the CAA infrastructure to replace it—in fact, only a small number of DNS providers support users entering their CAA certificate into their domain records.

Research: Covert Cache Channels in the Public Cloud

One of the great fears of server virtualization is the concern around copying information from one virtual machine, or one container, to another, through some cover channel across the single processor. This kind of channel would allow an attacker who roots, or otherwise is able to install software, on one of the two virtual machines, to exfiltrate data to another virtual machine running on the same processor. There have been some successful attacks in this area in recent years, most notably meltdown and spectre. These defects have been patched by cloud providers, at some cost to performance, but new vulnerabilities are bound to be found over time. The paper I’m looking at this week explains a new attack of this form. In this case, the researchers use the processor’s cache to transmit data between two virtual machines running on the same physical core.

The processor cache is always very small for several reasons. First, the processor cache is connected to a special bus, which normally has limits in the amount of memory it can address. This special bus avoids reading data through the normal system bus, and this is (from a networking perspective) at least one hop, and often several hops, closer to the processor on the internal bus (which is essentially an internal network). Second, the memory used in the cache is normally faster than the main memory.

The question is: since caches are at the processor level, so multiple virtual processes share the same cache, is it possible for one process to place information in the cache that another process can read? Since the cache is small and fast, it is used to store information that is accessed frequently. As processes, daemons, threads, and pthreads enter and exit, they access difference parts of main memory, causing the contents of the cache to change rapidly. Because of this constant churn, many researchers have assumed you cannot build a covert channel through the cache in this way. In fact, there have been attempts in the past; each of these has failed.

The authors of this paper argue, however, these failures are not because building a covert channel through the cache is not possible, but rather because previous attempts at doing so have operated on bad assumptions, attempting to use standard error correction mechanisms.

The first problem with using standard error correction mechanisms is that entire sections of data can be lost due to a cache entry being deleted. Assume you have two processes running on a single processor; you would like to build a covert channel between these processes. You write some code that inserts information into the cache, ensuring it is written in a particular memory location. This is the “blind drop.” The second process now runs and attempts to read this information. Normally this would work, but between the first and second process running, the information in the cache has been overwritten by some third process you do not know about. Because the entire data block is gone, the second process, which is trying to gather the information from the blind drop location, cannot tell there was ever any information at the drop point at all. There is no data across which the error correction code can run, because the data has been completely overwritten.

A possible solution to this problem is to use something like a TCP window of one; the transmitter resends the information until it receives an acknowledgement of receipt. The problem with this solution, in the case of a cache, is that the sender and receiver have no way to synchronize their clocks. Hence there is no way to form any sense of a serialized channel between the two processes.

To overcome these problems, the researchers use techniques used in wireless networks to ensure reliable delivery over unreliable channels. For instance, they send each symbol (a 0 or a 1) multiple times, using different blind drops (or cache locations), such that the receiver can compare these multiple transmit instances, and decide what the sender intended. The broader the number of blind drops used, the more likely information is to be carried across the process divide through the cache, as there are very few instances where all the cache entries representing blind drops will be invalidated and replaced at once. The researchers increase the rate at which this newly opened covert channel can operate by reverse engineering some aspects of a particular model processor’s caching algorithm. This allows them to guess which lines of cache will be invalidated first, how the cache sets are arranged, etc., and hence to place the blind drops more effectively.

By taking these steps, and using some strong error correction coding, a 42K covert channel was created between two instances running in an Amazon EC2 instance. This might not sound like a like, but it is higher speed than some of the fastest modems in use before DSL and other subscriber lines were widely available, and certainly fast enough to transfer a text-based file of passwords between two processes.

There will probably be some counter to this vulnerability in the future, but for now the main protection against this kind of attack is to prevent unknown or injected code from running on your virtual machines.

Reaction: The Power of Open APIs

Disaggregation, in the form of splitting network hardware from network software, is often touted as a way to save money (as if network engineering were primarily about saving money, rather than adding value—but this is a different soap box). The primary connections between disaggregation and saving money are the ability to deploy white boxes, and the ability to centralize the control plane to simplify the network (think software defined networks here—again, whether or not both of these are true as advertised is a different discussion).

But drivers that focus on cost miss more than half the picture. A better way to drive the value of disaggregation, and the larger value of networks within the larger network technology sphere, is through increased value. What drives value in network engineering? It’s often simplest to return to Tannenbaum’s example of the station wagon full of VHS backup tapes. To bring the example into more modern terms, it is difficult to beat the bandwidth of an overnight box full of USB thumb drives in terms of pure bandwidth.

In this view, networks can primarily be seen as a sop to human impatience. They are a way to get things done more quickly. In the case of networks quantity—speed—often becomes a form of quality—increased value.

But what does disaggregation have to do with speed? The connection is the open API.

When you disaggregate a network device into hardware and software, you necessarily create a stable, openly accessible API between the software and the hardware. Routing protocols, and other control plane elements must be able to build a routing table that is somehow then passed on to the forwarding hardware, so packets can be forwarded through the network. A fortuitous side effect of this kind of open API is that anyone can use it to control the forwarding software.

Enter the new release of ScyllaDB. According to folks who test these things, and should know, ScyllaDB is much faster than Cassandra, another industry leading open source database system. How much faster? Five to ten times faster. A five- to ten-fold improvement in database performance is, potentially, a point of quantity that can easily have a different quality. How much faster could your business handle orders, or customer service calls, or many other things, if you could speed the database end of the problem up even five-fold? How many new ways of processing information to gain insight from that data about business operations, customers, etc.

How does Scylla provide these kinds of improvements over Cassandra? In the first place, the newer database system is written in a faster language, C++ rather than Java. Scylla also shards processing across processor cores more efficiently. It doesn’t rely on the page cache.

None of this has to do with network disaggregation—but there is one way the Scylla developers improved the performance of their database that does relate to network disaggregation: ScyllaDB writes directly to the network interface card using DPDK. The interesting point, from a network engineering point of view, is that simply would not be possible without disaggregation between hardware and software opening up DPDK as an interface for a database to directly push packets to the hardware.

The side effects of disaggregation are only beginning to be felt in the network engineering world; the ultimate effects could reshape the way we think about application performance on the network, and the entire realm of network engineering.

Whatever is vOLT-HA?

Many network engineers find the entire world of telecom to be confusing—especially as papers are peppered with a lot of acronyms. If any part of the networking world is more obsessed with acronyms than any other, the telecom world, where the traditional phone line, subscriber access, and network engineering collide, reigns as the “king of the hill.”

Recently, while looking at some documentation for the CORD project, which stands for Central Office Rearchitected as a Data Center, I ran across an acronym I had not seen before—vOLT-HA. An acronym with a dash in the middle—impressive! But what is, exactly? To get there, we must begin in the beginning, with a PON.

There are two kinds of optical networks in the world, Active Optical Networks (AONs), and Passive Optical Networks (PONs). The primary difference between the two is whether the optical gear used to build the network amplifies (or even electronically rebuilds, or repeats) the optical signal as it passes through. In AONs, optical signals are amplified, while ins PONs, optical signals are not amplified. This means that in a PON, the optical equipment can be said to be passive, in that it does not modify the optical signal in any way. Why is this important? Because passive equipment is less complex, and does not require as much power to operate, so a PON is much less expensive to build and maintain than an AON. Hence a PON is often more economically realistic when serving a large number of customers, such as in providing service to residential or small office customers.

A PON uses optical splitters to divide out the signal among the various connected customers. Like any other shared bandwidth medium, every customer receives all the data on the downstream side, switching only traffic destined for the local network onto the copper (usually Ethernet) network beyond the optical termination point (called an OLT, or Optical Line Terminal). In a PON, the upstream signal is divided up into timeslots, so the system uses Time Division Multiplexing (TDM) to provide (a much slower) path from the end device into the provider’s network. As signals from each edn device reach the splitters in the network, the path is reversed, and the splitter ends up becoming a power combiner, which means the signal can “gain power” on the way up towards the central office (CO). These kinds of systems are typically sold as Fiber to the Home, which is abbreviated FTTH (of course!).

Is your head dizzy yet? I hope not, because we are just getting started with the acronyms. 🙂

The Optical Line Terminal, or OLT, must reside in some piece of physical hardware, called an Optical Network Unit (ONU). The OLT, like a server, or an Ethernet port on a router or switch, can be virtualized, so multiple logical OLTs reside on a single physical hardware interface. Just like a VRF or VLAN, this allows a single physical interface to be used for multiple logical connections. In the case, the resulting logical interface is called a vOLT, or a virtual Optical Line Terminal.

Now we are finally getting to the answer to the original question. vOLT must somehow relate to virtualizing the OLT, but how? The answer lies in the idea of disaggregation in passive optical networks (remember, this is a PON). One of the key components of disaggregation is being able to run any software—especially open source software—on any hardware—so-called “white box” hardware in particular. To get to this point, you must have some sort of “open Application Programming Interface,” or API, to connect the software to the hardware. You might think the HA in vOLT-HA stands for “high availability, but then you’d be wrong. 🙂 It actually stands for Hardware Abstraction.

So vOLT-HA, sometimes spelled VOLTHA, is actually a hardware abstraction layer that allows the disaggregation of vOLTs in an ONU in a PON.

Got it?

Reaction: DNS Complexity Lessons

Recently, Bert Hubert wrote of a growing problem in the networking world: the complexity of DNS. We have two systems we all use in the Internet, DNS and BGP. Both of these systems appear to be able to handle anything we can throw at them and “keep on ticking.”

this article was crossposted to CircleID

But how far can we drive the complexity of these systems before they ultimately fail? Bert posted this chart to the APNIC blog to illustrate the problem—

I am old enough to remember when the entire Cisco IOS Software (classic) code base was under 150,000 lines; today, I suspect most BGP and DNS implementations are well over this size. Consider this for a moment—a single protocol implementation that is larger than an entire Network Operating System ten to fifteen years back.

What really grabbed my attention, though, was one of the reasons Bert believes we have these complexity problems—

DNS developers frequently see immense complexity not as a problem but as a welcome challenge to be overcome. We say ‘yes’ to things we should say ‘no’ to. Less gifted developer communities would have to say no automatically since they simply would not be able to implement all that new stuff. We do not have this problem. We’re also too proud to say we find something (too) hard.

How often is this the problem in network design and deployment? “Oh, you want a stretched Ethernet link between two data centers 150 miles apart, and you want an eVPN control plane on top of the stretched Ethernet to support MPLS Traffic Engineering, and you want…” All the while the equipment budget is ringing up numbers in our heads, and the realyl cool stuff we will be able to play with is building up on the list we are writing in front of us. Then you hear the ultimate challenge—”if you were a real engineer, you could figure out how to do this all with a pair of routers I can buy down at the local office supply store.”

Some problems just do not need to be solved in the current system. Some problems just need to have their own system built for them, rather than reusing the same old stuff because, well, “we can.”

The real engineer is the one who knows how to say “no.”