Optimal Route Reflection: Next Hop Self

Recently, I posted a video short take I did on BGP optimal route reflection. A reader wrote in the comments to that post:

…why can’t Router set next hop self to updates to router E and avoid this suboptimal path?

To answer this question, it is best to return to the scene of the suboptimality—

To describe the problem again: A and C are sending the same route to B, which is a route reflector. B selects the best path from its perspective, which is through B, and sends this route to each of its clients. In this case, E will learn the path with a next hop of A, even though the path through C is closer from E’s perspective. In the video, I discuss several ways to solve this problem; one option I do not talk about is allowing B to set the next hop to itself. Would this work?

Before answering the question, however, it is important to make one observation: I have drawn this network with B as a router in the forwarding path. In many networks, the route reflector is a virtual machine, or a *nix host, and is not capable of forwarding the traffic required to self the next-hop to itself. There are many advantages to intentionally removing the route reflector from the forwarding path. So while setting nexthop-self might work in this situation, it will not work in all situations.

But will it work in this situation? Not necessarily. The shortest path, for D, is through C, rather than through A. B setting its next hop to itself is going to draw E’s traffic towards 100::/64 towards itself, which is still the longer path from E’s perspective. So while there are situations where setting nexthop-self will resolve this problem, this particular network is not one of them.

Research: BGP Routers and Parrots

The BGP specification suggests implementations should have three tables: the adj-rib-in, the loc-rib, and the adj-rib-out. The first of these three tables should contain the routes (NLRIs and attributes) transmitted by each of the speaker’s peers. The second table should contain the calculated best paths; these are the routes that will be (or are) installed in the local routing table and used to build a forwarding table. The third table contains the routes which have been sent to each peering speaker. Why three tables? Routing protocols standards are (sometimes—not always) written to provide the maximum clarity to how the protocol works to someone who is writing an implementation. Not every table or process described in the specification is implemented, or implemented the way it is described.

What happens when you implement things in a different way than the specification describes? In the case of BGP and the three RIBs, you can get duplicated BGP updates. What do parrots and BGP have in common describes two situations where the lack of a adj-rib-out can cause duplicate BGP updates to be sent.

David Hauweele, Bruno Quoitin, Cristel Pelsser, and Randy Bush. 2016. “What Do Parrots and BGP Rotuers Have in Common?” Computer Communications Review, July. http://ccracmsigcomm.info.ucl.ac.be/wp-content/uploads/2016/07/sigcomm-ccr-paper26.pdf.

The authors of this paper begin by observing BGP updates from a full feed off the default free zone. The configuration of the network, however, is designed to provide not only the feed from a BGP speaker, but also the routes received by a BGP speaker, as shown in the illustration below.

In this figure, all the labeled routers are in separate BGP autonomous systems, and the links represent physical connections as well as eBGP sessions. The three BGP updates received by D are stored in three different logs which are time stamped so they can be correlated. The researchers found two instances where duplicate BGP updates were received at D.

In the first case, the best path at C switches between A and B because of the Multiple Exit Discriminator (MED), but the remainder of the update remains the same. C, however, strips the MED before transmitting the route to D, so D simply sees what appears to be duplicate updates. In the second case, the next hop changes because of an implicit withdraw based on a route change for the previous best path. For instance, C might choose A as the best path, but then A implicitly withdraws its path, leaving the path through B as the best. When this occurs, C recalculates the best path and sends it to D; since the next hop is stripped when C advertises the new route to D, this appears to be a duplicate at D.

In both of these cases, if C had an adj-rib-out, it would find the duplicate advertisement and squash it. However, since C has no record of what it has sent to D in the past, it must send information about all local best path changes to D. While this might seem like a trivial amount of processing, these additional updates can add enough load during link flap situations to make a material difference in processor utilization or speed of convergence.

Why do implementors decide not to include an adj-rib-out in their implementations, or why, when one is provided, do operators disable the adj-rib-out? Primarily because the adj-rib-out consumes local memory; it is cheaper to push the work to a peer than it is to keep local state that might only rarely be used. This is a classic case of reducing the complexity of the local implementation by pushing additional state (and hence complexity) into the overall system. The authors of the paper suggest a better balance might be achieved if implementations kept a small cache of the most recent updates transmitted to an adjacent speaker; this would allow the implementation to reduce memory usage, while also allowing it to prevent repeating recent updates.

CAA Records and Site Security

The little green lock—now being deprecated by some browsers—provides some level of comfort for many users when entering personal information on a web site. You probably know the little green lock means the traffic between the host and the site is encrypted, but you might not stop to ask the fundamental question of all cryptography: using what key? The quality of an encrypted connection is no better than the quality and source of the keys used to encrypt the data carried across the connection. If the key is compromised, then entire encrypted session is useless.

So where does the key pair come from to encrypt the session between a host and a server? The session key used for symmetric cryptography on each session is obtained using the public key of the server (thus through asymmetric cryptography). How is the public key of the server obtained by the host? Here is where things get interesting.

The older way of doing things was for a list of domains who were trusted to provide a public key for a particular server was carried in HTTP. The host would open a session with a server, which would then provide a list of domains where its public key could be found in the opening HTTP packets. The host would then find one of those hosts, and hence the server’s public key. From there, the host could create the correct nonce and other information to form a session key with the server. If you are quick on the security side, you might note a problem with this solution: if the HTTP session itself is somehow hijacked early in the setup process, a man-in-the-middle could substitute its own host list for the one the server provides. Once this substitution is done, the MITM could set up perfectly valid encrypted sessions with both the host and the server, funneling traffic between them. The MITM now has full access to the unencrypted data flowing through the session, even though the traffic is encrypted as it flows over the rest of the ‘net.

To solve this problem, a new method for finding the server’s public key was designed around 2010. In this method, the host requests the Certificate Authority Authorization (CAA) record from the server’s DNS server. This record lists the domains who are authorized to provide a public key, or certificate, for the servers within a domain. Thus, if you purchase your certificates from BigCertProvider, you would list BigCertProvider’s domain in your CAA. The host can then find the correct DNS record, and retrieve the correct certificate from the DNS system. This cuts out the possibility of a MITM attacking the HTTP session during the initial setup phases. If DNSSEC is deployed, the DNS records should also be secured, preventing MITM attacks from that angle, as well.

The paper under review today examines the deployment of CAA records in the wild, to determine how widely CAAs are deployed and used.

Scheitle, Quirin, Taejoong Chung, Jens Hiller, Oliver Gasser, Johannes Naab, Roland van Rijswijk-Deij, Oliver Hohlfeld, et al. 2018. “A First Look at Certification Authority Authorization (CAA).” SIGCOMM Comput. Commun. Rev. 48 (2): 10–23. https://doi.org/10.1145/3213232.3213235.

In this paper, a group of researchers put the CAA system to the test to see just how reliable the information is. In their first test, they attempted to request certificates that would cause the issuer to issue invalid certificates in some way; they found that many certificate providers will, in fact, issue such invalid certificates for various reasons. For instance, in one case, they discovered a defect in the provider’s software that allowed their automated system to issue invalid certificates.

In their second test, they examined the results of DNS queries to determine if DNS operators were supporting and returning CAA certificates. They discovered that very few certificate authorities deploy security controls on CAA lookups, leaving open the possibility of the lookups themselves being hijacked. Finally, they examine the deployment of CAA in the wild by web site operators. They found CAA is not widely deployed, with CAA records covering around 40,000 domains. DNSSEC and CAA deployment generally overlap, pointing to a small section of the global ‘net that is concerned about the security of their web sites.

Overall, the results of this study were not heartening for the overall security of the ‘net. While the HTTP based mechanism of discovering a server’s certificate is being deprecated, not many domains have started deploying the CAA infrastructure to replace it—in fact, only a small number of DNS providers support users entering their CAA certificate into their domain records.

Research: Measuring IP Liveness

Of the 4.2 billion IPv4 addresses available in the global space, how many are used—or rather, how many are “alive?” Given the increasing usage of IPv6, it might seem this is an unimportant question. Answering the question, however, resolves to another question that is actually more important: how can you determine whether or not an IP address is in use? This question might seem easy to answer: ping every address in the address space. This, however, turns out to be the wrong answer.

Scanning the Internet for Liveness. SIGCOMM Comput. Commun. Rev. 48, 2 (May 2018), 2-9. DOI: https://doi.org/10.1145/3213232.3213234

This answer is wrong because a substantial number of systems do not respond to ICMP requests. According to this paper, in fact, some 16% of the hosts they discovered that would respond to a TCP SYN, and another 2% that would respond to a UDP packet shaped to connect to a service, do not respond to ICMP requests. There are a number of possible reasons for this situation, including hosts being placed behind devices that block ICMP packets, hosts being configured not to respond to ICMP requests, or a server sitting behind a PAT or CGNAT device that only passes through service requests rather than all packets.

The paper begins by building a taxonomy of liveness, describing the process they use to determine if an address is in use or not, as shown in the image replicated from the paper.

One problem of note is that address usage can shift over time; between trying to use ICMP and a TCP SYN to determine if an IP address is in use, the device connected to that address can change. To limit the impact of this problem, the researchers sent each kind of liveness test to the same address close together in time. The authors then attempt to cross reference the liveness indicated using different techniques to an overall view of liveness for a particular address.

The research resulted in a number of interesting observations, such as the 16% of hosts that respond to TCP SYN probes on some port, but do not respond to ICMP requests. The kinds of ICMP and TCP responses was also quite interesting; many TCP implementations do not seem compliant to the TCP specification in how they respond to a SYN request.

Along the way, the authors added new capabilities to ZMap which allow them to perform these measurements. The tool they used has a web based frontend, and can be accessed here.

The results are interesting for network operators because they indicate the kinds of work required to find all the devices attached to a network using IP addresses—a mass ping utility is simply not enough. The tools developed here, and the lessons learned, can be added to the set of tools used by operators in all networks to better understand their IP address usage, and the shape of their networks.

IPv6 Security Considerations

When rolling out a new protocol such as IPv6, it is useful to consider the changes to security posture, particularly the network’s attack surface. While protocol security discussions are widely available, there is often not “one place” where you can go to get information about potential attacks, references to research about those attacks, potential counters, and operational challenges. In the case of IPv6, however, there is “one place” you can find all this information: draft-ietf-opsec-v6. This document is designed to provide information to operators about IPv6 security based on solid operational experience—and it is a must read if you have either deployed IPv6 or are thinking about deploying IPv6.

cross posted on CircleID

The draft is broken up into four broad sections; the first is the longest, addressing generic security considerations. The first consideration is whether operators should use Provider Independent (PI) or Provider Assigned (PA) address space. One of the dangers with a large address space is the sheer size of the potential routing table in the Default Free Zone (DFZ). If every network operator opted for an IPv6 /32, the potential size of the DFZ routing table is 2.4 billion routing entries. If you thought converging on about 800,000 routes is bad, just wait ‘til there are 2.4 billion routes. Of course, the actual PI space is being handed out on /48 boundaries, which makes the potential table size exponentially larger. PI space, then, is “bad for the Internet” in some very important ways.

This document provides the other side of the argument—security is an issue with PA space. While IPv6 was supposed to make renumbering as “easy as flipping a switch,” it does not, in fact, come anywhere near this. Some reports indicate IPv6 re-addressing is more difficult than IPv4. Long, difficult renumbering processes indicate many opportunities for failures in security, and hence a large attack surface. Preferring PI space over PA space becomes a matter of reducing the operational attack surface.

Another interesting question when managing an IPv6 network is whether static addressing should be used for some services, or if all addresses should be dynamically learned. There is a perception out there that because the IPv6 address space is so large, it cannot be “scanned” to find hosts to attack. As pointed out in this draft, there is research showing this is simply not true. Further, static addresses may expose specific servers or services to easy recognition by an attacker. The point the authors make here is that either way, endpoint security needs to rely on actual security mechanisms, rather than on hiding addresses in some way.

Other very useful topics considered here are Unique Local Addresses (ULAs), numbering and managing point-to-point links, privacy extensions for SLAAC, using a /64 per host, extension headers, securing DHCP, ND/RA filtering, and control plane security.

If you are deploying, or thinking about deploying, IPv6 in your network, this is a “must read” document.

BGP Security: A Gentle Reminder that Networking is Business

At NANOG on the Road (NotR) in September of 2018, I participated in a panel on BGP security—specifically the deployment of Route Origin Authentication (ROA), with some hints and overtones of path validation by carrying signatures in BGP updates (BGPsec). This is an area I have been working in for… 20 years? … at this point, so I have seen the argument develop across these years many times, and in many ways. What always strikes me about this discussion, whenever and wherever it is aired, is the clash between business realities and the desire for “someone to do something about routing security in the DFZ, already!” What also strikes me about these conversations it the number of times very fundamental concepts end up being explained to folks who are “new to the problem.”

TL;DR

  • BGP security is a business problem first, and a technology problem second
  • Signed information is only useful insofar as it is maintained
  • The cost of deployment must be lower than the return on that cost
  • Local policy will always override global policy—as it should
  • The fear of losing business is a stronger motivator than gaining new business

 

Part of the problem here is solutions considered “definitive and final” have been offered, the operator community has rejected them for many years, and yet these same solutions are put on the table year after year—like the perennial fruit cake made by someone’s great great aunt in the mists of Christmastime history that has been regifted so many times no-one really remembers where it came from, nor what sorts of fruit it actually contains.

cross posted at CircleID

The business reality, in terms of BGP security, is simple. To deploy some sort of check on the global routing table, at point must be reached where it costs more to not deploy it than to deploy it. This simply business reality is something network designers and architects beat their heads against every day. The solution can be the neatest solution in the world. It might even shop for the ingredients, mix the cookie dough in perfect proportions, bake the cookies, and then transport them to the proper location for the perfect amount of enjoyment from just the right people (insert Goldilocks here, perhaps). But none of this will ever matter if there is no financial upside, or if the financial risks are greater than the financial gains.

So, some hopefully helpful business realities.

Signed information is no more useful than unsigned information if it is not kept up to date. It is great to get everyone out in full force to build a cryptographically secured database of who owns what prefixes and AS numbers. It is wonderful to find a way to distribute that information throughout the ‘net so people can use it as another tool to determine whether or not to accept a route, or what weight to place on that route. People might even spend a bit of time building this database, just because they believe it is good for the community.

The problem is not day one. It is day two, and then day two thousand. What motivation is there to keep the information in this database up-to-date? Unless there is some—and here I mean financial motivation—the database will lose its effectiveness over time. At some point, when the error rate reaches some number (around 30% seems to be about right), people will simply stop trusting it. When the error rate gets high enough, the tools will stop being used, and the ‘net will revert to its old self.

The cost of deployment and operation must at least be close to the gains from deployment. At this point, there is no financial gain any one company can see from deploying anything in the realm of BGP security, so the cost of deployment and maintenance must be close to zero. While there are folks (including me) trying to reduce the cost as close to zero as possible, we are not there yet, and I do not know if we will ever be there.

Local policy will always override global policy. The literature of BGP security is replete with statements like: “if the route meets this criteria, the BGP speaker MUST drop it.” Good luck. The Internet is a confederation of independent companies, each of which runs their network in a way that they believe will make the most money at the lowest cost possible. One of the ways this happens is that people tune their local policies to charge their adjacent autonomous systems as much money as possible, while reducing their OPEX and CAPEX as much as possible. There will always be money to be made in the grey space around local tuning of policies for optimal traffic flow. Hence local policy will always win over what any database anyplace might say.

To give a specific example: assume you run a network, and you have peered with another operator in multiple places for many years. You know the other operator’s routes well, as they have not changed for many years. One day, you receive a route from this operator in which everything looks correct, but the route is not contained in this outside database of “correct routes.”

Noting this route is a route you have received from this very same operator for many years, are you going to drop it because it’s not right in some database, or are you going to use it given your past standing and relationship?

The fear of losing business is always the strongest motivator. Which leads to the next issue. It does not matter how wonderful your network is if you have a high customer churn rate. The most certain way to have a high churn rate is to place your customer’s experience in the hands of someone else. Such as a communally managed database, perhaps. This is another reason why local policy will always win over remote policy—the local provider is handed checks by customers, not the community at large. They have more incentive to keep their customers happy than the community.

You cannot secure things you do not tell anyone about. This final one is probably not as obvious as the others, but it is just as important as any other item on this list. There are many backdoor arrangements and sealed contracts in the provider world. People transit traffic without telling anyone else that traffic is being transited. Some people are customers of others only in the event of a massive failure someplace else in their network, but do not want anyone to know about this.

All of these arrangements are perfectly legitimate and legal in their respective jurisdictions. But you cannot secure something that no-one knows about. The more information that is hidden in a system, the harder it is to validate the information that exists is correct.

The bottom line is this: BGP security, like most networking problems, is not a technology problem. BGP security is, at its heart, a business problem. The lesson is here not just for security, but for network engineering in general. Business is the bottom line, not technology.

Research: Tail Attacks on Web Applications

When you think of a Distributed Denial of Service (DDoS) attack, you probably think about an attack which overflows the bandwidth available on a single link; or overflowing the number of half open TCP sessions a device can have open at once, preventing the device from accepting more sessions. In all cases, a DoS or DDoS attack will involve a lot of traffic being pushed at a single device, or across a single link.

TL;DR[time-span]

  • Denial of service attacks do not always require high volumes of traffic
  • An intelligent attacker can exploit the long tail of service queues deep in a web application to bring the service down
  • These kinds of attacks would be very difficult to detect

 

But if you look at an entire system, there are a lot of places where resources are scarce, and hence are places where resources could be consumed in a way that prevents services from operating correctly. Such attacks would not need to be distributed, because they could take much less traffic than is traditionally required to deny a service. These kinds of attacks are called tail attacks, because they attack the long tail of resource pools, where these pools are much thinner, and hence much easier to attack.

There are two probable reasons these kinds of attacks are not often seen in the wild. First, they require an in-depth knowledge of the system under attack. Most of these long tail attacks will take advantage of the interaction surface between two subsystems within the larger system. Each of these interaction surfaces can also be attack surfaces if an attacker can figure out how to access and take advantage of them. Second, these kinds of attacks are difficult to detect, because they do not require large amounts of traffic, or other unusual traffic flows, to launch.

The paper under review today, Tail Attacks on Web Applications, discusses a model for understanding and creating tail attacks in a multi-tier web application—the kind commonly used for any large-scale frontend service, such as ecommerce and social media.

Huasong Shan, Qingyang Wang, and Calton Pu. 2017. Tail Attacks on Web Applications. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS ’17). ACM, New York, NY, USA, 1725-1739. DOI: https://doi.org/10.1145/3133956.3133968

The figure below illustrates a basic service of this kind for those who are not familiar with it.

The typical application at scale will have at least three stages. The first stage will terminate the user’s session and render content; this is normally some form of modified web server. The second stage will gather information from various backend services (generally microservices), and pass the information required to build the page or portal to the rendering engine. The microservices, in turn, build individual parts of the page, and rely on various storage and other services to supply the information needed.

If you can find some way to clog up the queue at one of the storage nodes, you can cause every other service along the information path to wait on the prior service to fulfill its part of the job in hand. This can cause a cascading effect through the system, where a single node struggling because of full queues can cause an entire set of dependent nodes to become effectively unavailable, cascading to a larger set of nodes in the next layer up. For instance, in the network illustrated, if an attacker can somehow cause the queues at storage service 1 to fill up, even for a moment, this can cascade into a backlog of work at services 1 and 2, cascading into a backlog at the front-end service, ultimately slowing—or even shutting—the entire service down. The queues at storage service 1 may be the same size as every other queue in the system (although they are likely smaller, as they face internal, rather than external, services), but storage system 1 may be servicing many hundreds, perhaps thousands, of copies of services 1 and 2.

The queues at storage service 1—and all the other storage services in the system—represent a hidden bottleneck in the overall system. If an attacker can, for a few moments at a time, cause these internal, intra-application queue to fill up, the overall service can be made to slow down to the point of being almost unusable.

How plausible is this kind of attack? The researchers modeled a three-stage system (most production systems have more than three stages) and examined the total queue path through the system. By examining the queue depths at each stage, they devised a way to fill the queues at the first stage in the system by sending millibursts of valid sessions requests to the rend engine, or the use facing piece of the application. Even if these millibursts are spread out across the edge of the application, so long as they are all the same kind of requests, and timed correctly, they can bring the entire system down. In the paper, the researchers go further and show that once you understand the architecture of one such system, it is possible to try different millibursts on a running system, causing the same DoS effect.

This kind of attack, because it is built out of legitimate traffic, and it can be spread across the entire public facing edge of an application, would be nearly impossible to detect or counter at the network edge. One possible counter to this kind of attack would be increasing capacity in the deeper stages of the application. This countermeasure could be expensive, as the data must be stored on a larger number of servers. Further, data synchronized across multiple systems will subject to CAP limitations, which will ultimately limit the speed at which the application can run anyway. Operators could also consider fine grained monitoring, which increases the amount of telemetry that must be recovered from the network and processed—another form of monetary tradeoff.