RFC9199: Lessons in Large-scale Service Deployment
While RFC9199 (are we really in the 9000’s?) is targeted at large-scale DNS deployments–specifically root zone operators–so it might seem the average operator won’t find a lot of value here.
This is, however, far from the truth. Every lesson we’ve learned in deploying large-scale DNS root servers applies to any other large-scale user-facing service. Internally deployed DNS recursive servers are an obvious instance, but the lessons here might well apply to a scheduling, banking, or any other multi-user application accessed from a lot of places by a lot of different users. There are some unique points in DNS, such as the relatively slower pace of database synchronization across nodes, but the network-side lessons can still be useful for a lot of applications.
What are those lessons?
First, using anycast dramatically improves performance for these kinds of services. For those who aren’t familiar with the concept, anycase turns an IP address into a service identifier. Any host with a copy (or instance) or a given service advertises the same address, causing the routing table to choose the (topologically) closest instance of the service. If you’re using anycast, traffic destined to your service will automatically be forwarded to the closest server running the application, providing a kind of load sharing among multiple instances through routing. If there are instances in New York, California, France, and Taipei, traffic from users in North Carolina will be routed to New York and traffic from users in Singapore will be routed to Taipei.
You can think of an anycast address something like a cell tower; users within a certain desintance will be “captured” by a particular instance. The more copies of the service you deploy, the smaller the geographic region the service will support. Hence you can control the number of users using a particular copy of the service by controlling the number and location of service copies.
To understand where and how to deploy service instances, create anycast catchment maps. Again, just like a wifi signal coverage map, or a cellphone tower coverage map, it’s important to understand which users will be directed to which instances. Using a catchment map will help you decide where new instances need to be deployed, which instances need the fastest links and hardware, etc. The RIPE ATLAS pobes and looking glass servers are good ways to start building such a map. If the application supports a large number of users, you might be able to convince the application developer to include some sort of geographic information in requests to help build these maps.
Third, when deploying service instance, pay as much attention to routing and connectivity as you do the number of instances deployed. As the authors note, sometimes eight instances will provide the same level of service as several thousand instances. The connectivity available into each instance of the service–bandwidth, delay, availability, etc.–still has a huge impact on service speed.
Fourth, reduce the speed at which the database needs to be synchronized where possible. Not every piece of information needs to be synchronized at the same rate. The less data being synchronized, the more consistent the view from multiple users is going to be.
RFC9199 is well worth reading, even for the average network engineer.
Hedge 098: DRIP with Stuart Card
Drones are becoming—and in many cases have already become—an everyday part of our lives. Drones are used in warfare, delivery services, photography, and recreation. One of the problems facing the world of drones, however, is the strong tie-in between the controller and the drone; this proprietary link limits innovation and reduces the information available to public officials to manage traffic, and even to protect the privacy of drone operators. The DRIP working group is building protocols designed to standardize the drone-to-controller interface, advancing the state of the art in drones and opening up the field for innovation. Stuart Card joins Alvaro Retana and Russ White to discuss DRIP.
Hedge 92: The IETF isn’t the Standards Police
In most areas of life, where the are standards, there is some kind of enforcing agency. For instance, there are water standards, and there is a water department that enforces these standards. There are electrical standards, and there is an entire infrastructure of organizations that make certain the fewest number of people are electrocuted as possible each year. What about Internet standards? Most people are surprised when they realize there is no such thing as a “standards police” in the Internet.
Listen in as George Michaelson, Evyonne Sharp, Tom Ammon, and Russ White discuss the reality of standards enforcement in the Internet ecosystem.
The Hedge 77: The Internet is for End Users
When the interests of the end user, the operator, and the vendor come into conflict, who should protocol developers favor? According to RFC8890, the needs and desires of the end user should be the correct answer. According to the RFC:
As the Internet increasingly mediates essential functions in societies, it has unavoidably become profoundly political; it has helped people overthrow governments, revolutionize social orders, swing elections, control populations, collect data about individuals, and reveal secrets. It has created wealth for some individuals and companies while destroying that of others. All of this raises the question: For whom do we go through the pain of gathering rough consensus and writing running code?
Mark Nottingham joins Alvaro Retana and Russ White on this episode of the Hedge to discuss why the Internet is for end users.
The Hedge 61: Pascal Thubert and the RAW Working Group
RAW is a new working group recently chartered by the IETF to work on:
…high reliability and availability for IP connectivity over a wireless medium. RAW extends the DetNet Working Group concepts to provide for high reliability and availability for an IP network utilizing scheduled wireless segments and other media, e.g., frequency/time-sharing physical media resources with stochastic traffic: IEEE Std. 802.15.4 timeslotted channel hopping (TSCH), 3GPP 5G ultra-reliable low latency communications (URLLC), IEEE 802.11ax/be, and L-band Digital Aeronautical Communications System (LDACS), etc. Similar to DetNet, RAW will stay abstract to the radio layers underneath, addressing the Layer 3 aspects in support of applications requiring high reliability and availability.
The Hedge 53: Deprecating Interdomain ASM
Interdomain Any-source Multicast has proven to be an unscalable solution, and is actually blocking the deployment of other solutions. To move interdomain multicast forward, Lenny Giuliano, Tim Chown, and Toerless Eckert wrote RFC 8815, BCP 229, recommending providers “deprecate the use of Any-Source Multicast (ASM) for interdomain multicast, leaving Source-Specific Multicast (SSM) as the recommended interdomain mode of multicast.”
The Hedge 44: Pete Lumbis and Open Source
Open source software is everywhere, it seems—and yet it’s nowhere at the same time. Everyone is talking about it, but how many people and organizations are actually using it? Pete Lumbis at NVIDIA joins Tom Ammon and Russ White to discuss the many uses and meanings of open source software in the networking world.