TECH
The Hedge 5: Geoff Huston on DoH

In this episode of the Hedge, Geoff Huston joins Tom Ammon and Russ White to discuss the ideas behind DNS over HTTPS (DoH), and to consider the implications of its widespread adoption. Is it time to bow to our new overlords?
This is part one of a two-part series.
Stop Using the OSI Model
We all use the OSI model to describe the way networks work. I have, in fact, included it in just about every presentation, and every book I have written, someplace in the fundamentals of networking. But if you have ever looked at the OSI model and had to scratch your head trying to figure out how it really fits with the networks we operate today, or what the OSI model is telling you in terms of troubleshooting, design, or operation—you are not alone. Lots of people have puzzled over how the OSI model fits with modern networking. There is a reason this is so difficult to figure out.
The OSI Model does not accurately describe networks.
What set me off in this particular direction this week is an article over at Errata Security:
This is partly true, and yet a bit … over the top. On the other hand, the point is well taken: the OSI model is not an ideal model for understanding networks. Maybe a bit of analysis would be helpful in understanding why.
First, while the OSI model was developed with packet switching networks in mind, the general idea was to come as close as possible to emulating the circuit-switched networks widely deployed at the time. A lot of thought had gone into making those circuit-switched networks work, and applications had been built around the way they worked. Applications and circuit-switched networks formed a sort of symbiotic relationship, just as applications form with packet-switched networks today; it was unimaginable, at the time, that “everything would change.”
So while the designers of the OSI model understood the basic value of the packet-switched network, they also understood the value of the circuit-switched network, and tried to find a way to solve both sets of problems in the same network. Experience has shown it is possible to build a somewhat close-to-circuit switched network on top of packet switched networks, but not quite in the way, nor as close to perfect emulation, as those original designers thought. So the OSI model is a bit complex and perhaps overspecified, making it less-than-useful today.
Second, the OSI model largely ignored the role of middleboxes, focusing instead on the stacks implemented and deployed in hosts. This, again, makes sense, as there was no such thing as a device specialized in the switching of packets at the time. Hosts took packets in and processed them. Some packets were sent along to other hosts, other packets were consumed locally. Think PDP-11 with some rough code, rather than even an early Cisco CGS.
Third, the OSI model focuses on what each layer does from the perspective of an application, rather than focusing on what is being done to the data in order to transmit it. The OSI model is built “top down,” rather than “bottom up,” in other words. While this might be really useful if you are an application developer, it is not so useful if you are a network engineer.
So—what should we say about the OSI model?
It was much more useful at some point in the past, when networking was really just “something a host did,” rather than its own sort of sub-field, with specialized protocols, techniques, and designs. It was a very good attempt at sorting out what a network needed to do to move traffic, from the perspective of an application.
What it is not, however, is really all that useful for network engineers working within an engineering specialty to understand how to design protocols, and how to design networks on which those protocols will run. What should we replace it with? I would begin by pointing you to the RINA model, which I think is a better place to start. I’ve written a bit about the RINA model, and used the RINA model as one of the foundational pieces of Computer Networking Problems and Solutions.
Since writing that, however, I have been thinking further about this problem. Over the next six months or so, I plan to build a course around this question. For the moment, I don’t want to spoil the fun, or put any half-baked thoughts out there in the wild.
DNS Query Minimization and Data Leaks
When a recursive resolver receives a query from a host, it will first consult any local cache to discover if it has the information required to resolve the query. If it does not, it will begin with the rightmost section of the domain name, the Top Level Domain (TLD), moving left through each section of the Fully Qualified Domain Name (FQDN), in order to find an IP address to return to the host, as shown in the diagram below.

This is pretty simple at its most basic level, of course—virtually every network engineer in the world understands this process (and if you don’t, you should enroll in my How the Internet Really Works webinar the next time it is offered!). The question almost no one ever asks, however, is: what, precisely, is the recursive server sending to the root, TLD, and authoritative servers?
Begin with the perspective of a coder who is developing the code for that recursive server. You receive a query from a host, you have the code check the local cache, and you find there is no matching information available locally. This means you need to send a query out to some other server to determine the correct IP address to return to the host. You could keep a copy of the query from the host in your local cache and build a new query to send to the root server.
Remember, however, that local server resources may be scarce; recursive servers must be optimized to process very high query rates very quickly. Much of the user’s perception of network performance is actually tied to DNS performance. A second option, then, is to save local memory and processing power by sending the entire query, just as you received it, on to the root server. This way, you do not need to build a new query packet to send to the root server.
Consider this process, however, in the case of a query for a local, internal resource you would rather not let the world know exists. The recursive server, by sending the entire query to the root server, is also sending information about the internal DNS structure and potential internal server names to the external root server. As the FQDN is resolved (or not), this same information is sent to the TLD and authoritative servers, as well.
There is something else here, however, that is not so obvious—the IP address of the requestor is contained in that original query, as well. Not only is your internal namespace leaking, your internal IP addresses are leaking, too.
This is not only a massive security hole for your organization, it also exposes information from individual users on the global ’net.
There are several things that can be done to resolve this problem. Organizationally, running a private DNS server, hard coding resolving servers for internal domains, and using internal domains that are not part of the public TLD infrastructure can go a long way towards preventing this kind of information leak through DNS. Operating a DNS server internally might not be ideal, of course, although DNS services are integrated into a lot of the directory services used in operational networks. If you are using a local DNS server, it is important to remember to configure DHCP and/or IPv6 ND to hand out the correct, internal DNS server address, rather than an external one. It is also important to either block or redirect DNS queries sent to public servers by hosts using hard-coded DNS server configurations.
A second line of defense is DNS query minimization. Described in RFC 7816, query minimization argues recursive servers should send a minimized QNAME, asking each server only about the one relevant part of the FQDN. For instance, if the recursive server receives a query for www.banana.example, the server should ask the root server only about example., ask the TLD server only about banana.example, and send the full requested domain name only to the authoritative server. This way, the full search is not exposed to the intermediate servers, protecting user information.
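As a toy sketch (not a resolver implementation; the function name is made up for illustration), the right-to-left walk a minimizing resolver performs looks like this:

```python
def minimized_qnames(fqdn):
    """Return the sequence of query names a minimizing resolver sends,
    following the idea in RFC 7816: each server in the chain sees only
    one more label than the previous one, and the full name goes only
    to the final, authoritative server."""
    labels = fqdn.rstrip(".").split(".")
    # Walk right to left: the TLD first, then one more label per step.
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]

print(minimized_qnames("www.banana.example"))
# ['example.', 'banana.example.', 'www.banana.example.']
```

A real resolver also has to deal with zone cuts that do not fall on every label boundary, which is part of why RFC 7816 is more involved than this sketch.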
Some recursive server implementations already support QNAME minimization. If you are running a server for internal use, you should ensure the server you are using supports it. If you are pointing your personal computer or device at publicly reachable recursive servers, you should investigate whether those servers support DNS query minimization.
Even with DNS query minimization, your recursive server still knows a lot about what you ask for—the topic of discussion on a forthcoming episode of the Hedge, where our guest will be Geoff Huston.
There is Always a Back Door
A long time ago, I worked in a secure facility. I won’t disclose the facility; I’m certain it no longer exists, and the people who designed the system I’m about to describe are probably long retired. Soon after being transferred into this organization, someone noted I needed to be trained on how to change the cipher door locks. We gathered up a ladder, placed the ladder just outside the door to the secure facility, popped open one of the tiles on the drop ceiling, and opened a small metal box with a standard, low-security key. Inside this box was a jumper board that set the combination for the secure door.
First lesson of security: there is (almost) always a back door.
I was reminded of this while reading a recently published paper about a backdoor attack on certificate authorities. There are, according to the paper, around 130 commercial Certificate Authorities (CAs). Each of these CAs issues widely trusted certificates used for everything from TLS certificates that secure web browsing sessions to RPKI certificates used to validate route origination information. When you encounter these certificates, you assume at least two things: the private key in the public/private key pair has not been compromised, and the person who claims to own the key is really the person you are talking to. The first of these two can come under attack through data breaches. The second is the topic of the paper in question.
How do CAs validate the person asking for a certificate actually is who they claim to be? Do they work for the organization they are obtaining a certificate for? Are they the “right person” within that organization to ask for a certificate? Shy of having a personal relationship with the person who initiates the certificate request, how can the CA validate who this person is and if they are authorized to make this request?
They could do research on the person—check their social media profiles, verify their employment history, etc. They can also send them something that, in theory, only that person can receive, such as a physical letter, or an email sent to their work email address. To be more creative, the CA can ask the requestor to create a small file on their corporate web site with information supplied by the CA. In theory, these electronic forms of authentication should be solid. After all, if you have administrative access to a corporate web site, you are probably working in information technology at that company. If you have a work email address at a company, you probably work for that company.
These electronic forms of authentication, however, can turn out to be much like the small metal box which holds the jumper board that sets the combination just outside the secure door. They can be more security theater than real security.
In fact, the authors of this paper found that some 70% of the CAs could be tricked into issuing a certificate for just about any organization—by hijacking a route. Suppose the CA asks the requestor to place a small file containing some supplied information on the corporate web site. The attacker creates a web server, inserts the file, hijacks the route to the corporate web site so it points at the fake web site, waits for the authentication to finish, and then removes the hijacked route.
The solution recommended in this paper is for the CAs to use multiple overlapping factors when authenticating a certificate requestor—which is always a good security practice. Another solution recommended by the authors is to monitor your BGP tables from multiple “views” on the Internet to discover when someone has hijacked your routes, and to take active measures to either remove the hijack, or at least to detect the attack.
These are all good measures—ones your organization should already be taking.
But the larger point should be this: putting a firewall in front of your network is not enough. Trusting that others will “do their job correctly,” and hence that you can trust the claims of certificates or CAs, is not enough. The Internet is a low-trust environment. You need to think about the possible back doors and think about how to close them (or at least know when they have been opened).
Having personal relationships with people you do business with is a good start. Being creative in what you monitor and how is another. Firewalls are not enough. Two-factor authentication is not enough. Security is systemic and needs to be thought about holistically.
There are always back doors.
The Hedge 2: Jeff Tantsura and Intent Based Networking

Jeff Tantsura recently co-authored a draft in the IRTF defining some of the concepts and parameters for intent based networking. Jeff joins Tom Ammon and Russ White to dig into this new area, and what it means for networks.
The Floating Point Fix
Floating point is not something many network engineers think about. In fact, when I first started digging into routing protocol implementations in the mid-1990s, I discovered one of the tricks you needed to remember when trying to replicate the router’s metric calculation was to always round down. When EIGRP was first written, it, like most of the rest of Cisco’s IOS, was written for processors that did not perform floating point operations. The silicon and processing time costs were just too high.
What brings all this to mind is a recent article on the problems with floating point performance over at The Next Platform by Michael Feldman. According to the article:
For those who have not spent a lot of time in the coding world, a floating point number is one that has some number of digits after the decimal point. While integers are fairly easy to represent and compute over in binary, floating point numbers are much more difficult, because most decimal fractions cannot be represented exactly in binary. The number of bits you have available to represent the number makes a very large difference in accuracy. For instance, if you try to store the number 101.1 in a float, you will find the number stored is actually 101.099998. To store 101.1 more precisely, you need a double, which is twice as long as a float.
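You can see this by round-tripping the same value through 32-bit and 64-bit IEEE 754 storage; here is a quick sketch in Python, using the standard struct module to force each width:

```python
import struct

# Store 101.1 as a 32-bit float, then read it back.
as_float = struct.unpack("<f", struct.pack("<f", 101.1))[0]
# Store the same value as a 64-bit double, then read it back.
as_double = struct.unpack("<d", struct.pack("<d", 101.1))[0]

print(as_float)   # roughly 101.099998: the float cannot hold 101.1 exactly
print(as_double)  # much closer to 101.1, at the cost of twice the bits
```

Neither representation is exact, since 0.1 has no finite binary expansion; the double is simply wrong by a far smaller amount.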
Okay—this all might be fascinating, but who cares? Scientists, mathematicians, and … network engineers do, as a matter of fact. First, carrying around doubles to store numbers with higher precision means a lot more network traffic. Second, when you start looking at timestamps and large amounts of telemetry data, the efficiency and accuracy of number storage becomes a rather big deal.
Okay, so the current floating point storage format, called IEEE754, is inaccurate and rather inefficient. What should be done about this? According to the article, John Gustafson, a computer scientist, has been pushing for the adoption of a replacement called posits. Quoting the article once again:
It does this by using a denser representation of real numbers. So instead of the fixed-sized exponent and fixed-sized fraction used in IEEE floating point numbers, posits encode the exponent with a variable number of bits (a combination of regime bits and the exponent bits), such that fewer of them are needed, in most cases. That leaves more bits for the fraction component, thus more precision.
Did you catch why this is more efficient? Because it uses a variable length field. In other words, posits replace a fixed field structure (like what was originally used in OSPFv2) with a variable length field (like what is used in IS-IS). While you must eat some space in the format to carry the length, the amount of "unused space" in current formats overwhelms the space wasted, resulting in an improvement in accuracy. Further, many numbers that require a double today can be carried in the size of a float. Not only does using a TLV-style format increase accuracy, it also increases efficiency.
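Posits encode their regime field in a more subtle way than a simple length count, but the fixed-versus-variable tradeoff described here is easy to see with a deliberately simple toy: a varint-style integer encoding (the same idea protobuf uses on the wire), where the high bit of each byte says whether another byte follows.

```python
def varint_encode(n):
    """Encode a non-negative integer using 7 data bits per byte; the high
    bit of each byte is a continuation flag, playing roughly the role a
    length field plays in a TLV."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)

# Small numbers fit in one byte instead of a fixed four or eight;
# large numbers still fit, just in more bytes.
print(len(varint_encode(5)))      # 1
print(len(varint_encode(300)))    # 2
print(len(varint_encode(2**40)))  # 6
```

The tradeoff is exactly the one described above: you spend bits on the continuation flags, but for typical (small) values you save far more than you spend.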
From the perspective of the State/Optimization/Surface (SOS) tradeoff, there should be some increase in complexity somewhere in the overall system—if you have not found the tradeoffs, you have not looked hard enough. Indeed, what we find is there is an increase in the amount of state being carried in the data channel itself; there is additional state, and additional code that knows how to deal with this new way of representing numbers.
It's always interesting to find situations in other information technology fields where discussions parallel to discussions in the networking world are taking place. Many times, you can see people encountering the same design tradeoffs we see in network engineering and protocol design.
It’s not a CLOS, it’s a Clos
Way back in the day, when telephone lines were first being installed, running the physical infrastructure was quite expensive. The first attempt to maximize the infrastructure was the party line. In modern terms, the party line is just an Ethernet segment for the telephone. Anyone can pick up and talk to anyone else who happens to be listening. In order to schedule things, a user could contact an operator, who could then “ring” the appropriate phone to signal another user to “pick up.” CSMA/CA, in essence, with a human scheduler.
This proved to be somewhat unacceptable to everyone other than various intelligence agencies, so the operator’s position was “upgraded.” A line was run to each structure (house or business) and terminated at a switchboard. Each line terminated into a jack, and patch cables were supplied to the operator, who could then connect two telephone lines by inserting a jumper cable between the appropriate jacks.
An important concept: this kind of operator-driven system is nonblocking. If Joe calls Susan, then Joe and Susan cannot also talk to anyone other than one another for the duration of their call. If Joe’s line is tied up when someone tries to call him, they will receive a busy signal. The network is not blocking in this case, the edge is—because the node the caller is trying to reach is already using 100% of its available bandwidth for an existing call. This is called admission control; all nonblocking networks must either be provisioned so the sum total of the transmitters cannot consume more than the cross-sectional bandwidth available in the network, or they must have effective admission control.
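The provisioning half of that condition is simple arithmetic; here is a minimal sketch (the port speeds are hypothetical numbers, purely for illustration):

```python
def nonblocking_without_admission_control(edge_tx_gbps, cross_section_gbps):
    """A network with no admission control can only be nonblocking if the
    sum of everything the edge can transmit fits within the fabric's
    cross-sectional bandwidth."""
    return sum(edge_tx_gbps) <= cross_section_gbps

# Four 10G edge ports over a 40G cross-section: fully provisioned.
print(nonblocking_without_admission_control([10, 10, 10, 10], 40))      # True
# Add a fifth port without adding fabric capacity, and blocking is possible.
print(nonblocking_without_admission_control([10, 10, 10, 10, 10], 40))  # False
```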
Blocking networks did exist in the form of trunk connections, or connections between these switch panels. Trunk connections not only consumed ports on the switchboard, they were expensive to build, and required a lot of power to run. Hence, making a “long distance call” cost money because it consumed a blocking resource. It is only when we get to packet-switched digital networks that the cost of a “long distance call” drops to the rough equivalent of a “normal” call, and we see “long distance” charges fade into memory (many of my younger readers have never been charged for “long distance calls,” in fact, and may not even know what I’m talking about).
The human operators were eventually replaced by relays.

The old “crank to ring” phones were replaced with “dial phones,” which break the circuit between the phone and the relay to signal a digit being dialed. The process begins when you lift the handset, which causes voltage to run across the line. Spinning the dial breaks the circuit once for each digit on the dial. Each time the circuit breaks, the armature on this stack of circular relays is moved. The first digit dialed causes the relay to move up (or down, depending on the relay) the stack. When you pause for a second before dialing the second number, the motors reset so the arm will now move around the circle, starting with the zero position. When it reaches this spot, the arm connects your physical line to another relay, which then repeats the process, ultimately reaching the line of the person you want to call and creating a direct electrical connection between the caller and the receiver. We used to have huge banks of these relays, accompanied by 66-style punch-down blocks, and sometimes (in newer systems) spin-down connections for permanent circuits. We mostly “troubleshot” them with WD-40 and electrical contact solution from spray cans (yes, I have personal experience).
These huge relay banks became unwieldy to support, of course, so they were eventually replaced with fabrics—crossbar fabrics.

In a crossbar fabric, each device is “attached twice,” once on the send side, and once on the receive side. When you place a call, your “send” is connected to the receiver’s “receive” by making an electrical connection where your “send” line intersects with the receiver’s “receive” line. All of this hardware, however, has the property of scaling up rather than out. In order to create a larger telephone switch, you simply have to buy a larger fabric of some kind. You cannot “add to” a fabric once it is built, which means when a region outgrows its fabric, you must either install a new one and connect the two fabrics with trunk lines, or rip the entire fabric out and replace it.
This is the problem Edson Erwin, and later Charles Clos, were trying to solve—how do you build a large-scale switching fabric out of simple components, and yet scale to virtually any size? While Erwin invented the concept in 1938, the concept was formalized by Clos in 1953. Charles Clos, by the way, was French, so the proper pronunciation is “klo,” with a long “o.”
The solution was to use four-port switches (or relays, in the case of circuits), each of which is connected into a familiar looking fabric.

This was designed to be a unidirectional fabric, with each edge node (leaf) having a send and receive side, much like a crossbar. It is fairly obvious that the fabric is nonblocking on admission control; if a1 connects to b3, then a1’s entire bandwidth is consumed in that connection, so a1 cannot accept any other connections. If d1 also tries to connect to b3, it will receive the traditional “busy signal.” The network does not block, it just refuses to accept the new connection at the edge.
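For the circuit-switched case, Clos worked out exactly how much middle-stage capacity keeps a three-stage fabric strictly nonblocking: with n inputs on each ingress switch, you need at least 2n − 1 middle-stage switches. A minimal sketch of that check (the function and parameter names are mine, for illustration):

```python
def strictly_nonblocking(inputs_per_ingress, middle_switches):
    """Clos's condition for a three-stage circuit-switched fabric: with n
    inputs per ingress switch and m middle switches, the fabric is
    strictly nonblocking (no call is ever blocked, and no existing call
    ever needs rearranging) when m >= 2n - 1."""
    return middle_switches >= 2 * inputs_per_ingress - 1

print(strictly_nonblocking(4, 7))  # True: 7 >= 2*4 - 1
print(strictly_nonblocking(4, 6))  # False: some call patterns can block
```

The intuition: in the worst case, n − 1 existing calls from the ingress switch and n − 1 existing calls to the egress switch each occupy a distinct middle switch, so one more middle switch is needed for the new call.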
What if you wanted to make this fabric bidirectional? To allow traffic to flow from a1 to b3 while also allowing traffic to flow from a3 to, say, d1? You would “fold” the fabric. This does not involve changing the physical layout at all; folded Clos fabrics are not drawn any differently than non-folded Clos fabrics. They look identical on paper (stop drawing them as if putting a1 and a3 on the same side of the fabric makes them into a “two-stage folded fabric”—this is not what “folded” means). The primary change here is in the “control plane,” or the scheduler, which must be modified to allow traffic to flow in either direction. Packet switching is almost always bidirectional, so all Clos fabrics used in packet switched networks are “folded” in this way.
Another interesting point—you cannot really perform “acceptance flow control” on a packet-switched network, so there is no way to make the fabric “nonblocking.” In a circuit-based network, the number of edge ports can be tied to the amount of cross-sectional bandwidth available in the fabric. You can over-subscribe the fabric by provisioning more ports on the edge than the cross-sectional bandwidth of the fabric itself can support, of course, which makes the fabric non-contending, rather than non-blocking. In a packet-switched network, it just is not possible to perform reliable admission control; because of this, packet-switched Clos fabrics are always non-contending rather than non-blocking.
So the next time you put CLOS in a document, or on a white board, don’t. It’s a Clos fabric, not a CLOS fabric. It’s not non-blocking, and you do not “fold” it by drawing it with two stages. Clos fabrics always have an odd number of stages. There is a mathematical reason for this, but this post is already long enough.
Another problem with our common usage of these terms is we call every kind of spine-and-leaf (or leaf-and-spine) fabric a Clos, and we have started calling all kinds of networks, including overlay networks, “fabrics.” This post is already long (as above), so I will leave these for the future.
If you liked this short article, and would like to understand more about fabrics and fabric design, please join me for my upcoming Data Center Design live webinar on Safari Books. I will cover this history there, but I will also cover a lot of other things involved in designing data center fabrics.
