History of Networking: Policy with Joel Halpern

Policy at Internet scale is a little-understood problem, and one that is difficult (potentially impossible) to solve. Joel Halpern joins the History of Networking over at the Network Collective to talk about the history of policy in the Internet at large, and in networked systems in general.

Enterprise versus Provider?

Two ideas are widespread, and both need to be addressed—

FANG (read this as the hyper/web/large scale network operators) have very specific needs; they run custom-built, single-purpose software at a very large scale. So all they really want/need are dumb boxes and smart people. … Enterprises have another view; they want smart boxes run by dumb people.

First, there is no enterprise, there are no service providers. There are problems, and there are solutions.

When I was young (and even more foolish than I am now) I worked for a big vendor. When this big vendor split the enterprise and service provider teams, I thought this kind of made sense. After all, providers have completely different requirements, and should therefore run with completely different technologies, equipment, and software. When I thought of providers in those days, I thought of big transit network operators, like AT&T, and Verizon, and Orange, and Level3, and Worldcom, and… The world has changed since then, but our desire to split the world into two neat halves has not.

If you want to split the world into two halves, split it this way: There are companies who consider the network an asset, and companies that consider the network a necessary evil. There are companies who consciously depend on the network within their product lifecycle and value chain, and there are companies who see the network as a consumer of money which is best minimized. This has nothing to do with “service provider” and “enterprise,” and everything to do with the company’s attitude towards technology and their future.

Second, the smart boxes/dumb people versus smart people/dumb boxes pairing is a false dichotomy.

All networks rely on having smart people design and run them. There are two ways you can access the smart people your network needs. You can hire a small group of smart people and allow them to work in the open source/open standards communities. This way you build a community that supports a lot of businesses, including yours. Or you can rely on your vendor to hire the right smart engineers, call them in when you need them, and hope they show up. Both models have positive and negative aspects, but the assumption that there is no cost sharing model in the realm of directly hiring smart engineers distorts the tradeoffs; distorted tradeoffs always lead to poor decisions.

Sometimes smart engineers can design things so you do not need smart boxes. Rather than hiring someone to replace the smarts you give up by not buying from a vendor, ask: do I really need this complexity in the first place?

The bottom line.

In my experience, most companies that use the “smart boxes/dumb engineers” line do not understand their business, their operating environment, or network engineering. This response normally comes from either a misunderstanding of the value of the network, a misunderstanding of the value of simplicity, or a fear of smart network engineers (they might actually push back against the application developers and vendors!).

It is much easier to scream at a vendor than it is to change the way you do business to take advantage of the network as an asset.


On the ‘net: Rethinking Firewalls

In January of 1995, Network Translation’s PIX firewall received the “hot product of the year” award from Data Communications Magazine. While the PIX was originally designed to perform Network Address Translation (NAT), doing for the IP host market what the PBX market did for the telephone, the PIX itself quickly morphed into the original appliance-based firewall. In those heady days in the Cisco Technical Assistance Center (TAC), we spent hours thinking through how best to build a Demilitarized Zone (DMZ) using PIXes and routers so the network simply could not be penetrated. We built walls around our networks to defend them against the hordes of horseback-riding invaders. @ECI

The DNS Negative Cache

Consider the DNS query chain—

  • A host queries a local recursive server to find out about banana.example
  • The server queries the root server, then recursively the authoritative server, looking for this domain name
  • banana.example does not exist

There are actually two possible responses in this chain of queries. .example might not exist at all; in this case, the root server will return a server not found error. On the other hand, .example might exist while banana.example does not; in this case, the authoritative server will return an NXDOMAIN response, indicating the subdomain does not exist.
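A minimal sketch of these two failure modes, using Python and the dnspython library (my choice of library here, not anything from the original post); exactly which exception surfaces for the “server not found” case depends on the resolver in use:

```python
# Sketch: distinguishing "name does not exist" from "no server could answer,"
# using dnspython (pip install dnspython). The domain name is the example
# from the text and will not resolve on the public Internet.
import dns.resolver

def lookup(name: str) -> None:
    try:
        answer = dns.resolver.resolve(name, "A")
        print(f"{name} resolves to {[r.address for r in answer]}")
    except dns.resolver.NXDOMAIN:
        # An authoritative server answered: the name does not exist.
        print(f"{name}: NXDOMAIN")
    except dns.resolver.NoNameservers:
        # No server would (or could) answer for this zone at all.
        print(f"{name}: server not found")

lookup("banana.example")
```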

Assume another host, a few moments later, also queries for banana.example. Should the recursive server request the same information all over again for this second query? It will, unless it caches the failure of the first query—this is the negative cache. The negative cache reduces load on the overall system, but it can also be considered a bug.

Take, for instance, the case where you set up a new server, assign it banana.example, jump to a host, and try to connect to the new server before the new DNS information has been propagated through the system. On the first query, the local recursive server will cache the nonexistence of banana.example, and you will need to wait until this negative cache entry times out before you can reach the newly configured server. If the time required to propagate the new DNS information is two seconds, you query after one second, and the negative cache timeout is sixty seconds, the negative cache entry will not expire until sixty-one seconds in—fifty-nine seconds after the record actually became reachable.
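A tiny worked version of the arithmetic above, with all values taken from the scenario in the text:

```python
# Timeline of the negative cache example, in seconds.
propagation = 2     # new banana.example record is visible at t=2
first_query = 1     # you query at t=1, before propagation completes
negative_ttl = 60   # lifetime of the negative cache entry

entry_expires = first_query + negative_ttl  # t=61
time_lost = entry_expires - propagation     # 59 seconds of waiting
print(time_lost)
```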

How long will a recursive server keep a negative cache entry? The answer depends on the kind of response it received in its initial attempt to resolve the name. If server not found is the response, the negative cache timeout is locally configured. If an NXDOMAIN is returned, the negative cache timeout is taken from the zone’s SOA record (per RFC 2308, the lesser of the SOA’s MINIMUM field and the SOA’s own TTL).
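As a sketch of where that timeout comes from, the following (again using dnspython, with an illustrative zone name) pulls the SOA and computes the RFC 2308 negative TTL:

```python
# Sketch: derive a zone's negative cache TTL per RFC 2308 -- the lesser of
# the SOA record's MINIMUM field and the TTL of the SOA record itself.
import dns.resolver

answer = dns.resolver.resolve("example.com", "SOA")
soa = answer[0]  # the SOA rdata: mname, rname, serial, ..., minimum
negative_ttl = min(soa.minimum, answer.rrset.ttl)
print(f"negative cache TTL: {negative_ttl} seconds")
```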

So, first point about negative caching in DNS: if you are dealing with a local DNS server for internal lookups on a data center fabric or campus network, it might improve the performance of applications and the network in general to turn off negative caching for the local domains. DNS turnaround times can be a major bottleneck in application performance. In turning off negative caching for local resources, you are trading processing power on your DNS server for reduced turnaround times, particularly when a new server or service is brought up.

The way a negative cache is built, however, seems to allow for a measure of inefficiency. Assume three subdomains exist as part of .example:

  • apple.example
  • orange.example
  • pear.example

A host queries for banana.example, and the recursive server, on receiving an NXDOMAIN response indicating this subdomain does not exist, builds a negative cache entry for banana.example. A few moments later, some other host (or the same host) queries for cantaloupe.example. Once again, the recursive server discovers this subdomain does not exist, and builds a negative cache entry. If the point of the negative cache is to reduce the workload on the DNS system, it does not seem to be doing its job. A given host, in fact, could consume a good deal of processing power by requesting one domain after another, forcing the recursive server to discover whether or not each subdomain exists.
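A toy model of this per-name negative cache (not a real resolver, just enough to make the inefficiency concrete) might look like the following; note that every new nonexistent name is a cache miss that triggers another upstream query:

```python
import time

negative_cache: dict[str, float] = {}  # name -> expiry timestamp
NEGATIVE_TTL = 60.0

def exists_upstream(name: str) -> bool:
    # Stand-in for the full recursive lookup against authoritative servers.
    return name in {"apple.example", "orange.example", "pear.example"}

def resolve(name: str) -> bool:
    now = time.monotonic()
    if negative_cache.get(name, 0.0) > now:
        print(f"{name}: negative cache hit, no upstream query")
        return False
    if exists_upstream(name):
        return True
    print(f"{name}: upstream query, caching nonexistence")
    negative_cache[name] = now + NEGATIVE_TTL
    return False

resolve("banana.example")      # upstream query
resolve("banana.example")      # answered from the negative cache
resolve("cantaloupe.example")  # upstream query again, despite the cache
```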

RFC8198 proposes a way to resolve this problem by including more information in the response sent to the recursive server. Specifically, given DNSSEC signed zones (to ensure no one is poisoning the cache to force the building of a large negative cache in the recursive server), an answering DNS server can provide the two domain names on either side of the missing queried domain name.

In this case, a host queries for banana.example, and the server responds with the pair of subdomains surrounding the requested subdomain—apple.example and orange.example. Now when the recursive server receives a request for cantaloupe.example, it can look into its negative cache and immediately see there is no such domain in the place where it should exist. The recursive server can now synthesize an NXDOMAIN response itself, without sending queries to any other upstream server.
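A much-simplified sketch of the same idea: cache the range returned alongside the NXDOMAIN, and answer later queries falling inside any cached range locally. Real NSEC processing uses canonical DNS name ordering and DNSSEC validation; plain string comparison stands in for both here:

```python
cached_ranges: list[tuple[str, str]] = []  # (name before, name after) pairs

def covered(name: str) -> bool:
    # Is this name provably nonexistent, per a cached NSEC-style range?
    return any(low < name < high for low, high in cached_ranges)

# First miss: the server returns the two surrounding names with the NXDOMAIN.
cached_ranges.append(("apple.example", "orange.example"))

# A later query for cantaloupe.example falls inside the cached range, so the
# recursive server can synthesize NXDOMAIN without asking upstream.
print(covered("cantaloupe.example"))  # True
print(covered("pineapple.example"))   # False -- this one still goes upstream
```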

This aggressive form of negative caching can reduce the workload of upstream servers, and close an attack surface that might be used for denial of service attacks.

Reaction: The Pace of Innovation

Dave Ward has an excellent article over at the Cisco blog on the three year journey since he started down the path of trying to work the standards landscape (the SDOs, or Standards Development Organizations) to address the many ways in which these organizations are broken. Specifically, he has been trying to connect the open source and open standards communities more effectively—a path I heartily endorse, as I have been intentionally trying to work in both communities in parallel over the last several years, and to find places where I can bring them together.

While the entire blog is worth reading, there are two lines I think need some further thought. The first of these is a bit of a scold, so be prepared to have your knuckles rapped.

My real bottom line here is that innovators can’t go faster than their customers and customers can’t go faster than their own understanding of the technology and integration, deployment and operational considerations.

Precisely. Maybe this is just an old man talking, but I sometimes want to scold the networking industry on this very point. We fuss about innovation, but innovation requires customers who understand the technology—and the networking world has largely become a broad set of meta-engineers, interspersed with the occasional engineer. It could be there are just as many engineers as there have always been, and there are simply so many more administrators calling themselves engineers that it does not seem like it—but this is not my sense of where we have come since I first started working on the McGuire AFB backbone in the late 1980s.

Comparing my time in the Cisco TAC to today is telling: jbash used to open cases with the stack trace decodes in his initial email. Rodney Dunn and I used to speak to standing-room-only crowds on router architecture. Today, on the other hand, I go to sessions where speakers are talking about router architecture, and they are flat out wrong about how these things work. When I ask why, the speaker says “this is all this crowd is ever going to understand.” This sort of cycle is self-reinforcing. Another sign: I’m told my audience is so small because most networking folks “just don’t see the point.” They can’t see how to apply the theory I teach to the “real world,” which is driven by vendor product and configuration.

If you want innovation, be an educated consumer.

A corollary: if you want to effectively use innovation, you need to be an educated consumer. To innovate, there needs to be a tight interaction between the consumer and the creator. This is why car manufacturers sponsor racing teams, for instance—they need educated consumers to interact with in order to invent.

The second line that caught my eye was this one—

And, we need to reduce the fracturing of the industry because, in this interim period, a technology landscape has evolved that is littered with “Stacks”, “Controllers”, and “Virtual Fubars”.

This is a real problem. I think it is a product of the first problem Dave mentions—the uneducated consumer. The tendency in the networking world is to chase after new solutions, and to layer solutions one on top of another. The result is often massively intertwined complexity, as we create new layers to hide the complexity of the layers below the current one. We often don’t even understand how the underlying layer works; we just assume it will. Further, we assume there will be no interaction between the layer we are smearing on top and the underlying layers we built before.

The result is a system hardened to the point of ossification. These brittle systems fail when the wind blows the wrong way, causing much heartache and distrust among our users. What is our solution?

Smear another layer on top.

Further, this kind of thinking leads to the market fragmentation Dave is talking about. As each layer becomes more complex, only “specialists” know how to work on it, creating silos.

If you want to see the market innovate, and you want to break down the silos being built right now, and you want to reduce the complexity in your network, there is a solution at hand. It is not an easy solution; it requires real work.

Get educated. Learn how this stuff works.

Weekend Reads 020918: The Usual Stuff

The Linux Foundation, which has been host to many leading open source networking projects, felt the need to streamline all its various ventures, informed Arpit Joshipura, general manager of networking and orchestration at The Linux Foundation. The six founding open source projects involved in the LFN are FD.io, OpenDaylight, Open Network Automation Platform (ONAP), Open Platform for NFV (OPNFV), PNDA, and Streaming Network Analytics System. An additional 83 member organizations are participating in LFN. This is significant because members of the Linux Foundation can choose whether they want to join LFN, and they can participate in as many or as few of the projects as they want. —Syeda Beenish @OpenSource

The plunder of more than $500 million worth of digital coins from the Japanese cryptocurrency exchange Coincheck last week has added to a growing perception that cryptocurrencies are particularly vulnerable to hackers. It’s an expensive reminder that like many things in the cryptocurrency world, security technologies—and the norms, best practices, and rules for using them—are still emerging. Not least because of its enormous size, the Coincheck hack could go down as a seminal moment in that process. —Mike Orcutt @Technology Review

The days of pointing to a file cabinet and telling your loved ones “everything is there when the time comes” are fading fast. In today’s world of technology, digital assets are becoming a more important part of a person’s estate. If these assets are not included in an estate plan, grieving survivors can be left without access to a loved one’s online accounts. —Casey Dowd @Marketwatch

Retailers are in the throes of adapting to a shopping environment completely revolutionized by e-commerce giant Amazon (AMZN), but restaurants are now facing the same challenge thanks to what has become known as the “Amazon effect.” Delivery apps, including Amazon Restaurants, Uber Eats and Grubhub (GRUB) are transforming the dining experience for consumers, and participation isn’t usually a choice for those restaurants hoping to survive. —Brittany De Lea @Fox News

While in graduate school in mathematics at the University of Wisconsin-Madison, I took a logic course from David Griffeath. The class was fun. Griffeath brought a playfulness and openness to problems. Much to my delight, about a decade later, I ran into him at a conference on traffic models. During a presentation on computational models of traffic jams, his hand went up. I wondered what Griffeath – a mathematical logician – would have to say about traffic jams. He did not disappoint. Without even a hint of excitement in his voice, he said: ‘If you are modelling a traffic jam, you should just keep track of the non-cars.’ —Scott E Page @Intellectual Takeout

How do lawsuits grow our understanding of the risks and harms of new technologies? What incentives do they offer corporations to ensure the safety of their products? —J. Nathan Matias @Medium

Before videotex (the predecessor of the internet) arrived in the late 1970s and early 1980s, 90% of telecommunications revolved around telephone calls. And at that time telephony was still a luxury for many, as making calls was expensive. I remember that in 1972 a telephone call between London and Amsterdam cost one pound per minute. Local telephone calls were timed, and I still remember shouts from my parents when I was on a call to my girlfriend — ‘don’t make it too long’ and ‘get off the phone.’ —Paul Budde @CircleID

The recently released BIND version 9.12 includes an implementation of RFC8198 – ‘Aggressive Use of DNSSEC-Validated Cache’. While the purpose of this specification may not be immediately clear from its title, its intent is to improve the efficiency of the DNS protocol by allowing a DNSSEC-validating resolver to self-synthesize certain DNS responses — without referring to an authoritative server — if suitable NSEC/NSEC3 records already exist in the resolver’s cache. —Ray Bellis @APNIC

When asked to think about how new inventions might shape the future, says economist Tim Harford, our imaginations tend to leap to technologies that are sophisticated beyond comprehension. But the reality is that most influential new technologies are often humble and cheap, and new inventions do not appear in isolation… —Joe Carter @Acton


Some Market Thoughts on the Broadcom SDKLT

Broadcom, to much fanfare, has announced a new open source API that can be used to program and manage their Tomahawk set of chips. As a general refresher, the Tomahawk chip series is the small buffer, moderate forwarding table size hardware network switching platform on which a wide array of 1RU (and some chassis) routers…

Weekend Reads 020218: GDPR, taxes, and security

The regulatory environment for brands and retailers that do business online is getting stricter, thanks to regulatory changes in Europe with the General Data Protection Regulation (GDPR), as well as existing regulations in the … Companies that adapt quickly can turn these changes into a competitive advantage. —Christopher Rence @CircleID

Rehashing Certifications

While at Cisco Live in Barcelona this week, I had a chat with someone—I don't remember who—about certifications. The main point that came out of the conversation was this: One of the big dangers with chasing a certification is you will end up chasing knowledge about using a particular vendor feature set, rather than chasing…

Giving the Monkey a Smaller Club

Over at the ACM blog, there is a terrific article about software design that has direct application to network design and architecture. The problem is that once you give a monkey a club, he is going to hit you with it if you try to take it away from him. What do monkeys and clubs…

Learning to Ask Questions

A lot of folks ask me about learning theory—they don't have the time for it, or they don't understand why they should. This video is in answer to that question.

Weekend Reads 012618: Mostly Security and Legal Stuff

Before we begin, it's worth mentioning that yes, yesssssssssssssssssssss, I did not have enough protection around my Gmail account. I’ve used Google Authenticator before, for my personal account and for various work emails, but I stopped using it at a certain point out of convenience. —Cody Brown @Medium

This report assesses the impact disclosure of…

One Weird Trick

I'm often asked what the trick is to become a smarter person—there are many answers, of course, which I mention in this video. But there is "one weird trick" many people don't think about, which I focus on here.

Responding to Readers: How are these things discovered?

A while back I posted on section 10 routing loops; Daniel responded to the post with this comment: I am curious how these things are discovered. You said that this is a contrived example, but I assume researchers have some sort of methodology to discover issues like this. I am sure some things have been…

Responding to Readers: Automated Design?

Deepak responded to my video on network commoditization with a question: What’s your thoughts on how Network Design itself can be Automated and validated. Also from Intent based Networking at some stage Network should re-look into itself and adjust to meet design goals or best practices or alternatively suggest the design itself in green field…