Holiday Weekend Reads (22nov17)

The coming holiday is cutting my publishing schedule short, but I didn’t want to leave too many interesting stories on the cutting room floor. Hence the weekend read comes early this week, and contains a lot more stuff to keep you busy for those couple of extra days. For the long weekend, I have five on security and one on culture. Enjoy!

This first read is about the US government’s collection and maintenance of security vulnerabilities. This is always a tricky topic; if a government knows about security vulnerabilities, there is at least some chance some “bad actor” will, as well. While the government might want to hoard such knowledge in order to be more effective at breaking into systems, there is at least some possibility that refusing to release information about the vulnerabilities could lead to them not being fixed, and therefore to various systems being compromised, resulting in damage to real lives. The US government appears to be rethinking their use and disclosure of vulnerabilities.

There can be no doubt that America faces significant risk to our national security and public safety from cyber threats. During the past 25 years, we have moved much of what we value to a digital format and stored it in Internet-connected devices that are vulnerable to exploitation. This risk is increasing as our dependence on technology and the data we store continues to grow such that technology now connects nearly every facet of our society and the critical services that sustain our way of life. This fact is not lost on criminal actors and adversarial nation states who discover and exploit existing flaws in software, hardware, and the actions of legitimate users to steal, disrupt, and destroy data and services critical to our way of life. — The White House


A team of government, industry and academic officials successfully demonstrated that a commercial aircraft could be remotely hacked in a non-laboratory setting last year, a U.S. Department of Homeland Security (DHS) official said Wednesday at the 2017 CyberSat Summit in Tysons Corner, Virginia. — Calvin Biesecker @ Aviation Today

For years, I researched and wrote about the State Longitudinal Database Systems (SLDS) here in Oklahoma and across the nation (here, here and here), warning that these ill-advised legislative efforts to codify “transparency and accountability” in public schools would end up creating what could only be considered a national database. — Jenni White @ The Federalist

When Apple released the iPhone X on November 3, it touched off an immediate race among hackers around the world to be the first to fool the company’s futuristic new form of authentication. A week later, hackers on the actual other side of the world claim to have successfully duplicated someone’s face to unlock his iPhone X—with what looks like a simpler technique than some security researchers believed possible. — Andy Greenberg @ Wired

Jake Williams awoke last April in an Orlando, Fla., hotel where he was leading a training session. Checking Twitter, Mr. Williams, a cybersecurity expert, was dismayed to discover that he had been thrust into the middle of one of the worst security debacles ever to befall American intelligence. Mr. Williams had written on his company blog about the Shadow Brokers, a mysterious group that had somehow obtained many of the hacking tools the United States used to spy on other countries. Now the group had replied in an angry screed on Twitter. — NY Times

In the wake of the 2015 San Bernardino massacre, the FBI, having failed to open the suspect’s iPhone, turned to Apple, demanding that it break the device’s encryption. Much posturing ensued, in the media, in Congress, and in the Court. During his Congressional testimony, FBI director James Comey (remember him?) was especially aggressive in his misrepresentations: “This will be a one-time-only break-in, we’re not interested in a master key that will unlock Apple’s encryption.” — Jean-Louis Gassée @ Monday Note

If you’ve ever had “that sick, sad, cold, wet feeling that you have no idea what you’re doing, you’re going to get caught, and it’s all going to be terrible,” you may be experiencing imposter syndrome, says Jessica Rose, a former teacher and a self-taught technologist. — Opensource.org

On the ‘net: Three for the week

Because this is a short week, I’m going to combine three places I showed up on other sites recently.

The lesson of trying to prevent failures, rather than living with them, is something I saw in my time in electronics, as well. World events conspired, at one point, to cause us to push our airfield equipment into operation 24 hours a day, 7 days a week, for a period of about 2 years. For the first 6 months or so, the shop was a very lazy place; we almost took to not coming in to work. We painted a lot of walls, kept the floor really shiny, reorganized all the tools (many times!), and even raised a few shops and sheds at various people’s houses. After this initial stretch, however, the problems started. Every failure turned into a cascading failure, taking many hours to troubleshoot and repair, and many replaced parts along the way. —ECI

I was also featured on the IT Origins series over at Gestalt IT.

A while back now, Shawn Zandi and I also talked to Greg Ferro over at Packet Pushers about white box switching.

Thoughts on Open/R

Since Facebook released their Open/R routing platform, there has been a lot of chatter around whether or not it will be a commercial success, whether or not every hyperscaler should use the protocol, whether or not this obsoletes everything in routing before this day in history, etc., etc. I will begin with a single point.

If you haven’t found the tradeoffs, you haven’t looked hard enough.

Design is about tradeoffs. Protocol design is no different than any other design. Hence, we should expect that Open/R makes some tradeoffs. I know this might be surprising to some folks, particularly in the crowd that thinks every new routing system is going to be a silver bullet that solves every problem from the past, that the routing singularity has now occurred, etc. I’ve been in the world of routing since the early 1990s, perhaps a bit before, and there is one thing I know for certain: if you understand the basics, you know there is no routing singularity, and there never will be—at least not until someone produces a quantum wave routing protocol.

The reality is you always face one of two choices in routing: build a protocol specifically tuned to a particular set of situations, which means application requirements, topologies, etc., or build a general purpose protocol that “solves everything,” at some cost. BGP is becoming the latter, and is suffering for it. Open/R is an instance of the former.

Which means the interesting question is: what are they solving for, and how? Once you’ve answered this question, you can then ask: would this be useful in my network?

A large number of the points, or features, highlighted in the first blog post are well known routing constructions, so we can safely ignore them. For instance: IPv6 link local only, graceful restart, draining and undraining nodes, exponential backoff, carrying random information in the protocol, and link status monitoring. These are common features of many protocols today, so we don’t need to discuss them. There are a couple of interesting features, however, worth discussing.

Dynamic Metrics. EIGRP once had dynamic metrics, and they were removed. This simple fact always makes me suspicious when I see dynamic metrics touted as a protocol feature. Looking at the heritage of Open/R, however, dynamic metrics were probably added for one specific purpose: to support wireless networks. This functionality is, in fact, provided through DLEP, and supported in OLSR, MANET extended OSPF, and a number of other MANET control planes. Support for DLEP and dynamic metrics based on radio information was discussed in the BABEL working group at the recent Singapore IETF, in fact, and the BABEL folks are working on integrating dynamic metrics for wireless. So this feature not only makes sense in the wireless world, it’s actually much more widespread than might be apparent if you are looking at the world from an “Enterprise” point of view.
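To make this concrete, here is a minimal sketch of how a dynamic metric might be derived from the kind of radio measurements DLEP carries. The field names, weights, and scaling below are my own illustrative assumptions, not the actual DLEP or Open/R computation.

```python
# Minimal sketch: deriving a dynamic link metric from DLEP-style radio
# measurements. The fields, weights, and scaling are illustrative
# assumptions, not the actual Open/R or DLEP metric computation.

from dataclasses import dataclass


@dataclass
class RadioLinkReport:
    """A simplified report a radio might supply over a DLEP-like session."""
    current_data_rate_kbps: float   # currently achievable throughput
    loss_percent: float             # recent frame loss on the link
    latency_ms: float               # measured latency


def dynamic_metric(report: RadioLinkReport,
                   reference_rate_kbps: float = 100_000.0) -> int:
    """Map radio conditions to a routing metric: lower is better.

    Start from an inverse-bandwidth cost (like many IGPs), then penalize
    loss and latency so a degrading radio link is gradually de-preferred
    instead of flapping between "up" and "down".
    """
    rate = max(report.current_data_rate_kbps, 1.0)
    base_cost = reference_rate_kbps / rate
    loss_penalty = 1.0 + (report.loss_percent / 100.0) * 4.0
    latency_penalty = 1.0 + report.latency_ms / 100.0
    return max(1, round(base_cost * loss_penalty * latency_penalty))


if __name__ == "__main__":
    good = RadioLinkReport(current_data_rate_kbps=54_000, loss_percent=0.5, latency_ms=5)
    degraded = RadioLinkReport(current_data_rate_kbps=6_000, loss_percent=8.0, latency_ms=40)
    print(dynamic_metric(good))      # small metric: link preferred
    print(dynamic_metric(degraded))  # larger metric: link de-preferred
```

The point of a scheme like this is simply that the metric degrades smoothly as the radio link degrades, rather than the link snapping between usable and unusable.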

But while this is useful, would you want this in your data center fabric? I’m not certain you would. I would argue dynamic metrics are actually counterproductive in a fabric. What you want, instead, is basic reachability provided by the distributed control plane (routing protocol), and some sort of controller that sits on top using an overlay sort of mechanism to do traffic engineering. You don’t want this sort of policy stuff in a routing protocol in a contained environment like a fabric.

Which leads us to our second point: The API for the controller. This is interesting, but not strictly new. Openfabric, for instance, already postulates such a thing, and the entire I2RS working group in the IETF was formed to build such an interface (though it has strayed far from this purpose, as usual with IETF working groups). The really interesting thing, though, is this: this southbound interface is built into the routing protocol itself. This design decision makes a lot of sense in a wireless network, but, again, I’m not certain it does in a fabric.

Why not? It ties the controller architecture, including the southbound interface, to the routing protocol. This reduces component flexibility, which means it is difficult to replace one piece without replacing the other. If you wanted to replace the basic functionality of Open/R without replacing the controller architecture at some point in the future, you must hack your way around this problem. In a monolithic system like Facebook, this might be okay, but in most other network environments, it’s not. In other words, this is a rational decision for Open/R, but I’m not certain it can, or should, be generalized.
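As a sketch of the alternative, here is what a southbound interface decoupled from the routing protocol could look like. Every name here is hypothetical, meant only to illustrate the argument about component flexibility; it is not Open/R’s actual controller API.

```python
# Sketch of a southbound interface defined independently of any one routing
# protocol. Everything here is hypothetical, illustrating the decoupling
# argument; it is not Open/R's actual controller API.

from abc import ABC, abstractmethod
from typing import List


class SouthboundInterface(ABC):
    """What a controller needs from the device, regardless of which
    protocol computed the underlying reachability."""

    @abstractmethod
    def advertise_prefix(self, prefix: str, next_hops: List[str]) -> None: ...

    @abstractmethod
    def withdraw_prefix(self, prefix: str) -> None: ...

    @abstractmethod
    def set_link_metric(self, interface: str, metric: int) -> None: ...


class OpenRAdapter(SouthboundInterface):
    """One possible backend; a BGP or OpenFabric adapter could implement
    the same interface without touching the controller."""

    def advertise_prefix(self, prefix, next_hops):
        print(f"openr: inject {prefix} via {next_hops}")

    def withdraw_prefix(self, prefix):
        print(f"openr: withdraw {prefix}")

    def set_link_metric(self, interface, metric):
        print(f"openr: set {interface} metric to {metric}")


def drain_node(sb: SouthboundInterface, interfaces: List[str]) -> None:
    """Controller-side policy written only against the abstract interface."""
    for ifname in interfaces:
        sb.set_link_metric(ifname, 0xFFFF)  # max metric to drain traffic


drain_node(OpenRAdapter(), ["eth0", "eth1"])
```

Because the controller-side logic is written only against the abstract interface, the underlying protocol can be swapped without touching the controller, which is exactly the flexibility a protocol-embedded southbound interface gives up.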

This leads to a third observation: This is a monolithic architecture. While in most implementations, there is a separate RIB, FIB, and interface into the forwarding hardware, Open/R combines all these things into a single system. In any form of Linux-based network operating system, for instance, the routing processes install routes into Zebra, which then installs routes into the kernel and notifies processes about routes through the Forwarding Plane Manager (FPM). Some external process (switchd in Cumulus Linux, SWSS in SONiC) then carries this routing information into the hardware.
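As a rough illustration of that layering, here is a toy version of the pipeline; the class names stand in for the roles Zebra, the kernel FIB, and switchd/SWSS play, and are not their real interfaces.

```python
# Sketch of the layered pipeline described above: protocol -> RIB -> kernel
# FIB -> hardware sync process. Class names are illustrative stand-ins for
# Zebra, the kernel, and switchd/SWSS, not their actual interfaces.

class Rib:
    """Route table manager (the Zebra role): selects best routes, pushes
    them down, and notifies hardware-sync listeners (the FPM role)."""

    def __init__(self, fib, listeners):
        self.routes = {}
        self.fib = fib
        self.listeners = listeners

    def install(self, prefix, next_hop, distance):
        best = self.routes.get(prefix)
        if best is None or distance < best[1]:
            self.routes[prefix] = (next_hop, distance)
            self.fib.program(prefix, next_hop)   # kernel FIB
            for listener in self.listeners:      # FPM-style notification
                listener.sync(prefix, next_hop)


class KernelFib:
    def program(self, prefix, next_hop):
        print(f"kernel: {prefix} via {next_hop}")


class AsicSync:
    """The switchd/SWSS role: carry routes into the forwarding hardware."""
    def sync(self, prefix, next_hop):
        print(f"asic: {prefix} -> {next_hop}")


# Because each layer only knows about the one below it, any single
# component can be replaced without rewriting the rest of the stack.
rib = Rib(KernelFib(), [AsicSync()])
rib.install("10.0.0.0/24", "192.0.2.1", distance=20)   # e.g., from BGP
rib.install("10.0.0.0/24", "192.0.2.2", distance=110)  # e.g., from an IGP; loses
```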

Open/R, from the diagrams in the blog post, pushes all of this stuff, including the southbound interface from the controller, into a different set of processes. The traditional lines are blurred, which means the entire implementation acts as a single “thing.” You are not going to take the BGP implementation from snaproute or FRRouting and run it on top of Open/R without serious modification, nor are you going to run Open/R on ONL or SONiC or Cumulus Linux without serious modification (or at least a lot of duplication of effort someplace).

This is probably an intentional decision on the part of Open/R’s designers—it is designed to be an “all in one solution.” You RPM it to a device, with nothing else, and it “just works.” This makes perfect sense in the wireless environment, particularly for Facebook. Whether or not it makes perfect sense in a fabric depends—does this fit into the way you manage boxes today? Do you plan on using boxes Facebook will support, or roll your own drivers as needed for different chipsets, or hope the SAI support included in Open/R is enough? Will you ever need segment routing, or some other capability? How will those be provided for in the Open/R model, given it is an entire stack, and does not interact with any other community efforts?

Finally, there are a number of interesting points that are not discussed in the publicly available information. For instance, this controller—what does it look like? What does it do? How would you do traffic engineering with this system? Segment routing, MPLS—none of the standard ways of providing virtualization are mentioned at all. Dynamic metrics simply are not enough in a fabric. How is the flooding of information actually done? In the past, I’ve been led to believe this is based on ZeroMQ—is this still true? How optimal is ZeroMQ for flooding information? What kind of telemetry can you get out of this, and is it carried in the protocol, or in a separate system? I assume they want to carry telemetry as opaque information flooded by the protocol, but does it really make sense to do this?
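For what it’s worth, here is a minimal pub/sub sketch of what ZeroMQ-based flooding of key-value updates could look like. The topic name, message layout, and use of JSON are assumptions for illustration, not Open/R’s actual flooding code or message format.

```python
# Minimal sketch of ZeroMQ-style pub/sub flooding of key-value updates.
# This is an assumption-laden illustration of the pattern, not Open/R's
# actual flooding implementation, message format, or topic names.

import json
import time

import zmq  # pip install pyzmq


def main() -> None:
    ctx = zmq.Context.instance()

    # One node publishes its key-value store updates...
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://127.0.0.1:5555")

    # ...and a neighbor subscribes to everything it floods.
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://127.0.0.1:5555")
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.setsockopt(zmq.RCVTIMEO, 2000)  # don't hang forever in a demo

    time.sleep(0.2)  # PUB/SUB slow-joiner: let the subscription settle

    # Flood an adjacency update; opaque telemetry could ride the same
    # channel, which is exactly the design question raised above.
    update = {"key": "adj:node1",
              "value": {"neighbor": "node2", "metric": 10},
              "version": 7}
    pub.send_multipart([b"kvstore", json.dumps(update).encode()])

    topic, payload = sub.recv_multipart()
    print(topic.decode(), json.loads(payload))

    pub.close()
    sub.close()
    ctx.term()


if __name__ == "__main__":
    main()
```

Flooding telemetry as opaque values over the same channel is easy in a design like this, which is precisely why the question of whether it belongs there is worth asking.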

Overall, Open/R is interesting. It’s a single protocol designed to operate optimally in a small range of environments. As such, it has some interesting features, and it makes some very specific design choices. Are those design choices optimal for more general cases, or even other specific problem spaces? I would argue the architecture, in particular, is going to be problematic in terms of long term maintenance and growth. This can be modified over time, of course, but then we are left with a collection of ideas that are available in many other protocols, making the idea much less interesting.

Is it interesting? Yes. Is it the routing singularity? No. As engineers, we should take it for what it is worth—a chance to see how other folks are solving the problems they are facing in day-to-day operation, and thinking about how some of those lessons might be applied in our own world. I don’t think the folks at Facebook would argue any more than this, either.

Weekend Reading: 2017-11-17

There’s an “automation meteor” headed right at us, according to financial adviser and Reformed Broker blogger Josh Brown, who used this troubling “chart o’ the day” from Wharton to show just “how quickly things have changed” over the past decade. And how they’re going to keep on changing. — Shawn Langlois @ MarketWatch

Andrew Ng, formerly the head of AI for Chinese search giant Baidu and, before that, creator of Google’s deep-learning Brain project, knows as well as anyone that artificial intelligence is coming for plenty of jobs. And many of us don’t even know it. — Rachel Metz @ MIT Technology Review

Cloudflare provides security and domain name services for millions of the most prominent sites on the web. The company has built a solid reputation for its secure encryption and one of the key factors in its system is a wall of 100 lava lamps in the lobby of its San Francisco headquarters. — Rhett Jones @ gizmodo