Service Provider Tech Doesn’t Apply?
Service provider problems are not your problems. You should not be trying to solve your problems the same way service providers do.
This seems intuitively true—after all, just about everything about a train or a large over-the-road truck (or lorry) is different from a passenger car. If the train is the service provider network and the car is the “enterprise” network, it seems to be obvious the two have very little in common.
Or is it?
What this gets right is that if an operator sells access to their network, or a single application, their network is likely to be built differently than the more general-purpose designs used in organizations that must support a wide range of applications and purposes. These differences are likely to show up in the choice of hardware, how the network is operated, and the kinds of services offered (or not).
What this also gets right is that operators who sell access to their networks, or support a single application, tend to build at a scale far beyond what more general-purpose networks ever reach. Microsoft and Facebook number their servers in the millions, and single purchase orders include thousands of routers. eBay and LinkedIn number their servers in the hundreds of thousands, and their routers and switches in the tens of thousands. How can a small enterprise network of a few hundred servers be anything like these larger networks?
What this gets wrong is assuming none of the technologies, tools, or attitudes from these larger-scale networks is ever applicable to the smaller networks many engineers encounter on a day-to-day basis.
All those networks with BGP deployed in their data center fabrics are using technology designed primarily for interconnecting intermediate systems in the default-free zone—in other words, for connecting the networks of transit service providers. All those networks with OSPF deployed are using a link state protocol originally designed to provide edge-to-edge reachability in transit service provider networks. All those networks with IS-IS deployed are using a link state protocol originally designed to provide connectivity to large-scale telephony-style networks.
What about transport technologies? The only transport technologies originally designed specifically for “enterprise use” have long since been replaced by optical technologies designed for large-scale provider or “hyperscale” use. Token Ring and ARCnet are long gone, as is the original shared medium Ethernet, replaced by switched Ethernet largely over optical transport. Even current general WiFi is primarily designed for public operator use cases—look at 5G and WiFi 6 and note how public operator requirements have influenced these technologies.
The truth is there is no “pure” enterprise technology; following the dictum that you should not use “service-provider technologies” in your network would leave you with … no network at all.
There is a second realm where this line of argument falls flat, and it's more important than the question of which technologies to use: the techniques and attitudes learned in the operation of truly large-scale networks hold valuable lessons for all network engineers. Should you use a spine and leaf topology in your data center, rather than a more traditional hierarchical design? The answer has nothing to do with scale, and everything to do with flexibility in design and operational agility. Should you automate your network, even if it's only ten routers? The answer has nothing to do with what Amazon is doing, and everything to do with how much time you want to spend on configuring and troubleshooting versus responding to real business needs.
Think of it this way: the driver who drives the large over-the-road truck is still going to learn lessons and instincts about driving that will make them a better driver in a minivan.
Come join me at NXTWORK in November to continue the conversation in my master class on building and operating data center fabrics, as I explore how you can apply lessons from the hyperscale world to your network.
The BGP Monitoring Protocol (BMP)
If you run connections to the ‘net at any scale, even if you are an “enterprise” (still a jinxed term, IMHO), you will quickly find it would be very useful to have a time series record of the changes in BGP at your edge. Even if you are an “enterprise,” knowing what changes have taken place in the routes your providers have advertised to you can make a big difference in tracking down an application performance issue, or knowing just when a particular service went offline. Getting this kind of information, however, can be difficult.
BGP is often overloaded for use in data center fabrics, as well (though I look forward to the day when the link state alternatives to this are available, so we can stop using BGP this way). Getting a time series view of BGP updates in a fabric is often crucial to understanding how the fabric converges, and how routing convergence events correlate to application issues.
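To make the correlation idea concrete, here is a minimal sketch of the kind of query you might run against a time series of BGP events. Everything here is illustrative—the events, prefixes, and function names are invented for the example, not part of any BMP tooling:

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

# Hypothetical in-memory store of timestamped BGP update events,
# ordered by time, as a stand-in for a real time series database.
events = [
    (datetime(2019, 9, 13, 10, 0, 5), "withdraw 10.1.1.0/24"),
    (datetime(2019, 9, 13, 10, 0, 7), "announce 10.1.1.0/24 via spine2"),
    (datetime(2019, 9, 13, 11, 30, 0), "withdraw 10.2.2.0/24"),
]

def updates_around(incident: datetime, window: timedelta):
    """Return the BGP events within +/- window of an application incident."""
    times = [t for t, _ in events]
    lo = bisect_left(times, incident - window)
    hi = bisect_right(times, incident + window)
    return events[lo:hi]

# Did routing churn coincide with an application error at 10:00:06?
hits = updates_around(datetime(2019, 9, 13, 10, 0, 6), timedelta(seconds=30))
for when, what in hits:
    print(when, what)
```

The point is not the code but the shape of the question: once updates are timestamped and stored, "what changed in routing around the time the application broke?" becomes a simple range query.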
One solution is to set up the BGP Monitoring Protocol (BMP—an abbreviation within an abbreviation, in the finest engineering tradition).
BMP is described in RFC7854 as a protocol intended to “provide a convenient interface for obtaining route views.” How is BMP different from setting up an open source BGP process and peering with all of your edge speakers? If you peer using eBGP, you will not see parroted updates unless you look for them; if you peer using iBGP, you might not receive all the updates (depending on how things are configured). However you peer, you will not get a “time series” view of the updates along your edge that can be correlated with other events in your network. Any time you peer using BGP, you are receiving routes after bestpath.
When you pull a BMP feed, in contrast, you are getting the BGP updates as the speaker sees them—before bestpath, before inbound filters, etc. This means you receive a full feed just as the edge speaker receives it. This is all provided in a format that is easily pushed into a database and correlated through timestamps—a huge wealth of information that can be quite useful not only to monitor the health of your network edge, but for troubleshooting. BMP includes messaging for:
- An initial dump of the current BGP table, called route monitoring
- Peer down notification, including a code indicating why the peer went down
- Stat reports, including number of prefixes rejected by inbound policy, number of duplicate prefixes, number of duplicate withdraws, etc.
- Peer up notification
- Route mirroring, in which the speaker sends copies of updates it is receiving
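Each of these messages is carried behind the same small common header, which RFC 7854 defines as a version (one byte), a total message length (four bytes), and a message type (one byte). A minimal parser for that header, as a sketch of what a collector does first with each message:

```python
import struct

# BMP message types from RFC 7854
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}

def parse_bmp_common_header(data: bytes) -> dict:
    """Parse the 6-byte BMP common header: version (1 byte),
    total message length (4 bytes), message type (1 byte)."""
    if len(data) < 6:
        raise ValueError("need at least 6 bytes for the common header")
    version, length, msg_type = struct.unpack("!BIB", data[:6])
    return {
        "version": version,
        "length": length,
        "type": BMP_MSG_TYPES.get(msg_type, f"unknown ({msg_type})"),
    }

# Example: the header of a Peer Up Notification carried in a 120-byte message
hdr = parse_bmp_common_header(struct.pack("!BIB", 3, 120, 3))
print(hdr)
```

The length field is what lets a collector walk a TCP stream of back-to-back BMP messages, slicing off one message at a time before handing each body to a type-specific parser.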
To set BMP up, you need to start with a BGP speaker that supports sending a BMP feed. Juniper supports BMP, as does Cisco. The second thing you will need is a BMP collector, a handy open source version of which is available at openbmp.org.
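On the router side, configuration generally amounts to pointing the BGP speaker at the collector's address and port. The exact knobs vary by vendor and release—the fragment below is a Junos-style sketch with an illustrative station name and addresses, not a copy-paste recipe; check your platform's documentation for the real syntax:

```
routing-options {
    bmp {
        station collector1 {
            station-address 192.0.2.10;   /* the openbmp collector */
            station-port 5000;
        }
    }
}
```

Once the session is up, the speaker streams its updates to the station with no further per-peer configuration.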
You will note that the openbmp collector has interfaces to a RESTful database interface, as well as a KAFKA producer. One of these two interfaces should allow you to tie BMP into your existing network management system, or set up a new database to collect the information.
BMP is becoming a bit of an ecosystem in its own right; the GROW working group already has a draft to extend BMP to report on the local routing table, which would allow you to see what is received by BGP but not installed. Another draft accepted by the GROW WG extends BMP to support the adj-rib-out, which would allow you to see the difference between what a BGP speaker receives and what it sends to its peers.
Several other drafts have been proposed but not accepted by GROW, including adding a namespace for the BGP peer up event, and using BMP as part of a BGP based traffic engineering system.
Hopefully, at some point in the future, I’ll be able to follow this post up with a small lab showing what BMP looks like in operation. For now, though, you should definitely try setting BMP up in your network if you have any sort of ‘net edge scale, or a data center using BGP as its IGP.
Is it Balance, or Workism?
While we tend to focus on work/life balance, perhaps the better question is: how effective are we at using the time we use for work? From a recent study (which you may have already seen):
- Workers average just 2 hours and 48 minutes of productive device time a day
- 21% of working hours are spent on entertainment, news, and social media
- 28% of workers start their day before 8:30 AM (and 5% start before 7 AM)
- 40% of people use their computers after 10 PM
- 26% of work is done outside of normal working hours
- Workers average at least 1 hour of work outside of working hours on 89 days/year (and on ~50% of all weekend days)
- We check email and IM, on average, every 6 minutes
This is odd—we are starting work earlier, finishing later, and working over weekends, but we still only “work” less than three hours a day.
The first question must be: is this right? How are they measuring productive versus unproductive device time? What is “work time,” really? I know I don’t keep any sort of recognizable “office hours,” so it seems like it would be hard to measure how much time I spend on the weekend versus during the “work day.”
On the other hand, no matter how flawed they might be, these numbers are still interesting. They do not, it seems to me, necessarily tell a tale of “overwork.” Instead, they tell a tale of spending a lot of time at work while not actually getting anything done.
Here is the thing: we already all know the strategies we could use to help bring the productive time up, the nonproductive time down, and “personal time” up. I try to macrotask as much as possible—take on one job for as long as it takes to reach either my limit of being able to focus on it or a point where I need to stop to do something else. During this time, I try not to look at social media, email, etc. There are commercial solutions to help you focus, as well.
So if we know there is a problem, and we know there are solutions, why don’t we fix this?
The first option—we don’t think this is really a problem. For instance, it could be that we don’t understand our own behaviors well enough to realize we are killing our own productivity by checking email constantly.
A second option—we are more afraid of missing out than we are of not getting anything done. Or perhaps we are replacing actual productivity with having an empty inbox, or a caught-up news feed. Maybe we are afraid to just delete all the email we’ve not read, or to mark the entire Slack channel read without actually reading it.
A third option—these technologies are addictive.
Any of these will do, of course, and they are all probably partly true. But I think there is another problem at the root of all of these, a problem we don’t want to talk about because it isn’t something you say in polite company. Perhaps—just maybe—the problem goes back to a spiritual ailment. Maybe we are trying to build the meaning of our lives around work. Maybe we need to realize just how much workism has infected our lives—our attachment to work as the primary means through which we gain meaning in life.
And that problem, I think, is a bit harder to solve than just installing an application to rule the other applications, forcing you to focus.