Can I2RS Keep Up? (I2RS Performance)

What about I2RS performance?

The first post in this series provides a basic overview of I2RS; there I used a simple diagram to illustrate how I2RS interacts with the RIB—

[Figure: rib-fib-remote-proxy (how I2RS interacts with the RIB)]

One question that comes to mind when looking at a data flow like this (or rather, should come to mind!) is what kind of performance this setup will provide. Before diving into the answer, though, it's worth asking a different question first: what kind of performance do you really need? There are (at least) two distinct performance profiles in routing: the time it takes to initially start up a routing peer, and the time it takes to converge after a single topology and/or route change. In reality, this second profile can be further broken down into multiple profiles (with or without an equal cost path, with or without a loop free alternate, etc.), but for our purposes I'll just deal with the two broad categories here.

If your first instinct is to say that initial convergence time doesn't matter, go back and review the recent Delta Airlines outage carefully, and then read about how Facebook shuts down entire data centers just to learn what happens. Keep thinking about it until you are convinced that initial convergence time really matters. 🙂 Where major outages like this are concerned, it's a matter of "when," not "if"; if you think it's okay for your network to take on the order of tens of minutes (or hours) to perform initial convergence so applications can start spinning back up, then you're just flat wrong.

How fast is fast enough for initial convergence? Let's assume we have a moderately sized data center fabric, or a larger network, with something on the order of 50,000 routes in the table. If your solution can install on the order of 8,000 routes in ten seconds in a lab test (as a recently tested system did), then you're looking at around a minute to converge on 50,000 routes in a lab. I don't know what the actual ratio is, but I'd guess the "real world" has at least a doubling effect on route convergence times, so call it two minutes. Are you okay with that?
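To make the arithmetic concrete, here is a quick back-of-the-envelope sketch in Python, using the lab rate above and the guessed doubling factor:

```python
# Back-of-the-envelope convergence estimate, using the numbers from the
# text: a lab rate of 8,000 routes per ten seconds, a 50,000 route table,
# and a guessed 2x "real world" slowdown.

lab_routes, lab_seconds = 8_000, 10
table_size = 50_000
real_world_factor = 2          # a guess, not a measured ratio

lab_rate = lab_routes / lab_seconds        # 800 routes/second
lab_time = table_size / lab_rate           # 62.5 seconds
real_time = lab_time * real_world_factor   # 125 seconds

print(f"lab convergence:        ~{lab_time:.0f} seconds")
print(f"real-world convergence: ~{real_time:.0f} seconds (about two minutes)")
```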

To be honest, I’m not. I’d want something more like ten seconds to converge on 50,000 routes in the real world (not in a lab). Let’s think about what it takes to get there. In the first image above, working from a routing protocol (not an I2RS object), we’d need to do the following (there’s a rough sketch of this sequence in code after the list)—

  • Receive the routing information
  • Calculate the best path(s)
  • Install the route into the RIB
  • The RIB needs to arbitrate between multiple best paths supplied by protocols
  • The RIB then collects the layer 2 header rewrite information
  • The RIB then installs the information into the FIB
  • The FIB, using magic, pushes the entry to the forwarding ASIC
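Here is a minimal sketch of that sequence; the protocol, RIB, and FIB interfaces (and names like `arbitrate` and `l2_rewrite`) are hypothetical, invented only to make the steps concrete:

```python
# A minimal sketch of the install pipeline above. Every interface here is
# hypothetical; no real vendor RIB/FIB API is being described.

def install_route(protocol, rib, fib, prefix, routes):
    best = protocol.best_paths(prefix, routes)   # calculate the best path(s)
    winner = rib.arbitrate(prefix, best)         # arbitrate across protocols
    if winner is None:
        # e.g., BGP offers a route whose next hop isn't (yet) reachable;
        # the RIB must call the protocol back when the next hop appears
        rib.on_next_hop_resolved(prefix, protocol.reinstall)
        return
    rewrite = rib.l2_rewrite(winner.next_hop)    # collect layer 2 rewrite info
    fib.install(prefix, winner, rewrite)         # push the entry to the ASIC
```

Counting the calls (and the possible callback) shows why a single "route install" fans out into three or four distinct operations.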

What is the point of examining this process? To realize that a single route install is not, in fact, a single operation performed by the RIB. Rather, there are several operations here, including potential callbacks from the RIB to the protocol (what happens when BGP installs a route for which the next hop isn’t available, but then becomes available later on, for instance?). The RIB, and any API between the RIB and the protocol, needs to operate at about 3 to 4 times the speed at which you expect to be able to actually install routes.

What does this mean for I2RS? To install, say, 50,000 routes in 10 seconds at four transactions per route, there need to be around 200,000 transactions in those 10 seconds, or about 20,000 transactions per second. Now consider the following illustration of the entire data path the I2RS controller needs to feed routing information through—

[Figure: i2rs-install-process (the data path from the I2RS controller to the forwarding hardware)]

For any route to be installed in the RIB from the I2RS controller, it must be:

  • Calculated based on current information
  • Marshalled, which includes pouring it into the YANG model’s format, potentially encoding it as JSON, and placing it into a packet
  • Transported, which includes serialization delay, queuing, and the like
  • Unmarshalled, or rather locally copied out of the YANG format into a format that can be installed into the RIB
  • Run through route arbitration, with the layer 2 rewrite information calculated
  • Answered: any response, such as “install successful” or “route overridden,” must be returned through the same process to the I2RS controller
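To get a rough feel for just the marshalling and unmarshalling steps, here is a small measurement sketch; the route record is a made-up, YANG-ish structure (not a real I2RS model), and the loop measures only JSON encode/decode, ignoring transport and RIB work entirely:

```python
import json
import time

# A made-up, YANG-ish route record; the field names are invented for
# illustration and don't come from any real I2RS model.
route = {
    "prefix": "203.0.113.0/24",
    "next-hop": "198.51.100.1",
    "preference": 10,
    "metric": 100,
}

N = 200_000  # the transaction count from the text: 50,000 routes x 4

start = time.perf_counter()
for _ in range(N):
    wire = json.dumps(route)   # marshal on the controller side
    json.loads(wire)           # unmarshal on the router side
elapsed = time.perf_counter() - start

print(f"{N / elapsed:,.0f} marshal+unmarshal round trips per second")
```

On a fast x86 server this loop alone will likely beat the 20,000 transactions per second budget; the point is that marshalling is only one of six steps, and every step spends from the same budget, often on a much slower control plane CPU.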

It is, of course, possible to do all of this 20,000 times per second, especially with a lot of heavy optimization in a well designed and well operated network. But not all networks operate under ideal conditions all the time, so perhaps replacing the entire control plane with a remote controller isn’t the best idea in the world.

Luckily, I2RS wasn’t designed to replace the entire control plane, but rather to augment it. To explore what that means in practice, the next post will begin considering some use cases where I2RS can be useful.

2 Comments

  1. ccna on 26 September 2016 at 4:07 pm

    I think, to be fair, the fact that routers need to populate tables using traditional protocols (BGP, IS-IS, OSPF) should be taken into account. In this case you can pretty much replace the I2RS controller with a BGP RR or eBGP peer in the case of BGP, or with link state neighbors. Your marshal/unmarshal interface then becomes a byte-oriented protocol, and the I2RS agent becomes the local router’s routing daemon. Next, your routing protocol daemon or I2RS agent talks to the RIB API or FIB API. Not a huge difference overall. So I think the comparison, and thus the feasibility conclusions, should be comparative rather than absolute.

    Secondly, it should be noted that on the “server side” (the I2RS controller) you may expect to find a medium to high end x86 server, for which handling 100,000-ish transactions/sec is really peanuts. For instance, the NGINX web server is known to operate in the range of 100k-500k transactions a second. On the other hand, it’s common to see routers/switches equipped with a low-energy-class single-core MIPS CPU, which never comes close to a server in terms of performance. That said, it’s an open question whether your receiving router, probably equipped with the same garbage-class CPU, can chew through the incoming stream.

    The real difference, if I get I2RS right, lies in how well your marshalling protocol is implemented. If it’s JSON, it’ll probably be an order of magnitude slower than byte-oriented traditional routing protocols. If it happens to be gRPC or Thrift, I guess you can get very competitive compute figures.

    Maybe I’m totally wrong here, so I’d appreciate your reply.



    • Russ on 1 October 2016 at 3:43 pm

      I would agree with your thinking on this — but this still leaves the question of formatting the data. For instance, gRPC might be faster because you can build the data on the server in a way that doesn’t require so much “unmarshalling” on the client, but then either —

      1. The server must adapt the way it pushes data into the gRPC stream on a per client basis, which is (probably) going to eat any savings on the transport and install side of things, or —

      2. Every hardware/software maker of “white box” devices must use the same data models for their tables, which can (potentially) squash innovation, etc.

      Imagine, for the first one, that every web browser actually used a different way to express precisely the same things — web servers would face the same quandary. As it is, we’ve standardized the protocol on the wire and left optimization up to the receiver, which is probably a good decision, but — can you get the type of performance required out of a routing system with this solution? Possibly, but I’m not certain how.

      We always come back to the same sort of problem — TLVs, or fixed length fields. We can overcome the problem with faster CPUs, of course, but then we ramp up the number of table entries, and end up right back in the same place again.
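      As a toy sketch of the tradeoff (the record and TLV formats below are invented, not any real protocol’s):

      ```python
      import struct

      # Fixed-length record: 4-byte prefix, 1-byte mask length, 4-byte
      # next hop; one cheap unpack, but the layout can never change.
      fixed = struct.pack("!4sB4s", bytes([203, 0, 113, 0]), 24,
                          bytes([198, 51, 100, 1]))
      prefix, plen, next_hop = struct.unpack("!4sB4s", fixed)

      # TLV stream: (type, length, value) triples, walked one at a time;
      # more parsing work per entry, but new types can be added freely.
      tlvs = (struct.pack("!BB4s", 1, 4, bytes([203, 0, 113, 0])) +
              struct.pack("!BB1s", 2, 1, bytes([24])) +
              struct.pack("!BB4s", 3, 4, bytes([198, 51, 100, 1])))

      offset = 0
      while offset < len(tlvs):
          t, length = struct.unpack_from("!BB", tlvs, offset)
          value = tlvs[offset + 2:offset + 2 + length]
          offset += 2 + length
      ```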

      🙂

      Russ