snaproute Go BGP Code Dive (13): Finding the tail of the update chain

Just in time for Halloween, the lucky thirteenth post in the BGP code dive series. In this series, we’re working through the Snaproute Go implementation of BGP to see how a production, open source BGP implementation really works. Along the way, we’re learning something about how larger, more complex projects are structured, and also something about the Go programming language. The entire series can be found on the series page.

In the last post in this series, we left off with our newly established peer just sitting there sending and receiving keepalives. But BGP peers are not designed just to exchange random traffic; they’re designed to exchange reachability and topology information about the network. BGP carries routing information in updates, which are complicated containers for many different kinds of reachability information. In BGP, a reachable destination is called an NLRI, or Network Layer Reachability Information. Starting with this code dive, we’re going to look at how the snaproute BGP implementation processes updates, sorts out NLRIs, and so on.
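To make “NLRI” a little more concrete: RFC 4271 encodes each IPv4 NLRI entry in an update as a one-octet prefix length (in bits) followed by only as many octets as that length needs. The sketch below decodes that field into prefixes. decodeNLRI is a hypothetical helper written for illustration, not part of the snaproute code, and it assumes plain IPv4 unicast NLRI (no multiprotocol extensions, no ADD-PATH path identifiers).

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// decodeNLRI is a hypothetical helper (not snaproute code) that decodes the
// IPv4 NLRI field of a BGP UPDATE (RFC 4271, section 4.3). Each entry is a
// one-octet prefix length in bits, followed by just enough octets to hold it.
func decodeNLRI(b []byte) ([]netip.Prefix, error) {
	var prefixes []netip.Prefix
	for len(b) > 0 {
		bits := int(b[0])
		if bits > 32 {
			return nil, fmt.Errorf("invalid IPv4 prefix length %d", bits)
		}
		n := (bits + 7) / 8 // octets needed to carry the prefix bits
		if len(b) < 1+n {
			return nil, errors.New("truncated NLRI")
		}
		var addr [4]byte // unsent trailing octets stay zero
		copy(addr[:], b[1:1+n])
		p, err := netip.AddrFrom4(addr).Prefix(bits)
		if err != nil {
			return nil, err
		}
		prefixes = append(prefixes, p)
		b = b[1+n:] // advance to the next <length, prefix> entry
	}
	return prefixes, nil
}

func main() {
	// 10.1.0.0/16 followed by 192.168.1.0/24, as they would appear on the wire.
	raw := []byte{16, 10, 1, 24, 192, 168, 1}
	prefixes, err := decodeNLRI(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range prefixes {
		fmt.Println(p)
	}
}
```

Dropping the unused trailing octets is why a /16 costs only three bytes on the wire: one for the length and two for the prefix.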

When you’re reading through code, whether looking for a better understanding of an implementation, a better understanding of a protocol, or even to figure out “what went wrong” on the wire or in the network, the hardest part is often just figuring out where to start. In the first post in this series (as my dog would say, “much time ago—in fact, forever, I counted!”), I put the code on a box and ran some logging. The logging gave me a string to search for in the code base, which then led to the beginning of the call chain, and the unraveling of the finite state machine.

This time, being more familiar with the code base, we’re going to start from a different place. We’re going to guess where to start. 🙂 Given that:

  • BGP works entirely around a finite state machine in this implementation (which is, in fact, how it operates in most implementations)
  • Updates are really only processed while the peering BGP speaker is in the established state
  • Receiving an update is what we’d probably consider an event

We’ll take a look at the code in the FSM that processes events while the peering relationship is in the established state to see what we can find. The relevant function in the FSM begins around line 673 (note that line numbers can change for various reasons, so this is only a rough reference that may not remain accurate over time):

func (st *EstablishedState) processEvent(event BGPFSMEvent, data interface{}) {
 switch event {

The top of this function is a large switch statement, with each case handling a specific event. The events are:

case BGPEventManualStop:
case BGPEventAutoStop:
case BGPEventHoldTimerExp:
case BGPEventKeepAliveTimerExp:
case BGPEventTcpConnValid:
case BGPEventTcpCrAcked, BGPEventTcpConnConfirmed:
case BGPEventTcpConnFails, BGPEventNotifMsgVerErr, BGPEventNotifMsg:
case BGPEventBGPOpen:
case BGPEventOpenCollisionDump:
case BGPEventUpdateMsg:
case BGPEventUpdateMsgErr:
case BGPEventConnRetryTimerExp, BGPEventDelayOpenTimerExp, BGPEventIdleHoldTimerExp, 
  BGPEventOpenMsgErr, BGPEventBGPOpenDelayOpenTimer, BGPEventHeaderErr:
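The shape of this code, one type per FSM state with each type’s processEvent switching on the event, is the classic state pattern. The sketch below is a heavily trimmed, hypothetical version of that pattern; the type and event names only echo snaproute’s, and none of it is the project’s code. It shows why starting from the established state makes sense: only that state treats an update as something to process, while RFC 4271 treats an update arriving in OpenConfirm as an FSM error that sends the session back to Idle.

```go
package main

import "fmt"

// BGPFSMEvent echoes the snaproute name, but this trimmed-down set of
// events is hypothetical, for illustration only.
type BGPFSMEvent int

const (
	BGPEventKeepAliveMsg BGPFSMEvent = iota
	BGPEventUpdateMsg
	BGPEventManualStop
)

// fsm holds the current state; each state decides what an event means.
type fsm struct {
	current state
	updates int // updates accepted, standing in for real update processing
}

type state interface {
	name() string
	processEvent(f *fsm, event BGPFSMEvent)
}

type idleState struct{}
type openConfirmState struct{}
type establishedState struct{}

func (idleState) name() string { return "Idle" }

// Idle ignores every event in this sketch.
func (idleState) processEvent(f *fsm, event BGPFSMEvent) {}

func (openConfirmState) name() string { return "OpenConfirm" }

func (openConfirmState) processEvent(f *fsm, event BGPFSMEvent) {
	switch event {
	case BGPEventKeepAliveMsg:
		f.current = establishedState{} // the peer confirmed our OPEN
	default:
		// RFC 4271 treats an UPDATE (or any other unexpected event) here as
		// an FSM error; a real FSM would also send a NOTIFICATION first.
		f.current = idleState{}
	}
}

func (establishedState) name() string { return "Established" }

func (establishedState) processEvent(f *fsm, event BGPFSMEvent) {
	switch event {
	case BGPEventKeepAliveMsg:
		// a real FSM would restart the hold timer here; we stay Established
	case BGPEventUpdateMsg:
		f.updates++ // only this state actually processes updates
	case BGPEventManualStop:
		f.current = idleState{}
	}
}

func main() {
	f := &fsm{current: openConfirmState{}}
	for _, ev := range []BGPFSMEvent{BGPEventKeepAliveMsg, BGPEventUpdateMsg, BGPEventUpdateMsg} {
		f.current.processEvent(f, ev)
	}
	fmt.Println(f.current.name(), f.updates)
}
```

Because each state owns its own switch, changing how one state reacts cannot accidentally change another, at the cost of repeating common handling, such as stop and error events, in several states.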

As expected, the event we’re interested in, BGPEventUpdateMsg, is among the events handled in the established state. Its handler is around line 734:

case BGPEventUpdateMsg:
  st.fsm.StartHoldTimer()
  bgpMsg := data.(*packet.BGPMessage)
  st.fsm.ProcessUpdateMessage(bgpMsg)

case BGPEventUpdateMsgErr:
  bgpMsgErr := data.(*packet.BGPMessageError)
  st.fsm.SendNotificationMessage(bgpMsgErr.TypeCode, bgpMsgErr.SubTypeCode, bgpMsgErr.Data)
  st.fsm.StopConnectRetryTimer()
  st.fsm.ClearPeerConn()
  st.fsm.StopConnToPeer()
  st.fsm.IncrConnectRetryCounter()
  st.fsm.ChangeState(NewIdleState(st.fsm))

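The first of the two case statements here handles receiving an actual update. The second handles an update error: it sends a NOTIFICATION, stops the connect retry timer, tears down the connection to the peer, increments the connect retry counter, and moves the FSM back to the idle state. Handling an update, by contrast, takes only three lines of code.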

st.fsm.StartHoldTimer() restarts the hold timer. In BGP, a received update shows the peer is alive just as a keepalive does, so a peer that is actively sending updates can skip sending keepalive messages, saving bandwidth and processing power.

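bgpMsg := data.(*packet.BGPMessage) pulls the BGP message just received out of data, which is passed in as a generic interface{}. This is a Go type assertion: it checks that the value stored in data is a *packet.BGPMessage and hands back that pointer, so the message itself is not copied. This is a little bit of magic we’ll look at more closely in the next post.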
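For readers new to Go, here is that kind of type assertion in isolation. BGPMessage below is a hypothetical stand-in for packet.BGPMessage, and extractMessage is an illustrative helper, not snaproute code. It uses the comma-ok form, which reports a mismatch instead of panicking; the single-value form used in the established state, data.(*packet.BGPMessage), panics if data holds anything other than a *packet.BGPMessage.

```go
package main

import "fmt"

// BGPMessage is a hypothetical stand-in for packet.BGPMessage.
type BGPMessage struct {
	Type byte
	Body []byte
}

// extractMessage performs the type assertion using the comma-ok form, so a
// value of the wrong dynamic type yields false instead of a panic.
func extractMessage(data interface{}) (*BGPMessage, bool) {
	msg, ok := data.(*BGPMessage)
	return msg, ok
}

func main() {
	orig := &BGPMessage{Type: 2} // type code 2 is UPDATE in RFC 4271
	var data interface{} = orig  // the event data arrives as an empty interface

	msg, ok := extractMessage(data)
	fmt.Println(ok, msg == orig) // prints "true true": same pointer, no copy

	_, ok = extractMessage("not a message")
	fmt.Println(ok) // prints "false"
}
```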

st.fsm.ProcessUpdateMessage(bgpMsg) processes the BGP message itself. Again, chasing through this will need to wait until a later post.

For now, we’ve found the tail of the update processing chain in the snaproute BGP code—more next time.