snaproute Go BGP Code Dive (7): Moving to Connect

In last week’s post, we looked at how snaproute’s Go implementation of BGP starts trying to connect to a new peer. We chased down the connectRetryTimer to see what it does, but we didn’t fully work through what the code does when actually moving to connect. To jump back into the code, this is where we stopped:

func (st *ConnectState) processEvent(event BGPFSMEvent, data interface{}) {
  switch event {
  ....
    case BGPEventConnRetryTimerExp:
      st.fsm.StopConnToPeer()
      st.fsm.StartConnectRetryTimer()
      st.fsm.InitiateConnToPeer()
....

When the connectRetryTimer expires, the timer is restarted and a new connection to the peer is attempted through st.fsm.InitiateConnToPeer(). This, then, is the next stop on the road to figuring out how this implementation of BGP brings up a peer. Before we get there, though, there’s an oddity here that needs to be addressed. If you look through the BGP FSM code, you will find only two calls that initiate a connection to a peer: the one above, and this one:

func (st *ConnectState) enter() {
  ....
  st.fsm.AcceptPeerConn()
  st.fsm.InitiateConnToPeer()
}

The rest of the instances of InitiateConnToPeer() are related to the definition of the function. This raises a question: why is the function called from only these two places, on entering the connect state and when the timer wakes up, rather than directly from wherever the FSM decides to move to connect? One of the prime points of coding coherently is to provide consistent entry and exit points for each state. The more ways there are to enter a state within an FSM, the more confusing the FSM gets, the easier it is to make mistakes when modifying it, and the harder it is to troubleshoot. If you can construct a code path that funnels every way of getting into a state through a single call, the code will ultimately be easier to understand and maintain.
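To make the single-entry-point idea concrete, here is a minimal sketch of the pattern. It is not snaproute’s code; the machine, state, and changeState names are hypothetical, and the connect state’s work is reduced to log entries.

```go
package main

import "fmt"

// state is a hypothetical interface; each state does its setup in enter().
type state interface {
	name() string
	enter(m *machine)
}

// machine is a hypothetical, stripped-down FSM that records what it does.
type machine struct {
	cur state
	log []string
}

// changeState is the only place the machine switches states, so every
// path into a state runs that state's enter() exactly once.
func (m *machine) changeState(next state) {
	m.cur = next
	m.log = append(m.log, "enter "+next.name())
	next.enter(m)
}

type idle struct{}

func (idle) name() string     { return "idle" }
func (idle) enter(m *machine) {}

type connect struct{}

func (connect) name() string { return "connect" }

// enter stands in for the real work: starting the retry timer and
// initiating the outbound connection.
func (connect) enter(m *machine) {
	m.log = append(m.log, "start retry timer", "initiate connection")
}

func main() {
	m := &machine{}
	m.changeState(idle{})
	m.changeState(connect{})
	for _, line := range m.log {
		fmt.Println(line)
	}
}
```

Because nothing outside connect.enter() initiates the connection, anyone reading or changing the code has one place to look when asking how a connection attempt gets started on the way into the state.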

Now let’s look at what st.fsm.InitiateConnToPeer() actually does—

func (fsm *FSM) InitiateConnToPeer() {
  if bytes.Equal(fsm.pConf.NeighborAddress, net.IPv4bcast) {
    fsm.logger.Info("Unknown neighbor address")
    return
  }
  remote := net.JoinHostPort(fsm.pConf.NeighborAddress.String(), config.BGPPort)
  local := ""

  if strings.TrimSpace(fsm.pConf.UpdateSource) != "" {
    local = net.JoinHostPort(strings.TrimSpace(fsm.pConf.UpdateSource), "0") 
  }
  if fsm.outTCPConn == nil {
    fsm.outTCPConn = NewOutTCPConn(fsm, fsm.outConnCh, fsm.outConnErrCh)
    go fsm.outTCPConn.ConnectToPeer(fsm.connectRetryTime, remote, local)
  }
}

I’ve removed most of the logging code for clarity, and I’ll be doing the same consistently throughout this series.

The first step is to check that we have a usable peer IP address. This is taken care of by:

if bytes.Equal(fsm.pConf.NeighborAddress, net.IPv4bcast)

net.IPv4bcast is the IPv4 limited broadcast address, 255.255.255.255. (The all-zeros address, 0.0.0.0, is a separate value in Go, net.IPv4zero, and is not caught by this check.) If the neighbor address matches the broadcast address, we don’t have a valid peer address, so we just log the event and abandon this attempt to connect. If we do have a valid address to peer to, the next step is to build the two endpoint strings for the TCP connection. net.JoinHostPort does not hold any TCP state; it simply combines a host and a port into a single "host:port" string. The remote string is the neighbor’s address plus the BGP port (config.BGPPort). The local string is built only if an update source is configured, and it uses port "0", which asks the operating system to choose an ephemeral source port. If no update source is configured, local stays empty, presumably leaving the choice of source address to the network stack.
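The address check and the two net.JoinHostPort calls can be tried in isolation. The sketch below is not snaproute’s code: usablePeer, endpoints, and bgpPort are hypothetical names, and bgpPort assumes that config.BGPPort holds the well-known BGP port, "179". It also shows one subtlety worth knowing about: bytes.Equal compares raw bytes, so a 4-byte net.IP never matches the 16-byte net.IPv4bcast, while net.IP.Equal treats both forms as the same address. Whether that matters in the real code depends on how NeighborAddress is populated (net.ParseIP, for example, returns the 16-byte form).

```go
package main

import (
	"bytes"
	"fmt"
	"net"
	"strings"
)

// bgpPort is an assumption: the well-known BGP TCP port as a string.
const bgpPort = "179"

// usablePeer (hypothetical) rejects empty, all-zeros, and broadcast
// addresses, using comparisons that ignore 4-byte vs 16-byte encoding.
func usablePeer(addr net.IP) bool {
	return len(addr) != 0 && !addr.IsUnspecified() && !addr.Equal(net.IPv4bcast)
}

// endpoints (hypothetical) builds the remote and local "host:port"
// strings the same way the excerpt does. local is empty when no update
// source is configured; port "0" means "any ephemeral port".
func endpoints(neighbor net.IP, updateSource string) (remote, local string) {
	remote = net.JoinHostPort(neighbor.String(), bgpPort)
	if src := strings.TrimSpace(updateSource); src != "" {
		local = net.JoinHostPort(src, "0")
	}
	return remote, local
}

func main() {
	fourByte := net.IP{255, 255, 255, 255}
	fmt.Println(bytes.Equal(fourByte, net.IPv4bcast)) // false: 4 bytes vs 16 bytes
	fmt.Println(fourByte.Equal(net.IPv4bcast))        // true: same address

	fmt.Println(usablePeer(net.IPv4zero)) // false

	remote, local := endpoints(net.ParseIP("192.0.2.1"), " 192.0.2.10 ")
	fmt.Println(remote, local) // 192.0.2.1:179 192.0.2.10:0
}
```

The addresses used here come from the 192.0.2.0/24 documentation range and are placeholders only.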

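With the remote and local addresses in hand, and only if there is no outbound connection object already (fsm.outTCPConn == nil), the code creates one (NewOutTCPConn), handing it two channels whose names suggest they carry back either a new connection or an error. It then starts ConnectToPeer in its own goroutine to attempt the TCP connection, which means the FSM is not blocked while the connection is being set up.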

You can find the ConnectToPeer code in fsm/conn.go around line 175; the code is somewhat low level, so we won’t spend any time going through it here. Just taking a quick look shows that it essentially calls o.Connect, which then tries to open a new TCP session to the IP address specified.
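For readers who want a feel for what such low-level code involves without reading conn.go, here is a minimal sketch of one way to make an outbound TCP connection in Go with an optional local bind address and a timeout, reporting the result on a pair of channels. This is an illustration under stated assumptions, not snaproute’s implementation: dialPeer, connectAsync, and loopbackDemo are hypothetical names, and using the first argument as a dial timeout is a guess at how a value like connectRetryTime might be applied.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialPeer (hypothetical) makes one outbound TCP connection attempt.
// If local is a non-empty "host:port" string, the socket is bound to it.
func dialPeer(timeout time.Duration, remote, local string) (net.Conn, error) {
	d := net.Dialer{Timeout: timeout}
	if local != "" {
		addr, err := net.ResolveTCPAddr("tcp", local)
		if err != nil {
			return nil, err
		}
		d.LocalAddr = addr
	}
	return d.Dial("tcp", remote)
}

// connectAsync (hypothetical) runs the attempt in a goroutine and sends
// the outcome on exactly one of the two channels.
func connectAsync(timeout time.Duration, remote, local string,
	connCh chan<- net.Conn, errCh chan<- error) {
	go func() {
		conn, err := dialPeer(timeout, remote, local)
		if err != nil {
			errCh <- err
			return
		}
		connCh <- conn
	}()
}

// loopbackDemo stands in for a peer with a listener on the loopback
// interface and connects to it from an OS-chosen source port.
func loopbackDemo() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	connCh := make(chan net.Conn, 1)
	errCh := make(chan error, 1)
	connectAsync(2*time.Second, ln.Addr().String(), "127.0.0.1:0", connCh, errCh)

	select {
	case conn := <-connCh:
		fmt.Println("connected from", conn.LocalAddr(), "to", conn.RemoteAddr())
		return conn.Close()
	case err := <-errCh:
		return err
	}
}

func main() {
	if err := loopbackDemo(); err != nil {
		fmt.Println("connect failed:", err)
	}
}
```

The buffered channels let the goroutine finish even if nobody is waiting on the result yet, which is one simple way to keep a caller from being blocked by a slow connection attempt.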

Assuming this connection is actually opened, we have successfully moved the peer from idle to connect. We’ll tie up some loose ends in the next installment, and then consider the process of moving beyond connect state.