feat(bfd): add BFD server and peer state machine#3402
Force-pushed from bd258f0 to e5d4c70
Implements RFC 5880/5881 BFD liveness detection for BGP peers:
- UDP server listening on port 3784 with per-peer session management
- State machine: Down → Init → Up with detection timeout and hard reset
- Per-peer async counters and session state
- Integration with BGP peer lifecycle (AddPeer/DeletePeer/ResetPeer)
- Tests covering state transitions, peer reset, and BGP integration

Co-Authored-By: Ivan-Pokhabov <vanek3372@gmail.com>
Force-pushed from e5d4c70 to 60a9472
fujita left a comment:
You add some public API functions. User applications that use GoBGP as a library can call them. Is it safe to call them from multiple goroutines? Also, functions that could block need to take a context.
    Prefix: p.Prefix,
    RD: p.Rd,
    LookupOption: apiutil.LookupOptionFromAPI(p.Type),
    LookupOption: apiutil.LookupOption(p.Type),
Please don't modify unrelated code.
        s.eventConfig <- &config
    }

    func (s *bfdServer) Stop() {
Fixed, we lose it
        s.eventGetPeer <- list

        select {
Looks like this code could block. The caller of GetPeerState holds s.shared.mu, so blocking here is not allowed.
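One way to avoid blocking while a caller holds a lock is to read peer state directly rather than round-tripping through the event loop. The following is only a minimal sketch under assumptions: the `peer`, `table`, `stateOf`, and `demo` names are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"net/netip"
	"sync"
	"sync/atomic"
)

// peer is a hypothetical per-peer record. The session state lives in an
// atomic so readers never have to go through the event loop.
type peer struct {
	state atomic.Int32
}

// table is a hypothetical peer registry guarded by a read/write mutex.
type table struct {
	mu    sync.RWMutex
	peers map[netip.Addr]*peer
}

// stateOf takes a short read lock for the map lookup and then does an
// atomic load. It never sends on a channel, so it cannot stall waiting
// for the event loop.
func (t *table) stateOf(addr netip.Addr) (int32, bool) {
	t.mu.RLock()
	p, ok := t.peers[addr]
	t.mu.RUnlock()
	if !ok {
		return 0, false
	}
	return p.state.Load(), true
}

// demo builds a one-peer table and looks up a known and an unknown address.
func demo() (state int32, found bool, unknownFound bool) {
	addr := netip.MustParseAddr("192.0.2.1")
	p := &peer{}
	p.state.Store(3) // 3 stands in for "Up" in this sketch
	t := &table{peers: map[netip.Addr]*peer{addr: p}}

	state, found = t.stateOf(addr)
	_, unknownFound = t.stateOf(netip.MustParseAddr("192.0.2.2"))
	return state, found, unknownFound
}

func main() {
	fmt.Println(demo())
}
```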
        return nil
    }

    func (s *bfdServer) GetPeerState(ctx context.Context, peerAddress string) (*bfdPeerState, error) {
Replace all string addresses with netip.Addr.
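Part of the appeal of `netip.Addr` is that it is comparable, so it can key a map directly without string formatting. A minimal sketch, assuming hypothetical `peerKey` and `sameKey` helpers that are not part of this PR; it also shows that an IPv4-mapped IPv6 address only matches its plain IPv4 form after `Unmap`.

```go
package main

import (
	"fmt"
	"net/netip"
)

// peerKey is a hypothetical helper that parses an address and normalizes
// IPv4-mapped IPv6 (::ffff:a.b.c.d) to plain IPv4 so both forms yield
// the same map key.
func peerKey(s string) (netip.Addr, error) {
	a, err := netip.ParseAddr(s)
	if err != nil {
		return netip.Addr{}, err
	}
	return a.Unmap(), nil
}

// sameKey reports whether two textual addresses normalize to the same key.
func sameKey(a, b string) bool {
	ka, errA := peerKey(a)
	kb, errB := peerKey(b)
	return errA == nil && errB == nil && ka == kb
}

func main() {
	peers := map[netip.Addr]string{}

	k, err := peerKey("192.0.2.1")
	if err != nil {
		panic(err)
	}
	peers[k] = "peer-a"

	// The mapped form resolves to the same entry after normalization.
	mapped, err := peerKey("::ffff:192.0.2.1")
	if err != nil {
		panic(err)
	}
	name, ok := peers[mapped]
	fmt.Println(name, ok)
}
```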
    }

    func (s *bfdServer) DeletePeer(peerAddress string) error {
        address, err := convertToIPAddress(peerAddress)

    }

    func (s *bfdServer) AddPeer(peerAddress string, config oc.BfdConfig) error {
        if !config.Enabled {
Force-pushed from 52b3bad to c841d82
Thank you for the review. I fixed the mistakes and made the code more concurrency-safe.
    }

    func (p *bfdPeer) Stop() {
        close(p.eventShutdown)
Needs to wait for the goroutine to exit here.
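A common way to make a `Stop` method wait is to have the loop close a second channel on exit and block on it after signalling shutdown. This is a sketch only, with hypothetical names (`worker`, `shutdown`, `done`, `stopped`); the PR's actual `bfdPeer` fields may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for a peer with its own goroutine.
type worker struct {
	shutdown chan struct{} // closed to ask the loop to exit
	done     chan struct{} // closed by the loop when it has exited
	stopOnce sync.Once     // makes Stop safe to call more than once
}

func newWorker() *worker {
	w := &worker{
		shutdown: make(chan struct{}),
		done:     make(chan struct{}),
	}
	go w.loop()
	return w
}

// loop runs until shutdown is closed. A real loop would also select on
// timers and incoming packets.
func (w *worker) loop() {
	defer close(w.done)
	<-w.shutdown
}

// Stop signals shutdown and then blocks until the loop has returned, so
// the caller never continues while the goroutine is still running.
func (w *worker) Stop() {
	w.stopOnce.Do(func() { close(w.shutdown) })
	<-w.done
}

// stopped reports, without blocking, whether the loop has exited.
func (w *worker) stopped() bool {
	select {
	case <-w.done:
		return true
	default:
		return false
	}
}

func main() {
	w := newWorker()
	w.Stop()
	fmt.Println(w.stopped())
}
```

The `sync.Once` guard is a design choice of this sketch: closing an already closed channel panics, so it keeps a second `Stop` call harmless.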
    switch h.State {
    case bfd.StateDown:
        p.sendPacket(bfd.StateInit, false, false, h.MyDiscriminator)
If the current state is Up, does it need to set the state to Down?
Fixed; the state machine now follows the RFC more closely.
    peerState: ps,
    logger: logger,
    peerAddress: peerAddress,
    peerPort: int(config.Port),
Should this use the default port number if config.Port is zero?
Yes, done.
- BgpServer.Stop() now calls bfdServer.Stop(), fixing goroutine leak
where bfdServer.loop() ran indefinitely after server shutdown
- Start/AddPeer/DeletePeer now accept context.Context and use select
with ctx.Done() and eventShutdown, making them safe to call
concurrently and non-blockable under cancellation or shutdown
- GetPeerState/GetPeerStateList no longer route through the event loop:
they read directly under peersMutex/atomic, eliminating the risk of
deadlock when called while s.shared.mu is held (e.g. from toConfig)
- Removed bfdEventGetPeerState and bfdEventGetPeerStateList types and
their event loop cases — no longer needed
- Fixed IPv4-mapped address mismatch: rxPacket now calls Addr().Unmap()
so ReadFromUDP's ::ffff:1.2.3.4 form matches peers stored as pure IPv4
- AddPeer/DeletePeer API changed from string to netip.Addr, eliminating
redundant string parsing in the hot path
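The context-aware send described in the list above can be sketched as a three-way `select`. This is an illustration under assumptions, not the PR's code: `server`, `enqueue`, `errServerStopped`, and `demo` are hypothetical names, and events are plain strings here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errServerStopped is a hypothetical sentinel error for this sketch.
var errServerStopped = errors.New("bfd server stopped")

// server is a hypothetical stand-in with an event channel and a
// shutdown channel.
type server struct {
	events   chan string
	shutdown chan struct{}
}

// enqueue hands an event to the loop, but gives up if the caller's
// context is cancelled or the server is shutting down, so it cannot
// block forever.
func (s *server) enqueue(ctx context.Context, ev string) error {
	select {
	case s.events <- ev:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	case <-s.shutdown:
		return errServerStopped
	}
}

// demo exercises both escape paths with no event loop running.
func demo() (cancelErr, stopErr error) {
	s := &server{events: make(chan string), shutdown: make(chan struct{})}

	// Nothing receives from s.events, so only the cancelled context is ready.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	cancelErr = s.enqueue(ctx, "add-peer")

	// After shutdown, a send with a live context returns the sentinel.
	close(s.shutdown)
	stopErr = s.enqueue(context.Background(), "add-peer")
	return cancelErr, stopErr
}

func main() {
	fmt.Println(demo())
}
```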
Force-pushed from c841d82 to d249887
- Wait for bfdPeer loop exit in Stop() and stop all peer timers during
shutdown, so callers do not return while peer goroutine is still active
- Treat remote BFD StateDown as a session failure when local state is Up,
resetting the BGP peer immediately instead of waiting for timeout
- Default per-peer BFD destination port to 3784 when BfdConfig.Port is zero
- Aligned the peer state machine closer to RFC 5880: remote Down from Up now
transitions local state to Down without immediately sending Init, local
Init is represented explicitly, and remote Up no longer brings a Down
session directly to Up
- Add regression tests for peer shutdown, remote down handling, and default
port selection
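For reference, the session state transitions that RFC 5880 section 6.8.6 specifies for a received control packet can be written as a small pure function. This is a sketch of the RFC rules only, with hypothetical names (`state`, `next`); it leaves out timers, discriminators, and packet transmission, and it is not the PR's implementation.

```go
package main

import "fmt"

// state mirrors the BFD State field values from RFC 5880.
type state int

const (
	adminDown state = iota // 0
	down                   // 1
	initState              // 2
	up                     // 3
)

// next returns the new local state after receiving a packet whose State
// field is remote, following RFC 5880 section 6.8.6.
func next(local, remote state) state {
	if local == adminDown {
		return adminDown // packets are discarded while administratively down
	}
	if remote == adminDown {
		return down
	}
	switch local {
	case down:
		switch remote {
		case down:
			return initState
		case initState:
			return up
		}
		// remote Up while local is Down: stay Down
	case initState:
		if remote == initState || remote == up {
			return up
		}
	case up:
		if remote == down {
			return down
		}
	}
	return local
}

func main() {
	fmt.Println(next(down, down) == initState) // Down + remote Down
	fmt.Println(next(down, up) == down)        // Down + remote Up
	fmt.Println(next(up, down) == down)        // Up + remote Down
}
```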
Force-pushed from d249887 to 130c415
Pushed, thanks. Could we add documentation for configuring BFD?
Thanks! I will add the docs soon.