fix: memory leak in libp2p bootstrap node#13
Conversation
…e resource management during shutdown
…tion in BootstrapNode
…educing ticker interval and improving logging
…g metrics in BootstrapNode
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53bcdb5321
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| environment: | ||
| - LOG_FILE=/app/logs/bootstrap-node.log | ||
| - LOG_LEVEL=${LOG_LEVEL:-info} | ||
| - PPROF_PORT=${PPROF_PORT:-6060} |
There was a problem hiding this comment.
Keep pprof disabled unless explicitly configured
Setting PPROF_PORT=${PPROF_PORT:-6060} makes PPROF_PORT non-empty even when operators do not set it, because Compose :- interpolation injects the default when the variable is unset/empty. Since cmd/main.go starts pprof whenever PPROF_PORT is non-empty, this change enables and publishes the debug profiler by default, exposing runtime profiling endpoints in environments that expected pprof to stay opt-in.
Useful? React with 👍 / 👎.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53bcdb5321
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| environment: | ||
| - LOG_FILE=/app/logs/bootstrap-node.log | ||
| - LOG_LEVEL=${LOG_LEVEL:-info} | ||
| - PPROF_PORT=${PPROF_PORT:-6060} |
There was a problem hiding this comment.
Do not enable pprof by default
In the Docker Compose path, the README says PPROF_PORT must be set to enable pprof, but this line injects 6060 even when .env omits it; combined with the newly added port mapping, every default compose deployment starts http.ListenAndServe on :6060 and exposes /debug/pprof/* on the host. For public bootstrap nodes this leaks heap/goroutine/cmdline profiling data and can run expensive profiles without opt-in, so keep PPROF_PORT unset unless it is explicitly configured.
Useful? React with 👍 / 👎.
|
@codex review |
|
To use Codex here, create a Codex account and connect to github. |
Summary
ConnManagerHighWater + 100so gossipsub peer-scoring state cannot grow unbounded as peers churn.BootstrapNode.Close()with cancelable context,NotifyBundlederegistration, and DHT close — wired intocmd/main.goshutdown.PPROF_PORT) and a 60-second status logger reporting connected/peerstore/dht_rt/goroutines/heap_alloc/heap_inuse/sys.start.sh/stop.sh/build-docker.sh,--buildflag forstart.sh, signal trap for clean shutdown, Go base image bumped to 1.25, README debugging guide.Why
The bootstrap node was leaking memory under sustained peer churn. Three root causes were identified:
Conns,ConnsInbound,ConnsOutbound,Memory, andFDwere allrcmgr.Unlimitedin both theSystemandTransientscopes. Gossipsub peer-score state is tracked per peer/connection, so unbounded conns meant unbounded scoring memory.RemovePeerdoes not clear addresses, so we now callClearAddrsfirst and thenRemovePeer.host.Close(). The DHT, the network notification bundle, and the parent context for DHT/gossipsub were never released — so background goroutines and routing-table state lingered across restarts in long-running test cycles.Changes
pkg/service/bootstrap.goSystemandTransientscopes:Conns / ConnsInbound / ConnsOutboundnowrcmgr.LimitVal(cfg.ConnManagerHighWater + 100)instead ofUnlimited.BootstrapNodenow holdsPubsub,notificationBundle,ctx, andcancelfor proper lifecycle.hostCtx) is used for DHT and gossipsub construction; it is cancelled on every constructor failure path.Close()method orders cleanup as: cancel context →Network().StopNotify(bundle)→DHT.Close()→Host.Close(), aggregating errors into a single returned error.startPeerstoreGC()goroutine ticks every 2 minutes; for each non-self peer not currently connected, it callsClearAddrs(p)thenRemovePeer(p)and logs a per-cycle summary at info level when anything was removed (debug otherwise)._, err = pubsub.NewGossipSub(...)), so the GC root is explicit and we can extend pubsub usage later without reconstructing the instance.cmd/main.goPPROF_PORTenv var; logs the listen address on startup and an error ifListenAndServefails.Connected peers: Nlog) emits:node.Close()(instead ofnode.Host.Close()) and logs aggregated shutdown errors.signal.Stop(sigs)is called on the shutdown path.Docker / scripts
Dockerfile:golang:1.24.5-alpine→golang:1.25-alpine.docker-compose.yaml: expose${PPROF_PORT:-6060}:${PPROF_PORT:-6060}and passPPROF_PORTthrough as an environment variable.start.sh,stop.sh,build-docker.sh: detectdocker composeplugin vs the standalonedocker-composebinary and fail clearly if neither is present.start.sh: parses a--buildflag, runs<DOCKER_COMPOSE> buildfirst when set, and installs a signal trap that runs<DOCKER_COMPOSE> downonSIGINT/SIGTERM.Docs / misc
README.md: new Debugging Memory / Performance section explaining the status line metrics and pprof recipes (heap,-alloc_space, two-snapshot-basediff, goroutine dump)..gitignore: add.DS_Store.File map
pkg/service/bootstrap.gocmd/main.gostart.sh--build, signal trapREADME.mdstop.shbuild-docker.shdocker-compose.yamlDockerfile.gitignore.DS_StoreCommits
51882964d248260eb61eaa9f7554f762208fb6ddb7Test plan
go build ./...succeeds on Go 1.25../start.sh --buildbuilds the image and runs the container;./stop.shshuts it down cleanly on bothdocker compose(plugin) anddocker-compose(standalone) hosts.PPROF_PORT=6060set,curl http://localhost:6060/debug/pprof/heapreturns a profile and the port is reachable from the host.go tool pprof -base heap1.pb.gz heap2.pb.gzdoes not show unbounded growth in gossipsub peer-score allocations.peerstore=value in the status line tracksconnected=rather than monotonically increasing, andPeerstore GC: removed N stale peerslog lines appear.SIGINT, logs showShutting down bootstrap node...followed by clean DHT and host close (noError during shutdownlines).Risk / rollout
ConnManagerHighWater + 100. Confirm the configured high-water mark is sized for expected peak inbound peer count — too low will reject legitimate connections during traffic spikes.