feat: dynamic mesh idle_timeout from patch.cfg without container restart #414
Open
rippleitinnz wants to merge 2 commits into
When log.log_level is present in patch.cfg, apply_patch_config() now updates the live plog logger severity via plog::get()->setMaxSeverity() in addition to persisting the change to hp.cfg and the runtime cfg struct. Previously log level was only read at startup (hplog::init()) and could not be changed on a running node without a container restart. This meant operators had no way to change log verbosity on external Evernode hosts where they don't control the container lifecycle. The fix uses plog's built-in setMaxSeverity() API which is thread-safe and takes effect immediately on the next log statement.
When mesh.idle_timeout is present in patch.cfg, apply_patch_config() now updates all active peer sessions via a new p2p::update_idle_timeout() function and also updates the cached metric_thresholds array for future connections. Previously mesh.idle_timeout was only read at startup (p2p::init()) and could not be changed on a running node without a container restart. This is critical for operators who need to increase roundtime beyond mesh.idle_timeout * 4. Without this fix, increasing roundtime past 480000ms (at the default idle_timeout of 120000ms) causes peer disconnections and permanent consensus failure.
Implementation:
- comm_server.hpp: added for_each_session() template to iterate live sessions
- p2p.cpp: added update_idle_timeout(), which updates metric_thresholds[4] and calls set_threshold(IDLE_CONNECTION_TIMEOUT) on all active sessions
- conf.cpp: reads mesh.idle_timeout from patch.cfg in apply_patch_config(), calls p2p::update_idle_timeout() when value changes
Sibling PRs (part of dynamic config series):
- fix/dynamic-log-level-from-patch-cfg (already raised)
- feat/dynamic-user-idle-timeout-from-patch-cfg (to be raised)
Fixes: mesh connections dropping during long roundtimes
This was referenced May 20, 2026
Problem
mesh.idle_timeout is only read at startup in p2p::init() and cached in metric_thresholds[4]. It cannot be changed on a running node without a container restart, which is not possible on leased Evernode instances.
consensus.roundtime can already be changed dynamically from patch.cfg (it is read from the contract section each ledger). However, the effective maximum roundtime is constrained by mesh.idle_timeout: with 4 consensus stages each taking roundtime × stage_slice% (default 25%), the longest any single stage can wait is roundtime × 0.25. If this exceeds mesh.idle_timeout, peers disconnect during the wait and proposals from those peers are discarded as stale.
At the default mesh.idle_timeout of 120000ms this creates a hard ceiling on roundtime of 120000ms × 4 = 480000ms. Exceeding 480000ms roundtime causes peer disconnections during stage waits, leading to "Not enough stage X proposals" every round and permanent consensus failure. The cluster cannot recover without terminating all nodes.
Without this fix, the dynamically-configurable roundtime range of 1000–3600000ms
is misleading — only 1000–480000ms is actually safe with default settings.
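The ceiling described above is plain arithmetic, and a small sketch makes it easy to check other settings. The helper names below (stage_wait_ms, max_safe_roundtime_ms, roundtime_is_safe) are hypothetical and not part of hpcore. The sketch assumes the model stated in this description: a stage waits at most roundtime × stage_slice%, and a wait exactly equal to mesh.idle_timeout is treated as still safe.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these exist in hpcore.

// Longest time a single consensus stage can wait, in milliseconds.
inline std::uint64_t stage_wait_ms(std::uint64_t roundtime_ms, std::uint64_t stage_slice_pct)
{
    assert(stage_slice_pct > 0 && stage_slice_pct <= 100);
    return roundtime_ms * stage_slice_pct / 100;
}

// Largest roundtime whose stage wait does not exceed the idle timeout.
inline std::uint64_t max_safe_roundtime_ms(std::uint64_t idle_timeout_ms, std::uint64_t stage_slice_pct)
{
    assert(stage_slice_pct > 0 && stage_slice_pct <= 100);
    return idle_timeout_ms * 100 / stage_slice_pct;
}

// True when no stage wait exceeds the idle timeout (assumption: equality is safe).
inline bool roundtime_is_safe(std::uint64_t roundtime_ms,
                              std::uint64_t idle_timeout_ms,
                              std::uint64_t stage_slice_pct)
{
    return stage_wait_ms(roundtime_ms, stage_slice_pct) <= idle_timeout_ms;
}
```

Under the same model, using the top of the configurable range (3600000ms at a 25% stage slice) would need mesh.idle_timeout of at least 900000ms.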
Fix
Three changes:
- comm_server.hpp: added for_each_session() template method to iterate over all active sessions under mutex protection.
- p2p.cpp: added update_idle_timeout(), which updates metric_thresholds[4] for future connections and calls set_threshold(IDLE_CONNECTION_TIMEOUT) on all existing active sessions via for_each_session().
- conf.cpp: reads mesh.idle_timeout from patch.cfg in apply_patch_config() and calls p2p::update_idle_timeout() when the value changes.
Effect
Operators can now increase mesh.idle_timeout via patch.cfg alongside consensus.roundtime, enabling roundtimes beyond 480000ms without peer disconnections. Takes effect immediately on all active and future connections
without any container restart. The full 1000–3600000ms roundtime range becomes
safely usable.
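The shape of the three changes listed under Fix can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the session and comm_server types, the sessions_mutex member, the add_session() helper, the two-argument set_threshold() signature, and apply_patched_idle_timeout() are simplified stand-ins invented for illustration. Only the names for_each_session(), update_idle_timeout(), metric_thresholds[4] and IDLE_CONNECTION_TIMEOUT come from the description, and the mapping of IDLE_CONNECTION_TIMEOUT to index 4 is assumed from it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <mutex>

namespace sketch
{
    // Assumption: index 4 is the idle-timeout slot, as the description implies.
    constexpr std::size_t IDLE_CONNECTION_TIMEOUT = 4;
    constexpr std::size_t THRESHOLD_COUNT = 5;

    // Simplified stand-in for a peer session.
    struct session
    {
        std::uint64_t thresholds[THRESHOLD_COUNT] = {};

        void set_threshold(std::size_t type, std::uint64_t limit)
        {
            assert(type < THRESHOLD_COUNT);
            thresholds[type] = limit;
        }
    };

    // Simplified stand-in for the server that owns live sessions.
    class comm_server
    {
        std::list<session> sessions;
        std::mutex sessions_mutex;

    public:
        // Hypothetical helper: registers a session with the current idle timeout.
        session &add_session(std::uint64_t idle_timeout_ms)
        {
            std::lock_guard<std::mutex> lock(sessions_mutex);
            sessions.emplace_back();
            sessions.back().set_threshold(IDLE_CONNECTION_TIMEOUT, idle_timeout_ms);
            return sessions.back();
        }

        // Invokes fn on every live session while holding the session mutex.
        template <typename Fn>
        void for_each_session(Fn &&fn)
        {
            std::lock_guard<std::mutex> lock(sessions_mutex);
            for (session &s : sessions)
                fn(s);
        }
    };

    // Cached thresholds used when new connections are set up (default 120000ms).
    std::uint64_t metric_thresholds[THRESHOLD_COUNT] = {0, 0, 0, 0, 120000};
    comm_server server;

    // Updates the cached value for future connections and every live session.
    void update_idle_timeout(std::uint64_t timeout_ms)
    {
        metric_thresholds[IDLE_CONNECTION_TIMEOUT] = timeout_ms;
        server.for_each_session([timeout_ms](session &s)
                                { s.set_threshold(IDLE_CONNECTION_TIMEOUT, timeout_ms); });
    }

    // Hypothetical stand-in for the patch.cfg step: acts only when the value changes.
    bool apply_patched_idle_timeout(std::uint64_t &current_ms, std::uint64_t patched_ms)
    {
        if (patched_ms == current_ms)
            return false;
        current_ms = patched_ms;
        update_idle_timeout(patched_ms);
        return true;
    }
}
```

Taking a callable in for_each_session() keeps the locking inside comm_server, so callers such as update_idle_timeout() never touch the session container directly. The callable should stay short and must not call back into the server, since the mutex is held while it runs.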
Sibling PRs
This is part of a series making some hpcore config fields dynamically updatable from patch.cfg:
- fix/dynamic-log-level-from-patch-cfg (feat: dynamic log level update from patch.cfg without container restart #413)
- feat/dynamic-user-idle-timeout-from-patch-cfg (feat: dynamic user idle_timeout from patch.cfg without container restart #415)
These three PRs should be reviewed and merged together. This PR and #415 share the for_each_session() template added to comm_server.hpp.
Testing
Tested on a live 3-node Evernode cluster. A roundtime of 485000ms with the default mesh.idle_timeout=120000ms causes permanent consensus failure. With mesh.idle_timeout updated dynamically to 200000ms alongside the roundtime change, the cluster runs cleanly.