feat: dynamic mesh idle_timeout from patch.cfg without container restart by rippleitinnz · Pull Request #414 · EvernodeXRPL/hpcore

rippleitinnz · 2026-05-20T21:32:46Z

Problem

mesh.idle_timeout is only read at startup in p2p::init() and cached in
metric_thresholds[4]. It cannot be changed on a running node without a
container restart, which is not possible on leased Evernode instances.

consensus.roundtime can already be changed dynamically from patch.cfg (it is
read from the contract section each ledger). However, the effective maximum
roundtime is constrained by mesh.idle_timeout: each of the 4 consensus stages
takes roundtime × stage_slice% (default 25%), so a single stage can wait up to
roundtime × 0.25. If that wait exceeds mesh.idle_timeout, peers disconnect
during it and proposals from the disconnected peers are discarded as stale.

At the default mesh.idle_timeout of 120000ms this creates a hard ceiling:

safe_max_roundtime = mesh.idle_timeout / stage_slice% = 120000 / 0.25 = 480000ms

A roundtime above 480000ms causes peer disconnections during stage waits,
which produces "Not enough stage X proposals" on every round and permanent
consensus failure. The cluster cannot recover without terminating all nodes.

Without this fix, the dynamically-configurable roundtime range of 1000–3600000ms
is misleading — only 1000–480000ms is actually safe with default settings.
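
The ceiling above can be reproduced with a small helper. This is only an illustrative sketch: safe_max_roundtime() is a hypothetical name, not a function in hpcore, and it assumes stage_slice is expressed as a whole-number percentage.

```cpp
#include <cstdint>

// Hypothetical helper (not part of hpcore): the largest roundtime, in ms, whose
// per-stage wait (roundtime * stage_slice%) does not exceed the idle timeout.
constexpr uint64_t safe_max_roundtime(uint64_t idle_timeout_ms, uint64_t stage_slice_pct)
{
    return idle_timeout_ms * 100 / stage_slice_pct;
}

// Default settings from the description: 120000ms idle timeout, 25% stage slice.
static_assert(safe_max_roundtime(120000, 25) == 480000, "default ceiling is 480000ms");
```

By the same arithmetic, the 3600000ms upper end of the roundtime range needs mesh.idle_timeout to be at least 900000ms at the default 25% slice.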

Fix

Three changes:

comm_server.hpp — added for_each_session() template method to iterate
over all active sessions under mutex protection.

p2p.cpp — added update_idle_timeout() which updates metric_thresholds[4]
for future connections AND calls set_threshold(IDLE_CONNECTION_TIMEOUT) on all
existing active sessions via for_each_session().

conf.cpp — reads mesh.idle_timeout from patch.cfg in apply_patch_config(),
calls p2p::update_idle_timeout() when value changes.
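
For reviewers who want the shape of the three changes in one place, here is a minimal, self-contained sketch. It is not the code in this PR: the session and comm_server types are simplified stand-ins, the two-argument set_threshold() signature, the enum value and the apply_mesh_idle_timeout() helper are assumptions made for illustration, and only the names for_each_session(), update_idle_timeout(), metric_thresholds and IDLE_CONNECTION_TIMEOUT come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Simplified stand-ins for hpcore types; layouts and signatures are assumptions.
constexpr std::size_t THRESHOLD_COUNT = 5;
enum SESSION_THRESHOLD { IDLE_CONNECTION_TIMEOUT = 4 };

struct session
{
    uint64_t thresholds[THRESHOLD_COUNT] = {};

    void set_threshold(SESSION_THRESHOLD type, uint64_t limit)
    {
        assert(static_cast<std::size_t>(type) < THRESHOLD_COUNT);
        thresholds[type] = limit;
    }
};

class comm_server
{
    std::mutex sessions_mutex;
    std::vector<session> sessions;

public:
    void add_session()
    {
        std::lock_guard<std::mutex> lock(sessions_mutex);
        sessions.emplace_back();
    }

    // Invokes fn on every active session while holding the sessions mutex.
    template <typename Fn>
    void for_each_session(Fn &&fn)
    {
        std::lock_guard<std::mutex> lock(sessions_mutex);
        for (session &s : sessions)
            fn(s);
    }
};

// Cached thresholds applied to newly created sessions (index 4 = idle timeout).
uint64_t metric_thresholds[THRESHOLD_COUNT] = {0, 0, 0, 0, 120000};
comm_server server;

// Updates the cached value for future connections and every live session.
void update_idle_timeout(uint64_t new_timeout_ms)
{
    metric_thresholds[IDLE_CONNECTION_TIMEOUT] = new_timeout_ms;
    server.for_each_session([new_timeout_ms](session &s)
                            { s.set_threshold(IDLE_CONNECTION_TIMEOUT, new_timeout_ms); });
}

// Hypothetical config-side hook: propagate only when the patched value differs.
// Returns true if an update was applied.
bool apply_mesh_idle_timeout(uint64_t &current_ms, uint64_t patched_ms)
{
    if (patched_ms == current_ms)
        return false;
    current_ms = patched_ms;
    update_idle_timeout(patched_ms);
    return true;
}
```

Holding the mutex for the whole iteration keeps the session list stable while thresholds are rewritten, at the cost of briefly blocking other users of the list. The callback should therefore stay short and must not call back into the server.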

Effect

Operators can now increase mesh.idle_timeout via patch.cfg alongside
consensus.roundtime, enabling roundtimes beyond 480000ms without peer
disconnections. Takes effect immediately on all active and future connections
without any container restart. The full 1000–3600000ms roundtime range becomes
safely usable.

Sibling PRs

This is part of a series making some hpcore config fields dynamically updatable from patch.cfg:

fix/dynamic-log-level-from-patch-cfg — feat: dynamic log level update from patch.cfg without container restart #413
feat/dynamic-user-idle-timeout-from-patch-cfg — feat: dynamic user idle_timeout from patch.cfg without container restart #415

These three PRs should be reviewed and merged together. This PR and #415 share
the for_each_session() template added to comm_server.hpp.

Testing

Tested on a live 3-node Evernode cluster. Roundtime of 485000ms with default
mesh.idle_timeout=120000ms causes permanent consensus failure. With
mesh.idle_timeout updated dynamically to 200000ms alongside the roundtime
change, the cluster runs cleanly.
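
The two test configurations can be checked against the stage-wait rule from the Problem section. The sketch below uses hypothetical helper names and assumes the default 25% stage slice was in effect during the test, which the description does not state explicitly.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of hpcore).
// Longest wait of a single consensus stage, in ms.
constexpr uint64_t stage_wait_ms(uint64_t roundtime_ms, uint64_t stage_slice_pct)
{
    return roundtime_ms * stage_slice_pct / 100;
}

// True when a stage can outlast the idle timeout, i.e. peers may be dropped mid-stage.
constexpr bool stage_wait_exceeds_idle_timeout(uint64_t roundtime_ms,
                                               uint64_t stage_slice_pct,
                                               uint64_t idle_timeout_ms)
{
    return stage_wait_ms(roundtime_ms, stage_slice_pct) > idle_timeout_ms;
}

// Roundtime 485000ms at a 25% slice gives a 121250ms stage wait.
static_assert(stage_wait_ms(485000, 25) == 121250, "stage wait for 485000ms roundtime");
// Default 120000ms idle timeout: the stage wait exceeds it (failing case).
static_assert(stage_wait_exceeds_idle_timeout(485000, 25, 120000), "default timeout is too short");
// Idle timeout raised to 200000ms: the stage wait fits (clean run).
static_assert(!stage_wait_exceeds_idle_timeout(485000, 25, 200000), "raised timeout is sufficient");
```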

When log.log_level is present in patch.cfg, apply_patch_config() now updates the live plog logger severity via plog::get()->setMaxSeverity() in addition to persisting the change to hp.cfg and the runtime cfg struct. Previously log level was only read at startup (hplog::init()) and could not be changed on a running node without a container restart. This meant operators had no way to change log verbosity on external Evernode hosts where they don't control the container lifecycle. The fix uses plog's built-in setMaxSeverity() API which is thread-safe and takes effect immediately on the next log statement.

When mesh.idle_timeout is present in patch.cfg, apply_patch_config() now updates all active peer sessions via a new p2p::update_idle_timeout() function and also updates the cached metric_thresholds array for future connections. Previously mesh.idle_timeout was only read at startup (p2p::init()) and could not be changed on a running node without a container restart. This is critical for operators who need to increase roundtime beyond mesh.idle_timeout * 4: without this fix, increasing roundtime past 480000ms (at default idle_timeout of 120000ms) causes peer disconnections and permanent consensus failure.

Implementation:
- comm_server.hpp: added for_each_session() template to iterate live sessions
- p2p.cpp: added update_idle_timeout() which updates metric_thresholds[4] and calls set_threshold(IDLE_CONNECTION_TIMEOUT) on all active sessions
- conf.cpp: reads mesh.idle_timeout from patch.cfg in apply_patch_config(), calls p2p::update_idle_timeout() when value changes

Sibling PRs (part of dynamic config series):
- fix/dynamic-log-level-from-patch-cfg (already raised)
- feat/dynamic-user-idle-timeout-from-patch-cfg (to be raised)

Fixes: mesh connections dropping during long roundtimes

rippleitinnz added 2 commits May 19, 2026 16:13

This was referenced May 20, 2026

feat: dynamic log level update from patch.cfg without container restart #413

Open

feat: dynamic user idle_timeout from patch.cfg without container restart #415

Open

rippleitinnz wants to merge 2 commits into main from feat/dynamic-mesh-idle-timeout-from-patch-cfg
