feat(consumer): per-chain latency cutoff for random provider selection #2314

Open. EliasiOfir wants to merge 4 commits into main from feature/consumer-latency-cutoff

@EliasiOfir (Collaborator)

Summary

Adds an optional per-chain latency cutoff for the consumer's random provider
selection. Providers whose measured QoS latency exceeds a configurable
max-provider-latency (seconds) are put aside during general random selection,
so slow providers stop receiving traffic while faster ones absorb it. Opt-in:
max-provider-latency: 0 (or omitted) preserves current behaviour exactly.

Behaviour

  • Enabled (> 0): providers measured above the threshold are excluded from the general random pool.
  • Safety fallback: if every provider is above the threshold, the full pool is kept (we never strand a chain).
  • Cold start: providers with no latency measurement yet are not filtered.
  • Unaffected: stickiness, stateful, and explicitly-selected-provider paths.
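
The four rules above can be sketched as a single filter. This is a minimal illustration, not the PR's code: the `provider` type, its fields, and the function name `filterHighLatency` are hypothetical, and treating negative values as "disabled" is an assumption (the PR only specifies 0 or omitted). The strict comparison follows the `">"` boundary mentioned in the unit-test commit.

```go
package main

import "fmt"

// provider is a hypothetical stand-in for a paired provider and its QoS data.
type provider struct {
	Address    string
	LatencySec float64 // measured latency, in seconds
	HasQoS     bool    // false until a first measurement exists (cold start)
}

// filterHighLatency returns the providers eligible for general random
// selection. A cutoff of 0 disables filtering (negative values are treated
// the same way here, which is an assumption of this sketch).
func filterHighLatency(pool []provider, maxLatencySec float64) []provider {
	if maxLatencySec <= 0 {
		return pool
	}
	kept := make([]provider, 0, len(pool))
	for _, p := range pool {
		// Cold-start providers are never filtered. The exclusion test is a
		// strict ">", so a provider exactly at the threshold is kept.
		if !p.HasQoS || p.LatencySec <= maxLatencySec {
			kept = append(kept, p)
		}
	}
	if len(kept) == 0 {
		// Safety fallback: every provider is too slow, keep the full pool.
		return pool
	}
	return kept
}

func main() {
	pool := []provider{
		{"p1", 0.2, true},
		{"p2", 1.0, true}, // exactly at the cutoff: kept
		{"p3", 3.5, true}, // above the cutoff: put aside
		{"p4", 0, false},  // no measurement yet: kept
	}
	for _, p := range filterHighLatency(pool, 1.0) {
		fmt.Println(p.Address) // prints p1, p2, p4 (one per line)
	}
}
```

Returning the original pool in the all-slow case is what keeps a chain from being stranded: the cutoff only ever narrows the pool when at least one candidate survives it.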

Changes

  • protocol/lavasession/consumer_session_manager.go — latency filtering in the general selection path.
  • protocol/lavasession/consumer_types.go: RPCEndpoint.MaxProviderLatency field.
  • protocol/rpcconsumer — parse max-provider-latency from the endpoints YAML (defaults to 0).
  • config/ — example consumer configs (enabled / disabled variants) + rpcconsumer.yml field.
  • protocol/rpcconsumer/README.md — docs.
  • Tests: unit + integration tests, plus a live E2E harness under scripts/.

Testing

  • New unit/integration tests: 14/14 pass (13 lavasession + 1 rpcconsumer).
  • E2E harness (scripts/test/e2e_latency_cutoff.sh all): all 3 modes pass
    • cutoff: slow provider put aside (deltas p1=+46 p2=+54 p3(slow)=+0)
    • regression-disabled: cutoff off → slow provider still selected (p3=+24)
    • regression-fallback: all slow → fallback keeps full pool (+100/+100/+100)
  • Full suite: go build ./... clean; no regressions across x/... and protocol/....
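
The expectations behind the three E2E modes can be written down as a small check over per-provider selection deltas. The verification script's actual assertions are not shown in this PR, so the function `checkDeltas`, its signature, and the use of a `slow` index map are all hypothetical; only the mode names and the example deltas come from the results above.

```go
package main

import "fmt"

// checkDeltas validates per-provider selection-count deltas for one mode.
// slow marks (by index) the providers sitting behind a slowed backend.
func checkDeltas(mode string, deltas []int, slow map[int]bool) error {
	switch mode {
	case "cutoff", "regression-disabled", "regression-fallback":
	default:
		return fmt.Errorf("unknown mode %q", mode)
	}
	for i, d := range deltas {
		if mode == "cutoff" {
			// With the cutoff active, slow providers get nothing and fast
			// providers absorb the traffic.
			if slow[i] && d != 0 {
				return fmt.Errorf("p%d is slow but was selected %d times", i+1, d)
			}
			if !slow[i] && d == 0 {
				return fmt.Errorf("p%d is fast but was never selected", i+1)
			}
			continue
		}
		// Both regression modes expect slow providers to keep receiving relays.
		if slow[i] && d == 0 {
			return fmt.Errorf("p%d should still be selected in mode %s", i+1, mode)
		}
	}
	return nil
}

func main() {
	// Deltas taken from the cutoff and regression-fallback results above.
	fmt.Println(checkDeltas("cutoff", []int{46, 54, 0}, map[int]bool{2: true}))
	fmt.Println(checkDeltas("regression-fallback", []int{100, 100, 100},
		map[int]bool{0: true, 1: true, 2: true}))
}
```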

EliasiOfir and others added 3 commits June 16, 2026 18:00
Adds an optional per-endpoint QoS latency cutoff (MaxProviderLatency,
in seconds) to RPCEndpoint. During the general random/weighted provider
selection, providers whose EWMA latency exceeds the cutoff are put aside
via the existing ignored-providers set, so the optimizer never picks them.

Includes a safety fallback: the cutoff is applied only if at least one
candidate stays under the threshold; otherwise the full pool is kept so
relays keep flowing when the whole pairing is slow. Cold-start providers
(no QoS data) are treated as under-threshold.

Static, header-selected, sticky and stateful selection paths are
unaffected. No optimizer/interface changes; the setting rides on
csm.rpcEndpoint. 0 (default) disables the cutoff.
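
For readers unfamiliar with the term, an EWMA (exponentially weighted moving average) can be sketched as follows. This is a generic illustration only: the `ewma` type and the smoothing factor `alpha` are hypothetical, and the optimizer's real smoothing parameters are not part of this PR. The `seeded` flag mirrors the cold-start case, where no measurement exists yet.

```go
package main

import "fmt"

// ewma keeps an exponentially weighted moving average of latency samples.
type ewma struct {
	alpha  float64 // weight of the newest sample, in (0, 1]
	value  float64
	seeded bool // false until the first sample arrives (cold start)
}

// Add folds one latency sample (in seconds) into the average.
func (e *ewma) Add(sample float64) {
	if !e.seeded {
		e.value, e.seeded = sample, true
		return
	}
	e.value = e.alpha*sample + (1-e.alpha)*e.value
}

// Value returns the current average and whether any sample has been seen.
func (e *ewma) Value() (float64, bool) {
	return e.value, e.seeded
}

func main() {
	lat := &ewma{alpha: 0.5}
	for _, s := range []float64{0.2, 0.2, 4.0} {
		lat.Add(s)
	}
	v, ok := lat.Value()
	fmt.Printf("%.2f %v\n", v, ok) // 0.5*4.0 + 0.5*0.2 = 2.10
}
```

A single slow relay moves the average only part of the way toward the new sample, so a provider crosses the cutoff after sustained slowness rather than one outlier.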

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds focused unit tests for filterHighLatencyProviders covering: excluding
slow providers, keeping fast ones, the all-slow safety fallback, disabled
cutoff (0), cold-start (no QoS) handling, preserving pre-existing ignored
entries, strict ">" threshold boundary, multi-provider filtering, and a
mixed cold-start/slow/fast case.

Documents max-provider-latency in the rpcconsumer README and adds a
commented example to config/rpcconsumer.yml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Go tests:
- consumer_session_manager_latency_selection_test.go: drives
  getValidProviderAddresses through a stub optimizer to prove the cutoff
  applies only to the general random path (sticky/stateful/selected-provider
  paths are unaffected).
- rpcconsumer_endpoints_test.go: verifies max-provider-latency parses from
  the YAML endpoints config into RPCEndpoint (0/omitted => disabled).

Live E2E (full env: dev chain + mock-backed providers + consumer):
- scripts/pre_setups/init_eth_latency_cutoff.sh: brings up 3 ETH1 providers,
  each behind its own mock RPC backend; one mock is slowed so its provider
  exceeds the cutoff. Modes: cutoff | regression-disabled | regression-fallback.
- scripts/test/verify_latency_cutoff.sh: drives relays and asserts on the
  lava_consumer_provider_selections metric per mode.
- scripts/test/e2e_latency_cutoff.sh: one-command runner (setup -> wait ->
  verify -> teardown).
- config/eth_latency_cutoff_consumer{,_disabled}.yml: the two consumer configs
  (cutoff 1.0 and disabled 0) selected by mode.

The mock RPC server is left untouched; providers use --use-static-spec with a
verification-stripped ETH1 spec so they accept the mock backend while still
proxying real (latency-bearing) relays.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@EliasiOfir requested a review from avitenzer on June 18, 2026, 16:07

codecov Bot commented Jun 18, 2026

Codecov Report

❌ Patch coverage is 92.30769% with 3 lines in your changes missing coverage. Please review.

Files with missing lines                            Patch %   Lines
protocol/lavasession/consumer_session_manager.go    92.30%    2 Missing and 1 partial ⚠️

Flag         Coverage Δ
consensus    8.96% <ø> (ø)
protocol     38.19% <92.30%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines                            Coverage Δ
protocol/lavasession/consumer_types.go              77.13% <ø> (ø)
protocol/lavasession/consumer_session_manager.go    72.60% <92.30%> (+3.46%) ⬆️

... and 4 files with indirect coverage changes

github-actions Bot commented Jun 18, 2026

Test Results

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
7 files   ±0   0 ❌ ±0 

Results for commit 4147188. ± Comparison against base commit 1bdc695.

♻️ This comment has been updated with latest results.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>