Skip to content

Make the sidecar a dumb versioned proxy: drop request tracking, gating, and quiesce#10

Open
jvmncs wants to merge 1 commit into
mainfrom
devin/1782881670-dumb-proxy-sidecar
Open

Make the sidecar a dumb versioned proxy: drop request tracking, gating, and quiesce#10
jvmncs wants to merge 1 commit into
mainfrom
devin/1782881670-dumb-proxy-sidecar

Conversation

@jvmncs

@jvmncs jvmncs commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Summary

The sidecar tracked in-flight requests (_active_requests, _exact_inflight) behind a _committing admission gate to support two commit modes. This was brittle (a ClientDisconnect could leak the counter and wedge a quiesce commit forever) and, in the deployment we actually run, dead weight: every trainer config commits in_place and gates requests with weight_version_mode="min" (never exact), so _commit_ready never waits on any count. This collapses the sidecar to a single in-place path and removes all request tracking.

What stays (these were never "tracking" — they carry no shared state and are unaffected by disconnect drift): the 409 min/exact WeightVersionPolicy check, start_version capture, version-namespaced extra_key stamping, response weight_version_start/end metadata, and rid-injection + client-disconnect /abort_request.

What goes: the RolloutAdmissionGate class, CommitMode/quiesce, the counters, the _committing gate, and the exact-pin draining. There is now one commit path:

commit_version: pause() -> apply() -> on_applied()  -> finally: resume()

on_applied still advances current_version before resume() (new admissions see the new namespace first), and a failed apply() still leaves the version unchanged and resumes the engine. request_context is now stateless — it validates the policy (still calling queue_sync on a WeightVersionNotReady so a lagging replica pulls forward) and yields current_version, with no lock/counter/finally.

Because commits no longer drain, the quiesce-only flush_cache is gone (removed from EngineAdapter, the SGLang adapter, and the _sync_once path); the reload payload keeps "flush_cache": False. server_info() drops the commit_mode/active_requests/inflight_exact_versions fields.

Trade-off: exact_version requests still get a correct 409 on mismatch, but are no longer guaranteed to be served wholly on version v if an in-place commit lands mid-decode (the tail decodes on v+1; extra_key isolates the prefix cache, not decode continuation). No shipped config uses exact; min-staleness + "cross commits freely" is the documented async stance.

Cookbook: removed the commit_mode/--commit-mode/SIDECAR_COMMIT_MODE/COMMIT_MODE/STITCH_SHIM_COMMIT_MODE knobs from the sidecar/provider/process/helper launch paths and configs; updated README/docstrings to describe the single in-place path.

Testing

  • uv run pytest — 112 passed
  • uv run ruff check . — clean
  • Gate-specific tests removed; policy-409 and version-stamping coverage retained; in-place commit ordering / apply-failure-resume tests kept.

Link to Devin session: https://modal.devinenterprise.com/sessions/9847754de05e4e94bc99199825a47c15
Requested by: @jvmncs

…ing, and quiesce

Co-Authored-By: jason.mancuso@modal.com <jvmncs@gmail.com>
@jvmncs jvmncs self-assigned this Jul 1, 2026
@devin-ai-integration

Copy link
Copy Markdown

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

⚙️ Control Options:

  • Disable automatic comment, CI, and merge conflict monitoring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant