Summary
On latest main (e3031b8, v1.1.0), the Codex bridge now has --loaded-timeout and --turn-timeout, but there are still a few unbounded failure paths:
- With --app-server ws://..., the TCP connection succeeds, but the WebSocket upgrade never completes.
- The WebSocket upgrade succeeds, but a JSON-RPC request such as initialize never receives a response.
- watch-once repeatedly exits non-zero; the bridge logs the failure and re-arms forever.
The reproductions below are not Windows-specific; they exercise transport and guardrail behavior in the bridge itself.
Reproduction Evidence
I reproduced this against a temp copy of the scripts so the repo's real teams/, db/, and run/ were not touched.
1. WebSocket handshake stall
Fake app-server: listens on 127.0.0.1, accepts the TCP connection, reads data, and never sends the HTTP 101 WebSocket upgrade response.
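A server with this behavior can be sketched with Node's built-in net module. This is a minimal illustration, not necessarily the script used for the reproduction; createStallingServer is a hypothetical name, and CommonJS is assumed.

```javascript
const net = require('node:net');

// Hypothetical helper: accepts TCP connections and reads whatever the client
// sends, but never writes the HTTP 101 response that would complete the upgrade.
function createStallingServer() {
  return net.createServer((socket) => {
    socket.on('data', () => {});  // consume the upgrade request, reply with nothing
    socket.on('error', () => {}); // ignore resets when the client is killed
  });
}
```

Calling createStallingServer().listen(0, '127.0.0.1') binds an ephemeral port; once the server is listening, server.address().port gives the value to pass as $port.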
Bridge command shape:
timeout 4s node scripts/drivers/types/codex/codex-bridge.js \
--project "$tmp/proj" --team team --name alice --thread thread-existing \
--app-server "ws://127.0.0.1:$port" --timeout 1 --interval 1
Observed:
status=124
stdout=
stderr=
The bridge did not exit on its own; the outer timeout killed it (hence status 124).
2. JSON-RPC request stall
Fake app-server: completes the WebSocket upgrade, then ignores JSON-RPC frames, so initialize never receives a response.
Observed with the same bridge command shape:
status=124
stdout=
stderr=
Again, the bridge only stopped because of the outer timeout.
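One way to bound this stall is to attach a timer to every in-flight request and drop the entry when the timer fires. The sketch below assumes the bridge tracks requests by JSON-RPC id; the class and method names are hypothetical and not taken from codex-bridge.js.

```javascript
// Hypothetical tracker for in-flight JSON-RPC requests; not the bridge's actual code.
class PendingRequests {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.pending = new Map(); // id -> { resolve, reject, timer }
    this.nextId = 1;
  }

  // Register a request and return its id plus a promise that rejects on timeout.
  create(method) {
    const id = this.nextId++;
    const promise = new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id); // clean up so stale entries do not accumulate
        reject(new Error(`JSON-RPC "${method}" (id ${id}) timed out after ${this.timeoutMs} ms`));
      }, this.timeoutMs);
      this.pending.set(id, { resolve, reject, timer });
    });
    return { id, promise };
  }

  // Route a response to its request; returns false for unknown or expired ids.
  settle(message) {
    const entry = this.pending.get(message.id);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(message.id);
    if (message.error) entry.reject(new Error(message.error.message));
    else entry.resolve(message.result);
    return true;
  }
}
```

With something like this in place, an initialize call that never gets a reply would reject with a named error instead of hanging until an outer timeout kills the process.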
3. Repeated watch failure loop
Fake app-server: responds to initialize, thread/resume, and process/spawn, then sends process/exited with exitCode: 1 every time the watch process is armed.
Observed with timeout 8s:
status=124
stderr:
codex-bridge: resumed thread thread-existing
codex-bridge: armed team/alice
codex-bridge: watch-once failed with exit 1: fake watch failure
codex-bridge: armed team/alice
codex-bridge: watch-once failed with exit 1: fake watch failure
failure_count=2
The bridge kept running after the repeated failures and was killed by the outer timeout.
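A consecutive-failure guard for this loop could look roughly like the following. It is only a sketch: ConsecutiveFailureLimit is a hypothetical name, and how the limit would be configured (flag, environment variable, default value) is left open.

```javascript
// Hypothetical counter for consecutive watch-once failures.
class ConsecutiveFailureLimit {
  constructor(limit) {
    this.limit = limit;
    this.count = 0;
  }

  // Any successful run resets the streak.
  recordSuccess() {
    this.count = 0;
  }

  // Returns true once the streak reaches the limit and the caller should stop re-arming.
  recordFailure() {
    this.count += 1;
    return this.count >= this.limit;
  }
}
```

The bridge would call recordFailure() where it currently logs "watch-once failed" and exit non-zero when it returns true, rather than arming again.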
Expected Behavior
The bridge should bound these stalls and fail explicitly, for example:
- WebSocket handshake timeout, with a clear error message.
- JSON-RPC request timeout for app-server requests, with cleanup of pending requests.
- A configurable consecutive watch failure limit, so persistent watch-once failures stop the bridge instead of re-arming forever.
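For the handshake item, a generic promise wrapper may be enough. The sketch below assumes the bridge can expose "WebSocket is open" as a promise; withTimeout and its label parameter are hypothetical names.

```javascript
// Hypothetical helper: rejects with a labelled error if `promise` has not
// settled within `ms` milliseconds, and always clears its own timer.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Wrapping the connect step as withTimeout(opened, handshakeMs, 'WebSocket handshake') would turn the silent hang from reproduction 1 into an explicit error; the caller would still need to close the underlying socket after the rejection.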
Notes
This is complementary to the existing --loaded-timeout and --turn-timeout; those guard different parts of the bridge lifecycle and did not cover the reproductions above.