Fix open security & reliability backlog (#38–#47) by OnlyTerp · Pull Request #52 · sybil-solutions/codex-shim

OnlyTerp · 2026-06-22T16:58:43Z

Summary

Closes the full open backlog of 10 issues — security hardening, reliability fixes, and a CI quality-gate uplift. Each issue is addressed in its own focused commit with regression tests.

Issue	Fix	Sev
#45	`_resolve_api_key` no longer leaks the Cursor key to empty-key models — fallback scoped to Cursor-target entries only	P1 / High
#42	`patch-app` pins the ASAR packer (`@electron/asar@4.2.0`, `npx --package`) instead of unpinned `npx --yes asar`	P1 / High
#39	Web search no longer silently fails — `_maybe_intercept_web_search` is async and awaits the search (removes `run_until_complete` deadlock)	P1 / High
#44	Router cache is now a per-instance bounded LRU, invalidated on model switch	P2 / Med
#43	`/picker` serves CSP + `X-Frame-Options: DENY` + `nosniff` + `Referrer-Policy` headers	P2 / Med
#41	Removed `Popen(["Codex.exe"], shell=True)` PATH-hijack fallback in the Codex restart path	P2 / Med
#40	Non-streaming upstream calls get a finite timeout (default 300s, `CODEX_SHIM_UPSTREAM_TIMEOUT`); streaming keeps unbounded read	P2 / Med
#38	`/api/models` now requires the picker token (403 otherwise); picker JS sends it	P2 / Med
#46	`_dump_debug_request` is gated behind `CODEX_SHIM_DEBUG_DUMP` and written `0600` in a `0700` dir	P2 / Med
#47	CI gains a blocking `ruff` lint gate, report-only mypy, coverage (`--cov-fail-under=70`), Python 3.13 matrix; new `test_opencode_go.py` (module 70%→97%)	P2 / Low

Notes for reviewers

Net-new tests: ~40 across test_settings_catalog.py, test_server.py, test_router.py, and the new test_opencode_go.py.
Web search always silently fails: loop.run_until_complete() deadlocks inside running asyncio event loop #39 scope: the documented run_until_complete deadlock is fixed and tested via _post_openai_chat. A separate pre-existing quirk (native web_search/apply_patch tools are flattened to function by responses_to_chat before _build_tool_types runs on the forwarded body) is intentionally left out of scope — it predates this issue and has a different root cause.
CI pipeline has no linting, type checking, or coverage gates; opencode_go module has zero test coverage #47 pragmatism: ruff check (E/F/W) is blocking and passes; ruff format/import-sorting (18-file reflow) and mypy (16 pre-existing type errors) are kept report-only to avoid a permanently-red pipeline or a sweeping out-of-scope reformat. Line-length is set to 160 to match existing code.
Backwards compatibility preserved: router.reset_cache()/resolve_auto() still work without an explicit cache; the process-wide default RouterCache is retained.

Test plan

python -m compileall codex_shim/ -q
ruff check codex_shim/ tests/ — clean
pytest tests/ -q --cov=codex_shim --cov-fail-under=70 — 222 passed, 70.32% total
Per-issue regression tests added and passing
CI green on 3.11 / 3.12 / 3.13

Generated with Devin

…45) _resolve_api_key previously fell back to the ambient Cursor key (~/.codex-shim/cursor-api-key, then $CURSOR_API_KEY) for ANY model whose resolved api_key was empty. A model entry pointing at an arbitrary base_url with no api_key would therefore silently forward the user's Cursor key as a Bearer token to that host. Restrict the fallback to Cursor-target entries only — provider in {cursor, cursor-passthrough} or a base_url on a known Cursor host. Empty keys on any other provider now stay empty, so byok_model_has_credentials() returns False and the model is excluded from usable_byok_models() instead of leaking the key upstream. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

patch-app rewrote the trusted Codex Desktop app.asar bundle using `npx --yes asar`, which resolves and immediately executes whatever version the npm registry currently serves. A registry/maintainer compromise of `asar` (~2M weekly downloads) would gain code execution against the bundle with the user's privileges. Pin the packer to an exact, auditable spec (`@electron/asar@4.2.0`) via `npx --package <spec> asar`, so the resolved version is deterministic regardless of PATH/cache. The spec is overridable via CODEX_SHIM_ASAR_PACKAGE for mirrors or vetted upgrades, and the active spec is printed at patch time. Document the pin and the override in the README. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

…39) _maybe_intercept_web_search called loop.run_until_complete(_perform_web_search) from inside the already-running request handler loop. CPython always raises RuntimeError("This event loop is already running.") there, and the except RuntimeError swallowed it, so web search silently returned "Web search unavailable in this context." on every call. Make _maybe_intercept_web_search a coroutine and await _perform_web_search directly; both non-streaming call sites (_post_openai_chat, _post_anthropic as_responses) now await it. The blocking urlopen in _perform_web_search is moved off the event loop via asyncio.to_thread so a slow search no longer stalls the handler. The run_until_complete/except-RuntimeError block is deleted. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

The Auto Router cached routing decisions in a module-level dict whose only eviction was a bulk clear() at capacity — flip-flopping between 256 entries and 0 (thundering-herd of classifier calls) — and which was never invalidated when the active model changed, so a picker model switch left stale routing in place. The global dict was also shared across all ShimServer instances. Replace it with a bounded LRU RouterCache (OrderedDict). Each ShimServer owns its own cache (self.router_cache) and passes it to resolve_auto via a new cache= parameter; switch_model now clears it so reconfiguration invalidates stale decisions. A process-wide default RouterCache is retained for callers that don't supply one, keeping reset_cache()/resolve_auto() backward compatible. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

GET /picker served the page (with the picker token embedded in inline JS) with only Content-Type, so any local script/extension with loopback access could frame it or passively read the token and drive /api/switch. Add defense-in-depth response headers: a strict Content-Security-Policy (default-src 'none', inline script/style only, connect-src 'self', frame-ancestors 'none'), X-Frame-Options: DENY, X-Content-Type-Options: nosniff, Referrer-Policy: no-referrer, and Cache-Control: no-store. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

_restart_codex_app fell back to Popen(["Codex.exe"], shell=True) when LOCALAPPDATA was unset or the resolved path didn't exist. A bare name with shell=True does a PATH search (cmd.exe /C Codex.exe), so a malicious Codex.exe earlier on PATH would be executed by the user who started the shim when an authenticated picker user switches models with restart enabled. Remove the fallback: relaunch only via the absolute LOCALAPPDATA path; if it can't be resolved, log to stderr and skip the relaunch (Codex is still quit). Refactor the Windows/macOS branches into _restart_codex_windows / _restart_codex_macos helpers so each is directly unit-testable. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

Every upstream call reused one ClientTimeout(total=None, sock_read=None) — aiohttp's "no timeout". A stalled non-streaming upstream (provider outage, network partition) could hold an aiohttp worker open until the OS eventually reset the TCP connection, so a few stuck requests could exhaust capacity and require a process restart. Split the timeout by request shape: streaming (SSE) keeps an unbounded sock_read but a bounded sock_connect; non-streaming gets a finite total wall-clock cap (default 300s, override via CODEX_SHIM_UPSTREAM_TIMEOUT). A _upstream_timeout(body) helper selects per request and is applied to all BYOK and ChatGPT passthrough call sites; compaction (always non-streaming) uses the sync timeout. Documented in the README. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

/api/models enumerated every configured model slug, display name, provider, and the active model to any loopback caller (local script, browser extension, or a DNS-rebinding page on a misconfigured bind host) with no auth, while the state-changing /api/switch already required the picker token. Guard api_models with the same _valid_picker_token check (403 without it), and update the picker page's loadModels() fetch to send the X-Codex-Shim-Picker-Token header so the UI keeps working. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

) _dump_debug_request wrote the full forwarded request body — the entire conversation (code context, tool output, history) — to .codex-shim/last_request.json on every non-streaming forward, unconditionally and with the default umask (often world-readable 0644) inside the project directory. Gate it behind CODEX_SHIM_DEBUG_DUMP (off by default). When enabled, create the directory mode 0700 and write the file via os.open(..., 0o600) so it is private from creation rather than relying on the umask. Documented in the README. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

CI ran only compileall + pytest with no static analysis, no coverage gate, and no test matrix entry for 3.13, and opencode_go.py's HTTP paths were entirely untested. - Add a lint job: blocking `ruff check` (after fixing the 2 real lint errors — an unused import and an undefined-name forward ref) plus a report-only mypy step (codebase isn't fully typed yet). - Run pytest with coverage and --cov-fail-under=70; add Python 3.13 to the matrix and classifiers. - Configure ruff/mypy/coverage in pyproject.toml and add ruff/mypy/pytest-cov to dev extras; ignore coverage/cache artifacts in .gitignore. - Add tests/test_opencode_go.py covering _request_json (success, HTTPError, URLError, non-JSON), model discovery + probes, and refresh orchestration with stubbed urlopen. opencode_go.py coverage 70% -> 97%. Generated with [Devin](https://devin.ai) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

test and others added 10 commits June 22, 2026 12:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix open security & reliability backlog (#38–#47)#52

Fix open security & reliability backlog (#38–#47)#52
OnlyTerp wants to merge 10 commits into
mainfrom
fix/security-reliability-backlog

OnlyTerp commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

OnlyTerp commented Jun 22, 2026

Summary

Notes for reviewers

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant