Skip to content

Fix open security & reliability backlog (#38–#47)#52

Open
OnlyTerp wants to merge 10 commits into
mainfrom
fix/security-reliability-backlog
Open

Fix open security & reliability backlog (#38–#47)#52
OnlyTerp wants to merge 10 commits into
mainfrom
fix/security-reliability-backlog

Conversation

@OnlyTerp

Copy link
Copy Markdown
Collaborator

Summary

Closes the full open backlog of 10 issues — security hardening, reliability fixes, and a CI quality-gate uplift. Each issue is addressed in its own focused commit with regression tests.

Issue Fix Sev
#45 _resolve_api_key no longer leaks the Cursor key to empty-key models — fallback scoped to Cursor-target entries only P1 / High
#42 patch-app pins the ASAR packer (@electron/asar@4.2.0, npx --package) instead of unpinned npx --yes asar P1 / High
#39 Web search no longer silently fails — _maybe_intercept_web_search is async and awaits the search (removes run_until_complete deadlock) P1 / High
#44 Router cache is now a per-instance bounded LRU, invalidated on model switch P2 / Med
#43 /picker serves CSP + X-Frame-Options: DENY + nosniff + Referrer-Policy headers P2 / Med
#41 Removed Popen(["Codex.exe"], shell=True) PATH-hijack fallback in the Codex restart path P2 / Med
#40 Non-streaming upstream calls get a finite timeout (default 300s, CODEX_SHIM_UPSTREAM_TIMEOUT); streaming keeps unbounded read P2 / Med
#38 /api/models now requires the picker token (403 otherwise); picker JS sends it P2 / Med
#46 _dump_debug_request is gated behind CODEX_SHIM_DEBUG_DUMP and written 0600 in a 0700 dir P2 / Med
#47 CI gains a blocking ruff lint gate, report-only mypy, coverage (--cov-fail-under=70), Python 3.13 matrix; new test_opencode_go.py (module 70%→97%) P2 / Low

Notes for reviewers

  • Net-new tests: ~40 across test_settings_catalog.py, test_server.py, test_router.py, and the new test_opencode_go.py.
  • Web search always silently fails: loop.run_until_complete() deadlocks inside running asyncio event loop #39 scope: the documented run_until_complete deadlock is fixed and tested via _post_openai_chat. A separate pre-existing quirk (native web_search/apply_patch tools are flattened to function by responses_to_chat before _build_tool_types runs on the forwarded body) is intentionally left out of scope — it predates this issue and has a different root cause.
  • CI pipeline has no linting, type checking, or coverage gates; opencode_go module has zero test coverage #47 pragmatism: ruff check (E/F/W) is blocking and passes; ruff format/import-sorting (18-file reflow) and mypy (16 pre-existing type errors) are kept report-only to avoid a permanently-red pipeline or a sweeping out-of-scope reformat. Line-length is set to 160 to match existing code.
  • Backwards compatibility preserved: router.reset_cache()/resolve_auto() still work without an explicit cache; the process-wide default RouterCache is retained.

Test plan

  • python -m compileall codex_shim/ -q
  • ruff check codex_shim/ tests/ — clean
  • pytest tests/ -q --cov=codex_shim --cov-fail-under=70 — 222 passed, 70.32% total
  • Per-issue regression tests added and passing
  • CI green on 3.11 / 3.12 / 3.13

Generated with Devin

test and others added 10 commits June 22, 2026 12:21
…45)

_resolve_api_key previously fell back to the ambient Cursor key
(~/.codex-shim/cursor-api-key, then $CURSOR_API_KEY) for ANY model whose
resolved api_key was empty. A model entry pointing at an arbitrary base_url
with no api_key would therefore silently forward the user's Cursor key as a
Bearer token to that host.

Restrict the fallback to Cursor-target entries only — provider in
{cursor, cursor-passthrough} or a base_url on a known Cursor host. Empty keys
on any other provider now stay empty, so byok_model_has_credentials() returns
False and the model is excluded from usable_byok_models() instead of leaking
the key upstream.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
patch-app rewrote the trusted Codex Desktop app.asar bundle using
`npx --yes asar`, which resolves and immediately executes whatever version
the npm registry currently serves. A registry/maintainer compromise of `asar`
(~2M weekly downloads) would gain code execution against the bundle with the
user's privileges.

Pin the packer to an exact, auditable spec (`@electron/asar@4.2.0`) via
`npx --package <spec> asar`, so the resolved version is deterministic
regardless of PATH/cache. The spec is overridable via CODEX_SHIM_ASAR_PACKAGE
for mirrors or vetted upgrades, and the active spec is printed at patch time.
Document the pin and the override in the README.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…39)

_maybe_intercept_web_search called loop.run_until_complete(_perform_web_search)
from inside the already-running request handler loop. CPython always raises
RuntimeError("This event loop is already running.") there, and the except
RuntimeError swallowed it, so web search silently returned
"Web search unavailable in this context." on every call.

Make _maybe_intercept_web_search a coroutine and await _perform_web_search
directly; both non-streaming call sites (_post_openai_chat,
_post_anthropic as_responses) now await it. The blocking urlopen in
_perform_web_search is moved off the event loop via asyncio.to_thread so a slow
search no longer stalls the handler. The run_until_complete/except-RuntimeError
block is deleted.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The Auto Router cached routing decisions in a module-level dict whose only
eviction was a bulk clear() at capacity — flip-flopping between 256 entries
and 0 (thundering-herd of classifier calls) — and which was never invalidated
when the active model changed, so a picker model switch left stale routing in
place. The global dict was also shared across all ShimServer instances.

Replace it with a bounded LRU RouterCache (OrderedDict). Each ShimServer owns
its own cache (self.router_cache) and passes it to resolve_auto via a new
cache= parameter; switch_model now clears it so reconfiguration invalidates
stale decisions. A process-wide default RouterCache is retained for callers
that don't supply one, keeping reset_cache()/resolve_auto() backward compatible.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
GET /picker served the page (with the picker token embedded in inline JS) with
only Content-Type, so any local script/extension with loopback access could
frame it or passively read the token and drive /api/switch.

Add defense-in-depth response headers: a strict Content-Security-Policy
(default-src 'none', inline script/style only, connect-src 'self',
frame-ancestors 'none'), X-Frame-Options: DENY, X-Content-Type-Options:
nosniff, Referrer-Policy: no-referrer, and Cache-Control: no-store.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
_restart_codex_app fell back to Popen(["Codex.exe"], shell=True) when
LOCALAPPDATA was unset or the resolved path didn't exist. A bare name with
shell=True does a PATH search (cmd.exe /C Codex.exe), so a malicious Codex.exe
earlier on PATH would be executed by the user who started the shim when an
authenticated picker user switches models with restart enabled.

Remove the fallback: relaunch only via the absolute LOCALAPPDATA path; if it
can't be resolved, log to stderr and skip the relaunch (Codex is still quit).
Refactor the Windows/macOS branches into _restart_codex_windows /
_restart_codex_macos helpers so each is directly unit-testable.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Every upstream call reused one ClientTimeout(total=None, sock_read=None) —
aiohttp's "no timeout". A stalled non-streaming upstream (provider outage,
network partition) could hold an aiohttp worker open until the OS eventually
reset the TCP connection, so a few stuck requests could exhaust capacity and
require a process restart.

Split the timeout by request shape: streaming (SSE) keeps an unbounded
sock_read but a bounded sock_connect; non-streaming gets a finite total
wall-clock cap (default 300s, override via CODEX_SHIM_UPSTREAM_TIMEOUT). A
_upstream_timeout(body) helper selects per request and is applied to all BYOK
and ChatGPT passthrough call sites; compaction (always non-streaming) uses the
sync timeout. Documented in the README.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
/api/models enumerated every configured model slug, display name, provider, and
the active model to any loopback caller (local script, browser extension, or a
DNS-rebinding page on a misconfigured bind host) with no auth, while the
state-changing /api/switch already required the picker token.

Guard api_models with the same _valid_picker_token check (403 without it), and
update the picker page's loadModels() fetch to send the
X-Codex-Shim-Picker-Token header so the UI keeps working.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
)

_dump_debug_request wrote the full forwarded request body — the entire
conversation (code context, tool output, history) — to
.codex-shim/last_request.json on every non-streaming forward, unconditionally
and with the default umask (often world-readable 0644) inside the project
directory.

Gate it behind CODEX_SHIM_DEBUG_DUMP (off by default). When enabled, create the
directory mode 0700 and write the file via os.open(..., 0o600) so it is private
from creation rather than relying on the umask. Documented in the README.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
CI ran only compileall + pytest with no static analysis, no coverage gate, and
no test matrix entry for 3.13, and opencode_go.py's HTTP paths were entirely
untested.

- Add a lint job: blocking `ruff check` (after fixing the 2 real lint errors —
  an unused import and an undefined-name forward ref) plus a report-only mypy
  step (codebase isn't fully typed yet).
- Run pytest with coverage and --cov-fail-under=70; add Python 3.13 to the
  matrix and classifiers.
- Configure ruff/mypy/coverage in pyproject.toml and add ruff/mypy/pytest-cov to
  dev extras; ignore coverage/cache artifacts in .gitignore.
- Add tests/test_opencode_go.py covering _request_json (success, HTTPError,
  URLError, non-JSON), model discovery + probes, and refresh orchestration with
  stubbed urlopen. opencode_go.py coverage 70% -> 97%.

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant