Summary
The proxy's test suite (test_proxy.py) is entirely offline and mock-based, covering only the proxy's own code. It provides zero coverage of the four provider backends (codex_oauth.py, cursor_agent.py, OpenAI-compat translation edge cases, Anthropic passthrough), the scripts/doctor.py config validator, or the shell launcher scripts. There are no integration tests, no fuzz tests, and no tests for the security-relevant code paths (auth header forwarding, /uc/select state mutation, /healthz disclosure, SSRF via route upstream).
Evidence
Test file: test_proxy.py — runs entirely against a MockBackend HTTP server in the same process. All routes resolve to mock responses.
Coverage gaps identified:
providers/codex_oauth.py: zero test coverage. JWT decoding (_decode_jwt_claims), token refresh (_best_effort_refresh via subprocess), and the Codex Responses API streaming parser are untested.
providers/cursor_agent.py: zero test coverage. The _MARKER_RE regex extraction (the prompt-injection path) is untested.
- Auto Router classifier integration: the mock tests verify routing decisions but not the classifier's actual HTTP call, timeout behavior, or score parsing robustness.
- Shell launchers (
bin/ultracode, windows/Start-UltraCode.ps1): no tests. The PID file race, settings file write, and model save/restore are untested.
- Security paths: no test verifies that
/healthz does or does not expose configuration, or that /uc/select can or cannot be called without auth.
- The
_router_cache_key collision behavior is not tested.
CI matrix (ci.yml) runs test_proxy.py and examples/auto_router_demo.py only — no coverage measurement, no branch coverage enforcement.
Why this matters
- Bugs in
codex_oauth.py token handling or cursor_agent.py tool-call parsing fail silently in production (the proxy falls back or returns an error) with no automated detection.
- The prompt injection path in
cursor_agent.py (_MARKER_RE over full CLI output) has never been tested with adversarial input.
- The router cache key collision bug described in a separate issue cannot be caught by the existing suite.
- The launcher scripts have logic that can corrupt user settings (
SAVED_MODEL_FILE); this is completely untested.
Attack or failure scenario
A regression in codex_oauth.py's JWT expiry check (_is_expiring) causes fresh tokens to be reported as expired, triggering unnecessary subprocess.run("codex login status") on every request, DoS-ing the user's shell environment. This would not be caught by CI.
Root cause
The test suite was written to validate the core proxy transformation logic in isolation. No test infrastructure exists for the providers, launchers, or security properties.
Recommended fix
- Add unit tests for
codex_oauth.py: mock AUTH_FILE, test _decode_jwt_claims with valid/expired/malformed JWTs, test _access_token refresh path.
- Add unit tests for
cursor_agent.py: test _MARKER_RE with injected markers in tool results, assert they are not extracted as tool calls.
- Add a test that verifies
/healthz returns 200 but does NOT include slots or upstream when called without auth (or fails appropriately once auth is added).
- Add coverage measurement to CI (
python -m pytest --cov) with a minimum threshold.
- Add a fuzz test for
_parse_scores (classifier JSON parsing) using hypothesis.
Acceptance criteria
Suggested labels
testing, bug, security
Priority
P2
Severity
Medium — missing tests are a reliability and regression risk, and specifically mask the prompt-injection path in cursor_agent.py.
Confidence
Confirmed — test file and CI workflow are explicit; provider files are not imported or exercised by the test suite.
Summary
The proxy's test suite (
test_proxy.py) is entirely offline and mock-based, covering only the proxy's own code. It provides zero coverage of the four provider backends (codex_oauth.py,cursor_agent.py, OpenAI-compat translation edge cases, Anthropic passthrough), thescripts/doctor.pyconfig validator, or the shell launcher scripts. There are no integration tests, no fuzz tests, and no tests for the security-relevant code paths (auth header forwarding,/uc/selectstate mutation,/healthzdisclosure, SSRF via route upstream).Evidence
Test file:
test_proxy.py— runs entirely against aMockBackendHTTP server in the same process. All routes resolve to mock responses.Coverage gaps identified:
providers/codex_oauth.py: zero test coverage. JWT decoding (_decode_jwt_claims), token refresh (_best_effort_refreshvia subprocess), and the Codex Responses API streaming parser are untested.providers/cursor_agent.py: zero test coverage. The_MARKER_REregex extraction (the prompt-injection path) is untested.bin/ultracode,windows/Start-UltraCode.ps1): no tests. The PID file race, settings file write, and model save/restore are untested./healthzdoes or does not expose configuration, or that/uc/selectcan or cannot be called without auth._router_cache_keycollision behavior is not tested.CI matrix (
ci.yml) runstest_proxy.pyandexamples/auto_router_demo.pyonly — no coverage measurement, no branch coverage enforcement.Why this matters
codex_oauth.pytoken handling orcursor_agent.pytool-call parsing fail silently in production (the proxy falls back or returns an error) with no automated detection.cursor_agent.py(_MARKER_REover full CLI output) has never been tested with adversarial input.SAVED_MODEL_FILE); this is completely untested.Attack or failure scenario
A regression in
codex_oauth.py's JWT expiry check (_is_expiring) causes fresh tokens to be reported as expired, triggering unnecessarysubprocess.run("codex login status")on every request, DoS-ing the user's shell environment. This would not be caught by CI.Root cause
The test suite was written to validate the core proxy transformation logic in isolation. No test infrastructure exists for the providers, launchers, or security properties.
Recommended fix
codex_oauth.py: mockAUTH_FILE, test_decode_jwt_claimswith valid/expired/malformed JWTs, test_access_tokenrefresh path.cursor_agent.py: test_MARKER_REwith injected markers in tool results, assert they are not extracted as tool calls./healthzreturns200but does NOT includeslotsorupstreamwhen called without auth (or fails appropriately once auth is added).python -m pytest --cov) with a minimum threshold._parse_scores(classifier JSON parsing) usinghypothesis.Acceptance criteria
providers/codex_oauth.pyhas unit tests covering happy path, expired token, and malformed auth file.providers/cursor_agent.pyhas a test verifying that injected<CLAUDE_TOOL_CALL>markers in user content are NOT emitted as tool calls (or are documented as expected behavior with a skip).Suggested labels
testing, bug, security
Priority
P2
Severity
Medium — missing tests are a reliability and regression risk, and specifically mask the prompt-injection path in
cursor_agent.py.Confidence
Confirmed — test file and CI workflow are explicit; provider files are not imported or exercised by the test suite.