Skip to content

Test suite covers only mock-based core proxy; providers, launchers, and all security paths have zero test coverage #26

Description

@tg12

Summary

The proxy's test suite (test_proxy.py) is entirely offline and mock-based, covering only the proxy's own code. It provides zero coverage of the four provider backends (codex_oauth.py, cursor_agent.py, OpenAI-compat translation edge cases, Anthropic passthrough), the scripts/doctor.py config validator, or the shell launcher scripts. There are no integration tests, no fuzz tests, and no tests for the security-relevant code paths (auth header forwarding, /uc/select state mutation, /healthz disclosure, SSRF via route upstream).

Evidence

Test file: test_proxy.py — runs entirely against a MockBackend HTTP server in the same process. All routes resolve to mock responses.

Coverage gaps identified:

  • providers/codex_oauth.py: zero test coverage. JWT decoding (_decode_jwt_claims), token refresh (_best_effort_refresh via subprocess), and the Codex Responses API streaming parser are untested.
  • providers/cursor_agent.py: zero test coverage. The _MARKER_RE regex extraction (the prompt-injection path) is untested.
  • Auto Router classifier integration: the mock tests verify routing decisions but not the classifier's actual HTTP call, timeout behavior, or score parsing robustness.
  • Shell launchers (bin/ultracode, windows/Start-UltraCode.ps1): no tests. The PID file race, settings file write, and model save/restore are untested.
  • Security paths: no test verifies that /healthz does or does not expose configuration, or that /uc/select can or cannot be called without auth.
  • The _router_cache_key collision behavior is not tested.

CI matrix (ci.yml) runs test_proxy.py and examples/auto_router_demo.py only — no coverage measurement, no branch coverage enforcement.

Why this matters

  • Bugs in codex_oauth.py token handling or cursor_agent.py tool-call parsing fail silently in production (the proxy falls back or returns an error) with no automated detection.
  • The prompt injection path in cursor_agent.py (_MARKER_RE over full CLI output) has never been tested with adversarial input.
  • The router cache key collision bug described in a separate issue cannot be caught by the existing suite.
  • The launcher scripts have logic that can corrupt user settings (SAVED_MODEL_FILE); this is completely untested.

Attack or failure scenario

A regression in codex_oauth.py's JWT expiry check (_is_expiring) causes fresh tokens to be reported as expired, triggering unnecessary subprocess.run("codex login status") on every request, DoS-ing the user's shell environment. This would not be caught by CI.

Root cause

The test suite was written to validate the core proxy transformation logic in isolation. No test infrastructure exists for the providers, launchers, or security properties.

Recommended fix

  1. Add unit tests for codex_oauth.py: mock AUTH_FILE, test _decode_jwt_claims with valid/expired/malformed JWTs, test _access_token refresh path.
  2. Add unit tests for cursor_agent.py: test _MARKER_RE with injected markers in tool results, assert they are not extracted as tool calls.
  3. Add a test that verifies /healthz returns 200 but does NOT include slots or upstream when called without auth (or fails appropriately once auth is added).
  4. Add coverage measurement to CI (python -m pytest --cov) with a minimum threshold.
  5. Add a fuzz test for _parse_scores (classifier JSON parsing) using hypothesis.

Acceptance criteria

  • providers/codex_oauth.py has unit tests covering happy path, expired token, and malformed auth file.
  • providers/cursor_agent.py has a test verifying that injected <CLAUDE_TOOL_CALL> markers in user content are NOT emitted as tool calls (or are documented as expected behavior with a skip).
  • CI reports coverage and fails if it drops below a configured threshold.

Suggested labels

testing, bug, security

Priority

P2

Severity

Medium — missing tests are a reliability and regression risk, and specifically mask the prompt-injection path in cursor_agent.py.

Confidence

Confirmed — test file and CI workflow are explicit; provider files are not imported or exercised by the test suite.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions