fix(api): bind v1 role guard to HMAC token, not just header#222
fix(api): bind v1 role guard to HMAC token, not just header#222antfleet-ops wants to merge 1 commit into
Conversation
`_require_roles` previously trusted only the `X-Agent-Role` header. When ROBOCO_AGENT_AUTH_REQUIRED is enabled (prod), the main agent context dep (`get_agent_context`) enforces HMAC via `_check_agent_auth_token` — but the v1 role guards bypass that path entirely and check the raw header. A caller with network reach to a v1 flow endpoint could pick any allowed role by setting just `X-Agent-Role` and pass the guard without proving they hold the matching token. Add the same token-validation step `get_agent_context` uses to `_require_roles` so the HMAC binding is enforced uniformly: - ROBOCO_AGENT_AUTH_REQUIRED=true + no token → 401 - Token presented + invalid HMAC → 401 (verified in dev too) - Token presented + valid HMAC + wrong role → 403 - Auth required-off + role header alone → preserved (dev parity) The role comparison itself is unchanged. The token validation primitive stays centralized in `roboco.api.deps` — `_role_dep` defers the import inside the closure to avoid pulling the full deps module at router registration time. Regression tests: - New: auth-required + missing token → 401 - New: token presented + invalid HMAC → 401 - Existing: dev mode without auth required still accepts role-only Found via AntFleet two-model bench review (Claude Opus 4.7 + GPT-5): AntFleet/bench-roboco#3 (comment)
|
Thanks for opening your first pull request on RoboCo! Quick checklist before review (most of these are enforced by CI, but worth a glance):
See CONTRIBUTING.md for the full workflow and the Code of Conduct for the community standards we follow. Welcome aboard — a maintainer will review shortly. |
|
I have read the CLA Document and I hereby sign the CLA You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot. |
rennf93
left a comment
There was a problem hiding this comment.
Change Request for external PR #222: fix(api): bind v1 role guard to HMAC token, not just header
Summary: The diff correctly identifies that the v1 role dependencies (require_dev, require_qa, etc.) in _role_dep.py only validated the X-Agent-Role header string and performed no token binding. This is a bypass vector because these route-level deps were the sole gate for the /api/v1/flow/* endpoints (unlike get_agent_context which already calls _check_agent_auth_token).
The fix hoists the token check into _require_roles before the membership test and reuses the centralized primitive. In header-trust mode (no ROBOCO_AGENT_AUTH_REQUIRED) a missing token is a no-op (legacy preserved); any presented token is still verified and bad ones rejected. This matches the documented contract and deps.py:215 / agents_config.py:79 behavior.
Findings (per-criterion / concrete evidence)
-
File: roboco/api/routes/v1/_role_dep.py (patched _require_roles)
- Expected: the role guard must cryptographically bind
X-Agent-Roleto a validX-Agent-Token(HMAC over id:role:team) when a token is presented, and require the token whenROBOCO_AGENT_AUTH_REQUIRED=true, before allowing the role membership check to pass. - Actual (after patch):
_checknow declaresx_agent_id,x_agent_role, optional team/token; calls_check_agent_auth_token(...)(deferred import) thenif x_agent_role.lower() not in allowed: 403. Token failures raise 401 from the shared helper. Correct and minimal. - Note: deferred import placement inside the closure (not top-of-_require_roles) properly avoids import-time cycles with routers that import both this module and
roboco.api.deps. Good.
- Expected: the role guard must cryptographically bind
-
File: tests/unit/api/test_v1_role_dep.py (new tests after line ~82)
test_dev_route_requires_token_when_auth_enabled: sets env true, posts to/api/v1/flow/developer/give_me_workwith only ID+Role (no Token) → asserts 401. Matches_check_agent_auth_token:215(if _auth_required() and not x_agent_token: 401).test_dev_route_rejects_invalid_token_even_in_dev: unsets required, sets a secret, sends ID+Role+bad-token → asserts 401. Matchesverify_agent_token:86(non-matching or sentinel → False) and theif x_agent_token and not verify: 401path.- Observation (actionable): both tests are negative cases only. For a security-binding fix, add at least one positive case that constructs a valid token via
issue_agent_token(or the panel token path) for an allowed role and asserts the route returns 200 (or the expected envelope) rather than 401/403. This proves the "good" path still works after the guard change.
-
Test helper dependency (tests/unit/api/test_v1_role_dep.py)
- The new tests call
TestClient(_build_app())and hit the exact v1 developer router path. - Expected:
_build_app(or equivalent fixture) must be included or clearly defined in this PR so the regression is reproducible and the unit test can execute without full integration DB/lifespan side effects. - Actual: only the test bodies appear in the provided diff;
_build_appdefinition is not visible. Provide the helper (or a minimal router-mounted app with dep overrides) as part of the change.
- The new tests call
-
Header declaration duplication (multiple files)
- v1 flow routers (e.g. flow_dev.py:29, flow_pr_reviewer.py:25, flow_qa.py:28, flow_board.py:27, ...) declare
_AgentIdHeader = Annotated[UUID, Header(alias="X-Agent-ID")]on handlers. - The patched guard now also declares
x_agent_id: Annotated[str, Header(alias="X-Agent-ID")](plus team/token). - Expected vs actual: works (FastAPI merges same alias), but the ID is validated twice (str vs UUID coercion) and declared in two places. Not a correctness bug for this fix, but consider a shared auth header dependency in a follow-up to keep the binding logic and declarations co-located.
- v1 flow routers (e.g. flow_dev.py:29, flow_pr_reviewer.py:25, flow_qa.py:28, flow_board.py:27, ...) declare
-
Scope and blast radius
- Only the role-guarded v1 flow routers are affected (
require_*created by_require_roles).v1/do.pyhas no role dep on its router (intentionally different surface). No change toget_agent_contextor middleware. No new dependencies or secret handling introduced. - No evidence of injection, secret-in-code, or supply-chain changes in the diff.
- Only the role-guarded v1 flow routers are affected (
-
Error semantics preserved
- Missing required headers (ID or Role) continue to yield FastAPI 422 (pre-existing behavior asserted by
test_dev_route_rejects_missing_role_header). - Auth failures from token check yield 401; role-not-allowed yields 403. Consistent with
deps.pyand HTTP conventions.
- Missing required headers (ID or Role) continue to yield FastAPI 422 (pre-existing behavior asserted by
Recommendation
Address items 2 and 3 (positive test + visible test helper) so the security regression coverage is complete and reviewable. Once those are present, this is a clean, correctly-scoped defense-in-depth improvement that aligns the v1 role path with the rest of the HMAC contract.
Requesting changes for the completeness items above.
The v1 flow role guards (_require_roles) are the sole gate for /api/v1/flow/* but checked only the X-Agent-Role string — unlike get_agent_context, which verifies the token. So a forged role header passed, and in strict mode (ROBOCO_AGENT_AUTH_REQUIRED) the token was never required on these routes. Call _check_agent_auth_token before the role membership test (deferred import to avoid the cycle): missing token stays a no-op in header-trust mode, any presented token is verified, and strict mode now requires it. Supersedes #222 (antfleet-ops flagged the bypass); re-done as our own change so it lands without the contributor CLA, and adds the positive valid-token test their PR was missing. Co-authored-by: Renn F <rennf93@users.noreply.github.com>
|
Superseded by #228 (merged to master): same token-binding fix on the v1 role guard (plus the positive valid-token test), re-done as our own change so it could land without the contributor CLA. Thanks for flagging the bypass. |
Summary
roboco/api/routes/v1/_role_dep.py::_require_rolestrusts theX-Agent-Roleheader without verifying the agent's HMAC token. The main agent-context dep (get_agent_contextinroboco/api/deps.py) does enforce HMAC via_check_agent_auth_tokenwhenROBOCO_AGENT_AUTH_REQUIRED=true, but the v1 flow routers use_require_rolesinstead and bypass that path entirely.If a caller can reach a v1 endpoint, they can pick any allowed role with just
X-Agent-Role: developer(or any other allowed value) and pass the guard without proving they hold the matching token.What changed
_require_rolesnow calls the same_check_agent_auth_tokenprimitive thatget_agent_contextuses, before comparing the role to the allowed set:ROBOCO_AGENT_AUTH_REQUIRED=true+ missing token → 401This keeps the HMAC contract centralized in
roboco.api.deps(the model's recommendation and is the minimum change that closes the gap. The import is deferred inside the closure to avoid pulling the full deps module at router-registration time.Tests
How it was found
AntFleet's two-model security review (Claude Opus 4.7 + GPT-5) on a mirror of this repo. Both models independently flagged the same defect.
EOF
)