feat(auth): oauth install/callback handshake + b24oauth factory (PR-2c)#216
Merged
Conversation
§11 of docs/OAUTH-DESIGN.md mandates this commit lands FIRST inside
PR-2c — before any oauth.* handler emits a structured log line.
Reversing the order means a reviewer can be asked to accept a
`logger.info('oauth.callback.ok', { url })` call while the redactor
still passes `?code=…` through unchanged. The ordering removes that
class of mistake by making the redactor load-bearing at the first
caller.
What this commit adds (and nothing else):
- `OAUTH_URL_RE` — catches `?code=…`, `?refresh_token=…`,
`?access_token=…`, `?client_secret=…` in URL query-string position.
Preserves the key name (operator forensics) and replaces only the
value with `<REDACTED>`.
- `OAUTH_JSON_RE` — catches `"access_token":"…"`, `"refresh_token":"…"`,
`"client_secret":"…"` in JSON-stringified payloads. Closes the
fixture-shape-4 leak in §11 (`logger.error('…', { body:
JSON.stringify(res) })`). The OAuth authorization `code` is
intentionally OMITTED from the JSON regex — `"code"` is too common a
key name (HTTP status, error codes) and would over-redact; the
URL-shape `?code=` is the only realistic leak surface for that
parameter.
- `client_secret` added to `SENSITIVE_KEYS` so direct-key field access
(`{ client_id, client_secret }`) also redacts via the object-walker
path.
Coverage — 12 new fixtures in `tests/unit/utils/logger-redactor.oauth.test.ts`:
- Fixture shape 1: `{ url: '…?code=…' }`.
- Fixture shape 2: `{ redirectUrl: '…?refresh_token=…' }`.
- Fixture shape 3: template literal carrying `err.message` with URL.
- Fixture shape 4: `{ body: JSON.stringify(response) }`.
- Plus: defence-in-depth `?access_token=` strays, pretty-printed JSON
(whitespace between `:` and value), the "no over-redaction" pin for
generic `{"code": "…"}` shapes, and the end-to-end test that drives
both shapes through `makeRedactingLogger` in one call.
Zero OAuth-logging callers introduced in this commit. Tests pass
against the existing webhook-only suite (566 total, +12 new), and the
extension is byte-identical for webhook-only forks — `WEBHOOK_URL_RE`
is unchanged and runs first.
) Round-3 review on PR-2d flagged `requestId?: string` as forward-compat that invited "forgot-to-set" bugs — issue #214's resolution was to add a strict accessor that throws loudly the moment the middleware fails to populate the field, instead of returning `undefined` and letting downstream `jq` queries silently miss correlation ids. The accessor: ```ts export function getRequestId(): string { const ctx = tenantContext.getStore() if (!ctx?.requestId) { throw new Error( 'getRequestId() called outside a runWithTenant scope, or runWithTenant was ' + 'called without a requestId. The MCP middleware (PR-2c) must wrap every ' + 'request with `runWithTenant({memberId, userId, requestId}, …)`. ' + 'See docs/OAUTH-DESIGN.md §11 and issue #214.', ) } return ctx.requestId } ``` Two failure modes covered by one throw — both are "middleware not wired" bugs, neither is a runtime branch: 1. `getRequestId()` called outside any `runWithTenant` (no ALS scope). 2. `runWithTenant` was called with just `{memberId, userId}` (a partial context that PR-2a's tests still use; production middleware in PR-2c sets the field unconditionally). The PR-2d-introduced `requestId?: string` optional field stays as-is in the interface — tests that construct partial contexts keep compiling. Production code reads via `getRequestId()`, not via the raw field, so a missing value becomes a loud error at the first logger.info call (the helper is invoked call. Tests: +4 cases in `tests/unit/utils/request-context.test.ts`: - Happy path returns the field value. - Throws when called with no `runWithTenant` wrap. - Throws when wrap omits `requestId`. - Error message contains the operator-actionable hints (`MCP middleware (PR-2c)`, `§11`, `#214`) so a stack-trace-only reader can trace back to the design doc + issue. Zero callers introduced in this commit — `oauth.*` and `mcp.auth.*` emitters that use the helper land in subsequent PR-2c commits. Refs: #214.
Marketplace-app install entry point — operator pastes
`https://<mcp-host>/api/oauth/install?portal=<portal>` into their
README; each user who clicks lands here, gets a state nonce minted +
CSRF cookie set, and is 302-redirected to their portal's authorize
page. After the user approves, Bitrix24 redirects back to /callback
(PR-2c step 5).
Gates (all fail loud with errorCode in `data`):
- 503 FLAG-OFF — NUXT_BITRIX24_OAUTH_ENABLED=false. The install
link should not be reachable on a webhook-only
deploy or after a rollback (§10).
- 503 NOT-CONFIGURED — flag is on but CLIENT_ID or REDIRECT_URL is
empty (operator partially-configured). Names the
missing variable in the structured log line.
- 400 PORTAL-FORMAT — `?portal=` failed the allow-list regex from §8
#3: `^[a-z0-9-]+\.bitrix24\.(com|ru|eu|de|by|kz|ua)$`.
Defends against open-redirector abuse (any
host smuggled in via `?portal=` would otherwise
end up in the 302 Location).
Happy path:
- Mints two independent 32-byte hex nonces — `state` (persisted in
oauth_state with 5-min TTL) and `csrfCookie` (set as a first-party
cookie). The cookie/state pair is what /callback will verify.
- Cookie attributes: `HttpOnly; Secure; SameSite=Lax; Path=/api/oauth/`.
Lax allows the top-level navigation back from Bitrix24's authorize
page; Path-scoping keeps the cookie out of `/mcp` and tool routes.
- 302 to `https://<portal>/oauth/authorize/` with `client_id`, `state`,
`redirect_uri`, `scope`, `response_type=code`.
Logging (§11 taxonomy):
- `oauth.install.start` (INFO, on entry — portal + clientId visible)
- `oauth.install.deny.flag-off` (WARN)
- `oauth.install.deny.not-configured` (ERROR — operator misconfig)
- `oauth.install.deny.portal-format` (WARN — caller error / probe)
- `oauth.install.ok` (INFO — `statePrefix` is the first 8 hex
chars of the nonce only; the full value is a
secret bound to the CSRF check, per §11
debug-trace policy)
The structured logger goes through the redactor extended in step 1, so
any future code path that accidentally feeds an OAuth URL into the
context (e.g. via a wrapped error) gets scrubbed before it reaches the
sink.
Tests (`tests/unit/api/oauth/install.test.ts`, 23 cases):
- Each gate path → expected statusCode + errorCode in body.
- 7 happy-portal hostnames accepted; 7 invalid (empty, non-bitrix24,
javascript scheme, unlisted TLD `bitrix24.us`, no subdomain,
path/port injection) → 400.
- 302 to authorize URL with all expected query params.
- oauth_state row persisted with portal + clientId + 5-min TTL.
- Cookie set with the documented attributes + Max-Age=300.
- Cookie value MATCHES the persisted csrfCookie (the pairing the
callback will verify — pinned so a future split doesn't break it).
- Full state nonce NEVER appears in any logged context — only the
8-char `statePrefix`.
- Two sequential calls produce distinct state + cookie nonces (no
reuse).
The tests drive a real h3 `createApp` + `toNodeListener` so `setCookie`,
`sendRedirect`, and `createError` go through the actual h3 code paths
— no shortcuts.
No other code paths touch this route yet. Callback handler lands in
the next commit; the Bearer middleware that consumes `mcp_tokens` lands
after that.
…step 4-5) Closes #211 in spirit (the scheduler the §11 contract demanded for the oauth_state table); the health endpoint per §11 lands together so the operator-tier observability surface arrives in one piece. ## /api/oauth/_health (step 4 of PR-2c) Operator-tier counts endpoint — no PII, no tokens, no portal hosts. Returns: { enabled: true, dbPath: '/data/oauth.sqlite', tenants: 12, // count of oauth_tokens rows bearers: 47, // count of active mcp_tokens (revoked_at IS NULL) pendingStates: 3, // count of oauth_state rows with expires_at > now lastRefreshOk: null, // stub — PR-2c step 6 (B24OAuth factory) populates lastRefreshFail: null, // ditto } Three new prepared statements on TokenStore + a `getHealthCounts()` verb. All three queries run in one synchronous bundle (`better-sqlite3` is sync) so the endpoint is a single round-trip. Authentication — fails closed (§11 mandate). Two acceptable patterns: 1. Network isolation: request from 127.0.0.1 / ::1 (nginx allow/deny upstream). No application-level token; ops infra owns access. The recommended pattern for the reference docker-compose setup. 2. Dedicated `NUXT_BITRIX24_OAUTH_ADMIN_TOKEN` env var. Constant-time compared against `Authorization: Bearer`. Use this if network isolation is infeasible (shared single-host setups). Failure modes (all loud, with errorCode in the JSON body): - 503 FLAG-OFF — NUXT_BITRIX24_OAUTH_ENABLED=false. - 503 NOT-CONFIGURED — neither admin token nor localhost. Default config ships closed. - 401 ADMIN-TOKEN-MISSING — token configured, no Bearer header. - 401 ADMIN-TOKEN-INVALID — Bearer present but wrong. NEVER falls back to `NUXT_MCP_AUTH_TOKEN` (the agent's token) — pinned by a dedicated test. Privilege separation: a compromised agent (prompt injection, jailbroken DXT) must not read fleet-level counts. Once the operator opts in to token-based auth, the route is uniformly token-gated even from localhost — no implicit bypass. A dev box must include the Bearer too. ## pruneExpiredStates scheduler (step 5 of PR-2c, issue #211) `oauth_state` rows live 5 minutes (the install→callback TTL). Rows that never get consumed (browser closed, link not clicked, `/install` spam) accumulate without cleanup. The schema-bootstrap plugin (`server/plugins/oauth-schema.ts`) now arms a 5-minute setInterval: - First tick at +5 min, NOT on boot. - Logs `oauth.state.prune.ok` ONLY when rows > 0 (no log spam on empty buckets). - Logs `oauth.state.prune.fail` on transient SQLite errors (disk full, lock contention) and CONTINUES — next tick retries. - Cleared via Nitro's `close` hook so SIGTERM / dev hot-reload don't leave a zombie timer. - `timer.unref()` so a hanging timer doesn't keep the process alive if the close hook didn't fire (defensive). Flag-off path: zero behaviour change. No timer armed, no close hook registered, no `useTokenStore()` call. `PRUNE_INTERVAL_MS` exported as `@internal` for testability. ## .env.example + nuxt.config.ts Added `bitrix24OauthAdminToken: ''` to runtimeConfig (NUXT_BITRIX24_OAUTH_ADMIN_TOKEN). Documented in the JSDoc that this token MUST NOT fall back to mcpAuthToken — privilege levels differ. ## Tests - `tests/unit/api/oauth/_health.test.ts` (14 cases): each gating branch + counts shape + dbPath derivation + log-payload assertion + the MCP_AUTH_TOKEN privilege-separation invariant + the localhost+token "token wins" behaviour. - `tests/unit/utils/oauth-schema.test.ts` (+5 cases on top of the existing 3): timer arms at +5 min, logs on >0 prunes, silent on 0 prunes, survives a prune throw, close-hook clears the timer, flag-off arms NOTHING. Total: 612 passed, 1 skipped, 2 todo. Typecheck + eslint clean. Refs: #211, §11.
The user-visible half of the install→callback handshake. Bitrix24
redirects the user here after the authorize page; this handler verifies
the state nonce, exchanges the authorization code for tokens, persists
the tenant + mints a Bearer, and renders an HTML page that shows the
Bearer ONCE with paste instructions.
Verification chain (each failure → distinct errorCode + 4xx):
- PARAMS-MISSING — code or state absent from the URL.
- STATE-MISSING — state not in oauth_state OR expired.
Single atomic `DELETE … RETURNING`
(PR-2b) — re-using a state is a hard
no.
- STATE-COOKIE-MISMATCH — CSRF cookie doesn't match the persisted
csrfCookie (§8 #2).
- STATE-PORTAL-MISMATCH — Bitrix24's callback `?domain=` disagrees
with the install portal (replay across
portals defence).
- STATE-CLIENT-MISMATCH — clientId rotated between install +
callback (config drift defence).
Token exchange (POST to `https://oauth.bitrix24.tech/oauth/token/`):
- EXCHANGE-NETWORK (502) — fetch rejected (DNS / ECONNREFUSED).
- EXCHANGE-NON-JSON (502) — response body wasn't JSON (upstream
maintenance page).
- EXCHANGE-FAIL (502) — Bitrix24 returned `{ error: ... }` or
non-2xx status. `invalid_grant` is the
canonical "code reused / expired" path.
- EXCHANGE-BAD-USER-ID (502) — Bitrix24 returned a non-numeric user_id
(defence in depth).
The `client_secret` is sent in the POST body (form-urlencoded) so it
never appears in any URL-shaped log line even by accident. The redactor
extended in step 1 catches OAuth secrets in BOTH URL-query and
JSON-body shapes — so even an `err.message` carrying a fetched URL or
a logged response body gets scrubbed.
Happy path:
1. `upsertTokens` writes oauth_tokens (audit-first via the PR-2b
invariant — recordAuditEvent BEFORE the SQLite write).
2. `createMcpToken` mints a Bearer for the (member_id, user_id) pair.
The persisted portal hostname is used as the default label so the
PR-2c-pending "list my Bearers" operator tool (issue #212) can
identify it without the user naming it.
3. The CSRF cookie is cleared (`deleteCookie` emits `Max-Age=0`).
4. Response headers: `Cache-Control: no-store, no-cache`, `Pragma:
no-cache`, `Content-Type: text/html; charset=utf-8`. The Bearer is
embedded in a `<pre>` block, never in a URL.
5. HTML page is intentionally JS-free (no external script-src) — the
operator pastes the token manually.
Logging (§11 taxonomy):
- oauth.callback.start (INFO — statePrefix 8 chars only)
- oauth.callback.deny.<reason> (WARN/ERROR)
- oauth.callback.exchange.fail (ERROR — Bitrix24 error
code, NOT error_description
or raw URL/body)
- oauth.callback.exchange.ok (INFO — memberId, userId,
bearerHashPrefix 'sha256-'+8 hex,
portal; NEVER the raw Bearer)
Tests (`tests/unit/api/oauth/callback.test.ts`, 16 cases):
- Each verification branch with a distinct seed scenario.
- Each exchange failure mode with a corresponding `fetchMock` shape
(network reject, invalid_grant, 5xx, non-JSON, bad-user-id).
- State is consumed atomically — second callback with same state →
STATE-MISSING.
- Happy path: oauth_tokens row landed with correct fields, mcp_tokens
Bearer minted, the round-trip `hashBearer(raw) === findByBearerHash`
invariant holds.
- The raw Bearer NEVER appears in any logged context (full set scan).
- The POST body shape is pinned: grant_type, client_id, client_secret,
code, redirect_uri.
Fetch is mocked via `vi.stubGlobal('fetch', vi.fn())` instead of `msw`
— one fetch site, no need for the dev-dependency surface.
Total: 628 passed (+16 callback). Typecheck + eslint clean.
Refs: §3, §8, §11.
`useBitrix24OAuth(memberId, userId): B24OAuth` is the per-tenant
client factory `docs/OAUTH-DESIGN.md §5 + §7` promised. Used by
`useBitrix24Tenant()` when `NUXT_BITRIX24_OAUTH_ENABLED=true` and a
tenant context is bound; tools never call it directly.
## Caching strategy
- In-memory LRU keyed on `${memberId}:${userId}`, capped at 100
entries (§5 sizing — covers the realistic per-process working set).
Map-based with promote-on-access (delete + re-set preserves
insertion order which IS the LRU order).
- Synchronous construction. JavaScript's single-threaded event loop
means two callers cannot be mid-init for the same key
simultaneously, so no mutex is needed; the first caller's `lruSet`
settles before the next caller's `lruGet` runs. The §5 "per-tenant
mutex" narrative applies only to the REFRESH path inside the SDK
(`setCustomRefreshAuth` is async-callable by the SDK), and that's
handled by the callback below.
- Cache eviction caveat (§5 known edge case): if an entry is evicted
while a refresh is in flight inside the SDK, a concurrent call after
eviction creates a second instance, both refresh attempts write the
same new tokens (idempotent). Eviction is logged so the LRU can be
sized correctly.
Keeping the factory sync also keeps `useBitrix24Tenant()` sync —
which avoids touching the 20+ tool call sites that PR-2d migrated.
The async-ness in §7's draft signature was design-doc-driven but
isn't actually needed: B24OAuth construction is sync, and refresh is
already async via the SDK's setCustomRefreshAuth contract.
## Refresh handling — `setCustomRefreshAuth`
Chose `setCustomRefreshAuth` over `setCallbackRefreshAuth` so the HTTP
exchange happens on OUR fetch (testable via `vi.stubGlobal('fetch', ...)`),
and the error classification is direct:
- `invalid_grant` → `markRefreshFailed(memberId, userId)` stamps every
active Bearer `revoked_at`. Evicts the cached instance so a
re-authorise lands a fresh one. The user re-authorises via /install.
- Network error / 5xx → log + re-throw WITHOUT `markRefreshFailed`.
Refresh failure may be transient (DNS, brief outage); Bearers stay
alive.
Process-local `refreshStatus` tracker captures `lastRefreshOk` /
`lastRefreshFail` timestamps. Exported as `_readRefreshStatus()` for
the `/api/oauth/_health` endpoint to populate the corresponding fields
(currently stubbed `null` — wired into _health in the next commit).
## Logging (§11 taxonomy)
- `oauth.refresh.start` (INFO)
- `oauth.refresh.ok` (INFO — new accessExpiresAt)
- `oauth.refresh.fail.invalid-grant` (ERROR — refresh-token death)
- `oauth.refresh.fail.transient` (ERROR — network / 5xx / non-JSON)
- `oauth.factory.lru.evicted` (DEBUG — eviction signal so the
operator can size the LRU)
The raw `refresh_token` / `access_token` / `client_secret` NEVER appear
in any logged context — verified by an end-to-end log-scan test.
## Tests (tests/unit/utils/bitrix24-oauth.test.ts, 11 cases)
Caching (6):
- Returns a `B24OAuth` instance for an existing tenant.
- Two sequential calls → same instance (identity).
- N=10 concurrent calls → all share one instance (sync construction
in JS's single-threaded loop = no race window).
- Different tenants → different instances.
- Missing tenant row → throws `oauth_tokens row missing`.
- Missing CLIENT_ID → throws.
Refresh (4):
- HTTP POST hits `oauth.bitrix24.tech/oauth/token/` with correct
`grant_type=refresh_token` body. New tokens persisted via
`upsertTokens('refresh')`. `lastRefreshOk` non-null. `oauth.refresh.ok`
logged.
- `invalid_grant` → `markRefreshFailed` evicts active Bearers, cache
evicted (subsequent call gets a fresh instance), `lastRefreshFail`
non-null, `oauth.refresh.fail.invalid-grant` logged.
- 5xx → re-throws WITHOUT touching Bearers (transient class),
`oauth.refresh.fail.transient` logged.
- Log-scan invariant: raw `refresh_token`, `access_token`, `client_secret`
values NEVER appear in any logged context.
Status tracker (1):
- `_readRefreshStatus()` starts both fields at `null` on a fresh
process.
The test sniffer pattern (`Object.defineProperty` on
`B24OAuth.prototype.setCustomRefreshAuth`) captures the factory's
refresh callback so it can be invoked directly — equivalent to "the
SDK detected expiry and called refresh" without round-tripping through
the SDK's internal expiry check.
Refs: §5, §7, §11.
61cc972 to
0cca4d4
Compare
…ep 8) `useBitrix24Tenant()`'s OAuth-on branch previously threw "OAuth path not yet implemented (lands in PR-2c)" with the tenant ids on the operator-side logger. This commit replaces the throw with the real `useBitrix24OAuth(tenant.memberId, tenant.userId)` call — the per-tenant B24OAuth factory from step 7. Tools that have been routing through the dispatcher since PR-2d (#213) now resolve to per-tenant clients when the flag is on, without a single tool-side edit. Wired changes: - `server/utils/bitrix24-tenant.ts`: - Import `useBitrix24OAuth` from `~/server/utils/bitrix24-oauth`. - OAuth-on + tenant-bound path now `return useBitrix24OAuth(memberId, userIdNum)`. - `tenant.userId` is a string in `TenantContext` (matches audit-log shape); the factory expects a number. Defensive `Number.parseInt` with a `Number.isFinite` guard that throws + logs `oauth.tenant.dispatch.bad-user-id` if a non-numeric value slips through. In practice the Bearer middleware will only construct `TenantContext` from a verified `findByBearerHash` result whose `user_id` column is INTEGER NOT NULL, so this is belt-and-braces. - JSDoc rewritten — the "not yet implemented" caveat is gone; the behaviour description now matches the wired-up reality. - The no-tenant-scope branch also logs via the structured logger now (`oauth.tenant.dispatch.no-tenant-scope`) for consistency with the bad-user-id branch. - `server/api/oauth/_health.get.ts`: - Import `_readRefreshStatus` from `~/server/utils/bitrix24-oauth`. - `lastRefreshOk` / `lastRefreshFail` now reflect the factory's process-local tracker (both `null` on a fresh process, correctly signalling "no refresh has been attempted yet"). The log payload includes the refresh status fields alongside counts so a single `oauth.health.ok` log line carries the full health snapshot. Tests: - `tests/unit/utils/bitrix24-tenant.test.ts`: - Mock `~/server/utils/bitrix24-oauth` with a sentinel client so the dispatcher's tenant-to-client resolution is controllable without seeding a real SQLite store from inside the dispatcher's unit test. - Replaced "throws not yet implemented" tests with the corresponding wired-up assertions: factory called with correct args, dispatcher returns the factory's result. - Added a non-numeric-userId guard test. - The N=10 concurrent test now pins the bijection on factory call arguments instead of error-message payloads (cross-tenant leak guard survives at the new layer). - `tests/unit/tools/users/current-user.test.ts`: - The "tenant-scope-bound branch" test now asserts on the factory's `row missing` error (the realistic failure with no oauth_tokens seed) rather than the now-removed "not yet implemented" string. Total: 640 passed (+1 dispatcher test rename, -1 factory wiring replaces the stub). Refs: §7, §11, #214.
Three CI regressions on the previous push, all fixed: 1) **Commit messages** check ran on the PR title (not the commits) and flagged 'OAuth' as pascal-case. PR title updated to 'oauth install/ callback handshake + §11 observability surface (PR-2c, WIP draft)' via the GitHub MCP tool — separately from this commit, since the title isn't a repo artefact. This commit doesn't carry the title fix; mentioned for traceability. 2) **Unit tests** — `Cannot find package 'h3'` in the three new `tests/unit/api/oauth/*.test.ts` files. h3 is a transitive Nuxt dependency, accessible during local dev (pnpm hoists it), but CI's stricter ESM resolution refuses to load a direct import that isn't declared in package.json. Added `h3` to devDependencies pinning to the version already in node_modules (1.15.11). The tests use h3's `createApp` + `toNodeListener` to drive routes through the real h3 framework end-to-end (cookies, redirects, headers); mocking h3's internals would be brittle. 3) **Unit tests** — `EACCES: permission denied, mkdir '/data'` in `tests/unit/tools/users/current-user.test.ts`. The flag-on guard test runs the dispatcher → factory → `useTokenStore()` chain, and the lazy `useTokenStore()` open tries to mkdir `/data` on the CI runner. The factory's behaviour is already exhaustively covered by `tests/unit/utils/bitrix24-oauth.test.ts`; the catalogue test only needs to assert that the catalogue surfaces the factory's loud failure. Mocked `~/server/utils/bitrix24-oauth.useBitrix24OAuth` to throw a stable `oauth_tokens row missing` error so the test asserts on shape without touching the filesystem. Local validation: - `npx vitest run` — 640 passed, 1 skipped, 2 todo. - `npx nuxt typecheck` — clean. - `npx eslint .` — clean. No production code changed; only test isolation + devDep declaration.
After the previous CI-fix push, `tests/unit/api/oauth/callback.test.ts`
still failed with EACCES on `mkdir /data` from `audit-log.ts:248`. The
callback handler calls `store.upsertTokens()` and `store.createMcpToken()`,
both of which trigger `recordAuditEvent` (the PR-2b audit-first
invariant). The real audit-log lazily creates
`${NUXT_AUDIT_DIR ?? /data/audit}/` on first write — fails on a CI
runner without /data write permission, and the 500 from the handler
cascades into three assertion failures on the happy-path tests.
Same hazard would apply to `_health.test.ts` once the "counts" test
runs against an environment without /data; defensive-mock there too.
The audit-first invariant is already exhaustively tested in
`token-store.test.ts` via the same `vi.hoisted` + `vi.mock` pattern —
copying that pattern into the two route tests gives uniform isolation
without changing what the route tests actually assert (statusCode,
headers, body, persisted DB state).
`install.test.ts` doesn't trigger audit (only `createState` which is
not audit-first), so it stays unchanged.
Local: 640 passed, typecheck + eslint clean.
…ow-up) After the previous CI-fix push (403cac9), `tests/unit/utils/bitrix24-oauth.test.ts` was still leaving 7 unhandled rejections from `audit-log.ts:248` (EACCES on `mkdir /data`). The factory tests call `store.upsertTokens(SAMPLE_TENANT, 'install')` in setup + `createMcpToken` in body; both trigger `recordAuditEvent` via the PR-2b audit-first invariant. The audit-log lazily writes to `${NUXT_AUDIT_DIR ?? /data/audit}/`, which fails with EACCES on CI runners without /data write permission. Same `vi.hoisted` + `vi.mock('~/server/utils/audit-log', ...)` pattern as `token-store.test.ts`, `callback.test.ts`, `_health.test.ts`. The audit- first invariant itself is exhaustively tested in `token-store.test.ts`; the factory tests only need the DB-side effects. Last test file in this PR that exercises real `recordAuditEvent` — every other OAuth-related test either mocks audit-log directly or mocks the factory/token-store layer above it. Local: 640 passed, typecheck + eslint clean.
Two single-file run-it-and-paste-the-output scripts for the operator
manually verifying the new OAuth install/callback/health surface
without needing a real Bitrix24 portal:
- scripts/manual-qa-pr2c.sh (bash, Linux/macOS/WSL/Git-Bash)
- scripts/manual-qa-pr2c.ps1 (PowerShell, Windows native)
Both detect which scenario the running server is in by probing
/api/oauth/install with no params, then run the appropriate assertion
set:
- Scenario A (flag off, default): /api/health still 200; OAuth
endpoints refuse with errorCode FLAG-OFF. This is the
pre-deployment smoke — proves the OAuth diff is byte-identical
to webhook-only behaviour.
- Scenario B (flag on, configured): portal allow-list rejects
junk; happy install → 302 + CSRF cookie with HttpOnly +
SameSite=Lax; callback gates work (PARAMS-MISSING, STATE-MISSING);
/api/oauth/_health gating (200 from localhost, 401 with token).
- Scenario C (flag on, missing CLIENT_ID/REDIRECT_URL): the script
detects + names the missing config so the operator knows where
to look.
Real Bitrix24 round-trip (install → authorize → callback with a real
code) is OUT of scope for the scripts — that needs a registered
Marketplace app + a real user. The scripts pin the gates that DON'T
need it, which catches every regression class except "the actual
token-exchange POST fails", and that is covered by the
`tests/unit/api/oauth/callback.test.ts` fetch-mock suite.
Exit code 0 if all assertions pass, 1 if any failed, 2 if the script
couldn't reach the server.
Usage:
pnpm dev # or: docker compose up
./scripts/manual-qa-pr2c.sh
# or on Windows:
.\scripts\manual-qa-pr2c.ps1
ShellCheck CI flagged `exit $([ $FAIL -eq 0 ] && echo 0 || echo 1)` at line 180 (SC2046: command substitution without quotes is word-split). Replaced the cleverness with a plain if/else — clearer and lint-clean: if [ "$FAIL" -eq 0 ]; then exit 0; else exit 1; fi `bash -n` passes; the other command substitutions in the file are all assigned to variables (not word-split), so they're safe.
6 tasks
Five Sonnet agents (docs/programmer/tester/security/CTO) reviewed the install/callback/factory surface. Consolidated fixes, all in this PR except the Bearer-middleware tracking issue (#217, agreed split). SECURITY (Security #1, 🔴) — X-Forwarded-For localhost spoof on _health: `getRequestIP(event, { xForwardedFor: true })` trusted a client- supplied header, so `curl -H 'X-Forwarded-For: 127.0.0.1'` from the internet passed the localhost gate and read fleet counts. Switched to the raw socket IP (`getRequestIP(event)`). New test proves the spoof now gets 503. SECURITY/Programmer (🔴) — _health leaked dbPath: Removed `dbPath` from the response body — a filesystem path is infra topology that aids a post-auth attacker / a misconfigured-nginx exposure. Added `processStartedAt` instead so a dashboard can tell "lastRefreshOk null because just restarted" from "null because no refresh ever ran" (CTO #3). Docs/Programmer (🔴) — STATE-EXPIRED vs STATE-MISSING: `consumeState` folded expired + never-existed into one `undefined`, so `oauth.callback.deny.state-expired` was unemittable and a slow user looked identical to a probe in the logs. `consumeState` now returns the row regardless of expiry (still deletes it — replay protection); the callback checks `expiresAt` and emits STATE-EXPIRED (INFO) vs STATE-MISSING (WARN) with distinct errorCodes. Docs (🔴) — §11 taxonomy didn't match the code: Rewrote the §11 event taxonomy to match every emitted event (install/callback/health/refresh/dispatch), removed the phantom `portal-host` (the one regex covers format + TLD), added the `mcp.auth.deny.*` block as explicitly "(deferred)". §10 rollout table updated: PR-2c is #216, Bearer middleware split to #217. Docs (🔴) — NUXT_BITRIX24_OAUTH_ADMIN_TOKEN undocumented: Added it to `.env.example` with the privilege-separation rationale + the multi-instance scaling caveat (CTO #5). Without this an operator hit a silent 503 with no idea which variable to set. Security (🟡) — CSRF cookie compare: `!==` → constant-time `timingSafeEqual` (matches the _health admin token compare). Security (🟡) — member_id validation: Bitrix24's `member_id` is now regex-validated (`^[a-zA-Z0-9._:-]{1,128}$`) before it becomes a SQLite PK + log field — a compromised upstream can't smuggle a path-traversal / over-long string. New EXCHANGE-BAD-MEMBER-ID errorCode + test. Security (🟡) — domain binding optional: When Bitrix24 omits `?domain=`, the portal binding can't be checked; now logs `oauth.callback.domain-absent` (WARN) so an operator can spot a flow missing the binding. The other three §8 checks still hold. Programmer (🟡) — callback HTTP semantics: Bitrix24 5xx → 503 (retryable upstream outage), 4xx/explicit error → 502. Was uniformly 502. New tests for both. Programmer (🟡) — OAUTH_JSON_RE under-redaction: `[^"\\]+` stopped at a stray backslash and leaked the token tail. Changed to `[^"]+` (OAuth tokens never contain a literal quote). Corrected the misleading comment. New fixture test. Tester (🔴) — LRU eviction untested: Added `_setLruMaxForTests` hook + two tests: eviction-at-capacity logs `oauth.factory.lru.evicted`, and cache-hit promotes to MRU (true LRU order). Tester (🟡) — audit-first at the handler boundary: New test: a failing `recordAuditEvent` aborts the callback BEFORE a Bearer lands (the end-to-end audit-first contract, previously only pinned at the token-store layer). Tester (🟡) — privilege-separation completeness: New test: even with mcpAuthToken set to a value, the admin gate checks the admin-token FIELD — a bare string match can't sneak the agent token through. Programmer (🟢) — getRequestId @throws JSDoc; LRU `cache.has` no-op removed; comment cleanups. Tests: 648 → 660+ (12 new), typecheck + eslint clean. Refs: #217 (Bearer middleware follow-up).
The "Manual-QA scaffold sync" CI gate (SKILL.md Ground Rule 5) requires that any `.env.example` change is mirrored in `skills/run-manual-qa/references/issue-scaffold.md` in the same PR. The round-3 fix added `NUXT_BITRIX24_OAUTH_ADMIN_TOKEN` to `.env.example` but missed the scaffold — this commit closes that. Also refreshes the OAuth-scaffolding paragraph to reflect reality: install/callback + the B24OAuth factory have landed, but the `/mcp` Bearer middleware is still pending (#217), so a minted Bearer isn't accepted by `/mcp` yet. Last-reviewed stamp bumped to 2026-06-05.
Final review round (5 Sonnet agents). CTO verdict: ship after these.
All fixes in this PR; no new deferrals.
Docs (🔴) — §11 drift from code:
- The _health response example still showed `dbPath` (removed in
round-3); replaced with `processStartedAt`, added a "no dbPath"
note.
- The errorCode rule ("suffix after the last dot") didn't cover the
compound `EXCHANGE-*` codes on `oauth.callback.exchange.fail`;
documented the exception explicitly.
- Three references to `NUXT_OAUTH_ADMIN_TOKEN` were missing the
`BITRIX24` segment — corrected to `NUXT_BITRIX24_OAUTH_ADMIN_TOKEN`
(an operator copying the typo would have hit a silent 503).
- Added the `oauth.callback.state-row-corrupt` event to the taxonomy.
Security (🟡) — _health loopback range too narrow:
`LOCALHOST_IPS` was a 3-element Set; a legitimate orchestrator probe
from `127.0.0.2` (the whole 127.0.0.0/8 is loopback per RFC 5735) got
503. Replaced with an `isLoopback()` range check. New test.
Security (🟡) — XSS defence-in-depth on the error page:
`callbackErrorPage` escaped `detail` but not `errorCode`. Both are
code literals today, but a future refactor passing a Bitrix24-
controlled `exchange.error` into that slot would reflect unescaped.
Extracted a shared `htmlEscape()` and applied it to both slots (and
reused it in `bearerSuccessPage`).
Security (🟢→doc) — X-Forwarded-For deployment caveat:
The raw-socket-IP fix is complete for the `node-server` preset, but
edge presets (Cloudflare/Vercel) populate `event.context.clientAddress`
from forwarded headers. §11 now documents that forks on an edge preset
MUST use admin-token mode, not localhost isolation.
Programmer (🟡) — corrupt-DB CSRF guard:
`timingSafeEqual('', '')` returns true. If a corrupt state row had an
empty `csrf_cookie`, a request with NO cookie would pass. Added an
explicit non-empty guard → 500 STATE-ROW-CORRUPT before the compare.
Programmer (🟡) — STATE-EXPIRED boundary:
Documented the intentional one-second skew between the callback's
`< now` accept and `_health`'s `> now` pending-count (cosmetic against
the 5-min TTL).
Tester (🔴) — audit-first handler test strengthened:
Now asserts the exact 500 status AND counts `mcp_tokens` rows == 0
(proves NO Bearer landed, not just no oauth_tokens row).
Tester (🟡) — X-Forwarded-For test now distinguishes mechanisms:
Added a case: remote IP + spoofed XFF=127.0.0.1 + admin token set →
still 401 (the spoof didn't grant a localhost bypass), which the
503-no-config path alone didn't prove.
Tester (🟡) — processStartedAt asserts a sane recent value (catches a
ms-vs-seconds regression), not just `typeof === 'number'`.
Tester (🟢) — domain-absent branch now tested: a callback with no
`?domain=` succeeds (other 3 §8 bindings hold) and logs
`oauth.callback.domain-absent`.
PR body Changelog framing expanded to list the round-3/4 security
fixes (X-Forwarded-For, dbPath removal, member_id validation, CSRF
constant-time, HTML escaping) — these are operator-relevant history.
Tests: 652 passed (+7), typecheck + eslint clean.
Refs: #217 (Bearer middleware follow-up).
This was referenced Jun 5, 2026
IgorShevchik
pushed a commit
that referenced
this pull request
Jun 5, 2026
Five Sonnet agents reviewed the Bearer middleware PR. CTO verdict was
"ship", all critical findings in-PR. Programmer/Security/Tester each
caught 2-3 actionable items.
Docs (stale text now that the middleware is live):
- §11 dropped "(once the Bearer middleware lands)" — the WWW-Authenticate
header is set NOW, not "future". Added an explicit example of the
header shape.
- §10 step 5: removed "callback HTML page warns the user about this"
(the warning was deleted earlier in this PR) — now reads as historical
context ("between #216 and #217 the page warned about Y").
- §7 token-store inventory updated: inspectBearer added next to
findByBearerHash with the rationale for the two-verb split.
- §6 mcp-auth.ts description rewritten to reflect the toolkit-vs-h3
middleware split + the three distinct errorCodes.
- §10 step 8 (PR-5) gained the operator-facing migration warning
requirement from the CTO review: "NUXT_MCP_AUTH_TOKEN is bypassed on
/mcp when OAUTH_ENABLED=true" must land in DEPLOYMENT.md before any
public OAuth announcement.
Tester (3 must-fix):
- ALS scope cleanup test: assert getTenantContext() is undefined
OUTSIDE the middleware wrap. Defends against a future als.enterWith
regression that would leak tenant context across requests.
- WWW-Authenticate full RFC 6750 §3 shape now pinned (Bearer
error="invalid_token" + errorCode + error_description) on
BEARER-REVOKED and BEARER-ORPHAN tests, not just the errorCode
substring.
- BEARER-REVOKED takes precedence over BEARER-ORPHAN test added —
catches a future refactor that swaps the two if-blocks.
- Concurrent N=2 cross-tenant ALS-isolation test added: two parallel
middleware invocations with different Bearers each see THEIR OWN
tenant inside next(). The middleware-layer counterpart of PR-2a's
#64 ALS spike.
Programmer:
- Replaced `denyBearer(...)` calls with `return denyBearer(...)` so
TypeScript narrows `token` to string without a non-null assertion.
The `token!` on the hash construction is gone.
- Added a one-line comment on the String(inspection.userId)
coercion documenting the PR-2a TenantContext.userId-is-string
contract.
Security:
- wwwAuthHeader now escapes backslash and quote in `code` /
`description` defensively (current callers pass literals, but a
future refactor threading a Bitrix24-controlled value into
`description` shouldn't be able to inject extra header attributes).
- server/middleware/mcp-auth.ts no longer yields unconditionally on
flag=true — defence-in-depth: it requires AT LEAST the
`Authorization: Bearer …` shape before yielding. Worst case if the
toolkit middleware is missing (HMR / module-init bug), /mcp still
returns 401, not an auth bypass.
Tests: 667 passed (+4 round-1 cases), typecheck + eslint clean.
Round 1 is the FINAL review round per CTO verdict.
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR-2c lands the user-visible half of the OAuth multi-tenant flow:
/api/oauth/install— portal allow-list, state nonce, CSRF cookie, redirect to Bitrix24 authorize./api/oauth/callback— state verification (4 §8 CSRF bindings), code-to-token exchange, Bearer mint, HTML page that shows the Bearer once./api/oauth/_health— operator-tier counts endpoint with fails-closed authentication (loopback127.0.0.0/8ORNUXT_BITRIX24_OAUTH_ADMIN_TOKEN, never the agent token).useBitrix24OAuth(memberId, userId)— per-tenantB24OAuthfactory with LRU(100) caching + refresh-on-expiry viasetCustomRefreshAuth+markRefreshFailedoninvalid_grant.useBitrix24Tenant()) now routes to the factory when the flag is on; tools that PR-2d migrated need ZERO further code changes.pruneExpiredStatesscheduler wired into the Nitro schema-bootstrap plugin (closes PR-2c: wire pruneExpiredStates scheduler (5-min interval) #211).OAUTH-DESIGN.md §11ordering invariant, so no OAuth log line can leak.getRequestId(): stringthrowing accessor (closes PR-2c: requestId — populate unconditionally + getRequestId() throwing helper #214) — forward-compat anchor for the Bearer middleware that completes the wire-up.Default-off-safe end-to-end: with
NUXT_BITRIX24_OAUTH_ENABLED=false(the production default) the new endpoints respond with503 FLAG-OFF, no/datadirectory is touched, the existing webhook flow is byte-identical.What's in
Logger-redactor OAuth extension →
getRequestId()accessor → install route →_health+ scheduler → callback route →B24OAuthfactory → dispatcher wiring → manual-QA scripts. Plus four review rounds of fixes (CI green-up, then security/correctness hardening). See the commit history; on squash-merge this collapses to one.652 unit tests passed (+113 vs main), typecheck + eslint + shellcheck + zizmor clean.
Manual QA — run these scripts before merging
Auto-detects the running scenario and prints PASS/FAIL per assertion:
NUXT_BITRIX24_OAUTH_ENABLED=false(default):/api/healthstill 200, OAuth endpoints refuse withFLAG-OFF. The merge-readiness smoke — proves byte-identical-to-webhook._health200 from localhost.CLIENT_ID/REDIRECT_URLmissing: names the missing variable.Full Bitrix24 round-trip (real Marketplace app + portal + user) is out of script scope — covered by
callback.test.tswhich mocks the token exchange viavi.stubGlobal('fetch', ...).What's NOT in (deferred, all tracked)
Bearer → findByBearerHash → runWithTenant) on/mcpdefineMcpHandler({ middleware })integration — its own review round. The last wire that makes a minted Bearer authenticate.vi.mock('~/server/utils/bitrix24-tenant')docs/DEPLOYMENT.md)Without the Bearer middleware,
/mcpstill usesNUXT_MCP_AUTH_TOKEN. An operator can flip the flag, complete/install → /callback, and get a Bearer — but/mcpdoesn't accept it until #217. The callback HTML page warns the user about this explicitly ("⚠ Not active yet").Refs / closes
main.Changelog framing (no BREAKING CHANGE)
/api/oauth/_healthfails closed without operator-tier auth, reads the raw socket IP (noX-Forwarded-Forspoof of the localhost gate), and never exposes the DB path; CSRF cookie compared in constant time; Bitrix24member_idregex-validated before it becomes a SQLite key; callback HTML escapes all interpolated values.scripts/manual-qa-pr2c.{sh,ps1}for operator pre-merge verification.