Skip to content

feat(sync): OAuth 2.0 device-code login for CLI and Docker users#347

Merged
dubadub merged 19 commits into
mainfrom
feat/device-code-auth
May 18, 2026
Merged

feat(sync): OAuth 2.0 device-code login for CLI and Docker users#347
dubadub merged 19 commits into
mainfrom
feat/device-code-auth

Conversation

@dubadub
Copy link
Copy Markdown
Member

@dubadub dubadub commented May 16, 2026

Summary

Fixes #290 — Docker users (and any headless deploy) could not log in to CookCloud because the legacy flow opened a browser and waited on a local TCP listener that Docker containers cannot reach.

This PR implements the client side of OAuth 2.0 Device Authorization Grant (RFC 8628) against the new cook.md endpoints (server-side PR: cook-md/cook-md#62).

Changes

  • cook login / cook logout CLI commands (gated on sync feature). cook login requests a device code, shows the verification URI + user code, optionally opens the URL via open, polls for the token, and persists the session — Ctrl-C cancels cleanly.
  • Web preferences UI no longer opens a browser popup. Clicking Login renders an in-page card with the user code, a Copy button, an Open-in-browser link to verification_uri_complete, and a Cancel button. Polling resumes if the user navigates away and returns.
  • Server handlers /api/sync/login, /api/sync/cancel_login, /api/sync/logout, and an enriched /api/sync/status that surfaces the pending flow. The old browser-popup handlers, local TCP listener, and ad-hoc callback parsing are gone (~150 LOC removed).
  • src/server/sync/device_flow.rs — new module implementing the RFC 8628 client (typed errors via thiserror, slow-down handling, expiry, cancellation via CancellationToken).

Design spec

docs/superpowers/specs/2026-05-15-device-code-auth-design.md

Known follow-ups (out of scope for this PR)

  • CSRF hardening on POST /api/sync/{login,cancel_login,logout} (legacy flow had same shape; tracked separately).
  • Retry transient 5xx/429 from /oauth/device/token (currently treated as BadResponse).
  • Atomicity of concurrent login starts in the web handler (rare; mitigated by Mutex on PendingDeviceFlow).

Test plan

  • cargo build --features sync — clean
  • cargo build --no-default-features --features self-update — clean
  • cargo fmt --check — clean
  • cargo clippy --features sync -- -D warnings — clean
  • cargo test --features sync — passes (help-output snapshot updated)
  • Manual: run cook server, click Login in preferences, complete approval in another tab, confirm session lands
  • Manual: run cook login against a local cook.md, complete approval, confirm ~/.config/cook/session.json written
  • Manual: cook login then Ctrl-C mid-poll → "Cancelled." and no session file
  • Manual: run cook server inside Docker, ensure login flow no longer needs a local listener

dubadub added 16 commits May 16, 2026 14:13
Adds spec for --pantry and --ignore-pantry flags on the shopping-list
command, mirroring the existing --aisle handling.
Replaces the browser-popup login (which fails in Docker / behind
reverse proxies) with the OAuth 2.0 Device Authorization Grant.
Fixes issue #290. Covers cook.md endpoints, CookCLI web UI changes,
and a new `cook login` / `cook logout` CLI.
Cross-repo plan covering cook.md endpoints (Phase 1, TDD with RSpec)
and CookCLI client + CLI commands (Phase 2). Matches the design at
docs/superpowers/specs/2026-05-16-device-code-auth-design.md.
Previously, if a user initiated /api/sync/login and then called
/api/sync/logout before completing browser approval, the spawned
polling task would continue and could write a session file for the
new authorization after logout.
- Hold pending_device_flow lock across request_device_code to close the
  TOCTOU race where two concurrent /api/sync/login calls could both pass
  the is_some check and orphan one device-code polling task.
- Drop unused device_code and interval fields from PendingDeviceFlow;
  they were captured separately into the spawn closure.
- Recompute countdown remaining from a fixed deadline each tick so the
  displayed time stays accurate in throttled / backgrounded tabs.
- Unify device_flow public API on DeviceFlowError (request_device_code
  no longer returns anyhow::Result).
@dubadub
Copy link
Copy Markdown
Member Author

dubadub commented May 17, 2026

Thanks for the review. Addressed in 844f8d7:

  • Create LICENSE #1 TOCTOU race: sync_login now holds the pending_device_flow mutex across the request_device_code await. Concurrent callers serialise — the second sees the slot occupied and returns 409.
  • demo server instance + server issue #2 Dead struct fields: dropped device_code and interval from PendingDeviceFlow; they were captured separately into the spawn closure.
  • Workflow to work with Javascript #6 Countdown drift: tick now recomputes remaining from a fixed deadline = Date.now() + expires_in*1000, so throttled/backgrounded tabs no longer drift.
  • Use gzip instead of brotli #7 Mixed error types: request_device_code now returns Result<_, DeviceFlowError> (added a BadResponse for non-2xx).

Not addressed:

  • HomeBrew integration #3 CSRF and Improve build documentation #4 rate-limit — pre-existing on the legacy flow and acknowledged in the PR body as follow-ups; cook.md's Rack::Attack already throttles /oauth/device/code and /oauth/device/token.
  • #5 unrelated planning docs — the actual diff in this PR is 14 files / +611 / -274; no planning/spec docs are included. May have been a stale view.
  • Minor stdin-leak nit on Ctrl-C: the spawn_blocking task finishes when stdin closes on process exit; harmless.

dubadub added 2 commits May 17, 2026 09:25
The 'should display current configuration' Playwright test uses a broad
'pre, code, .config-display' locator and asserts the first match is
visible. The hidden device-code <code> element matched first and broke
the test. Switch to <span> with identical monospace styling.
Server now returns {access_token, token_type, expires_in} per RFC 6749.
Align the CLI deserializer before clients ship — renaming the field
after release would be a breaking change.
@claude
Copy link
Copy Markdown

claude Bot commented May 18, 2026

Code Review — feat(sync): OAuth 2.0 device-code login

Overview

This PR replaces the old browser-popup + local TCP listener approach with RFC 8628 (Device Authorization Grant), enabling Docker and headless deployments to log in to CookCloud. The core device-flow logic lives in src/server/sync/device_flow.rs and is shared by the new cook login CLI command and the updated web preferences UI.


What looks good

  • device_flow.rs is well-structured. The typed DeviceFlowError enum (via thiserror) gives callers precise matching without stringly-typed error checks. RFC 8628 compliance is solid: authorization_pending, slow_down (correctly bumps interval by +5s), access_denied, and expired_token are all handled, and the expiry check at the top of the loop is correct.
  • AtomicBool login_in_progressMutex<Option<PendingDeviceFlow>> is a clean upgrade — the old flag just prevented concurrent logins; the new struct carries the actual flow data that the UI and the polling task both need.
  • sync_cancel_login and the logout-cleans-up-pending-flow logic are both correct. Cancellation propagates through CancellationToken all the way to the polling loop.
  • Resume on page load (DOMContentLoaded fetching /api/sync/status and re-rendering if a flow is in progress) is a thoughtful UX touch.
  • The Ctrl-C handler in login.rs is properly set up before polling starts, and dot-printer / cancellation cleanup is consistent across all poll_for_token error paths.

Issues and suggestions

1. Ctrl-C is silent during the "Press Enter" prompt (login.rs)

The Ctrl-C signal handler (tokio::signal::ctrl_c) is only spawned after spawn_blocking(stdin.lock().lines().next()) resolves. If the user presses Ctrl-C while waiting at the Enter prompt, the process receives SIGINT before the handler is registered and terminates abruptly — no "Cancelled." message, no cleanup. Since no session has been written yet, there is no state corruption, but it is a jarring UX. Consider registering the Ctrl-C handler before the Enter prompt and using a tokio::select! between the stdin task and the cancel signal, breaking the wait on either.

2. Unrelated planning docs should not be in this PR

The diff includes three new docs/superpowers/plans/ files and three new docs/superpowers/specs/ files:

  • 2026-05-15-shopping-list-pantry-flags.md (unrelated feature)
  • 2026-05-15-static-site-build.md (unrelated feature — 1 774-line plan)
  • 2026-05-16-device-code-auth.md (related design, but agentic scaffolding, not user-facing docs)
  • Matching spec files

These add ~3 400 lines to the diff and make it significantly harder to review the actual code. They belong either in their own branches or should be removed from the repo if they are only agentic scaffolding.

3. expires_in_secs vs expires_in naming inconsistency (preferences.html)

SyncStatusResponse.pending_login uses expires_in_secs: u64 while LoginResponse uses expires_in: u64. In renderPending these are unified via p.expires_in_secs ?? p.expires_in, which works but is fragile. Prefer a single field name on both structs — either keep expires_in everywhere and document the unit, or normalise to expires_in_secs in LoginResponse too.

4. Polling task: session save and in-memory update are not atomic (sync.rs)

In the spawned polling task, session.save(...) is called first; if that succeeds but the subsequent *state_clone.sync_session.lock().unwrap() = Some(session.clone()) panics (extremely unlikely, but theoretically possible), the file on disk is saved while the in-memory state is still None. The server would then show "not logged in" until restarted. Consider flipping the order (update memory first, then persist to disk) or using a helper that does both under a single lock.

5. /.dockerenv detection is unreliable in some configurations

server_host_label() checks for /.dockerenv to decide whether to report "docker" vs "server". This heuristic does not work for rootless Docker, Podman, or containers that deliberately remove that file. The label is cosmetic (it only affects client_name), so this is a minor issue, but it is worth a comment noting the limitation.

6. CSRF hardening (acknowledged — but worth surfacing)

The PR description notes that POST /api/sync/{login,cancel_login,logout} lack CSRF hardening. Cancel and logout are state-mutating endpoints with observable side effects; a cross-site request from a malicious page could cancel an in-progress login or force a logout. This is flagged as a tracked follow-up — make sure there is an issue open for it before merging.

7. No unit tests for device_flow.rs

The core RFC 8628 state machine (poll_for_token) has no automated tests. Given the number of branches (slow_down, access_denied, expired_token, cancellation), a mock HTTP server (using wiremock or httpmock) covering the happy path, denied path, slow_down accumulation, and expiry would significantly improve confidence and prevent regressions.


Summary

The core device-code implementation is correct and well-structured. The main asks before merging are: (1) remove the unrelated planning docs, (2) normalise the expires_in field name, and (3) ensure the CSRF follow-up issue is tracked. The Ctrl-C UX issue during the Enter prompt is worth fixing too but is not a blocker.

@cooklang cooklang deleted a comment from claude Bot May 18, 2026
@cooklang cooklang deleted a comment from claude Bot May 18, 2026
@cooklang cooklang deleted a comment from claude Bot May 18, 2026
@cooklang cooklang deleted a comment from claude Bot May 18, 2026
- login.rs: register the Ctrl-C handler before the "Press Enter"
  prompt and select between stdin and cancellation. Ctrl-C at the
  prompt now exits cleanly with a "Cancelled." message instead of
  terminating abruptly mid-print.
- Rename LoginResponse.expires_in -> expires_in_secs so /api/sync/login
  and /api/sync/status agree on the field name. Drop the nullish-
  coalesce and the renderPending wrapper in preferences.html.
@dubadub
Copy link
Copy Markdown
Member Author

dubadub commented May 18, 2026

Thanks for the second pass. Addressed in 90225c8:

  • Create LICENSE #1 Ctrl-C silent at the Enter prompt — registered the signal handler before the prompt and switched to tokio::select! between the stdin task and the cancel token. Ctrl-C at the prompt now exits cleanly with "Cancelled.".
  • HomeBrew integration #3 expires_in_secs vs expires_in — renamed LoginResponse.expires_inexpires_in_secs so /api/sync/login and /api/sync/status agree. Dropped the ?? and the renderPending wrapper in preferences.html.

Not addressed (with reasoning):

  • demo server instance + server issue #2 unrelated planning docs — confirmed this is a hallucinated finding; the diff is 14 files with no docs/** entries (git diff --stat main..HEAD -- 'docs/**' is empty).
  • Improve build documentation #4 save/in-memory atomicity — the gap only opens if Mutex::lock().unwrap() panics on a poisoned mutex, which crashes the process anyway. Both orderings have a failure mode; the current one (persist first, then update memory) means a disk failure leaves no inconsistent state. Not worth restructuring.
  • #5 /.dockerenv heuristic — cosmetic, only labels client_name. Podman/rootless show up as "server" which is still accurate enough.
  • Workflow to work with Javascript #6 CSRF — tracked as a follow-up in the PR body. Rack::Attack on cook.md throttles both /oauth/device/{code,token} already.
  • Use gzip instead of brotli #7 device_flow.rs unit tests — fair point. Punting to a follow-up; the state machine is small and currently exercised end-to-end against the staging server.

@dubadub dubadub merged commit 6aa725c into main May 18, 2026
2 checks passed
@dubadub dubadub deleted the feat/device-code-auth branch May 18, 2026 17:13
@claude
Copy link
Copy Markdown

claude Bot commented May 18, 2026

Code Review: OAuth 2.0 Device Authorization Grant (PR #347)

Good overall implementation of RFC 8628. The design is clean — removing the local TCP listener was the right call for Docker/headless environments, and the CancellationToken-based approach for cook login is idiomatic. However, there is one critical bug that will prevent login from ever succeeding, and one high-severity concurrency issue that should be fixed before merge.


Critical Bug

1. access_token field name mismatch — login will never complete

src/server/sync/device_flow.rs, line ~35

#[derive(Debug, Deserialize)]
struct TokenSuccess {
    access_token: String,   // BUG: server returns "token", not "access_token"
}

The cook.md server returns {"token": "<JWT>", "token_type": "Bearer"}, but this struct expects access_token. Serde will fail to deserialize and every poll will produce DeviceFlowError::BadResponse(...). The user will be stuck at the code display card until it expires.

Fix: Rename to token: String, or add #[serde(rename = "token")].


High Severity

2. Tokio Mutex held across HTTP .await in sync_login

src/server/handlers/sync.rs

state.pending_device_flow.lock().await is acquired before the call to request_device_code(&client, &name), which makes an outbound HTTP request to cook.md. This can take seconds. During that entire window, every concurrent request touching pending_device_flow — including sync_status (polled by the UI every 2 seconds) and sync_cancel_login — will block completely.

Fix: Check is_some() while holding the guard, then drop(guard), make the network call, then re-acquire to store the result.


Medium Severity

3. No request timeout on reqwest::Client

src/server/sync/device_flow.rs, src/login.rs

reqwest::Client::new() has no connect or read timeout. If cook.md hangs, poll_for_token will block indefinitely. The expires_at check at the top of the loop won't help once an in-flight HTTP call is already awaited.

Recommendation: Use reqwest::Client::builder().connect_timeout(Duration::from_secs(10)).timeout(Duration::from_secs(15)).build()?

4. Cancellation does not interrupt in-flight HTTP requests

src/server/sync/device_flow.rs

The tokio::select! on cancel.cancelled() only fires during tokio::time::sleep(interval). Once sleep completes and the .send().await begins, a cancel signal has no effect until the HTTP response returns (or times out). Combined with issue #3, a user cancelling a login during a hung poll will have to wait indefinitely.

Recommendation: Wrap the HTTP call in a second tokio::select! with cancel.cancelled().


Low Severity

5. New reqwest::Client created per login request

src/server/handlers/sync.rs

Each POST /api/sync/login creates a new reqwest::Client with its own connection pool. Consider storing a reqwest::Client in AppState and cloning it.

6. CSRF protection on sync API endpoints

src/server/mod.rs

POST /api/sync/login and POST /api/sync/cancel_login lack CSRF tokens. The PR acknowledges this as a known follow-up (same posture as the old code). Worth confirming that the CorsLayer does not allow arbitrary origins — a LAN attacker could otherwise trigger or cancel a login flow via fetch() from a malicious page.

7. stdin_task not aborted on Ctrl-C

src/login.rs

tokio::select! {
    _ = stdin_task => {}
    _ = cancel.cancelled() => {
        // stdin_task continues running
        anyhow::bail!("Cancelled.");
    }
}

The spawned blocking thread is leaked when the cancel branch fires. The process exits immediately so this is harmless today, but stdin_task.abort() before the bail would be cleaner.

8. Missing beforeunload cleanup in preferences page

templates/preferences.html

pollHandle and countdownHandle are not cleared when the user navigates away. Browsers throttle background setInterval, so the practical impact is minor (stale 2-second polls until the tab is closed), but adding window.addEventListener('beforeunload', stopPolling) would be clean.


Passes / Positive Notes

  • RFC 8628 slow_down handling is correctinterval is permanently increased by 5 seconds and the current loop iteration is skipped. ✓
  • expires_at handling is correct — checked at the top of each loop iteration before the sleep. ✓
  • expires_in_secs field naming is consistent between the Rust LoginResponse, PendingLogin, and the JS template. ✓
  • Clipboard fallback is gracefulnavigator.clipboard.writeText() failure is caught and surfaces "Copy failed" rather than silently breaking. ✓
  • Removal of the local TCP listener and browser-popup handlers is clean. ~150 LOC deleted, and the remaining code is simpler. ✓
  • thiserror-typed errors in DeviceFlowError are a clear improvement over stringly-typed error propagation. ✓

Summary

# Severity Location Issue
1 Critical device_flow.rs:~35 access_token should be token — login never completes
2 High handlers/sync.rs Mutex held across HTTP await — blocks all concurrent callers
3 Medium device_flow.rs, login.rs No reqwest timeout — unbounded block on hung server
4 Medium device_flow.rs Cancel not checked during HTTP calls, only during sleep
5–8 Low Various reqwest pooling, CSRF note, stdin_task leak, beforeunload cleanup

Blocker for merge: #1. Bug #2 should also be fixed before merge. Issues #3 and #4 can be tracked as follow-ups if the team accepts the risk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docker cookcli not able to signup

1 participant