feat(auth): discover Claude OAuth credentials via fallback chain when CLAUDE_CODE_OAUTH_TOKEN is missing or stale

## Background

Follow-up to #99 (closed). That incident exposed a systemic gap: `askcc` inherits whatever `CLAUDE_CODE_OAUTH_TOKEN` is in the parent shell/systemd environment, performs no validation, and on a stale or missing token surfaces an opaque HTTP 401 from the spawned `claude -p` subprocess. Three days of automated processing were blocked while the actual cause (env-file token had drifted from the canonical token file) was diagnosed by hand.

The downstream consumer (ellen-core) has shipped a workaround that overlays `CLAUDE_CODE_OAUTH_TOKEN` from a canonical token file at every `claude` fan-out point (ellen-core PR#125). That works, but it pushes auth-resolution logic into every caller of askcc instead of solving it once in askcc itself. The current `askcc/runners.py::ClaudeRunner.run()` does only `env = os.environ.copy()` — no awareness that an absent or invalid OAuth token is the most common failure mode in headless deployments.

## Proposal

When `askcc` invokes a Claude runner, resolve `CLAUDE_CODE_OAUTH_TOKEN` via an explicit discovery chain rather than relying on whatever the parent process happened to inherit. Stop on the first source that produces a non-empty token; log which source won at INFO level so failures are diagnosable from the run log alone.

### Discovery chain (in order)

1. **`CLAUDE_CODE_OAUTH_TOKEN` env var** — current behavior; preserves existing setups.
2. **`CLAUDE_OAUTH_TOKEN_FILE` env var** — when set, read the file at that path. Lets operators point askcc at a custom location (also useful for tests).
3. **Conventional headless token file** — `~/.tokens/.claude-oauth-token`. De-facto canonical location used by `~/.bashrc` to populate the env var for interactive shells; reusing it here means cron/systemd sees the same token a developer sees on the command line.
4. **XDG fallback** — `${XDG_CONFIG_HOME:-~/.config}/claude/oauth-token`. Covers operators who prefer XDG-compliant layouts.
5. **`~/.claude/.credentials.json`** — last resort. Parse `claudeAiOauth.accessToken`. Emit a WARNING when this source is used, because the interactive Claude Code app refreshes its access token in RAM and does not always write back, so this file can be months stale even while the live IDE works fine. (See ellen-core's `INTEGRATED_AUTH_DETAILS.md` for the in-memory ↔ on-disk asymmetry.)

If none of the sources produce a non-empty token, fail fast with a clear error message naming each location checked, **before** invoking `claude` — so the user sees `no Claude credentials found in any of: env var, $CLAUDE_OAUTH_TOKEN_FILE, ~/.tokens/.claude-oauth-token, ...` instead of a 401 from a subprocess.
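The chain order and the fail-fast error can be sketched as below. This is a minimal illustration, not the planned implementation: the function name `resolve_oauth_token`, its `env`/`home` parameters, the helper names, and the exact label strings are assumptions made for testability. Read failures are silently treated as "source absent" here to keep the focus on ordering.

```python
import json
from pathlib import Path
from typing import Callable, Mapping, Optional


class OAuthTokenNotFoundError(RuntimeError):
    """No source in the discovery chain produced a non-empty token."""


def _read(path: Path) -> Optional[str]:
    # Simplified for this sketch: any read failure means "nothing here".
    try:
        return path.read_text(encoding="utf-8").strip() or None
    except OSError:
        return None


def _read_credentials(path: Path) -> Optional[str]:
    raw = _read(path)
    if raw is None:
        return None
    try:
        token = json.loads(raw)["claudeAiOauth"]["accessToken"]
    except (ValueError, KeyError, TypeError):
        return None
    if not isinstance(token, str):
        return None
    return token.strip() or None


def resolve_oauth_token(env: Mapping[str, str], home: Path) -> tuple[str, str]:
    """Return (token, source_label) from the first source with a non-empty token."""
    # XDG_CONFIG_HOME is read at call time and falls back to ~/.config.
    xdg_root = Path(env.get("XDG_CONFIG_HOME") or home / ".config")

    def from_override() -> Optional[str]:
        override = env.get("CLAUDE_OAUTH_TOKEN_FILE")
        return _read(Path(override)) if override else None

    conventional = home / ".tokens" / ".claude-oauth-token"
    xdg_file = xdg_root / "claude" / "oauth-token"
    credentials = home / ".claude" / ".credentials.json"

    sources: list[tuple[str, Callable[[], Optional[str]]]] = [
        ("env CLAUDE_CODE_OAUTH_TOKEN", lambda: env.get("CLAUDE_CODE_OAUTH_TOKEN") or None),
        ("file $CLAUDE_OAUTH_TOKEN_FILE", from_override),
        (f"file {conventional}", lambda: _read(conventional)),
        (f"file {xdg_file}", lambda: _read(xdg_file)),
        (f"file {credentials}", lambda: _read_credentials(credentials)),
    ]
    for label, load in sources:
        token = load()
        if token:
            return token, label  # first non-empty source wins
    checked = ", ".join(label for label, _ in sources)
    raise OAuthTokenNotFoundError(f"no Claude credentials found in any of: {checked}")
```

Passing `env` and `home` explicitly (for example `resolve_oauth_token(os.environ, Path.home())`) is a design choice of this sketch; it lets tests point the chain at a temporary directory without monkeypatching module constants.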

### Edge-case handling (resolved per @monkut's review)

- **Unreadable file (`PermissionError`)** — log a WARNING naming the path and continue to the next source.
- **Malformed `~/.claude/.credentials.json`** — log a WARNING with the parse error and continue.
- **Trailing whitespace in token files** — `.strip()` file contents before treating as non-empty.
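The three rules above might look roughly like this in code. The helper names `read_token_file` and `read_credentials_token` and the log wording are hypothetical; only the behaviors (silent on a missing file, WARNING and continue on `PermissionError` or a parse failure, `.strip()` before the emptiness check) come from this issue.

```python
import json
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def read_token_file(path: Path) -> Optional[str]:
    """Return the stripped token, or None so the caller moves to the next source."""
    try:
        content = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None  # source absent: stay silent
    except PermissionError as exc:
        logger.warning("auth: cannot read %s (%s); trying next source", path, exc)
        return None
    # Whitespace-only files count as empty.
    return content.strip() or None


def read_credentials_token(path: Path) -> Optional[str]:
    """Extract the access token from a credentials JSON file, warning on bad input."""
    raw = read_token_file(path)
    if raw is None:
        return None
    try:
        # Observed (undocumented) schema: {"claudeAiOauth": {"accessToken": "..."}}
        token = json.loads(raw)["claudeAiOauth"]["accessToken"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        logger.warning("auth: cannot parse %s (%r); trying next source", path, exc)
        return None
    if not isinstance(token, str):
        logger.warning("auth: unexpected accessToken type in %s; trying next source", path)
        return None
    return token.strip() or None
```

Returning `None` for every "no usable token" case keeps the chain loop simple: the caller only has to test truthiness, while the reason for skipping a source is already in the log.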

### Optional: pre-flight validation (deferred)

A `claude -p "ok"` probe before the real call would catch a stale-but-present token early. Out of scope for the first cut; revisit if the discovery chain alone leaves the stale-token case unhandled.

## Acceptance Criteria

- [ ] **WHERE** `CLAUDE_CODE_OAUTH_TOKEN` is unset or empty, **WHEN** askcc runs any action, **THEN** it shall consult sources 2–5 in order and use the first non-empty token found.
- [ ] **WHERE** a token is loaded from a non-env source, **WHEN** the action runs, **THEN** an INFO log line shall name the source (e.g. `auth: loaded CLAUDE_CODE_OAUTH_TOKEN from ~/.tokens/.claude-oauth-token`).
- [ ] **WHERE** the source is `~/.claude/.credentials.json`, **WHEN** the action runs, **THEN** a WARNING log line shall note that this source can be stale.
- [ ] **WHERE** no source produces a non-empty token, **WHEN** askcc runs any action, **THEN** it shall exit non-zero with an error listing every path/var checked, **without** invoking `claude`.
- [ ] **WHERE** `CLAUDE_CODE_OAUTH_TOKEN` is set in the env, **WHEN** askcc runs, **THEN** behavior is unchanged from today (no regression for working setups).
- [ ] **WHERE** a candidate token file is unreadable (`PermissionError`) or `~/.claude/.credentials.json` is malformed JSON, **WHEN** the chain runs, **THEN** a WARNING is logged naming the path and the chain continues to the next source.
- [ ] **WHERE** a token file's contents include trailing whitespace/newline, **WHEN** read, **THEN** the value is `.strip()`-ed before being treated as non-empty or used.
- [ ] Unit tests cover: env-var present (no fallback used), env-var missing + each fallback source present in isolation, malformed `credentials.json`, unreadable token file, trailing-newline stripping, and the all-sources-empty error path.
- [ ] `uv run pytest`, `uv run poe check`, and `uv run poe typecheck` all pass.

## Dependencies

None identified. The code change is self-contained within `askcc/runners.py` and `askcc/cli.py` (plus tests and `README.md`) and uses only the standard library (`os`, `pathlib`, `json`, `logging`).

## Implementation Plan

### Current state

- `askcc/runners.py:88` — `env = os.environ.copy()` is the only env handling in `ClaudeRunner.run()`. No auth awareness.
- `askcc/runners.py:89` — `env.pop("CLAUDECODE", None)` is the only existing pre-subprocess env mutation.
- `askcc/runners.py:101` — `subprocess.run(...)` is the call to the `claude` binary.
- `askcc/cli.py:288–306` — `runner.run(...)` runs inside a `try/finally`; only handles tempfile cleanup, no exception types caught today.
- `tests/test_askcc.py:563` — `TestClaudeRunner` is the existing runner test class (uses `patch("askcc.runners.subprocess.run", ...)`); follow this pattern.
- `tests/test_askcc.py:2037` — `TestClaudeRunnerThinkingOptions` shows the env-mutation test pattern (`mock_run.call_args[1]["env"]`); reuse for asserting the OAuth token was injected.
- Negative grep — no existing references to `CLAUDE_CODE_OAUTH_TOKEN`, `_resolve_oauth_token`, `CLAUDE_OAUTH_TOKEN_FILE`, `claude-oauth-token`, or `credentials.json` anywhere in the repo. This is a green-field add.

### Tasks

1. **Add `OAuthTokenNotFoundError(RuntimeError)` and module-level constants in `askcc/runners.py`** (above `ClaudeRunner`):
   - `OAUTH_TOKEN_ENV = "CLAUDE_CODE_OAUTH_TOKEN"`
   - `OAUTH_TOKEN_FILE_ENV = "CLAUDE_OAUTH_TOKEN_FILE"`
   - `CONVENTIONAL_TOKEN_FILE: Path` → `Path.home() / ".tokens" / ".claude-oauth-token"`
   - `CREDENTIALS_JSON_FILE: Path` → `Path.home() / ".claude" / ".credentials.json"`
   - The XDG path is computed inside `_resolve_oauth_token()` so `XDG_CONFIG_HOME` is read at call time, not import time.
2. **Add `_resolve_oauth_token() -> tuple[str, str]` helper in `askcc/runners.py`.**
   - Returns `(token, source_label)` where `source_label` is human-readable (`"env CLAUDE_CODE_OAUTH_TOKEN"`, `"file ~/.tokens/.claude-oauth-token"`, etc.).
   - Walks the chain in order. For each file source: `FileNotFoundError` → silent (source absent); `PermissionError` and `json.JSONDecodeError` (or missing keys) → `logger.warning` and continue.
   - For each file source, `.strip()` the contents before checking truthiness, so a whitespace-only file counts as empty.
   - For `credentials.json`, parse JSON and read `data["claudeAiOauth"]["accessToken"]`; missing keys treated as parse failure (warn + continue). Add a one-line comment naming the observed schema so future schema drift is detectable.
   - Raises `OAuthTokenNotFoundError` listing every path/env var checked when no source yields a token.
3. **Wire `_resolve_oauth_token()` into `ClaudeRunner.run()` at `askcc/runners.py:88–89`.**
   - Immediately after `env.pop("CLAUDECODE", None)`:
     - `token, source = _resolve_oauth_token()`
     - `env[OAUTH_TOKEN_ENV] = token`
     - When `source` is the env var itself → log nothing (no-op for working installs).
     - When `source` is any other → `logger.info("[%s] auth: loaded %s from %s", issue_url, OAUTH_TOKEN_ENV, source)`.
     - When `source` is the credentials.json fallback → also `logger.warning(...)` about staleness.
4. **Catch `OAuthTokenNotFoundError` in `askcc/cli.py:main()`.**
   - Wrap `runner.run(...)` (currently `cli.py:288–298`) so the existing `finally:` tempfile cleanup still runs:
     - `except OAuthTokenNotFoundError as e: logger.error(str(e)); sys.exit(1)`
   - Import the exception from `.runners`.
5. **Add a `TestResolveOAuthToken` class in `tests/test_askcc.py`** covering:
   - `test_env_var_present_no_fallback_used` — env set, no file IO, source label is the env var.
   - `test_token_file_env_var_used_when_main_env_missing` — `CLAUDE_OAUTH_TOKEN_FILE` → tmp file with token.
   - `test_conventional_path_used_when_envs_missing` — monkeypatch `CONVENTIONAL_TOKEN_FILE` to a `tmp_path`.
   - `test_xdg_path_used` — set `XDG_CONFIG_HOME` to a `tmp_path` containing `claude/oauth-token`.
   - `test_credentials_json_used_with_warning` — patch `CREDENTIALS_JSON_FILE` to a tmp file `{"claudeAiOauth":{"accessToken":"abc"}}`; assert WARNING in `caplog`.
   - `test_trailing_newline_stripped` — token file containing `"abc\n"` → returned token is `"abc"`.
   - `test_unreadable_file_warns_and_continues` — patch `Path.read_text` to raise `PermissionError`; chain continues; WARNING logged.
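   - `test_malformed_credentials_json_warns_and_continues` — file contains `"{not json"` and no other source is present; WARNING logged, the chain continues past it, and `OAuthTokenNotFoundError` is raised because nothing else yields a token.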
   - `test_empty_token_file_falls_through` — file contains only whitespace; treated as absent.
   - `test_all_sources_empty_raises_with_paths` — assert error message names every checked location.
6. **Extend `TestClaudeRunner` (or add an integration test class)** in `tests/test_askcc.py`:
   - `test_runner_injects_resolved_token_into_subprocess_env` — patch `_resolve_oauth_token` to return `("xyz", "...")`; assert `mock_run.call_args[1]["env"][OAUTH_TOKEN_ENV] == "xyz"`.
   - `test_runner_logs_source_for_non_env_resolution` — assert INFO log line includes the source label.
   - `test_runner_logs_warning_when_credentials_json_used` — assert WARNING about staleness.
7. **Add CLI exit-path test in `tests/test_askcc.py`:**
   - `test_main_exits_nonzero_when_oauth_resolution_fails` — patch resolver to raise `OAuthTokenNotFoundError`; assert `SystemExit(1)` and the error message hits `caplog`.
8. **Update `README.md`:**
   - Add a row to the env-vars table (~line 96) for `CLAUDE_OAUTH_TOKEN_FILE`.
   - Add a short "Authentication" subsection documenting the discovery chain order and the `credentials.json` staleness caveat.
9. **Verification gate (mandatory before opening the PR):**
   - `uv run pytest -v` — all green
   - `uv run poe check` — ruff clean
   - `uv run poe typecheck` — pyright clean
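Tasks 3 and 4 could be wired together along these lines. This is a sketch under assumptions: `build_subprocess_env`, `run_action`, the injected `resolve`/`spawn`/`cleanup` callables, and the `endswith(".credentials.json")` check on the source label are all hypothetical, and the function returns an exit code instead of calling `sys.exit(1)` so it can be exercised without a real `claude` binary.

```python
import logging
import os
from typing import Callable, Mapping, Optional

logger = logging.getLogger(__name__)

OAUTH_TOKEN_ENV = "CLAUDE_CODE_OAUTH_TOKEN"
ENV_SOURCE_LABEL = "env CLAUDE_CODE_OAUTH_TOKEN"  # assumed label for source 1


class OAuthTokenNotFoundError(RuntimeError):
    """No source in the discovery chain produced a non-empty token."""


def build_subprocess_env(
    issue_url: str,
    resolve: Callable[[], tuple[str, str]],
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Copy the environment, drop CLAUDECODE, and inject the resolved token."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop("CLAUDECODE", None)
    token, source = resolve()
    env[OAUTH_TOKEN_ENV] = token
    if source != ENV_SOURCE_LABEL:
        # Silent when the env var itself won, so working installs see no new output.
        logger.info("[%s] auth: loaded %s from %s", issue_url, OAUTH_TOKEN_ENV, source)
    if source.endswith(".credentials.json"):
        logger.warning("[%s] auth: %s can be stale", issue_url, source)
    return env


def run_action(
    issue_url: str,
    resolve: Callable[[], tuple[str, str]],
    spawn: Callable[[dict[str, str]], int],
    cleanup: Callable[[], None],
) -> int:
    """Return a process exit code; the subprocess is never spawned if auth fails."""
    try:
        env = build_subprocess_env(issue_url, resolve)
        return spawn(env)  # stands in for subprocess.run(..., env=env)
    except OAuthTokenNotFoundError as exc:
        logger.error(str(exc))
        return 1
    finally:
        cleanup()  # tempfile cleanup still runs on both paths
```

Because resolution happens before `spawn` is called, a missing token produces the descriptive error and a non-zero exit without ever invoking the subprocess, which is the ordering the acceptance criteria ask for.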

### Risks / open questions

- **`~/.claude/.credentials.json` schema is undocumented.** If Claude Code changes the key path the fallback silently stops working. Mitigation: in-line comment naming the observed schema (`{"claudeAiOauth": {"accessToken": "..."}}`); the WARNING already covers staleness.
- **Pre-flight validation deliberately deferred.** First cut closes the known-incident root cause (silent env drift); a stale-but-present token will still 401 from the subprocess. If that recurs, follow-up issue should add a `claude -p "ok"` probe.
- **No `ANTHROPIC_API_KEY` handling.** Per the issue's "Out of scope" — separate auth path with different semantics.
- **No token rotation/refresh.** askcc consumes tokens, never mints them.

## Out of scope

- Token refresh / rotation logic (askcc should consume tokens, not mint them).
- Any handling of `ANTHROPIC_API_KEY` — separate auth path with different semantics; can be a follow-up if there's demand.
- Changing how Claude Code itself resolves credentials — this is purely about askcc's resolution before it spawns the subprocess.

## Rationale

- **Single point of resolution.** Today, every askcc consumer that runs in a headless context (cron, systemd, hooks) needs its own overlay logic. Solving it inside askcc removes that burden from every downstream caller.
- **Better failure messages.** A 401 from `claude -p` is several layers removed from the actual problem (env-file drift, expired token, missing config). A discovery chain with explicit logging means the runner log itself names the source on every successful run and the missing locations on failure.
- **Self-healing across token rotation.** When the operator writes a fresh token to `~/.tokens/.claude-oauth-token` (the canonical headless location), askcc picks it up automatically without an env-file resync or daemon-reload.
- **No-op for existing working installs.** Source #1 is the current behavior. Operators who already export `CLAUDE_CODE_OAUTH_TOKEN` see no change.

## References

- monkut/askcc-cli#99 — the original 2026-04-29/30 incident
- ellen-core PR#125 — the downstream workaround that this issue would let us retire
- ellen-core `INTEGRATED_AUTH_DETAILS.md` — the three-locations-of-truth runbook that motivated the discovery chain

