
HF bucket privacy + coordination correctness + docs overhaul #9

Merged: Arrmlet merged 23 commits into main from handoff-v0.2.1 on Jun 9, 2026

@Arrmlet (Owner) commented Jun 9, 2026

Summary

Three-part pass over the SDK: HF backend onboarding/privacy (closes #7, closes #8), coordination correctness bugs, and a docs/dev-env cleanup. One commit per change, in review order.

HF backend (#7, #8)

  • ensure_bucket() now creates the bucket via HfApi.create_bucket(exist_ok=True): private by default, --public to opt out (init --backend hf against a fresh bucket no longer dies cryptically on first write)
  • init prints the bucket's real visibility read back from bucket_info(), and warns loudly when a pre-existing bucket is PUBLIC (only remedy is delete + recreate — no update_bucket upstream)
  • Writes against a missing bucket raise an actionable error naming the bucket and pointing at tracecraft init, instead of the raw "repository and revision" error
  • exists() no longer swallows 401/403 as False

Coordination correctness

  • Message loss fixed: keys were second-resolution, so same-second sends from one sender silently overwrote each other; keys now carry ns-timestamp + uuid suffix. inbox merges direct + broadcast chronologically by sent_at
  • Ownership: complete rejects a non-owner of a claimed step unless --force
  • Crash window: claim.json (atomic) + status.json are two writes; readers now treat claim-without-status as in_progress by the claiming agent (invariant documented in CLAUDE.md)
  • wait-for fails fast on blocked steps instead of spinning to timeout; needs_review steps are named in the progress line

Hygiene & docs

  • Dropped unused httpx + pydantic deps (runtime is now click + boto3)
  • New tests: claim race, message bursts, inbox ordering, ownership, blocked fast-fail, crash window, and fully mocked HF onboarding (no network) — 91 passing
  • CI consolidated into ci.yml: ruff check (new) + format check + pytest on 3.10/3.12
  • README: session mirroring section, env-var credentials in Quick start, HF privacy docs, "Why not LangGraph / Redis / queues?", honest "Status & limitations", Python API snippet
  • docker-compose.dev.yml is MinIO-only; .env.example lists only variables the code reads

Test plan

  • pytest sdk/tests/ — 91 passed (moto-backed S3 + mocked HfApi/HfFileSystem, no network)
  • ruff check + ruff format --check clean
  • docker compose -f docker-compose.dev.yml config -q validates

🤖 Generated with Claude Code

Arrmlet and others added 23 commits June 6, 2026 16:31
Adds a 'format' job to the tests workflow that runs 'ruff format --check sdk/'
on every push/PR, so style drift fails CI instead of reaching review.

To make the gate green, ran 'ruff format' across sdk/ (15 files reformatted,
whitespace/line-wrap only — no logic changes). Verified the full suite still
passes (66 tests).

Also lands sdk/tests/test_handoff_v2.py — the 9-test suite for the 0.2.1
structured handoff (complete/blocked/needs_review state, next_action/--to,
git-derived changed_files, mutual-exclusion, no mandatory assumptions field).
The fixture chdirs to an isolated dir and writes config to both the CWD-local
and HOME paths, so a stray ./.tracecraft.json can't shadow it (this was making
the tests hit a real endpoint and fail).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ault

ensure_bucket was a no-op, so 'init --backend hf' against a brand-new
bucket failed cryptically on the first write. Now creates the bucket
with HfApi.create_bucket(exist_ok=True), private unless the caller
opts out — HF buckets default to public upstream, which is the wrong
default for internal coordination data and mirrored transcripts.
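
A minimal sketch of the create-if-missing, private-by-default behavior described above. Only `create_bucket(exist_ok=True)` is named in the commit message; the `private` keyword, the argument order, and the in-memory `FakeHubApi` stand-in are assumptions for illustration, not the confirmed huggingface_hub signature.

```python
class FakeHubApi:
    """In-memory stand-in for the Hub client (hypothetical, illustration only)."""

    def __init__(self):
        self.buckets = {}  # bucket id -> {"private": bool}

    def create_bucket(self, bucket_id, private=True, exist_ok=False):
        if bucket_id in self.buckets:
            if not exist_ok:
                raise FileExistsError(bucket_id)
            return  # a pre-existing bucket keeps its current visibility
        self.buckets[bucket_id] = {"private": private}


def ensure_bucket(api, bucket_id, public=False):
    """Create the bucket if it is missing; private unless the caller opts out."""
    api.create_bucket(bucket_id, private=not public, exist_ok=True)


api = FakeHubApi()
ensure_bucket(api, "user/fresh")              # created private
ensure_bucket(api, "user/open", public=True)  # explicit opt-out
ensure_bucket(api, "user/open")               # no-op: visibility is unchanged
```

Because `exist_ok=True` leaves an existing bucket untouched, the flag alone cannot tell the caller what the visibility really is.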

Fixes #7. Refs #8 (creation half).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Passed through to the backend's ensure_bucket via the store factory.
Plain S3 ignores it (bucket ACLs are out of scope there); the HF
backend uses it to decide bucket visibility at creation time.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The init line now reads 'Backend: HuggingFace Buckets  Bucket: user/x
(private)' with the state read back from the Hub, not assumed from the
flag — create_bucket(exist_ok=True) keeps a pre-existing bucket's
visibility, so flag and reality can disagree. Also surface the
best-effort-claims caveat for the HF backend at init time.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
If the bucket pre-exists as PUBLIC and the user did not pass --public,
init now spells out that coordination data and mirrored transcripts
will be publicly visible, and that huggingface_hub has no
update_bucket — delete + recreate as private is the only remedy.
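
The readback-then-warn logic from this commit and the previous one can be sketched roughly as follows. The function name `init_report`, its parameters, and the exact warning wording are hypothetical; only the shape of the "Backend: ... Bucket: ..." line comes from the commit message.

```python
def init_report(bucket_id, is_private, requested_public=False):
    """Build init output from the visibility read back from the Hub, not from the flag."""
    state = "private" if is_private else "public"
    lines = [f"Backend: HuggingFace Buckets  Bucket: {bucket_id} ({state})"]
    # Warn only when reality (public) disagrees with intent (no --public flag).
    if not is_private and not requested_public:
        lines.append(
            "WARNING: this bucket already exists and is PUBLIC. Coordination data "
            "and mirrored transcripts will be publicly visible. Delete and "
            "recreate the bucket as private to change this."
        )
    return lines


for line in init_report("user/x", is_private=False):
    print(line)
```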

Fixes #8.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A put against a nonexistent bucket used to surface HfFileSystem's raw
'repository and revision' resolution error. Now the error names the
bucket, points at 'tracecraft init', and suggests checking the
'username/bucket-name' handle. Applies to put_json and put_file.
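
One way to get an error like this is to wrap the write and re-raise with context. In the sketch below, `MissingBucketError` and `put_with_context` are hypothetical names, and the assumption that the raw failure surfaces as `FileNotFoundError` is not confirmed by the source.

```python
class MissingBucketError(RuntimeError):
    """Raised when a write targets a bucket that does not exist (hypothetical name)."""


def put_with_context(bucket_id, write):
    """Run a write callable and turn a low-level not-found into actionable advice."""
    try:
        return write()
    except FileNotFoundError as exc:  # assumption: how the raw resolution error surfaces
        raise MissingBucketError(
            f"Bucket '{bucket_id}' does not exist. Run 'tracecraft init' to create "
            "it, and check that the handle has the form 'username/bucket-name'."
        ) from exc
```

Chaining with `from exc` keeps the original traceback available for debugging while the user sees the actionable message first.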

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
exists() caught every exception and returned False, so a bad or
under-scoped token looked identical to an empty bucket — and let the
best-effort claim path sail past its check-then-write guard. 401/403
now raise with a pointer at the token; genuine not-found stays False.
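
A sketch of the distinction this commit draws, using a hypothetical `HttpStatusError` and a `probe` callable in place of the real filesystem call (both are assumptions made so the example stays self-contained):

```python
class HttpStatusError(Exception):
    """Stand-in for an HTTP error carrying a status code (hypothetical)."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def exists(probe, key):
    """Return True/False for presence; never report an auth failure as 'missing'."""
    try:
        probe(key)
        return True
    except HttpStatusError as exc:
        if exc.status in (401, 403):
            raise PermissionError(
                f"Access denied checking '{key}': check that the token is valid "
                "and has sufficient scope."
            ) from exc
        if exc.status == 404:
            return False  # a genuine not-found stays False
        raise  # anything else is unexpected and should surface as-is
```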

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Message keys were messages/<recipient>/<int_seconds>_<sender>.json, so
two messages from one sender inside the same second collided and the
later silently overwrote the earlier. Keys now carry nanosecond
resolution plus a uuid4 suffix, so every send is a distinct object.
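
A collision-free key along these lines might look like the sketch below. The exact field order and separators of the new key are not given in the commit message, so the layout here is an assumption.

```python
import time
import uuid


def message_key(recipient, sender):
    """Build a per-send unique key: nanosecond timestamp plus a uuid4 suffix."""
    # time_ns() alone can still repeat on coarse clocks; the uuid4 suffix
    # is what makes each key distinct.
    return f"messages/{recipient}/{time.time_ns()}_{sender}_{uuid.uuid4().hex}.json"


# A burst of sends from one sender yields distinct keys.
keys = {message_key("bob", "alice") for _ in range(1000)}
```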

inbox now merges direct + broadcast messages and prints them in
sent_at order instead of raw list order, which interleaved the two
prefixes arbitrarily.
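
The merge amounts to concatenating both listings and sorting on `sent_at`. The sketch assumes each message is a dict whose `sent_at` values are mutually comparable (for example, numeric timestamps); the function name is hypothetical.

```python
def merge_inbox(direct, broadcast):
    """Merge direct and broadcast messages into one chronological list."""
    # sorted() is stable, so messages with equal sent_at keep their input order.
    return sorted([*direct, *broadcast], key=lambda msg: msg["sent_at"])


direct = [{"sent_at": 3, "body": "d1"}, {"sent_at": 1, "body": "d0"}]
broadcast = [{"sent_at": 2, "body": "b0"}]
ordered = [msg["body"] for msg in merge_inbox(direct, broadcast)]
```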

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Previously, any agent could complete any step, including one that another
agent was actively working on. complete now reads claim.json first and
fails with a clear error when the claim belongs to a different agent,
unless --force is passed (escape hatch for crashed claim-holders).
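
The ownership check reduces to a small guard. In this sketch the claim is a dict with an `agent` field, which is an assumed shape for claim.json, and `check_can_complete` is a hypothetical name.

```python
def check_can_complete(claim, agent, force=False):
    """Raise if the step is claimed by a different agent and force is not set."""
    if claim is None or force:
        return  # unclaimed steps and forced completions are allowed
    owner = claim.get("agent")  # assumed field name in claim.json
    if owner != agent:
        raise PermissionError(
            f"Step is claimed by '{owner}', not '{agent}'. "
            "Pass --force to complete it anyway."
        )
```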

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
claim.json (atomic) and status.json are two separate writes; a crash
between them leaves a claim with no status. step-status and wait-for
now resolve that state as in_progress by the claiming agent via a
shared _effective_status helper — the claim is the authoritative
write. Invariant documented in CLAUDE.md.
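
A sketch of how such a helper could resolve the three cases. The dict shapes, the `agent` field, and the `pending` fallback for a step with neither file are assumptions, not the shipped `_effective_status`.

```python
def effective_status(claim, status):
    """Resolve a step's state from claim.json and status.json contents (or None)."""
    if status is not None:
        return status  # normal case: both writes landed
    if claim is not None:
        # Crash window: the claim landed but the status write did not.
        # The claim is authoritative, so report in_progress by the claimer.
        return {"state": "in_progress", "agent": claim["agent"]}
    return {"state": "pending"}  # assumed label for an untouched step
```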

Also warn at claim time on the HF backend that claims are best-effort
(no conditional-write upstream), matching the note init prints.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A blocked step never completes on its own, so waiters used to spin
until the full timeout. wait-for now exits non-zero immediately with
a clear message naming the blocked step. needs_review still counts as
waiting but is called out in the progress line so a human can step in.
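
The fail-fast loop can be sketched as below. It raises an exception where the CLI exits non-zero, and `wait_for`, `get_state`, and `BlockedStepError` are hypothetical names; the state strings follow the complete/blocked/needs_review vocabulary used elsewhere in this PR.

```python
import time


class BlockedStepError(RuntimeError):
    """Raised as soon as any awaited step is blocked (hypothetical name)."""


def wait_for(steps, get_state, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll until every step is complete; fail fast on blocked, name needs_review."""
    deadline = time.monotonic() + timeout
    while True:
        states = {step: get_state(step) for step in steps}
        blocked = sorted(s for s, st in states.items() if st == "blocked")
        if blocked:
            # A blocked step never completes on its own, so stop immediately.
            raise BlockedStepError(f"blocked step(s): {', '.join(blocked)}")
        if all(st == "complete" for st in states.values()):
            return
        review = sorted(s for s, st in states.items() if st == "needs_review")
        done = sum(st == "complete" for st in states.values())
        note = f" (needs review: {', '.join(review)})" if review else ""
        print(f"waiting: {done}/{len(steps)} complete{note}")
        if time.monotonic() >= deadline:
            raise TimeoutError("timed out waiting for steps")
        sleep(interval)
```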

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Zero imports of either anywhere in the package or tests — leftovers
from the pre-pivot FastAPI scaffolding. Runtime deps are now just
click + boto3. Also remove two unused imports in test_session_cli.py
(pre-existing ruff check failures).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
moto-backed (no network):
- claim race: two claimers, exactly one wins, holder preserved
- same-second message bursts keep every message (uuid-suffixed keys)
- inbox merges direct + broadcast chronologically by sent_at
- complete rejects a non-owner without --force, allows with it
- wait-for fails fast on blocked, names needs_review while waiting
- claim.json-without-status.json reads as in_progress (crash window)

HF backend tests mock HfApi/HfFileSystem in-memory: private-by-default
creation, --public opt-out, real visibility readback, the
existing-public-bucket warning, actionable missing-bucket write errors,
and exists() raising on 401/403 instead of returning False.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replaces test.yml. Lint job runs ruff check (new — lint errors were
previously uncaught) plus the existing ruff format --check; pytest runs
on Python 3.10 and 3.12 with dev+huggingface extras, on push and PR.
README badge updated to point at the new workflow.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Same-second message collisions and the empty test suite are fixed;
claim TTL and heartbeat refresh stay open (need design decisions).
Bucket-layout sketch updated to the new message key shape.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Retitle the section and spell out the four harnesses, incremental
cursor uploads (safe to re-run on a cron; seq derived from the bucket),
default secret redaction with per-pattern counts in meta.json, and
replay via 'session show --tail'. Session commands were already in the
CLI reference block.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Inline --access-key/--secret-key flags leak into shell history; the
init command already reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY,
so show that path. Note that .tracecraft.json is written chmod 600 and
auto-added to .gitignore.
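
For illustration, reading credentials from the environment and writing an owner-only config file could look like this on a POSIX system. The helper names and the `bucket` config key are hypothetical; only the two environment variable names and the 600 mode come from the commit message.

```python
import json
import os
import stat
import tempfile


def credentials_from_env(env=os.environ):
    """Read S3 credentials from the environment instead of command-line flags."""
    return env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")


def write_config(path, config):
    """Write the config file readable and writable by the owner only (mode 600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(config, fh)
    os.chmod(path, 0o600)  # also tightens a file that already existed


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, ".tracecraft.json")
    write_config(cfg_path, {"bucket": "demo"})
    mode = stat.S_IMODE(os.stat(cfg_path).st_mode)
```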

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Private by default at creation, --public opt-out, real visibility
shown in init output, and the delete+recreate caveat (no update_bucket
upstream).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Positions tracecraft against in-process frameworks, server-backed
stores, and live wire protocols, and is honest about pre-alpha status:
no claim TTL, heartbeat not refreshed after init, HF claims
best-effort. Links to open issues.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The CLI is the stable interface; get_store() is the documented escape
hatch for direct bucket access from Python.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
docker-compose.dev.yml drops postgres, redis, and seaweedfs — leftovers
from the pre-pivot server design; the shipped CLI needs exactly one
S3-compatible bucket. .env.example rewritten to the variables the code
actually reads (AWS creds, HF_TOKEN, TRACECRAFT_AGENT, harness path
overrides) instead of JWT/UI/database/monitoring leftovers. Quick start
references the compose file.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@Arrmlet Arrmlet merged commit 9db5910 into main Jun 9, 2026
3 checks passed