HF bucket privacy + coordination correctness + docs overhaul #9
Merged
Adds a 'format' job to the tests workflow that runs 'ruff format --check sdk/' on every push/PR, so style drift fails CI instead of reaching review. To make the gate green, ran 'ruff format' across sdk/ (15 files reformatted, whitespace/line-wrap only — no logic changes). Verified the full suite still passes (66 tests). Also lands sdk/tests/test_handoff_v2.py — the 9-test suite for the 0.2.1 structured handoff (complete/blocked/needs_review state, next_action/--to, git-derived changed_files, mutual-exclusion, no mandatory assumptions field). The fixture chdirs to an isolated dir and writes config to both the CWD-local and HOME paths, so a stray ./.tracecraft.json can't shadow it (this was making the tests hit a real endpoint and fail). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
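The isolation described in this commit can be sketched with the standard library alone. This is a minimal illustration, not the fixture in sdk/tests/test_handoff_v2.py: the helper name `isolated_config`, the HOME-level file name `.tracecraft.json`, and the sample config contents are assumptions.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


@contextlib.contextmanager
def isolated_config(config: dict):
    """Run a block in a throwaway CWD and HOME that both hold the same config.

    Hypothetical helper: the HOME-level file name is assumed to match the
    CWD-local one (.tracecraft.json).
    """
    old_cwd = Path.cwd()
    old_home = os.environ.get("HOME")
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        home = Path(tmp) / "home"
        for directory in (work, home):
            directory.mkdir()
            # Same config in both places, so neither lookup path can
            # fall through to a real file on the developer's machine.
            (directory / ".tracecraft.json").write_text(json.dumps(config))
        os.chdir(work)
        os.environ["HOME"] = str(home)
        try:
            yield work
        finally:
            # Restore the process state before the temp dir is removed.
            os.chdir(old_cwd)
            if old_home is None:
                os.environ.pop("HOME", None)
            else:
                os.environ["HOME"] = old_home
```

Writing the config to both locations means the test does not depend on which path the loader checks first.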
…ault ensure_bucket was a no-op, so 'init --backend hf' against a brand-new bucket failed cryptically on the first write. Now creates the bucket with HfApi.create_bucket(exist_ok=True), private unless the caller opts out — HF buckets default to public upstream, which is the wrong default for internal coordination data and mirrored transcripts. Fixes #7. Refs #8 (creation half). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Passed through to the backend's ensure_bucket via the store factory. Plain S3 ignores it (bucket ACLs are out of scope there); the HF backend uses it to decide bucket visibility at creation time. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The init line now reads 'Backend: HuggingFace Buckets Bucket: user/x (private)' with the state read back from the Hub, not assumed from the flag — create_bucket(exist_ok=True) keeps a pre-existing bucket's visibility, so flag and reality can disagree. Also surface the best-effort-claims caveat for the HF backend at init time. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
If the bucket pre-exists as PUBLIC and the user did not pass --public, init now spells out that coordination data and mirrored transcripts will be publicly visible, and that huggingface_hub has no update_bucket — delete + recreate as private is the only remedy. Fixes #8. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
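The reporting rule from the two commits above (show the visibility read back from the Hub, and warn only when a public bucket was not asked for) can be sketched as a pure function. The function name, parameters, and exact wording are hypothetical; only the rule itself comes from the commit messages.

```python
from typing import List


def visibility_report(handle: str, requested_public: bool, actual_private: bool) -> List[str]:
    """Build the init output lines from the bucket's real visibility.

    `actual_private` is assumed to come from a read-back of the bucket,
    not from the --public flag.
    """
    label = "private" if actual_private else "public"
    lines = [f"Bucket: {handle} ({label})"]
    if not actual_private and not requested_public:
        # The bucket pre-existed as public and the caller did not opt in.
        lines.append(
            "WARNING: this bucket is PUBLIC; coordination data and "
            "mirrored transcripts will be publicly visible."
        )
        lines.append(
            "huggingface_hub has no update_bucket; delete and recreate "
            "the bucket as private to change this."
        )
    return lines
```

A caller that passed `--public` gets no warning, since the public state matches what was requested.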
A put against a nonexistent bucket used to surface HfFileSystem's raw 'repository and revision' resolution error. Now the error names the bucket, points at 'tracecraft init', and suggests checking the 'username/bucket-name' handle. Applies to put_json and put_file. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
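One way to turn a raw backend failure into the actionable message described here is to wrap the write and re-raise with context. This is a sketch under assumptions: `MissingBucketError` and `put_with_context` are hypothetical names, and `FileNotFoundError` stands in for whatever exception the filesystem layer actually raises.

```python
from typing import Callable


class MissingBucketError(RuntimeError):
    """Hypothetical error type; the real class is not shown in this PR."""


def put_with_context(bucket: str, write: Callable[[], None]) -> None:
    """Run a write and translate a missing-bucket failure into a clear error."""
    try:
        write()
    except FileNotFoundError as err:  # assumed stand-in for the raw resolution error
        raise MissingBucketError(
            f"Bucket '{bucket}' was not found. Run 'tracecraft init' to create it, "
            "and check that the handle has the form 'username/bucket-name'."
        ) from err
```

Chaining with `from err` keeps the original traceback available for debugging while the top-level message stays readable.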
exists() caught every exception and returned False, so a bad or under-scoped token looked identical to an empty bucket — and let the best-effort claim path sail past its check-then-write guard. 401/403 now raise with a pointer at the token; genuine not-found stays False. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
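The distinction this commit draws (not-found is `False`, auth failures raise) might look like the following. `HttpStatusError` is a hypothetical stand-in for an HTTP error carrying a status code, and `probe` stands in for the backend lookup; re-raising other errors unchanged is also an assumption, since the commit only specifies the 401/403 and not-found cases.

```python
from typing import Callable


class HttpStatusError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def exists(probe: Callable[[], bool]) -> bool:
    """Return False only for a genuine not-found; surface auth failures."""
    try:
        return probe()
    except FileNotFoundError:
        return False
    except HttpStatusError as err:
        if err.status_code in (401, 403):
            raise PermissionError(
                f"Access denied (HTTP {err.status_code}); check that the "
                "token is valid and has access to this bucket."
            ) from err
        raise  # assumption: anything else propagates unchanged
```

With this shape, a check-then-write guard cannot mistake a rejected token for an empty bucket.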
Message keys were messages/<recipient>/<int_seconds>_<sender>.json, so two messages from one sender inside the same second collided and the later silently overwrote the earlier. Keys now carry nanosecond resolution plus a uuid4 suffix, so every send is a distinct object. inbox now merges direct + broadcast messages and prints them in sent_at order instead of raw list order, which interleaved the two prefixes arbitrarily. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
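A rough sketch of the new key scheme and the inbox merge follows. The commit states only that keys carry nanosecond resolution plus a uuid4 suffix; the exact key layout, the helper names, and the assumption that `sent_at` values sort chronologically (for example epoch numbers or ISO-8601 UTC strings) are illustrative.

```python
import time
import uuid
from typing import Dict, List


def message_key(recipient: str, sender: str) -> str:
    """Build a distinct object key for every send (assumed layout)."""
    sent_ns = time.time_ns()       # nanosecond wall-clock timestamp
    suffix = uuid.uuid4().hex      # random suffix guards against same-tick sends
    return f"messages/{recipient}/{sent_ns}_{sender}_{suffix}.json"


def merge_inbox(direct: List[Dict], broadcast: List[Dict]) -> List[Dict]:
    """Merge both prefixes and order by sent_at instead of list order."""
    return sorted(direct + broadcast, key=lambda message: message["sent_at"])
```

The uuid suffix is what guarantees uniqueness; the timestamp mainly keeps keys roughly ordered when listed.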
Any agent could previously complete any step, including one another agent was actively working on. complete now reads claim.json first and fails with a clear error when the claim belongs to a different agent, unless --force is passed (escape hatch for crashed claim-holders). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
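The ownership guard can be sketched as a small check that runs before the completion write. The function name, the error type, and the `agent` field inside claim.json are assumptions for illustration; the rule (reject a different claim holder unless forced) is from the commit message.

```python
from typing import Optional


class ClaimOwnershipError(RuntimeError):
    """Hypothetical error raised when another agent holds the claim."""


def check_can_complete(claim: Optional[dict], agent: str, force: bool = False) -> None:
    """Raise if the step is claimed by a different agent and force is not set."""
    if claim is None or force:
        # Unclaimed steps stay completable; force is the escape hatch
        # for claim holders that crashed.
        return
    holder = claim.get("agent")  # assumed field name inside claim.json
    if holder != agent:
        raise ClaimOwnershipError(
            f"step is claimed by '{holder}', not '{agent}'; "
            "pass --force to complete it anyway"
        )
```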
claim.json (atomic) and status.json are two separate writes; a crash between them leaves a claim with no status. step-status and wait-for now resolve that state as in_progress by the claiming agent via a shared _effective_status helper — the claim is the authoritative write. Invariant documented in CLAUDE.md. Also warn at claim time on the HF backend that claims are best-effort (no conditional-write upstream), matching the note init prints. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
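The crash-window rule can be illustrated with a resolver over the two documents. This is not the `_effective_status` helper itself, whose signature is not shown here; the function below and the `state` and `agent` field names are assumptions.

```python
from typing import Optional


def effective_status(claim: Optional[dict], status: Optional[dict]) -> Optional[dict]:
    """Resolve what readers should see for a step.

    - status.json present: use it as written.
    - claim.json present but no status.json: the writer crashed between
      the two writes, so report in_progress by the claiming agent.
    - neither present: the step has not been started.
    """
    if status is not None:
        return status
    if claim is not None:
        return {"state": "in_progress", "agent": claim.get("agent")}
    return None
```

Routing every reader through one resolver keeps step-status and wait-for from disagreeing about the same step.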
A blocked step never completes on its own, so waiters used to spin until the full timeout. wait-for now exits non-zero immediately with a clear message naming the blocked step. needs_review still counts as waiting but is called out in the progress line so a human can step in. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
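The per-poll decision might be expressed as below. The function name, return shape, and message wording are hypothetical; the state names (complete, blocked, needs_review) are the ones used elsewhere in this PR.

```python
from typing import Dict, Tuple


def evaluate_wait(statuses: Dict[str, str]) -> Tuple[str, str]:
    """Decide one polling round: 'blocked', 'done', or 'waiting'."""
    blocked = sorted(step for step, state in statuses.items() if state == "blocked")
    if blocked:
        # A blocked step will not finish by itself, so stop right away.
        return "blocked", f"step blocked: {', '.join(blocked)}"
    pending = sorted(step for step, state in statuses.items() if state != "complete")
    if not pending:
        return "done", "all steps complete"
    review = [step for step in pending if statuses[step] == "needs_review"]
    message = f"waiting on: {', '.join(pending)}"
    if review:
        # Still waiting, but flag it so a human can step in.
        message += f" (needs_review: {', '.join(review)})"
    return "waiting", message
```

A caller would map `blocked` to a non-zero exit, `done` to zero, and keep polling on `waiting` until its timeout.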
Zero imports of either anywhere in the package or tests — leftovers from the pre-pivot FastAPI scaffolding. Runtime deps are now just click + boto3. Also remove two unused imports in test_session_cli.py (pre-existing ruff check failures). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
moto-backed (no network):
- claim race: two claimers, exactly one wins, holder preserved
- same-second message bursts keep every message (uuid-suffixed keys)
- inbox merges direct + broadcast chronologically by sent_at
- complete rejects a non-owner without --force, allows with it
- wait-for fails fast on blocked, names needs_review while waiting
- claim.json-without-status.json reads as in_progress (crash window)

HF backend tests mock HfApi/HfFileSystem in-memory: private-by-default creation, --public opt-out, real visibility readback, the existing-public-bucket warning, actionable missing-bucket write errors, and exists() raising on 401/403 instead of returning False.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replaces test.yml. Lint job runs ruff check (new — lint errors were previously uncaught) plus the existing ruff format --check; pytest runs on Python 3.10 and 3.12 with dev+huggingface extras, on push and PR. README badge updated to point at the new workflow. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Same-second message collisions and the empty test suite are fixed; claim TTL and heartbeat refresh stay open (need design decisions). Bucket-layout sketch updated to the new message key shape. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Retitle the section and spell out the four harnesses, incremental cursor uploads (safe to re-run on a cron; seq derived from the bucket), default secret redaction with per-pattern counts in meta.json, and replay via 'session show --tail'. Session commands were already in the CLI reference block. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Inline --access-key/--secret-key flags leak into shell history; the init command already reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, so show that path. Note that .tracecraft.json is written chmod 600 and auto-added to .gitignore. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Private by default at creation, --public opt-out, real visibility shown in init output, and the delete+recreate caveat (no update_bucket upstream). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Positions tracecraft against in-process frameworks, server-backed stores, and live wire protocols; and is honest about pre-alpha status: no claim TTL, heartbeat not refreshed after init, HF claims best-effort. Links to open issues. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The CLI is the stable interface; get_store() is the documented escape hatch for direct bucket access from Python. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
docker-compose.dev.yml drops postgres, redis, and seaweedfs — leftovers from the pre-pivot server design; the shipped CLI needs exactly one S3-compatible bucket. .env.example rewritten to the variables the code actually reads (AWS creds, HF_TOKEN, TRACECRAFT_AGENT, harness path overrides) instead of JWT/UI/database/monitoring leftovers. Quick start references the compose file. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…+d8a9fa7; trees verified identical)
Summary
Three-part pass over the SDK: HF backend onboarding/privacy (closes #7, closes #8), coordination correctness bugs, and a docs/dev-env cleanup. One commit per change, in review order.
HF backend (#7, #8)
- ensure_bucket() actually creates the bucket via HfApi.create_bucket(exist_ok=True): private by default, --public to opt out (init --backend hf against a fresh bucket no longer dies cryptically on first write)
- init prints the bucket's real visibility read back from bucket_info(), and warns loudly when a pre-existing bucket is PUBLIC (only remedy is delete + recreate; no update_bucket upstream)
- Writes to a missing bucket now raise an error that names the bucket and points at tracecraft init, instead of the raw "repository and revision" error
- exists() no longer swallows 401/403 as False

Coordination correctness
- inbox merges direct + broadcast chronologically by sent_at
- complete rejects a non-owner of a claimed step unless --force
- claim.json (atomic) + status.json are two writes; readers now treat claim-without-status as in_progress by the claiming agent (invariant documented in CLAUDE.md)
- wait-for fails fast on blocked steps instead of spinning to timeout; needs_review steps are named in the progress line

Hygiene & docs
- Removed unused httpx + pydantic deps (runtime is now click + boto3)
- ci.yml: ruff check (new) + format check + pytest on 3.10/3.12
- docker-compose.dev.yml is MinIO-only; .env.example lists only variables the code reads

Test plan
- pytest sdk/tests/: 91 passed (moto-backed S3 + mocked HfApi/HfFileSystem, no network)
- ruff check + ruff format --check clean
- docker compose -f docker-compose.dev.yml config -q validates

🤖 Generated with Claude Code