diff --git a/README.md b/README.md
index 71d2d7a..7be3945 100644
--- a/README.md
+++ b/README.md
@@ -1,32 +1,19 @@
# tracecraft
[](https://pypi.org/project/tracecraft-ai/)
+[](https://pypi.org/project/tracecraft-ai/)
[](https://opensource.org/licenses/MIT)
+[](https://github.com/Arrmlet/tracecraft/actions/workflows/test.yml)
-Persistent shared memory and coordination layer for AI agents. Any agent can store, share, and retrieve data from the same bucket — memory, messages, tasks, and artifacts. Works with any S3 or HuggingFace bucket.
+**Tracecraft is a CLI coordination layer for multi-agent AI systems** — shared **memory**, a **mailbox**, atomic task **claims**, **handoffs**, and **artifacts**, plus mirrored **session transcripts**, all stored as plain JSON in any **S3** or **HuggingFace** bucket. No server. No database. No SDK lock-in.
-```
- Agent 1 (designer) Agent 2 (developer)
- ┌──────────────────────┐ ┌──────────────────────┐
- │ tracecraft claim │ │ tracecraft wait-for │
- │ design │ │ design │
- │ │ │ ...waiting... │
- │ tracecraft complete │ ──────> │ │
- │ design --note "done"│ │ ✓ design complete │
- │ │ │ │
- │ │ <────── │ tracecraft send │
- │ │ │ designer "starting" │
- └──────────────────────┘ └──────────────────────┘
- \ /
- \ /
- ┌──────────────────────┐
- │ Any S3 bucket │
- │ (MinIO, AWS, R2, │
- │ HuggingFace) │
- └──────────────────────┘
-```
+
+
+
+
+> Two agents, one bucket — they can't grab the same work, enforced by an S3 conditional write. No server, no lock service. All state is plain JSON you own; open it in the MinIO console or [HuggingFace Hub](https://huggingface.co/buckets/arrmlet/tracecraft-test) and watch it live.
-
+---
## Quick start
@@ -34,108 +21,145 @@ Persistent shared memory and coordination layer for AI agents. Any agent can sto
pip install tracecraft-ai
```
-Start MinIO locally (or use AWS S3, Cloudflare R2, HuggingFace Buckets):
+The only infra is a bucket. For local dev, run MinIO (in production, point at AWS / R2 / HF instead):
+
```bash
-docker run -d -p 9000:9000 -e MINIO_ROOT_USER=admin -e MINIO_ROOT_PASSWORD=admin123456 minio/minio server /data
+docker run -d -p 9000:9000 \
+ -e MINIO_ROOT_USER=admin -e MINIO_ROOT_PASSWORD=admin123456 \
+ minio/minio server /data
```
-Initialize two agents:
+Register two agents against the same project:
+
```bash
# Terminal 1
-tracecraft init --project myproject --agent designer \
+tracecraft init --project demo --agent designer \
--endpoint http://localhost:9000 --bucket tracecraft \
--access-key admin --secret-key admin123456
-# Terminal 2
-tracecraft init --project myproject --agent developer \
+# Terminal 2 — same flags, --agent developer
+tracecraft init --project demo --agent developer \
--endpoint http://localhost:9000 --bucket tracecraft \
--access-key admin --secret-key admin123456
```
-Now they can coordinate:
-```bash
-# Designer claims a task and shares state
+Now the core move — **two agents cannot grab the same work**, with no lock service and no server to run:
+
+```console
+# Terminal 1 — designer claims the task
$ tracecraft claim design
Claimed step design as designer
-$ tracecraft memory set design.status "complete"
-Set design.status = complete
+# Terminal 2 — developer tries the SAME task, atomically rejected (S3 If-None-Match)
+$ tracecraft claim design
+Error: Step design already claimed by designer
-$ tracecraft send developer "Design is ready"
-Sent to developer: Design is ready
+# designer finishes and leaves a handoff note for whoever picks up next
+$ tracecraft complete design --note "API in api.py, see memory key design.contract"
+Completed step design
-# Developer checks messages and picks it up
-$ tracecraft inbox
-[2026-03-24T14:00:00Z] (direct) designer: Design is ready
+# developer was blocked on it — now it unblocks
+$ tracecraft wait-for design
+All steps complete: design
+```
-$ tracecraft memory get design.status
-complete
+Every call is stateless. Everything you just did is JSON files in the bucket — no server stayed running, nothing to tear down.
-$ tracecraft claim implementation
-Claimed step implementation as developer
-```
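The claim behavior in the transcript (first writer wins, the second is rejected) can be sketched locally. This is a minimal illustration, not tracecraft's implementation: it uses Python's exclusive-create file mode as a stand-in for the S3 `If-None-Match` conditional write, and the helper names `try_claim` and `claimed_by` are hypothetical.

```python
import json
import os
import tempfile


def try_claim(root, step, agent):
    """Create steps/<step>/claim.json only if it does not exist yet.

    Local stand-in for a conditional put: the write succeeds for exactly
    one caller and fails for everyone who arrives later.
    """
    path = os.path.join(root, "steps", step, "claim.json")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    try:
        # Mode "x" raises FileExistsError if the file is already there.
        with open(path, "x") as fh:
            json.dump({"agent": agent}, fh)
        return True
    except FileExistsError:
        return False


def claimed_by(root, step):
    """Return the agent recorded in the claim file for a step."""
    with open(os.path.join(root, "steps", step, "claim.json")) as fh:
        return json.load(fh)["agent"]


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    print(try_claim(root, "design", "designer"))   # first claim wins
    print(try_claim(root, "design", "developer"))  # second claim is rejected
    print(claimed_by(root, "design"))
```

The point of the design is that the check and the write are a single operation on the store, so no separate lock service has to arbitrate between the two agents.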
+---
+
+## Agents talk to each other
+
+Beyond claiming work, agents coordinate by messaging through the bucket — direct messages and broadcasts, each one a JSON file in a per-agent mailbox.
-Everything is stored as JSON files in S3. No servers. No databases.
+
+
+
+
+```bash
+tracecraft send developer "contract is in memory key design.contract"
+tracecraft inbox # read your direct + broadcast messages
+tracecraft send _broadcast "v1 cut at 3pm, wrap your tasks"
+```
---
-## What agents get
+## Why tracecraft
-- **Shared memory** — `tracecraft memory set/get/list` — persistent key-value state any agent can read/write
-- **Messaging** — `tracecraft send/inbox` — direct messages or broadcast to all agents
-- **Task claiming** — `tracecraft claim/complete` — claim steps so agents don't collide
-- **Barriers** — `tracecraft wait-for step1 step2` — block until dependencies complete
-- **Handoffs** — `tracecraft complete step --note "context for next agent"`
-- **Artifacts** — `tracecraft artifact upload/download/list` — share files between agents
-- **Agent registry** — `tracecraft agents` — see who's online and what they're working on
+- **Atomic task claims** — two agents never grab the same work, enforced by S3 `If-None-Match` conditional puts, with no central coordinator.
+- **Coordinate across hosts** — the bucket *is* the coordinator, so agents on different machines or clouds work together by default — not just processes sharing one laptop.
+- **No server, no database** — every CLI call is stateless; all state is JSON in a bucket you already own.
+- **Any backend, zero lock-in** — AWS, Cloudflare R2, MinIO, Backblaze B2, Wasabi, SeaweedFS, and HuggingFace Buckets all work today.
+- **Harness-agnostic** — Claude Code, Codex, OpenClaw, Hermes, bash, Python, or anything that can run a shell command.
+- **Coordination + reasoning together** — the events *and* each agent's full session transcript live in one bucket, not two systems.
-Works with any process that can call a CLI — Claude Code, OpenClaw, Hermes Agent, Codex, bash scripts, Python, anything.
+> Frameworks like CrewAI and LangGraph own the agent loop; memory layers like Mem0 store one agent's recall; in-process coordination tools assume every agent shares one machine. Tracecraft owns neither the loop nor the model — just the shared bucket the agents coordinate *through* — so it works across hosts, across clouds, and with any harness, via a plain CLI.
---
-## Storage backends
+## Coordination + reasoning in one bucket
-No vendor lock-in. Bring your own S3:
+Most coordination tools store the *events* — who claimed what, who messaged whom. Tracecraft stores those **and** each agent's full reasoning, by mirroring coding-agent session transcripts into the same bucket. When a run goes sideways, one `tracecraft session show` gives you the handoffs **and** the chain of thought behind them — same place, same JSON, no second system to wire up.
```bash
-# Local development (recommended to start)
-tracecraft init --endpoint http://localhost:9000 ... # MinIO
-tracecraft init --endpoint http://localhost:8333 ... # SeaweedFS
+tracecraft session mirror --harness claude-code # tail this session into the bucket
+tracecraft session show --tail 50 # read coordination + reasoning together
+```
-# Production
-tracecraft init --endpoint https://s3.amazonaws.com ... # AWS S3
-tracecraft init --endpoint https://xxx.r2.cloudflarestorage.com ... # Cloudflare R2
+Works with **Claude Code, Codex, OpenClaw, and Hermes**. Source transcripts are never modified; secret-shape redaction (AWS / Anthropic / OpenAI / HF / GitHub / Slack token patterns) is on by default and counted in metadata.
-# HuggingFace Buckets (browsable on the Hub)
-pip install tracecraft-ai[huggingface]
-tracecraft init --backend hf --bucket username/my-bucket ...
-```
+Harness matrix, storage formats, and redaction details → **[docs/session-mirror.md](docs/session-mirror.md)**
---
## How it works
-All coordination state is JSON files in S3:
+Every agent action is a JSON file under `//`:
```
-s3://bucket/project/
- agents/designer.json ← who's alive, what they're doing
- memory/design/status.json ← shared key-value state
- messages/developer/1234.json ← agent inboxes
- steps/design/claim.json ← who claimed what
- steps/design/status.json ← pending → in_progress → complete
- steps/design/handoff.json ← notes for the next agent
- artifacts/design/mockup.html ← shared files
+s3://bucket/demo/
+ agents/designer.json ← who's alive, what they're doing
+ memory/design/contract.json ← shared key-value state
+ messages/developer/1738f3_designer.json ← per-agent mailbox
+ steps/design/claim.json ← who claimed what (atomic)
+ steps/design/status.json ← pending → in_progress → complete
+ steps/design/handoff.json ← note for the next agent
+ artifacts/design/mockup.html ← shared files
+ sessions/claude-code//part-00000-….jsonl ← mirrored agent transcript
+ sessions/claude-code//meta.json ← cumulative session metadata
```
-Any agent that can call `tracecraft` can participate. Any S3 browser (MinIO console, AWS console, HuggingFace Hub) lets you watch agents coordinate in real-time.
+Any process that can call `tracecraft` participates. Any S3 browser (MinIO console, AWS console, HuggingFace Hub) lets you watch agents coordinate in real time. Atomicity details and the HuggingFace fallback are in **[docs/s3-architecture.md](docs/s3-architecture.md)**.
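The layout suggests a simple mapping from a memory key to an object path: the CLI reference says dots become path separators, and the tree shows `memory/design/contract.json` under the project prefix. A minimal sketch of that mapping, assuming the key `design.contract` and the `.json` suffix behave as the tree implies (the function name `memory_object_key` is hypothetical):

```python
def memory_object_key(project, key):
    """Map a dotted memory key to the object path implied by the layout.

    Assumption: each dot in the key becomes one path separator and the
    value is stored as a single .json object under <project>/memory/.
    """
    parts = [p for p in key.split(".") if p]
    if not parts:
        raise ValueError("memory key must not be empty")
    return f"{project}/memory/{'/'.join(parts)}.json"


if __name__ == "__main__":
    print(memory_object_key("demo", "design.contract"))
```

Because related keys share a prefix, listing everything under `demo/memory/design/` in any S3 browser shows all of the design-related state at once.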
+
+---
+
+## Backends
+
+Bring your own bucket — no vendor lock-in:
+
+| Backend | `init` flag | Notes |
+|---|---|---|
+| MinIO | `--endpoint http://localhost:9000` | recommended for local dev |
+| SeaweedFS | `--endpoint http://localhost:8333` | self-hosted |
+| AWS S3 | `--endpoint https://s3.amazonaws.com` | |
+| Cloudflare R2 | `--endpoint https://.r2.cloudflarestorage.com` | zero egress fees |
+| Backblaze B2 / Wasabi | S3-compatible endpoint | |
+| HuggingFace Buckets | `--backend hf --bucket user/name` | browsable on the Hub; `pip install tracecraft-ai[huggingface]` |
---
-## CLI reference
+## Use cases
+
+- **Multi-agent coding** — run several Claude Code / Codex agents in parallel; they claim modules, share artifacts, wait at barriers, and hand off context instead of stepping on each other.
+- **Autonomous research** — agents claim experiments, share results via memory, and avoid duplicating work across a fleet.
+- **Pipelines** — lint → test → build → deploy as claimed steps; each stage waits for its dependencies.
+
+---
+
+
+Full CLI reference
```bash
-tracecraft init # Configure S3 + project + agent
+tracecraft init # Configure backend + project + agent
tracecraft agents # Who's online?
tracecraft memory set # Write (dots become path separators)
@@ -147,50 +171,37 @@ tracecraft send _broadcast # Broadcast to all
tracecraft inbox # Read messages
tracecraft inbox --delete # Read and clear
-tracecraft claim # Claim a step
-tracecraft complete [--note X] # Mark done + handoff
+tracecraft claim # Claim a step (atomic)
+tracecraft complete [--note X] # Mark done + handoff note
tracecraft step-status # Check status
tracecraft wait-for # Block until complete (default 300s timeout)
-tracecraft artifact upload [--step id] # Share a file
-tracecraft artifact download [--step id] # Get a file
+tracecraft artifact upload [--step id] # Share a file
+tracecraft artifact download [--step id] # Get a file
tracecraft artifact list [--step id] # List files
+
+tracecraft session mirror --harness # Mirror a session into the bucket
+tracecraft session list # Browse mirrored sessions
+tracecraft session show [--tail N] # Inspect meta + transcript tail
+tracecraft session stop # Clear local state, mark ended
```
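A barrier like `wait-for` can be pictured as a polling loop with a deadline. The sketch below is an assumption about the general shape, not tracecraft's code: the README only states the 300-second default timeout, so the polling interval, the `is_complete` callback, and the function name are all hypothetical.

```python
import time


def wait_for(is_complete, steps, timeout=300.0, interval=2.0,
             clock=time.monotonic, sleep=time.sleep):
    """Block until every step reports complete, or raise TimeoutError.

    is_complete(step) stands in for reading the step's status object
    from the bucket; clock and sleep are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        pending = [s for s in steps if not is_complete(s)]
        if not pending:
            return True
        if clock() >= deadline:
            raise TimeoutError(f"still waiting on: {', '.join(pending)}")
        sleep(interval)


if __name__ == "__main__":
    finished = {"design"}
    print(wait_for(lambda step: step in finished, ["design"], timeout=1))
```

Since every check is just a read of the step's status, the waiting agent keeps no connection open and can be killed and restarted without leaving state behind.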
-For multiple agents in the same directory, set identity via env var:
+Run multiple agents from one directory by overriding identity per call:
+
```bash
-TRACECRAFT_AGENT=designer tracecraft inbox
+TRACECRAFT_AGENT=designer tracecraft inbox
TRACECRAFT_AGENT=developer tracecraft inbox
```
----
-
-## Use cases
-
-**Multi-agent coding** — Run 4 Claude Code agents in worktrees. They claim modules, share artifacts, wait at barriers, hand off context.
-
-**Autonomous research** — Run hundreds of autoresearch experiments. Agents claim experiments, share results via memory, avoid duplicating work.
-
-**Collaborative knowledge bases** — Multiple agents build a wiki together. One processes papers, another writes summaries, a third checks consistency. All coordinated through shared memory and messaging.
-
-**CI/CD pipelines** — Lint → test → build → deploy as tracecraft steps. Each stage claims its step and waits for dependencies.
-
----
-
-## Example coordination
-
-Two Claude Code agents coordinating through tracecraft via HuggingFace Buckets:
-
-
-
-> See full coordination data (agents, memory, messages, steps, artifacts) stored as JSON on the Hub:
-> [huggingface.co/buckets/arrmlet/tracecraft-test](https://huggingface.co/buckets/arrmlet/tracecraft-test)
+
---
-## Works with
+## More
-Tested with Claude Code, OpenAI Codex, and Hermes Agent. Works with any agent or script that can run a shell command.
+- [docs/session-mirror.md](docs/session-mirror.md) — session mirroring: harnesses, formats, redaction
+- [docs/s3-architecture.md](docs/s3-architecture.md) — atomicity, key layout, HuggingFace fallback
+- [plans/](plans/) — roadmap, research, and known gaps
---
diff --git a/docs/CI_CD_GUIDE.md b/docs/CI_CD_GUIDE.md
deleted file mode 100644
index e9092bf..0000000
--- a/docs/CI_CD_GUIDE.md
+++ /dev/null
@@ -1,429 +0,0 @@
-# CI/CD for tracecraft — a learning doc
-
-This is a working-engineer's introduction to CI/CD, grounded in the two workflows tracecraft uses today (`.github/workflows/test.yml` and `release.yml`). By the end you'll understand every line of YAML in this repo, the trust model that lets GitHub publish to PyPI without a stored password, and how to debug failures.
-
-Written for someone who knows Python and git but hasn't owned a CI/CD pipeline before. Skip sections you already know.
-
----
-
-## Part 1 — The concepts, in 5 minutes
-
-### What CI/CD actually is
-
-Two related-but-distinct things:
-
-- **Continuous Integration (CI)** — every time someone pushes code or opens a PR, a fresh computer (a "runner") checks the code out, installs dependencies, and runs your tests. If anything is broken, you see it within seconds on the commit/PR. The point is to catch breakage *before* it merges to `main`, not after.
-
-- **Continuous Delivery (CD)** — when you mark a commit as a release (cut a tag, click "Publish release"), a fresh computer builds the shippable artifact (a Python wheel, a Docker image, a binary) and uploads it to wherever users get it (PyPI, Docker Hub, App Store). The point is to make releases boring and repeatable — no "did I remember to bump the version in both files?" mistakes.
-
-Sometimes people add **Continuous Deployment** (same acronym, different word) — automatically pushing every green commit to production. Tracecraft has no servers, so that doesn't apply here.
-
-### Why it exists
-
-Before CI/CD, releases were a checklist a human did by hand. Six steps, easy to skip one, easy to miss "this works on my machine" bugs. CI runs the checklist in a known-clean environment, every time, and fails loudly when it can't.
-
-The deeper point: **CI is the executable documentation of how your project works.** Someone reading your repo can look at `.github/workflows/test.yml` and learn "this is how you install and test this code." Conversely, if your CI passes on a fresh machine, you've proven the install instructions in your README actually work.
-
-### The GitHub Actions vocabulary
-
-GitHub Actions is one of many CI/CD systems. Others: CircleCI, GitLab CI, Jenkins, Travis CI. The concepts below are mostly universal; the keywords are GitHub-specific.
-
-- **Workflow** — one YAML file under `.github/workflows/`. One workflow = one purpose (run tests, publish release, run nightly job, etc.).
-- **Job** — a single unit of work inside a workflow. Jobs run in parallel unless you tell them to depend on each other. One job runs on one runner.
-- **Step** — a command inside a job. Steps run sequentially. If a step fails, the rest of the job stops.
-- **Runner** — the VM that executes the job. GitHub provides `ubuntu-latest`, `macos-latest`, `windows-latest`. You can also self-host runners.
-- **Trigger / `on:`** — what causes the workflow to fire. `push`, `pull_request`, `release`, `schedule` (cron), `workflow_dispatch` (manual button), and more.
-- **Matrix** — a single job that runs N times with different variable values (e.g., one per Python version). Saves duplication.
-- **Action** — a reusable building block, e.g. `actions/checkout@v4`. Other people's code you call from your workflow. Hosted on the GitHub Marketplace.
-- **Secrets / variables** — encrypted values stored on GitHub, available to workflows. Used for API tokens, etc. *We deliberately don't use stored secrets for PyPI — see Part 4.*
-- **Concurrency** — controls whether multiple runs of the same workflow can run at once. Useful to cancel old runs when you push twice in a row.
-- **Artifact** — files a job produces that you want to keep (build outputs, screenshots, coverage reports). Stored on GitHub for 90 days by default.
-
-### Cost
-
-For tracecraft (public repo): **free, unlimited**. GitHub gives unlimited Actions minutes to public repos. PyPI is always free for public packages.
-
-For private repos: free tier is 2,000 minutes/month, then ~$0.008/min on Linux. A 30-second run × 10 pushes/day × 30 days = 9,000 seconds, or 2.5 hours (150 minutes) of CI per month, well under the free tier.
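The private-repo estimate can be checked with a few lines of arithmetic. The sketch below computes raw runtime and, separately, a billed figure under the assumption that each job's time is rounded up to a whole minute and that a matrix run counts as one job per cell; treat that billing model as an assumption to confirm against GitHub's billing documentation. The function names are illustrative.

```python
import math


def runtime_minutes(run_seconds, runs_per_day, days):
    """Raw wall-clock minutes of CI per period, with no rounding."""
    return run_seconds * runs_per_day * days / 60


def billed_minutes(job_seconds, jobs_per_run, runs_per_day, days):
    """Billed minutes, assuming each job rounds up to a whole minute."""
    per_job = math.ceil(job_seconds / 60)
    return per_job * jobs_per_run * runs_per_day * days


if __name__ == "__main__":
    print(runtime_minutes(30, 10, 30))    # raw runtime for one job per run
    print(billed_minutes(30, 1, 10, 30))  # one job per run, rounded up
    print(billed_minutes(30, 4, 10, 30))  # four matrix cells per run
```

Even the most pessimistic of these figures stays under the 2,000-minute free tier quoted above.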
-
----
-
-## Part 2 — `test.yml` line by line
-
-Here's the actual file in this repo, with annotations.
-
-```yaml
-name: tests
-```
-The display name for the workflow in GitHub's UI. Shows up as "tests" on commits and PRs.
-
-```yaml
-on:
- push:
- branches: [main]
- pull_request:
- branches: [main]
-```
-**The trigger.** "Run this workflow when (a) someone pushes to `main`, or (b) someone opens/updates a PR targeting `main`." If we removed the `branches:` filter, the workflow would also run on pushes to feature branches — wasteful since the PR run already covers that.
-
-> *Note:* YAML 1.1 interprets the unquoted word `on` as the boolean `true` when parsed by some libraries. GitHub Actions handles this correctly. Just leave it as `on:` — no need to quote it.
-
-```yaml
-jobs:
- pytest:
-```
-One job named `pytest`. The name appears as the status check on PRs (`pytest (3.10)`, `pytest (3.11)`, etc., because of the matrix below).
-
-```yaml
- runs-on: ubuntu-latest
-```
-Use GitHub's latest Ubuntu runner. As of 2026 that's Ubuntu 24.04. Other choices: `ubuntu-22.04`, `macos-latest`, `windows-latest`. Ubuntu is the cheapest and fastest; we add macOS/Windows only when needed.
-
-```yaml
- strategy:
- fail-fast: false
- matrix:
- python-version: ["3.10", "3.11", "3.12", "3.13"]
-```
-The **matrix**. This single job definition gets expanded into 4 parallel runs, each with `${{ matrix.python-version }}` set to one of the listed versions. `fail-fast: false` means "if 3.10 fails, keep running 3.11/3.12/3.13 anyway" — useful because failures are often version-specific and you want to see them all.
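To make the expansion concrete, here is a rough Python model of what a matrix does: take the Cartesian product of the listed values and produce one job per combination. This mimics the behavior described above and is not GitHub's implementation; `expand_matrix` is a hypothetical helper, and the label format follows the `pytest (3.10)` check names mentioned earlier.

```python
from itertools import product


def expand_matrix(job_name, matrix):
    """Return one (label, variables) pair per matrix combination."""
    keys = list(matrix)
    cells = []
    for values in product(*(matrix[k] for k in keys)):
        variables = dict(zip(keys, values))
        label = f"{job_name} ({', '.join(str(v) for v in values)})"
        cells.append((label, variables))
    return cells


if __name__ == "__main__":
    matrix = {"python-version": ["3.10", "3.11", "3.12", "3.13"]}
    for label, variables in expand_matrix("pytest", matrix):
        print(label, variables)
```

Adding a second key, for example an `os` list with two entries, would double the number of cells, which is why matrices are usually kept small.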
-
-```yaml
- steps:
- - uses: actions/checkout@v4
-```
-**Step 1**: `actions/checkout@v4` is an official GitHub action that does `git clone` into the runner. The `@v4` is a version pin — major version 4. You should always pin actions; `@main` would mean "whatever they push" which can break you.
-
-```yaml
- - name: Set up Python ${{ matrix.python-version }}
- uses: actions/setup-python@v5
- with:
- python-version: ${{ matrix.python-version }}
- cache: pip
-```
-**Step 2**: installs the matrix Python version. `cache: pip` tells the action to cache `~/.cache/pip` between runs — speeds up subsequent runs by 30-60s because dependencies don't redownload.
-
-`${{ matrix.python-version }}` is GitHub's expression syntax: it substitutes the current matrix value. So this step runs four times across the four matrix cells with `3.10`, `3.11`, `3.12`, `3.13`.
-
-```yaml
- - name: Install package + dev extras
- working-directory: sdk
- run: |
- python -m pip install --upgrade pip
- pip install -e ".[dev,huggingface]"
-```
-**Step 3**: install tracecraft + its test/dev dependencies. The `|` lets you write multiline shell. `working-directory: sdk` means commands run from `sdk/`. `[dev,huggingface]` pulls the optional extras defined in `sdk/pyproject.toml`.
-
-```yaml
- - name: Run tests
- run: pytest sdk/tests/ -v
-```
-**Step 4**: actually run the tests. `working-directory` is back to repo root because we didn't specify one here. `-v` is verbose (one line per test). Exit code 0 = green check, non-zero = red X.
-
-That's the whole file. Fewer than 30 lines of YAML, and it gives you "the tests pass on 4 Python versions on Ubuntu" on every push to `main` and every PR targeting it.
-
-### What you'll see in the GitHub UI
-
-- On the commit list, a small ✓ or ✗ icon next to the commit hash.
-- On a PR, a "Checks" tab showing each matrix cell separately.
-- Click into a run to see logs per step.
-- The workflow file itself appears in the "Actions" tab.
-
----
-
-## Part 3 — `release.yml` line by line
-
-```yaml
-name: release
-
-on:
- release:
- types: [published]
-```
-Triggered by the `release.published` event, which fires when you click "Publish release" in the GitHub UI (or run `gh release create v0.2.0 ...`). NOT triggered by simply pushing a tag — there's a distinction. A tag is just a label on a commit; a "release" is a tag plus optional metadata (notes, attached binaries). We use the release event because it gives you a confirmation step before publication.
-
-```yaml
-jobs:
- build-and-publish:
- runs-on: ubuntu-latest
- environment:
- name: pypi
- url: https://pypi.org/project/tracecraft-ai/
-```
-The job runs in an **environment** called `pypi`. Environments are a GitHub feature for adding extra protection around sensitive jobs:
-- You can require manual approval before the job runs.
-- You can restrict which branches can deploy to the environment.
-- The environment shows up in the GitHub UI with the URL above as a clickable link.
-
-For tracecraft, the environment also matches what we'll tell PyPI to trust (in Part 4).
-
-```yaml
- permissions:
- id-token: write # required for PyPI trusted publishing
- contents: read
-```
-**This is the magic.** GitHub Actions has a per-job permission model. By default, the `GITHUB_TOKEN` (auto-generated for each run) has read-only access. `id-token: write` is what lets the job request an **OIDC token** — a short-lived JWT signed by GitHub that proves to PyPI "yes, this is the genuine release.yml workflow on Arrmlet/tracecraft running right now."
-
-`contents: read` keeps the rest of the permissions minimal — we don't need to write to the repo, only read its files.
-
-```yaml
- steps:
- - uses: actions/checkout@v4
- with:
- ref: ${{ github.event.release.tag_name }}
-```
-Check out the repo at the tag of the release that triggered this run. For a `release` event, `actions/checkout` already defaults to the tagged commit rather than the default branch, so the explicit `ref:` is a safeguard that documents intent: whatever gets pushed to `main` after the release is created, the wheel we publish is exactly the code in the release.
-
-```yaml
- - name: Set up Python
- uses: actions/setup-python@v5
- with:
- python-version: "3.12"
-```
-Only one Python version needed for building — the wheel is pure Python (`py3-none-any.whl`), so any modern Python builds it correctly.
-
-```yaml
- - name: Sync root README into sdk/ before build
- run: cp README.md sdk/README.md
-```
-A tracecraft-specific quirk. The Python package source lives in `sdk/`, but the README we want users to see on PyPI lives at the repo root. We copy it into `sdk/` before build so setuptools picks it up.
-
-```yaml
- - name: Build sdist + wheel
- working-directory: sdk
- run: |
- python -m pip install --upgrade pip build
- python -m build
-```
-Runs Python's modern build tool. Produces two files in `sdk/dist/`:
-- `tracecraft_ai-X.Y.Z.tar.gz` — the *source distribution* (sdist). What `pip` falls back to if no wheel is available.
-- `tracecraft_ai-X.Y.Z-py3-none-any.whl` — the *wheel*. Pre-built, no compilation needed on install.
-
-```yaml
- - name: Verify artifacts
- working-directory: sdk
- run: |
- pip install twine
- twine check dist/*
- python -m venv /tmp/verify
- /tmp/verify/bin/pip install dist/*.whl
- /tmp/verify/bin/tracecraft --version
-```
-Three sanity checks before publishing:
-1. `twine check` validates the wheel metadata (README rendering, classifiers, etc.).
-2. Install the wheel in a fresh venv — proves it's installable.
-3. Run `tracecraft --version` — proves the CLI entry point actually works.
-
-If any of these fail, the workflow stops here and doesn't publish a broken package.
-
-```yaml
- - name: Publish to PyPI
- uses: pypa/gh-action-pypi-publish@release/v1
- with:
- packages-dir: sdk/dist/
-```
-**The actual publish step.** This action sends the contents of `sdk/dist/` to PyPI using the OIDC token from earlier. Notice there is no `password:` or `username:` or `token:` field — that's the whole point of trusted publishing.
-
-This step fails until you configure PyPI to trust this workflow (Part 4).
-
----
-
-## Part 4 — PyPI Trusted Publishing setup
-
-This is the one-time browser configuration that unlocks `release.yml`. It's required because PyPI doesn't blindly accept uploads from any GitHub workflow — it needs to know which workflows you trust.
-
-### Background — why trusted publishing exists
-
-The old way: generate a PyPI API token, save it as a GitHub Secret, reference it in the workflow. Problems:
-- The token is long-lived. If your GitHub account is breached, the attacker has your PyPI publish access.
-- Hard to rotate; everyone forgets to.
-- One leak from any project = total PyPI account takeover.
-
-The new way (introduced 2023, mature in 2024-2025): **OIDC trusted publishing.** GitHub generates a short-lived token *per run*, signed by GitHub, that proves "this is genuinely the `release.yml` workflow in `Arrmlet/tracecraft` running right now, on a runner GitHub controls, for the tag `v0.2.0`." PyPI verifies that signature and accepts the upload.
-
-Properties:
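-- The credentials are short-lived: the PyPI API token exchanged for the OIDC token expires after about 15 minutes, and the OIDC token can only be requested from inside that specific job.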
-- No secret stored anywhere — there's nothing to leak.
-- Scoped to one workflow file in one repo. An attacker would need to compromise GitHub itself.
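The matching idea behind these properties can be illustrated by looking inside a token. A JWT's middle segment is base64url-encoded JSON, so its claims can be read and compared with a publisher configuration. The sketch below is illustrative only: it does not verify the signature (the step that actually provides the security), it is not PyPI's verification logic, and the example token is fabricated locally. The claim names `repository`, `workflow_ref`, and `environment` are the ones I understand GitHub's OIDC tokens to carry; confirm them against GitHub's documentation.

```python
import base64
import json


def b64url_encode(data):
    """Base64url-encode bytes without padding, as JWT segments do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_claims(token):
    """Read the claims from a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload))


def matches_publisher(claims, owner, repo, workflow, environment):
    """Compare claims with a trusted-publisher style configuration."""
    expected_prefix = f"{owner}/{repo}/.github/workflows/{workflow}@"
    return (
        claims.get("repository") == f"{owner}/{repo}"
        and claims.get("workflow_ref", "").startswith(expected_prefix)
        and claims.get("environment") == environment
    )


# Fabricated example claims and token, for illustration only.
example_claims = {
    "repository": "Arrmlet/tracecraft",
    "workflow_ref": "Arrmlet/tracecraft/.github/workflows/release.yml@refs/tags/v0.2.0",
    "environment": "pypi",
}
example_token = ".".join([
    b64url_encode(b'{"alg":"none"}'),
    b64url_encode(json.dumps(example_claims).encode("utf-8")),
    "fake-signature",
])

if __name__ == "__main__":
    claims = decode_claims(example_token)
    print(matches_publisher(claims, "Arrmlet", "tracecraft", "release.yml", "pypi"))
```

A mismatch in any one field, such as a different environment name, makes the comparison fail, which is the same class of error behind a rejected publish.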
-
-### Step-by-step setup
-
-You do this once, today. After that you never touch PyPI tokens for this project again.
-
-1. **Sign in to PyPI** at https://pypi.org/.
-
-2. **Go to project settings** — https://pypi.org/manage/project/tracecraft-ai/settings/publishing/
-
- If that URL 404s, navigate manually: Account dropdown → "Your projects" → click `tracecraft-ai` → "Publishing" in the left sidebar.
-
-3. **Click "Add a new pending publisher"** or "Add a new trusted publisher."
-
-4. **Choose "GitHub" as the publisher.**
-
-5. **Fill in the form exactly:**
- - **Owner:** `Arrmlet`
- - **Repository name:** `tracecraft`
- - **Workflow filename:** `release.yml` (just the filename, not the path)
- - **Environment name:** `pypi`
-
- The `Environment name` here MUST match the `environment.name:` in `release.yml` (which is `pypi`). Case matters.
-
-6. **Click "Add."**
-
-That's it. PyPI now trusts `release.yml`. The next time you create a GitHub Release, the workflow will run end-to-end and publish to PyPI without any prompt.
-
-### Verifying it works
-
-Don't save the first run for an important release. The first release that goes through the workflow is the test, so make it a low-stakes one. Recommended:
-
-1. Make a tiny code change (a comment, a typo fix in README).
-2. Bump version to `0.1.6` in `sdk/pyproject.toml` and `sdk/tracecraft/__init__.py`.
-3. Commit, push, tag, push tag.
-4. `gh release create v0.1.6 --title "v0.1.6 — CI/CD test" --notes "Testing trusted publishing"`
-5. Watch the workflow in the Actions tab. Should turn green in ~1 minute.
-6. Verify on PyPI: `pip install --upgrade tracecraft-ai` → version should be `0.1.6`.
-
-If step 5 fails at "Publish to PyPI" with a 403 — go back and check the publisher config matches exactly (owner case, workflow filename, environment name).
-
----
-
-## Part 5 — Reading the GitHub Actions UI
-
-When you push or create a PR, here's where the action is in the UI:
-
-### Per-commit status
-On the commit list, look for a circle/check/X next to the commit hash:
-- Yellow dot = running
-- Green check = all green
-- Red X = at least one job failed
-
-Hover or click for a summary. Click the icon to see the workflow details.
-
-### Per-PR status
-At the bottom of the PR, the "Checks" section shows each workflow. For a matrix workflow you'll see one row per matrix cell (`pytest (3.10)`, `pytest (3.11)`, etc.). Click "Details" to see logs.
-
-### Actions tab (`github.com/Arrmlet/tracecraft/actions`)
-The full history of all workflow runs. Filter by workflow on the left, by branch/event/status at the top. Click a run for detailed logs.
-
-### Inside a run
-You see the matrix cells (or single job) listed. Click one to expand the steps. Each step has its own logs and timing. Failed steps are highlighted red and auto-expand to show the error.
-
-### Re-running failed jobs
-If a run failed due to a flake (network blip, etc.), the "Re-run failed jobs" button at the top right re-runs only the failed cells. Re-runs preserve the commit SHA, so the new attempt is genuinely a do-over of the same code.
-
----
-
-## Part 6 — `gh` (GitHub CLI) for local interaction
-
-You don't have to use the browser UI. The `gh` CLI is faster:
-
-```bash
-# Watch the most recent run for the current branch
-gh run watch
-
-# List recent runs
-gh run list --limit 10
-
-# View the details of a specific run
-gh run view
-
-# View just the failed step logs
-gh run view --log-failed
-
-# Re-run a failed run
-gh run rerun
-
-# Cancel a stuck run
-gh run cancel
-
-# Create a release (this triggers release.yml)
-gh release create v0.2.0 --title "v0.2.0" --notes "..."
-
-# List releases
-gh release list
-
-# View one release
-gh release view v0.1.5
-```
-
----
-
-## Part 7 — Common failures and how to debug them
-
-### Test workflow goes red
-
-1. **Open the failed run** (Actions tab → click the red run).
-2. **Find the failing matrix cell.** Maybe only 3.10 failed — that narrows the cause to "Python 3.10 specific."
-3. **Expand "Run tests"** to see the pytest output. Same format as your local terminal.
-4. **Reproduce locally** with the same Python version: `pyenv install 3.10` → `pyenv local 3.10` → `pip install -e "sdk/[dev]"` → `pytest sdk/tests/`.
-5. **Fix and push again.** The workflow re-runs.
-
-### Workflow doesn't trigger at all
-
-- Check the `on:` filters: with `branches: [main]`, a push to a feature branch does not trigger the workflow by itself; it runs only once a PR targeting `main` is opened or updated.
-- Check `.github/workflows/` path. Typos like `.github/workflow/` won't be picked up.
-- Check the workflow YAML is valid. GitHub UI will show a "Workflow invalid" error in the Actions tab.
-
-### Release workflow fails at "Publish to PyPI"
-
-- The error message is usually `403 Forbidden` or `Invalid or non-existent authentication information`.
-- Cause: trusted publishing config doesn't match the actual workflow run.
-- Fix: double-check the PyPI publishing form. Owner case-sensitive. Workflow filename is just `release.yml` (no path). Environment name `pypi` matches the `environment.name:` in the YAML.
-
-### "Resource not accessible by integration" error
-
-- Cause: missing `permissions:` in the YAML. The default permissions are read-only.
-- Fix: explicitly request what you need (`id-token: write`, `contents: write`, etc.) in the job.
-
-### Action versions deprecated
-
-You may see a banner: "Node.js 20 actions are deprecated." This is GitHub's runtime, not your code. Fix by bumping action versions (e.g., `actions/checkout@v4` → `actions/checkout@v5` when released). Non-urgent unless the runner refuses to execute the action.
-
-### Cached pip install picks up wrong package
-
-If you change `pyproject.toml` deps but CI still uses the cached old version, force a cache refresh by changing the `cache:` config or, easiest, the lockfile/`pyproject.toml` hash will already invalidate the cache automatically (which is the point of `cache: pip`).
-
----
-
-## Part 8 — What we deliberately did NOT add (and why)
-
-These are common CI additions that aren't worth it for a small Python OSS project. Add them if you grow into the need; don't add them just because.
-
-| Feature | Why we skipped |
-|---|---|
-| Coverage reporting (codecov) | 12 tests at this scale tell you more than a coverage % does. Add when team-size justifies. |
-| Linting gate (ruff in CI) | Ruff is in `dev` extras; run locally. Blocking PRs on lint is friction for a solo maintainer. |
-| Pre-commit hooks | Local-only friction. Helpful with 3+ contributors; overkill solo. |
-| Dependabot / Renovate | Adds noise. Manual quarterly review of deps is fine at this scale. |
-| Branch protection rules | You're solo. Self-review is acceptable. Add when contributors arrive. |
-| Auto-version-bump (release-please, semantic-release) | Overengineering until 5+ releases/quarter. |
-| Windows / macOS runners | Add on first user bug report from those platforms. |
-| Nightly cron tests against real S3 | Premature; moto covers correctness, real S3 issues are rare. |
-| CodeQL security scanning | Free if you enable it. Useful eventually; not on the critical path. |
-| Slack / Discord notifications | The Actions email is enough until you have a team channel. |
-
-The principle: **CI complexity should match project stakes.** Right now tracecraft is small and the maintainer is one person; the two workflows we have are the right size. Re-evaluate when stakes change.
-
----
-
-## Part 9 — Where to learn more
-
-- **GitHub Actions docs** — https://docs.github.com/en/actions. The official reference. The "Quickstart" and "Workflow syntax" pages are the most useful.
-- **PyPI trusted publishing docs** — https://docs.pypi.org/trusted-publishers/
-- **awesome-actions** — https://github.com/sdras/awesome-actions. Curated list of useful actions.
-- **Anatomy of a workflow** — https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions
-- **GitHub Actions security hardening** — https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions. Becomes relevant when you start using secrets, deployments, or third-party actions.
-
----
-
-## TL;DR — what you have now
-
-- **`test.yml`** runs the 12 backtests on Python 3.10/3.11/3.12/3.13 every push and PR. Free, ~30s, catches regressions before merge.
-- **`release.yml`** builds + publishes to PyPI on every GitHub Release. Requires one-time trusted publishing setup at https://pypi.org/manage/project/tracecraft-ai/settings/publishing/ (Owner: `Arrmlet`, Repo: `tracecraft`, Workflow: `release.yml`, Environment: `pypi`).
-- **No tokens stored anywhere.** OIDC-based trust.
-- **Free** for public repos.
-
-The next time you ship is:
-```
-# bump version in two files, commit
-gh release create v0.2.0 --title "..." --notes "..."
-# walk away; PyPI has it in ~60 seconds
-```
diff --git a/docs/assets/tracecraft-claim-race.gif b/docs/assets/tracecraft-claim-race.gif
new file mode 100644
index 0000000..febf41a
Binary files /dev/null and b/docs/assets/tracecraft-claim-race.gif differ
diff --git a/docs/assets/tracecraft-messaging.gif b/docs/assets/tracecraft-messaging.gif
new file mode 100644
index 0000000..449037e
Binary files /dev/null and b/docs/assets/tracecraft-messaging.gif differ
diff --git a/docs/session-mirror.md b/docs/session-mirror.md
new file mode 100644
index 0000000..9837ab5
--- /dev/null
+++ b/docs/session-mirror.md
@@ -0,0 +1,126 @@
+# Session mirror
+
+`tracecraft session mirror` copies a coding agent's session transcript into your
+bucket, alongside the coordination state (memory, messages, claims, artifacts)
+that tracecraft already stores under the same `/` prefix. One bucket
+ends up holding the full record of a multi-agent run: every agent's reasoning
+**and** every message between them.
+
+Sessions are never modified at the source. The mirror is a read-only tail.
+
+## Supported harnesses
+
+| `--harness` | Source | Storage |
+|---|---|---|
+| `claude-code` | `~/.claude/projects//.jsonl` | append-only JSONL |
+| `codex` | `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl` | append-only JSONL |
+| `openclaw` | `/agents//sessions/.jsonl` | append-only JSONL |
+| `hermes` | `~/.hermes/state.db` (`messages` table) | SQLite (WAL) |
+
+All four expose the same interface to the mirror loop via the `Harness`
+protocol (`sdk/tracecraft/harness/base.py`). Adding a fifth harness is one
+file plus a `REGISTRY` entry.
+
+### Harness notes
+
+- **OpenClaw** state dir resolves `OPENCLAW_STATE_DIR` → `OPENCLAW_HOME` →
+ `~/.openclaw`. `--dev`/`--profile ` map to `~/.openclaw-dev` /
+ `~/.openclaw-` — point `OPENCLAW_STATE_DIR` at those if you use them.
+ The mutable `sessions.json` index and `*.tmp` staging files are skipped.
+ Session ids are unique only within an `agentId`, so the mirrored id is
+ `__`.
+- **Hermes** stores sessions in SQLite, not a JSONL file. The adapter opens the DB **read-only**
+ (`mode=ro`, never `immutable`) so it is safe to run while Hermes is writing —
+ WAL mode allows concurrent readers. It reads new rows with
+ `WHERE id > :cursor ORDER BY id` (the same incremental pattern Hermes uses
+ internally) and synthesizes one JSON line per message. Multimodal `content`
+ stored with Hermes' `\x00json:` sentinel is decoded back to JSON.
+
+## Commands
+
+```bash
+tracecraft session mirror --harness [--session-id ID] [--cwd PATH]
+ [--no-redact] [--min-bytes N]
+tracecraft session list [--harness NAME] [--limit N] [--sort-by recent|size]
+tracecraft session show [--tail N]
+tracecraft session stop
+```
+
+### mirror
+
+Single-shot. Reads everything new since the last run, redacts, uploads it as a
+new part, updates `meta.json`, and advances the cursor. Safe to run repeatedly
+(e.g. from a cron job, a `SessionEnd` hook, or a `while sleep 5` loop).
+
+```bash
+# Auto-pick the most recent claude-code session for the current directory
+tracecraft session mirror --harness claude-code
+
+# Explicit session, codex
+tracecraft session mirror --harness codex --session-id abc123
+
+# Hermes (session id is the sessions.id TEXT value, e.g. 20260529_120000_abc123)
+tracecraft session mirror --harness hermes --session-id 20260529_120000_abc123
+```
+
+If `--session-id` is omitted, the most recently active session is chosen
+(for Hermes, the session owning the highest message id).
+
+### list / show / stop
+
+```bash
+tracecraft session list # every mirrored session
+tracecraft session show # print meta.json
+tracecraft session show --tail 50 # + last 50 lines of the transcript
+tracecraft session stop # clear local state, mark ended_at
+```
+
+## Bucket layout
+
+Additive — does not touch existing coordination keys.
+
+```
+//
+ agents/ memory/ messages/ steps/ artifacts/ ← coordination
+ sessions/
+ /
+ /
+ part-00000-.jsonl ← one per mirror flush, disjoint
+ part-00001-.jsonl
+ meta.json ← cumulative metadata + redaction counts
+```
+
+Parts are append-disjoint and reassemble byte-for-byte (file harnesses) or
+row-for-row (Hermes). The `` suffix makes concurrent flushes from
+different machines collision-safe; reassembly sorts by sequence number.
+
+## The cursor model
+
+The mirror tracks a per-session **cursor** in
+`~/.tracecraft/mirror-state/.json`. The cursor is opaque:
+
+- file harnesses → a **byte offset**
+- Hermes → the highest **`messages.id`** (an AUTOINCREMENT rowid)
+
+`read_new(session, cursor)` returns `(new_bytes, new_cursor)` so advancement is
+race-free — the loop advances to exactly what it consumed, never to a
+separately-sampled size. Losing the state file is non-destructive: the next run
+re-derives the next part sequence number from a bucket LIST, and overlap is
+re-uploaded as a fresh part rather than clobbering existing ones.
+
+## Redaction
+
+Redaction is **on by default** and runs before any bytes leave the machine. It
+is a regex denylist (`sdk/tracecraft/redact.py`) covering AWS, Anthropic,
+OpenAI, HuggingFace, GitHub, and Slack token shapes plus bearer tokens. Every
+match is **counted** in `meta.json` (`redaction_counts`), never silently
+dropped.
+
+```bash
+tracecraft session mirror --harness claude-code # redaction on (default)
+tracecraft session mirror --harness claude-code --no-redact # raw, trusted buckets only
+```
+
+Redaction v0 catches well-known token shapes. It does **not** detect arbitrary
+secrets, custom internal token formats, or proprietary content. Treat it as a
+safety net, not a guarantee — and prefer a private bucket for session data.
diff --git a/plans/TRACES_V1_PLAN.md b/plans/TRACES_V1_PLAN.md
new file mode 100644
index 0000000..86bb5da
--- /dev/null
+++ b/plans/TRACES_V1_PLAN.md
@@ -0,0 +1,435 @@
+# traces-v1 — Session Mirror & Replay
+
+**Branch:** `traces-v1`
+**Target release:** `0.2.0`
+**Estimated effort:** 12–14 working days
+**Status:** drafted 2026-05-20
+
+---
+
+## 1. Why this exists (the only thing that matters)
+
+The 2026-05 market scan (`plans/MARKET_REPORT_SESSIONS_2026_05.md`) found
+session-mirroring is **commodity**:
+
+- Anthropic ships **SessionStore** (Claude-Code-native, opaque cloud)
+- HuggingFace ships **Storage Buckets + Agent Trace Viewer** (HF-only)
+- **DataClaw** (2.1k★) and **claude-sync** (119★) already mirror local JSONL
+
+So copying any of them is a waste. Tracecraft's session mirror only earns
+its place if it does **three things none of them do**:
+
+1. **Cross-backend.** Any S3-compatible bucket (AWS, R2, MinIO, B2, Wasabi)
+ *and* HF Buckets. The user owns the data; we never see it.
+2. **Sessions + coordination in one bucket.** Tracecraft already stores
+ memory / mailbox / claims / artifacts under `/`. Putting
+ harness sessions under the same `/sessions/` namespace
+ means one bucket holds the *entire* multi-agent history.
+3. **Cross-harness replay.** Claude Code JSONL + Codex JSONL + tracecraft
+ coordination events merged into one timeline. This is the killer
+ demo: "watch four Claude Code agents coordinate, see each one's
+ reasoning, see the messages between them, in a single HTML."
+
+If at any point during implementation we feel pulled toward features
+that don't serve those three goals, stop and re-read this section.
+
+---
+
+## 2. Non-goals
+
+These look tempting and are deliberately excluded from `0.2.0`:
+
+- **Real-time UI.** Replay is a static HTML render of a finished bucket.
+ No live websocket, no dashboard server.
+- **LLM-based redaction.** Regex denylist v0 only; LLM redaction is a
+ later-tier item once we know the false-positive rate.
+- **Trace signing / SN13 submission.** That's `SN13_AGENT_TRACES_PITCH.md`
+ territory, separate 3-week de-risk plan.
+- **Anthropic SessionStore integration.** Their API, their schema,
+ their lock-in. We mirror the local JSONL — that's the open path.
+- **MCP server.** Already decided redundant given CLI + SKILL.md.
+- **Cursor / Cline / Aider support.** Claude Code + Codex first. Others
+ follow only if there's demand and a JSONL-equivalent format.
+- **TTL claims, heartbeat refresh, message-key collision.** These are
+ Tier 1 fixes from `RESEARCH_2026_05.md`. Bundle them in `0.2.1` if
+ traces-v1 doesn't subsume the need.
+
+---
+
+## 3. Scope: nine deliverables
+
+| # | Deliverable | Approx LoC | Days |
+|---|-------------|-----------|------|
+| D1 | `tracecraft session mirror` (Claude Code) | 150 | 2 |
+| D2 | Claude Code plugin (`.claude-plugin/`) | 250 | 1 |
+| D3 | Codex variant | 80 | 1 |
+| D4 | `tracecraft session list / show` | 80 | 1 |
+| D5 | `tracecraft replay` (the killer demo) | 350 | 2 |
+| D6 | Redaction v0 (regex denylist) | 100 | 0.5 |
+| D7 | Tests (moto + golden JSONL fixtures) | 250 | 1.5 |
+| D8 | Docs (README + SKILL.md + plugin README) | — | 1 |
+| D9 | Launch artifact (4-agent demo recording) | — | 1 |
+
+Total: ~1,260 LoC, 11 working days + 1 day slack.
+
+---
+
+## 4. Bucket layout (additive — does not touch existing keys)
+
+```
+//
+ …existing keys (agents/, memory/, messages/, steps/, artifacts/)…
+ sessions/
+ claude-code/
+ .jsonl ← raw JSONL stream (append-only)
+ .meta.json ← cwd, started_at, ended_at, agent_id,
+ line_count, redacted_count, schema_version
+ codex/
+ .jsonl
+ .meta.json
+ _index.json ← list of all sessions (rebuilt on each upload)
+```
+
+**Why a separate top-level `sessions/` instead of nesting under `agents/`:**
+sessions belong to a *harness instance*, not always to a registered tracecraft
+agent. A solo dev running Claude Code with no `tracecraft init agents/...` still
+benefits from the mirror. Linking to an `agent_id` is optional metadata.
+
+---
+
+## 5. D1 — `tracecraft session mirror` (the foundation)
+
+### Command
+```
+tracecraft session mirror [--harness claude-code|codex] [--session-id ]
+ [--watch-dir ] [--batch-seconds 5]
+ [--once] [--detach]
+```
+
+### Behaviour
+1. Auto-detect the active session if `--session-id` is omitted:
+ - **Claude Code:** glob `~/.claude/projects//*.jsonl`,
+ pick the one with the most recent `mtime`.
+ - **Codex:** glob `~/.codex/sessions////rollout-*.jsonl`,
+ same heuristic.
+2. Tail the file (resume from byte offset stored in
+ `~/.tracecraft/mirror-.state`).
+3. Every `--batch-seconds` (default 5), flush the new bytes to
+ `sessions//.jsonl` using
+ **multipart append via copy-then-put** (S3 has no native append;
+ we re-upload the growing object, see §5.1).
+4. Update `.meta.json` on every flush.
+5. Track PID in `~/.tracecraft/mirror.pid` (per-session, not global) so
+ the user can `tracecraft session stop ` cleanly.
+6. `--detach` forks a background process (Unix `os.fork()`,
+ on Windows fall back to subprocess + log file).
+7. `--once` does a single sync and exits (good for cron / hooks).
+
+### 5.1 Append strategy on S3
+
+S3 has no `append`. Options considered:
+
+| Option | Pros | Cons | Verdict |
+|--------|------|------|---------|
+| Re-upload full file every batch | Trivial | Cost grows O(n²) for long sessions | ✗ |
+| One object per batch (`..jsonl`) | Cheap, no read-back | Replay must list+merge | ✓ chosen |
+| S3 multipart upload kept open | True append-ish | Multipart sessions abort on agent crash | ✗ |
+
+**Chosen:** one object per batch. Final layout:
+```
+sessions/claude-code//
+ part-00000.jsonl
+ part-00001.jsonl
+ …
+ meta.json
+```
+Replay/show concatenates parts in order. `tracecraft session compact `
+(later) merges into one file for archival.
+
+Trade-off accepted: more LIST operations during replay, which are cheap on S3
+($0.005 per 1,000 LIST requests). For long sessions this is materially cheaper than re-uploading the full file every batch.
+
+### 5.2 State file format
+
+`~/.tracecraft/mirror-state/.json`:
+```json
+{
+ "harness": "claude-code",
+ "session_id": "abc123",
+ "source_path": "/Users/x/.claude/projects/.../abc123.jsonl",
+ "bucket_prefix": "sessions/claude-code/abc123/",
+ "byte_offset": 142857,
+ "next_part_seq": 12,
+ "last_flush": "2026-05-20T10:15:00Z",
+ "pid": 4523
+}
+```
+
+### 5.3 Graceful shutdown
+- `SIGTERM` / `SIGINT` → flush pending buffer, write final meta, remove pid.
+- Crash → state file lets next `mirror` invocation resume from `byte_offset`.
+- Idempotency: if `part-.jsonl` already exists at the target key,
+ bump `next_part_seq` until empty slot found (defends against duplicate
+ uploads after partial crash).
+
+---
+
+## 6. D2 — Claude Code plugin
+
+### Why a plugin (vs a hook the user installs manually)
+The whole point is **zero-friction**. If the user has to edit JSON
+config files, we lose. `/plugin install tracecraft` should be the path.
+
+### Files in `plugins/claude-code/`
+```
+plugins/claude-code/
+ .claude-plugin/
+ plugin.json ← name, version, hooks, commands
+ hooks/
+ session-start.sh ← spawns `tracecraft session mirror --detach`
+ session-end.sh ← `tracecraft session stop $CLAUDE_SESSION_ID`
+ skills/
+ tracecraft.md ← SKILL.md so Claude inside Claude Code knows
+ how to use tracecraft for coordination
+ commands/
+ tc-mirror.md ← /tc-mirror slash command (start/stop/status)
+ tc-replay.md ← /tc-replay slash command
+ README.md
+```
+
+### Submission target
+Anthropic's plugin marketplace + GitHub direct-install path
+(`/plugin install Arrmlet/tracecraft`).
+
+### Open question to resolve during impl
+Does the `SessionStart` hook fire on `claude --resume`? If not, we also
+need a `UserPromptSubmit` hook with a "have we started mirroring?" guard.
+(Test on day 1 of D2; cheap to verify.)
+
+---
+
+## 7. D3 — Codex variant
+
+Codex CLI writes to `~/.codex/sessions////rollout-*.jsonl`.
+Schema differs (it's not Claude-Code JSONL) but the *act of tailing* is
+identical. ~80 LoC (per the §3 estimate): just a new `Harness` adapter that knows the
+glob pattern and (optionally) translates entries to a normalized schema.
+
+For replay we'll keep entries in their native schema and let the
+renderer handle two harness types side-by-side. **No premature
+normalization** — if a third harness lands, then we extract a base.
+
+---
+
+## 8. D4 — `session list` / `session show`
+
+```
+tracecraft session list [--harness claude-code|codex] [--limit 20]
+tracecraft session show [--tail 50]
+tracecraft session stop
+```
+
+Reads `/sessions/_index.json`. `_index.json` is rewritten on
+each meta update (write whole file — it's tiny; ~1 KB per 100 sessions).
+
+---
+
+## 9. D5 — `tracecraft replay` (the killer demo)
+
+This is where tracecraft stops looking like "yet another session
+mirror" and becomes a coordination viewer.
+
+### Command
+```
+tracecraft replay [--project ] [--out replay.html] [--open]
+ [--since ] [--until ]
+```
+
+### What it does
+1. Pulls **all** of `/`:
+ - `agents/*.json` (registered agents)
+ - `memory/*.json` (every memory write — but memory keys don't have
+ timestamps; we'll need to add `_updated_at` to memory writes —
+ small backwards-compatible change)
+ - `messages/**/*.json` (every message)
+ - `steps/**/*.json` (every claim/handoff/status)
+ - `sessions/**/part-*.jsonl` (every harness event)
+2. Builds a unified timeline (single sorted array of events,
+ each tagged with `event_type` and `agent_id`).
+3. Renders a **single self-contained HTML file** (no server) with:
+ - vertical timeline (newest at top or oldest at top, toggle)
+ - one swim-lane per agent
+ - colour-coding: coordination events (claim/message/memory) vs
+ harness events (tool-use, reasoning, file-edit)
+ - click any event → expand JSON
+ - filter by agent / event type / text search
+
+### Tech for the HTML
+- Pure HTML + vanilla JS embedded in one file. **No build step.**
+ React/Vite would be faster to write but harder to ship and harder
+ for users to inspect/trust.
+- One inlined `