From e71c0494dc3c8194f5cfd8b4234e56bbca79c4ce Mon Sep 17 00:00:00 2001 From: arrmlet Date: Fri, 29 May 2026 03:22:44 +0300 Subject: [PATCH 1/5] Add Harness adapter framework: claude-code, codex, openclaw, hermes Harness Protocol with read_new(cursor) -> (bytes, new_cursor) so the mirror loop is race-free and works for both file-tail (claude-code, codex, openclaw) and SQLite-poll (hermes, read-only WAL) backends. Redaction v0 (regex denylist, counted in meta) included. OpenClaw + Hermes formats verified against upstream GitHub source. Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/CI_CD_GUIDE.md | 429 -------------------------- sdk/tracecraft/harness/__init__.py | 42 +++ sdk/tracecraft/harness/base.py | 68 ++++ sdk/tracecraft/harness/claude_code.py | 65 ++++ sdk/tracecraft/harness/codex.py | 63 ++++ sdk/tracecraft/harness/hermes.py | 164 ++++++++++ sdk/tracecraft/harness/openclaw.py | 96 ++++++ sdk/tracecraft/redact.py | 52 ++++ 8 files changed, 550 insertions(+), 429 deletions(-) delete mode 100644 docs/CI_CD_GUIDE.md create mode 100644 sdk/tracecraft/harness/__init__.py create mode 100644 sdk/tracecraft/harness/base.py create mode 100644 sdk/tracecraft/harness/claude_code.py create mode 100644 sdk/tracecraft/harness/codex.py create mode 100644 sdk/tracecraft/harness/hermes.py create mode 100644 sdk/tracecraft/harness/openclaw.py create mode 100644 sdk/tracecraft/redact.py diff --git a/docs/CI_CD_GUIDE.md b/docs/CI_CD_GUIDE.md deleted file mode 100644 index e9092bf..0000000 --- a/docs/CI_CD_GUIDE.md +++ /dev/null @@ -1,429 +0,0 @@ -# CI/CD for tracecraft — a learning doc - -This is a working-engineer's introduction to CI/CD, grounded in the two workflows tracecraft uses today (`.github/workflows/test.yml` and `release.yml`). By the end you'll understand every line of YAML in this repo, the trust model that lets GitHub publish to PyPI without a stored password, and how to debug failures. 
- -Written for someone who knows Python and git but hasn't owned a CI/CD pipeline before. Skip sections you already know. - ---- - -## Part 1 — The concepts, in 5 minutes - -### What CI/CD actually is - -Two related-but-distinct things: - -- **Continuous Integration (CI)** — every time someone pushes code or opens a PR, a fresh computer (a "runner") checks the code out, installs dependencies, and runs your tests. If anything is broken, you see it within seconds on the commit/PR. The point is to catch breakage *before* it merges to `main`, not after. - -- **Continuous Delivery (CD)** — when you mark a commit as a release (cut a tag, click "Publish release"), a fresh computer builds the shippable artifact (a Python wheel, a Docker image, a binary) and uploads it to wherever users get it (PyPI, Docker Hub, App Store). The point is to make releases boring and repeatable — no "did I remember to bump the version in both files?" mistakes. - -Sometimes people add **Continuous Deployment** (same acronym, different word) — automatically pushing every green commit to production. Tracecraft has no servers, so that doesn't apply here. - -### Why it exists - -Before CI/CD, releases were a checklist a human did by hand. Six steps, easy to skip one, easy to miss "this works on my machine" bugs. CI runs the checklist in a known-clean environment, every time, and fails loudly when it can't. - -The deeper point: **CI is the executable documentation of how your project works.** Someone reading your repo can look at `.github/workflows/test.yml` and learn "this is how you install and test this code." Conversely, if your CI passes on a fresh machine, you've proven the install instructions in your README actually work. - -### The GitHub Actions vocabulary - -GitHub Actions is one of many CI/CD systems. Others: CircleCI, GitLab CI, Jenkins, Travis CI. The concepts below are mostly universal; the keywords are GitHub-specific. - -- **Workflow** — one YAML file under `.github/workflows/`. 
One workflow = one purpose (run tests, publish release, run nightly job, etc.). -- **Job** — a single unit of work inside a workflow. Jobs run in parallel unless you tell them to depend on each other. One job runs on one runner. -- **Step** — a command inside a job. Steps run sequentially. If a step fails, the rest of the job stops. -- **Runner** — the VM that executes the job. GitHub provides `ubuntu-latest`, `macos-latest`, `windows-latest`. You can also self-host runners. -- **Trigger / `on:`** — what causes the workflow to fire. `push`, `pull_request`, `release`, `schedule` (cron), `workflow_dispatch` (manual button), and more. -- **Matrix** — a single job that runs N times with different variable values (e.g., one per Python version). Saves duplication. -- **Action** — a reusable building block, e.g. `actions/checkout@v4`. Other people's code you call from your workflow. Hosted on the GitHub Marketplace. -- **Secrets / variables** — encrypted values stored on GitHub, available to workflows. Used for API tokens, etc. *We deliberately don't use stored secrets for PyPI — see Part 4.* -- **Concurrency** — controls whether multiple runs of the same workflow can run at once. Useful to cancel old runs when you push twice in a row. -- **Artifact** — files a job produces that you want to keep (build outputs, screenshots, coverage reports). Stored on GitHub for 90 days by default. - -### Cost - -For tracecraft (public repo): **free, unlimited**. GitHub gives unlimited Actions minutes to public repos. PyPI is always free for public packages. - -For private repos: free tier is 2,000 minutes/month, then ~$0.008/min on Linux. A 30-second run × 10 pushes/day × 30 days = 1.5 hours of CI/month — well under the free tier. - ---- - -## Part 2 — `test.yml` line by line - -Here's the actual file in this repo, with annotations. - -```yaml -name: tests -``` -The display name for the workflow in GitHub's UI. Shows up as "tests" on commits and PRs. 
- -```yaml -on: - push: - branches: [main] - pull_request: - branches: [main] -``` -**The trigger.** "Run this workflow when (a) someone pushes to `main`, or (b) someone opens/updates a PR targeting `main`." If we removed the `branches:` filter, the workflow would also run on pushes to feature branches — wasteful since the PR run already covers that. - -> *Note:* YAML 1.1 interprets the unquoted word `on` as the boolean `true` when parsed by some libraries. GitHub Actions handles this correctly. Just leave it as `on:` — no need to quote it. - -```yaml -jobs: - pytest: -``` -One job named `pytest`. The name appears as the status check on PRs (`pytest (3.10)`, `pytest (3.11)`, etc., because of the matrix below). - -```yaml - runs-on: ubuntu-latest -``` -Use GitHub's latest Ubuntu runner. As of 2026 that's Ubuntu 24.04. Other choices: `ubuntu-22.04`, `macos-latest`, `windows-latest`. Ubuntu is the cheapest and fastest; we add macOS/Windows only when needed. - -```yaml - strategy: - fail-fast: false - matrix: - python-version: ["3.10", "3.11", "3.12", "3.13"] -``` -The **matrix**. This single job definition gets expanded into 4 parallel runs, each with `${{ matrix.python-version }}` set to one of the listed versions. `fail-fast: false` means "if 3.10 fails, keep running 3.11/3.12/3.13 anyway" — useful because failures are often version-specific and you want to see them all. - -```yaml - steps: - - uses: actions/checkout@v4 -``` -**Step 1**: `actions/checkout@v4` is an official GitHub action that does `git clone` into the runner. The `@v4` is a version pin — major version 4. You should always pin actions; `@main` would mean "whatever they push" which can break you. - -```yaml - - name: Set up Python ${{ matrix.python-version }} - uses: actions/setup-python@v5 - with: - python-version: ${{ matrix.python-version }} - cache: pip -``` -**Step 2**: installs the matrix Python version. 
`cache: pip` tells the action to cache `~/.cache/pip` between runs — speeds up subsequent runs by 30-60s because dependencies don't redownload. - -`${{ matrix.python-version }}` is GitHub's expression syntax: it substitutes the current matrix value. So this step runs four times across the four matrix cells with `3.10`, `3.11`, `3.12`, `3.13`. - -```yaml - - name: Install package + dev extras - working-directory: sdk - run: | - python -m pip install --upgrade pip - pip install -e ".[dev,huggingface]" -``` -**Step 3**: install tracecraft + its test/dev dependencies. The `|` lets you write multiline shell. `working-directory: sdk` means commands run from `sdk/`. `[dev,huggingface]` pulls the optional extras defined in `sdk/pyproject.toml`. - -```yaml - - name: Run tests - run: pytest sdk/tests/ -v -``` -**Step 4**: actually run the tests. `working-directory` is back to repo root because we didn't specify one here. `-v` is verbose (one line per test). Exit code 0 = green check, non-zero = red X. - -That's the whole file. Less than 30 lines of YAML, and it gives you "the tests pass on 4 Python versions on Ubuntu" on every push. - -### What you'll see in the GitHub UI - -- On the commit list, a small ✓ or ✗ icon next to the commit hash. -- On a PR, a "Checks" tab showing each matrix cell separately. -- Click into a run to see logs per step. -- The workflow file itself appears in the "Actions" tab. - ---- - -## Part 3 — `release.yml` line by line - -```yaml -name: release - -on: - release: - types: [published] -``` -Triggered by the `release.published` event, which fires when you click "Publish release" in the GitHub UI (or run `gh release create v0.2.0 ...`). NOT triggered by simply pushing a tag — there's a distinction. A tag is just a label on a commit; a "release" is a tag plus optional metadata (notes, attached binaries). We use the release event because it gives you a confirmation step before publication. 
- -```yaml -jobs: - build-and-publish: - runs-on: ubuntu-latest - environment: - name: pypi - url: https://pypi.org/project/tracecraft-ai/ -``` -The job runs in an **environment** called `pypi`. Environments are a GitHub feature for adding extra protection around sensitive jobs: -- You can require manual approval before the job runs. -- You can restrict which branches can deploy to the environment. -- The environment shows up in the GitHub UI with the URL above as a clickable link. - -For tracecraft, the environment also matches what we'll tell PyPI to trust (in Part 4). - -```yaml - permissions: - id-token: write # required for PyPI trusted publishing - contents: read -``` -**This is the magic.** GitHub Actions has a per-job permission model. By default, the `GITHUB_TOKEN` (auto-generated for each run) has read-only access. `id-token: write` is what lets the job request an **OIDC token** — a short-lived JWT signed by GitHub that proves to PyPI "yes, this is the genuine release.yml workflow on Arrmlet/tracecraft running right now." - -`contents: read` keeps the rest of the permissions minimal — we don't need to write to the repo, only read its files. - -```yaml - steps: - - uses: actions/checkout@v4 - with: - ref: ${{ github.event.release.tag_name }} -``` -Checkout the repo, but specifically at the tag of the release that triggered this. Without `ref:`, it would check out the default branch — which might be ahead of the tag if someone pushed to `main` after creating the release. Using the tag means the wheel we publish is exactly the code in the release. - -```yaml - - name: Set up Python - uses: actions/setup-python@v5 - with: - python-version: "3.12" -``` -Only one Python version needed for building — the wheel is pure Python (`py3-none-any.whl`), so any modern Python builds it correctly. - -```yaml - - name: Sync root README into sdk/ before build - run: cp README.md sdk/README.md -``` -A tracecraft-specific quirk. 
The Python package source lives in `sdk/`, but the README we want users to see on PyPI lives at the repo root. We copy it into `sdk/` before build so setuptools picks it up. - -```yaml - - name: Build sdist + wheel - working-directory: sdk - run: | - python -m pip install --upgrade pip build - python -m build -``` -Runs Python's modern build tool. Produces two files in `sdk/dist/`: -- `tracecraft_ai-X.Y.Z.tar.gz` — the *source distribution* (sdist). What `pip` falls back to if no wheel is available. -- `tracecraft_ai-X.Y.Z-py3-none-any.whl` — the *wheel*. Pre-built, no compilation needed on install. - -```yaml - - name: Verify artifacts - working-directory: sdk - run: | - pip install twine - twine check dist/* - python -m venv /tmp/verify - /tmp/verify/bin/pip install dist/*.whl - /tmp/verify/bin/tracecraft --version -``` -Three sanity checks before publishing: -1. `twine check` validates the wheel metadata (README rendering, classifiers, etc.). -2. Install the wheel in a fresh venv — proves it's installable. -3. Run `tracecraft --version` — proves the CLI entry point actually works. - -If any of these fail, the workflow stops here and doesn't publish a broken package. - -```yaml - - name: Publish to PyPI - uses: pypa/gh-action-pypi-publish@release/v1 - with: - packages-dir: sdk/dist/ -``` -**The actual publish step.** This action sends the contents of `sdk/dist/` to PyPI using the OIDC token from earlier. Notice there is no `password:` or `username:` or `token:` field — that's the whole point of trusted publishing. - -This step fails until you configure PyPI to trust this workflow (Part 4). - ---- - -## Part 4 — PyPI Trusted Publishing setup - -This is the one-time browser configuration that unlocks `release.yml`. It's required because PyPI doesn't blindly accept uploads from any GitHub workflow — it needs to know which workflows you trust. 
- -### Background — why trusted publishing exists - -The old way: generate a PyPI API token, save it as a GitHub Secret, reference it in the workflow. Problems: -- The token is long-lived. If your GitHub account is breached, the attacker has your PyPI publish access. -- Hard to rotate; everyone forgets to. -- One leak from any project = total PyPI account takeover. - -The new way (introduced 2023, mature in 2024-2025): **OIDC trusted publishing.** GitHub generates a short-lived token *per run*, signed by GitHub, that proves "this is genuinely the `release.yml` workflow in `Arrmlet/tracecraft` running right now, on a runner GitHub controls, for the tag `v0.2.0`." PyPI verifies that signature and accepts the upload. - -Properties: -- The token is valid for ~10 minutes and only inside that specific job. -- No secret stored anywhere — there's nothing to leak. -- Scoped to one workflow file in one repo. An attacker would need to compromise GitHub itself. - -### Step-by-step setup - -You do this once, today. After that you never touch PyPI tokens for this project again. - -1. **Sign in to PyPI** at https://pypi.org/. - -2. **Go to project settings** — https://pypi.org/manage/project/tracecraft-ai/settings/publishing/ - - If that URL 404s, navigate manually: Account dropdown → "Your projects" → click `tracecraft-ai` → "Publishing" in the left sidebar. - -3. **Click "Add a new pending publisher"** or "Add a new trusted publisher." - -4. **Choose "GitHub" as the publisher.** - -5. **Fill in the form exactly:** - - **Owner:** `Arrmlet` - - **Repository name:** `tracecraft` - - **Workflow filename:** `release.yml` (just the filename, not the path) - - **Environment name:** `pypi` - - The `Environment name` here MUST match the `environment.name:` in `release.yml` (which is `pypi`). Case matters. - -6. **Click "Add."** - -That's it. PyPI now trusts `release.yml`. The next time you create a GitHub Release, the workflow will run end-to-end and publish to PyPI without any prompt. 
- -### Verifying it works - -Don't ship a real release just to test. Instead, the first release that goes through the workflow IS the test. Recommended: - -1. Make a tiny code change (a comment, a typo fix in README). -2. Bump version to `0.1.6` in `sdk/pyproject.toml` and `sdk/tracecraft/__init__.py`. -3. Commit, push, tag, push tag. -4. `gh release create v0.1.6 --title "v0.1.6 — CI/CD test" --notes "Testing trusted publishing"` -5. Watch the workflow in the Actions tab. Should turn green in ~1 minute. -6. Verify on PyPI: `pip install --upgrade tracecraft-ai` → version should be `0.1.6`. - -If step 5 fails at "Publish to PyPI" with a 403 — go back and check the publisher config matches exactly (owner case, workflow filename, environment name). - ---- - -## Part 5 — Reading the GitHub Actions UI - -When you push or create a PR, here's where the action is in the UI: - -### Per-commit status -On the commit list, look for a circle/check/X next to the commit hash: -- Yellow dot = running -- Green check = all green -- Red X = at least one job failed - -Hover or click for a summary. Click the icon to see the workflow details. - -### Per-PR status -At the bottom of the PR, the "Checks" section shows each workflow. For a matrix workflow you'll see one row per matrix cell (`pytest (3.10)`, `pytest (3.11)`, etc.). Click "Details" to see logs. - -### Actions tab (`github.com/Arrmlet/tracecraft/actions`) -The full history of all workflow runs. Filter by workflow on the left, by branch/event/status at the top. Click a run for detailed logs. - -### Inside a run -You see the matrix cells (or single job) listed. Click one to expand the steps. Each step has its own logs and timing. Failed steps are highlighted red and auto-expand to show the error. - -### Re-running failed jobs -If a run failed due to a flake (network blip, etc.), the "Re-run failed jobs" button at the top right re-runs only the failed cells. 
Re-runs preserve the commit SHA, so the new attempt is genuinely a do-over of the same code. - ---- - -## Part 6 — `gh` (GitHub CLI) for local interaction - -You don't have to use the browser UI. The `gh` CLI is faster: - -```bash -# Watch the most recent run for the current branch -gh run watch - -# List recent runs -gh run list --limit 10 - -# View the details of a specific run -gh run view - -# View just the failed step logs -gh run view --log-failed - -# Re-run a failed run -gh run rerun - -# Cancel a stuck run -gh run cancel - -# Create a release (this triggers release.yml) -gh release create v0.2.0 --title "v0.2.0" --notes "..." - -# List releases -gh release list - -# View one release -gh release view v0.1.5 -``` - ---- - -## Part 7 — Common failures and how to debug them - -### Test workflow goes red - -1. **Open the failed run** (Actions tab → click the red run). -2. **Find the failing matrix cell.** Maybe only 3.10 failed — that narrows the cause to "Python 3.10 specific." -3. **Expand "Run tests"** to see the pytest output. Same format as your local terminal. -4. **Reproduce locally** with the same Python version: `pyenv install 3.10` → `pyenv local 3.10` → `pip install -e "sdk/[dev]"` → `pytest sdk/tests/`. -5. **Fix and push again.** The workflow re-runs. - -### Workflow doesn't trigger at all - -- Check `on:` filters — pushing to a feature branch with `branches: [main]` only triggers on PR, not on direct push. -- Check `.github/workflows/` path. Typos like `.github/workflow/` won't be picked up. -- Check the workflow YAML is valid. GitHub UI will show a "Workflow invalid" error in the Actions tab. - -### Release workflow fails at "Publish to PyPI" - -- The error message is usually `403 Forbidden` or `Invalid or non-existent authentication information`. -- Cause: trusted publishing config doesn't match the actual workflow run. -- Fix: double-check the PyPI publishing form. Owner case-sensitive. Workflow filename is just `release.yml` (no path). 
Environment name `pypi` matches the `environment.name:` in the YAML. - -### "Resource not accessible by integration" error - -- Cause: missing `permissions:` in the YAML. The default permissions are read-only. -- Fix: explicitly request what you need (`id-token: write`, `contents: write`, etc.) in the job. - -### Action versions deprecated - -You may see a banner: "Node.js 20 actions are deprecated." This is GitHub's runtime, not your code. Fix by bumping action versions (e.g., `actions/checkout@v4` → `actions/checkout@v5` when released). Non-urgent unless the runner refuses to execute the action. - -### Cached pip install picks up wrong package - -If you change `pyproject.toml` deps but CI still uses the cached old version, force a cache refresh by changing the `cache:` config or, easiest, the lockfile/`pyproject.toml` hash will already invalidate the cache automatically (which is the point of `cache: pip`). - ---- - -## Part 8 — What we deliberately did NOT add (and why) - -These are common CI additions that aren't worth it for a small Python OSS project. Add them if you grow into the need; don't add them just because. - -| Feature | Why we skipped | -|---|---| -| Coverage reporting (codecov) | 12 tests at this scale tell you more than a coverage % does. Add when team-size justifies. | -| Linting gate (ruff in CI) | Ruff is in `dev` extras; run locally. Blocking PRs on lint is friction for a solo maintainer. | -| Pre-commit hooks | Local-only friction. Helpful with 3+ contributors; overkill solo. | -| Dependabot / Renovate | Adds noise. Manual quarterly review of deps is fine at this scale. | -| Branch protection rules | You're solo. Self-review is acceptable. Add when contributors arrive. | -| Auto-version-bump (release-please, semantic-release) | Overengineering until 5+ releases/quarter. | -| Windows / macOS runners | Add on first user bug report from those platforms. 
| -| Nightly cron tests against real S3 | Premature; moto covers correctness, real S3 issues are rare. | -| CodeQL security scanning | Free if you enable it. Useful eventually; not on the critical path. | -| Slack / Discord notifications | The Actions email is enough until you have a team channel. | - -The principle: **CI complexity should match project stakes.** Right now tracecraft is small and the maintainer is one person; the two workflows we have are the right size. Re-evaluate when stakes change. - ---- - -## Part 9 — Where to learn more - -- **GitHub Actions docs** — https://docs.github.com/en/actions. The official reference. The "Quickstart" and "Workflow syntax" pages are the most useful. -- **PyPI trusted publishing docs** — https://docs.pypi.org/trusted-publishers/ -- **awesome-actions** — https://github.com/sdras/awesome-actions. Curated list of useful actions. -- **Anatomy of a workflow** — https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions -- **GitHub Actions security hardening** — https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions. Becomes relevant when you start using secrets, deployments, or third-party actions. - ---- - -## TL;DR — what you have now - -- **`test.yml`** runs the 12 backtests on Python 3.10/3.11/3.12/3.13 every push and PR. Free, ~30s, catches regressions before merge. -- **`release.yml`** builds + publishes to PyPI on every GitHub Release. Requires one-time trusted publishing setup at https://pypi.org/manage/project/tracecraft-ai/settings/publishing/ (Owner: `Arrmlet`, Repo: `tracecraft`, Workflow: `release.yml`, Environment: `pypi`). -- **No tokens stored anywhere.** OIDC-based trust. -- **Free** for public repos. - -The next time you ship is: -``` -# bump version in two files, commit -gh release create v0.2.0 --title "..." --notes "..." 
-# walk away; PyPI has it in ~60 seconds -``` diff --git a/sdk/tracecraft/harness/__init__.py b/sdk/tracecraft/harness/__init__.py new file mode 100644 index 0000000..d28f357 --- /dev/null +++ b/sdk/tracecraft/harness/__init__.py @@ -0,0 +1,42 @@ +"""Harness adapters — each one knows how to find and read sessions from a +specific coding agent (Claude Code, Codex, OpenClaw, Pi, OpenCode, Hermes, …). + +The base `Harness` protocol is intentionally tiny: discover sessions, parse a +session id from a path, and return the new bytes since a known offset. The +mirror loop in `tracecraft session mirror` is harness-agnostic. + +Adding a new harness should be a single file under this package plus one entry +in `REGISTRY` below. +""" + +from .base import Harness, Session +from .claude_code import ClaudeCodeHarness +from .codex import CodexHarness +from .hermes import HermesHarness +from .openclaw import OpenClawHarness + +REGISTRY: dict[str, type[Harness]] = { + ClaudeCodeHarness.name: ClaudeCodeHarness, + CodexHarness.name: CodexHarness, + OpenClawHarness.name: OpenClawHarness, + HermesHarness.name: HermesHarness, +} + + +def get_harness(name: str) -> Harness: + if name not in REGISTRY: + known = ", ".join(sorted(REGISTRY)) or "(none registered)" + raise ValueError(f"unknown harness '{name}'. 
Known: {known}") + return REGISTRY[name]() + + +__all__ = [ + "Harness", + "Session", + "ClaudeCodeHarness", + "CodexHarness", + "OpenClawHarness", + "HermesHarness", + "REGISTRY", + "get_harness", +] diff --git a/sdk/tracecraft/harness/base.py b/sdk/tracecraft/harness/base.py new file mode 100644 index 0000000..db4a444 --- /dev/null +++ b/sdk/tracecraft/harness/base.py @@ -0,0 +1,68 @@ +"""Harness protocol — the only contract a new coding-agent adapter needs to meet.""" + +from __future__ import annotations + +from dataclasses import dataclass +from pathlib import Path +from typing import Protocol, runtime_checkable + + +@dataclass(frozen=True) +class Session: + """A discovered session: where it lives and what we call it.""" + + path: Path + session_id: str + cwd: Path | None = None # the project dir this session ran in, if knowable + + +@runtime_checkable +class Harness(Protocol): + """Minimum surface a harness adapter must expose. + + Concrete harnesses are instantiated with no arguments; per-call context + (cwd, session id, byte offset) is passed in. State belongs to the mirror + loop, not the harness — adapters stay stateless and easy to test. + """ + + name: str + + def discover(self, cwd: Path) -> list[Session]: + """Return every session this harness knows about for the given cwd. + + Implementations should be cheap to call repeatedly (the mirror loop + polls); avoid network and avoid loading file contents. + """ + ... + + def active_session(self, cwd: Path) -> Session | None: + """Return the most recently active session for cwd, or None.""" + ... + + def read_new(self, session: Session, cursor: int) -> tuple[bytes, int]: + """Return (new_jsonl_bytes, new_cursor) for everything after `cursor`. + + `cursor` is an opaque per-harness position. For file-backed harnesses + it's a byte offset and the returned cursor is `offset + len(bytes)`. 
+ For SQLite (Hermes) it's a rowid and the returned cursor is the max + rowid read; the bytes are synthesized JSONL of the new rows. + + Returning the new cursor alongside the bytes makes advancement + race-free: the loop advances to exactly what was consumed, never to a + separately-sampled `size()` that may have moved between calls. + """ + ... + + def read_new_bytes(self, session: Session, offset: int) -> bytes: + """Back-compat: bytes-only view of read_new(). Prefer read_new().""" + ... + + def size(self, session: Session) -> int: + """Return the current end-of-stream cursor for `session`. + + Used only for the cheap "is there anything new?" pre-check. The + authoritative advancement comes from read_new()'s returned cursor. + For file-backed harnesses this is `path.stat().st_size`; for SQLite + it's the current max rowid. + """ + ... diff --git a/sdk/tracecraft/harness/claude_code.py b/sdk/tracecraft/harness/claude_code.py new file mode 100644 index 0000000..8d8f11e --- /dev/null +++ b/sdk/tracecraft/harness/claude_code.py @@ -0,0 +1,65 @@ +"""Claude Code adapter. + +Claude Code persists every session under + ~/.claude/projects//.jsonl + +`` replaces path separators with hyphens and prefixes a leading +hyphen, e.g. `/Users/x/proj` -> `-Users-x-proj`. We mirror that encoding here +so we can find the right project directory for the user's current cwd. +""" + +from __future__ import annotations + +import os +from pathlib import Path + +from .base import Session + + +def _encode_cwd(cwd: Path) -> str: + """Encode an absolute path the way Claude Code does for its projects dir. + + Claude Code uses the resolved absolute path with `/` swapped for `-`, + keeping the leading separator's effect (so `/foo/bar` -> `-foo-bar`). 
+ """ + resolved = cwd.expanduser().resolve() + return str(resolved).replace(os.sep, "-") + + +class ClaudeCodeHarness: + name = "claude-code" + + def __init__(self, root: Path | None = None) -> None: + self.root = root or (Path.home() / ".claude" / "projects") + + def _project_dir(self, cwd: Path) -> Path: + return self.root / _encode_cwd(cwd) + + def discover(self, cwd: Path) -> list[Session]: + pdir = self._project_dir(cwd) + if not pdir.is_dir(): + return [] + sessions: list[Session] = [] + for jsonl in pdir.glob("*.jsonl"): + sessions.append(Session(path=jsonl, session_id=jsonl.stem, cwd=cwd)) + return sessions + + def active_session(self, cwd: Path) -> Session | None: + sessions = self.discover(cwd) + if not sessions: + return None + return max(sessions, key=lambda s: s.path.stat().st_mtime) + + def read_new(self, session: Session, cursor: int) -> tuple[bytes, int]: + data = self.read_new_bytes(session, cursor) + return data, cursor + len(data) + + def read_new_bytes(self, session: Session, offset: int) -> bytes: + if offset < 0: + raise ValueError(f"offset must be non-negative, got {offset}") + with open(session.path, "rb") as f: + f.seek(offset) + return f.read() + + def size(self, session: Session) -> int: + return session.path.stat().st_size diff --git a/sdk/tracecraft/harness/codex.py b/sdk/tracecraft/harness/codex.py new file mode 100644 index 0000000..dc89c22 --- /dev/null +++ b/sdk/tracecraft/harness/codex.py @@ -0,0 +1,63 @@ +"""Codex CLI adapter. + +Codex writes session rollouts under + ~/.codex/sessions///
/rollout--.jsonl + +Codex doesn't shard by cwd, so `discover` walks the whole sessions tree +(scoped to the most recent few days for performance) and returns every +rollout. The mirror loop is responsible for picking which to follow. +""" + +from __future__ import annotations + +import re +from pathlib import Path + +from .base import Session + + +_ROLLOUT_RE = re.compile(r"rollout-\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}-(?P<id>[A-Za-z0-9_-]+)\.jsonl$") + + +class CodexHarness: + name = "codex" + + def __init__(self, root: Path | None = None) -> None: + self.root = root or (Path.home() / ".codex" / "sessions") + + def _all_rollouts(self) -> list[Path]: + if not self.root.is_dir(): + return [] + # YYYY/MM/DD/rollout-*.jsonl + return list(self.root.glob("*/*/*/rollout-*.jsonl")) + + def discover(self, cwd: Path) -> list[Session]: + # Codex sessions are not partitioned by cwd; return everything we see. + # The mirror loop / caller decides which session to actually follow. + del cwd + sessions: list[Session] = [] + for path in self._all_rollouts(): + m = _ROLLOUT_RE.search(path.name) + session_id = m.group("id") if m else path.stem + sessions.append(Session(path=path, session_id=session_id)) + return sessions + + def active_session(self, cwd: Path) -> Session | None: + sessions = self.discover(cwd) + if not sessions: + return None + return max(sessions, key=lambda s: s.path.stat().st_mtime) + + def read_new(self, session: Session, cursor: int) -> tuple[bytes, int]: + data = self.read_new_bytes(session, cursor) + return data, cursor + len(data) + + def read_new_bytes(self, session: Session, offset: int) -> bytes: + if offset < 0: + raise ValueError(f"offset must be non-negative, got {offset}") + with open(session.path, "rb") as f: + f.seek(offset) + return f.read() + + def size(self, session: Session) -> int: + return session.path.stat().st_size diff --git a/sdk/tracecraft/harness/hermes.py b/sdk/tracecraft/harness/hermes.py new file mode 100644 index 0000000..e78a615 --- 
/dev/null +++ b/sdk/tracecraft/harness/hermes.py @@ -0,0 +1,164 @@ +"""Hermes Agent adapter (Nous Research). + +Hermes moved off per-session JSONL to a single SQLite database: + ~/.hermes/state.db (or $HERMES_HOME/state.db), WAL mode + +So this adapter does NOT tail a file. It opens the DB read-only and reads new +rows from the `messages` table, synthesizing one JSON line per message. The +mirror loop treats the synthesized bytes exactly like a file tail. + +Cursor semantics (the reason base.Harness decoupled cursor from byte count): + cursor == the highest `messages.id` already mirrored. + messages.id is INTEGER PRIMARY KEY AUTOINCREMENT — strictly increasing, + never reused even after Hermes prunes old sessions, so it's a safe + high-water mark. We read `WHERE session_id=? AND id>:cursor ORDER BY id`. + +Verified against hermes_state.py (github.com/NousResearch/hermes-agent) May 2026: + - sessions(id TEXT PK, source, model, started_at REAL, ended_at, title, ...) + - messages(id INTEGER PK AUTOINCREMENT, session_id TEXT FK, role, content, + tool_calls, tool_name, timestamp REAL, token_count, ...) + - content may be sentinel-prefixed '\\x00json:' for multimodal payloads. + - schema_version table; columns can be added across versions, so we read + whatever columns exist rather than a hardcoded list. + +Safety: open with mode=ro (NOT immutable — the DB is live). A short +busy_timeout rides out the brief moments Hermes holds the write lock. We never +write or checkpoint. +""" + +from __future__ import annotations + +import json +import os +import sqlite3 +from pathlib import Path + +from .base import Session + +# Sentinel Hermes uses to mark a JSON-encoded (multimodal) content payload. 
+_CONTENT_JSON_PREFIX = "\x00json:" +_BUSY_TIMEOUT_MS = 4000 + + +def _resolve_db_path() -> Path: + home = os.environ.get("HERMES_HOME") + base = Path(home) if home else (Path.home() / ".hermes") + return base / "state.db" + + +def _connect_ro(db_path: Path) -> sqlite3.Connection: + """Read-only connection safe to use against a live WAL database.""" + uri = f"file:{db_path}?mode=ro" + conn = sqlite3.connect(uri, uri=True, timeout=_BUSY_TIMEOUT_MS / 1000) + conn.row_factory = sqlite3.Row + conn.execute(f"PRAGMA busy_timeout={_BUSY_TIMEOUT_MS}") + return conn + + +def _decode_content(value): + """Hermes stores multimodal content as '\\x00json:'; scalars as-is.""" + if isinstance(value, str) and value.startswith(_CONTENT_JSON_PREFIX): + try: + return json.loads(value[len(_CONTENT_JSON_PREFIX):]) + except json.JSONDecodeError: + return value + return value + + +class HermesHarness: + name = "hermes" + + def __init__(self, db_path: Path | None = None) -> None: + self.db_path = db_path or _resolve_db_path() + + # ---- discovery ---- + + def discover(self, cwd: Path) -> list[Session]: + # Hermes sessions aren't keyed by cwd. We surface every session row; + # session.path is the DB (shared by all sessions), session_id is the + # sessions.id TEXT value. + del cwd + if not self.db_path.exists(): + return [] + conn = _connect_ro(self.db_path) + try: + rows = conn.execute( + "SELECT id FROM sessions ORDER BY started_at DESC" + ).fetchall() + except sqlite3.Error: + return [] + finally: + conn.close() + return [Session(path=self.db_path, session_id=r["id"]) for r in rows] + + def active_session(self, cwd: Path) -> Session | None: + if not self.db_path.exists(): + return None + conn = _connect_ro(self.db_path) + try: + row = conn.execute( + # Most-recently-active = the session owning the highest message id. 
+ "SELECT session_id FROM messages ORDER BY id DESC LIMIT 1" + ).fetchone() + except sqlite3.Error: + row = None + finally: + conn.close() + if not row: + # Fall back to newest session even if it has no messages yet. + sessions = self.discover(cwd) + return sessions[0] if sessions else None + return Session(path=self.db_path, session_id=row["session_id"]) + + # ---- read ---- + + def size(self, session: Session) -> int: + """Current max messages.id for this session — the cursor high-water.""" + if not self.db_path.exists(): + return 0 + conn = _connect_ro(self.db_path) + try: + row = conn.execute( + "SELECT MAX(id) AS m FROM messages WHERE session_id = ?", + (session.session_id,), + ).fetchone() + except sqlite3.Error: + return 0 + finally: + conn.close() + return int(row["m"]) if row and row["m"] is not None else 0 + + def read_new(self, session: Session, cursor: int) -> tuple[bytes, int]: + """Synthesize JSONL for messages with id > cursor; return (bytes, new_cursor).""" + if cursor < 0: + raise ValueError(f"cursor must be non-negative, got {cursor}") + if not self.db_path.exists(): + return b"", cursor + conn = _connect_ro(self.db_path) + try: + rows = conn.execute( + "SELECT * FROM messages WHERE session_id = ? AND id > ? ORDER BY id ASC", + (session.session_id, cursor), + ).fetchall() + except sqlite3.Error: + return b"", cursor + finally: + conn.close() + + lines: list[str] = [] + max_id = cursor + for r in rows: + d = dict(r) + if "content" in d: + d["content"] = _decode_content(d["content"]) + # tool_calls / reasoning_details are JSON strings; leave as-is — + # consumers can parse. We just emit the row faithfully. 
+            lines.append(json.dumps(d, default=str, ensure_ascii=False))
+            if d.get("id") is not None:
+                max_id = max(max_id, int(d["id"]))
+
+        blob = ("\n".join(lines) + "\n").encode("utf-8") if lines else b""
+        return blob, max_id
+
+    def read_new_bytes(self, session: Session, offset: int) -> bytes:
+        return self.read_new(session, offset)[0]
diff --git a/sdk/tracecraft/harness/openclaw.py b/sdk/tracecraft/harness/openclaw.py
new file mode 100644
index 0000000..545acd3
--- /dev/null
+++ b/sdk/tracecraft/harness/openclaw.py
@@ -0,0 +1,96 @@
+"""OpenClaw adapter.
+
+OpenClaw persists session transcripts as append-only JSONL under
+  /agents//sessions/.jsonl
+
+where the state dir resolves (highest precedence first):
+  OPENCLAW_STATE_DIR → OPENCLAW_HOME → ~/.openclaw
+(--dev and --profile map to ~/.openclaw-dev / ~/.openclaw-; a
+caller using those can pass root= explicitly.)
+
+Verified against OpenClaw source (src/config/sessions/paths.ts) May 2026.
+
+Files in the sessions dir that are NOT transcripts and must be skipped:
+  - sessions.json   mutable session index, rewritten atomically
+  - *.tmp           half-written atomic-store staging files
+
+Topic sessions are named -topic-.jsonl and compaction
+successors .checkpoint..jsonl — both are real transcripts
+and we surface them as-is. Session ids are only unique within an agentId, so
+the stable key we expose is the agentId and session id joined with '__'.
+"""
+
+from __future__ import annotations
+
+import os
+from pathlib import Path
+
+from .base import Session
+
+
+def _resolve_state_dir() -> Path:
+    """OpenClaw state dir, honoring its env-var precedence."""
+    if os.environ.get("OPENCLAW_STATE_DIR"):
+        return Path(os.environ["OPENCLAW_STATE_DIR"])
+    if os.environ.get("OPENCLAW_HOME"):
+        return Path(os.environ["OPENCLAW_HOME"])
+    return Path.home() / ".openclaw"
+
+
+class OpenClawHarness:
+    name = "openclaw"
+
+    def __init__(self, root: Path | None = None) -> None:
+        # `root` is the agents dir. Default derives from the active state dir.
+ self.root = root or (_resolve_state_dir() / "agents") + + def _stable_id(self, path: Path) -> str: + """__ — agentId is the dir between 'agents/' and 'sessions/'. + + Joined with '__' (not '/') so the id is safe as a single bucket-key + path segment; OpenClaw sessionIds are only unique within an agentId, + so the agentId prefix disambiguates across agents. + """ + stem = path.stem # filename without .jsonl + # path = //sessions/.jsonl + try: + agent_id = path.parent.parent.name + except Exception: + agent_id = "unknown" + return f"{agent_id}__{stem}" + + def _all_sessions(self) -> list[Path]: + if not self.root.is_dir(): + return [] + out: list[Path] = [] + for p in self.root.glob("*/sessions/*.jsonl"): + name = p.name + if name == "sessions.json" or name.endswith(".tmp"): + continue + out.append(p) + return out + + def discover(self, cwd: Path) -> list[Session]: + # OpenClaw shards by agentId, not cwd — cwd is ignored. + del cwd + return [Session(path=p, session_id=self._stable_id(p)) for p in self._all_sessions()] + + def active_session(self, cwd: Path) -> Session | None: + sessions = self.discover(cwd) + if not sessions: + return None + return max(sessions, key=lambda s: s.path.stat().st_mtime) + + def read_new(self, session: Session, cursor: int) -> tuple[bytes, int]: + data = self.read_new_bytes(session, cursor) + return data, cursor + len(data) + + def read_new_bytes(self, session: Session, offset: int) -> bytes: + if offset < 0: + raise ValueError(f"offset must be non-negative, got {offset}") + with open(session.path, "rb") as f: + f.seek(offset) + return f.read() + + def size(self, session: Session) -> int: + return session.path.stat().st_size diff --git a/sdk/tracecraft/redact.py b/sdk/tracecraft/redact.py new file mode 100644 index 0000000..c670a05 --- /dev/null +++ b/sdk/tracecraft/redact.py @@ -0,0 +1,52 @@ +"""Redaction v0 — regex denylist applied before bytes leave the machine. 
+ +Goal: catch the obvious shapes of credentials and tokens in trace data so that +users mirroring sessions to a bucket don't accidentally publish keys. This is +NOT a real DLP system — it cannot catch arbitrary secrets, custom internal +token formats, or business-logic data. It catches well-known token shapes. + +Every redaction is *counted*, never silent. Counts go into meta.json so users +can audit what was scrubbed. +""" + +from __future__ import annotations + +import re +from typing import Final + +# Each (name, pattern) — name is what shows up in meta.json's redaction counter. +# Patterns intentionally on the strict side: prefer false-negative over false-positive +# (we'd rather miss a token than mangle source code that happens to look like one). +_PATTERNS: Final[list[tuple[str, re.Pattern[bytes]]]] = [ + ("aws_access_key", re.compile(rb"AKIA[0-9A-Z]{16}")), + ("aws_session_token", re.compile(rb"ASIA[0-9A-Z]{16}")), + ("anthropic_key", re.compile(rb"sk-ant-[A-Za-z0-9_-]{20,}")), + ("openai_key", re.compile(rb"sk-(?:proj-|svcacct-)?[A-Za-z0-9]{20,}")), + ("hf_token", re.compile(rb"hf_[A-Za-z0-9]{30,}")), + ("github_pat", re.compile(rb"gh[pousr]_[A-Za-z0-9]{30,}")), + ("slack_token", re.compile(rb"xox[abprs]-[A-Za-z0-9-]{10,}")), + ("bearer_token", re.compile(rb"Bearer\s+[A-Za-z0-9_.\-]{20,}")), +] + + +def redact(blob: bytes) -> tuple[bytes, dict[str, int]]: + """Return (redacted_bytes, counts). + + counts maps pattern_name -> number of replacements made. Patterns not + matched are absent from the dict (no zero entries). + """ + counts: dict[str, int] = {} + out = blob + for name, pat in _PATTERNS: + out, n = pat.subn(f"[REDACTED:{name}]".encode(), out) + if n: + counts[name] = n + return out, counts + + +def merge_counts(a: dict[str, int], b: dict[str, int]) -> dict[str, int]: + """Sum two redaction-count dicts. 
Used to accumulate across parts in meta.json.""" + out = dict(a) + for k, v in b.items(): + out[k] = out.get(k, 0) + v + return out From c020d3e076b794b08c5a7281de147ca18eb0e2fa Mon Sep 17 00:00:00 2001 From: arrmlet Date: Fri, 29 May 2026 03:22:59 +0300 Subject: [PATCH 2/5] Add 'tracecraft session' CLI: mirror, list, show, stop mirror tails a harness session into /sessions/// as disjoint part-NNNNN-.jsonl files plus a cumulative meta.json. --harness choices are driven by the harness REGISTRY so new adapters auto-extend the CLI. Redaction on by default; --no-redact for trusted buckets. Cursor stored per-session; next part seq derived from bucket LIST so losing local state is non-destructive. Co-Authored-By: Claude Opus 4.8 (1M context) --- sdk/tracecraft/cli/__init__.py | 3 + sdk/tracecraft/cli/session.py | 387 +++++++++++++++++++++++++++++++++ 2 files changed, 390 insertions(+) create mode 100644 sdk/tracecraft/cli/session.py diff --git a/sdk/tracecraft/cli/__init__.py b/sdk/tracecraft/cli/__init__.py index 1725927..6cdf14f 100644 --- a/sdk/tracecraft/cli/__init__.py +++ b/sdk/tracecraft/cli/__init__.py @@ -9,6 +9,7 @@ from tracecraft.cli.messages import send, inbox from tracecraft.cli.steps import claim, complete, step_status, wait_for from tracecraft.cli.artifacts import artifact +from tracecraft.cli.session import session as session_group BANNER = """ \033[36m _ __ _ @@ -39,6 +40,7 @@ def cli(ctx): click.echo(" step-status Check step progress") click.echo(" wait-for Block until steps complete") click.echo(" artifact Share files (upload/download/list)") + click.echo(" session Mirror coding-agent traces (mirror/list/show/stop)") click.echo() click.echo(" \033[2mRun 'tracecraft --help' for details.\033[0m") click.echo() @@ -54,6 +56,7 @@ def cli(ctx): cli.add_command(step_status, "step-status") cli.add_command(wait_for, "wait-for") cli.add_command(artifact) +cli.add_command(session_group) def main(): diff --git a/sdk/tracecraft/cli/session.py 
b/sdk/tracecraft/cli/session.py
new file mode 100644
index 0000000..978dd19
--- /dev/null
+++ b/sdk/tracecraft/cli/session.py
@@ -0,0 +1,387 @@
+"""`tracecraft session` — mirror, list, show, stop.
+
+Commands:
+  mirror   Pull new bytes from a harness session into the bucket (one-shot).
+  list     Browse sessions in the bucket.
+  show     Inspect one session's meta + tail.
+  stop     Clear local state for a session (placeholder; no daemon yet).
+
+Bucket layout (additive — does not touch existing tracecraft keys):
+
+  /sessions///
+    part-NNNNN-.jsonl   ← one per mirror flush, append-disjoint
+    meta.json           ← cumulative metadata + redaction counts
+
+State files live under ~/.tracecraft/mirror-state/.json and store the
+per-harness cursor (byte offset, or message id for Hermes). Next-seq is
+derived from a bucket LIST on every call, so losing the state file is recoverable.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import re
+import tempfile
+import uuid
+from datetime import datetime, timezone
+from pathlib import Path
+
+import click
+
+from tracecraft.harness import REGISTRY, get_harness
+from tracecraft.redact import merge_counts, redact
+from tracecraft.store import get_store
+
+
+# Driven by the harness REGISTRY so adding an adapter auto-extends the CLI.
+HARNESS_CHOICES = sorted(REGISTRY)
+STATE_DIR = Path.home() / ".tracecraft" / "mirror-state"
+PART_RE = re.compile(r"part-(\d{5})-[a-f0-9]{8}\.jsonl$")
+
+
+# ---------- helpers ----------
+
+
+def _state_path(session_id: str) -> Path:
+    return STATE_DIR / f"{session_id}.json"
+
+
+def _load_state(session_id: str) -> dict:
+    p = _state_path(session_id)
+    if not p.exists():
+        return {}
+    try:
+        return json.loads(p.read_text())
+    except json.JSONDecodeError:
+        # Corrupt state file — treat as missing rather than crash.
+ return {} + + +def _save_state(session_id: str, state: dict) -> None: + STATE_DIR.mkdir(parents=True, exist_ok=True) + _state_path(session_id).write_text(json.dumps(state, indent=2)) + + +def _session_prefix(harness_name: str, session_id: str) -> str: + return f"sessions/{harness_name}/{session_id}/" + + +def _next_seq_for(store, harness_name: str, session_id: str) -> int: + """Find the next unused part-NNNNN seq by listing the bucket.""" + prefix = _session_prefix(harness_name, session_id) + keys = store.list_keys(prefix) + seqs: list[int] = [] + for k in keys: + name = k.rsplit("/", 1)[-1] + m = PART_RE.match(name) + if m: + seqs.append(int(m.group(1))) + return (max(seqs) + 1) if seqs else 0 + + +def _now_iso() -> str: + return datetime.now(timezone.utc).isoformat() + + +# ---------- group ---------- + + +@click.group() +def session(): + """Mirror, browse, and inspect coding-agent sessions.""" + + +# ---------- mirror ---------- + + +@session.command("mirror") +@click.option( + "--harness", + "harness_name", + required=True, + type=click.Choice(HARNESS_CHOICES), + help="Which coding agent's session format to read.", +) +@click.option( + "--session-id", + default=None, + help="Explicit session id. If omitted, picks the most recently modified session for --cwd.", +) +@click.option( + "--cwd", + "cwd_str", + default=None, + help="Project directory the session ran in (claude-code only). Defaults to $PWD.", +) +@click.option("--once", "once", is_flag=True, default=True, help="Single-shot mode (currently the only mode).") +@click.option("--no-redact", is_flag=True, help="Skip redaction. Use only on fully-trusted buckets.") +@click.option( + "--min-bytes", + default=1, + type=int, + show_default=True, + help="Skip upload if fewer than this many new bytes are available.", +) +def mirror(harness_name, session_id, cwd_str, once, no_redact, min_bytes): + """Pull new bytes from a harness session into the bucket (one-shot). 
+ + Reads from the last known byte offset (or 0 on first run), applies regex + redaction unless --no-redact, uploads the chunk as a new part object, and + updates the session's meta.json. Idempotent and safe to re-run on a cron. + """ + del once # only mode for now; flag reserved for future daemon mode + store, cfg = get_store() + harness = get_harness(harness_name) + cwd = Path(cwd_str).expanduser().resolve() if cwd_str else Path.cwd() + + # 1. Find the session + if session_id: + candidates = [s for s in harness.discover(cwd) if s.session_id == session_id] + sess = candidates[0] if candidates else None + else: + sess = harness.active_session(cwd) + + if sess is None: + raise click.ClickException( + f"No {harness_name} session found" + + (f" for id={session_id}" if session_id else f" in cwd={cwd}") + ) + + state = _load_state(sess.session_id) + # `cursor` is an opaque per-harness position: a byte offset for file-backed + # harnesses (claude-code, codex, openclaw), a rowid for SQLite (hermes). + # The mirror loop never assumes it equals a byte count. + cursor = state.get("cursor", state.get("byte_offset", 0)) # byte_offset: back-compat + + # Cheap pre-check: is there plausibly anything new? size() is sampled, not + # authoritative — read_new() returns the real consumed cursor below. + if harness.size(sess) - cursor < min_bytes: + click.echo( + f"nothing new: session={sess.session_id} cursor={cursor:,} size={harness.size(sess):,}" + ) + return + + # 2. Read everything new since `cursor`, race-free: read_new returns the + # bytes AND the exact cursor we consumed up to. For SQLite the bytes are + # synthesized JSONL of new rows; raw_len is byte length, not a cursor delta. + chunk, next_cursor = harness.read_new(sess, cursor) + raw_len = len(chunk) + + # 3. Redact (default on) + if no_redact: + out_bytes, counts = chunk, {} + else: + out_bytes, counts = redact(chunk) + + # 4. 
Upload as next part + seq = _next_seq_for(store, harness_name, sess.session_id) + uniq = uuid.uuid4().hex[:8] + part_key = f"{_session_prefix(harness_name, sess.session_id)}part-{seq:05d}-{uniq}.jsonl" + + with tempfile.NamedTemporaryFile(delete=False, suffix=".jsonl") as tf: + tf.write(out_bytes) + tf_path = tf.name + try: + store.put_file(part_key, tf_path) + finally: + try: + os.unlink(tf_path) + except OSError: + pass + + # 5. Update meta.json (cumulative) + meta_key = f"{_session_prefix(harness_name, sess.session_id)}meta.json" + existing = store.get_json(meta_key) or {} + parts_log = existing.get("parts", []) + parts_log.append( + { + "seq": seq, + "uuid": uniq, + "cursor_range": [cursor, next_cursor], + "source_bytes": raw_len, + "uploaded_bytes": len(out_bytes), + "redactions": counts, + "uploaded_at": _now_iso(), + } + ) + meta = { + "schema_version": 1, + "harness": harness_name, + "session_id": sess.session_id, + "source_path": str(sess.path), + "cwd": str(sess.cwd) if sess.cwd else None, + "agent_id": cfg.get("agent_id"), + "started_at": existing.get("started_at", _now_iso()), + "last_uploaded_at": _now_iso(), + "ended_at": existing.get("ended_at"), + "total_source_bytes": existing.get("total_source_bytes", 0) + raw_len, + "total_uploaded_bytes": existing.get("total_uploaded_bytes", 0) + len(out_bytes), + "redaction_counts": merge_counts(existing.get("redaction_counts", {}), counts), + "parts": parts_log, + } + store.put_json(meta_key, meta) + + # 6. Persist local state. Advance the cursor to the position we read up to + # (next_cursor), NOT cursor+raw_len — those differ for SQLite where the + # cursor is a rowid and raw_len is synthesized-JSONL byte length. 
+ _save_state( + sess.session_id, + { + "harness": harness_name, + "session_id": sess.session_id, + "source_path": str(sess.path), + "cursor": next_cursor, + "last_uploaded_seq": seq, + "last_flush_at": _now_iso(), + }, + ) + + click.echo( + f"uploaded part-{seq:05d}-{uniq} " + f"source={raw_len:,}B upload={len(out_bytes):,}B " + f"redactions={counts or 'none'}" + ) + + +# ---------- list ---------- + + +@session.command("list") +@click.option("--harness", "harness_filter", default=None, help="Filter by harness name.") +@click.option("--limit", default=20, type=int, show_default=True, help="Max sessions to show.") +@click.option( + "--sort-by", + type=click.Choice(["recent", "size"]), + default="recent", + show_default=True, +) +def list_(harness_filter, limit, sort_by): + """List sessions in the bucket.""" + store, _ = get_store() + keys = store.list_keys("sessions/") + metas: list[dict] = [] + seen: set[str] = set() + for k in keys: + if not k.endswith("/meta.json"): + continue + if k in seen: + continue + seen.add(k) + meta = store.get_json(k) + if not meta: + continue + if harness_filter and meta.get("harness") != harness_filter: + continue + metas.append(meta) + + if sort_by == "recent": + metas.sort(key=lambda m: m.get("last_uploaded_at", ""), reverse=True) + else: # size + metas.sort(key=lambda m: m.get("total_uploaded_bytes", 0), reverse=True) + + metas = metas[:limit] + if not metas: + click.echo("(no sessions)") + return + + click.echo(f"{'HARNESS':<14} {'SESSION':<16} {'BYTES':>12} {'PARTS':>6} {'LAST UPLOAD':<25}") + click.echo("-" * 80) + for m in metas: + sid = m.get("session_id", "?") + short = sid[:8] + ("…" if len(sid) > 8 else "") + click.echo( + f"{m.get('harness','?'):<14} {short:<16} " + f"{m.get('total_uploaded_bytes',0):>12,} " + f"{len(m.get('parts', [])):>6} " + f"{m.get('last_uploaded_at','-')[:24]:<25}" + ) + + +# ---------- show ---------- + + +@session.command("show") +@click.argument("session_id") +@click.option( + "--tail", + 
default=0, + type=int, + help="If >0, also fetch parts and print the last N lines.", +) +def show(session_id, tail): + """Inspect one session's meta + optionally tail its parts.""" + store, _ = get_store() + + # Find which harness this session lives under (search every harness folder). + all_meta_keys = [k for k in store.list_keys("sessions/") if k.endswith(f"/{session_id}/meta.json")] + if not all_meta_keys: + raise click.ClickException(f"session not found: {session_id}") + meta_key = all_meta_keys[0] + meta = store.get_json(meta_key) + click.echo(json.dumps(meta, indent=2)) + + if tail <= 0: + return + + # Fetch all parts (in seq order), concatenate, print last N lines. + prefix = meta_key[: -len("meta.json")] + part_keys = sorted( + k for k in store.list_keys(prefix) if PART_RE.search(k.rsplit("/", 1)[-1]) + ) + body = bytearray() + for k in part_keys: + with tempfile.NamedTemporaryFile(delete=False) as tf: + tmp = tf.name + try: + store.get_file(k, tmp) + body.extend(Path(tmp).read_bytes()) + finally: + try: + os.unlink(tmp) + except OSError: + pass + + lines = body.splitlines() + click.echo("\n--- tail ---") + for line in lines[-tail:]: + try: + click.echo(line.decode("utf-8", errors="replace")) + except Exception: + click.echo(repr(line)) + + +# ---------- stop ---------- + + +@session.command("stop") +@click.argument("session_id") +def stop(session_id): + """Clear local mirror state for a session and mark ended_at in meta. + + This is a placeholder: when --detach lands later, this command will also + kill the background mirror process. For now it just resets local state + and records the end time. + """ + state_file = _state_path(session_id) + had_state = state_file.exists() + if had_state: + state_file.unlink() + + # Best-effort: mark ended_at in meta if a meta exists. 
+ store, _ = get_store() + meta_keys = [ + k for k in store.list_keys("sessions/") if k.endswith(f"/{session_id}/meta.json") + ] + marked = False + if meta_keys: + meta = store.get_json(meta_keys[0]) or {} + if meta and not meta.get("ended_at"): + meta["ended_at"] = _now_iso() + store.put_json(meta_keys[0], meta) + marked = True + + click.echo( + f"stopped session={session_id} " + f"state_cleared={had_state} meta_marked_ended={marked}" + ) From 48159375199a8ae9f7661cdd67f63aab3eff48ff Mon Sep 17 00:00:00 2001 From: arrmlet Date: Fri, 29 May 2026 03:23:07 +0300 Subject: [PATCH 3/5] Tests + docs for session mirror; bump 0.2.0 57 tests total (moto for S3, tmp_path for harness roots, sqlite for hermes). docs/session-mirror.md covers all four harnesses, the cursor model, the bucket layout, and redaction. README gains a session-mirror section and CLI entries. plans/TRACES_V1_PLAN.md records the scope. Co-Authored-By: Claude Opus 4.8 (1M context) --- README.md | 34 ++ docs/session-mirror.md | 126 ++++++++ plans/TRACES_V1_PLAN.md | 435 ++++++++++++++++++++++++++ sdk/pyproject.toml | 2 +- sdk/tests/test_harness.py | 566 ++++++++++++++++++++++++++++++++++ sdk/tests/test_session_cli.py | 315 +++++++++++++++++++ sdk/tracecraft/__init__.py | 2 +- 7 files changed, 1478 insertions(+), 2 deletions(-) create mode 100644 docs/session-mirror.md create mode 100644 plans/TRACES_V1_PLAN.md create mode 100644 sdk/tests/test_harness.py create mode 100644 sdk/tests/test_session_cli.py diff --git a/README.md b/README.md index 71d2d7a..7821f9f 100644 --- a/README.md +++ b/README.md @@ -93,6 +93,33 @@ Works with any process that can call a CLI — Claude Code, OpenClaw, Hermes Age --- +## Session mirror + +Mirror a coding agent's full session transcript into the same bucket as your +coordination state — so one bucket holds every agent's reasoning **and** the +messages between them. 
+ +```bash +tracecraft session mirror --harness claude-code # tail this session into the bucket +tracecraft session list # browse mirrored sessions +tracecraft session show --tail 50 # peek at a transcript +``` + +Four harnesses, one read-only interface: + +| `--harness` | Source | Storage | +|---|---|---| +| `claude-code` | `~/.claude/projects/.../.jsonl` | JSONL tail | +| `codex` | `~/.codex/sessions/.../rollout-*.jsonl` | JSONL tail | +| `openclaw` | `/agents//sessions/*.jsonl` | JSONL tail | +| `hermes` | `~/.hermes/state.db` | SQLite (read-only) | + +Sessions are never modified at the source. Redaction (AWS/Anthropic/OpenAI/HF/ +GitHub/Slack token shapes) runs on by default and is counted in `meta.json`. +Full reference: [docs/session-mirror.md](docs/session-mirror.md). + +--- + ## Storage backends No vendor lock-in. Bring your own S3: @@ -126,6 +153,8 @@ s3://bucket/project/ steps/design/status.json ← pending → in_progress → complete steps/design/handoff.json ← notes for the next agent artifacts/design/mockup.html ← shared files + sessions/claude-code//part-00000-.jsonl ← mirrored agent transcript + sessions/claude-code//meta.json ← cumulative session metadata ``` Any agent that can call `tracecraft` can participate. Any S3 browser (MinIO console, AWS console, HuggingFace Hub) lets you watch agents coordinate in real-time. 
@@ -155,6 +184,11 @@ tracecraft wait-for # Block until complete (default 300s t tracecraft artifact upload [--step id] # Share a file tracecraft artifact download [--step id] # Get a file tracecraft artifact list [--step id] # List files + +tracecraft session mirror --harness # Mirror a session into the bucket +tracecraft session list # Browse mirrored sessions +tracecraft session show [--tail N] # Inspect meta + transcript tail +tracecraft session stop # Clear local state, mark ended ``` For multiple agents in the same directory, set identity via env var: diff --git a/docs/session-mirror.md b/docs/session-mirror.md new file mode 100644 index 0000000..9837ab5 --- /dev/null +++ b/docs/session-mirror.md @@ -0,0 +1,126 @@ +# Session mirror + +`tracecraft session mirror` copies a coding agent's session transcript into your +bucket, alongside the coordination state (memory, messages, claims, artifacts) +that tracecraft already stores under the same `/` prefix. One bucket +ends up holding the full record of a multi-agent run: every agent's reasoning +**and** every message between them. + +Sessions are never modified at the source. The mirror is a read-only tail. + +## Supported harnesses + +| `--harness` | Source | Storage | +|---|---|---| +| `claude-code` | `~/.claude/projects//.jsonl` | append-only JSONL | +| `codex` | `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl` | append-only JSONL | +| `openclaw` | `/agents//sessions/.jsonl` | append-only JSONL | +| `hermes` | `~/.hermes/state.db` (`messages` table) | SQLite (WAL) | + +All four expose the same interface to the mirror loop via the `Harness` +protocol (`sdk/tracecraft/harness/base.py`). Adding a fifth harness is one +file plus a `REGISTRY` entry. + +### Harness notes + +- **OpenClaw** state dir resolves `OPENCLAW_STATE_DIR` → `OPENCLAW_HOME` → + `~/.openclaw`. `--dev`/`--profile ` map to `~/.openclaw-dev` / + `~/.openclaw-` — point `OPENCLAW_STATE_DIR` at those if you use them. 
+ The mutable `sessions.json` index and `*.tmp` staging files are skipped. + Session ids are unique only within an `agentId`, so the mirrored id is + `__`. +- **Hermes** is SQLite, not a file. The adapter opens the DB **read-only** + (`mode=ro`, never `immutable`) so it is safe to run while Hermes is writing — + WAL mode allows concurrent readers. It reads new rows with + `WHERE id > :cursor ORDER BY id` (the same incremental pattern Hermes uses + internally) and synthesizes one JSON line per message. Multimodal `content` + stored with Hermes' `\x00json:` sentinel is decoded back to JSON. + +## Commands + +```bash +tracecraft session mirror --harness [--session-id ID] [--cwd PATH] + [--no-redact] [--min-bytes N] +tracecraft session list [--harness NAME] [--limit N] [--sort-by recent|size] +tracecraft session show [--tail N] +tracecraft session stop +``` + +### mirror + +Single-shot. Reads everything new since the last run, redacts, uploads it as a +new part, updates `meta.json`, and advances the cursor. Safe to run repeatedly +(e.g. from a cron, a `SessionEnd` hook, or a `while sleep 5` loop). + +```bash +# Auto-pick the most recent claude-code session for the current directory +tracecraft session mirror --harness claude-code + +# Explicit session, codex +tracecraft session mirror --harness codex --session-id abc123 + +# Hermes (session id is the sessions.id TEXT value, e.g. 20260529_120000_abc123) +tracecraft session mirror --harness hermes --session-id 20260529_120000_abc123 +``` + +If `--session-id` is omitted, the most recently active session is chosen +(for Hermes, the session owning the highest message id). + +### list / show / stop + +```bash +tracecraft session list # every mirrored session +tracecraft session show # print meta.json +tracecraft session show --tail 50 # + last 50 lines of the transcript +tracecraft session stop # clear local state, mark ended_at +``` + +## Bucket layout + +Additive — does not touch existing coordination keys. 
+ +``` +// + agents/ memory/ messages/ steps/ artifacts/ ← coordination + sessions/ + / + / + part-00000-.jsonl ← one per mirror flush, disjoint + part-00001-.jsonl + meta.json ← cumulative metadata + redaction counts +``` + +Parts are append-disjoint and reassemble byte-for-byte (file harnesses) or +row-for-row (Hermes). The `` suffix makes concurrent flushes from +different machines collision-safe; reassembly sorts by sequence number. + +## The cursor model + +The mirror tracks a per-session **cursor** in +`~/.tracecraft/mirror-state/.json`. The cursor is opaque: + +- file harnesses → a **byte offset** +- Hermes → the highest **`messages.id`** (an AUTOINCREMENT rowid) + +`read_new(session, cursor)` returns `(new_bytes, new_cursor)` so advancement is +race-free — the loop advances to exactly what it consumed, never to a +separately-sampled size. Losing the state file is non-destructive: the next run +re-derives the next part sequence number from a bucket LIST, and overlap is +re-uploaded as a fresh part rather than clobbering existing ones. + +## Redaction + +Redaction is **on by default** and runs before any bytes leave the machine. It +is a regex denylist (`sdk/tracecraft/redact.py`) covering AWS, Anthropic, +OpenAI, HuggingFace, GitHub, and Slack token shapes plus bearer tokens. Every +match is **counted** in `meta.json` (`redaction_counts`), never silently +dropped. + +```bash +tracecraft session mirror --harness claude-code # redaction on (default) +tracecraft session mirror --harness claude-code --no-redact # raw, trusted buckets only +``` + +Redaction v0 catches well-known token shapes. It does **not** detect arbitrary +secrets, custom internal token formats, or proprietary content. Treat it as a +safety net, not a guarantee — and prefer a private bucket for session data. 
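Reviewer note on the Redaction section of docs/session-mirror.md above: the following is a minimal standalone sketch, not the shipped module, of how per-part redaction counts could accumulate into the `redaction_counts` total in `meta.json`. It copies two patterns from `sdk/tracecraft/redact.py` instead of importing the package, and the sample strings plus the `PATTERNS`, `totals`, and `scrubbed_parts` names are assumptions made up for illustration.

```python
from __future__ import annotations

import re

# Two patterns copied from the patch's denylist; the real list is longer.
PATTERNS = [
    ("aws_access_key", re.compile(rb"AKIA[0-9A-Z]{16}")),
    ("hf_token", re.compile(rb"hf_[A-Za-z0-9]{30,}")),
]


def redact(blob: bytes) -> tuple[bytes, dict[str, int]]:
    """Replace each match with a labelled marker and count replacements."""
    counts: dict[str, int] = {}
    out = blob
    for name, pat in PATTERNS:
        out, n = pat.subn(f"[REDACTED:{name}]".encode(), out)
        if n:  # patterns that never matched stay absent from the dict
            counts[name] = n
    return out, counts


def merge_counts(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Sum two count dicts, as the mirror does across parts."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return out


# Hypothetical parts, standing in for two successive mirror flushes.
part_0 = b'{"text": "key AKIAIOSFODNN7EXAMPLE and AKIAIOSFODNN7EXAMPLE"}\n'
part_1 = b'{"text": "token hf_' + b"a" * 30 + b'"}\n'

totals: dict[str, int] = {}
scrubbed_parts: list[bytes] = []
for part in (part_0, part_1):
    scrubbed, counts = redact(part)
    scrubbed_parts.append(scrubbed)
    totals = merge_counts(totals, counts)

print(totals)  # {'aws_access_key': 2, 'hf_token': 1}
```

Each part is scrubbed on its own here, so a token that happened to straddle two flushes would not match either pattern in this sketch. That may be worth keeping in mind when reading the "safety net, not a guarantee" caveat.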
diff --git a/plans/TRACES_V1_PLAN.md b/plans/TRACES_V1_PLAN.md new file mode 100644 index 0000000..86bb5da --- /dev/null +++ b/plans/TRACES_V1_PLAN.md @@ -0,0 +1,435 @@ +# traces-v1 — Session Mirror & Replay + +**Branch:** `traces-v1` +**Target release:** `0.2.0` +**Estimated effort:** 12–14 working days +**Status:** drafted 2026-05-20 + +--- + +## 1. Why this exists (the only thing that matters) + +The 2026-05 market scan (`plans/MARKET_REPORT_SESSIONS_2026_05.md`) found +session-mirroring is **commodity**: + +- Anthropic ships **SessionStore** (Claude-Code-native, opaque cloud) +- HuggingFace ships **Storage Buckets + Agent Trace Viewer** (HF-only) +- **DataClaw** (2.1k★) and **claude-sync** (119★) already mirror local JSONL + +So copying any of them is a waste. Tracecraft's session mirror only earns +its place if it does **three things none of them do**: + +1. **Cross-backend.** Any S3-compatible bucket (AWS, R2, MinIO, B2, Wasabi) + *and* HF Buckets. The user owns the data; we never see it. +2. **Sessions + coordination in one bucket.** Tracecraft already stores + memory / mailbox / claims / artifacts under `/`. Putting + harness sessions under the same `/sessions/` namespace + means one bucket holds the *entire* multi-agent history. +3. **Cross-harness replay.** Claude Code JSONL + Codex JSONL + tracecraft + coordination events merged into one timeline. This is the killer + demo: "watch four Claude Code agents coordinate, see each one's + reasoning, see the messages between them, in a single HTML." + +If at any point during implementation we feel pulled toward features +that don't serve those three goals, stop and re-read this section. + +--- + +## 2. Non-goals + +These look tempting and are deliberately excluded from `0.2.0`: + +- **Real-time UI.** Replay is a static HTML render of a finished bucket. + No live websocket, no dashboard server. 
+- **LLM-based redaction.** Regex denylist v0 only; LLM redaction is a + later-tier item once we know the false-positive rate. +- **Trace signing / SN13 submission.** That's `SN13_AGENT_TRACES_PITCH.md` + territory, separate 3-week de-risk plan. +- **Anthropic SessionStore integration.** Their API, their schema, + their lock-in. We mirror the local JSONL — that's the open path. +- **MCP server.** Already decided redundant given CLI + SKILL.md. +- **Cursor / Cline / Aider support.** Claude Code + Codex first. Others + follow only if there's demand and a JSONL-equivalent format. +- **TTL claims, heartbeat refresh, message-key collision.** These are + Tier 1 fixes from `RESEARCH_2026_05.md`. Bundle them in `0.2.1` if + traces-v1 didn't subsume the need. + +--- + +## 3. Scope: nine deliverables + +| # | Deliverable | Approx LoC | Days | +|---|-------------|-----------|------| +| D1 | `tracecraft session mirror` (Claude Code) | 150 | 2 | +| D2 | Claude Code plugin (`.claude-plugin/`) | 250 | 1 | +| D3 | Codex variant | 80 | 1 | +| D4 | `tracecraft session list / show` | 80 | 1 | +| D5 | `tracecraft replay` (the killer demo) | 350 | 2 | +| D6 | Redaction v0 (regex denylist) | 100 | 0.5 | +| D7 | Tests (moto + golden JSONL fixtures) | 250 | 1.5 | +| D8 | Docs (README + SKILL.md + plugin README) | — | 1 | +| D9 | Launch artifact (4-agent demo recording) | — | 1 | + +Total: ~1,260 LoC, 11 working days + 1 day slack. + +--- + +## 4. 
Bucket layout (additive — does not touch existing keys) + +``` +// + …existing keys (agents/, memory/, messages/, steps/, artifacts/)… + sessions/ + claude-code/ + .jsonl ← raw JSONL stream (append-only) + .meta.json ← cwd, started_at, ended_at, agent_id, + line_count, redacted_count, schema_version + codex/ + .jsonl + .meta.json + _index.json ← list of all sessions (rebuilt on each upload) +``` + +**Why a separate top-level `sessions/` instead of nesting under `agents/`:** +sessions belong to a *harness instance*, not always to a registered tracecraft +agent. A solo dev running Claude Code with no `tracecraft init agents/...` still +benefits from the mirror. Linking to an `agent_id` is optional metadata. + +--- + +## 5. D1 — `tracecraft session mirror` (the foundation) + +### Command +``` +tracecraft session mirror [--harness claude-code|codex] [--session-id ] + [--watch-dir ] [--batch-seconds 5] + [--once] [--detach] +``` + +### Behaviour +1. Auto-detect the active session if `--session-id` is omitted: + - **Claude Code:** glob `~/.claude/projects//*.jsonl`, + pick the one with the most recent `mtime`. + - **Codex:** glob `~/.codex/sessions///
/rollout-*.jsonl`, + same heuristic. +2. Tail the file (resume from byte offset stored in + `~/.tracecraft/mirror-.state`). +3. Every `--batch-seconds` (default 5), flush the new bytes to the + session's bucket prefix as **one new part object per batch** (S3 has + no native append; the options considered are in §5.1). +4. Update `.meta.json` on every flush. +5. Track PID in `~/.tracecraft/mirror.pid` (per-session, not global) so + the user can `tracecraft session stop ` cleanly. +6. `--detach` forks a background process (Unix `os.fork()`, + on Windows fall back to subprocess + log file). +7. `--once` does a single sync and exits (good for cron / hooks). + +### 5.1 Append strategy on S3 + +S3 has no `append`. Options considered: + +| Option | Pros | Cons | Verdict | +|--------|------|------|---------| +| Re-upload full file every batch | Trivial | Cost grows O(n²) for long sessions | ✗ | +| One object per batch (`..jsonl`) | Cheap, no read-back | Replay must list+merge | ✓ chosen | +| S3 multipart upload kept open | True append-ish | Multipart sessions abort on agent crash | ✗ | + +**Chosen:** one object per batch. Final layout: +``` +sessions/claude-code// + part-00000.jsonl + part-00001.jsonl + … + meta.json +``` +Replay/show concatenates parts in order. `tracecraft session compact ` +(later) merges into one file for archival. + +Trade-off accepted: more list operations during replay. These are cheap on S3 +($0.005 per 1000 LIST requests). For long sessions this is materially better than +re-uploading the full file every batch. + +### 5.2 State file format + +`~/.tracecraft/mirror-state/.json`: +```json +{ + "harness": "claude-code", + "session_id": "abc123", + "source_path": "/Users/x/.claude/projects/.../abc123.jsonl", + "bucket_prefix": "sessions/claude-code/abc123/", + "byte_offset": 142857, + "next_part_seq": 12, + "last_flush": "2026-05-20T10:15:00Z", + "pid": 4523 +} +``` + +### 5.3 Graceful shutdown +- `SIGTERM` / `SIGINT` → flush pending buffer, write final meta, remove pid.
+- Crash → state file lets next `mirror` invocation resume from `byte_offset`. +- Idempotency: if `part-.jsonl` already exists at the target key, + bump `next_part_seq` until empty slot found (defends against duplicate + uploads after partial crash). + +--- + +## 6. D2 — Claude Code plugin + +### Why a plugin (vs a hook the user installs manually) +The whole point is **zero-friction**. If the user has to edit JSON +config files, we lose. `/plugin install tracecraft` should be the path. + +### Files in `plugins/claude-code/` +``` +plugins/claude-code/ + .claude-plugin/ + plugin.json ← name, version, hooks, commands + hooks/ + session-start.sh ← spawns `tracecraft session mirror --detach` + session-end.sh ← `tracecraft session stop $CLAUDE_SESSION_ID` + skills/ + tracecraft.md ← SKILL.md so Claude inside Claude Code knows + how to use tracecraft for coordination + commands/ + tc-mirror.md ← /tc-mirror slash command (start/stop/status) + tc-replay.md ← /tc-replay slash command + README.md +``` + +### Submission target +Anthropic's plugin marketplace + GitHub direct-install path +(`/plugin install Arrmlet/tracecraft`). + +### Open question to resolve during impl +Does `SessionStart` hook fire on `claude --resume`? If not, we also +need a `UserPromptSubmit` hook with a "have we started mirroring?" guard. +(Test on day 1 of D2; cheap to verify.) + +--- + +## 7. D3 — Codex variant + +Codex CLI writes to `~/.codex/sessions///
/rollout-*.jsonl`. +Schema differs (it's not Claude-Code JSONL) but the *act of tailing* is +identical. ~50 LoC: just a new `Harness` adapter that knows the +glob pattern and (optionally) translates entries to a normalized schema. + +For replay we'll keep entries in their native schema and let the +renderer handle two harness types side-by-side. **No premature +normalization** — if a third harness lands, then we extract a base. + +--- + +## 8. D4 — `session list` / `session show` + +``` +tracecraft session list [--harness claude-code|codex] [--limit 20] +tracecraft session show [--tail 50] +tracecraft session stop +``` + +Reads `/sessions/_index.json`. `_index.json` is rewritten on +each meta update (write whole file — it's tiny; ~1 KB per 100 sessions). + +--- + +## 9. D5 — `tracecraft replay` (the killer demo) + +This is where tracecraft stops looking like "yet another session +mirror" and becomes a coordination viewer. + +### Command +``` +tracecraft replay [--project ] [--out replay.html] [--open] + [--since ] [--until ] +``` + +### What it does +1. Pulls **all** of `/`: + - `agents/*.json` (registered agents) + - `memory/*.json` (every memory write — but memory keys don't have + timestamps; we'll need to add `_updated_at` to memory writes — + small backwards-compatible change) + - `messages/**/*.json` (every message) + - `steps/**/*.json` (every claim/handoff/status) + - `sessions/**/part-*.jsonl` (every harness event) +2. Builds a unified timeline (single sorted array of events, + each tagged with `event_type` and `agent_id`). +3. Renders a **single self-contained HTML file** (no server) with: + - vertical timeline (newest at top or oldest at top, toggle) + - one swim-lane per agent + - colour-coding: coordination events (claim/message/memory) vs + harness events (tool-use, reasoning, file-edit) + - click any event → expand JSON + - filter by agent / event type / text search + +### Tech for the HTML +- Pure HTML + vanilla JS embedded in one file. 
**No build step.** + React/Vite would be faster to write but harder to ship and harder + for users to inspect/trust. +- One inlined `