Related: #10 (prd.json leak from worktree). Closes #10 when landed — the artifact-placement section below resolves the leak by construction.
Why
Tilth runs are only as good as their seed. Today the seeding workflow lives in two places that aren't Tilth:
- A Claude Code skill (tilth-prd-seeder) that runs an interview against the target codebase and writes prd.json + matching tests/test_t00N_*.py.
- The user, who is expected to commit those artifacts into the source repo before running uv run tilth <workspace>.
This has three problems for Tilth's mission:
- It pollutes the target repo. A finished feature should ship via a clean PR. Today prd.json (and progress.txt, post-run) ride the session branch into the PR diff. The seeder skill writes prd.json to <workspace>/prd.json, where it gets committed by git add -A in commit_task.
- The seeder isn't discoverable or versioned with the harness. It only exists for users who installed the skill. Anyone else hits a docs gap at the most load-bearing step of using Tilth on a non-demo codebase (docs/getting-started/your-own-project.md).
- Seed quality is the single biggest predictor of run quality. A weak seed collapses the quality gate to "ruff passed + judge said OK", which the docs explicitly call out as the worst-case failure. The interview that produces a strong seed should ship with the harness.
Proposed shape
A new subcommand:
uv run tilth prep-feature <workspace>
It runs the seeding interview against <workspace>, produces the seed, and stages it as a prepared (but not started) session under sessions/<id>/. The next uv run tilth <workspace> either auto-picks the prepared session for that workspace or accepts --session <id> to choose explicitly.
This makes prep-feature a peer of --resume, --reset, --visualize — all session-lifecycle verbs.
What prep-feature does (distilled from the skill)
Sequence matters; same order as tilth-prd-seeder/SKILL.md:
1. Confirm intent. One sentence: what feature/refactor, what workspace path. Paraphrase if both were in the prompt; ask one targeted question otherwise.
2. Strategic codebase scan. Seed-steered, not exhaustive: glob/grep for the area the feature touches, sample 2–4 most-relevant files end-to-end, inventory existing tests/ for style/fixtures, check for an existing prd.json (and continue task IDs from the highest existing T-NNN). Spawn an Explore subagent in parallel for unfamiliar/large codebases.
3. Anchored interview. Adaptive, one question per turn. Mix AskUserQuestion (decision-style, 2–4 plausible options surfaced by the scan) and free-form (clarification, motivation, scope-boundary calls). Return to the codebase mid-interview when an answer makes a new area relevant.
4. Coverage targets (all must be hit before writing): motivation & context, observable contract, task slicing + per-task acceptance criteria, test strategy, scope boundaries, risks & open questions.
5. Surface blockers as they appear. Refactor with no existing tests to ratchet against. Task slice that contradicts the code. Acceptance criterion that isn't programmatically checkable. Slice too coarse. No tests/ directory yet.
6. Wrap interview when there's enough. Don't drag to 100% certainty; unknowns belong in the chat summary.
7. Confirm IDs, slugs, workspace. Restate the path, propose contiguous T-NNN IDs and test_<task-id-lower>_<slug>.py file names via AskUserQuestion. The naming pattern is load-bearing: Tilth's pytest filter keys on it.
8. Write the artifacts. prd entries (always status: "pending", append-don't-overwrite) and one matching test file per task (assertion-clusters mapping 1:1 to acceptance criteria, matching project's existing test style). See "Where artifacts land" for where they're written.
9. Surface chat summary (TL;DR + Open Questions + Blockers, no disk writes), then suggest next steps and stop.
Behavior to avoid (carry across from the skill verbatim):
- Don't skip the codebase scan; don't read every file end-to-end in step 2.
- Don't batch interview questions into a megaprompt.
- Don't fabricate acceptance criteria the user didn't agree to.
- Don't write a prd entry without its matching test file (the pair is the unit).
- Don't overwrite an existing prd.json; don't reuse existing task IDs.
- Don't paper over contradictions because they're awkward.
Refuse / redirect cases (also from the skill):
- Bug fix small enough for one task → write the single entry directly, no interview.
- Greenfield with no existing code → can't anchor; redirect to architecture sketch first.
- Already-detailed PRD just needs more tasks → open and edit directly.
(The full skill is at /Users/sam/.claude/skills/tilth-prd-seeder/SKILL.md with the file template at references/tilth-task-seed-template.md. The substance ports into the new command verbatim; the rest of this issue is about how it lives inside Tilth.)
Where artifacts land
Aligned to the three invariants this touches (Brain/Hands/Session split, agent-visibility boundary, worktree-branch-not-auto-merged):
| Artifact | Lives in | Why |
| --- | --- | --- |
| prd.json (runtime) | sessions/<id>/prd.json | Session state. Mutated by harness as tasks flip pending → in_progress → done. Outside the worktree: agent never sees it (closes #10). |
| Test files (tests/test_t0NN_*.py) | <workspace>/tests/ | Legitimate repo artifact. They are the quality gate; they should be reviewed in the PR and committed to main like any other test. |
| progress.txt | sessions/<id>/progress.txt | Within-session journal. Outside worktree: no PR pollution. |
| AGENTS.md | <workspace>/AGENTS.md | Cross-session memory channel. Legitimate repo artifact under modern conventions; user may commit and share across machines. Stays in worktree. |
Net effect: target repo gains tests (intentional) and possibly AGENTS.md (user's call). No prd.json, no progress.txt, no .tilth/ directory, no auto-.gitignore writes. A feature dev → PR → merge cycle leaves zero Tilth-specific runtime artifacts in origin.
prep-feature's write path
- Creates sessions/<id>/ and writes sessions/<id>/prd.json directly (no intermediate "seed file in the source repo" step).
- Writes test files to <workspace>/tests/test_t0NN_*.py as the skill does today.
- Records a session_prepared event in events.jsonl with the seed details for traceability.
- Session has status: prepared in checkpoint.json, distinguishable from "in-progress" and "all_done".
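A minimal sketch of the session-side writes, assuming nothing about Tilth's real file schemas. The function name stage_prepared_session, the {"tasks": [...]} layout of prd.json, and the workspace, ts, and task_ids fields are all hypothetical; only the file names, the session_prepared event name, and the prepared and pending status values come from this issue. Writing the test files into <workspace>/tests/ is left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def stage_prepared_session(sessions_root: Path, session_id: str,
                           workspace: Path, tasks: list[dict]) -> Path:
    """Create sessions/<id>/ holding the seed, a checkpoint, and one event."""
    session_dir = sessions_root / session_id
    # exist_ok=False: refuse to write over an existing session directory.
    session_dir.mkdir(parents=True, exist_ok=False)

    # Every seeded task starts as "pending" (hypothetical prd.json layout).
    prd = {"tasks": [{**task, "status": "pending"} for task in tasks]}
    (session_dir / "prd.json").write_text(json.dumps(prd, indent=2))

    # "prepared" marks a staged-but-not-started session.
    checkpoint = {"status": "prepared", "workspace": str(workspace.resolve())}
    (session_dir / "checkpoint.json").write_text(json.dumps(checkpoint, indent=2))

    # Append one JSON object per line for traceability.
    event = {
        "event": "session_prepared",
        "ts": datetime.now(timezone.utc).isoformat(),
        "task_ids": [task["id"] for task in tasks],
    }
    with (session_dir / "events.jsonl").open("a") as fh:
        fh.write(json.dumps(event) + "\n")
    return session_dir
```

Nothing here touches the workspace, which is the point of the move: the seed never enters the worktree.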
Resume / pickup flow
uv run tilth <workspace> looks for the most recent prepared session for that source path. If exactly one, uses it. If multiple, lists them and asks. If zero, errors with: "No prepared session. Run uv run tilth prep-feature <workspace> first."
uv run tilth --session <id> accepts an explicit session.
--resume semantics unchanged for in-progress sessions.
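The pickup rule could look roughly like the sketch below. The names pick_prepared_session, NoPreparedSession, and AmbiguousPreparedSession are hypothetical, and the sketch assumes checkpoint.json stores the resolved workspace path under a "workspace" key, which this issue does not specify. The "lists them and asks" branch is modelled as an exception carrying the candidates, so the CLI layer can decide how to prompt.

```python
import json
from pathlib import Path


class NoPreparedSession(Exception):
    """No prepared session exists for the workspace."""


class AmbiguousPreparedSession(Exception):
    """Several prepared sessions match; the caller should list them and ask."""

    def __init__(self, candidates):
        super().__init__(f"{len(candidates)} prepared sessions found")
        self.candidates = candidates


def find_prepared_sessions(sessions_root: Path, workspace: Path) -> list[Path]:
    """Return session directories whose checkpoint is 'prepared' for this workspace."""
    target = str(workspace.resolve())
    matches = []
    for checkpoint in sorted(sessions_root.glob("*/checkpoint.json")):
        data = json.loads(checkpoint.read_text())
        if data.get("status") == "prepared" and data.get("workspace") == target:
            matches.append(checkpoint.parent)
    return matches


def pick_prepared_session(sessions_root: Path, workspace: Path) -> Path:
    matches = find_prepared_sessions(sessions_root, workspace)
    if not matches:
        raise NoPreparedSession(
            "No prepared session. Run uv run tilth prep-feature <workspace> first."
        )
    if len(matches) > 1:
        raise AmbiguousPreparedSession(matches)
    return matches[0]
```

Sessions in any other state are ignored, which keeps --resume semantics for in-progress sessions separate from this lookup.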
Migration
- Existing demo repo (AlteredCraft/tilth-demo-todo-cli) has prd.json and progress.txt committed in main. Document a one-time cleanup: delete from main; future seeding happens via prep-feature and lands in sessions/<id>/.
- Tests in <workspace>/tests/ already work and stay where they are.
- The standalone tilth-prd-seeder Claude Code skill gets deprecated in favor of uv run tilth prep-feature. Skill docs point users at the new command.
Open questions
- Worker model for the interview. Same model the harness uses for run-time tool-use? Or a smaller/cheaper model? The interview is conversational + reading code; doesn't need the same horsepower as a task-execution turn. Probably configurable via TILTH_PREP_MODEL env var, defaulting to the same model.
- Auto-start option. Should prep-feature end by asking "kick off the run now?" (Y/n), or always stop after writing the seed and let the user invoke tilth <workspace> separately? Skill's current behavior is the latter (interview → summary → stop).
- Interactive interview UX in a CLI. The skill ran inside Claude Code where AskUserQuestion is first-class. In uv run tilth prep-feature, do we render AskUserQuestion-equivalents as numbered TTY prompts? Use a richer prompt library (questionary)? Or run the interview via the LLM's tool-use loop with a prompt_user(question, options) tool the harness implements?
- Re-prep on an existing prepared session. If the user runs prep-feature on a workspace that already has a prepared session, do we append to it (continue the PRD), replace it, or refuse? Probably refuse with a hint to --reset <id> or --session <id> --append.
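To make two of these options concrete, here is one possible shape for the TILTH_PREP_MODEL fallback and for the numbered-TTY variant of prompt_user(question, options). Both are sketches of options under discussion, not decisions; resolve_prep_model and the read/write parameters are hypothetical and exist only so the functions can be exercised without a terminal.

```python
import os


def resolve_prep_model(harness_model, env=None):
    """Use TILTH_PREP_MODEL when set and non-empty, else the harness model."""
    env = os.environ if env is None else env
    return env.get("TILTH_PREP_MODEL") or harness_model


def prompt_user(question, options, read=input, write=print):
    """Render a decision-style question as a numbered prompt and return the choice."""
    write(question)
    for index, option in enumerate(options, start=1):
        write(f"  {index}. {option}")
    while True:
        answer = read("> ").strip()
        # isdecimal() only accepts characters that int() can parse.
        if answer.isdecimal() and 1 <= int(answer) <= len(options):
            return options[int(answer) - 1]
        write(f"Enter a number between 1 and {len(options)}.")


if __name__ == "__main__":
    print(resolve_prep_model("harness-default-model"))
    print(prompt_user("Which test style should the seed follow?",
                      ["pytest functions", "unittest classes"]))
```

The same prompt_user signature would also work as the harness-implemented tool in the tool-use-loop option, so picking the TTY rendering first does not rule that route out.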
Related
- tilth/loop.py:_load_prd, tilth/loop.py:_save_prd (touched by the artifact move).
- tilth/memory.py:_load_progress_tail, tilth/memory.py:append_progress (touched by the progress.txt move).
- docs/getting-started/your-own-project.md (currently describes the manual seed flow; rewrites around prep-feature once landed).