
Bring PRD seeding into Tilth as uv run tilth prep-feature #14

@samkeen


Related: #10 (prd.json leak from worktree). Closes #10 when landed — the artifact-placement section below resolves the leak by construction.

Why

Tilth runs are only as good as their seed. Today the seeding workflow lives outside Tilth, in two places:

  1. A Claude Code skill (tilth-prd-seeder) that runs an interview against the target codebase and writes prd.json + matching tests/test_t00N_*.py.
  2. The user, who is expected to commit those artifacts into the source repo before running uv run tilth <workspace>.

This causes three problems for Tilth's mission:

  • It pollutes the target repo. A finished feature should ship via a clean PR, but today prd.json (and, after a run, progress.txt) rides the session branch into the PR diff: the seeder skill writes prd.json to <workspace>/prd.json, where git add -A in commit_task commits it.
  • The seeder isn't discoverable or versioned with the harness. It only exists for users who installed the skill. Anyone else hits a docs gap at the most load-bearing step of using Tilth on a non-demo codebase (docs/getting-started/your-own-project.md).
  • Seed quality is the single biggest predictor of run quality. A weak seed collapses the quality gate to "ruff passed + judge said OK", which the docs explicitly call out as the worst-case failure. The interview that produces a strong seed should ship with the harness.

Proposed shape

A new subcommand:

uv run tilth prep-feature <workspace>

It runs the seeding interview against <workspace>, produces the seed, and stages it as a prepared (but not started) session under sessions/<id>/. The next uv run tilth <workspace> either auto-picks the prepared session for that workspace or accepts --session <id> to choose explicitly.

This makes prep-feature a peer of --resume, --reset, --visualize — all session-lifecycle verbs.
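A minimal sketch of how the new verb could sit next to the existing `tilth <workspace>` form, assuming the CLI is built on the standard library's argparse (the issue doesn't say). The implicit `run` verb, the `parse_args` helper, and the omission of --reset/--visualize are hypothetical simplifications; the real entry point may parse its flags differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical layout: "prep-feature" is its own verb; everything else
    # routes to an implicit "run" verb so `tilth <workspace>` keeps working.
    parser = argparse.ArgumentParser(prog="tilth")
    sub = parser.add_subparsers(dest="command")

    prep = sub.add_parser("prep-feature", help="Interview and stage a prepared session")
    prep.add_argument("workspace")

    run = sub.add_parser("run", help="Start a prepared session or resume one")
    run.add_argument("workspace", nargs="?")
    run.add_argument("--session", help="Explicit prepared session id")
    run.add_argument("--resume", action="store_true")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    # Insert the implicit "run" verb when the first token isn't a known subcommand.
    if not argv or argv[0] not in {"prep-feature", "run"}:
        argv = ["run", *argv]
    return build_parser().parse_args(argv)
```

With this shape, `parse_args(["prep-feature", "../todo-cli"])` selects the new verb, while `parse_args(["../todo-cli"])` and `parse_args(["--session", "abc123"])` both fall through to the run path.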

What prep-feature does (distilled from the skill)

Order matters; the steps follow the same sequence as tilth-prd-seeder/SKILL.md:

  1. Confirm intent. One sentence: what feature/refactor, what workspace path. Paraphrase if both were in the prompt; ask one targeted question otherwise.
  2. Strategic codebase scan. Seed-steered, not exhaustive: glob/grep for the area the feature touches, read the 2–4 most relevant files end-to-end, inventory existing tests/ for style and fixtures, and check for an existing prd.json (continuing task IDs from the highest existing T-NNN). For unfamiliar or large codebases, spawn an Explore subagent in parallel.
  3. Anchored interview. Adaptive, one question per turn. Mix AskUserQuestion (decision-style, 2–4 plausible options surfaced by the scan) and free-form (clarification, motivation, scope-boundary calls). Return to the codebase mid-interview when an answer makes a new area relevant.
    • Coverage targets (all must be hit before writing): motivation & context, observable contract, task slicing + per-task acceptance criteria, test strategy, scope boundaries, risks & open questions.
  4. Surface blockers as they appear. Examples: a refactor with no existing tests to ratchet against, a task slice that contradicts the code, an acceptance criterion that isn't programmatically checkable, a slice that's too coarse, or no tests/ directory yet.
  5. Wrap up the interview once there's enough. Don't drag it out to 100% certainty; unknowns belong in the chat summary.
  6. Confirm IDs, slugs, workspace. Restate the path, propose contiguous T-NNN IDs and test_<task-id-lower>_<slug>.py file names via AskUserQuestion. The naming pattern is load-bearing — Tilth's pytest filter keys on it.
  7. Write the artifacts: prd entries (always status: "pending", append-don't-overwrite) and one matching test file per task (assertion clusters mapping 1:1 to acceptance criteria, matching the project's existing test style). See "Where artifacts land" for where they're written.
  8. Surface chat summary (TL;DR + Open Questions + Blockers, no disk writes), then suggest next steps and stop.
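Steps 2 and 6 imply two small mechanical rules: continue numbering after the highest existing T-NNN, and derive each test file name from its task ID and a slug. A sketch, assuming IDs look like T-004 and that the lowercased ID drops its hyphen so names match the tests/test_t00N_*.py pattern Tilth's pytest filter keys on; the function names are hypothetical.

```python
import re

# Assumed id format: "T-" followed by at least three zero-padded digits.
TASK_ID = re.compile(r"^T-(\d{3,})$")


def next_task_ids(existing_ids: list[str], count: int) -> list[str]:
    """Propose `count` contiguous T-NNN ids, continuing after the highest existing one."""
    numbers = [int(m.group(1)) for tid in existing_ids if (m := TASK_ID.match(tid))]
    start = max(numbers, default=0) + 1
    return [f"T-{n:03d}" for n in range(start, start + count)]


def seed_test_filename(task_id: str, slug: str) -> str:
    """Build test_<task-id-lower>_<slug>.py, dropping the hyphen (assumption)."""
    m = TASK_ID.match(task_id)
    if m is None:
        raise ValueError(f"not a T-NNN task id: {task_id!r}")
    # Normalize the slug to lowercase words joined by underscores.
    clean_slug = re.sub(r"[^a-z0-9]+", "_", slug.lower()).strip("_")
    return f"test_t{m.group(1)}_{clean_slug}.py"
```

For example, with T-001, T-002 and T-007 already in prd.json, `next_task_ids(ids, 2)` proposes T-008 and T-009 (gaps are not refilled), and `seed_test_filename("T-008", "Add due dates")` yields test_t008_add_due_dates.py.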

Behavior to avoid (carry across from the skill verbatim):

  • Don't skip the codebase scan; don't read every file end-to-end in step 2.
  • Don't batch interview questions into a megaprompt.
  • Don't fabricate acceptance criteria the user didn't agree to.
  • Don't write a prd entry without its matching test file (the pair is the unit).
  • Don't overwrite an existing prd.json; don't reuse existing task IDs.
  • Don't paper over contradictions because they're awkward.
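Several of these rules are mechanical enough to enforce in code rather than leave to the interview prompt. A sketch of an append-only merge, assuming a hypothetical prd.json schema in which `tasks` is a list of dicts with an `id` and a `test_file` name; the harness's real schema may differ.

```python
def append_tasks(prd: dict, new_tasks: list[dict], present_test_files: set[str]) -> dict:
    """Return a copy of `prd` with `new_tasks` appended, refusing unsafe additions.

    Hypothetical schema: prd["tasks"] is a list of dicts with an "id", and each
    new task names its pytest file under "test_file".
    """
    seen = {task["id"] for task in prd.get("tasks", [])}
    for task in new_tasks:
        # Never reuse an id, whether it's already in prd.json or repeated in this batch.
        if task["id"] in seen:
            raise ValueError(f"task id {task['id']} already exists")
        # The prd entry and its test file are one unit: no file, no entry.
        if task.get("test_file") not in present_test_files:
            raise ValueError(f"{task['id']} has no matching test file")
        seen.add(task["id"])
    # Append rather than overwrite; every new entry starts as pending.
    appended = [{**task, "status": "pending"} for task in new_tasks]
    return {**prd, "tasks": [*prd.get("tasks", []), *appended]}
```

Returning a new dict keeps the caller's copy untouched, so a rejected batch leaves prd.json exactly as it was.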

Refuse / redirect cases (also from the skill):

  • Bug fix small enough for one task → write the single entry directly, no interview.
  • Greenfield with no existing code → can't anchor; redirect to architecture sketch first.
  • Already-detailed PRD just needs more tasks → open and edit directly.

(The full skill is at /Users/sam/.claude/skills/tilth-prd-seeder/SKILL.md with the file template at references/tilth-task-seed-template.md. The substance ports into the new command verbatim; the rest of this issue is about how it lives inside Tilth.)

Where artifacts land

Aligned to the three invariants this touches (Brain/Hands/Session split, agent-visibility boundary, worktree-branch-not-auto-merged):

| Artifact | Lives in | Why |
|---|---|---|
| prd.json (runtime) | sessions/<id>/prd.json | Session state. Mutated by the harness as tasks flip pending → in_progress → done. Outside the worktree, so the agent never sees it (closes #10). |
| Test files (tests/test_t0NN_*.py) | <workspace>/tests/ | Legitimate repo artifact. They are the quality gate; they should be reviewed in the PR and committed to main like any other test. |
| progress.txt | sessions/<id>/progress.txt | Within-session journal. Outside the worktree, so no PR pollution. |
| AGENTS.md | <workspace>/AGENTS.md | Cross-session memory channel. Legitimate repo artifact under modern conventions; the user may commit it and share it across machines. Stays in the worktree. |

Net effect: target repo gains tests (intentional) and possibly AGENTS.md (user's call). No prd.json, no progress.txt, no .tilth/ directory, no auto-.gitignore writes. A feature dev → PR → merge cycle leaves zero Tilth-specific runtime artifacts in origin.
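The placement rules reduce to a path map plus one check: harness-owned state must resolve outside the directory the agent works in. A sketch, assuming the agent's view is the workspace directory itself (in practice it's a worktree, whose path the issue doesn't specify); `artifact_paths` and `check_agent_visibility` are hypothetical names.

```python
from pathlib import Path


def artifact_paths(sessions_root: Path, session_id: str, workspace: Path) -> dict[str, Path]:
    """Map each artifact to where the proposal says it lives."""
    session_dir = sessions_root / session_id
    return {
        "prd": session_dir / "prd.json",           # session state, harness-owned
        "progress": session_dir / "progress.txt",  # within-session journal
        "tests": workspace / "tests",              # reviewed and merged with the PR
        "agents_md": workspace / "AGENTS.md",      # cross-session memory, user's call
    }


def check_agent_visibility(paths: dict[str, Path], workspace: Path) -> None:
    """Fail if harness-owned state would land inside the agent-visible workspace."""
    root = workspace.resolve()
    for name in ("prd", "progress"):
        if paths[name].resolve().is_relative_to(root):
            raise ValueError(f"{name} would be visible to the agent: {paths[name]}")
```

Path.is_relative_to needs Python 3.9 or newer.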

prep-feature's write path

  • Creates sessions/<id>/ and writes sessions/<id>/prd.json directly (no intermediate "seed file in the source repo" step).
  • Writes test files to <workspace>/tests/test_t0NN_*.py as the skill does today.
  • Records a session_prepared event in events.jsonl with the seed details for traceability.
  • The session gets status: prepared in checkpoint.json, which distinguishes it from in-progress and all_done sessions.
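A sketch of this write path under stated assumptions: the session id is a random hex string, checkpoint.json records a hypothetical `source_path` field (useful for pickup later), and events.jsonl lives in the session directory. None of these details are fixed by the issue; `prepare_session` is a hypothetical name.

```python
import json
import time
import uuid
from pathlib import Path


def prepare_session(sessions_root: Path, workspace: Path, prd: dict) -> str:
    """Stage a prepared (not started) session and return its id."""
    session_id = uuid.uuid4().hex[:12]
    session_dir = sessions_root / session_id
    # exist_ok=False: never write into an existing session directory.
    session_dir.mkdir(parents=True, exist_ok=False)

    # The PRD goes straight into session state; nothing lands in the source repo.
    (session_dir / "prd.json").write_text(json.dumps(prd, indent=2))

    checkpoint = {"status": "prepared", "source_path": str(workspace.resolve())}
    (session_dir / "checkpoint.json").write_text(json.dumps(checkpoint, indent=2))

    # Append-only event log for traceability.
    event = {
        "type": "session_prepared",
        "ts": time.time(),
        "task_ids": [task["id"] for task in prd.get("tasks", [])],
    }
    with (session_dir / "events.jsonl").open("a") as log:
        log.write(json.dumps(event) + "\n")
    return session_id
```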

Resume / pickup flow

  • uv run tilth <workspace> looks for prepared sessions for that source path. If there is exactly one, it uses it; if there are several, it lists them (most recent first) and asks; if there are none, it errors with: "No prepared session. Run uv run tilth prep-feature <workspace> first."
  • uv run tilth --session <id> accepts an explicit session.
  • --resume semantics unchanged for in-progress sessions.
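The pickup rules can be sketched as two small functions, assuming each sessions/<id>/checkpoint.json carries a `status` and a `source_path` (the latter is a hypothetical field) and that "most recent" means newest checkpoint modification time. The names and the RuntimeError are illustrative choices, not the harness's API.

```python
import json
from pathlib import Path
from typing import Callable


def find_prepared_sessions(sessions_root: Path, workspace: Path) -> list[str]:
    """Return ids of prepared sessions for `workspace`, most recent first."""
    target = str(workspace.resolve())
    found = []
    for checkpoint in sessions_root.glob("*/checkpoint.json"):
        data = json.loads(checkpoint.read_text())
        if data.get("status") == "prepared" and data.get("source_path") == target:
            found.append((checkpoint.stat().st_mtime, checkpoint.parent.name))
    return [session_id for _, session_id in sorted(found, reverse=True)]


def pick_prepared_session(
    candidates: list[str], workspace: str, choose: Callable[[list[str]], str]
) -> str:
    """Zero -> error, one -> use it, several -> ask the user via `choose`."""
    if not candidates:
        raise RuntimeError(
            f"No prepared session. Run uv run tilth prep-feature {workspace} first."
        )
    if len(candidates) == 1:
        return candidates[0]
    return choose(candidates)
```

An explicit --session <id> would bypass both functions entirely.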

Migration

  • Existing demo repo (AlteredCraft/tilth-demo-todo-cli) has prd.json and progress.txt committed in main. Document a one-time cleanup: delete from main; future seeding happens via prep-feature and lands in sessions/<id>/.
  • Tests in <workspace>/tests/ already work and stay where they are.
  • The standalone tilth-prd-seeder Claude Code skill gets deprecated in favor of uv run tilth prep-feature. Skill docs point users at the new command.

Open questions

  • Worker model for the interview. The same model the harness uses for run-time tool use, or a smaller/cheaper one? The interview is conversation plus code reading; it doesn't need the same horsepower as a task-execution turn. Probably configurable via a TILTH_PREP_MODEL env var, defaulting to the same model.
  • Auto-start option. Should prep-feature end by asking "kick off the run now?" (Y/n) — or always stop after writing the seed and let the user invoke tilth <workspace> separately? Skill's current behavior is the latter (interview → summary → stop).
  • Interactive interview UX in a CLI. The skill ran inside Claude Code where AskUserQuestion is first-class. In uv run tilth prep-feature, do we render AskUserQuestion-equivalents as numbered TTY prompts? Use a richer prompt library (questionary)? Or run the interview via the LLM's tool-use loop with a prompt_user(question, options) tool the harness implements?
  • Re-prep on an existing prepared session. If the user runs prep-feature on a workspace that already has a prepared session, do we append to it (continue the PRD), replace it, or refuse? Probably refuse with a hint to --reset <id> or --session <id> --append.
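For the CLI-UX and model questions, one possible shape: a harness-side `prompt_user(question, options)` that renders numbered TTY choices with an escape hatch for free-form answers, plus TILTH_PREP_MODEL resolution that falls back to the run model. Both are hypothetical standard-library sketches, not a decision on the open questions; the same function could back an LLM tool of that name.

```python
import os
from typing import Callable, Mapping, Sequence


def resolve_prep_model(default_model: str, env: Mapping[str, str] = os.environ) -> str:
    """Use TILTH_PREP_MODEL if set and non-empty, else the harness's run model."""
    return env.get("TILTH_PREP_MODEL") or default_model


def prompt_user(
    question: str, options: Sequence[str], ask: Callable[[str], str] = input
) -> str:
    """Render a decision-style question as a numbered prompt; 0 allows free text."""
    if not options:
        # Free-form question: no menu, just read an answer.
        return ask(f"{question}\n> ").strip()
    menu = "\n".join(f"  {i}. {opt}" for i, opt in enumerate(options, start=1))
    while True:
        raw = ask(f"{question}\n{menu}\n  0. Other (type your own)\n> ").strip()
        if raw.isdigit():
            choice = int(raw)
            if choice == 0:
                return ask("Your answer: ").strip()
            if 1 <= choice <= len(options):
                return options[choice - 1]
        print(f"Please enter a number from 0 to {len(options)}.")
```

Injecting `ask` keeps the prompt testable and would let a richer library such as questionary replace `input()` later without changing callers.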

