What this is
A tracker for the F1 friction ("the seed is the contract but nothing validates the seed", proposals/frictions-2026-05-26.md) as it shows up concretely in the demo's T-001. Not a proposal to patch the seeder for uv init specifically — see Explicitly out of scope below.
The shape
T-001 ("Bootstrap project skeleton"):
- Description authorises uv init to create pyproject.toml. uv init also creates README.md, .python-version, and a root main.py stub, none of which the description acknowledges.
- Acceptance criteria require todo_cli/__main__.py as the entry point (from todo_cli.__main__ import main succeeds). They say nothing about the collateral files, and nothing about cleaning up the now-orphaned root main.py.
So the seed relies on the worker inferring that the auto-generated root main.py is dead weight to remove, and that README.md / .python-version are fine to keep. The contract is underspecified, not internally validated.
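The gap can be made concrete as a set difference between what the command creates and what the contract names. This is a minimal sketch for illustration only, not a proposed seeder change: the file lists are copied from the T-001 description above, and the function and variable names are hypothetical, not part of tilth.

```python
# Illustration only: file lists come from the T-001 text above;
# every name here is hypothetical and does not exist in tilth.

def unacknowledged_collateral(created, acknowledged):
    """Return the files a command creates that the contract never names, sorted."""
    return sorted(set(created) - set(acknowledged))

# Files uv init creates, as listed in the shape above.
uv_init_creates = ["pyproject.toml", "README.md", ".python-version", "main.py"]

# Files the T-001 description or acceptance criteria actually name.
t001_acknowledges = ["pyproject.toml", "todo_cli/__main__.py"]

gap = unacknowledged_collateral(uv_init_creates, t001_acknowledges)
print(gap)
```

Everything in gap is a file whose fate (keep or remove) the worker has to infer, because neither the description nor the acceptance criteria decides it.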
Where seen
20260526-173822-f411ea — original F1 example.
20260527-174931-25fa4c — Phase 1 demo run. Evaluator rejected; worker thrashed iters 24-35 (incl. an attempted [tool.ruff] exclude at iter 33 — F12 gaming) before user interrupt.
Why we're tracking, not fixing in v1
The run-time symptom (evaluator over-rejecting hygiene/scaffolding files) is being addressed separately by softening tilth/prompts/judge.md — redrawing the scope-creep bright line at cross-task interference and letting the evaluator apply judgement to hygiene/tooling/scaffolding collateral. That should let this seed succeed at run time with one corrective iteration (worker removes the dead main.py, keeps README.md/.python-version).
But the seed itself is still unvalidated — the judge softening compensates at run time; it doesn't make the contract self-consistent. The structural fix is contract negotiation at seed time (Anthropic harness pattern: evaluator reviews proposed AC before any code is written), which is v2 territory per proposals/v1-worker-evaluator-dialogue.md ("What v1 is deliberately not").
Value as a fixture
This seed is a good reproducible anchor for:
- Phase 3 (submit_case.work_arounds): does the worker now name the scaffolding collateral as a work-around, and does the evaluator accept/reject that specific claim instead of inferring it from the diff?
- v2 contract negotiation: does the evaluator catch the description↔AC underspecification before the worker runs?
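To show what "accept/reject that specific claim instead of inferring it from the diff" could mean in practice, here is a rough sketch under stated assumptions. The source only names the submit_case.work_arounds field; the WorkAround record, its fields, and partition_diff are hypothetical and do not describe the real schema.

```python
from dataclasses import dataclass

# Hypothetical record shape: the source names submit_case.work_arounds
# but does not define its fields.
@dataclass(frozen=True)
class WorkAround:
    path: str
    reason: str

def partition_diff(diff_paths, required_paths, work_arounds):
    """Split changed paths into required, claimed, and unexplained groups."""
    required = set(required_paths)
    claimed = {w.path for w in work_arounds}
    expected = sorted(p for p in diff_paths if p in required)
    explained = sorted(p for p in diff_paths if p not in required and p in claimed)
    unexplained = sorted(
        p for p in diff_paths if p not in required and p not in claimed
    )
    return expected, explained, unexplained

# Example: the worker claims README.md but says nothing about .python-version.
diff = ["pyproject.toml", "todo_cli/__main__.py", "README.md", ".python-version"]
required = ["pyproject.toml", "todo_cli/__main__.py"]
claims = [WorkAround("README.md", "created by uv init; harmless scaffolding")]

expected, explained, unexplained = partition_diff(diff, required, claims)
print(expected, explained, unexplained)
```

In this sketch the evaluator would rule on each explained entry directly, and only the unexplained entries would still need to be inferred from the diff.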
Explicitly out of scope
- Hardcoding uv init awareness into tilth/seed/prompts.md (overfits: next time it's git init / npm init / cookiecutter).
- A narrow "scaffolding-command discipline rule" in the seeder prompt — considered and dropped; the judge softening is the better intervention layer for v1, and seed validation is the real fix for v2.
Related
- proposals/frictions-2026-05-26.md
- proposals/v1-implementation-plan.md (Phase 1 shipped; Phase 3 = submit_case)