Seeder anti-pattern: transitional stub behavior pinned as a permanent regression test forces a later task to rewrite a completed task's seed test

## What happened

In demo session `20260529-113158-6b9b12`, the seeder produced a cross-task contradiction on the same call, `main([])`:

| Task | Contract for `main([])` |
|---|---|
| **T-001** AC | returns **`0`** — seed test `test_main_returns_zero` asserts `main([]) == 0` |
| **T-002** AC | returns a **non-zero** exit code and prints usage to stderr |

T-001 pins a *transitional stub* behavior (`main([]) == 0`) as a *permanent acceptance criterion with a regression test*. T-002 was always going to supersede that behavior. The two are mutually exclusive.

## Why it forced a contract rewrite

`validators.py:run_pytest` ratchets — it runs completed tasks' tests as a regression guard (commit d2adb33). So during T-002, T-001's now-stale `test_main_returns_zero` runs and **fails**. The worker had exactly two escapes:

1. Break T-002's AC (make `main([])` return 0), or
2. Rewrite T-001's seed test.

It chose (2): flipped `test_main_returns_zero` → `test_main_returns_nonzero_on_no_args` (`assert main([]) != 0`). Validators then passed and the task was accepted.

## The damage

T-001's **stated** acceptance criterion (`main([])` returns 0) is now silently contradicted by T-001's own rewritten test. The run shows 3/3 green; the T-001 contract in `prd.json` is false relative to the code. This is the cross-cutting friction from `proposals/frictions-2026-05-26.md`: *"we've seen both tests pass and the judge accept with broken work."* There's now an AC↔test mismatch on a completed task that nothing flags.

## Sub-finding: the evaluator overrode its own hard rule

`tilth/prompts/judge.md` makes cross-task seed-test edits (`tests/test_t<NNN>_*.py`, NNN ≠ this task) a **hard reject, no judgement**. In this session the evaluator:

- **rejected** a `test_t003_persist.py` edit (future task → correct), but
- **accepted** the `test_t001_scaffold.py` edit (completed task), improvising a completed-vs-future distinction the rule does not contain — and even *instructed* it at iter 31 ("Only the test_t001_scaffold.py change ... should remain in this diff").

The reasoning is pragmatic (a completed task's test legitimately invalidated by evolved behavior ≠ tampering with a future task's contract), but it means the gaming backstop isn't actually hard, and the precedent is "you may rewrite an earlier task's test if you tell a good story."

## Root cause + fix surface

This is **F1/F2 (seed contradiction) + F9 (no cross-task awareness)**, turned into a forcing function by the validator ratchet. The fix belongs **upstream in the seeder**, not in weakening the cross-task rule:

- The seeder should not pin transitional/stub behavior (`main([]) == 0`) as a permanent regression assertion. A stub behavior that a later task supersedes should either not be asserted in a ratcheted test, or the superseding task (T-002) should explicitly acknowledge it changes T-001's `main([])` contract.
- Weakening `judge.md`'s cross-task hard-reject to bless "completed-task test updates" is the wrong move — it invites the gaming the rule exists to stop.

## Open tension (deliberately not resolved here)

Keeping the cross-task rule truly hard means a contradictory seed like this one would **deadlock** (worker can't pass T-002 without touching T-001's test; judge won't allow it → iter cap). That's arguably the *correct* signal (bad seed fails loudly instead of silently passing broken work) — but it collides with F7 (no escape from a broken task) and the deferred halt-authority question. This interaction should inform the halt design when it's revisited, using the ledger data v1 sessions now produce.

## Notes

- **Not a Phase 1/2 regression.** The v1 dialogue worked — in fact the evaluator's verdicts and the per-task ledger *surfaced* this contradiction legibly, exactly as the sketch predicted ("v1 surfaces F1 via the ledger but can't halt it").
- Distinct from #22 (that's `uv init` collateral; this is transitional-stub-as-regression-test + ratchet).

## Related
- #22 (sibling F1 demo-seed instance), frictions F1/F2/F3/F9, validator ratchet (d2adb33)
- Session: `20260529-113158-6b9b12`

Task	Contract for `main([])`
T-001 AC	returns `0` — seed test `test_main_returns_zero` asserts `main([]) == 0`
T-002 AC	returns a non-zero exit code and prints usage to stderr

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Seeder anti-pattern: transitional stub behavior pinned as a permanent regression test forces a later task to rewrite a completed task's seed test #23

What happened

Why it forced a contract rewrite

The damage

Sub-finding: the evaluator overrode its own hard rule

Root cause + fix surface

Open tension (deliberately not resolved here)

Notes

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Seeder anti-pattern: transitional stub behavior pinned as a permanent regression test forces a later task to rewrite a completed task's seed test #23

Description

What happened

Why it forced a contract rewrite

The damage

Sub-finding: the evaluator overrode its own hard rule

Root cause + fix surface

Open tension (deliberately not resolved here)

Notes

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions