validators: ruff runs workspace-wide while pytest is filtered; F5 asymmetry forces per-file-ignore workarounds (#25)

## The asymmetry

`tilth/validators.py`:
- `run_pytest(workspace, task_ids)` filters to the current task's + completed tasks' test globs (`test_t<NNN>_*.py`) — the ratchet.
- `run_ruff(workspace)` runs `ruff check .` — **workspace-wide**, with no task scoping (`validators.py:80`).

So at an early-stage task, ruff lints the **future tasks' seed test files**, which legitimately have unsorted imports / references to modules that don't exist yet. The worker can't fix them (touching another task's files is a hard `scope_creep` reject — correctly). It's cornered: ruff fails, and the only files it's allowed to change are its own.

This is **F5** (`proposals/frictions-2026-05-26.md` — "worker affordance bleed: validators filter, worker doesn't") realised specifically for the lint floor.

## Observed cost

Demo session `20260529-134013` (Phase 3 validation), T-001:
- **27 iterations / 226k tokens** — ~15 spent on the ruff-vs-future-task-seed-files dance, vs T-002 (6 iters) and T-003 (12) which were clean.
- iter 16: worker ran `ruff --fix` on the future-task test files → modified them → **rejected** `scope_creep` (correct).
- iter 27 (accepted): worker added `[tool.ruff.lint] per-file-ignores` for I001 on the future-task seed files, in its own `pyproject.toml`, so workspace ruff passes without touching them.

The per-file-ignore is the *same shape* as the `[tool.ruff] exclude` the frictions doc flagged as **F12 gaming** (session 173822). Phase 3's `work_arounds` made it **declared and adjudicated** (the evaluator accepted it as legit scoping) rather than hidden — a real improvement — but whether it's "legit scoping" or "papering over a harness gap" is exactly the ambiguity this asymmetry creates. The worker shouldn't have to make that call.

## Proposed fix

Scope `run_ruff` to the task's owned files the way `run_pytest` is scoped — at minimum, exclude future-task seed test files from the lint at earlier stages (the harness knows the task ids and the `test_t<NNN>` convention). Ruff should see the same "live at this stage" set pytest does.
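A minimal sketch of that "live at this stage" filter, not the actual `validators.py` code. It assumes task ids look like `T-001` and map to the `test_t001_*.py` glob; `seed_test_glob` and `live_lint_paths` are hypothetical names introduced here for illustration.

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Matches seed test files that follow the test_t<NNN>_*.py convention.
SEED_TEST_RE = re.compile(r"^test_t\d{3}_.*\.py$")


def seed_test_glob(task_id: str) -> str:
    """Map a task id such as 'T-001' to 'test_t001_*.py' (assumed id format)."""
    digits = task_id.rsplit("-", 1)[-1]
    return f"test_t{digits}_*.py"


def live_lint_paths(paths: list[str], live_task_ids: list[str]) -> list[str]:
    """Drop seed tests owned by tasks that are not live yet; keep everything else.

    `live_task_ids` is the current task plus completed tasks, i.e. the same
    set the pytest ratchet uses.
    """
    live_globs = [seed_test_glob(t) for t in live_task_ids]
    kept = []
    for path in paths:
        name = PurePosixPath(path).name
        is_seed_test = SEED_TEST_RE.match(name) is not None
        if is_seed_test and not any(fnmatchcase(name, g) for g in live_globs):
            continue  # seed test for a future task: not lintable at this stage
        kept.append(path)
    return kept
```

Files that do not follow the seed-test naming convention pass through untouched, so shared helpers and task-owned source are still linted.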

Open questions for the fix:
- What is the lint scope: only the worker's diff, or task-owned source + this task's test glob + completed tasks' globs (mirroring the pytest ratchet)? The latter keeps the regression-guard intent.
- Pass explicit paths to `ruff check <paths>`, or generate an `--exclude` for future-task globs? Explicit paths are cleaner but must include `pyproject.toml` etc.
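The two invocation styles from the second question can be compared side by side. This is a sketch under assumptions: the helper names are hypothetical, the `T-001` to `test_t001_*.py` mapping is assumed, and the functions only build argument lists (they do not run ruff). It relies on `ruff check` accepting positional paths and a repeatable `--exclude <pattern>` option.

```python
from typing import Optional


def seed_test_glob(task_id: str) -> str:
    """Map a task id such as 'T-002' to 'test_t002_*.py' (assumed id format)."""
    digits = task_id.rsplit("-", 1)[-1]
    return f"test_t{digits}_*.py"


def ruff_cmd_explicit(live_paths: list[str]) -> Optional[list[str]]:
    """Option A: lint only the paths that are live at this stage.

    Returns None for an empty list, because a bare `ruff check` falls back
    to linting the current directory, which would silently undo the scoping.
    """
    if not live_paths:
        return None
    return ["ruff", "check", *live_paths]


def ruff_cmd_exclude(future_task_ids: list[str]) -> list[str]:
    """Option B: keep the workspace-wide run, excluding future-task seed tests."""
    cmd = ["ruff", "check", "."]
    for task_id in future_task_ids:
        cmd += ["--exclude", seed_test_glob(task_id)]
    return cmd
```

Option B changes the least in the existing run, since anything that is not a future-task seed test is linted exactly as before. Option A makes the scope explicit but puts the burden on the harness to enumerate every file that should be linted.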

## Why it matters

- Removes a recurring per-task token tax (inflates OQ#6 in the v1 plan).
- Removes the F12-gaming-vs-legit ambiguity at the source, so the evaluator isn't asked to bless lint-silencing workarounds.
- Tightens the mechanical floor: the worker is judged on lint of what it owns, not on noise from work it isn't allowed to do yet.

## Related
- Frictions **F5** (the asymmetry), **F12** (the gaming shape it induces)
- Distinct from #20 (validator *pluggability*) and #16 (judge *visibility* of seed tests) — this is about *scoping the existing ruff run*
- Surfaced during v1 Phase 3 validation (session `20260529-134013`)
