validators: ruff runs workspace-wide while pytest is filtered — F5 asymmetry forces per-file-ignore workarounds #25

@samkeen

The asymmetry

tilth/validators.py:

  • run_pytest(workspace, task_ids) filters to the current task's + completed tasks' test globs (test_t<NNN>_*.py) — the ratchet.
  • run_ruff(workspace) runs ruff check . workspace-wide, with no task scoping (validators.py:80).

So at an early-stage task, ruff lints the future tasks' seed test files, which legitimately have unsorted imports / references to modules that don't exist yet. The worker can't fix them (touching another task's files is a hard scope_creep reject — correctly). It's cornered: ruff fails, and the only files it's allowed to change are its own.

This is F5 (proposals/frictions-2026-05-26.md — "worker affordance bleed: validators filter, worker doesn't") realised specifically for the lint floor.

Observed cost

Demo session 20260529-134013 (Phase 3 validation), T-001:

  • 27 iterations / 226k tokens — ~15 spent on the ruff-vs-future-task-seed-files dance, vs T-002 (6 iters) and T-003 (12) which were clean.
  • iter 16: worker ran ruff --fix on the future-task test files → modified them → rejected scope_creep (correct).
  • iter 27 (accepted): worker added [tool.ruff.lint] per-file-ignores for I001 on the future-task seed files, in its own pyproject.toml, so workspace ruff passes without touching them.

The per-file-ignore is the same shape as the [tool.ruff] exclude the frictions doc flagged as F12 gaming (session 173822). Phase 3's work_arounds made it declared and adjudicated (the evaluator accepted it as legit scoping) rather than hidden — a real improvement — but whether it's "legit scoping" or "papering over a harness gap" is exactly the ambiguity this asymmetry creates. The worker shouldn't have to make that call.

Proposed fix

Scope run_ruff to the task's owned files the way run_pytest is scoped — at minimum, exclude future-task seed test files from the lint at earlier stages (the harness knows the task ids and the test_t<NNN> convention). Ruff should see the same "live at this stage" set pytest does.
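A minimal sketch of the "live at this stage" test for lint, assuming task ids look like T-001 and map onto the test_t<NNN>_*.py convention. The names task_test_glob, is_live_for_lint, and SEED_TEST_PATTERN are hypothetical and are not taken from tilth/validators.py.

```python
import re
from fnmatch import fnmatchcase

# Assumption: every per-task seed test file is named test_t<NNN>_*.py.
SEED_TEST_PATTERN = "test_t[0-9][0-9][0-9]_*.py"


def task_test_glob(task_id: str) -> str:
    """Map a task id such as 'T-001' to its test glob 'test_t001_*.py'."""
    match = re.fullmatch(r"T-(\d+)", task_id)
    if match is None:
        raise ValueError(f"unrecognised task id: {task_id!r}")
    return f"test_t{int(match.group(1)):03d}_*.py"


def is_live_for_lint(filename: str, live_task_ids: list[str]) -> bool:
    """Return True if ruff should see this file at the current stage.

    live_task_ids is the current task plus completed tasks, i.e. the same
    set the pytest ratchet is described as using.
    """
    if not fnmatchcase(filename, SEED_TEST_PATTERN):
        # Not a per-task seed test, so it is always in scope.
        return True
    # A seed test is in scope only if it belongs to a live task.
    return any(fnmatchcase(filename, task_test_glob(t)) for t in live_task_ids)
```

With live_task_ids = ["T-001"], test_t001_parse.py stays in scope and test_t002_render.py drops out, so the worker is no longer linted on files it is not allowed to touch. The file names here are made up for illustration.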

Open questions for the fix:

  • Lint scope: the worker's diff, or task-owned source + this task's test glob + completed tasks' globs (mirroring the pytest ratchet)? The latter keeps the regression-guard intent.
  • Passing explicit paths to ruff check <paths> vs. a generated --exclude for future-task globs. Explicit paths are cleaner but must include pyproject.toml etc.
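The two invocation styles in the second question could be built roughly as follows. This is a sketch under assumptions: both function names are hypothetical, the caller is assumed to have already worked out the live paths or the future-task globs, and the exclude variant uses ruff's --extend-exclude option (which adds to the configured excludes) rather than --exclude.

```python
from collections.abc import Sequence


def ruff_cmd_explicit_paths(paths: Sequence[str]) -> list[str]:
    """Lint only the given paths: ruff check <paths>."""
    if not paths:
        # With no paths, ruff check falls back to the current directory,
        # which would silently restore the workspace-wide behaviour.
        raise ValueError("refusing to build a ruff command with no paths")
    return ["ruff", "check", *paths]


def ruff_cmd_excluding(future_globs: Sequence[str]) -> list[str]:
    """Lint the workspace but skip future-task seed tests."""
    cmd = ["ruff", "check", "."]
    for glob in future_globs:
        # One flag per glob, e.g. --extend-exclude test_t002_*.py
        cmd += ["--extend-exclude", glob]
    return cmd
```

The exclude form degrades safely: an empty future_globs list simply yields the current workspace-wide command, which is the right behaviour for the final task. The explicit-paths form needs the guard above because an empty list would otherwise widen the scope instead of narrowing it.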

Why it matters

  • Removes a recurring per-task token tax (inflates OQ#6 in the v1 plan).
  • Removes the F12-gaming-vs-legit ambiguity at the source, so the evaluator isn't asked to bless lint-silencing workarounds.
  • Tightens the mechanical floor: the worker is judged on lint of what it owns, not on noise from work it isn't allowed to do yet.
