Summary
Copeland scoring currently has 3 criteria: tests passed, convergence, files changed. The "fewer files" criterion penalizes agents that add test files — Agent #5 beat Agent #1 in #80 because it changed 1 file (no tests) vs 2 files (code + 6 tests).
Proposed fix
Add a 4th criterion: test files added/modified. Count files matching *.test.* or *.spec.* patterns. More test files = better.
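A minimal sketch of how changed paths might be split into test and non-test files. The names `isTestFile` and `countChangedFiles` are hypothetical and not taken from the thinktank codebase; the regular expression is one possible reading of the `*.test.*` / `*.spec.*` patterns, applied to the file's basename.

```typescript
// Hypothetical helper: a path counts as a test file when its basename
// contains ".test." or ".spec." (e.g. "foo.test.ts", "bar.spec.js").
// The character class excludes path separators so that a directory named
// "x.test.d/" does not turn every file beneath it into a test file.
const TEST_FILE_PATTERN = /\.(test|spec)\.[^\/\\]+$/;

function isTestFile(path: string): boolean {
  return TEST_FILE_PATTERN.test(path);
}

// Splits a list of changed paths into the two counts the proposal needs.
function countChangedFiles(paths: string[]): { testFiles: number; nonTestFiles: number } {
  let testFiles = 0;
  for (const p of paths) {
    if (isTestFile(p)) testFiles++;
  }
  return { testFiles, nonTestFiles: paths.length - testFiles };
}
```

Test files that follow other conventions (for example a `__tests__/` directory) would not be counted under this reading.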
Also split criterion 3 so that only non-test files count toward the "fewer files" comparison. The four criteria would then be:
- Tests passed (pass > fail) — binary correctness
- Convergence group size (larger > smaller) — consensus
- Non-test files changed (fewer > more) — minimal scope
- Test files added (more > fewer) — thoroughness
This ensures Copeland rewards both minimal code changes AND comprehensive testing.
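The four criteria above can be sketched as a pairwise comparison. This assumes one common Copeland formulation: each criterion acts as a voter, the agent preferred on more criteria wins the pairing, and an agent's score is its pairwise wins minus losses. How thinktank actually aggregates criteria may differ, and the names `AgentResult` and `copelandScores` are hypothetical.

```typescript
// Hypothetical record of one agent's result; field names are illustrative.
interface AgentResult {
  id: number;
  testsPassed: boolean;
  convergenceGroupSize: number;
  nonTestFilesChanged: number;
  testFilesAdded: number;
}

// A criterion returns > 0 if a is better, < 0 if b is better, 0 on a tie.
type Criterion = (a: AgentResult, b: AgentResult) => number;

const criteria: Criterion[] = [
  (a, b) => Number(a.testsPassed) - Number(b.testsPassed),   // pass > fail
  (a, b) => a.convergenceGroupSize - b.convergenceGroupSize, // larger > smaller
  (a, b) => b.nonTestFilesChanged - a.nonTestFilesChanged,   // fewer > more
  (a, b) => a.testFilesAdded - b.testFilesAdded,             // more > fewer
];

// Copeland score per agent id: pairwise wins minus pairwise losses.
function copelandScores(agents: AgentResult[]): Map<number, number> {
  const scores = new Map<number, number>();
  for (const agent of agents) scores.set(agent.id, 0);

  agents.forEach((a, i) => {
    for (const b of agents.slice(i + 1)) {
      // Each criterion casts one vote for a (+1), for b (-1), or abstains (0).
      let net = 0;
      for (const criterion of criteria) net += Math.sign(criterion(a, b));
      const outcome = Math.sign(net);
      scores.set(a.id, (scores.get(a.id) ?? 0) + outcome);
      scores.set(b.id, (scores.get(b.id) ?? 0) - outcome);
    }
  });
  return scores;
}

// Illustrative input only. The file counts read "2 files (code + 6 tests)"
// as one code file plus one test file; the testsPassed and convergence
// values are assumed equal and are not taken from the actual run.
const scores = copelandScores([
  { id: 1, testsPassed: true, convergenceGroupSize: 1, nonTestFilesChanged: 1, testFilesAdded: 1 },
  { id: 5, testsPassed: true, convergenceGroupSize: 1, nonTestFilesChanged: 1, testFilesAdded: 0 },
]);
```

With these assumed inputs the two agents tie on the first three criteria, so the test-file criterion decides the pairing in favor of Agent #1.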
Evidence
In PR #118, Copeland recommended Agent #5 (+4) over Agent #1 (0) solely because #5 changed fewer files. But #5 added zero tests while #1 added 6. Human override was needed.
Acceptance criteria
thinktank evaluate to measure impact