Add test coverage as 4th Copeland scoring criterion

## Summary
Copeland scoring currently has 3 criteria: tests passed, convergence, and files changed. The "fewer files" criterion penalizes agents that add test files: Agent #5 beat Agent #1 in #80 because it changed 1 file (no tests), while Agent #1 changed 2 files (code + 6 tests).

## Proposed fix
Add a 4th criterion: **test files added/modified**. Count files matching `*.test.*` or `*.spec.*` patterns. More test files = better.
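A minimal sketch of the file classification, assuming the changed-file list is available as plain path strings. `isTestFile` and `splitChangedFiles` are hypothetical names, not existing thinktank functions. The regex checks the basename for a `.test.` or `.spec.` segment, which matches the same names as the `*.test.*` / `*.spec.*` globs above.

```typescript
// Matches basenames containing ".test." or ".spec." (e.g. foo.test.ts, bar.spec.js),
// equivalent to the *.test.* / *.spec.* globs proposed in this issue.
const TEST_FILE_PATTERN = /\.(test|spec)\./;

// Hypothetical helper: true if the path's basename looks like a test file.
function isTestFile(path: string): boolean {
  // Normalize Windows separators, then keep only the last path segment,
  // so directories like "testing/" do not count as test files.
  const basename = path.replace(/\\/g, "/").split("/").pop() ?? "";
  return TEST_FILE_PATTERN.test(basename);
}

// Hypothetical helper: split changed files into the two groups used by
// criterion 3 (non-test files, fewer is better) and criterion 4 (test files, more is better).
function splitChangedFiles(files: string[]): { testFiles: string[]; nonTestFiles: string[] } {
  const testFiles: string[] = [];
  const nonTestFiles: string[] = [];
  for (const f of files) {
    if (isTestFile(f)) {
      testFiles.push(f);
    } else {
      nonTestFiles.push(f);
    }
  }
  return { testFiles, nonTestFiles };
}
```

Matching on the basename rather than the full path keeps files such as `src/testing/util.ts` or `latest.ts` out of the test bucket.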

Also split criterion 3: count only non-test files for the "fewer files" comparison. This way:
1. Tests passed (pass > fail) — binary correctness
2. Convergence group size (larger > smaller) — consensus
3. Non-test files changed (fewer > more) — minimal scope
4. Test files added (more > fewer) — thoroughness
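A sketch of how the four criteria could feed a Copeland score. This issue doesn't spell out how thinktank combines criteria into a pairwise result, so the sketch assumes one plausible rule: agent A beats agent B if A wins more of the four criteria than B, and an agent's Copeland score is pairwise wins minus pairwise losses. `AgentResult` and its field names are hypothetical.

```typescript
// Hypothetical per-agent summary; field names are illustrative only.
interface AgentResult {
  id: number;
  testsPassed: boolean;
  convergenceGroupSize: number;
  nonTestFilesChanged: number;
  testFilesChanged: number;
}

// +1, -1 or 0 depending on the sign of x.
function sign(x: number): number {
  return x > 0 ? 1 : x < 0 ? -1 : 0;
}

// Criteria a wins minus criteria b wins, across the four proposed criteria.
function criteriaMargin(a: AgentResult, b: AgentResult): number {
  const results = [
    sign(Number(a.testsPassed) - Number(b.testsPassed)),   // 1. pass > fail
    sign(a.convergenceGroupSize - b.convergenceGroupSize), // 2. larger group > smaller
    sign(b.nonTestFilesChanged - a.nonTestFilesChanged),   // 3. fewer non-test files > more
    sign(a.testFilesChanged - b.testFilesChanged),         // 4. more test files > fewer
  ];
  return results.reduce((sum, r) => sum + r, 0);
}

// Copeland score per agent: pairwise wins minus pairwise losses (ties count 0).
function copelandScores(agents: AgentResult[]): Map<number, number> {
  const scores = new Map<number, number>();
  for (const a of agents) {
    let score = 0;
    for (const b of agents) {
      if (a.id === b.id) continue;
      score += sign(criteriaMargin(a, b));
    }
    scores.set(a.id, score);
  }
  return scores;
}

// Two agents tied on criteria 1-3; agent 1 added a test file, agent 5 did not.
const scores = copelandScores([
  { id: 1, testsPassed: true, convergenceGroupSize: 1, nonTestFilesChanged: 1, testFilesChanged: 1 },
  { id: 5, testsPassed: true, convergenceGroupSize: 1, nonTestFilesChanged: 1, testFilesChanged: 0 },
]);
// Agent 1 scores +1, agent 5 scores -1.
```

With the split, the agent that shipped tests no longer loses on file count alone: criterion 3 ties and criterion 4 decides the pair.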

This ensures Copeland rewards both minimal code changes AND comprehensive testing.

## Evidence
In PR #118, Copeland recommended Agent #5 (+4) over Agent #1 (0) solely because #5 changed fewer files. But #5 added zero tests while #1 added 6. Human override was needed.

## Acceptance criteria
- [ ] Split `filesChanged` into `testFiles` and `nonTestFiles` in Copeland
- [ ] Add testFiles as 4th criterion (more = better)
- [ ] Update tests
- [ ] Re-run `thinktank evaluate` to measure impact
