Problem
Diff similarity in src/scoring/diff-parser.ts uses exact string matching on added lines. Two agents that write semantically identical code but with different indentation (e.g., 2 spaces vs 4 spaces, trailing whitespace) are scored as dissimilar and placed in separate convergence groups. This is especially common when:
- Agents reformat existing code incidentally
- Files use mixed indentation
- Agents add the same logic in different contexts (e.g., inside an if block)
Current behavior
// line comparison is exact
addedLines.add(`${file}:${line}`);
" const x = 1;" and "const x = 1;" are treated as different lines.
Proposed solution
- Add --ignore-whitespace flag to thinktank run (default: false)
- When enabled, normalize lines before adding to the similarity set: trim leading/trailing whitespace and collapse internal runs of whitespace to a single space
- Thread through RunOptions → computePairwiseSimilarity()
- Also pass -w to the underlying git diff call so the raw diffs themselves ignore whitespace (prevents whitespace-only changes from inflating diff sizes)
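The normalization step could look roughly like the sketch below. It is only an illustration under stated assumptions: normalizeLine, lineKey, jaccard, and gitDiffArgs are hypothetical names, not existing functions in src/scoring/diff-parser.ts, and the real computePairwiseSimilarity() signature may differ.

```typescript
// Hypothetical helpers for illustration; names are not taken from the codebase.

// Trim leading/trailing whitespace and collapse internal runs to one space.
function normalizeLine(line: string): string {
  return line.trim().replace(/\s+/g, " ");
}

// Build the key stored in the similarity set for one added line.
function lineKey(file: string, line: string, ignoreWhitespace: boolean): string {
  const content = ignoreWhitespace ? normalizeLine(line) : line;
  return `${file}:${content}`;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  // Assumption: two empty sets are treated as identical.
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((key) => {
    if (b.has(key)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

// Arguments for the underlying git call; -w is git diff's --ignore-all-space.
function gitDiffArgs(ignoreWhitespace: boolean): string[] {
  return ignoreWhitespace ? ["diff", "-w"] : ["diff"];
}

// Two agents add the same statement with different indentation.
const agentA = "    const x = 1;";
const agentB = "const x = 1;";

const exactA = new Set([lineKey("a.ts", agentA, false)]);
const exactB = new Set([lineKey("a.ts", agentB, false)]);
const looseA = new Set([lineKey("a.ts", agentA, true)]);
const looseB = new Set([lineKey("a.ts", agentB, true)]);

console.log(jaccard(exactA, exactB)); // 0
console.log(jaccard(looseA, looseB)); // 1
```

One trade-off worth noting: collapsing internal whitespace also alters string literals and indentation-sensitive code, which is a reason to keep the flag opt-in.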
Acceptance criteria
- --ignore-whitespace normalizes lines in Jaccard similarity computation