Add --whitespace-insensitive flag to similarity comparison #70

Description

@that-github-user

Problem

Diff similarity in src/scoring/diff-parser.ts uses exact string matching on added lines. Two agents that write semantically identical code with different whitespace (e.g., 2-space vs 4-space indentation, or trailing whitespace) are scored as dissimilar and placed in separate convergence groups. This is especially common when:

  • Agents reformat existing code incidentally
  • Files use mixed indentation
  • Agents add the same logic in different contexts (e.g., inside an if block)

Current behavior

// line comparison is exact
addedLines.add(`${file}:${line}`);

" const x = 1;" and "const x = 1;" are treated as different lines.

Proposed solution

  • Add --ignore-whitespace flag to thinktank run (default: false)
  • When enabled, normalize lines before adding to the similarity set: trim leading/trailing whitespace and collapse internal runs of whitespace to a single space
  • Thread through RunOptions → computePairwiseSimilarity()
  • Also pass -w to the underlying git diff call so the raw diffs themselves ignore whitespace (prevents whitespace-only changes from inflating diff sizes)
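The normalization and flag threading above could look roughly like the following. This is a minimal sketch, not the project's actual code: `SimilarityOptions`, `normalizeLine`, `lineKey`, and `gitDiffArgs` are hypothetical names, and the real `RunOptions` shape and git invocation in thinktank may differ.

```typescript
// Hypothetical option shape; the real RunOptions may carry more fields.
interface SimilarityOptions {
  ignoreWhitespace: boolean;
}

// Trim leading/trailing whitespace and collapse internal runs to one space.
function normalizeLine(line: string): string {
  return line.trim().replace(/\s+/g, " ");
}

// Build the key stored in the similarity set. With the flag off this is
// the same exact-match key as today, so default behavior is unchanged.
function lineKey(file: string, line: string, opts: SimilarityOptions): string {
  const text = opts.ignoreWhitespace ? normalizeLine(line) : line;
  return `${file}:${text}`;
}

// Assemble arguments for `git diff` (hypothetical helper; baseRef is whatever
// ref the run compares against). -w is git's short form of --ignore-all-space.
function gitDiffArgs(baseRef: string, opts: SimilarityOptions): string[] {
  const args = ["diff"];
  if (opts.ignoreWhitespace) {
    args.push("-w");
  }
  args.push(baseRef);
  return args;
}
```

One design point worth checking: git's -w ignores whitespace even where one line has it and the other has none, which is slightly broader than trim-and-collapse. For example, "a b" and "ab" compare equal under -w but still produce different keys after `normalizeLine`.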

Acceptance criteria

  • --ignore-whitespace normalizes lines in Jaccard similarity computation
  • Agents with whitespace-only differences score ≥ 0.95 similarity when flag is set
  • Default behavior unchanged (flag is opt-in)
  • Unit tests: two diffs differing only in indentation, with and without the flag
  • README documents the flag with a note about when to use it (reformatting tasks)
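The indentation test case from the criteria above can be sketched as follows. It is self-contained for illustration: `addedLineSimilarity` is a hypothetical stand-in for computePairwiseSimilarity(), whose real signature is not shown in this issue, and the checks would need porting to whatever test runner the project uses.

```typescript
// Hypothetical stand-in: Jaccard similarity over the added lines of one file.
function addedLineSimilarity(
  a: string[],
  b: string[],
  ignoreWhitespace: boolean,
): number {
  const key = (line: string): string =>
    ignoreWhitespace ? line.trim().replace(/\s+/g, " ") : line;
  const setA = new Set<string>(a.map(key));
  const setB = new Set<string>(b.map(key));
  // Assumption: two empty diffs count as identical.
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  setA.forEach((k) => {
    if (setB.has(k)) shared++;
  });
  // |A ∩ B| / |A ∪ B|
  return shared / (setA.size + setB.size - shared);
}

// Same logic, 2-space vs 4-space indentation.
const twoSpace = ["  if (ok) {", "    run();", "  }"];
const fourSpace = ["    if (ok) {", "        run();", "    }"];

const scoreWithoutFlag = addedLineSimilarity(twoSpace, fourSpace, false);
const scoreWithFlag = addedLineSimilarity(twoSpace, fourSpace, true);

if (scoreWithFlag < 0.95) {
  throw new Error("expected >= 0.95 similarity with the flag set");
}
if (scoreWithoutFlag >= 0.95) {
  throw new Error("expected exact matching to keep the diffs apart");
}
```

In this fixture no exact line is shared, so the score without the flag is 0, and all three normalized lines match, so the score with the flag is 1.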

Metadata

Labels

enhancement
