Content-type-aware diff: tables (CSV/Excel) and PDF #117

@gregoryfoster

Scope

Extend the diff service (src/core/diff/ from #115) with content-type-aware normalizers and renderers for:

  1. Tables — CSV, Excel
  2. PDF — page-aware

Tables

  • Row-key-aware normalization: when the extractor identifies a key column, diff by row key (rows added, removed, or modified) instead of line by line.
  • Render in the dashboard as a data table: added rows highlighted green, removed rows red, and per-cell changes yellow with old → new values visible.
  • Requires pandas or an equivalent; extractor metadata must carry key-column hints.
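
To make the row-key idea concrete, here is a minimal sketch using plain dicts rather than pandas (the "or equivalent" route). The names `TableDiff` and `diff_rows` are hypothetical and not part of the existing diff service; rows are assumed to arrive as dicts already parsed by the extractor, and duplicate keys are assumed not to occur (if they do, the last row wins).

```python
from dataclasses import dataclass, field


@dataclass
class TableDiff:
    """Hypothetical result container; not an existing type in src/core/diff/."""
    added: dict = field(default_factory=dict)     # key -> new row
    removed: dict = field(default_factory=dict)   # key -> old row
    modified: dict = field(default_factory=dict)  # key -> {column: (old, new)}


def diff_rows(old_rows, new_rows, key):
    """Diff two lists of row dicts by the value in the `key` column."""
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}
    result = TableDiff()

    # Rows whose key appears only in the new snapshot were added.
    for k, row in new_by_key.items():
        if k not in old_by_key:
            result.added[k] = row

    for k, old_row in old_by_key.items():
        # Rows whose key appears only in the old snapshot were removed.
        if k not in new_by_key:
            result.removed[k] = old_row
            continue
        # Rows present in both: record per-cell (old, new) pairs that differ.
        new_row = new_by_key[k]
        changes = {
            col: (old_row.get(col), new_row.get(col))
            for col in sorted(set(old_row) | set(new_row))
            if old_row.get(col) != new_row.get(col)
        }
        if changes:
            result.modified[k] = changes
    return result
```

The `modified` mapping holds exactly the old → new pairs the dashboard would need for yellow per-cell highlighting, while `added` and `removed` map to the green and red rows.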

PDF

  • Page-aware diff: show which pages changed, and diff extracted text per page rather than as one flat blob.
  • Integrate with page-level screenshot comparison where available.
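
A page-aware text diff could look roughly like the sketch below, which uses the standard library's `difflib`. The function name `diff_pages` is hypothetical, and it assumes the extractor already provides one text string per page. Pages are compared by position, so an inserted page would make every later page appear changed; smarter alignment is out of scope for this sketch.

```python
import difflib


def diff_pages(old_pages, new_pages):
    """Return {page_number: unified diff lines} for pages whose text changed.

    Page numbers are 1-based. A page missing on one side is treated as empty.
    """
    changed = {}
    for i in range(max(len(old_pages), len(new_pages))):
        old = old_pages[i] if i < len(old_pages) else ""
        new = new_pages[i] if i < len(new_pages) else ""
        if old == new:
            continue  # unchanged pages are omitted entirely
        changed[i + 1] = list(
            difflib.unified_diff(
                old.splitlines(),
                new.splitlines(),
                fromfile=f"page {i + 1} (old)",
                tofile=f"page {i + 1} (new)",
                lineterm="",
            )
        )
    return changed
```

The keys of the returned dict answer "which pages changed", and each value is the per-page diff that a renderer could show alongside a page-level screenshot comparison where one exists.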

Dependencies

Acceptance

  • src/core/diff/normalize.py gains normalize_table and normalize_pdf_text
  • Dashboard Change Detail supports content-type-aware rendering for CSV/Excel and PDF watches
  • TDD coverage for new normalizers
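
As a starting point for the two normalizers named above, here is a hedged sketch. Only the function names come from this issue; the signatures, return types, and the `key_column` parameter are assumptions. It also assumes tables arrive as CSV text and that PDF text uses a form feed (`\f`) between pages, which some extractors emit but which should be verified for the one in use.

```python
import csv
import io
import re


def normalize_table(csv_text, key_column=None):
    """Parse CSV text into row dicts with trimmed headers and cells.

    If key_column is given, rows are sorted by it (string order) so that
    row reordering alone does not register as a change.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    # Blank lines are skipped; cells beyond the header width are dropped.
    rows = [
        dict(zip(header, (cell.strip() for cell in record)))
        for record in reader
        if record
    ]
    if key_column is not None:
        rows.sort(key=lambda row: row.get(key_column, ""))
    return rows


def normalize_pdf_text(text):
    """Split extracted PDF text into pages and collapse noisy whitespace."""
    pages = text.split("\f")
    # A trailing form feed leaves an empty final page; drop it.
    if pages and not pages[-1].strip():
        pages.pop()
    normalized = []
    for page in pages:
        lines = (re.sub(r"\s+", " ", line).strip() for line in page.splitlines())
        normalized.append("\n".join(line for line in lines if line))
    return normalized
```

Both functions are pure and take plain strings, which keeps them easy to cover test-first: each test is an input string and an expected list.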
