Scope
Extend the diff service (src/core/diff/ from #115) with content-type-aware normalizers and renderers for:
- Tables — CSV, Excel
- PDF — page-aware
Tables
- Row-key-aware normalization: when the extractor identifies a key column, diff by row-key (row add/remove/modified) instead of line-by-line.
- Render in dashboard as a data-table with adds highlighted green, removes red, per-cell changes yellow with old → new values visible.
- Requires: pandas or equivalent; extractor metadata to carry key-column hints.
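A minimal sketch of the row-key diff described above, assuming the extractor hands over CSV text plus the name of the key column. It uses the standard-library csv module as the "or equivalent" so it runs without pandas. The names TableDiff, index_rows and diff_tables are hypothetical and not part of the existing diff service.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class TableDiff:
    """Result of a keyed table diff (hypothetical structure for illustration)."""
    added: dict = field(default_factory=dict)     # key -> new row
    removed: dict = field(default_factory=dict)   # key -> old row
    modified: dict = field(default_factory=dict)  # key -> {column: (old, new)}


def index_rows(csv_text: str, key_column: str) -> dict:
    """Parse CSV text and index rows by key_column.

    Assumption: keys are unique. If a key repeats, the last row wins.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}


def diff_tables(old_csv: str, new_csv: str, key_column: str) -> TableDiff:
    """Classify rows as added, removed, or modified by comparing on the key."""
    old_rows = index_rows(old_csv, key_column)
    new_rows = index_rows(new_csv, key_column)
    result = TableDiff()

    for key, new_row in new_rows.items():
        if key not in old_rows:
            result.added[key] = new_row

    for key, old_row in old_rows.items():
        if key not in new_rows:
            result.removed[key] = old_row
            continue
        new_row = new_rows[key]
        # Union of column names, preserving order, so added or dropped
        # columns also show up as per-cell changes.
        columns = list(dict.fromkeys([*old_row, *new_row]))
        changes = {
            col: (old_row.get(col), new_row.get(col))
            for col in columns
            if old_row.get(col) != new_row.get(col)
        }
        if changes:
            result.modified[key] = changes

    return result


old = "id,name,price\n1,apple,1.00\n2,pear,2.00\n3,plum,3.00\n"
new = "id,name,price\n1,apple,1.25\n3,plum,3.00\n4,kiwi,4.00\n"
table_diff = diff_tables(old, new, key_column="id")
```

The three buckets line up with the dashboard colors: added rows green, removed rows red, and each (old, new) pair in modified as a yellow cell. Reordering rows produces no diff here, which is the main gain over a line-by-line comparison.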
PDF
- Page-aware diff: show which pages changed, diff extracted text per-page rather than as one flat blob.
- Integrate with page-level screenshot comparison where available.
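A rough sketch of the per-page diff, assuming the extractor already yields one text string per page. The function name diff_pages is hypothetical, and pages are paired by position, which is a simplification: an inserted page shifts every later page, so a real implementation would probably need a page-alignment step first.

```python
import difflib


def diff_pages(old_pages: list[str], new_pages: list[str]) -> dict[int, list[str]]:
    """Return {1-based page number: unified-diff lines} for pages whose text changed.

    Pages are compared by position; a page missing on one side is
    treated as empty text.
    """
    changed = {}
    for index in range(max(len(old_pages), len(new_pages))):
        old_text = old_pages[index] if index < len(old_pages) else ""
        new_text = new_pages[index] if index < len(new_pages) else ""
        if old_text == new_text:
            continue  # unchanged pages are left out of the result
        page = index + 1
        changed[page] = list(
            difflib.unified_diff(
                old_text.splitlines(),
                new_text.splitlines(),
                fromfile=f"old/page-{page}",
                tofile=f"new/page-{page}",
                lineterm="",
            )
        )
    return changed


old_pages = ["Intro\nA", "Body\nB"]
new_pages = ["Intro\nA", "Body\nC", "Appendix"]
page_diffs = diff_pages(old_pages, new_pages)
changed_page_numbers = sorted(page_diffs)
```

The keys of the result answer "which pages changed", and each value is a diff scoped to that page only. The same page numbers could be used to look up screenshot comparisons where those exist.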
Dependencies
- #115 (src/core/diff/ module, normalization layer, unified-diff core).
Acceptance
- src/core/diff/normalize.py gains normalize_table and normalize_pdf_text