add per column summary statistics to the inventory resource #370

Open
lindsaywiard wants to merge 8 commits into main from 258-add-per-column-summary-statistics-to-the-inventory-resource

@lindsaywiard

Collaborator

Adds a summary field to each Column, populated at completion time.

  • New summarize.py builds the stats graph
  • save_parquet / save_parquet_replace replaced by *_with_summary variants that return stats alongside the path
  • Stats write back to Firestore on completion via update_status
  • Single-pass regression test included
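
As a rough illustration of the "stats alongside the path" and single-pass points above, here is a stdlib-only sketch. It is not the dask/parquet code in this PR: `CountingSource`, `write_csv_with_summary`, the CSV output, and the particular stats (`min`, `max`, `mean`, `null_count`) are all hypothetical stand-ins. It only shows a write that accumulates stats while iterating the data once.

```python
import csv
import os
import tempfile


class CountingSource:
    """Hypothetical row source that records how many times it is iterated."""

    def __init__(self, rows):
        self._rows = rows
        self.passes = 0

    def __iter__(self):
        self.passes += 1
        return iter(self._rows)


def write_csv_with_summary(source, path, column):
    """Write rows to `path` and return (path, stats) from a single pass.

    Hypothetical stand-in for the *_with_summary idea: stats for `column`
    accumulate in the same loop that writes each row.
    """
    count = nulls = 0
    total = 0.0
    lo = hi = None
    with open(path, "w", newline="") as fh:
        writer = None
        for row in source:  # the only iteration over the source
            if writer is None:
                writer = csv.DictWriter(fh, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
            value = row[column]
            if value is None:  # nulls are counted but excluded from the stats
                nulls += 1
                continue
            count += 1
            total += value
            lo = value if lo is None else min(lo, value)
            hi = value if hi is None else max(hi, value)
    stats = {
        "min": lo,
        "max": hi,
        "mean": total / count if count else None,
        "null_count": nulls,
    }
    return path, stats


rows = [{"height": 10.0}, {"height": None}, {"height": 20.0}]
source = CountingSource(rows)
out_path = os.path.join(tempfile.mkdtemp(), "trees.csv")
path, stats = write_csv_with_summary(source, out_path, "height")
```

A single-pass regression test in this style would assert `source.passes == 1` after the call, so any later change that reads the data a second time to compute stats fails the test.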

Changes

  • API
    • schema.py — added ContinuousColumnSummary, CategoricalColumnSummary, ColumnSummary union, summary field on Column
  • API Tests
    • test_schema.py — round-trip tests for both summary discriminator variants
    • test_duplicate_router.py, test_modifications_router.py, test_treatments_router.py — updated columns assertions to expect summary: None
  • Standgen
    • summarize.py — new, _build_column_stats_graph returns lazy dask scalars per column
    • storage.py: _compute_write_and_stats fuses stats with the write; replaces save_parquet / save_parquet_replace with *_with_summary variants
    • handlers/pim.py, handlers/chm.py, handlers/modifications.py, handlers/treatments.py — swapped call sites, return columns in result dict
    • main.py: update_status gains a columns param, writes summaries to Firestore on completion
  • Standgen Tests
    • test_summarize.py — new, unit tests for stat correctness, null handling, and categorical unique count
    • integration/test_storage.py — new, single-pass regression test for both write functions
    • handlers/conftest.py — new, shared column constants for handler tests
    • Handler and integration tests updated for new call signatures and summary assertions
  • Lib Tests Shared Data
    • chm_lmf.json, chm_vwf.json, pim_treemap.json — added columns field
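
For readers unfamiliar with discriminated unions, the schema change and its round-trip tests can be pictured roughly as follows. This is a minimal sketch using stdlib dataclasses, not the contents of schema.py: the class names come from the list above, but the field names, the `type` tag used as the discriminator, and the `column_from_dict` helper are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Literal, Optional, Union


@dataclass
class ContinuousColumnSummary:
    # Field names are assumed; the PR does not list the actual stats.
    min: float
    max: float
    mean: float
    type: Literal["continuous"] = "continuous"  # assumed discriminator tag


@dataclass
class CategoricalColumnSummary:
    unique_count: int
    type: Literal["categorical"] = "categorical"  # assumed discriminator tag


ColumnSummary = Union[ContinuousColumnSummary, CategoricalColumnSummary]


@dataclass
class Column:
    name: str
    summary: Optional[ColumnSummary] = None  # stays None until completion


_VARIANTS = {
    "continuous": ContinuousColumnSummary,
    "categorical": CategoricalColumnSummary,
}


def column_from_dict(data):
    """Rebuild a Column, choosing the summary class from its `type` tag."""
    raw = data.get("summary")
    summary = None if raw is None else _VARIANTS[raw["type"]](**raw)
    return Column(name=data["name"], summary=summary)


columns = [
    Column("height", ContinuousColumnSummary(min=1.0, max=9.0, mean=4.5)),
    Column("species", CategoricalColumnSummary(unique_count=3)),
    Column("pending"),  # not yet summarized, so summary is None
]
round_tripped = [column_from_dict(asdict(c)) for c in columns]
```

The third column mirrors the router test updates above, where columns that have not completed are expected to carry `summary: None`.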

lindsaywiard linked an issue on Jun 16, 2026 that may be closed by this pull request
@lindsaywiard

Collaborator Author

I can work on these merge conflicts tomorrow and update you when I get them figured out :)

@lindsaywiard

Collaborator Author

Should be ready for review!

Development

Successfully merging this pull request may close these issues.

Add per-column summary statistics to the Inventory resource

1 participant