Improve coverage pipeline performance by shaddi · Pull Request #102 · spin-vt/bdk

shaddi · 2026-06-13T03:58:43Z

A number of improvements to the performance of the coverage and tile generation pipeline that don't impact correctness. Core coverage calculation changes are mostly efficiency improvements and the elimination of redundant computation. Tileset generation (still the slowest stage by wall-clock time) has been tweaked to enable "splicing" of only changed tiles into an existing tileset: when a user makes a change that would impact map tiles, only tiles in the vicinity of the geometries that were changed are rebuilt. Generally, this PR reduces wall-clock time by about 75% for most edits.

…GeoJSON

Three hot spots, one behavior-preserving pass (the end-to-end replay suite asserts identical outputs):

- The fabric CSV (tens of MB) was re-parsed, and its points rebuilt one Python object at a time, for EVERY coverage file in a recompute. load_fabric_gdf() now parses it once (geometry built vectorized via points_from_xy), and process_data passes the frame to each file's compute instead of letting it reload.
- add_to_db wrote results with iterrows() + per-row ORM objects and a dead IntegrityError-swallowing batch path (kml_data has no unique constraint). It now COPYs the rows in one statement on the session's own connection, keeping the same transaction semantics.
- The retile dumped one giant FeatureCollection through json.dump; it now streams newline-delimited features through orjson, which is also the input shape tippecanoe -P can actually parallelize parsing on.

Coverage compute drops to roughly a third of its former wall clock on a multi-file filing; the retile's serialization cost mostly vanishes.
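The switch from one large FeatureCollection to newline-delimited features can be sketched as below. This is a minimal illustration, not the PR's code: it uses the standard-library json module rather than orjson so that it runs without dependencies, and write_ndjson and sample_features are hypothetical names.

```python
import io
import json
from typing import Iterable, TextIO


def write_ndjson(features: Iterable[dict], out: TextIO) -> int:
    """Write one compact GeoJSON Feature per line; return the count written."""
    count = 0
    for feature in features:
        # No enclosing FeatureCollection: every line is a complete JSON
        # document, so a reader can split the input on newlines and parse
        # the pieces independently.
        out.write(json.dumps(feature, separators=(",", ":")))
        out.write("\n")
        count += 1
    return count


def sample_features():
    # Hypothetical data. A generator, so the features are produced one at a
    # time instead of being collected into a single list first.
    for i in range(3):
        yield {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-80.0 + i, 37.0]},
            "properties": {"location_id": i},
        }


buf = io.StringIO()
written = write_ndjson(sample_features(), buf)
lines = buf.getvalue().splitlines()
```

Because each feature is serialized and written immediately, peak memory stays proportional to one feature rather than to the whole collection.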

…store one tileset

The full tile build is now two tippecanoe runs merged into one tileset: overview zooms (z0-8) keep the density-dropping flags they need, while detail zooms (z9-16) are built with no dropping of any kind, so a tile's bytes are a pure function of the features that intersect it. tippecanoe's --drop-densest-as-needed shares its discovered min-gap across a whole zoom level, which both silently thinned dense areas and would defeat regional tile splicing (an upcoming change).

Also:

- Stop storing the mbtiles file blob, including in folder.copy snapshots (vector_tiles rows are the only serving truth; nothing ever read the blob back).
- Deterministic ordering for the tile feature stream (files by id, fabric points by location_id).
- Flatten the feature arrays fed to create_tiles (two callers nested per-file feature lists, relying on tippecanoe tolerating JSON arrays as ldjson lines).
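The deterministic ordering and flattening described above can be sketched as follows. This is an illustration under assumptions: ordered_features, the dict-of-lists input shape, and placing coverage features before fabric points are all hypothetical; only the sort keys (files by id, fabric points by location_id) come from the commit message.

```python
from itertools import chain


def ordered_features(coverage_by_file_id, fabric_points):
    """Return one flat, deterministically ordered list of features."""
    # Coverage features: files in ascending id order. chain.from_iterable
    # flattens the per-file lists so no nested arrays reach the tiler.
    coverage = chain.from_iterable(
        coverage_by_file_id[file_id] for file_id in sorted(coverage_by_file_id)
    )
    # Fabric points: ascending location_id.
    points = sorted(fabric_points, key=lambda f: f["properties"]["location_id"])
    return list(coverage) + points


# Hypothetical inputs, deliberately out of order.
coverage_by_file_id = {
    7: [{"properties": {"file_id": 7, "n": 0}}],
    3: [
        {"properties": {"file_id": 3, "n": 0}},
        {"properties": {"file_id": 3, "n": 1}},
    ],
}
fabric_points = [
    {"properties": {"location_id": 20}},
    {"properties": {"location_id": 10}},
]

stream = ordered_features(coverage_by_file_id, fabric_points)
```

With a stable input order, two builds over the same data feed the tiler the same feature stream, which is a precondition for comparing tile bytes between builds.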

…ilds

An edit now records its polygons' bbox as scoped dirt; the retile that consumes it regenerates only the dirty region's z9-16 tiles (snapped to the z9 grid, expanded by tippecanoe's tile-buffer overhang) and replaces those rows inside the current tileset in one transaction. Because z9-16 is built drop-free, the spliced tiles are byte-identical to a full rebuild's, which is pinned by test_splice_matches_full_rebuild_byte_for_byte (real tippecanoe).

The splice input is the region's points plus every coverage geometry passed whole: clipping geometries (with shapely or tippecanoe's --clip-bounding-box) perturbs polygon simplification deep inside the region, so out-of-region output is discarded instead.

Whole-tileset dirt (uploads, recomputes), a splice failure, or no redis all fall back to the full rebuild. Splices leave z0-8 overview tiles slightly stale (an edited dot is sub-pixel there); a new beat task full-rebuilds a spliced folder once it has been quiet for ten minutes.
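Snapping a dirty bbox to the z9 grid can be illustrated with standard Web Mercator tile arithmetic. This is a sketch under assumptions, not the PR's implementation: the function names are hypothetical, and the expansion assumes tippecanoe's documented default buffer of 5 units (each 1/256 of a tile); how the PR actually computes the overhang is not shown in the source.

```python
import math


def lonlat_to_tile_fraction(lon, lat, zoom):
    """Fractional Web Mercator tile coordinates (x grows east, y grows south)."""
    n = 2 ** zoom
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    return x, y


def dirty_tile_range(west, south, east, north, grid_zoom=9, buffer_px=5):
    """Inclusive (x0, y0, x1, y1) range of grid tiles touched by the bbox."""
    n = 2 ** grid_zoom
    pad = buffer_px / 256.0  # assumed buffer overhang, in tile units
    x_min, y_min = lonlat_to_tile_fraction(west, north, grid_zoom)
    x_max, y_max = lonlat_to_tile_fraction(east, south, grid_zoom)
    # Expand by the buffer, snap to whole tiles, clamp to the grid.
    x0 = max(0, math.floor(x_min - pad))
    y0 = max(0, math.floor(y_min - pad))
    x1 = min(n - 1, math.floor(x_max + pad))
    y1 = min(n - 1, math.floor(y_max + pad))
    return x0, y0, x1, y1


def descendant_range(tile_range, zoom, grid_zoom=9):
    """Tiles at a deeper zoom that lie under a grid-zoom tile range."""
    x0, y0, x1, y1 = tile_range
    scale = 2 ** (zoom - grid_zoom)
    return x0 * scale, y0 * scale, (x1 + 1) * scale - 1, (y1 + 1) * scale - 1


inner = dirty_tile_range(0.1, -0.2, 0.2, -0.1)
near_edge = dirty_tile_range(0.005, -0.2, 0.2, -0.1)
z16 = descendant_range(inner, 16)
```

A bbox well inside a tile maps to that single z9 tile, while one whose western edge sits within the buffer distance of a tile boundary also pulls in the neighboring tile. Each z9 tile covers 128 x 128 tiles at z16.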

The BDC accepts multiple technology claims per location, so the export CSV now reports each one by default. For providers who prefer a single row per location, the export_max_service_only site setting (admin settings page, default off) keeps only the fastest claim: download desc, then upload desc, then low-latency first, then lowest technology code as a deterministic tiebreak. The setting is honored by both CSV-generation sites and pinned by unit tests. Also adds a golden test that replays archived filings end to end and asserts the generated CSV matches the originally filed bytes (fixtures local-only, skip-if-absent).
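The "fastest claim" ordering can be expressed as a single sort key. The sketch below is illustrative only: fastest_claim_per_location and the dict field names (location_id, download, upload, low_latency, technology) are hypothetical, and only the ordering rule itself comes from the description above.

```python
def fastest_claim_per_location(claims):
    """Keep one claim per location, chosen by the documented ordering."""

    def rank(claim):
        # Lower tuples win: download desc, upload desc, low-latency first
        # (False sorts before True, so negate the flag), then lowest
        # technology code as the deterministic tiebreak.
        return (
            -claim["download"],
            -claim["upload"],
            not claim["low_latency"],
            claim["technology"],
        )

    best = {}
    for claim in claims:
        loc = claim["location_id"]
        if loc not in best or rank(claim) < rank(best[loc]):
            best[loc] = claim
    # Emit rows in location order so the output is stable.
    return [best[loc] for loc in sorted(best)]


# Hypothetical claims for two locations.
claims = [
    {"location_id": 1, "download": 100, "upload": 20, "low_latency": True, "technology": 50},
    {"location_id": 1, "download": 100, "upload": 20, "low_latency": True, "technology": 40},
    {"location_id": 1, "download": 100, "upload": 20, "low_latency": False, "technology": 10},
    {"location_id": 2, "download": 25, "upload": 3, "low_latency": True, "technology": 70},
    {"location_id": 2, "download": 1000, "upload": 100, "low_latency": True, "technology": 50},
]

kept = fastest_claim_per_location(claims)
```

For location 1 the three claims tie on speed, so low latency eliminates the third claim and the lower technology code picks between the first two; for location 2 the higher download speed decides immediately.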

shaddi added 4 commits June 12, 2026 23:53

shaddi merged commit baf3ec7 into main Jun 13, 2026
3 checks passed

shaddi deleted the pipeline-performance branch June 13, 2026 04:39

shaddi merged 4 commits into main from pipeline-performance
