Optimize save_weights: rank-ordered layout with collective I/O (3) by cattabiani · Pull Request #84 · openbraininstitute/BlueRecording

cattabiani · 2026-06-19T08:44:30Z

Replace the per-node HDF5 write loop in _add_data with a single collective write per MPI rank. Each rank now writes a contiguous block of scaling_factors in one operation.

Fix: #83
Fix: #79
Fix: #73

Changes

_init_weights: File layout is now rank-ordered (rank 0's segments first, then rank 1's, etc.) instead of globally GID-sorted. Uses groupby(sort=False) to preserve rank-concatenation order.
_add_data: Writes coeffs.values.T as a single contiguous slice using h5py collective I/O. Falls back to a plain write when h5py lacks parallel support (single-rank / unit tests).
save_weights: Uses MPI_Scan (comm.scan) to compute each rank's row offset. All ranks (including empty ones) participate in collective I/O. Emits a warning if running multi-rank without parallel h5py.
_write_neurite_types: Same single-write pattern with collective I/O fallback.
MPI integration tests: Changed to order-independent comparison (per-node data lookup instead of raw array equality).
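The row-offset step described for save_weights can be illustrated without an MPI runtime. The sketch below simulates an inclusive prefix sum over hypothetical per-rank row counts; `row_offsets` and the counts are illustrative names and values, not taken from the PR. With mpi4py, each rank would presumably obtain the same value as `comm.scan(n_local) - n_local`.

```python
from itertools import accumulate


def row_offsets(rows_per_rank):
    """Return the first row of each rank's block in a rank-ordered file.

    An inclusive prefix sum gives the end of every block; subtracting the
    rank's own row count turns that into the block's start.
    """
    inclusive = list(accumulate(rows_per_rank))
    return [end - n for end, n in zip(inclusive, rows_per_rank)]


# Hypothetical row counts; rank 1 owns nothing but still gets an offset,
# which is what lets it take part in the write phase with an empty slice.
counts = [3, 0, 2, 4]
offsets = row_offsets(counts)

for rank, (start, n) in enumerate(zip(offsets, counts)):
    print(f"rank {rank}: rows [{start}, {start + n})")
```

An empty rank ends up with a zero-length slice that starts where the next non-empty rank begins, so the blocks never overlap.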

Rationale

The SONATA spec does not require globally sorted node_ids in electrode files
libsonatareport uses the same rank-ordered pattern for reports
neurodamus reads electrode files with np.where(node_ids == gid) — no sorted assumption
With round-robin GID distribution, the old GID-sorted layout made each rank's data non-contiguous, requiring ~3000 small writes per rank
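The last point can be checked on a toy case. This sketch assumes a round-robin rule `gid % n_ranks` and one row per GID (real nodes own several segment rows), and counts how many separate contiguous row ranges each rank would have to write under each layout. All names here are illustrative.

```python
def contiguous_runs(positions):
    """Count maximal runs of consecutive integers in an ascending list."""
    runs = 0
    prev = None
    for p in positions:
        if prev is None or p != prev + 1:
            runs += 1
        prev = p
    return runs


n_ranks, n_gids = 4, 12
# Assumed round-robin ownership: GID g belongs to rank g % n_ranks.
owner = {gid: gid % n_ranks for gid in range(n_gids)}

# Old layout: rows sorted by GID across all ranks.
gid_sorted = sorted(owner)
# New layout: rank 0's GIDs first, then rank 1's, and so on.
rank_ordered = sorted(owner, key=lambda g: (owner[g], g))


def runs_for(layout, rank):
    """Number of separate row ranges that `rank` owns in `layout`."""
    rows = [i for i, gid in enumerate(layout) if owner[gid] == rank]
    return contiguous_runs(rows)


old = [runs_for(gid_sorted, r) for r in range(n_ranks)]
new = [runs_for(rank_ordered, r) for r in range(n_ranks)]
print("writes per rank, GID-sorted:  ", old)
print("writes per rank, rank-ordered:", new)
```

In the GID-sorted layout every rank's rows are strided by the rank count, so the number of writes grows with the number of nodes per rank. In the rank-ordered layout it stays at one.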

Testing

All unit tests pass
All 4 MPI integration tests pass with mpirun -n 2
No reference file changes needed (per-node data values are identical)
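An order-independent comparison of the kind mentioned for the MPI integration tests might look like the following sketch. It assumes each file exposes `node_ids`, an `offsets` array with one more entry than `node_ids`, and a row-major data array. The helper names and the offsets convention are assumptions for illustration, not the test code from this PR.

```python
import numpy as np


def rows_for_node(node_ids, offsets, data, gid):
    """Return the data rows of one node, looked up by id rather than position."""
    idx = np.where(node_ids == gid)[0]
    if idx.size != 1:
        raise KeyError(f"expected exactly one entry for node {gid}")
    i = int(idx[0])
    return data[offsets[i]:offsets[i + 1]]


def same_per_node(a, b):
    """Compare two (node_ids, offsets, data) triples node by node."""
    if sorted(a[0].tolist()) != sorted(b[0].tolist()):
        return False
    return all(
        np.array_equal(rows_for_node(*a, gid), rows_for_node(*b, gid))
        for gid in a[0].tolist()
    )


# Toy reference in GID-sorted order: node 0 has 2 rows, node 1 has 1, node 2 has 2.
data_sorted = np.array([[0.0, 0.1], [0.2, 0.3], [1.0, 1.1], [2.0, 2.1], [2.2, 2.3]])
ref = (np.array([0, 1, 2]), np.array([0, 2, 3, 5]), data_sorted)

# Same per-node values in rank order (nodes 0 and 2 on one rank, node 1 on another).
new = (np.array([0, 2, 1]), np.array([0, 2, 4, 5]), data_sorted[[0, 1, 3, 4, 2]])

print(np.array_equal(ref[2], new[2]))  # raw arrays differ
print(same_per_node(ref, new))         # per-node contents agree
```

This is why the reference files can stay unchanged: only the row order differs, not the values stored for any node.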

Replace the per-node HDF5 write loop in _add_data with a single collective write per MPI rank. The file layout is now rank-ordered (each rank's segments contiguous) instead of globally GID-sorted.

Changes:
- _init_weights: keep all_cols in rank-concatenation order, build node_ids/offsets in rank order using groupby(sort=False)
- _add_data: new 'start' parameter for optimized single-write path; legacy per-node path preserved for backward compat with unit tests
- save_weights: use MPI_Scan for row offsets, all ranks participate in collective I/O (including empty ranks)
- _write_neurite_types: same contiguous write pattern
- MPI integration tests: order-independent comparison (per-node lookup)

The SONATA spec does not require globally sorted node_ids. Readers (neurodamus) use np.where(node_ids == gid) for lookup.

Simplify _add_data to only the optimized single-block write path. The 'ids' parameter is removed; 'start' is now required. Unit tests updated to use start=0 (single-rank = offset 0). The 'backwards' test now verifies that data is written in coeffs column order (which matches the file layout in the new architecture).
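A single-block write with a required start offset can be sketched as below. `write_block` is a hypothetical stand-in, not the actual `_add_data` signature, and a NumPy array plays the role of the HDF5 dataset (h5py datasets accept the same slice assignment).

```python
import numpy as np


def write_block(dset, block, start):
    """Write `block` into rows [start, start + len(block)) of `dset`.

    Returns the row index just past the written block. An empty block is a
    no-op, so ranks without data can call this unconditionally.
    """
    n_rows = block.shape[0]
    if n_rows:
        dset[start:start + n_rows, :] = block
    return start + n_rows


# Stand-in for the on-disk dataset: 5 segment rows, 2 electrode columns.
dset = np.ones((5, 2))

# Single-rank case, as in the updated unit tests: the block starts at row 0.
end = write_block(dset, np.full((3, 2), 0.5), start=0)
# A second, hypothetical rank continues where the first one stopped.
end = write_block(dset, np.full((2, 2), 0.25), start=end)
print(dset)
```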

GIDs are disjoint across ranks (round-robin), so no deduplication needed. Replace dict-based seen tracking with simple concatenation of each rank's np.unique output.
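A minimal sketch of that concatenation, using made-up per-rank GID arrays in place of the gathered data:

```python
import numpy as np

# Hypothetical per-rank GID columns (one entry per segment, so GIDs repeat
# within a rank). The second rank is empty.
per_rank_gids = [
    np.array([8, 0, 4, 0, 8], dtype=np.int64),
    np.array([], dtype=np.int64),
    np.array([5, 1, 1], dtype=np.int64),
]

# np.unique removes repeats within a rank. Because ranks own disjoint GIDs,
# plain concatenation cannot introduce duplicates across ranks.
node_ids = np.concatenate([np.unique(g) for g in per_rank_gids])
print(node_ids)

# Sanity check of the disjointness assumption.
assert np.unique(node_ids).size == node_ids.size
```

The result is sorted within each rank's block but not globally, which matches the rank-ordered file layout.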


- Replace collective I/O with independent writes (non-overlapping slices)
- Use fillvalue=1.0 instead of np.ones to avoid 17 GB memory spike in init
- Add log_rank0() utility for rank-0 progress messages
- Add timing summary to save_weights
- Warn if h5py lacks MPI support in multi-rank runs
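The fill-value idea can be shown on a tiny dataset. This sketch assumes h5py is installed and uses an in-memory file. The dataset name follows the PR text, while the shape and values are made up. Declaring `fillvalue=1.0` means unwritten rows read back as 1.0 without a full array of ones ever being built in memory.

```python
import h5py
import numpy as np

n_rows, n_cols = 6, 3  # tiny stand-in for the real dataset shape

# In-memory file (core driver, no backing store), so nothing touches disk.
with h5py.File("fillvalue_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "scaling_factors",
        shape=(n_rows, n_cols),
        dtype="f8",
        fillvalue=1.0,  # default for elements that are never written
    )
    # Write only one block, as a single rank would.
    dset[2:4, :] = np.full((2, n_cols), 0.25)
    result = dset[...]

print(result)
```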


cattabiani self-assigned this Jun 19, 2026

cattabiani marked this pull request as draft June 19, 2026 08:44

cattabiani added 11 commits June 19, 2026 10:52

format

64435d1

Simplify node_ids construction: concatenate directly

e0b5a61

Add try/except fallback for non-parallel h5py and warn at save_weights entry

ccaf27f

Fix MPI-IO deadlock: all ranks must open dataset before write

85eb9d3

Avoid pre-filling 17 GB dataset: write ones column during parallel write phase

cc8735c

Use Gatherv for buffer-based MPI transfer in _init_weights (avoid pickle overhead)

a3e0be0

Revert late-alloc HDF5 hack: use simple create_dataset for portability

da76261

Clarify init timing includes MPI sync wait

c095534

format

8a7af3a

cattabiani changed the title ~~Optimize save_weights: rank-ordered layout with collective I/O~~ Optimize save_weights: rank-ordered layout with collective I/O (3) Jun 25, 2026

cattabiani requested review from WeinaJi and mgeplf June 25, 2026 07:12

cattabiani marked this pull request as ready for review June 25, 2026 07:13

Merge branch 'katta/vectorize-line-source-weights' into katta/optimize-save-weights

4b92c59

Optimize save_weights: rank-ordered layout with collective I/O (3) #84
cattabiani wants to merge 13 commits into katta/vectorize-line-source-weights from katta/optimize-save-weights
