Improve genomeGenerate multicore index build performance #2687
justinblethrow-cloud wants to merge 3 commits into
Add a local benchmark script for genomeGenerate that records wall time, CPU and I/O samples, selected index-build settings, and per-stage timings from Log.out. This keeps performance validation reproducible while leaving STAR runtime behavior unchanged.
Use parallel traversal for SAindex generation and parallel bucket sorting for junction insertion indices. Keep deterministic output ordering while reducing the remaining annotation-heavy genomeGenerate stages.
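The idea of sorting buckets in parallel while keeping output deterministic can be illustrated with a small sketch. This is not the STAR implementation (which is C++); the function name `parallel_bucket_sort` and the range-based bucketing are assumptions made for illustration. The point it shows is that when buckets cover disjoint, ordered key ranges, each bucket can be sorted independently and the results concatenated in bucket order, so the output does not depend on which worker finishes first.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_bucket_sort(values, n_buckets=4, workers=4):
    """Sort non-negative integers by range-partitioning them into buckets.

    Hypothetical helper for illustration only; not part of STAR.
    """
    if not values:
        return []
    top = max(values) + 1
    # Ceiling division so that every value maps to a bucket index < n_buckets.
    width = -(-top // n_buckets)
    buckets = [[] for _ in range(n_buckets)]
    for v in values:
        buckets[v // width].append(v)
    # Buckets hold disjoint key ranges, so they can be sorted independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, not completion order.
        sorted_buckets = list(pool.map(sorted, buckets))
    # Concatenating in bucket-index order yields the same result as one global sort.
    out = []
    for bucket in sorted_buckets:
        out.extend(bucket)
    return out


data = [5, 3, 9, 1, 7, 3, 0, 12, 8]
result = parallel_bucket_sort(data)
```

In CPython the threads here will not speed up the sort because of the global interpreter lock; the sketch is only meant to show why the ordering stays deterministic.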
692ceef to fb426f8
Cleaned the branch history for review. The PR now has three logical commits: benchmark harness, SAindex/junction parallelization, and suffix-array construction improvements. The cleaned tree is identical to the previously pushed tree (verified by matching git tree hashes), so the benchmark and byte-identity results in the PR description still apply. Local validation after cleanup:
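For readers who want to repeat a byte-identity check on two index directories, a minimal sketch follows. It is not the procedure used in this PR, which is not shown here; the function names `file_digest` and `compare_index_dirs` and the default file list are assumptions. It hashes each file in fixed-size chunks so that large index files do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_index_dirs(dir_a, dir_b, names=("Genome", "SA", "SAindex")):
    """Map each file name to True if both copies exist and hash identically.

    The default file list is an assumption; pass the files you care about.
    """
    result = {}
    for name in names:
        a, b = Path(dir_a) / name, Path(dir_b) / name
        result[name] = (
            a.is_file() and b.is_file() and file_digest(a) == file_digest(b)
        )
    return result
```

A call such as `compare_index_dirs("index_old", "index_new")` (both directory names hypothetical) returns a dictionary that makes any mismatching file easy to spot.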
Reduce genomeGenerate suffix-array build time by batching prefix-bin fills, retaining sorted chunks in RAM when memory allows, splitting very large prefix bins into ordered sub-bins, and using a comparator that can skip already-known prefix words. Optional SA sort profiling remains gated behind STAR_PROFILE_SA_SORT=1.
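The prefix-skipping comparator can be illustrated at the character level. The sketch below is an analogy, not STAR's code: STAR compares machine words in C++, while this toy version uses Python strings, and all function names are hypothetical. It shows that once suffixes are grouped into bins by their first `k` characters, every suffix in a bin shares that prefix, so comparisons inside the bin can start at offset `k` without changing the order.

```python
def bin_suffixes(text, k):
    """Group suffix start positions by their first k characters."""
    bins = {}
    for i in range(len(text)):
        bins.setdefault(text[i:i + k], []).append(i)
    return bins


def sort_bin_skipping_prefix(text, positions, k):
    # All suffixes in one bin share their first k characters, so comparing
    # from offset k onward gives the same order as comparing whole suffixes.
    return sorted(positions, key=lambda i: text[i + k:])


def suffix_array(text, k=2):
    """Build a suffix array by sorting each prefix bin independently."""
    bins = bin_suffixes(text, k)
    sa = []
    # Visiting bins in sorted prefix order keeps the overall result ordered.
    for prefix in sorted(bins):
        sa.extend(sort_bin_skipping_prefix(text, bins[prefix], k))
    return sa


text = "banana"
sa = suffix_array(text, k=2)
naive = sorted(range(len(text)), key=lambda i: text[i:])
```

The same reasoning is what makes splitting a very large bin into ordered sub-bins safe: a longer shared prefix simply moves the starting offset of the comparison further along.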
fb426f8 to 47e6051
Update after final hardening pass:
One broader skip-first-word safety guard was tested but not kept because it regressed suffix-sort time from roughly 238 s (the profiled sort regime) back to roughly 275 s. The genomeGenerate buffer already has leading padding, and the retained guard is limited to the extra sub-bin prefix read.
Summary
- Improve genomeGenerate multi-core throughput for large references
- Add extras/tests/scripts/benchmarkGenomeGenerate.sh to capture wall time, CPU/I/O samples, and genomeGenerate stage timings

Benchmark notes
- --genomeSAindexNbases 14, --genomeChrBinNbits 18, --limitGenomeGenerateRAM 300000000000
- results/full_chm13_prfinal_20260520
- fill_count_seconds=1.629, fill_scatter_seconds=36.699, sort_seconds=237.783, finalize_seconds=1.788
- Byte-identity check of Genome, SA, SAindex, chromosome metadata, and sjdbList.out.tab against the prior accepted optimized build

Validation
- make -C source STAR
- benchmarkGenomeGenerate.sh runs at 1 and 16 threads
- STAR_PROFILE_SA_SORT=1

Notes for review
- extras/tests/scripts/.
- STAR_PROFILE_SA_SORT=1; normal runs do not emit those profiling details.
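
To put the SA sort profile values from the benchmark notes in proportion, the following sketch parses a comma-separated `key=value` string and computes each stage's share of the total. The exact format STAR emits when profiling is enabled is not shown in this PR text, so the input format here is an assumption based on how the numbers are listed above, and the function names are hypothetical.

```python
def parse_profile(line):
    """Parse 'name=seconds,name=seconds,...' into a dict of floats.

    The comma-separated key=value layout is assumed, not confirmed.
    """
    timings = {}
    for field in line.split(","):
        key, _, value = field.strip().partition("=")
        timings[key] = float(value)
    return timings


def stage_shares(timings):
    """Return each stage's fraction of the summed stage time."""
    total = sum(timings.values())
    return {key: value / total for key, value in timings.items()}


profile = parse_profile(
    "fill_count_seconds=1.629,fill_scatter_seconds=36.699,"
    "sort_seconds=237.783,finalize_seconds=1.788"
)
shares = stage_shares(profile)
for name, share in shares.items():
    print(f"{name}: {profile[name]:.3f} s ({share:.1%})")
```

With the reported numbers the sort stage accounts for the large majority of the summed stage time, which is consistent with the PR's focus on the suffix-sort path.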