Release/5.3.0 by rhshah · Pull Request #20 · msk-access/gbcms

rhshah · 2026-05-16T10:04:17Z

No description provided.

…us priority

…n/check_deletion

- Add has_structural_alt field (sticky flag, same pattern as has_n_base) - Add is_structural param to observe() with debug_assert guard - Add structural ALT priority in resolve(): CIGAR I/D evidence wins over non-structural REF when fragment reads disagree on INDELs - Full doc comments including known Phase 3 limitation

- Line 1231: count_variant_from_cache (main counting path) - Line 1737: count_single_variant (discovered during compilation) - Line 2320: count_per_transcript (P4b per-transcript counting) - resolve() call sites (1378, 1862, 2336) unchanged — structural info is already inside FragmentEvidence by resolve() time

…ority - 9 test cases: INS/DEL conflict, agreement, singleton, SNP tie, REF agree - All 4 counting invariants asserted (code-quality.md) - Both count_bam and count_bam_binned parity verified via count_both() - SNP regression test (test 5) confirms unchanged quality consensus

…tural priority - Test 6: INS R1=I(2) → Phase 3, R2=I(1) → structural → ALT wins - Test 7: DEL R1=D(2) → Phase 3, R2=D(1) → structural → ALT wins - Confirms wrong-length reads don't set is_structural (Phase 3 non-structural) - 11 tests total (up from 9), all with invariant + parity assertions

…ogic - 8 resolve() tests: structural wins, quality wins, tie discard, edge cases - 3 observe() tests: sticky flag, REF-structural guard, non-structural ALT - 161 Rust tests total (up from 150), 0 Clippy warnings - Tests isolated from BAM I/O for fast regression detection

- counting-metrics.md: Phase B flowchart now shows structural priority diamond before quality comparison, new table row, info admonition with Phase 3 scope and rationale, FragmentEvidence description updated - architecture.md: comparison table updated for consistency - Both docs mention the CIGAR I/D priority mechanism and its scope

- Test Categories: added test_indel_fragment_consensus.py entry - File tree: added in alphabetical position - Rust test table: added fragment consensus row (11 unit tests) - Accuracy Validation: added INDEL fragment consensus row (11 cases) - Updated test counts: 266 Python + 161 Rust tests

… priority - code-quality.md: add test_indel_fragment_consensus.py to test table, update test counts to 266 Python + 161 Rust - parameters.md: update fragment_qual_threshold description to note that INDEL structural evidence bypasses the quality threshold

Resolves systematic fragment-level INDEL discard caused by anchor-BQ tie in quality-weighted consensus. Structural CIGAR I/D evidence now wins unconditionally over base-quality comparison for INDEL conflicts. Changes: - ClassifyResult.is_structural flag + is_alt_structural() constructor - 6 CIGAR I/D return sites tagged structural in variant_checks.rs - FragmentEvidence.has_structural_alt sticky flag + resolve() priority - 3 engine.rs observe() call sites threaded - 11 Python integration tests (INS/DEL conflict, wrong-length, regression) - 11 Rust unit tests (resolve/observe branch coverage) - Documentation: counting-metrics, architecture, testing-guide, CHANGELOG, agent rules, Nextflow parameters Test results: 266 Python + 161 Rust tests pass, 0 Clippy warnings

Steps 0-6 of the merge implementation plan: - build: add polars>=1.0.0 as core dependency - feat(io): add centralized io/batch.py (read_maf, scan_maf, read_parquet, write_maf) - refactor(mfsd): replace pyarrow+csv with Polars batch helpers - feat(models): add MergeConfig Pydantic model with validation - feat(merge): add Polars-based multi-BAM MAF merge engine - feat(cli): add gbcms merge subcommand (--input type:path) - test: add 23 tests (7 batch_io + 15 merge + 1 CLI integration) All 289 tests pass. Zero performance warnings. ruff clean.

GAP A: Fill meta columns (gbcms_status, etc.) with '' instead of null after outer join. Prevents null propagation to downstream tools. GAP B: Add comprehensive 'different row count' tests: - test 16: 5 vs 3 rows with 2 overlapping (asymmetric counts) - test 17: combined columns with unmatched variants - test 18: annotation column nulls for right-only variants - test 19: meta column null-fill verification GAP C: Log unmatched variant counts per type for monitoring. GAP E: test 20: CLI invalid input format error path. All 294 tests pass. merge.py at 96% coverage.

Step A: Expose fisher_exact_2x2 to Python via PyO3 - Added #[pyfunction] wrapper in shared/stats.rs - Registered in lib.rs as gbcms._rs.fisher_exact_2x2 - Same Rust implementation as per-BAM engine → numerically identical Step B: Expand combined additive columns - Read-level: ref_count, alt_count → total_count, vaf - Read strand: ref/alt_count_forward/reverse (4 cols) - Fragment-level: ref/alt_count_fragment → total_count_fragment, vaf_fragment - Fragment strand: ref/alt_count_fragment_forward/reverse (4 cols) - Schema-aware: only computes columns that exist in input MAFs Step C: Combined strand bias via Rust Fisher exact test - simplex_duplex_strand_bias_p_value + odds_ratio (read-level) - simplex_duplex_fragment_strand_bias_p_value + odds_ratio (fragment-level) - Computed per-row using the Rust fisher_exact_2x2 function Total: 20 combined columns when all inputs have full column set. Graceful degradation when strand columns are absent. Tests: 4 new tests (21-24), 298 total passing.

Steps 7-8 of merge implementation plan: - Nextflow MERGE_COUNTS process with bam_type samplesheet grouping - Auto-derive --column-prefix from meta.bam_type in DNA module - docs/cli/merge.md CLI reference - mkdocs.yml nav entry - Nextflow params/samplesheet docs (bam_type, merge options) - release-guide.md updated (9 version locations) - CHANGELOG.md comprehensive Unreleased section - __init__.py exports (merge_mafs, MergeConfig) - _rs.pyi type stub (fisher_exact_2x2) - architecture.md + output-formats.md updated 298 tests | 0 lint errors | No breaking changes

…tion

Squash merging rewrites commit SHAs, breaking shared ancestry between main and develop. This causes merge conflicts on every changed file during the back-merge step. Regular merge commits preserve history and make back-merges conflict-free.

Canonical release notes live in CHANGELOG.md. This standalone file was a leftover from v5.1.0 and is not referenced by any CI, docs, or release automation.

When bam_type is set but suffix is not provided, the output suffix is now automatically derived as '-{bam_type}' (e.g., bam_type=duplex produces suffix=-duplex → sample1-duplex.maf). Priority: explicit suffix > auto-derived from bam_type > global params.suffix This eliminates redundancy in samplesheet rows where suffix was always just '-{bam_type}'.

Errors out early if multiple samplesheet rows share the same sample ID without a distinguishing suffix or bam_type, which would cause output MAFs to silently overwrite each other.

…tization and docs

rhshah · 2026-05-16T10:05:53Z

#19

rhshah added 25 commits May 14, 2026 17:55

fix(rust): add is_structural flag to ClassifyResult for INDEL consens…

27f0433

…us priority

fix(rust): mark CIGAR I/D ALT returns as structural in check_insertio…

67055e9

…n/check_deletion

docs: add INDEL consensus fix to CHANGELOG

fce37ef

docs: update CHANGELOG with final test counts (266 Python + 161 Rust)

128965a

Merge feature/merge-multi-bam: gbcms merge command + Nextflow integra…

40e90d1

…tion

Merge main (5.2.0) back into develop

92c7063

chore: remove redundant release_notes.md

e5e5fff

Canonical release notes live in CHANGELOG.md. This standalone file was a leftover from v5.1.0 and is not referenced by any CI, docs, or release automation.

feat: validate duplicate sample IDs without suffix/bam_type

9d4e4b2

Errors out early if multiple samplesheet rows share the same sample ID without a distinguishing suffix or bam_type, which would cause output MAFs to silently overwrite each other.

feat: v5.3.0 — CRAM support, provenance headers, strand bias NaN sani…

bfffa89

…tization and docs

chore: bump version to 5.3.0

4516687

rhshah self-assigned this May 16, 2026

docs: add explicit CRAM example to nextflow docs

3dfb8fe

rhshah merged commit e1bb985 into main May 16, 2026
9 checks passed

rhshah deleted the release/5.3.0 branch May 16, 2026 10:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release/5.3.0#20

Release/5.3.0#20
rhshah merged 26 commits into
mainfrom
release/5.3.0

rhshah commented May 16, 2026

Uh oh!

rhshah commented May 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rhshah commented May 16, 2026

Uh oh!

rhshah commented May 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant