Release/5.3.0#20
Merged
Merged
Conversation
- Add has_structural_alt field (sticky flag, same pattern as has_n_base) - Add is_structural param to observe() with debug_assert guard - Add structural ALT priority in resolve(): CIGAR I/D evidence wins over non-structural REF when fragment reads disagree on INDELs - Full doc comments including known Phase 3 limitation
- Line 1231: count_variant_from_cache (main counting path) - Line 1737: count_single_variant (discovered during compilation) - Line 2320: count_per_transcript (P4b per-transcript counting) - resolve() call sites (1378, 1862, 2336) unchanged — structural info is already inside FragmentEvidence by resolve() time
…ority - 9 test cases: INS/DEL conflict, agreement, singleton, SNP tie, REF agree - All 4 counting invariants asserted (code-quality.md) - Both count_bam and count_bam_binned parity verified via count_both() - SNP regression test (test 5) confirms unchanged quality consensus
…tural priority - Test 6: INS R1=I(2) → Phase 3, R2=I(1) → structural → ALT wins - Test 7: DEL R1=D(2) → Phase 3, R2=D(1) → structural → ALT wins - Confirms wrong-length reads don't set is_structural (Phase 3 non-structural) - 11 tests total (up from 9), all with invariant + parity assertions
…ogic - 8 resolve() tests: structural wins, quality wins, tie discard, edge cases - 3 observe() tests: sticky flag, REF-structural guard, non-structural ALT - 161 Rust tests total (up from 150), 0 Clippy warnings - Tests isolated from BAM I/O for fast regression detection
- counting-metrics.md: Phase B flowchart now shows structural priority diamond before quality comparison, new table row, info admonition with Phase 3 scope and rationale, FragmentEvidence description updated - architecture.md: comparison table updated for consistency - Both docs mention the CIGAR I/D priority mechanism and its scope
- Test Categories: added test_indel_fragment_consensus.py entry - File tree: added in alphabetical position - Rust test table: added fragment consensus row (11 unit tests) - Accuracy Validation: added INDEL fragment consensus row (11 cases) - Updated test counts: 266 Python + 161 Rust tests
… priority - code-quality.md: add test_indel_fragment_consensus.py to test table, update test counts to 266 Python + 161 Rust - parameters.md: update fragment_qual_threshold description to note that INDEL structural evidence bypasses the quality threshold
Resolves systematic fragment-level INDEL discard caused by anchor-BQ tie in quality-weighted consensus. Structural CIGAR I/D evidence now wins unconditionally over base-quality comparison for INDEL conflicts. Changes: - ClassifyResult.is_structural flag + is_alt_structural() constructor - 6 CIGAR I/D return sites tagged structural in variant_checks.rs - FragmentEvidence.has_structural_alt sticky flag + resolve() priority - 3 engine.rs observe() call sites threaded - 11 Python integration tests (INS/DEL conflict, wrong-length, regression) - 11 Rust unit tests (resolve/observe branch coverage) - Documentation: counting-metrics, architecture, testing-guide, CHANGELOG, agent rules, Nextflow parameters Test results: 266 Python + 161 Rust tests pass, 0 Clippy warnings
Steps 0-6 of the merge implementation plan: - build: add polars>=1.0.0 as core dependency - feat(io): add centralized io/batch.py (read_maf, scan_maf, read_parquet, write_maf) - refactor(mfsd): replace pyarrow+csv with Polars batch helpers - feat(models): add MergeConfig Pydantic model with validation - feat(merge): add Polars-based multi-BAM MAF merge engine - feat(cli): add gbcms merge subcommand (--input type:path) - test: add 23 tests (7 batch_io + 15 merge + 1 CLI integration) All 289 tests pass. Zero performance warnings. ruff clean.
GAP A: Fill meta columns (gbcms_status, etc.) with '' instead of null
after outer join. Prevents null propagation to downstream tools.
GAP B: Add comprehensive 'different row count' tests:
- test 16: 5 vs 3 rows with 2 overlapping (asymmetric counts)
- test 17: combined columns with unmatched variants
- test 18: annotation column nulls for right-only variants
- test 19: meta column null-fill verification
GAP C: Log unmatched variant counts per type for monitoring.
GAP E: test 20: CLI invalid input format error path.
All 294 tests pass. merge.py at 96% coverage.
Step A: Expose fisher_exact_2x2 to Python via PyO3 - Added #[pyfunction] wrapper in shared/stats.rs - Registered in lib.rs as gbcms._rs.fisher_exact_2x2 - Same Rust implementation as per-BAM engine → numerically identical Step B: Expand combined additive columns - Read-level: ref_count, alt_count → total_count, vaf - Read strand: ref/alt_count_forward/reverse (4 cols) - Fragment-level: ref/alt_count_fragment → total_count_fragment, vaf_fragment - Fragment strand: ref/alt_count_fragment_forward/reverse (4 cols) - Schema-aware: only computes columns that exist in input MAFs Step C: Combined strand bias via Rust Fisher exact test - simplex_duplex_strand_bias_p_value + odds_ratio (read-level) - simplex_duplex_fragment_strand_bias_p_value + odds_ratio (fragment-level) - Computed per-row using the Rust fisher_exact_2x2 function Total: 20 combined columns when all inputs have full column set. Graceful degradation when strand columns are absent. Tests: 4 new tests (21-24), 298 total passing.
Steps 7-8 of merge implementation plan: - Nextflow MERGE_COUNTS process with bam_type samplesheet grouping - Auto-derive --column-prefix from meta.bam_type in DNA module - docs/cli/merge.md CLI reference - mkdocs.yml nav entry - Nextflow params/samplesheet docs (bam_type, merge options) - release-guide.md updated (9 version locations) - CHANGELOG.md comprehensive Unreleased section - __init__.py exports (merge_mafs, MergeConfig) - _rs.pyi type stub (fisher_exact_2x2) - architecture.md + output-formats.md updated 298 tests | 0 lint errors | No breaking changes
Squash merging rewrites commit SHAs, breaking shared ancestry between main and develop. This causes merge conflicts on every changed file during the back-merge step. Regular merge commits preserve history and make back-merges conflict-free.
Canonical release notes live in CHANGELOG.md. This standalone file was a leftover from v5.1.0 and is not referenced by any CI, docs, or release automation.
When bam_type is set but suffix is not provided, the output suffix
is now automatically derived as '-{bam_type}' (e.g., bam_type=duplex
produces suffix=-duplex → sample1-duplex.maf).
Priority: explicit suffix > auto-derived from bam_type > global params.suffix
This eliminates redundancy in samplesheet rows where suffix was
always just '-{bam_type}'.
Errors out early if multiple samplesheet rows share the same sample ID without a distinguishing suffix or bam_type, which would cause output MAFs to silently overwrite each other.
…tization and docs
Member
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.