Skip to content

aligned_pairs_to_sam swaps is_primary / is_proper arguments when emitting -N >0 paired records #587

@genomicsguy

Description

@genomicsguy

Severity: Medium — does not affect default-mode (-N 0) output, but silently
corrupts SAM flags whenever -N >0 is used, which is the configuration any
rescue / MAPQ-recompute post-processor needs in order to inspect candidate
alignments.

Affected version: strobealign Rust crate, 0.18.0-alpha (HEAD as vendored
2026-05-27).

File / line: src/mapper.rs:1350-1358 (the max_secondary > 0 branch in
aligned_pairs_to_sam).

The callee signature at src/mapper.rs:229-238 is:

fn make_paired_records(
    &self,
    alignments: [Option<&Alignment>; 2],
    references: &[RefSequence],
    records: [&SequenceRecord; 2],
    mapq: [u8; 2],
    details: &[Details; 2],
    is_primary: bool,
    is_proper: bool,
) -> [SamRecord; 2]

But the call site passes them swapped:

// src/mapper.rs:1350-1358
records.extend(sam_output.make_paired_records(
    [alignment1.as_ref(), alignment2.as_ref()],
    references,
    [record1, record2],
    [mapq, mapq],
    details,
    is_proper,    // ← bound to parameter `is_primary`
    is_primary,   // ← bound to parameter `is_proper`
));

The other call site in the same function (the max_secondary == 0 default
branch, lines 1330-1338) passes them in the correct order, so the bug is silent
on the default -N 0 configuration.

Effect on emitted SAM flags / MAPQ (only fires when -N >0, i.e.
max_secondary > 0):

make_mapped_record uses the (swapped) is_primary parameter to decide whether
to zero MAPQ and set the SECONDARY flag (0x100):

// src/mapper.rs:140-143
if !is_primary {
    mapq = 0;
    flags |= SECONDARY;
}

make_paired_records then uses the (swapped) is_proper parameter to set
PROPER_PAIR (0x2):

// src/mapper.rs:265-267
if is_proper {
    sam_records[i].flags |= PROPER_PAIR;
}

Resulting truth table for the four (true-primary?, true-proper?) combinations:

true is_primary true is_proper After swap, SECONDARY set? MAPQ zeroed? PROPER_PAIR set?
primary proper no (correct) no (correct) yes (correct)
primary improper yes (WRONG — primary mislabeled as secondary) yes (WRONG) yes (WRONG — improper pair labeled proper)
secondary proper no (WRONG — secondary mislabeled as primary) no (WRONG — keeps the primary's MAPQ) no (WRONG — proper pair labeled improper)
secondary improper yes (correct) yes (correct) no (correct)

Practical user impact:

  1. Discordant top alignments are emitted with SECONDARY (0x100) set and MAPQ 0.
    Any downstream tool that filters with samtools view -F 0x100 will drop them,
    even though they are the intended primary alignment for that pair. This is
    silently suppressing real placements.

  2. Concordant secondary alignments are emitted without SECONDARY (0x100) and
    with the primary's MAPQ. Variant callers and dedup tools that key on
    (qname, flag&0x100==0) will see two alignments per qname both claiming to
    be the primary.

  3. The PROPER_PAIR (0x2) flag is inverted relative to whether the pair
    actually is proper, on both primary and secondary records (proper-primary and
    improper-secondary happen to come out right by coincidence; the other two
    cases are wrong). Tools that use 0x2 as a confidence input (e.g.
    bcftools mpileup, some duplicate-aware tools) will read the wrong signal.

The default-mode SAM (-N 0) is unaffected because the other code path is
correct, which is likely why this hasn't been caught.

Reproduction: run any paired-end alignment with -N 1 (or higher) on a
fixture that contains at least some discordant pairs (e.g. real GIAB reads with
realistic insert-size variance) and inspect any read-pair where one mate's
alignment is in the 0x100-bit class:

strobealign -N 5 ref.fa R1.fq.gz R2.fq.gz \
  | awk '$5==0 && and($2,0x100)' | head    # supposed-secondary
strobealign -N 5 ref.fa R1.fq.gz R2.fq.gz \
  | awk '$5>0 && !and($2,0x100) && !and($2,0x800)' \
  | sort -k1,1 | uniq -d -f0 -w 30 | head   # qnames with two "primaries"

Suggested fix: swap the two arguments at the call site. Defensively, a
tuple struct (e.g. PairFlags { is_primary: bool, is_proper: bool }) would have
prevented this at the type-system level.

records.extend(sam_output.make_paired_records(
    [alignment1.as_ref(), alignment2.as_ref()],
    references,
    [record1, record2],
    [mapq, mapq],
    details,
    is_primary,
    is_proper,
));

A unit test that asserts flags & SECONDARY == 0 and flags & PROPER_PAIR != 0
for the first record in a -N >0 run on a discordant pair would catch this and
any future regression.

How we found it: investigating why strobealign's -N flag changes the
primary placement
(not just adds candidates).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions