How is sequencing saturation calculated #2204

Liripo · 2024-08-29T15:12:20Z

I read the 10x description of the calculation (https://kb.10xgenomics.com/hc/en-us/articles/115003646912-How-is-sequencing-saturation-calculated) and wrote the following code, but the result differs from the value STARsolo reports:

import pysam
import polars as pl

with pysam.AlignmentFile("Aligned.sortedByCoord.out.bam") as bam:
    cbs = []
    ubs = []
    gxs = []
    dup = []
    for line in bam:
        if line.is_unmapped:
            continue
        cbs.append(line.get_tag("CB"))
        ubs.append(line.get_tag("UB"))
        gxs.append(line.get_tag("GX"))
        dup.append(line.is_duplicate)
    df = pl.DataFrame({
        "CB": cbs,
        "UB": ubs,
        "GX": gxs,
        "dup": dup
    })


duplicate_reads = sum(df['dup'] == True)

unique_confidently_mapped_reads = (
    df.filter(pl.col('CB') != "-",pl.col('GX') != "-")
    [:,['CB','UB','GX']].unique()
).height

1 - unique_confidently_mapped_reads / (unique_confidently_mapped_reads + duplicate_reads)

Where did I go wrong?

shayanjl · 2026-04-05T22:59:56Z

I think the main issue is that line.is_duplicate is not the duplication concept used for sequencing saturation in single-cell data. Sequencing saturation is calculated at the molecule level, from valid CB/UB/gene combinations, not from the BAM duplicate flag. So the difference most likely comes from three things: taking the duplicate count from is_duplicate, mixing a molecule-level count (unique CB/UB/GX combinations) with a read-level count (flagged reads) in the same ratio, and not matching STARsolo’s filtering rules for valid barcodes, UMIs, genes, and confidently mapped reads.

In practice, you would apply the same filters STARsolo uses, group the remaining reads by CB + UB + GX, and compute 1 - (unique molecules / total reads).
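To make the molecule-level idea concrete, here is a minimal sketch in plain Python. It is not STARsolo's implementation, only an illustration of the 1 - (unique molecules / total reads) ratio. The function name `sequencing_saturation`, the `MISSING` set, and the toy records are made up for the example, and treating "-" as the "no value" placeholder is an assumption carried over from the filter in the question's code.

```python
from typing import Iterable, Optional, Tuple

# One record per read: (cell barcode CB, UMI UB, gene GX).
Record = Tuple[Optional[str], Optional[str], Optional[str]]

# Assumption: these values mean "tag absent or not assigned".
MISSING = {None, "", "-"}


def sequencing_saturation(records: Iterable[Record]) -> float:
    """Return 1 - unique_molecules / total_reads over usable reads."""
    total_reads = 0
    molecules = set()
    for cb, ub, gx in records:
        # Keep only reads that have a barcode, a UMI, and a gene.
        if cb in MISSING or ub in MISSING or gx in MISSING:
            continue
        total_reads += 1
        molecules.add((cb, ub, gx))  # one molecule per CB/UB/GX combination
    if total_reads == 0:
        return float("nan")  # nothing usable, saturation is undefined
    return 1.0 - len(molecules) / total_reads


# Hypothetical toy data, not taken from any real BAM file.
example = [
    ("AAAC", "UMI1", "GeneA"),
    ("AAAC", "UMI1", "GeneA"),  # second read of the same molecule
    ("AAAC", "UMI2", "GeneA"),
    ("AAAG", "UMI1", "GeneB"),
    ("-", "UMI3", "GeneA"),     # no cell barcode: skipped
    ("AAAG", "UMI4", "-"),      # no gene: skipped
]
saturation = sequencing_saturation(example)
print(saturation)  # 4 usable reads, 3 molecules
```

With a real BAM, the records could be built from each read's CB, UB, and GX tags, checking `read.has_tag(...)` before `read.get_tag(...)` because `get_tag` raises `KeyError` when a tag is absent. You would probably also want to skip secondary and supplementary alignments so that each read is counted once. Even then the number may not match STARsolo exactly unless the barcode, UMI, and mapping filters are the same.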
