Replies: 1 comment
I think the main issue is that line.is_duplicate does not capture the duplication concept used for sequencing saturation in single-cell data. Sequencing saturation is usually calculated at the molecule level, from valid CB/UB/gene combinations, not from the BAM duplicate flag. So the difference most likely comes from three things: counting duplicates with is_duplicate, counting at the read level instead of the molecule level, and not exactly matching STARsolo's filtering rules for valid barcodes, UMIs, genes, and confidently mapped reads. In practice, you would usually group reads by something like CB + UB + GX and compare total reads against unique molecules, after applying the same filters STARsolo uses.
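As a rough illustration of the grouping idea, here is a minimal sketch in Python. It assumes you have already extracted one (CB, UB, GX) tuple per read that passes your mapping filters, and that a missing or invalid tag is written as "-"; both the function name and the MISSING placeholder are assumptions for this example, and the filtering is not guaranteed to match STARsolo's exactly.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumption: a tag value of "-" marks a missing/invalid barcode, UMI, or gene.
MISSING = "-"


def sequencing_saturation(records: Iterable[Tuple[str, str, str]]) -> float:
    """Return 1 - (unique molecules / total reads) over valid records.

    Each record is a hypothetical (CB, UB, GX) tuple for one read.
    Records with an empty or missing field are dropped before counting.
    """
    # Count reads per molecule, where a molecule is one CB + UB + GX combination.
    counts = Counter(
        rec for rec in records if MISSING not in rec and all(rec)
    )
    total_reads = sum(counts.values())
    unique_molecules = len(counts)
    if total_reads == 0:
        # No valid reads: report 0.0 rather than dividing by zero.
        return 0.0
    return 1.0 - unique_molecules / total_reads


if __name__ == "__main__":
    example = [
        ("AAAC", "UMI1", "GeneA"),  # first read of a molecule
        ("AAAC", "UMI1", "GeneA"),  # second read of the same molecule
        ("AAAC", "UMI2", "GeneA"),  # different UMI, new molecule
        ("TTTG", "UMI1", "GeneA"),  # different cell, new molecule
        ("-", "UMI1", "GeneA"),     # invalid barcode, ignored
        ("AAAC", "UMI3", "-"),      # no gene assigned, ignored
    ]
    print(sequencing_saturation(example))
```

With pysam, the tuples could be built from reads that carry all three tags, for example with read.has_tag and read.get_tag. Which reads count as confidently mapped, and which barcodes count as valid, still has to follow the settings of your STARsolo run.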
I checked the 10x calculation:
https://kb.10xgenomics.com/hc/en-us/articles/115003646912-How-is-sequencing-saturation-calculated
I then wrote the following code, but its results differ from STARsolo's. Where did I go wrong?