
Significant variation in GRNBoost2 results with minor cell subsampling (removing one cell) in pySCENIC #623

@jklupup


When running the GRN step in pySCENIC, I observed substantial differences in the output adjacencies.csv after removing just one cell from the expression matrix. Specifically:

Comparing the full expression matrix (thousands of cells) with the same matrix minus one cell yields only 56.21% overlap in TF-target pairs.
This level of variability seems unexpectedly high for a dataset of this size.
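The issue does not say how the 56.21% figure was computed. The shell sketch below shows two plausible definitions: pairs shared divided by pairs in the full-matrix run, and the Jaccard index (shared pairs divided by the union). It assumes each adjacency file starts with a header line and lists TF, target and importance in comma- or tab-separated columns. The function names `adj_pairs` and `pair_overlap` are hypothetical helpers for illustration, not part of pySCENIC.

```shell
#!/usr/bin/env bash
# Sketch: count the TF-target pairs two GRNBoost2 runs have in common.
# Assumption: each file has one header line, then TF, target, importance
# columns separated by a comma or a tab.
export LC_ALL=C  # identical byte-wise collation for sort and comm

# Hypothetical helper: print the unique "TF<TAB>target" pairs of one file, sorted.
adj_pairs() {
  awk -F'[,\t]' 'NR > 1 && NF >= 2 { print $1 "\t" $2 }' "$1" | sort -u
}

# Hypothetical helper: print pair counts and two overlap percentages.
pair_overlap() {
  local full=$1 sub=$2 shared n_full n_sub union
  shared=$(comm -12 <(adj_pairs "$full") <(adj_pairs "$sub") | wc -l)
  n_full=$(adj_pairs "$full" | wc -l)
  n_sub=$(adj_pairs "$sub" | wc -l)
  union=$(( n_full + n_sub - shared ))
  if (( union == 0 )); then
    echo "no TF-target pairs found" >&2
    return 1
  fi
  awk -v s="$shared" -v a="$n_full" -v b="$n_sub" -v u="$union" 'BEGIN {
    printf "shared=%d full=%d subsampled=%d\n", s, a, b
    printf "shared/full=%.2f%% jaccard=%.2f%%\n", (a ? 100 * s / a : 0), 100 * s / u
  }'
}
```

For example, `pair_overlap adjacencies_full.csv adjacencies_minus1.csv` (hypothetical file names) prints the shared and total pair counts followed by both percentages. Because the union is never smaller than the full-run pair count, the Jaccard value is never larger than the shared/full ratio, so the two definitions can differ noticeably.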

I wonder whether something is wrong with my code.

Code:

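# Run the GRN step once; grn.SUCCESS marks a completed run so reruns skip it.
if [ ! -f grn.SUCCESS ]; then
    arboreto_with_multiprocessing.py \
      "$count_loom" \
      "$tf_list" \
      --num_workers 16 \
      --output adjacencies.csv \
      --method grnboost2 \
      --sparse \
      --seed 1 \
    && touch grn.SUCCESS
fi

# Abort the pipeline if the GRN step did not finish.
if [ ! -f grn.SUCCESS ]; then echo "grn error" >&2; exit 1; fi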
