Skip to content

Pre-filtering steps and requirements per sample #17

@joannawolthuis

Description

@joannawolthuis

Hi there! I'm faced with a dataset where I can have up to 90% missing values for a given feature (m/z value but equivalent to a protein readout), and that means there are many cases where some samples have all NA's for all of their 3 replicates.
What would you recommend in terms of input filtering? Would you make sure that a feature has at least 1 replicate present for every sample? Or set some kind of maximum missing value threshold (ie. don't include features that have more than X% of samples missing)?

Currently I remove all m/z that have more than 90% of samples missing, but that leaves me with approx. 60 000 features and approx. 1500 samples.

I'm noticing that I can run the algorithm but it's not converging (at least not after 48 hours).
Thanks :)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions