The Perturb-Bench Project assesses the robustness of single-cell perturbation modelling tools by developing a unified benchmarking pipeline that enables fair comparison. Many publications report only the metrics that favour their own tools. Perturb-Bench is a collective effort to provide an objective, systematic comparison of these tools across a comprehensive set of metrics, giving a balanced overview of each tool’s capabilities.
Our initial objectives are:
- Assess Extrapolation Accuracy: Evaluate how closely the predictions of generative AI (GAI) tools designed to extrapolate to unseen events align with ground-truth data.
- Evaluate Digital Knockout Performance: Assess how well Gene Regulatory Network (GRN) inference tools perform digital (in silico) knockouts by comparing their predictions with experimental data, such as CRISPR screening outcomes.
- Benchmarking Robustness: Establish standardized benchmarks that measure each method's robustness across diverse metrics, system distributions and datasets.
- Tool Development: Create a Nextflow framework for systematic testing and evaluation.
- Community Collaboration: Engage with the ELIXIR research community to bring together multidisciplinary expertise and to share findings, methodologies and best practices.
The schemas below formalize the workflows through which Perturb-Bench will compare GAI and GRN tools, ensuring consistent formats for data and results. For instance, single-cell expression data and metadata will be passed to the metric functions as AnnData objects in every scenario.
GAI extrapolation workflow:
- Load the dataset.
- Pre-process the data (specific pre-processing steps to be defined).
- Train and test the model:
  - Instantiate the model.
  - Perform hyperparameter tuning.
  - Train the model.
  - Generate predictions for the control and stimulated conditions (output as AnnData objects).
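The training and prediction steps above can be illustrated with a deliberately simple baseline: learn the average stimulated-minus-control shift from the training cell types, then apply it to the control cells of a held-out cell type. This is a minimal sketch under stated assumptions, not any GAI tool's method. The nested-dict data layout (used instead of AnnData so the example stays dependency-free) and the names `fit_mean_shift` and `predict` are hypothetical, and this baseline has no hyperparameters to tune.

```python
from statistics import mean

# Hypothetical layout: rows are cells, columns are genes (stand-in for AnnData.X).
Cells = list[list[float]]


def centroid(cells: Cells) -> list[float]:
    """Per-gene mean expression across cells."""
    return [mean(gene) for gene in zip(*cells)]


def fit_mean_shift(train: dict[str, dict[str, Cells]]) -> list[float]:
    """Average the (stimulated - control) centroid difference over training cell types.

    `train` maps cell type -> {"control": Cells, "stimulated": Cells}.
    """
    deltas = []
    for conditions in train.values():
        ctrl = centroid(conditions["control"])
        stim = centroid(conditions["stimulated"])
        deltas.append([s - c for s, c in zip(stim, ctrl)])
    return [mean(gene_deltas) for gene_deltas in zip(*deltas)]


def predict(control: Cells, shift: list[float]) -> Cells:
    """Predict stimulated cells by adding the learned shift to each control cell."""
    return [[x + d for x, d in zip(cell, shift)] for cell in control]
```

Such a baseline is useful as a lower bound: a GAI tool that cannot beat a global mean shift on an unseen cell type is not extrapolating in any meaningful sense.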
Outputs:
- R² Score: Measures how closely the predictions match the distribution of the stimulated data.
- Distance Metrics: Include Euclidean distance, energy distance (E-distance), Maximum Mean Discrepancy (MMD), etc.
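The output metrics listed above can be sketched in plain Python as follows. This is an illustrative sketch, not the Perturb-Bench implementation. Assumptions: R² is computed as the coefficient of determination on per-gene mean expression (some perturbation-prediction papers instead report the squared Pearson correlation of the means), the energy distance uses Euclidean distances averaged over all pairs including identical ones (the V-statistic form; variants using squared distances or PCA space exist), and MMD uses an RBF kernel with a hypothetical bandwidth `sigma` and returns the biased estimate of MMD².

```python
import math
from itertools import product
from statistics import mean

Cells = list[list[float]]  # rows are cells, columns are genes


def centroid(cells: Cells) -> list[float]:
    """Per-gene mean expression across cells."""
    return [mean(gene) for gene in zip(*cells)]


def r2_of_means(pred: Cells, true: Cells) -> float:
    """Coefficient of determination between true and predicted per-gene means."""
    p, t = centroid(pred), centroid(true)
    t_bar = mean(t)
    ss_res = sum((ti - pi) ** 2 for ti, pi in zip(t, p))
    ss_tot = sum((ti - t_bar) ** 2 for ti in t)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true gene means are equal")
    return 1.0 - ss_res / ss_tot


def euclidean_of_means(pred: Cells, true: Cells) -> float:
    """Euclidean distance between the predicted and true centroids."""
    return math.dist(centroid(pred), centroid(true))


def _mean_pairwise(a: Cells, b: Cells, f) -> float:
    """Average of f(x, y) over all pairs (x in a, y in b)."""
    return mean(f(x, y) for x, y in product(a, b))


def energy_distance(pred: Cells, true: Cells) -> float:
    """Energy distance: 2 E|X - Y| - E|X - X'| - E|Y - Y'| (V-statistic form)."""
    return (2 * _mean_pairwise(pred, true, math.dist)
            - _mean_pairwise(pred, pred, math.dist)
            - _mean_pairwise(true, true, math.dist))


def mmd2_rbf(pred: Cells, true: Cells, sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD with an RBF kernel of bandwidth sigma."""
    def k(x: list[float], y: list[float]) -> float:
        return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))
    return (_mean_pairwise(pred, pred, k) + _mean_pairwise(true, true, k)
            - 2 * _mean_pairwise(pred, true, k))
```

The pairwise metrics cost O(n·m) kernel or distance evaluations, so on real datasets they are usually computed on a subsample or in a reduced (e.g. PCA) space; that choice, like the kernel bandwidth, should be fixed across tools for the comparison to be fair.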
GRN digital knockout workflow:
- Load the dataset.
- Pre-process the data (specific pre-processing steps to be defined).
- Reconstruct the Gene Regulatory Network (GRN).
- Define the target gene(s) for the simulated knockout.
- Optionally, specify the cell type for simulation.
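To make the reconstruct-then-simulate steps above concrete, here is a toy digital knockout on an already reconstructed GRN. It is a simplified linear signal-propagation scheme chosen for illustration, not the algorithm of any particular GRN tool: the knocked-out gene's expression is set to zero, the resulting shift is pushed along weighted regulator-to-target edges for `n_steps` iterations while the knockout is held fixed, and genes whose shift exceeds a threshold are reported as KO-responsive. The GRN dictionary layout, the function names, `n_steps` and `threshold` are all hypothetical.

```python
# Hypothetical GRN layout: regulator -> {target: signed edge weight}.
GRN = dict[str, dict[str, float]]


def simulate_knockout(grn: GRN, baseline: dict[str, float], ko_gene: str,
                      n_steps: int = 3) -> dict[str, float]:
    """Return the per-gene expression shift caused by knocking out ko_gene.

    The initial shift (-baseline[ko_gene]) is propagated linearly along GRN
    edges for n_steps iterations; the knocked-out gene is re-clamped each step.
    """
    genes = set(baseline) | set(grn) | {t for targets in grn.values() for t in targets}
    shift = {g: 0.0 for g in genes}
    shift[ko_gene] = -baseline[ko_gene]
    for _ in range(n_steps):
        new_shift = {g: 0.0 for g in genes}
        for regulator, targets in grn.items():
            for target, weight in targets.items():
                new_shift[target] += weight * shift[regulator]
        new_shift[ko_gene] = -baseline[ko_gene]  # keep the knockout fixed
        shift = new_shift
    return shift


def ko_responsive_genes(shift: dict[str, float], ko_gene: str,
                        threshold: float = 0.1) -> set[str]:
    """Genes (other than the KO target) whose absolute shift exceeds threshold."""
    return {g for g, d in shift.items() if g != ko_gene and abs(d) > threshold}
```

With `n_steps=1` only direct targets of the knocked-out gene respond; each additional step lets the effect reach one more layer of indirect targets, which is why the number of propagation steps should be reported alongside any KO-responsive gene list.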
Outputs:
- KO-Responsive Genes: A list of genes whose expression responds to the simulated knockout.
- Validation Metric:
  - Compare results against iLINCS ground truth using Jaccard similarity.
  - Optionally, validate with other experimental perturbation datasets.
