This repository contains a data processing and evaluation pipeline for analyzing protein complexes generated by various Large Language Models (LLMs) such as ChatGPT, Perplexity, Claude, Deepseek, Llama, and Gemini.
The end-to-end pipeline is orchestrated by `pipeline.py`, which automates data integration, consensus voting, metric calculation, and plotting.
To run the full pipeline, execute:

    python pipeline.py

The pipeline supports skipping specific steps if you only want to run parts of the analysis:

- `--skip-steps 1 2`: Skips the given step numbers (this example skips Step 1 and Step 2).
- `--skip-first-three`: A shortcut flag that skips the data integration, consensus, and metric calculation steps (Steps 1, 2, and 3). This is useful when you only want to regenerate plots from previously computed data.
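The two flags above can be combined into a single set of step numbers to skip. The following is a minimal sketch of how such flags could be parsed with `argparse`; it is not taken from `pipeline.py`, and the helper names `parse_args` and `steps_to_skip` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the two skip flags described in this README (illustrative only)."""
    parser = argparse.ArgumentParser(description="Run the analysis pipeline.")
    parser.add_argument(
        "--skip-steps",
        type=int,
        nargs="*",
        default=[],
        metavar="N",
        help="step numbers to skip, e.g. --skip-steps 1 2",
    )
    parser.add_argument(
        "--skip-first-three",
        action="store_true",
        help="shortcut for skipping Steps 1, 2, and 3",
    )
    return parser.parse_args(argv)


def steps_to_skip(args):
    """Merge both flags into one set of step numbers (hypothetical helper)."""
    skipped = set(args.skip_steps)
    if args.skip_first_three:
        skipped |= {1, 2, 3}
    return skipped
```

Under these assumptions, `steps_to_skip(parse_args(["--skip-first-three"]))` yields the same set as passing `--skip-steps 1 2 3`.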
Step 1: Data integration

- Scripts: `integrating_complexes_f1.py`, `integrating_complexes_bridges.py`
- Description: Processes the raw complex files generated by the various LLMs, located in the `sources/` directory. It evaluates and refines these into base complexes (True Positives) and extracts "bridges" connecting different complexes. The refined outputs are saved to the `integrated_tp/` and `integrated_bridges/` directories.
Step 2: Consensus

- Scripts: `integrating_complexes_voting.py`
- Description: Applies a consensus or voting mechanism across the different LLMs' outputs (both raw and bridged). This aggregates the individual models' predictions into consensus datasets and saves them to the `integrated_voting/` directory.
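To make the voting idea concrete, here is a minimal sketch of one possible scheme: a protein stays in a complex if at least `min_votes` models list it there. The actual rule used by `integrating_complexes_voting.py` is not described in this README, so the data layout (model name to complex name to protein IDs), the function name `vote_consensus`, and the threshold parameter are all assumptions.

```python
from collections import Counter


def vote_consensus(model_outputs, min_votes):
    """Keep each protein that at least `min_votes` models assign to a complex.

    model_outputs: dict mapping model name -> dict mapping complex name ->
    iterable of protein IDs (an assumed layout, not the repository's format).
    """
    votes = {}
    for complexes in model_outputs.values():
        for name, proteins in complexes.items():
            # set() ensures a model votes at most once per protein
            votes.setdefault(name, Counter()).update(set(proteins))

    consensus = {}
    for name, counter in votes.items():
        kept = sorted(p for p, n in counter.items() if n >= min_votes)
        if kept:  # drop complexes with no protein reaching the threshold
            consensus[name] = kept
    return consensus


# Made-up example data, for illustration only
outputs = {
    "model_a": {"Complex X": ["P1", "P2", "P3"]},
    "model_b": {"Complex X": ["P1", "P2"]},
    "model_c": {"Complex X": ["P2", "P4"], "Complex Y": ["P5"]},
}
print(vote_consensus(outputs, min_votes=2))  # {'Complex X': ['P1', 'P2']}
```

Raising `min_votes` makes the consensus stricter (fewer, better-supported proteins), while `min_votes=1` reduces to a plain union of all model outputs.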
Step 3: Metric calculation

- Scripts: `calculate_f1.py`, `calculate_graph_density.py`, `calculate_graph_density_stringdb.py`
- Description: Evaluates all generated datasets (raw outputs, base complexes, bridged complexes, and consensus models) against ground-truth data such as `verified_complexes.json` and STRING DB. It computes key performance metrics, including F1 scores and graph density, and writes the results to their respective directories (`results_f1/`, `results_graph_density/`, and `results_graph_density_sdb/`).
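For reference, the two metrics named above have standard textbook definitions, sketched below. This is not the code in `calculate_f1.py` or `calculate_graph_density.py`; it assumes F1 is computed over sets of protein IDs and that density is taken over a simple undirected graph. The function names are hypothetical.

```python
def f1_score(predicted, reference):
    """F1 between a predicted and a reference set of protein IDs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def graph_density(nodes, edges):
    """Density of a simple undirected graph: 2E / (N * (N - 1))."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Deduplicate edges; ignore self-loops and edges to unknown nodes
    unique_edges = {
        frozenset(edge)
        for edge in edges
        if len(set(edge)) == 2 and set(edge) <= nodes
    }
    return 2 * len(unique_edges) / (n * (n - 1))
```

With these definitions, a predicted complex `{"A", "B", "C"}` scored against a reference `{"B", "C", "D"}` has precision and recall of 2/3 each, so F1 is 2/3. A four-protein complex with three known interactions has density 2·3 / (4·3) = 0.5.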
Step 4: Plotting

- Scripts: `figure_plotting/_regenerate_plots.py`
- Description: Reads the metric results calculated in Step 3 and generates the primary figures and charts used for analysis and presentation.
Step 5: Supplemental plotting

- Scripts: `plotting/wordcloud.py` and various scripts in `supplemental_plotting/` (e.g., `heatmap_consensus_comparison.py`, `llm_comparison.py`, scatter plots)
- Description: Generates supplementary visualizations that provide deeper insight into the data. These include word clouds, heatmaps comparing consensus methods, LLM head-to-head comparisons, and scatter plots correlating F1 scores with graph densities.
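A scatter plot of F1 score against graph density is often summarized with a correlation coefficient. Whether the supplemental scripts compute one is not stated here, so the following is only a sketch of the standard Pearson coefficient using the standard library; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)
```

Values near 1 or -1 indicate a strong linear relationship between the two metrics, and values near 0 indicate little linear relationship.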