This repository contains an analysis pipeline for predicting Ventilator-Associated Pneumonia (VAP) onset in mechanically ventilated ICU patients. The project is organized around three cohorts, one for training and two for external validation:
- Script - Main cohort where models are trained (foundation)
- MIMIC - External cohort validation using MIMIC-IV data
- Amikacin - External cohort validation using French Amikacinhal data
PredictingVAPExternal/
├── script/ # Main cohort (Script data)
│ ├── data/ # Script cohort datasets
│ ├── src/ # Source code (modeling, figures, tables)
│ ├── scripts/ # Bash scripts to run pipelines
│ ├── final_models/ # Final trained models (Random CV)
│ ├── figures/ # Generated figures
│ └── Tables/ # Generated tables
├── mimic/ # MIMIC-IV external cohort
│ ├── data/ # MIMIC data (raw, processed, labeled)
│ ├── src/ # MIMIC-specific analysis code
│ ├── scripts/ # Bash scripts for MIMIC analysis
│ └── results/ # MIMIC results (figures, tables, modeling)
├── amikacin/ # Amikacin external cohort
│ ├── data/ # Amikacin data (raw, processed, labeled)
│ ├── src/ # Amikacin-specific analysis code
│ ├── scripts/ # Bash scripts for Amikacin analysis
│ └── results/ # Amikacin results (figures, tables, modeling)
└── requirements.txt # Python dependencies
Important: Run analyses in this order:
- Script → Train models on main cohort
- MIMIC → Validate on MIMIC-IV external cohort (uses Script models)
- Amikacin → Validate on French Amikacinhal cohort (uses Script models)
# 1. Script (Main Cohort)
cd script/scripts
./run_all.sh
# 2. MIMIC (External Cohort)
cd ../../mimic/scripts
./run_figures.sh
./run_tables.sh
./run_modeling.sh
# 3. Amikacin (External Cohort)
cd ../../amikacin/scripts
./run_france_all.sh
Or run all cohorts sequentially:
./run_all_cohorts.sh
- Script Cohort - Main cohort modeling, figures, and tables
- MIMIC Cohort - MIMIC-IV external validation
- Amikacin Cohort - French Amikacinhal external validation
- Python 3.8+
- Virtual environment (recommended)
# Create virtual environment
python3 -m venv vap_onset
source vap_onset/bin/activate # macOS/Linux
# or
vap_onset\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
- Multiple Model Types: Random Forest, XGBoost, Logistic Regression, LSTM
- Multiple Prediction Windows: 3-, 5-, and 7-day VAP prediction windows
- Cross-Validation Strategies: Random CV (final), Temporal CV (archived)
- Comprehensive Analysis: Figures, tables, feature importance, timeline analysis
- External Validation: Validated on MIMIC-IV and French Amikacinhal cohorts
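To make the prediction windows concrete, the sketch below labels a patient-day as positive when VAP onset falls within the next N days. This is only an illustration under stated assumptions: the repository's actual label definition is not given here, and the names `label_window`, `label_all_windows`, `day`, and `vap_onset_day` are hypothetical.

```python
from typing import Dict, Optional

PREDICTION_WINDOWS = (3, 5, 7)  # days, matching the windows listed above


def label_window(day: int, vap_onset_day: Optional[int], window: int) -> int:
    """Return 1 if VAP onset falls within `window` days after `day`, else 0.

    Assumed convention (hypothetical): days are integer ventilation days and
    the window covers day+1 through day+window inclusive.
    """
    if vap_onset_day is None:  # patient never developed VAP
        return 0
    return int(day < vap_onset_day <= day + window)


def label_all_windows(day: int, vap_onset_day: Optional[int]) -> Dict[int, int]:
    """Compute one label per prediction window for a single patient-day."""
    return {w: label_window(day, vap_onset_day, w) for w in PREDICTION_WINDOWS}
```

For example, with onset on day 6, a prediction made on day 2 would be negative for the 3-day window and positive for the 5- and 7-day windows under this convention.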
- Models: script/final_models/results_random_cv_20251031_115429/
- Figures: script/figures/
- Tables: script/Tables/

- Figures: mimic/results/figures/
- Tables: mimic/results/tables/
- Modeling: mimic/results/modeling/

- Figures: amikacin/results/figures/
- Tables: amikacin/results/tables/
- Modeling: amikacin/results/modeling/
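Because the external cohorts depend on the trained Script models, it can help to locate the newest model directory programmatically rather than hard-coding its timestamp. The following is a minimal sketch, assuming result directories end in a `YYYYMMDD_HHMMSS` stamp like the one shown above; `latest_results_dir` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Optional


def latest_results_dir(base: Path, prefix: str = "results_random_cv_") -> Optional[Path]:
    """Return the most recent timestamped results directory under `base`.

    Assumes names end in a YYYYMMDD_HHMMSS stamp, so sorting the names
    lexicographically also sorts them chronologically.
    """
    if not base.is_dir():
        return None
    candidates = sorted(
        (p for p in base.glob(prefix + "*") if p.is_dir()),
        key=lambda p: p.name,
    )
    return candidates[-1] if candidates else None
```

Calling `latest_results_dir(Path("script/final_models"))` from the repository root returns `None` when no models have been trained yet, which is a convenient check before starting external validation.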
- All scripts include error handling and continue even if some steps fail
- Output directories are timestamped to prevent overwriting
- Each cohort has its own configuration file (scripts/config.sh)
- See individual cohort READMEs for detailed usage instructions
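The first two notes above (continue on failure, timestamped outputs) can be sketched in Python as follows. This is not the repository's Bash implementation; `timestamped_dir` and `run_steps` are hypothetical names, and the timestamp format is inferred from the directory name `results_random_cv_20251031_115429`.

```python
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional


def timestamped_dir(base: Path, prefix: str = "results",
                    now: Optional[datetime] = None) -> Path:
    """Create and return base/<prefix>_<YYYYMMDD_HHMMSS>.

    exist_ok=False makes the call fail instead of reusing an existing
    directory, so earlier outputs are never overwritten.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    out = base / f"{prefix}_{stamp}"
    out.mkdir(parents=True, exist_ok=False)
    return out


def run_steps(steps: List[List[str]]) -> Dict[str, int]:
    """Run each command in order, recording exit codes and continuing on failure."""
    exit_codes: Dict[str, int] = {}
    for cmd in steps:
        name = " ".join(cmd)
        try:
            code = subprocess.run(cmd).returncode
        except OSError:  # e.g. script missing or not executable
            code = 127
        exit_codes[name] = code
        if code != 0:
            print(f"[warn] step failed (exit {code}): {name}")
    return exit_codes
```

For instance, `run_steps([["./run_figures.sh"], ["./run_tables.sh"], ["./run_modeling.sh"]])` run from `mimic/scripts` would attempt all three steps and return the exit code of each, so failures can be reviewed afterwards.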