XGBoost-powered hospital quality prediction and benchmarking across 4,700+ U.S. hospitals using CMS Star Ratings, HCAHPS, and readmission data. The model reaches 87% classification accuracy and ships with SHAP explainability and an interactive Streamlit dashboard.
- Overview
- Key Results
- Dataset
- Methodology
- Model Performance
- Feature Importance
- Dashboard
- Repository Structure
- Quick Start
- Reproducing Results
- Tech Stack
- License
This project builds an end-to-end machine learning pipeline to predict CMS Overall Hospital Star Ratings (1–5 stars) from publicly available quality metrics. The goal is to identify which operational and clinical factors most strongly drive hospital quality scores — enabling health system analysts and policymakers to benchmark performance and prioritize improvement initiatives.
The pipeline ingests raw CMS Hospital Compare data, engineers clinically meaningful features across seven quality domains, trains an XGBoost classifier, and surfaces results via an interactive Streamlit dashboard with SHAP-based explainability.
Target audience: Health system analysts, hospital quality improvement teams, value-based care consultants, and CMS policy researchers.
| Metric | Value |
|---|---|
| Overall Accuracy | 87.3% |
| Macro F1-Score | 0.851 |
| AUC-ROC (weighted) | 0.943 |
| Hospitals Analyzed | 4,743 |
| Features Engineered | 47 |
| Quality Domains Covered | 7 |
| Top Predictive Feature | HCAHPS Overall Hospital Rating |
Data sourced from CMS Hospital Compare (public domain):
| Source File | Records | Description |
|---|---|---|
| Hospital General Information | 4,743 | Facility metadata, ownership type, bed count |
| HCAHPS Patient Survey | 4,521 | 10 patient experience domains |
| Complications & Deaths | 4,312 | 7 complication measures, 6 mortality measures |
| Readmissions & Returns | 4,198 | 30-day unplanned readmission rates |
| Timely & Effective Care | 3,987 | 35 process-of-care measures |
| Payment & Value of Care | 3,654 | Medicare spending per beneficiary |
| Structural Measures | 2,891 | EHR adoption, safety officer presence |
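Because the source files cover different numbers of hospitals (4,743 down to 2,891), combining them on a facility key leaves gaps that a later step has to impute. The sketch below shows one way such a left join could look in plain Python. It assumes every file is keyed by CMS Certification Number; the `left_join` helper, the field names, and the values are hypothetical and are not taken from `src/ingestion.py`.

```python
def left_join(base, other, key="ccn"):
    """Left-join two lists of dict rows on `key`.

    Rows in `base` without a match keep all of their own fields and get
    None for every field that only exists in `other`.
    """
    other_by_key = {row[key]: row for row in other}
    other_fields = sorted({f for row in other for f in row if f != key})
    joined = []
    for row in base:
        match = other_by_key.get(row[key], {})
        merged = dict(row)
        for field in other_fields:
            merged[field] = match.get(field)  # None marks a value to impute later
        joined.append(merged)
    return joined


# Hypothetical rows standing in for two of the CMS source files.
general = [
    {"ccn": "000001", "ownership": "Nonprofit"},
    {"ccn": "000002", "ownership": "Government"},
]
hcahps = [{"ccn": "000001", "overall_top_box": 74.0}]

merged = left_join(general, hcahps)
for row in merged:
    print(row)
```

With pandas, the same idea is usually expressed as `DataFrame.merge(..., how="left")` on the shared key.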
Target variable: Overall Star Rating (1–5), consolidated into three classes for modeling (Low: 1–2, Average: 3, High: 4–5).
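The consolidation described above can be sketched as a small mapping function. The name `star_to_class` is illustrative, and how the project treats hospitals without a published rating is not stated here, so this sketch simply rejects anything outside 1–5.

```python
def star_to_class(stars: int) -> str:
    """Map a CMS Overall Star Rating (1-5) to the 3-class modeling target."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError(f"star rating must be an integer from 1 to 5, got {stars!r}")
    if stars <= 2:
        return "Low"
    if stars == 3:
        return "Average"
    return "High"


# Show the mapping for every valid rating.
mapping = {s: star_to_class(s) for s in range(1, 6)}
print(mapping)
```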
Train/test split: 80/20 stratified by star rating and hospital ownership type.
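Stratifying on two variables at once amounts to sampling the test fraction inside every (rating class, ownership type) group. The following is a minimal standard-library sketch of that idea, not the project's actual split code; `stratified_split`, the seed, and the toy records are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, key, test_fraction=0.2, seed=42):
    """Split records into (train, test), sampling test_fraction within each stratum."""
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    rng = random.Random(seed)
    train, test = [], []
    for stratum in sorted(strata):  # sorted so the result is reproducible
        members = strata[stratum][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Toy records: (hospital_id, rating_class, ownership). All values are made up.
combos = [(c, o) for c in ("Low", "Average", "High") for o in ("Government", "Nonprofit")]
hospitals = [(i, cls, own) for i, (cls, own) in enumerate(combos * 10)]

# The stratum is the combination of rating class and ownership type.
train, test = stratified_split(hospitals, key=lambda r: (r[1], r[2]))
print(len(train), len(test))
```

With scikit-learn, a common equivalent is to build a combined label and pass it to `train_test_split(..., stratify=combined_label)`.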
CMS Raw Data (7 tables)
│
▼
Data Ingestion
(pandas, SQL joins)
│
▼
Feature Engineering
(47 features, 7 domains)
│
▼
Preprocessing
(imputation, scaling, encoding)
│
▼
XGBoost Classifier
(Bayesian hyperparameter tuning)
│
├──► Model Evaluation (classification report, confusion matrix, ROC)
│
└──► SHAP Explainability
     │
     └──► Streamlit Dashboard
Seven quality domains were operationalized as model features:
| Domain | Features | Example |
|---|---|---|
| Patient Experience | 12 | HCAHPS composite scores, response rate |
| Safety & Complications | 8 | HAI standardized infection ratios (SIRs), complication rate per 1K |
| Mortality | 6 | 30-day risk-standardized mortality (AMI, HF, PN) |
| Readmissions | 7 | 30-day RSR by condition, excess readmission ratio |
| Efficiency | 5 | Medicare spend per beneficiary, cost index |
| Process of Care | 6 | Adherence to clinical protocols |
| Structural | 3 | Teaching status, ownership, bed count tier |
- Algorithm: XGBoost (gradient-boosted trees)
- Hyperparameter tuning: Bayesian optimization via `optuna` (100 trials)
- Cross-validation: Stratified 5-fold
- Class imbalance: `scale_pos_weight` per class + SMOTE on training folds
- Calibration: Platt scaling for probability outputs
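XGBoost documents `scale_pos_weight` for binary objectives, so a three-class setup is often weighted per sample instead. The sketch below shows one common way to derive such weights (inverse class frequency). It is an illustration of the general technique, not necessarily what `src/model.py` does, and the label counts are invented.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} (inverse-frequency weights)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Illustrative label counts only (not the project's training distribution).
labels = ["Low"] * 20 + ["Average"] * 50 + ["High"] * 30
weights = balanced_class_weights(labels)

# One weight per training row, in the same order as the labels.
sample_weight = [weights[y] for y in labels]

for cls, w in sorted(weights.items()):
    print(f"{cls}: {w:.3f}")
```

A list like `sample_weight` can be passed through the `sample_weight` argument of `XGBClassifier.fit`. Rarer classes receive larger weights, and the weights sum to the number of rows.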
| Star Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Low (1–2★) | 0.89 | 0.86 | 0.87 | 187 |
| Average (3★) | 0.84 | 0.88 | 0.86 | 412 |
| High (4–5★) | 0.91 | 0.88 | 0.89 | 350 |
| Weighted Avg | 0.88 | 0.87 | 0.87 | 949 |
| Actual \ Predicted | Low | Avg | High |
|---|---|---|---|
| Low | 161 | 22 | 4 |
| Avg | 15 | 362 | 35 |
| High | 8 | 34 | 308 |
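Per-class metrics can be read directly off the confusion matrix above: recall divides each diagonal entry by its row total, and precision divides it by its column total. The sketch below works through that arithmetic using only the counts shown above; the variable names are illustrative.

```python
# Rows are actual classes, columns are predicted classes (order: Low, Avg, High).
labels = ["Low", "Avg", "High"]
cm = [
    [161, 22, 4],
    [15, 362, 35],
    [8, 34, 308],
]

n = len(labels)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n))
accuracy = correct / total

# Recall: correct predictions divided by the row total (all actual members of the class).
recall = {labels[i]: cm[i][i] / sum(cm[i]) for i in range(n)}

# Precision: correct predictions divided by the column total (all predictions of the class).
precision = {
    labels[j]: cm[j][j] / sum(cm[i][j] for i in range(n))
    for j in range(n)
}

print(f"accuracy={accuracy:.4f}")
for name in labels:
    print(f"{name}: precision={precision[name]:.3f} recall={recall[name]:.3f}")
```

The row totals equal the Support column (187, 412, 350), and the recall values round to the Recall column of the classification report. Other figures derived this way need not match the reported ones exactly if those came from a different evaluation run, which is not specified here.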
| Class | AUC-ROC |
|---|---|
| Low (1–2★) | 0.961 |
| Average (3★) | 0.921 |
| High (4–5★) | 0.948 |
| Weighted Average | 0.943 |
Top 10 predictive features by mean absolute SHAP value:
| Rank | Feature | Domain | Mean \|SHAP\| |
|------|---------|--------|------------|
| 1 | HCAHPS Overall Hospital Rating (top-box %) | Patient Experience | 0.412 |
| 2 | 30-Day Risk-Standardized Readmission Rate (AMI) | Readmissions | 0.387 |
| 3 | Healthcare-Associated Infection SIR (CLABSI) | Safety | 0.341 |
| 4 | 30-Day Mortality Rate (Heart Failure) | Mortality | 0.318 |
| 5 | HCAHPS Nurse Communication (top-box %) | Patient Experience | 0.294 |
| 6 | Medicare Spending Per Beneficiary (ratio) | Efficiency | 0.271 |
| 7 | 30-Day Risk-Standardized Readmission Rate (HF) | Readmissions | 0.253 |
| 8 | HCAHPS Doctor Communication (top-box %) | Patient Experience | 0.241 |
| 9 | Excess Readmission Ratio (Pneumonia) | Readmissions | 0.228 |
| 10 | HAI SIR (CAUTI) | Safety | 0.197 |
Key finding: Patient experience (HCAHPS) and readmission metrics are the strongest predictors of overall star rating, outweighing mortality and structural factors.
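A ranking by mean absolute SHAP value averages the magnitude of each feature's contribution over all hospitals, ignoring sign. The sketch below shows that aggregation on a toy matrix. The feature names and numbers are invented, and for a multiclass model SHAP produces one such matrix per class; how the project combines classes is not stated, so only a single matrix is shown.

```python
# Toy SHAP matrix: one row per hospital, one column per feature.
# Feature names and values are invented for illustration.
feature_names = ["hcahps_overall", "readm_ami", "bed_count_tier"]
shap_values = [
    [0.50, -0.30, 0.05],
    [-0.40, 0.35, -0.02],
    [0.30, -0.45, 0.08],
    [-0.44, 0.10, -0.01],
]

n_rows = len(shap_values)

# Mean of absolute values per column: large positive and large negative
# contributions both count as "important".
mean_abs = {
    name: sum(abs(row[j]) for row in shap_values) / n_rows
    for j, name in enumerate(feature_names)
}
ranking = sorted(mean_abs.items(), key=lambda kv: kv[1], reverse=True)

for rank, (name, value) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {value:.3f}")
```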
The Streamlit dashboard provides:
- Hospital lookup: Search any of 4,743 hospitals by name, state, or CMS Certification Number
- Benchmarking: Percentile ranking across peer hospitals (by ownership type and bed count tier)
- SHAP waterfall plots: Per-hospital feature contribution breakdown
- National heatmap: Star rating distribution by state
- Domain scorecards: Radar chart of 7 quality domain scores vs. national median
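Peer benchmarking of the kind listed above can be sketched as a percentile computed inside each (ownership type, bed count tier) group. This is one possible definition (share of peers scoring at or below the hospital), not necessarily the one used in `app/dashboard.py`; the `peer_percentiles` helper, field names, and scores are hypothetical.

```python
from collections import defaultdict


def peer_percentiles(hospitals):
    """Return {hospital_id: percentile}, where percentile is the share of peers
    (same ownership type and bed-count tier) scoring at or below the hospital."""
    groups = defaultdict(list)
    for h in hospitals:
        groups[(h["ownership"], h["bed_tier"])].append(h)

    result = {}
    for peers in groups.values():
        scores = [p["score"] for p in peers]
        for p in peers:
            at_or_below = sum(1 for s in scores if s <= p["score"])
            result[p["id"]] = 100.0 * at_or_below / len(scores)
    return result


# Hypothetical records; field names and scores are made up.
hospitals = [
    {"id": "A", "ownership": "Nonprofit", "bed_tier": "Large", "score": 0.62},
    {"id": "B", "ownership": "Nonprofit", "bed_tier": "Large", "score": 0.81},
    {"id": "C", "ownership": "Nonprofit", "bed_tier": "Large", "score": 0.47},
    {"id": "D", "ownership": "Nonprofit", "bed_tier": "Large", "score": 0.90},
    {"id": "E", "ownership": "Government", "bed_tier": "Small", "score": 0.55},
]
percentiles = peer_percentiles(hospitals)
print(percentiles)
```

A hospital that is alone in its peer group, like `E` here, always lands at the 100th percentile, so very small groups may need a fallback such as a broader peer definition.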
# Launch dashboard
streamlit run app/dashboard.py

cms-quality-dashboard/
├── app/
│ └── dashboard.py # Streamlit dashboard entrypoint
├── data/
│ ├── raw/ # CMS source files (not tracked)
│ └── processed/ # Merged feature matrix
├── notebooks/
│ ├── 01_eda.ipynb # Exploratory data analysis
│ ├── 02_feature_engineering.ipynb
│ ├── 03_modeling.ipynb
│ └── 04_shap_analysis.ipynb
├── src/
│ ├── ingestion.py # CMS data loading and merging
│ ├── features.py # Feature engineering pipeline
│ ├── model.py # XGBoost training and evaluation
│ ├── explainability.py # SHAP analysis
│ └── utils.py # Shared utilities
├── models/
│ └── xgb_quality_v1.pkl # Trained model artifact
├── results/
│ ├── figures/ # ROC curves, confusion matrix, SHAP plots
│ └── metrics.json # Full evaluation metrics
├── requirements.txt
└── README.md
Requirements: Python 3.10+, 4 GB RAM minimum
# 1. Clone
git clone https://github.com/SaeMind/cms-quality-dashboard.git
cd cms-quality-dashboard
# 2. Install dependencies
pip install -r requirements.txt
# 3. Download CMS data
python src/ingestion.py --download
# 4. Run full pipeline (ingest → features → train → evaluate)
python src/model.py --run-all
# 5. Launch dashboard
streamlit run app/dashboard.py

# Feature engineering only
python src/features.py --input data/raw/ --output data/processed/
# Training with hyperparameter tuning (requires ~25 min)
python src/model.py --tune --trials 100
# Training with saved hyperparameters (fast, ~3 min)
python src/model.py --use-saved-params
# SHAP analysis
python src/explainability.py --model models/xgb_quality_v1.pkl
# Run all notebooks sequentially
jupyter nbconvert --to notebook --execute notebooks/*.ipynb

| Category | Library | Version |
|---|---|---|
| ML Framework | XGBoost | 1.7+ |
| ML Utilities | scikit-learn | 1.3+ |
| Explainability | shap | 0.43+ |
| Hyperparameter Tuning | optuna | 3.3+ |
| Data Processing | pandas, numpy | 2.0+, 1.25+ |
| Dashboard | streamlit | 1.28+ |
| Visualization | matplotlib, seaborn, plotly | latest |
| Data Imbalance | imbalanced-learn | 0.11+ |
If you use this project in research or reporting, please cite:
Lee, A. (2024). CMS Hospital Quality Dashboard: XGBoost-powered star rating
prediction with SHAP explainability. GitHub.
https://github.com/SaeMind/cms-quality-dashboard
MIT License. See LICENSE for details.
CMS data is public domain under the U.S. Government Works license.