jkowalczyk08/PathMNIST

PathMNIST Colorectal Tissue Classifier: Generalization & Explainability

Deep learning pipeline for 9-class colorectal cancer tissue classification on the PathMNIST benchmark.

View/Download Project Presentation Slide Deck (PDF)

Summary

  • We Beat the MedMNIST Baseline: Our optimized ResNet-18 achieves 93.59% test accuracy, outperforming the official MedMNIST ResNet-50 baseline (91.1%) by about 2.5 percentage points while being roughly 2.3x smaller (~11M vs. ~25M parameters).
  • Addressed Clinical Domain Shift: We benchmarked two core domain generalization techniques, Domain Randomization (Color Jitter) and Physical Stain Normalization (Macenko), to make the model more robust to hospital-specific H&E stain variations.
  • Quantified Explainability (XAI): We verified model decisions qualitatively using Grad-CAM and quantitatively via Pixel Deletion Tests (faithfulness metric) and t-SNE latent space analysis.

📊 Quick Visual Tour

  • Quantitative Benchmark (image: Metrics Table): Our ResNet-18 outperforms deeper baselines.
  • Latent Clustering Separation (t-SNE) (image: t-SNE ResNet18): Color Jitter (center) creates highly compact, separated tissue clusters.
  • Grad-CAM Visual Attributions (image: Grad-CAM ResNet18): Color Jitter forces the model to focus on clinical cellular nuclei structures.
  • Explanation Faithfulness (Pixel Deletion Test) (image: Deletion Test ResNet18): Steep curve proves the model relies directly on the Grad-CAM highlighted features.

Key Competencies Demonstrated

  • Rigorous Domain Shift Handling: Practical implementation and comparison of physical stain normalization (Macenko) vs. domain randomization (Color Jitter) to overcome clinical covariate shift.
  • Addressing Dataset Bias: Overcoming unstratified and highly imbalanced data splits.
  • Quantitative & Qualitative XAI: Going beyond "pretty pictures" by quantifying CAM faithfulness using Pixel Deletion Curves and visualizing latent space embeddings (t-SNE).
  • Robust Software Architecture: Clean, modular Python packages (src/), reproducible environment management (uv), and robust experiment tracking (wandb).

📖 Deep Dive Technical Details

🧬 1. Clinical Context & The Domain Shift Challenge

Colorectal Cancer Histology

Hematoxylin and Eosin (H&E) staining is the gold standard in cancer diagnostics. Hematoxylin stains cell nuclei a deep blue/purple, while Eosin stains extracellular matrix and cytoplasm various shades of pink/red. We classify pathology patches into 9 tissue categories:

  1. Adipose
  2. Background
  3. Debris
  4. Lymphocytes
  5. Mucus
  6. Smooth Muscle
  7. Normal Colon Mucosa
  8. Cancer-Associated Stroma
  9. Colorectal Adenocarcinoma Epithelium

The Stain Variation Problem

Histology slides processed at different hospitals exhibit major color variations due to varying chemical concentrations, section thickness, and scanner calibration. Deep networks easily overfit to these site-specific color profiles (memorizing a specific hospital's exact shade of pink) rather than learning cellular geometry.

To evaluate domain generalization, we train and validate on NCT-CRC-HE-100K and test on CRC-VAL-HE-7K, a dataset collected at an external clinical center.

🛠️ 2. Methodology & Generalization Strategies

Class Imbalance Mitigation

Exploratory Data Analysis revealed severe class imbalance. We implement a dual-balancing strategy:

  1. Dynamic Batch Balancing: During training, we use an inverse-frequency WeightedRandomSampler so that each mini-batch draws from all 9 classes with equal probability (batches are balanced in expectation).
  2. Class-Weighted Cross-Entropy: We weight the loss function inversely proportional to class frequencies, penalizing misclassifications on minority classes more heavily.
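
A minimal sketch of the inverse-frequency weighting behind both steps, in plain Python. The normalization `N / (K * n_c)` and the helper name `inverse_frequency_weights` are assumptions for illustration; the repository's exact formula is not shown here. In PyTorch, the per-sample weights would typically be passed to `torch.utils.data.WeightedRandomSampler` and the per-class weights to `torch.nn.CrossEntropyLoss(weight=...)`.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Return (class_weights, sample_weights) using w_c = N / (K * n_c).

    Hypothetical helper: the normalization is one common choice, not
    necessarily the one used in this repository.
    """
    counts = Counter(labels)
    total = len(labels)
    class_weights = [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]
    # Each sample inherits the weight of its class.
    sample_weights = [class_weights[y] for y in labels]
    return class_weights, sample_weights


# Toy, imbalanced label list (3 classes instead of 9, for readability).
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
class_w, sample_w = inverse_frequency_weights(toy_labels, num_classes=3)

# Total sampling mass per class: equal values mean balanced draws in expectation.
mass_per_class = [
    sum(w for w, y in zip(sample_w, toy_labels) if y == c) for c in range(3)
]
print(class_w)
print(mass_per_class)
```

With these weights every class carries the same total sampling mass, so a sampler that draws in proportion to `sample_w` yields class-balanced batches on average, not in every single batch.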

Color-Invariance Pipelines

We benchmarked three color pipelines:

  1. Baseline: Standard spatial augmentations (random horizontal/vertical flips) with no color modification.
  2. Color Jitter (Domain Randomization): We randomly perturb brightness ($\pm 20\%$), contrast ($\pm 20\%$), saturation ($\pm 20\%$), and hue ($\pm 10\%$) during training.
  3. Macenko Stain Normalization: A physical approach based on the Beer-Lambert law. We map RGB intensities to Optical Density (OD) space, perform Singular Value Decomposition (SVD) to isolate pure Hematoxylin and Eosin stain vectors, and mathematically normalize concentrations to match a standard template image.
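
The stain-vector estimation step of the Macenko pipeline can be sketched in NumPy as follows. This is an illustration under assumptions, not the repository's implementation: the function names are hypothetical, `beta=0.15` and `alpha=1` are commonly used defaults rather than values taken from this project, and the sketch uses an eigendecomposition of the OD covariance matrix (equivalent to an SVD of the centered OD data). It stops at the stain vectors and does not perform the final concentration rescaling.

```python
import numpy as np


def rgb_to_od(rgb, io=255.0):
    """Beer-Lambert transform: OD = -log((I + 1) / I0), per channel."""
    return -np.log((rgb.astype(np.float64) + 1.0) / io)


def estimate_stain_vectors(rgb, io=255.0, beta=0.15, alpha=1.0):
    """Return a (2, 3) array of unit stain vectors (hypothetical helper)."""
    od = rgb_to_od(rgb.reshape(-1, 3), io)
    od = od[np.all(od > beta, axis=1)]  # drop near-transparent pixels
    if od.shape[0] < 2:
        raise ValueError("too few stained pixels to estimate stain vectors")

    # eigh returns eigenvalues in ascending order; keep the top-2 eigenvectors.
    _, eigvecs = np.linalg.eigh(np.cov(od.T))
    plane = eigvecs[:, 1:3]
    # Resolve the eigenvector sign ambiguity so projections are mostly positive.
    plane = plane * np.where(od.mean(axis=0) @ plane < 0, -1.0, 1.0)

    # Project onto the plane and take robust extreme angles as stain directions.
    proj = od @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(phi, [alpha, 100.0 - alpha])
    v_lo = plane @ np.array([np.cos(lo), np.sin(lo)])
    v_hi = plane @ np.array([np.cos(hi), np.sin(hi)])

    # Convention: the vector with the larger red-channel OD goes first.
    return np.array([v_lo, v_hi] if v_lo[0] > v_hi[0] else [v_hi, v_lo])


# Synthetic 28x28 patch (random values, only to exercise the code path).
rng = np.random.default_rng(0)
patch = rng.integers(50, 200, size=(28, 28, 3), dtype=np.uint8)
stains = estimate_stain_vectors(patch)
print(stains.shape)
```

Note that a 28x28 patch contributes at most 784 pixels to the covariance and percentile estimates, and fewer once background pixels are filtered out by `beta`.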

🧠 3. Scientific Discussion: Why Jittering Outperformed Macenko

Benchmarks

  • Simple CNN Baseline (None): 83.98% accuracy.
  • Simple CNN + Macenko: 78.12% accuracy (a drop of 5.86 percentage points from the baseline).
  • Simple CNN + Color Jitter: 88.82% accuracy (a gain of 4.84 percentage points over the baseline).

Why Stain Normalization Failed Here

  1. Resolution Constraints: Macenko normalization relies on SVD to calculate stain vectors. A low-resolution $28 \times 28$ patch has only 784 pixels, too few to estimate robust stain-matrix statistics. Noise and compression artifacts are amplified, leading to unstable stain vector estimates.
  2. Structural Blurring: Reconstructing the image after scaling stain concentrations often introduces blur and color halos in tiny patches, wiping out the fine morphological textures of cell nuclei.
  3. The Power of Randomization: Color Jitter does not alter pixel structures; instead, it acts as a regularizer. By randomly changing colors, it prevents the network from relying on absolute color values, forcing it to extract invariant features like edge orientations, nuclear density, and gland boundary patterns.
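
To illustrate the third point, here is a simplified NumPy sketch of brightness and contrast jitter. It is an assumption-laden stand-in for the training transform: the function name is hypothetical, contrast is applied around the global image mean, and saturation and hue jitter are omitted for brevity. The demo checks that, when no clipping occurs, the relative ordering of pixel intensities is unchanged, so edges and spatial structure survive while absolute color values do not.

```python
import numpy as np


def jitter_brightness_contrast(image, rng, strength=0.2):
    """Randomly scale brightness and contrast of a float image in [0, 1].

    Hypothetical helper: factors are drawn from [1 - strength, 1 + strength].
    """
    brightness = rng.uniform(1.0 - strength, 1.0 + strength)
    contrast = rng.uniform(1.0 - strength, 1.0 + strength)
    out = image * brightness
    mean = out.mean()
    out = (out - mean) * contrast + mean  # stretch or squeeze around the mean
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
img = rng.uniform(0.3, 0.6, size=(28, 28, 3))  # mid-range values, so no clipping
out = jitter_brightness_contrast(img, rng)

# Sort pixels by original intensity; the jittered values stay in the same order.
order = np.argsort(img.ravel())
order_preserved = bool(np.all(np.diff(out.ravel()[order]) >= 0))
print(order_preserved)
```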

🔍 4. Explainable AI: Verification & Faithfulness

Visual Attributions (Grad-CAM)

  • Baseline model: Activations are scattered and often focus on blank background spaces or arbitrary stain pools.
  • Color Jitter model: Activations are localized on cell nuclei clusters and epithelial-stromal boundaries.
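
For reference, the core Grad-CAM computation can be sketched in NumPy, assuming the activations of the chosen convolutional layer and the gradients of the class score with respect to them have already been extracted (for example with framework hooks). The function name and the toy inputs are hypothetical; which layer the project actually uses is not stated here.

```python
import numpy as np


def grad_cam(activations, gradients, eps=1e-8):
    """Compute a normalized Grad-CAM map.

    activations, gradients: arrays of shape (channels, H, W).
    """
    # Channel weights: global average of the gradients over space.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of activation maps, then ReLU to keep positive evidence.
    cam = np.tensordot(weights, activations, axes=1)
    cam = np.maximum(cam, 0.0)
    return cam / (cam.max() + eps)


# Toy example: channel 0 supports the class, channel 1 opposes it.
acts = np.zeros((2, 4, 4))
acts[0, 1, 1] = 1.0
acts[1, 2, 2] = 1.0
grads = np.stack([np.ones((4, 4)), -np.ones((4, 4))])
cam = grad_cam(acts, grads)
print(cam)
```

In practice the resulting low-resolution map is upsampled to the input size before being overlaid on the patch.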

Quantifying Explanation Faithfulness (Pixel Deletion Test)

We rank pixels by their Grad-CAM importance, progressively replace them with the slide's average color, and measure the decline in the model's confidence for the true class. A steeper, faster decline in confidence indicates a more faithful attribution map. The Color Jitter model (orange curve) exhibits a rapid drop, which supports the conclusion that the model relies heavily on the highlighted, biologically relevant structures.
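
The procedure can be sketched as follows. This is a minimal illustration under assumptions: `deletion_curve` and `confidence_fn` are hypothetical names, the fill value is the mean color of the image itself, and the toy "model" simply averages a fixed region so the example runs without a trained network.

```python
import numpy as np


def deletion_curve(image, saliency, confidence_fn, steps=10):
    """Replace pixels in order of decreasing saliency and record confidence.

    image: (H, W, 3) float array; saliency: (H, W) array;
    confidence_fn: callable mapping an image to the true-class confidence.
    """
    h, w, _ = image.shape
    order = np.argsort(saliency.ravel())[::-1]   # most important pixels first
    fill = image.reshape(-1, 3).mean(axis=0)     # average color used as filler
    work = image.copy().reshape(-1, 3)
    scores = [confidence_fn(work.reshape(h, w, 3))]
    chunk = int(np.ceil(order.size / steps))
    for start in range(0, order.size, chunk):
        work[order[start:start + chunk]] = fill
        scores.append(confidence_fn(work.reshape(h, w, 3)))
    return np.array(scores)


# Toy setup: the "model" only looks at the top-left 2x2 block.
toy_image = np.zeros((8, 8, 3))
toy_image[:2, :2] = 1.0
toy_saliency = np.zeros((8, 8))
toy_saliency[:2, :2] = 1.0


def toy_confidence(img):
    return float(img[:2, :2].mean())


scores = deletion_curve(toy_image, toy_saliency, toy_confidence, steps=16)
# Area under the curve (trapezoid rule on a unit interval): lower is more faithful.
auc = float(((scores[:-1] + scores[1:]) / 2.0).mean())
print(scores[:3], auc)
```

Comparing the area under this curve across models gives a single number per model, which is easier to report than the curves alone.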

Latent Space Projections (t-SNE)

We project pre-classification feature vectors into 2D. We did not observe any clear differences between the Baseline, Jitter, and Macenko approaches. Some classes split distinctly into two groups, which may be worth investigating in future work and could point to possible improvements.

🚀 5. Getting Started & Reproducibility

This project uses uv, a fast Python package installer and resolver.

1. Install Dependencies

uv sync

2. Run the Full Experiment Suite

To train models, evaluate on the test set, and generate confusion matrices under all strategies (None, Jitter, Macenko), execute the automated bash script:

# Run experiments for ResNet18
./run_transform_experiments.sh resnet18

3. Run Individual Components

Alternatively, run individual modules from the root directory:

# Train a model
python -m src.train resnet18 jitter

# Evaluate and save confusion matrix
python -m src.evaluate resnet18 jitter

# Generate t-SNE projections
python -m src.tsne resnet18

# Run Grad-CAM and Deletion Tests
python -m src.xai resnet18
