PathMNIST Colorectal Tissue Classifier: Generalization & Explainability

Deep learning pipeline for 9-class colorectal cancer tissue classification on the PathMNIST benchmark.

View/Download Project Presentation Slide Deck (PDF)

Summary

We Beat the MedMNIST Baseline: Our optimized ResNet-18 achieves 93.59% test accuracy, 2.49 percentage points above the official MedMNIST ResNet-50 baseline (91.1%), with roughly 2.3x fewer parameters (~11M vs. ~25M).
Addressed Clinical Domain Shift: We benchmarked two domain generalization techniques, Domain Randomization (Color Jitter) and Physical Stain Normalization (Macenko), to make the model more robust to hospital-specific H&E stain variations.
Quantified Explainability (XAI): We verified model decisions qualitatively using Grad-CAM and quantitatively via Pixel Deletion Tests (faithfulness metric) and t-SNE latent space analysis.

📊 Quick Visual Tour

Quantitative Benchmark: Our ResNet-18 outperforms deeper baselines.
Latent Clustering Separation (t-SNE): Color Jitter (center) creates highly compact, separated tissue clusters.
Grad-CAM Visual Attributions: Color Jitter shifts the model's focus toward cell nuclei structures.
Explanation Faithfulness (Pixel Deletion Test): A steep curve indicates that the model relies directly on the features highlighted by Grad-CAM.

Key Competencies Demonstrated

Rigorous Domain Shift Handling: Practical implementation and comparison of physical stain normalization (Macenko) vs. domain randomization (Color Jitter) to overcome clinical covariate shift.
Addressing Dataset Bias: Handling unstratified, highly imbalanced data splits with weighted sampling and a class-weighted loss.
Quantitative & Qualitative XAI: Going beyond "pretty pictures" by quantifying CAM faithfulness using Pixel Deletion Curves and visualizing latent space embeddings (t-SNE).
Robust Software Architecture: Clean, modular Python packages (src/), reproducible environment management (uv), and experiment tracking (wandb).

📖 Deep Dive Technical Details

🧬 1. Clinical Context & The Domain Shift Challenge

Colorectal Cancer Histology

Hematoxylin and Eosin (H&E) staining is the gold standard in cancer diagnostics. Hematoxylin stains cell nuclei a deep blue/purple, while Eosin stains extracellular matrix and cytoplasm various shades of pink/red. We classify pathology patches into 9 tissue categories:

Adipose
Background
Debris
Lymphocytes
Mucus
Smooth Muscle
Normal Colon Mucosa
Cancer-Associated Stroma
Colorectal Adenocarcinoma Epithelium

The Stain Variation Problem

Histology slides processed at different hospitals exhibit major color variations due to varying chemical concentrations, section thickness, and scanner calibration. Deep networks easily overfit to these site-specific color profiles (memorizing a specific hospital's exact shade of pink) rather than learning cellular geometry.

To evaluate domain generalization, we train and validate on NCT-CRC-HE-100K and test on CRC-VAL-HE-7K, which was collected at a different clinical center.

🛠️ 2. Methodology & Generalization Strategies

Class Imbalance Mitigation

Exploratory Data Analysis revealed severe class imbalance. We implement a dual-balancing strategy:

Dynamic Batch Balancing: During training, we use an inverse-frequency WeightedRandomSampler so that all 9 classes are drawn with equal probability and each mini-batch is approximately class-balanced in expectation.
Class-Weighted Cross-Entropy: We weight the loss function inversely proportional to class frequencies, penalizing misclassifications on minority classes more heavily.
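
The two balancing steps above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `inverse_frequency_weights` and the toy label counts are hypothetical. In PyTorch, the per-sample weights would typically be passed to `torch.utils.data.WeightedRandomSampler` and the class weights to the `weight` argument of `torch.nn.CrossEntropyLoss`.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Return (class_weights, sample_weights) inversely proportional to class frequency."""
    counts = Counter(labels)
    n = len(labels)
    # Weight n / (K * count): the mean sample weight is 1 when every class is present.
    class_weights = [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]
    sample_weights = [class_weights[y] for y in labels]
    return class_weights, sample_weights


# Toy, made-up label distribution (not the PathMNIST class counts).
labels = [0] * 6 + [1] * 3 + [2] * 1
class_weights, sample_weights = inverse_frequency_weights(labels, num_classes=3)

# Total sampling mass per class is equal, so a weighted sampler draws
# every class with the same probability in expectation.
mass = [sum(w for w, y in zip(sample_weights, labels) if y == c) for c in range(3)]
print(class_weights)
print(mass)
```

Note that sampling with these weights balances batches only on average; an individual mini-batch can still miss a class.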

Color-Invariance Pipelines

We benchmarked three color pipelines:

Baseline: Standard spatial augmentations (random horizontal/vertical flips) with no color modification.
Color Jitter (Domain Randomization): We randomly perturb brightness ($\pm 20\%$), contrast ($\pm 20\%$), saturation ($\pm 20\%$), and hue ($\pm 10\%$) during training.
Macenko Stain Normalization: A physical approach based on the Beer-Lambert law. We map RGB intensities to Optical Density (OD) space, perform Singular Value Decomposition (SVD) to isolate pure Hematoxylin and Eosin stain vectors, and mathematically normalize concentrations to match a standard template image.
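
The Macenko steps described above (optical density conversion, SVD, concentration rescaling) can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not the project's implementation: the function names are hypothetical, the background intensity is assumed to be 255, and `beta=0.15` and `alpha=1.0` are commonly used defaults rather than values taken from this repository. The demo image is synthetic, built from made-up stain vectors.

```python
import numpy as np


def estimate_stains(img, beta=0.15, alpha=1.0):
    """Estimate a 3x2 stain matrix, per-pixel concentrations and robust maxima.

    img: uint8 RGB array of shape (H, W, 3).
    """
    # Beer-Lambert: intensity = 255 * exp(-OD); clip to avoid log(0).
    od = -np.log(np.clip(img.reshape(-1, 3).astype(np.float64), 1.0, 255.0) / 255.0)
    # Drop near-transparent pixels, which carry little stain information.
    od_fg = od[np.all(od >= beta, axis=1)]
    # SVD of the centered OD cloud: the top two directions span the stain plane.
    _, _, vt = np.linalg.svd(od_fg - od_fg.mean(axis=0), full_matrices=False)
    major, minor = vt[0], vt[1]
    if od_fg.mean(axis=0) @ major < 0:
        major = -major  # orient the main axis towards the data
    plane = np.stack([major, minor], axis=1)
    # Robust extreme angles inside the plane give the two stain directions.
    proj = od_fg @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(phi, [alpha, 100.0 - alpha])
    v1 = plane @ np.array([np.cos(lo), np.sin(lo)])
    v2 = plane @ np.array([np.cos(hi), np.sin(hi)])
    # Convention: the vector with the larger red-channel OD (hematoxylin) goes first.
    stains = np.stack([v1, v2], axis=1) if v1[0] > v2[0] else np.stack([v2, v1], axis=1)
    # Least-squares concentrations: od.T ~= stains @ conc.
    conc = np.linalg.lstsq(stains, od.T, rcond=None)[0]
    max_conc = np.percentile(conc, 99, axis=1)
    return stains, conc, max_conc


def macenko_normalize(img, target_stains, target_max_conc):
    """Re-express img with the template's stain vectors and concentration scale."""
    _, conc, max_conc = estimate_stains(img)
    conc = conc * (target_max_conc / max_conc)[:, None]
    out = 255.0 * np.exp(-(target_stains @ conc))
    return np.rint(np.clip(out, 0, 255)).T.reshape(img.shape).astype(np.uint8)


# Synthetic 28x28 "template" built from toy stain vectors (made up for the demo).
rng = np.random.default_rng(0)
h = np.array([0.6, 0.7, 0.4])
h /= np.linalg.norm(h)
e = np.array([0.2, 0.8, 0.55])
e /= np.linalg.norm(e)
true_conc = rng.uniform(0.3, 1.5, size=(2, 28 * 28))
true_od = np.stack([h, e], axis=1) @ true_conc
template = np.rint(255.0 * np.exp(-true_od)).T.reshape(28, 28, 3).astype(np.uint8)

stains, _, max_conc = estimate_stains(template)
roundtrip = macenko_normalize(template, stains, max_conc)
err = np.abs(roundtrip.astype(int) - template.astype(int)).mean()
print(stains.shape, err)
```

Normalizing the template against its own statistics should return nearly the same image, which is a useful sanity check before applying the transform to real patches.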

🧠 3. Scientific Discussion: Why Jittering Outperformed Macenko

Benchmarks

Simple CNN Baseline (None): 83.98% accuracy.
Simple CNN + Macenko: 78.12% accuracy (a drop of 5.86 percentage points).
Simple CNN + Color Jitter: 88.82% accuracy (a gain of 4.84 percentage points).
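
The changes relative to the Simple CNN baseline are differences in percentage points. A quick check of the arithmetic, using only the accuracies listed above (the variable names are illustrative):

```python
baseline = 83.98  # Simple CNN, no color transform
results = {"Macenko": 78.12, "Color Jitter": 88.82}

# Absolute change in accuracy, in percentage points.
deltas = {name: round(acc - baseline, 2) for name, acc in results.items()}

# Relative reduction of the error rate achieved by Color Jitter.
error_reduction = (results["Color Jitter"] - baseline) / (100.0 - baseline)

print(deltas)
print(f"{error_reduction:.1%}")
```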

Why Stain Normalization Failed Here

Resolution Constraints: Macenko normalization relies on SVD to calculate stain vectors. A low-resolution $28 \times 28$ patch provides only 784 pixels, too few to build robust staining matrix statistics. Noise and compression artifacts are amplified, leading to unstable stain vector estimates.
Structural Blurring: Reconstructing the image after scaling stain concentrations often introduces blur and color halos in tiny patches, wiping out the fine morphological textures of cell nuclei.
The Power of Randomization: Color Jitter does not alter pixel structures; instead, it acts as a regularizer. By randomly changing colors, it prevents the network from relying on absolute color values, forcing it to extract invariant features like edge orientations, nuclear density, and gland boundary patterns.
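
To make the randomization concrete, here is a simplified NumPy stand-in for a color-jitter transform. It is a sketch only: the function `color_jitter` is hypothetical, it applies the three perturbations in a fixed order, and it omits the hue shift. A training pipeline would more likely use a library transform such as `torchvision.transforms.ColorJitter`. The random patch is toy data.

```python
import numpy as np

GRAY_WEIGHTS = np.array([0.299, 0.587, 0.114])  # standard luma coefficients


def color_jitter(img, rng, brightness=0.2, contrast=0.2, saturation=0.2):
    """Randomly perturb brightness, contrast and saturation of a float RGB image in [0, 1]."""
    out = img.astype(np.float64)
    # Brightness: scale all channels by a factor drawn from [1 - b, 1 + b].
    out = np.clip(out * rng.uniform(1 - brightness, 1 + brightness), 0.0, 1.0)
    # Contrast: blend the image with its mean gray level.
    c = rng.uniform(1 - contrast, 1 + contrast)
    out = np.clip(c * out + (1 - c) * (out @ GRAY_WEIGHTS).mean(), 0.0, 1.0)
    # Saturation: blend each pixel with its own grayscale value.
    s = rng.uniform(1 - saturation, 1 + saturation)
    gray = (out @ GRAY_WEIGHTS)[..., None]
    return np.clip(s * out + (1 - s) * gray, 0.0, 1.0)


rng = np.random.default_rng(0)
patch = rng.uniform(0.2, 0.8, size=(28, 28, 3))  # toy stand-in for a tissue patch
view_a = color_jitter(patch, rng)
view_b = color_jitter(patch, rng)
print(view_a.shape, float(np.abs(view_a - view_b).mean()))
```

Each call produces a differently colored view of the same patch while leaving the pixel grid untouched, so the only cues that stay constant across epochs are spatial ones.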

🔍 4. Explainable AI: Verification & Faithfulness

Visual Attributions (Grad-CAM)

Baseline model: Activations are scattered and often focus on blank background spaces or arbitrary stain pools.
Color Jitter model: Activations are localized on cell nuclei clusters and epithelial-stromal boundaries.
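
For reference, the core Grad-CAM computation can be sketched in a few lines of NumPy. This assumes the activations of the chosen convolutional layer and the gradients of the target class score with respect to them have already been extracted (for example with framework hooks); the function name `grad_cam` and the toy arrays are hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one layer's activations and class-score gradients.

    Both arrays have shape (channels, height, width).
    """
    # Channel weights: global-average-pooled gradients.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Scale to [0, 1] for display; guard against an all-zero map.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 supports the class (top-left), channel 1 opposes it (bottom-right).
acts = np.zeros((2, 4, 4))
acts[0, :2, :2] = 1.0
acts[1, 2:, 2:] = 1.0
grads = np.zeros((2, 4, 4))
grads[0] = 0.5
grads[1] = -0.5

cam = grad_cam(acts, grads)
print(cam)
```

In practice the low-resolution map is upsampled to the input size before being overlaid on the patch.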

Quantifying Explanation Faithfulness (Pixel Deletion Test)

We rank pixels by their Grad-CAM importance, progressively replace them with the slide's average color, and measure the decline in the model's confidence for the true class. A steeper, faster decline in confidence indicates a more faithful attribution map. The Color Jitter model (orange curve) exhibits a rapid drop, which supports the claim that the model relies heavily on the highlighted, biologically relevant structures.
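
The deletion procedure above can be sketched as follows. This is a minimal illustration with assumptions: `deletion_curve`, `curve_area` and the toy model are hypothetical names, deleted pixels are filled with the mean color of the image itself, and `predict_fn` stands in for any callable that returns the true-class confidence.

```python
import numpy as np


def deletion_curve(image, saliency, predict_fn, steps=10):
    """Confidence after deleting the top-ranked pixels, from 0% to 100% removed."""
    h, w, _ = image.shape
    order = np.argsort(saliency.ravel())[::-1]  # most important pixels first
    fill = image.reshape(-1, 3).mean(axis=0)    # mean color of this image
    work = image.astype(np.float64).reshape(-1, 3).copy()
    scores = [predict_fn(work.reshape(h, w, 3))]
    chunk = int(np.ceil(order.size / steps))
    for start in range(0, order.size, chunk):
        work[order[start:start + chunk]] = fill
        scores.append(predict_fn(work.reshape(h, w, 3)))
    return np.array(scores)


def curve_area(scores):
    """Trapezoidal area under the curve on a unit x-axis; lower means more faithful."""
    return float(((scores[:-1] + scores[1:]) / 2).mean())


# Toy model: confidence is the mean red value of the central 2x2 region.
def toy_model(x):
    return float(x[1:3, 1:3, 0].mean())


image = np.zeros((4, 4, 3))
image[1:3, 1:3, 0] = 1.0

faithful = np.zeros((4, 4))
faithful[1:3, 1:3] = 1.0     # highlights the pixels the model actually uses
unfaithful = 1.0 - faithful  # highlights everything else

area_faithful = curve_area(deletion_curve(image, faithful, toy_model, steps=4))
area_unfaithful = curve_area(deletion_curve(image, unfaithful, toy_model, steps=4))
print(area_faithful, area_unfaithful)
```

Summarizing each curve by its area gives a single number per model, which makes the comparison between the Baseline and Color Jitter attributions easier to report than a visual inspection of the curves.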

Latent Space Projections (t-SNE)

We project pre-classification feature vectors into 2D with t-SNE. We did not observe clear differences between the Baseline, Jitter, and Macenko approaches. Some classes split into two distinct groups, which is worth investigating in future work as a possible source of improvement.

🚀 5. Getting Started & Reproducibility

This project uses uv, a fast Python package installer and resolver.

1. Install Dependencies

uv sync

2. Run the Full Experiment Suite

To train models, evaluate on the test set, and generate confusion matrices under all strategies (None, Jitter, Macenko), execute the automated bash script:

# Run experiments for ResNet18
./run_transform_experiments.sh resnet18

3. Run Individual Components

Alternatively, run individual modules from the root directory:

# Train a model
python -m src.train resnet18 jitter

# Evaluate and save confusion matrix
python -m src.evaluate resnet18 jitter

# Generate t-SNE projections
python -m src.tsne resnet18

# Run Grad-CAM and Deletion Tests
python -m src.xai resnet18
