Deep learning pipeline for 9-class colorectal cancer tissue classification on the PathMNIST benchmark.
View/Download Project Presentation Slide Deck (PDF)
- We Beat the MedMNIST Baseline: Our optimized ResNet-18 achieves 93.59% test accuracy, significantly outperforming the official MedMNIST ResNet-50 baseline (91.1%) while being 2.3x smaller (~11M vs. ~25M parameters).
- Solved Clinical Domain Shift: We benchmarked two core domain generalization techniques—Domain Randomization (Color Jitter) and Physical Stain Normalization (Macenko)—to make the model invariant to hospital-specific H&E stain variations.
- Quantified Explainability (XAI): We verified model decisions qualitatively using Grad-CAM and quantitatively via Pixel Deletion Tests (faithfulness metric) and t-SNE latent space analysis.
| Quantitative Benchmark | Latent Clustering Separation (t-SNE) |
|---|---|
![]() |
![]() |
| Our ResNet-18 outperforms deeper baselines. | Color Jitter (center) creates highly compact, separated tissue clusters. |
- Rigorous Domain Shift Handling: Practical implementation and comparison of physical stain normalization (Macenko) vs. domain randomization (Color Jitter) to overcome clinical covariate shift.
- Addressing Dataset Bias: Overcoming unstratified and highly imbalanced data splits.
- Quantitative & Qualitative XAI: Going beyond "pretty pictures" by quantifying CAM faithfulness using Pixel Deletion Curves and visualizing latent space embeddings (t-SNE).
- Robust Software Architecture: Clean, modular python packages (
src/), reproducible environment management (uv), and robust experiment tracking (wandb).
🧬 1. Clinical Context & The Domain Shift Challenge (Click to expand)
Hematoxylin and Eosin (H&E) staining is the gold standard in cancer diagnostics. Hematoxylin stains cell nuclei a deep blue/purple, while Eosin stains extracellular matrix and cytoplasm various shades of pink/red. We classify pathology patches into 9 tissue categories:
- Adipose
- Background
- Debris
- Lymphocytes
- Mucus
- Smooth Muscle
- Normal Colon Mucosa
- Cancer-Associated Stroma
- Colorectal Adenocarcinoma Epithelium
Histology slides processed at different hospitals exhibit major color variations due to varying chemical concentrations, section thickness, and scanner calibration. Deep networks easily overfit to these site-specific color profiles (memorizing a specific hospital's exact shade of pink) rather than learning cellular geometry.
To evaluate domain generalization, we train and validate on NCT-CRC-HE-100K data and test on an external-center source dataset (CRC-VAL-HE-7K).
🛠️ 2. Methodology & Generalization Strategies (Click to expand)
Class Imbalance Mitigation
Exploratory Data Analysis revealed severe class imbalance. We implement a dual-balancing strategy:
-
Dynamic Batch Balancing: During training, we use an inverse-frequency
WeightedRandomSamplerto ensure every mini-batch contains an even distribution of all 9 classes. - Class-Weighted Cross-Entropy: We weight the loss function inversely proportional to class frequencies, penalizing misclassifications on minority classes more heavily.
Color-Invariance Pipelines
We benchmarked three color pipelines:
- Baseline: Standard spatial augmentations (random horizontal/vertical flips) with no color modification.
-
Color Jitter (Domain Randomization): We randomly perturb brightness (
$\pm 20%$ ), contrast ($\pm 20%$ ), saturation ($\pm 20%$ ), and hue ($\pm 10%$ ) during training. - Macenko Stain Normalization: A physical approach based on the Beer-Lambert law. We map RGB intensities to Optical Density (OD) space, perform Singular Value Decomposition (SVD) to isolate pure Hematoxylin and Eosin stain vectors, and mathematically normalize concentrations to match a standard template image.
🧠 3. Scientific Discussion: Why Jittering Outperformed Macenko (Click to expand)
Benchmarks
- Simple CNN Baseline (None): 83.98% accuracy.
- Simple CNN + Macenko: 78.12% accuracy (a significant performance drop).
- Simple CNN + Color Jitter: 88.82% accuracy (+4.84% gain).
Why Stain Normalization Failed Here
-
Resolution Constraints: Macenko normalization relies on SVD to calculate stain vectors. On low-resolution
$28 \times 28$ patches, there are very few pixels to construct robust staining matrix statistics. Noise and compression artifacts are amplified, leading to unstable stain vector calculation. - Structural Blurring: Reconstructing the image after scaling stain concentrations often introduces blur and color halos in tiny patches, wiping out the fine morphological textures of cell nuclei.
- The Power of Randomization: Color Jitter does not alter pixel structures; instead, it acts as a regularizer. By randomly changing colors, it prevents the network from relying on absolute color values, forcing it to extract invariant features like edge orientations, nuclear density, and gland boundary patterns.
🔍 4. Explainable AI: Verification & Faithfulness (Click to expand)
- Baseline model: Activations are scattered and often focus on blank background spaces or arbitrary stain pools.
- Color Jitter model: Activations are localized on cell nuclei clusters and epithelial-stromal boundaries.
We rank pixels by their Grad-CAM importance, progressively replace them with the slide's average color, and measure the decline in the model's confidence for the true class. A steeper, faster decline in confidence proves a highly faithful attribution map. The Color Jitter model (orange curve) exhibits a rapid drop, mathematically validating that the model is heavily reliant on the biologically relevant structures.
We project pre-classification feature vectors into 2D. We did not discover any specific differences, between the Baseline, Jitter, and Macenko approach. Some classes are very distinctively split into two groups, which might be worth looking into in the future. Maybe there is some possible improvement there.
🚀 5. Getting Started & Reproducibility (Click to expand)
This project uses the modern, ultra-fast Python package installer and resolver uv.
uv syncTo train models, evaluate on the test set, and generate confusion matrices under all strategies (None, Jitter, Macenko), execute the automated bash script:
# Run experiments for ResNet18
./run_transform_experiments.sh resnet18Alternatively, run individual modules from the root directory:
# Train a model
python -m src.train resnet18 jitter
# Evaluate and save confusion matrix
python -m src.evaluate resnet18 jitter
# Generate t-SNE projections
python -m src.tsne resnet18
# Run Grad-CAM and Deletion Tests
python -m src.xai resnet18


