Adilforest/ml-assignment-4

Machine Learning — Assignment 4 (AITU)

Systematic preprocessing and augmentation study for lung-cancer CT image classification using a custom CNN trained on the IQ-OTH/NCCD dataset.

Python Jupyter TensorFlow scikit-learn


Overview

Dataset: IQ-OTH/NCCD Lung Cancer Dataset — CT lung scan images in three classes: Benign, Malignant, and Normal.

Goal: Compare how different image preprocessing strategies and data-augmentation pipelines affect the classification performance of a fixed custom CNN architecture. Each experiment is run three times (different random seeds) to assess result stability.
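One way to read "result stability" across the three seeded runs is the mean and sample standard deviation of a metric per experiment group. The sketch below assumes that interpretation; the helper name `summarize_runs` and the scores are placeholders for illustration, not values or code from this repository.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(scores), stdev(scores)


# Placeholder macro F1 values for three runs (illustrative only,
# NOT results from this repository).
example_macro_f1 = [0.80, 0.85, 0.90]

avg, spread = summarize_runs(example_macro_f1)
print(f"macro F1: {avg:.3f} +/- {spread:.3f}")
```

A small spread relative to the differences between groups suggests that a preprocessing effect is not just seed noise; with only three runs per group this is a rough indication at best.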


What it covers

  • Custom CNN architecture — multi-block Conv2D + BatchNormalization + MaxPooling network built in Keras/TensorFlow; trained for 10 epochs per run.
  • Preprocessing strategies compared (9 experiment groups):
    • Baseline — no preprocessing beyond MobileNetV2 input scaling
    • CLAHE — Contrast Limited Adaptive Histogram Equalization
    • HistEqual — global histogram equalization
    • GaussianBlur — smoothing filter applied before training
    • MedianFilter — median spatial filter
    • LightAug — light geometric/photometric augmentation
    • ModerateAug — moderate augmentation pipeline
    • CLAHE_LightAug — CLAHE combined with light augmentation
    • GaussianBlur_ModerateAug — Gaussian blur combined with moderate augmentation
  • Evaluation metrics: macro F1, plus precision and recall per class (benign / malignant / normal); loss and accuracy curves.
  • Statistical analysis (analysis/analysis.ipynb): Kruskal–Wallis test + Dunn post-hoc test (Bonferroni correction) to identify which preprocessing groups differ significantly from the baseline; Mann–Whitney effect sizes; radar and scatter plots.
  • Orchestration: orchestrator.py runs all experiment notebooks programmatically.
  • Smaller-resolution ablation (SmallerRes) — examines the impact of reduced input resolution.
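The per-class precision/recall and macro F1 listed above can be illustrated in plain Python. This is a minimal sketch, not the notebooks' code: the function names and the toy labels are assumptions made for the example. scikit-learn's `f1_score` with `average="macro"` computes the same quantity.

```python
def per_class_prf(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} from two label sequences."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_prf(y_true, y_pred, labels)
    return sum(f1 for _, _, f1 in scores.values()) / len(labels)


# Toy labels for illustration only (not data from the experiments).
labels = ["benign", "malignant", "normal"]
y_true = ["benign", "benign", "malignant", "malignant", "normal", "normal"]
y_pred = ["benign", "malignant", "malignant", "malignant", "normal", "benign"]

for name, (p, r, f1) in per_class_prf(y_true, y_pred, labels).items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred, labels):.3f}")
```

Because macro F1 weights every class equally, a weak score on a small class lowers it as much as a weak score on a large one, which makes it a common choice for imbalanced data.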

Repository structure

ML_Assignment_4/
├── cases/
│   ├── Baseline/1,2,3/          # Three-run baseline experiment notebooks
│   ├── CLAHE/1,2,3/             # CLAHE preprocessing
│   ├── HistEqual/1,2,3/         # Histogram equalization
│   ├── GaussianBlur/1,2,3/      # Gaussian blur
│   ├── MedianFilter/1,2,3/      # Median filter
│   ├── LightAug/1,2,3/          # Light augmentation
│   ├── ModerateAug/1,2,3/       # Moderate augmentation
│   ├── CLAHE_LightAug/1,2,3/    # CLAHE + light aug
│   ├── GaussianBlur_ModerateAug/# Gaussian blur + moderate aug
│   └── SmallerRes/1,2,3/        # Reduced resolution ablation
├── analysis/
│   ├── analysis.ipynb           # Statistical comparison across all groups
│   ├── metrics.csv              # Aggregated per-run metrics
│   ├── f1_radar.png             # Radar chart of macro F1 per group
│   ├── heatmap_vs_baseline.png  # Metric deltas vs baseline
│   └── *.csv                    # Exported tables (stability, convergence, …)
├── lung-cancer-98-8-custom-cnn-model.ipynb  # Standalone prototype notebook
├── Template_REV1.ipynb          # Experiment template
├── orchestrator.py              # Batch runner for all case notebooks
├── Dockerfile / docker-compose.yaml
└── metrics_all-merged123.csv    # Merged metrics across all experiments
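The cases/ layout above (one folder per group, notebooks nested in run subfolders) can be walked with the standard library. The sketch below only discovers notebooks and does not execute them; `find_case_notebooks` is a hypothetical helper written for this example and is not taken from orchestrator.py.

```python
from pathlib import Path


def find_case_notebooks(cases_dir):
    """Return {group_name: [notebook paths]} for every group folder."""
    cases = Path(cases_dir)
    if not cases.is_dir():
        return {}
    found = {}
    for group in sorted(p for p in cases.iterdir() if p.is_dir()):
        notebooks = sorted(
            nb
            for nb in group.rglob("*.ipynb")
            # Skip Jupyter autosave copies.
            if ".ipynb_checkpoints" not in nb.parts
        )
        if notebooks:
            found[group.name] = notebooks
    return found


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for group_name, notebooks in find_case_notebooks("cases").items():
        print(f"{group_name}: {len(notebooks)} notebook(s)")
```

Searching recursively rather than hard-coding the run subfolders keeps the helper working for groups whose folder layout differs from the rest.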

Getting started

# 1. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 2. Install dependencies
pip install jupyter tensorflow keras scikit-learn pandas numpy matplotlib seaborn

# 3. Open a specific experiment notebook
jupyter lab cases/Baseline/1/Baseline1.ipynb

# 4. Or run the statistical analysis
jupyter lab analysis/analysis.ipynb

Dataset: Download the IQ-OTH/NCCD dataset from Kaggle and place it at ./dataset/The IQ-OTHNCCD lung cancer dataset/ before running the experiment notebooks.
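Before launching a notebook, it can help to confirm that the dataset landed in the expected place. The sketch below assumes the download unpacks into one subfolder per class and that the images use common extensions; both are assumptions, and `count_images_per_class` is a hypothetical helper rather than part of the repository.

```python
from pathlib import Path

# Path taken from the instructions above.
DATASET_DIR = Path("dataset") / "The IQ-OTHNCCD lung cancer dataset"

# Assumed image extensions; adjust if the download uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(dataset_dir):
    """Return {subfolder name: image count} for each class subfolder."""
    root = Path(dataset_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {root}")
    return {
        sub.name: sum(
            1
            for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


if __name__ == "__main__":
    try:
        for class_name, n_images in count_images_per_class(DATASET_DIR).items():
            print(f"{class_name}: {n_images} images")
    except FileNotFoundError as err:
        print(err)
```

Three non-empty subfolders in the output would match the three classes described in the overview; anything else suggests the archive was unpacked one level too deep or too shallow.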


Adil Ormanov — GitHub
