Machine Learning — Assignment 4 (AITU)

Systematic preprocessing and augmentation study for lung-cancer CT image classification using a custom CNN trained on the IQ-OTH/NCCD dataset.

Overview

Dataset: IQ-OTH/NCCD Lung Cancer Dataset — CT lung scan images in three classes: Benign, Malignant, and Normal.

Goal: Compare how different image preprocessing strategies and data-augmentation pipelines affect the classification performance of a fixed custom CNN architecture. Each experiment is run three times (different random seeds) to assess result stability.
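Each configuration is repeated with three seeds, and seeding every random number generator a Keras run touches is what makes those repeats comparable. The following is a minimal sketch of how that could be done. The helper name `set_all_seeds` and the seed values 1, 2, 3 (mirroring the run folders) are assumptions and are not taken from the notebooks. `tf.keras.utils.set_random_seed` is the TensorFlow call that seeds Python, NumPy and TensorFlow together (TF 2.7 and later).

```python
import random

import numpy as np


def set_all_seeds(seed: int) -> None:
    """Seed Python, NumPy and (if installed) TensorFlow RNGs. Hypothetical helper."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf

        # Seeds Python, NumPy and TensorFlow in one call (TF >= 2.7).
        tf.keras.utils.set_random_seed(seed)
    except ImportError:
        pass  # TensorFlow not installed; Python/NumPy seeding still applies


# Assumption: one seed per run folder 1, 2, 3.
SEEDS = [1, 2, 3]
```

A run loop would call `set_all_seeds(seed)` for each entry of `SEEDS` before building and training the model. Seeding alone does not guarantee bit-identical GPU training. If that matters, TensorFlow also offers `tf.config.experimental.enable_op_determinism()`.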

What it covers

Custom CNN architecture — multi-block Conv2D + BatchNormalization + MaxPooling network built in Keras/TensorFlow; trained for 10 epochs per run.
Preprocessing strategies compared (9 experiment groups):
- Baseline — no preprocessing beyond MobileNetV2 input scaling
- CLAHE — Contrast Limited Adaptive Histogram Equalization
- HistEqual — global histogram equalization
- GaussianBlur — smoothing filter applied before training
- MedianFilter — median spatial filter
- LightAug — light geometric/photometric augmentation
- ModerateAug — moderate augmentation pipeline
- CLAHE_LightAug — CLAHE combined with light augmentation
- GaussianBlur_ModerateAug — Gaussian blur combined with moderate augmentation
Evaluation metrics: macro F1 and per-class precision and recall (benign / malignant / normal), plus loss and accuracy curves.
Statistical analysis (analysis/analysis.ipynb): Kruskal–Wallis test + Dunn post-hoc test (Bonferroni correction) to identify which preprocessing groups differ significantly from the baseline; Mann–Whitney effect sizes; radar and scatter plots.
Orchestration: orchestrator.py runs all experiment notebooks programmatically.
Smaller-resolution ablation (SmallerRes) — examines the impact of reduced input resolution.
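Global histogram equalization (the HistEqual group) remaps pixel intensities through the image's cumulative histogram, which spreads them over the full 0..255 range. The README does not say which library the notebooks use for this step. OpenCV's `cv2.equalizeHist` is a common choice. The following NumPy-only sketch works on 8-bit grayscale images and uses a hypothetical `equalize_hist` helper.

```python
import numpy as np


def equalize_hist(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization for a 2-D uint8 grayscale image."""
    if img.dtype != np.uint8 or img.ndim != 2:
        raise ValueError("expected a 2-D uint8 image")
    hist = np.bincount(img.ravel(), minlength=256)  # pixel count per gray level
    cdf = hist.cumsum()                             # cumulative distribution
    cdf_min = cdf[cdf > 0][0]                       # CDF at the darkest occupied level
    total = cdf[-1]
    if total == cdf_min:                            # constant image: nothing to stretch
        return img.copy()
    # Map each level so the occupied CDF range covers 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                                 # apply the lookup table per pixel
```

CLAHE applies the same CDF remapping per tile. It first clips each tile's histogram so that noise in flat regions is not over-amplified. In OpenCV, this is `cv2.createCLAHE(clipLimit=..., tileGridSize=...)` followed by `.apply(img)`.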

Repository structure

ML_Assignment_4/
├── cases/
│   ├── Baseline/1,2,3/          # Three-run baseline experiment notebooks
│   ├── CLAHE/1,2,3/             # CLAHE preprocessing
│   ├── HistEqual/1,2,3/         # Histogram equalization
│   ├── GaussianBlur/1,2,3/      # Gaussian blur
│   ├── MedianFilter/1,2,3/      # Median filter
│   ├── LightAug/1,2,3/          # Light augmentation
│   ├── ModerateAug/1,2,3/       # Moderate augmentation
│   ├── CLAHE_LightAug/1,2,3/    # CLAHE + light aug
│   ├── GaussianBlur_ModerateAug/ # Gaussian blur + moderate aug
│   └── SmallerRes/1,2,3/        # Reduced resolution ablation
├── analysis/
│   ├── analysis.ipynb           # Statistical comparison across all groups
│   ├── metrics.csv              # Aggregated per-run metrics
│   ├── f1_radar.png             # Radar chart of macro F1 per group
│   ├── heatmap_vs_baseline.png  # Metric deltas vs baseline
│   └── *.csv                    # Exported tables (stability, convergence, …)
├── lung-cancer-98-8-custom-cnn-model.ipynb  # Standalone prototype notebook
├── Template_REV1.ipynb          # Experiment template
├── orchestrator.py              # Batch runner for all case notebooks
├── Dockerfile / docker-compose.yaml
└── metrics_all-merged123.csv    # Merged metrics across all experiments
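The per-run metrics files feed the stability tables exported under analysis/. The following stdlib-only sketch shows one way to summarize stability: the mean and sample standard deviation of a metric across the runs of each group. The column names `group`, `run` and `macro_f1` are assumptions, because the README does not document the CSV schema. `stability_table` is a hypothetical helper.

```python
import csv
import io
import statistics
from collections import defaultdict


def stability_table(csv_text: str, metric: str = "macro_f1") -> dict:
    """Return {group: (mean, sample std)} of `metric` across runs (assumed schema)."""
    by_group = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_group[row["group"]].append(float(row[metric]))
    return {
        group: (
            statistics.mean(values),
            # Sample std needs at least two runs; report 0.0 for a single run.
            statistics.stdev(values) if len(values) > 1 else 0.0,
        )
        for group, values in by_group.items()
    }
```

A lower standard deviation across the three seeds indicates that a preprocessing choice gives more stable results, independent of its mean score.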

Getting started

# 1. Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 2. Install dependencies
pip install jupyter tensorflow keras scikit-learn pandas numpy matplotlib seaborn

# 3. Open a specific experiment notebook
jupyter lab cases/Baseline/1/Baseline1.ipynb

# 4. Or run the statistical analysis
jupyter lab analysis/analysis.ipynb
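The statistical analysis starts with a Kruskal–Wallis test across the preprocessing groups. To show what that test computes, here is a stdlib-only sketch of the H statistic with the standard tie correction. In practice, a library call such as `scipy.stats.kruskal`, which also returns the p-value, is the usual choice. The README does not say which implementation the notebook uses, and `kruskal_wallis_h` is a hypothetical name.

```python
from itertools import chain


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction (no p-value)."""
    data = list(chain.from_iterable(groups))
    n = len(data)
    # Rank the pooled data; tied values share the average of their 1-based ranks.
    order = sorted(range(n), key=lambda i: data[i])
    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over tie blocks of size t
    i = 0
    while i < n:
        j = i
        while j + 1 < n and data[order[j + 1]] == data[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    # Sum R_g^2 / n_g over groups, reading ranks back in the original group order.
    total = 0.0
    start = 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))
```

H is compared against a chi-squared distribution with k - 1 degrees of freedom (k = number of groups) to obtain a p-value. A significant result is what justifies running the Dunn post-hoc comparisons against the baseline. The sketch assumes non-empty groups and at least two distinct values overall; otherwise the tie correction divides by zero.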

Dataset: Download the IQ-OTH/NCCD dataset from Kaggle and place it at ./dataset/The IQ-OTHNCCD lung cancer dataset/ before running the experiment notebooks.
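Before running the notebooks, it can help to check that the dataset is where they expect it. The following minimal sketch assumes one subfolder per class under that path and JPEG/PNG image files. The README does not list the folder names or file types, so `index_dataset` and `IMAGE_EXTS` are hypothetical.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumption about the image file types


def index_dataset(root):
    """Map each class subfolder of `root` to a sorted list of its image paths."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset folder not found: {root}")
    return {
        d.name: sorted(p for p in d.iterdir() if p.suffix.lower() in IMAGE_EXTS)
        for d in sorted(root.iterdir())
        if d.is_dir()  # loose files at the top level are ignored
    }
```

For example, `{cls: len(paths) for cls, paths in index_dataset("./dataset/The IQ-OTHNCCD lung cancer dataset").items()}` gives a per-class image count. This count is also useful for spotting class imbalance before training.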

Adil Ormanov — GitHub
