13 changes: 13 additions & 0 deletions .gitignore
@@ -87,3 +87,16 @@ target/

# Mypy cache
.mypy_cache/

# Claude Code working state
.claude/

# Training logs
/logs/*.log

# Model checkpoints downloaded from Colab (super_resolution.h5 in checkpoints/
# is already tracked; this only catches root-level best_*.h5 backups)
/best_*.h5

# Data archives at repo root
/*.zip
90 changes: 90 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,90 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

PCBSegClassNet is a TensorFlow-based deep learning project for PCB (Printed Circuit Board) component segmentation and classification. It uses the FICS PCB Image Collection (FPIC) dataset.

The two tasks are handled by separate model variants sharing the same encoder:
- **Segmentation**: `PCBSegNet` — segments all 25 component classes on a full PCB image
- **Classification**: `PCBClassNet` — classifies individual cropped component images

## Environment Setup

```bash
conda create -n pscn python=3.8
conda activate pscn
conda install pip
pip install -r requirements.txt
```

Key dependencies: `tensorflow-gpu==2.11`, `albumentations`, `pyyaml`, `tqdm`, `pandas`.

## Commands

All training commands must be run from the `src/` directory.

**Train segmentation** (100 epochs):
```bash
python train_segmentation.py -opt cfs/pscn_seg.yml -epoch 100
```

**Evaluate segmentation** (loads best checkpoint, skips training):
```bash
python train_segmentation.py -opt cfs/pscn_seg.yml -epoch 0
```

**Train classification** (100 epochs):
```bash
python train_classification.py -opt cfs/pscn_class.yml -epoch 100
```

**Evaluate classification**:
```bash
python train_classification.py -opt cfs/pscn_class.yml -epoch 0
```
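
The four invocations above differ only in the script, the config file, and the epoch count. A minimal Python sketch of a wrapper that builds them and enforces the `src/` working directory is shown below; `build_command` and `run` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script and config names are taken from the commands above.
TASKS = {
    "segmentation": ("train_segmentation.py", "cfs/pscn_seg.yml"),
    "classification": ("train_classification.py", "cfs/pscn_class.yml"),
}


def build_command(task: str, epochs: int) -> List[str]:
    """Return the argv list for a task; epochs=0 means evaluate only."""
    if epochs < 0:
        raise ValueError("epochs must be >= 0")
    script, config = TASKS[task]
    # sys.executable is the current interpreter (the active conda env's python).
    return [sys.executable, script, "-opt", config, "-epoch", str(epochs)]


def run(task: str, epochs: int, repo_root: Path) -> int:
    """Run the command with src/ as the working directory; return the exit code."""
    result = subprocess.run(build_command(task, epochs), cwd=repo_root / "src")
    return result.returncode
```

For example, `run("segmentation", 0, Path("."))` would evaluate the segmentation model when called from the repository root.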

**Data preparation** (run from `src/data/`):
```bash
# Create HSI+CLAHE images, masks, and classification crops
python create_mask.py -i ../../data/pcb_image/ -a ../../data/smd_annotation/ -id ../../data/segmentation/images -ad ../../data/segmentation/masks -cd ../../data/classification/images/

# Create patches (768px) and split train/test
python create_patches.py -i ../../data/segmentation/images/ -m ../../data/segmentation/masks -cd ../../data/classification/images/ -ps 768
```

## Architecture

### Encoder (shared by both tasks)
Built in `src/models/blocks.py`, the encoder has three stages:
1. **Learning Module** — three conv/depthwise-separable conv blocks with stride 2, producing feature maps at 3 scales (`learning_layer1`, `learning_layer2`, `learning_layer3`)
2. **Feature Extractor** — three `bottleneck_block` stages (MobileNetV2-style residual bottlenecks) followed by a `pyramid_pooling_block` (PSPNet-style)
3. **Fusion Module** — fuses the learning module output with the upsampled feature extractor output
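
To make the three scales concrete: assuming each stride-2 block uses `same` padding (an assumption; check `blocks.py`), every block halves the spatial size, rounding up. The helper below is a hypothetical illustration, not code from the repository.

```python
import math
from typing import List


def learning_module_sizes(input_size: int, num_blocks: int = 3) -> List[int]:
    """Spatial size after each stride-2 block, assuming 'same' padding."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        size = math.ceil(size / 2)  # 'same' padding with stride 2 gives ceil(n / 2)
        sizes.append(size)
    return sizes


# learning_layer1, learning_layer2, learning_layer3 for a 512x512 input
print(learning_module_sizes(512))  # [256, 128, 64]
```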

### Segmentation Decoder (`get_decoder` in `blocks.py`)
- Applies `tem_block` (Texture Enhancement Module: channel attention + cosine-similarity-based spatial attention) to encoder output
- Two upsampling steps with skip connections from `learning_layer2` and `learning_layer1`
- Final `Conv2D(num_classes)` + softmax

### Classification Head (`get_classification` in `blocks.py`)
- `GlobalAveragePooling2D` on encoder output → `Dense(128, relu)` → `Dense(num_classes, softmax)`
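
The head is small. As a rough illustration, its trainable parameter count depends only on the encoder's output channel count, which is not stated here, so `encoder_channels` below is an assumed input and both function names are hypothetical.

```python
def dense_params(inputs: int, units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def head_params(encoder_channels: int, num_classes: int = 25) -> int:
    """Parameters in GAP -> Dense(128) -> Dense(num_classes).

    Global average pooling has no parameters; it reduces an
    (H, W, C) feature map to a length-C vector.
    """
    return dense_params(encoder_channels, 128) + dense_params(128, num_classes)


# Example with an assumed 128-channel encoder output.
print(head_params(128))  # 19737
```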

### Loss
Segmentation uses **DISLoss** (`src/models/loss.py`): sum of Dice loss + Jaccard loss + SSIM loss. Classification uses standard `categorical_crossentropy`.
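
The two overlap terms of that sum can be sketched in plain Python on flattened soft masks. This is an illustration only: the smoothing constant, the per-class reduction, and any weighting in `loss.py` are not documented here and are assumptions, and the SSIM term is deliberately left out.

```python
from typing import Sequence

EPS = 1e-7  # smoothing term to avoid division by zero (assumed value)


def dice_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - 2|A∩B| / (|A| + |B|), computed on soft (probability) masks."""
    inter = sum(t * p for t, p in zip(y_true, y_pred))
    total = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * inter + EPS) / (total + EPS)


def jaccard_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - |A∩B| / |A∪B|, with |A∪B| = |A| + |B| - |A∩B|."""
    inter = sum(t * p for t, p in zip(y_true, y_pred))
    union = sum(y_true) + sum(y_pred) - inter
    return 1.0 - (inter + EPS) / (union + EPS)


def dice_plus_jaccard(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Sum of the two overlap terms; DISLoss additionally adds an SSIM term."""
    return dice_loss(y_true, y_pred) + jaccard_loss(y_true, y_pred)
```

A perfect prediction drives both terms to zero, while a prediction that covers half of the true mask gives a Dice term of about 1/3 and a Jaccard term of about 1/2.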

## Configuration

Training hyperparameters and data paths are controlled by YAML files in `src/cfs/`:
- `pscn_seg.yml` — segmentation config (25 classes, Adam lr=1e-4, batch=16, input 512×512)
- `pscn_class.yml` — classification config (25 classes, Adam lr=1e-4, batch=16, input 512×512)

Checkpoints are saved to `checkpoints/best_seg.h5` and `checkpoints/best_class.h5`. Logs go to `logs/app.log`.

## Data

25 PCB component classes: R, C, U, Q, J, L, RA, D, RN, TP, IC, P, CR, M, BTN, FB, CRA, SW, T, F, V, LED, S, QA, JP.

The segmentation masks use specific RGB color values per class (defined in `src/data/dataloader.py::color_values`). When modifying mask generation, ensure colors match this mapping exactly.
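
A strict lookup makes color mismatches fail loudly instead of silently producing a wrong label. The sketch below is hypothetical: the two palette entries are placeholders (the real values live in `color_values`), and it assumes class indices follow the order of the list above.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Class names in the order listed above.
CLASS_NAMES: List[str] = [
    "R", "C", "U", "Q", "J", "L", "RA", "D", "RN", "TP", "IC", "P", "CR",
    "M", "BTN", "FB", "CRA", "SW", "T", "F", "V", "LED", "S", "QA", "JP",
]

# PLACEHOLDER palette for illustration only; use the real mapping from
# src/data/dataloader.py::color_values instead.
EXAMPLE_PALETTE: Dict[RGB, str] = {
    (255, 0, 0): "R",
    (0, 255, 0): "C",
}


def class_index(pixel: RGB, palette: Dict[RGB, str]) -> int:
    """Map an exact RGB value to its class index; raise on an unknown color."""
    try:
        return CLASS_NAMES.index(palette[pixel])
    except KeyError:
        raise ValueError(f"color {pixel} is not in the palette") from None
```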

The FPIC dataset requires access codes from the dataset authors — it is not freely downloadable.
69 changes: 69 additions & 0 deletions notebooks/README.md
@@ -0,0 +1,69 @@
# Colab Training

`colab_train.ipynb` is a self-contained notebook that runs the **full pipeline** end-to-end on a Colab GPU runtime: data preprocessing (mask generation + patches + train/val split) → segmentation training → classification training.

## Quickstart

1. **Get the raw FPIC dataset** (request access codes from the dataset authors — see top-level [README.md](../README.md)).
2. **Zip raw inputs** and upload to Drive:
```powershell
Compress-Archive -Path data\pcb_image, data\smd_annotation -DestinationPath data_raw.zip -Force
```
Place at `MyDrive/PCBSegClassNet/data_raw.zip` (~7 GB).
3. **Open the notebook in Colab**:
```
https://colab.research.google.com/github/<your-fork>/PCBSegClassNet/blob/colab/notebooks/colab_train.ipynb
```
4. **Runtime → Change runtime type → GPU** (T4 is enough; High-RAM not needed), then run cells top to bottom.

## What the notebook does

| Section | Purpose |
|---|---|
| 1 | `nvidia-smi` GPU sanity |
| 2 | Clone this repo (`colab` branch) |
| 3 | Install TF 2.15 + dependencies (TF 2.15 is the last release on Keras 2; Keras 3 from TF 2.16+ breaks this codebase's `tf.keras.backend.{dot,transpose}` calls) |
| 4 | Mount Drive, unzip `data_raw.zip` to local Colab disk |
| 5 | `create_mask.py` — polygon masks + classification crops (EDSR super-resolution, GPU) |
| 6 | `create_patches.py` — 768 px patches + 80/20 train/val split (CPU) |
| 7 | Set up Drive checkpoint directory for persistence across sessions |
| 8 | Segmentation training (5 epochs sanity → 80 epochs full → mirror checkpoint to Drive) |
| 9 | Classification training (same pattern) |
| 10 | Optional: re-evaluate from Drive checkpoints in a fresh session |
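
The 80/20 split in section 6 can be illustrated with a deterministic shuffle. This is a sketch under assumptions, not the logic of `create_patches.py`: the seed, the rounding, and the function name are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(items: Sequence[str], val_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle deterministically, then use the last val_fraction as validation."""
    shuffled = sorted(items)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    cut = len(shuffled) - n_val
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed keeps the split identical across Colab sessions, so checkpoints from different sessions are evaluated on the same validation patches.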

## Why preprocess on Colab?

- Raw inputs (~7 GB) are smaller than the processed dataset (~18 GB) — easier to transfer to Drive.
- Reproducibility: anyone with raw data + this notebook can recreate the exact training set without trusting an opaque processed zip.
- Easy to iterate on preprocessing knobs (e.g. patch size) without re-uploading.

If you already have a processed dataset zip, you can skip sections 5–6 and unzip it directly into `data/` instead.

## Why TF 2.15?

- This repo uses `tf.keras.backend.dot` / `backend.transpose` and `tf.keras.activations.softmax(tensor)` patterns that broke in Keras 3.
- TF 2.15 is the **last TF release on Keras 2**; Keras 3 starts at TF 2.16.
- An earlier version of this notebook pinned TF 2.10 via `condacolab`, but Colab's base Python keeps moving past 3.10 and TF 2.10 has no wheels for those newer versions. TF 2.15 ships wheels for the Python versions Colab actually serves.
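
A small guard can make the version constraint explicit right after installation. The sketch below only encodes the rule stated above (releases below 2.16 default to Keras 2); both function names are hypothetical.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse a version string such as '2.15.0' into (2, 15)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def ships_keras2(tf_version: str) -> bool:
    """True when the TF 2.x release still defaults to Keras 2, i.e. is below 2.16."""
    return major_minor(tf_version) < (2, 16)
```

In the notebook this could be used as `assert ships_keras2(tf.__version__)` after `import tensorflow as tf`, so an accidental upgrade fails early rather than deep inside model construction.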

## VRAM notes

| GPU | Comfortable batch size at 512×512 input |
|---|---|
| T4 (16 GB) | 16 |
| A100 (40 GB) | 32+ |
| L4 (24 GB) | 16-24 |
| RTX 4060 Ti (8 GB) | 4-8 (even 8 runs out of memory in this codebase because of the SSIM gradient) |

The default `batch_size: 16` in `cfs/pscn_seg.yml` works on every Colab GPU listed above (T4, L4, A100).
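
If the batch size should follow the GPU that Colab happens to assign, the table can be turned into a lookup. This is a hypothetical helper: it uses the lower bound of each range in the table, the fallback value is an assumption, and the example GPU names are typical `nvidia-smi` strings rather than values from this repository.

```python
import re
from typing import Set

# Conservative (lower-bound) batch sizes copied from the table above.
BATCH_SIZE_BY_GPU = {
    "T4": 16,
    "A100": 32,
    "L4": 16,
    "RTX 4060 Ti": 4,
}
DEFAULT_BATCH_SIZE = 4  # assumed fallback for GPUs not in the table


def _tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens, e.g. 'Tesla T4' -> {'tesla', 't4'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_batch_size(gpu_name: str) -> int:
    """Pick a batch size by matching whole tokens, so 'L40' does not match 'L4'."""
    name_tokens = _tokens(gpu_name)
    for key, batch in BATCH_SIZE_BY_GPU.items():
        if _tokens(key) <= name_tokens:
            return batch
    return DEFAULT_BATCH_SIZE
```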

## Epoch budget

The notebook runs:
- **Sanity 5 epochs** before each full run, so you catch NaN losses or OOMs in <1 hour.
- **Full 80 epochs** for both segmentation and classification.

Segmentation plus classification at 80 epochs each takes about 18 hours on an L4, which fits inside Colab Pro's 24 h session limit with margin. The original paper trained for 100 epochs; stopping at 80 leaves a buffer for Drive mounting and preprocessing at the start of a session. For runs closer to the paper's setup, raise the epoch count to 100 once one full run has completed.
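
As a rough check on that budget, the sketch below extrapolates linearly from the 18-hour figure. It assumes every epoch costs the same regardless of task, which is a simplification, and the function name is hypothetical.

```python
SESSION_LIMIT_H = 24.0      # Colab Pro session limit quoted above
MEASURED_H = 18.0           # 80 + 80 epochs on an L4, quoted above
MEASURED_EPOCHS = 80 + 80


def estimated_hours(seg_epochs: int, cls_epochs: int) -> float:
    """Linear extrapolation that treats every epoch as equally expensive."""
    hours_per_epoch = MEASURED_H / MEASURED_EPOCHS
    return (seg_epochs + cls_epochs) * hours_per_epoch


for epochs in (80, 100):
    total = estimated_hours(epochs, epochs)
    print(f"{epochs}+{epochs} epochs: {total:.1f} h, "
          f"{SESSION_LIMIT_H - total:.1f} h of headroom")
```

Under that assumption, 100 + 100 epochs would take about 22.5 hours, leaving roughly 1.5 hours for sanity runs, Drive mounting, and preprocessing.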

## Session persistence

Colab wipes `/content` on disconnect but Drive persists. The notebook copies the best checkpoint to Drive after each training run; section 10 shows how to restore it in a new session for evaluation.
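
The mirroring step amounts to a file copy into the mounted Drive folder. A minimal sketch follows; `mirror_checkpoint` is a hypothetical helper and the Drive path in the usage comment is an assumed location, not necessarily the one the notebook uses.

```python
import shutil
from pathlib import Path


def mirror_checkpoint(checkpoint: Path, drive_dir: Path) -> Path:
    """Copy a checkpoint into a Drive folder, creating the folder if needed."""
    if not checkpoint.is_file():
        raise FileNotFoundError(f"no checkpoint at {checkpoint}")
    drive_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the modification time, which helps tell runs apart.
    return Path(shutil.copy2(checkpoint, drive_dir / checkpoint.name))


# Usage (assumed Drive location):
# mirror_checkpoint(Path("checkpoints/best_seg.h5"),
#                   Path("/content/drive/MyDrive/PCBSegClassNet/checkpoints"))
```

Restoring in a new session is the same call with the source and destination swapped.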