CandleLabAI · ironmanizawesome · May 5, 2026 · May 10, 2026
diff --git a/.gitignore b/.gitignore
@@ -87,3 +87,16 @@ target/
 
 # Mypy cache
 .mypy_cache/
+
+# Claude Code working state
+.claude/
+
+# Training logs
+/logs/*.log
+
+# Model checkpoints downloaded from Colab (super_resolution.h5 in checkpoints/
+# is already tracked; this only catches root-level .h5 backups)
+/best_*.h5
+
+# Data archives at repo root
+/*.zip
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,90 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+PCBSegClassNet is a TensorFlow-based deep learning project for PCB (Printed Circuit Board) component segmentation and classification. It uses the FICS PCB Image Collection (FPIC) dataset.
+
+The two tasks are handled by separate model variants sharing the same encoder:
+- **Segmentation**: `PCBSegNet` — segments all 25 component classes on a full PCB image
+- **Classification**: `PCBClassNet` — classifies individual cropped component images
+
+## Environment Setup
+
+```bash
+conda create -n pscn python=3.8
+conda activate pscn
+conda install pip
+pip install -r requirements.txt
+```
+
+Key dependencies: `tensorflow-gpu==2.11`, `albumentations`, `pyyaml`, `tqdm`, `pandas`.
+
+## Commands
+
+All training commands must be run from the `src/` directory.
+
+**Train segmentation** (100 epochs):
+```bash
+python train_segmentation.py -opt cfs/pscn_seg.yml -epoch 100
+```
+
+**Evaluate segmentation** (loads best checkpoint, skips training):
+```bash
+python train_segmentation.py -opt cfs/pscn_seg.yml -epoch 0
+```
+
+**Train classification** (100 epochs):
+```bash
+python train_classification.py -opt cfs/pscn_class.yml -epoch 100
+```
+
+**Evaluate classification**:
+```bash
+python train_classification.py -opt cfs/pscn_class.yml -epoch 0
+```
+
+**Data preparation** (run from `src/data/`):
+```bash
+# Create HSI+CLAHE images, masks, and classification crops
+python create_mask.py -i ../../data/pcb_image/ -a ../../data/smd_annotation/ -id ../../data/segmentation/images -ad ../../data/segmentation/masks -cd ../../data/classification/images/
+
+# Create patches (768px) and split train/test
+python create_patches.py -i ../../data/segmentation/images/ -m ../../data/segmentation/masks -cd ../../data/classification/images/ -ps 768
+```
+
+## Architecture
+
+### Encoder (shared by both tasks)
+Built in `src/models/blocks.py`, the encoder has three stages:
+1. **Learning Module** — three conv/depthwise-separable conv blocks with stride 2, producing feature maps at 3 scales (`learning_layer1`, `learning_layer2`, `learning_layer3`)
+2. **Feature Extractor** — three `bottleneck_block` stages (MobileNetV2-style residual bottlenecks) followed by a `pyramid_pooling_block` (PSPNet-style)
+3. **Fusion Module** — fuses the learning module output with the upsampled feature extractor output
+
+### Segmentation Decoder (`get_decoder` in `blocks.py`)
+- Applies `tem_block` (Texture Enhancement Module: channel attention + cosine-similarity-based spatial attention) to encoder output
+- Two upsampling steps with skip connections from `learning_layer2` and `learning_layer1`
+- Final `Conv2D(num_classes)` + softmax
+
+### Classification Head (`get_classification` in `blocks.py`)
+- `GlobalAveragePooling2D` on encoder output → `Dense(128, relu)` → `Dense(num_classes, softmax)`
+
+### Loss
+Segmentation uses **DISLoss** (`src/models/loss.py`): the sum of Dice, Jaccard, and SSIM losses. Classification uses standard `categorical_crossentropy`.
+
+## Configuration
+
+Training hyperparameters and data paths are controlled by YAML files in `src/cfs/`:
+- `pscn_seg.yml` — segmentation config (25 classes, Adam lr=1e-4, batch=16, input 512×512)
+- `pscn_class.yml` — classification config (25 classes, Adam lr=1e-4, batch=16, input 512×512)
+
+Checkpoints are saved to `checkpoints/best_seg.h5` and `checkpoints/best_class.h5`. Logs go to `logs/app.log`.
+
+## Data
+
+25 PCB component classes: R, C, U, Q, J, L, RA, D, RN, TP, IC, P, CR, M, BTN, FB, CRA, SW, T, F, V, LED, S, QA, JP.
+
+The segmentation masks use specific RGB color values per class (defined in `src/data/dataloader.py::color_values`). When modifying mask generation, ensure colors match this mapping exactly.
+
+The FPIC dataset requires access codes from the dataset authors — it is not freely downloadable.
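
Reviewer sketch, not part of the diff: the Loss section of the new `CLAUDE.md` describes DISLoss as a sum of Dice, Jaccard, and SSIM terms. As a rough illustration of the first two terms only, here is a plain-Python version that works on flat lists of per-pixel values. It is not the code in `src/models/loss.py`, which presumably operates on TensorFlow tensors. The function names, the smoothing constant `EPS`, and the omission of the SSIM term are all assumptions made for this example.

```python
from typing import Sequence

EPS = 1e-6  # smoothing constant to avoid division by zero (assumed value)


def dice_loss(y_true: Sequence[float], y_pred: Sequence[float], eps: float = EPS) -> float:
    """1 - Dice coefficient, where Dice = 2*|A∩B| / (|A| + |B|)."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    total = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def jaccard_loss(y_true: Sequence[float], y_pred: Sequence[float], eps: float = EPS) -> float:
    """1 - Jaccard index, where Jaccard = |A∩B| / |A∪B|."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    union = sum(y_true) + sum(y_pred) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


def dice_plus_jaccard(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Sum of the two overlap terms; the SSIM term is deliberately left out here."""
    return dice_loss(y_true, y_pred) + jaccard_loss(y_true, y_pred)
```

In this sketch both terms are 0 for a perfect match and approach 1 when prediction and target do not overlap, so the two-term sum ranges from 0 to roughly 2.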
diff --git a/notebooks/README.md b/notebooks/README.md
@@ -0,0 +1,69 @@
+# Colab Training
+
+`colab_train.ipynb` is a self-contained notebook that runs the **full pipeline** end-to-end on a Colab GPU runtime: data preprocessing (mask generation + patches + train/val split) → segmentation training → classification training.
+
+## Quickstart
+
+1. **Get the raw FPIC dataset** (request access codes from the dataset authors — see top-level [README.md](../README.md)).
+2. **Zip raw inputs** and upload to Drive:
+    ```powershell
+    Compress-Archive -Path data\pcb_image, data\smd_annotation -DestinationPath data_raw.zip -Force
+    ```
+    Place it at `MyDrive/PCBSegClassNet/data_raw.zip` (~7 GB).
+3. **Open the notebook in Colab**:
+    ```
+    https://colab.research.google.com/github/<your-fork>/PCBSegClassNet/blob/colab/notebooks/colab_train.ipynb
+    ```
+4. **Runtime → Change runtime type → GPU** (T4 is enough; High-RAM not needed), then run cells top to bottom.
+
+## What the notebook does
+
+| Section | Purpose |
+|---|---|
+| 1 | `nvidia-smi` GPU sanity check |
+| 2 | Clone this repo (`colab` branch) |
+| 3 | Install TF 2.15 + dependencies (TF 2.15 is the last release on Keras 2; Keras 3 from TF 2.16+ breaks this codebase's `tf.keras.backend.{dot,transpose}` calls) |
+| 4 | Mount Drive, unzip `data_raw.zip` to local Colab disk |
+| 5 | `create_mask.py` — polygon masks + classification crops (EDSR super-resolution, GPU) |
+| 6 | `create_patches.py` — 768 px patches + 80/20 train/val split (CPU) |
+| 7 | Set up Drive checkpoint directory for persistence across sessions |
+| 8 | Segmentation training (5 epochs sanity → 80 epochs full → mirror checkpoint to Drive) |
+| 9 | Classification training (same pattern) |
+| 10 | Optional: re-evaluate from Drive checkpoints in a fresh session |
+
+## Why preprocess on Colab?
+
+- Raw inputs (~7 GB) are smaller than the processed dataset (~18 GB) — easier to transfer to Drive.
+- Reproducibility: anyone with raw data + this notebook can recreate the exact training set without trusting an opaque processed zip.
+- Easy to iterate on preprocessing knobs (e.g. patch size) without re-uploading.
+
+If you already have a processed dataset zip, you can skip sections 5–6 and unzip it directly into `data/` instead.
+
+## Why TF 2.15?
+
+- This repo uses `tf.keras.backend.dot` / `backend.transpose` and `tf.keras.activations.softmax(tensor)` patterns that broke in Keras 3.
+- TF 2.15 is the **last TF release on Keras 2**; Keras 3 starts at TF 2.16.
+- An earlier version of this notebook tried to pin TF 2.10 via `condacolab`, but Colab's base Python keeps moving past 3.10 and TF 2.10's wheel matrix doesn't follow. TF 2.15 ships wheels for the Python versions Colab actually serves.
+
+## VRAM notes
+
+| GPU | Comfortable batch size at 512×512 input |
+|---|---|
+| T4 (16 GB) | 16 |
+| A100 (40 GB) | 32+ |
+| L4 (24 GB) | 16-24 |
+| RTX 4060 Ti (8 GB) | 4-8 (even 8 can run out of memory in this codebase because of the SSIM gradient) |
+
+The default `batch_size: 16` in `cfs/pscn_seg.yml` works on all Colab GPUs.
+
+## Epoch budget
+
+The notebook runs:
+- **Sanity 5 epochs** before each full run, so you catch NaN losses or OOMs in <1 hour.
+- **Full 80 epochs** for both segmentation and classification.
+
+The two 80-epoch runs together take about 18 hours on an L4, which fits inside Colab Pro's 24 h session limit with margin. The original paper trained for 100 epochs; 80 leaves a safety buffer for the Drive-mount and preprocessing time at the start of a session. For runs closer to the paper's setup, increase to 100 epochs once you've seen one full run complete.
+
+## Session persistence
+
+Colab wipes `/content` on disconnect but Drive persists. The notebook copies the best checkpoint to Drive after each training run; section 10 shows how to restore it in a new session for evaluation.
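
Reviewer sketch, not part of the diff: the Session persistence section says the notebook copies the best checkpoint to Drive after each run and restores it in a later session. The notebook's actual cells are not shown in this diff, so the following is only a minimal illustration of that copy-and-restore pattern using the standard library. The helper names and the Drive subfolder in the usage comment are assumptions.

```python
import shutil
from pathlib import Path


def mirror_checkpoint(src: Path, dest_dir: Path) -> Path:
    """Copy one checkpoint file into dest_dir and return the path of the copy."""
    if not src.is_file():
        raise FileNotFoundError(f"checkpoint not found: {src}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dst = dest_dir / src.name
    shutil.copy2(src, dst)  # copy2 also preserves the file's modification time
    return dst


def restore_checkpoint(drive_dir: Path, name: str, local_dir: Path) -> Path:
    """Copy a previously mirrored checkpoint back onto the local disk."""
    return mirror_checkpoint(drive_dir / name, local_dir)


# Hypothetical usage on Colab; the "checkpoints" subfolder on Drive is assumed:
# drive_ckpt = Path("/content/drive/MyDrive/PCBSegClassNet/checkpoints")
# mirror_checkpoint(Path("checkpoints/best_seg.h5"), drive_ckpt)
# restore_checkpoint(drive_ckpt, "best_seg.h5", Path("checkpoints"))
```

Failing loudly on a missing source file is a deliberate choice here: a run that never produced a checkpoint is noticed at the end of the session rather than in the next one.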