Domain-Specific Video Segmentation with SAM 2

ORNLxUTK (Oak Ridge National Laboratory x MARCI Lab @ University of Tennessee, Knoxville)

Overview

This repository contains the code and data pipelines for evaluating SAM 2 (Segment Anything Model 2) on domain-specific video segmentation tasks in additive manufacturing. We systematically compare three adaptation strategies — baseline (zero-shot), LoRA fine-tuning, and full fine-tuning — across five additive manufacturing video domains: TIG, LWAM, PAW, visible-light polymer (visPOLYMER), and infrared polymer imaging (irPOLYMER).

Each domain includes multiple video sequences with per-frame segmentation annotations for two object categories: melt pool (material region) and feed wire / nozzle. All experiments use cross-validated dataset splits and evaluate four SAM 2.1 model sizes (tiny, small, base-plus, large); the LoRA runs additionally sweep ranks 2, 4, 16, and 32. Evaluation follows the DAVIS 2017 semi-supervised video object segmentation benchmark protocol.

Repository Structure

This is a monorepo with six git submodules:

DomainSpecific/
├── sam2/                   # Meta's SAM 2 framework (forked, with training extensions)
├── SAM2inference/          # Inference, evaluation, and metrics pipelines
├── Datasets/               # Roboflow/COCO annotation conversion to VOC-style masks
├── DatasetVariants/        # Cross-validation splits and IR preprocessing
├── irPOLYMERpreprocess/    # IR-specific preprocessing (BM3D denoising, normalization)
├── WAAMlabeledDataset/     # LWAM dataset preparation and prompt creation
├── pyproject.toml          # Root project configuration
└── README.md

Getting Started

Prerequisites

  • Python >= 3.11 (3.11.11 recommended for SAM2inference)
  • CUDA-capable GPU with PyTorch >= 2.8.0
  • uv package manager

Installation

# Clone with all submodules
git clone --recurse-submodules https://github.com/ORNLxUTK/DomainSpecific.git
cd DomainSpecific

# Install root project (installs sam2 as editable dependency)
uv sync

# Install submodule-specific dependencies
cd SAM2inference && uv sync && cd ..
cd irPOLYMERpreprocess && uv sync && cd ..
cd WAAMlabeledDataset && uv sync && cd ..

Pipeline Overview

                          ┌──────────────────────────────────────────────┐
                          │              Data Preparation                │
                          └──────────────────────────────────────────────┘

  Raw Roboflow/COCO Data ──► Datasets/                ──► DatasetVariants/
  (JSON annotations)         (annotation conversion        (5 custom cross-validation
                              to VOC-style PNG masks)       splits + IR preprocessing)

                          ┌──────────────────────────────────────────────┐
                          │           Training & Inference               │
                          └──────────────────────────────────────────────┘

  Prepared Datasets ──► sam2/training/                ──► SAM2inference/
                        (fine-tune SAM 2.1:                (run inference with
                         LoRA or full weights)              baseline, LoRA, or
                                                            full fine-tuned models)

                          ┌──────────────────────────────────────────────┐
                          │                Evaluation                    │
                          └──────────────────────────────────────────────┘

  Predictions + Ground Truth ──► SAM2inference/metrics.py
                                 (IoU, Boundary F-score, J&F)
                                 ──► Tables & Plots

Submodules

sam2/ — SAM 2 Framework

A fork of facebookresearch/sam2 extended with:

  • LoRA fine-tuning support via the PEFT library (including PiSSA initialization)
  • Full fine-tuning training scripts for custom datasets
  • Cross-validation training configurations
  • SLURM job submission scripts for ablation studies

SAM 2.1 model sizes:

Model      Parameters  Checkpoint
Tiny       38.9M       sam2.1_hiera_tiny.pt
Small      46M         sam2.1_hiera_small.pt
Base-Plus  80.8M       sam2.1_hiera_base_plus.pt
Large      224.4M      sam2.1_hiera_large.pt

SAM2inference/ — Inference & Evaluation

Runs inference and computes metrics across all model variants.

Key scripts:

Script                     Purpose
baseline_inference.py      Run pre-trained SAM 2.1 checkpoints (no adaptation)
lora_inference.py          Run LoRA fine-tuned models
fullfinetune_inference.py  Run fully fine-tuned models
metrics.py                 Compute IoU, Boundary F-score; generate tables and plots
create_prompts.py          Interactive point prompt creation (OpenCV GUI)
sav_benchmark.py           SAV dataset evaluation framework

Prompt creation: Point prompts are created interactively by clicking on the first frame of each video using an OpenCV GUI. Left-click adds positive points (object location); right-click adds negative points (background). Prompts are saved as pickle files.
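
The click-to-prompt bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code in create_prompts.py: the dictionary layout and the helper names add_click, save_prompts, and load_prompts are assumptions, and the real pickle format may differ. The label values follow the SAM 2 point-prompt convention (1 = positive, 0 = negative). In an OpenCV mouse callback, cv2.EVENT_LBUTTONDOWN and cv2.EVENT_RBUTTONDOWN would decide the positive flag.

```python
import pickle


def add_click(prompts, video, obj_id, x, y, positive):
    """Record one click for an object in a video (hypothetical layout)."""
    entry = prompts.setdefault(video, {}).setdefault(
        obj_id, {"points": [], "labels": []}
    )
    entry["points"].append((int(x), int(y)))
    # SAM 2 point labels: 1 = positive (object), 0 = negative (background).
    entry["labels"].append(1 if positive else 0)
    return prompts


def save_prompts(prompts, path):
    """Write the prompt dictionary to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(prompts, f)


def load_prompts(path):
    """Read a prompt dictionary back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Keeping points and labels in parallel lists mirrors how point prompts are usually passed to SAM 2, as an array of coordinates plus an array of labels.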

Inference output structure:

SAM2images/{dataset}/JPEGImages/test/
├── baselineinference/{video}/{model_size}/
├── lorainferenceeva/{video}/{model_size}/{lora_rank}/
└── fullfinetuneinference/{video}/{model_size}/
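
For scripting against this layout, a small path helper along these lines may be handy. It is a sketch only: the function name output_dir, the strategy keys, and the way model sizes and LoRA ranks are spelled as folder names (for example "tiny" and "4") are assumptions, not taken from the inference scripts.

```python
from pathlib import Path

# Folder names copied from the layout above.
STRATEGY_DIRS = {
    "baseline": "baselineinference",
    "lora": "lorainferenceeva",
    "fullfinetune": "fullfinetuneinference",
}


def output_dir(root, dataset, strategy, video, model_size, lora_rank=None):
    """Build the prediction directory for one video (hypothetical helper)."""
    if strategy not in STRATEGY_DIRS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    path = (
        Path(root) / dataset / "JPEGImages" / "test"
        / STRATEGY_DIRS[strategy] / video / model_size
    )
    if strategy == "lora":
        # Only LoRA outputs carry an extra rank level.
        if lora_rank is None:
            raise ValueError("lora_rank is required for LoRA outputs")
        path = path / str(lora_rank)
    return path
```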

Datasets/ — Annotation Conversion

Converts Roboflow COCO-format annotations (polygon segmentations in JSON) to VOC-style PNG segmentation masks.

  • Input: _annotations.coco.json files with polygon segmentations
  • Output: 3-channel PNG masks with semantic colors
  • Color convention: White (255, 255, 255) = wire/nozzle (category 0); Green (0, 255, 0) = material/melt pool (category 1)

Key script: roboflow_to_annotationimage.py
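
To read these masks back, the color convention can be inverted into a per-pixel category map. The sketch below is an illustration under assumptions: the function name color_mask_to_ids is hypothetical, and pixels matching neither color are assumed to be background and mapped to the placeholder id 255. Both colors are unchanged by an RGB/BGR swap, so the channel order of the loader does not matter here.

```python
import numpy as np

# Color convention from this README.
CATEGORY_COLORS = {
    0: (255, 255, 255),  # wire / nozzle
    1: (0, 255, 0),      # material / melt pool
}
BACKGROUND_ID = 255  # assumption: placeholder id for "no category"


def color_mask_to_ids(mask):
    """Convert an (H, W, 3) color mask into an (H, W) category-id map."""
    mask = np.asarray(mask)
    ids = np.full(mask.shape[:2], BACKGROUND_ID, dtype=np.uint8)
    for category, color in CATEGORY_COLORS.items():
        # A pixel belongs to a category when all three channels match.
        match = np.all(mask == np.array(color), axis=-1)
        ids[match] = category
    return ids
```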

DatasetVariants/ — Cross-Validation & Preprocessing

Creates 5 custom cross-validation dataset splits and applies IR-specific preprocessing.

Key scripts:

Script            Purpose
datasetcombos.py  Generate cross-validation splits with maximally dissimilar training sets
preprocess.py     Apply BM3D denoising and normalization to IR datasets

Cross-validation strategy: Generates 5 dataset versions with 70/30 train/test splits, selecting video combinations that maximize diversity (most dissimilar training sets) across folds.
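
One way to picture "maximally dissimilar training sets" is a greedy search over all candidate training sets. The sketch below is not the algorithm in datasetcombos.py: measuring dissimilarity by the number of shared videos, the greedy selection, and the function name dissimilar_splits are all assumptions made for illustration.

```python
from itertools import combinations


def dissimilar_splits(videos, n_folds=5, train_fraction=0.7):
    """Greedily pick training sets that share as few videos as possible."""
    videos = sorted(videos)
    n_train = round(train_fraction * len(videos))
    candidates = [frozenset(c) for c in combinations(videos, n_train)]
    if len(candidates) < n_folds:
        raise ValueError("not enough distinct training sets for n_folds")

    chosen = [candidates[0]]
    while len(chosen) < n_folds:
        remaining = [c for c in candidates if c not in chosen]
        # Prefer the candidate with the smallest total overlap so far.
        best = min(remaining, key=lambda c: sum(len(c & s) for s in chosen))
        chosen.append(best)

    # Each fold is (train videos, test videos); test is the complement.
    return [(sorted(train), sorted(set(videos) - train)) for train in chosen]
```

With ten videos this yields five folds of seven training and three test videos each. Enumerating every combination is only practical for small video counts.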

IR preprocessing variants created:

  • irPOLYMERglobaldepthnorm{01-05} — per-pixel depth normalization + BM3D denoising
  • irPOLYMERglobalnorm{01-05} — global min-max normalization + BM3D denoising
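
As a rough illustration of the global min-max variant, the sketch below rescales a stack of frames with a single minimum and maximum. It assumes that "global" means statistics taken over the whole stack rather than per frame and that the output is 8-bit; both are assumptions, the function name global_minmax_normalize is hypothetical, and the BM3D step is left out.

```python
import numpy as np


def global_minmax_normalize(frames):
    """Rescale a stack of frames to 0..255 using one shared min and max."""
    frames = np.asarray(frames, dtype=np.float64)
    lo, hi = frames.min(), frames.max()
    if hi == lo:
        # Constant input: nothing to stretch.
        return np.zeros(frames.shape, dtype=np.uint8)
    scaled = (frames - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)
```

Sharing one range across frames keeps relative intensities comparable between frames, which per-frame normalization would not.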

irPOLYMERpreprocess/ — IR Image Preprocessing

Specialized preprocessing pipeline for infrared polymer imaging. Used for algorithm exploration and benchmarking before integration into DatasetVariants.

Key scripts:

Script         Purpose
main.py        Compare denoising algorithms (NL-means, wavelet, TV Chambolle, BM3D)
global.py      Normalization experiments with visualization (--plot)
preprocess.py  Fixed pipeline: CLAHE + unsharp mask + BM3D denoising

WAAMlabeledDataset/ — LWAM Dataset Preparation

Prepares the Laser Wire Arc Additive Manufacturing (LWAM) dataset for SAM 2 training and evaluation.

Key scripts:

Script                   Purpose
makeannotationimages.py  Convert Roboflow COCO RLE masks to PNG annotation images
createpeftsam2ftdir.py   Reorganize into SAM 2-compatible VOC-style directory layout
create_prompts.py        Interactive prompt creation for test videos
create_all_prompts.py    Batch prompt creation for 1, 3, and 5 clicks per object

Output directory structure:

MAZAK_SAM2_Roboflow_Frames/
├── JPEGImages/{train,test,val}/{video_id}/00000.jpg, 00001.jpg, ...
├── Annotations/{train,test,val}/{video_id}/00000.png, 00001.png, ...
└── JPEGImages/test/prompts/sam2_prompt.pkl

Fine-Tuning Strategies

Three adaptation strategies are compared:

  1. Baseline — Pre-trained SAM 2.1 checkpoints used directly without any domain adaptation. Tests zero-shot generalization to additive manufacturing domains.

  2. LoRA (Low-Rank Adaptation) — Lightweight adaptation using the PEFT library. Injects low-rank trainable matrices while freezing the pre-trained weights. Tested at ranks 2, 4, 16, and 32 to evaluate the trade-off between adaptation capacity and parameter efficiency.

  3. Full Fine-Tune — All model weights are updated during training. Provides maximum adaptation capacity at the cost of storing a complete model copy per dataset.
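
The LoRA idea in item 2 can be shown on a single linear layer. This NumPy sketch is a generic illustration of low-rank adaptation, not the PEFT integration used in sam2/: the class name LoRALinear, the Gaussian initialization of A, and the default scaling are assumptions.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=None, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight  # frozen pre-trained weight
        self.scale = (alpha if alpha is not None else rank) / rank
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable
        self.B = np.zeros((d_out, rank))  # trainable, zero at start

    def __call__(self, x):
        # Base projection plus the low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_parameters(self):
        return self.A.size + self.B.size
```

Because B starts at zero, the adapted layer initially reproduces the pre-trained layer exactly. For a 256 x 256 projection (an illustrative size, not taken from SAM 2), ranks 2, 4, 16, and 32 give 1,024, 2,048, 8,192, and 16,384 trainable parameters, against 65,536 for the full matrix.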

Datasets

Domain      ID               Videos per Fold  Description                   Imaging
TIG         TIG01–05         5                Tungsten Inert Gas welding    Visible
LWAM        MAZAK01–05       5                Laser Wire Arc Manufacturing  Visible
PAW         PLASMA01–05      5                Plasma arc welding            Visible
visPOLYMER  visPOLYMER01–05  5                Polymer processing            Visible
irPOLYMER   irPOLYMER01–05   5                Polymer processing            Infrared

Segmentation categories (2 per dataset):

  • Category 0 (white mask): Wire / nozzle
  • Category 1 (green mask): Material / melt pool

All datasets use VOC-style directory layout with JPEGImages/ and Annotations/ directories, 5-fold custom cross-validation, and 70/30 train/test splits.

Evaluation Metrics

Evaluation follows the DAVIS 2017 semi-supervised video object segmentation benchmark:

Metric                Description
J (IoU)               Jaccard index: intersection-over-union between predicted and ground-truth masks
F (Boundary F-score)  Contour-based F-measure: harmonic mean of boundary precision and recall
J&F                   Combined score: mean of J and F
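
For reference, J and the combined score can be written in a few lines. This is a minimal sketch, not the code in metrics.py: the function names are hypothetical, the boundary F-score is omitted because it needs contour extraction, and scoring two empty masks as 1.0 is an assumption that follows the usual DAVIS toolkit convention.

```python
import numpy as np


def jaccard(pred, gt):
    """Intersection-over-union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: counted as a perfect match (assumption).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def j_and_f(j_mean, f_mean):
    """Combined score: arithmetic mean of J and F."""
    return 0.5 * (j_mean + f_mean)
```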

Usage

1. Prepare Annotations

# Convert Roboflow annotations to VOC-style masks
cd Datasets
python roboflow_to_annotationimage.py

2. Create Cross-Validation Splits

cd DatasetVariants
python datasetcombos.py

3. Preprocess IR Data (irPOLYMER only)

cd DatasetVariants
python preprocess.py

4. Create Interactive Prompts

cd SAM2inference
python create_prompts.py

5. Run Inference

cd SAM2inference

# Baseline (pre-trained SAM 2.1)
python baseline_inference.py

# LoRA fine-tuned models
python lora_inference.py

# Fully fine-tuned models
python fullfinetune_inference.py

6. Evaluate

cd SAM2inference

# Compute all metrics (baseline + LoRA + full fine-tune)
python metrics.py --metrics all

# Compute only one strategy
python metrics.py --metrics baseline
python metrics.py --metrics lora
python metrics.py --metrics fullfinetune

# Generate plots after computing metrics
python metrics.py --metrics all --plot

# Combine cross-validation folds (e.g., MAZAK01–05 → MAZAK)
python metrics.py --metrics all --plot --combine-variants

# Use a specific LoRA initialization scheme (default: "default")
python metrics.py --metrics lora --init pissa

--datasets filter: Select which datasets to include using underscore-separated abbreviation codes. The default includes all datasets: L_P_T_ir_irG_irD_vis.

Code  Dataset
L     MAZAK (LWAM)
P     PLASMA (PAW)
T     TIG
ir    irPOLYMER
irG   irPOLYMERglobalnorm
irD   irPOLYMERglobaldepthnorm
vis   visPOLYMER

# Only TIG and MAZAK
python metrics.py --metrics all --datasets T_L --plot

# Only infrared variants
python metrics.py --metrics lora --datasets ir_irG_irD --plot

# Single dataset
python metrics.py --metrics all --datasets vis --plot
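
The underscore-separated filter can be expanded into dataset names as sketched below. The code-to-name mapping comes from the table above; the function name parse_dataset_filter and its error handling are assumptions, and metrics.py may parse the flag differently.

```python
# Code-to-dataset mapping from the table above.
DATASET_CODES = {
    "L": "MAZAK",
    "P": "PLASMA",
    "T": "TIG",
    "ir": "irPOLYMER",
    "irG": "irPOLYMERglobalnorm",
    "irD": "irPOLYMERglobaldepthnorm",
    "vis": "visPOLYMER",
}
DEFAULT_FILTER = "L_P_T_ir_irG_irD_vis"


def parse_dataset_filter(value=DEFAULT_FILTER):
    """Expand a filter such as "T_L" into dataset names, in the order given."""
    names = []
    for code in value.split("_"):
        if code not in DATASET_CODES:
            raise ValueError(f"unknown dataset code: {code!r}")
        name = DATASET_CODES[code]
        if name not in names:  # ignore repeated codes
            names.append(name)
    return names
```

Splitting on underscores is unambiguous here because none of the codes contains an underscore.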

Citation

If you use this code in your research, please cite:

@article{,
  title={},
  author={Wetzel, Jon Calvin and others},
  journal={TBD},
  year={2026}
}

License

This project builds on SAM 2 by Meta AI, licensed under the Apache License 2.0.

Acknowledgments

This work was developed as a collaboration between Oak Ridge National Laboratory (ORNL) and the University of Tennessee, Knoxville (UTK).
