Domain-Specific Video Segmentation with SAM 2

ORNLxUTK (Oak Ridge National Laboratory x MARCI Lab @ University of Tennessee, Knoxville)

Overview

This repository contains the code and data pipelines for evaluating SAM 2 (Segment Anything Model 2) on domain-specific video segmentation tasks in additive manufacturing. We systematically compare three adaptation strategies — baseline (zero-shot), LoRA fine-tuning, and full fine-tuning — across five additive manufacturing video domains: TIG, LWAM, PAW, visible-light polymer (visPOLYMER), and infrared polymer imaging (irPOLYMER).

Each domain includes multiple video sequences with per-frame segmentation annotations for two object categories: melt pool (material region) and feed wire / nozzle. All experiments use cross-validated dataset splits and evaluate four SAM 2.1 model sizes (tiny, small, base-plus, large); LoRA fine-tuning is tested at ranks 2, 4, 16, and 32. Evaluation follows the DAVIS 2017 semi-supervised video object segmentation benchmark protocol.

Repository Structure

This is a monorepo with five git submodules:

DomainSpecific/
├── sam2/                   # Meta's SAM 2 framework (forked, with training extensions)
├── SAM2inference/          # Inference, evaluation, and metrics pipelines
├── Datasets/               # Roboflow/COCO annotation conversion to VOC-style masks
├── DatasetVariants/        # Cross-validation splits and IR preprocessing
├── irPOLYMERpreprocess/    # IR-specific preprocessing (BM3D denoising, normalization)
├── WAAMlabeledDataset/     # LWAM dataset preparation and prompt creation
├── pyproject.toml          # Root project configuration
└── README.md

Getting Started

Prerequisites

Python >= 3.11 (3.11.11 recommended for SAM2inference)
CUDA-capable GPU with PyTorch >= 2.8.0
uv package manager

Installation

# Clone with all submodules
git clone --recurse-submodules https://github.com/ORNLxUTK/DomainSpecific.git
cd DomainSpecific

# Install root project (installs sam2 as editable dependency)
uv sync

# Install submodule-specific dependencies
cd SAM2inference && uv sync && cd ..
cd irPOLYMERpreprocess && uv sync && cd ..
cd WAAMlabeledDataset && uv sync && cd ..

Pipeline Overview

                          ┌──────────────────────────────────────────────┐
                          │              Data Preparation                │
                          └──────────────────────────────────────────────┘

  Raw Roboflow/COCO Data ──► Datasets/                ──► DatasetVariants/
  (JSON annotations)         (annotation conversion        (5 custom cross-validation
                              to VOC-style PNG masks)       splits + IR preprocessing)

                          ┌──────────────────────────────────────────────┐
                          │           Training & Inference               │
                          └──────────────────────────────────────────────┘

  Prepared Datasets ──► sam2/training/                ──► SAM2inference/
                        (fine-tune SAM 2.1:                (run inference with
                         LoRA or full weights)              baseline, LoRA, or
                                                            full fine-tuned models)

                          ┌──────────────────────────────────────────────┐
                          │                Evaluation                    │
                          └──────────────────────────────────────────────┘

  Predictions + Ground Truth ──► SAM2inference/metrics.py
                                 (IoU, Boundary F-score, J&F)
                                 ──► Tables & Plots

Submodules

sam2/ — SAM 2 Framework

A fork of facebookresearch/sam2 extended with:

LoRA fine-tuning support via the PEFT library (including PiSSA initialization)
Full fine-tuning training scripts for custom datasets
Cross-validation training configurations
SLURM job submission scripts for ablation studies

SAM 2.1 model sizes:

Model	Parameters	Checkpoint
Tiny	38.9M	`sam2.1_hiera_tiny.pt`
Small	46M	`sam2.1_hiera_small.pt`
Base-Plus	80.8M	`sam2.1_hiera_base_plus.pt`
Large	224.4M	`sam2.1_hiera_large.pt`

SAM2inference/ — Inference & Evaluation

Runs inference and computes metrics across all model variants.

Key scripts:

Script	Purpose
`baseline_inference.py`	Run pre-trained SAM 2.1 checkpoints (no adaptation)
`lora_inference.py`	Run LoRA fine-tuned models
`fullfinetune_inference.py`	Run fully fine-tuned models
`metrics.py`	Compute IoU (J), Boundary F-score (F), and J&F; generate tables and plots
`create_prompts.py`	Interactive point prompt creation (OpenCV GUI)
`sav_benchmark.py`	SAV dataset evaluation framework

Prompt creation: Point prompts are created interactively by clicking on the first frame of each video using an OpenCV GUI. Left-click adds positive points (object location); right-click adds negative points (background). Prompts are saved as pickle files.
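
To make the prompt format more concrete, here is a minimal sketch of how clicked points could be recorded and written to a pickle file. Labels follow the SAM 2 point-prompt convention (1 = positive, 0 = negative). The dictionary layout and the helper names `add_click`, `save_prompts`, and `load_prompts` are hypothetical; the structure actually written by `create_prompts.py` may differ.

```python
import pickle
from pathlib import Path


def add_click(prompts, video, obj_id, x, y, positive):
    """Record one click for an object in a video (hypothetical layout)."""
    entry = prompts.setdefault(video, {}).setdefault(
        obj_id, {"points": [], "labels": []}
    )
    entry["points"].append((x, y))
    # 1 marks a positive (object) click, 0 a negative (background) click.
    entry["labels"].append(1 if positive else 0)
    return prompts


def save_prompts(prompts, path):
    """Write the prompt dictionary to a pickle file, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(prompts, f)


def load_prompts(path):
    """Read a prompt dictionary back from a pickle file."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

In this sketch a left-click would call `add_click(..., positive=True)` and a right-click `add_click(..., positive=False)`. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.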

Inference output structure:

SAM2images/{dataset}/JPEGImages/test/
├── baselineinference/{video}/{model_size}/
├── lorainferenceeva/{video}/{model_size}/{lora_rank}/
└── fullfinetuneinference/{video}/{model_size}/
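
The layout above can be expressed as a small path helper. This is only a sketch: `prediction_dir` and its arguments are hypothetical, and the exact naming of the `{lora_rank}` folder (for example `4` versus `r4`) is an assumption.

```python
from pathlib import Path

# Folder names taken from the output structure above.
STRATEGY_DIRS = {
    "baseline": "baselineinference",
    "lora": "lorainferenceeva",
    "fullfinetune": "fullfinetuneinference",
}


def prediction_dir(dataset, strategy, video, model_size,
                   lora_rank=None, root="SAM2images"):
    """Build the folder where predictions for one run would be stored."""
    base = (Path(root) / dataset / "JPEGImages" / "test"
            / STRATEGY_DIRS[strategy] / video / model_size)
    if strategy == "lora":
        if lora_rank is None:
            raise ValueError("lora_rank is required for the LoRA strategy")
        # Assumption: the rank folder is the plain integer rank.
        return base / str(lora_rank)
    return base
```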

Datasets/ — Annotation Conversion

Converts Roboflow COCO-format annotations (polygon segmentations in JSON) to VOC-style PNG segmentation masks.

Input: _annotations.coco.json files with polygon segmentations
Output: 3-channel PNG masks with semantic colors
Color convention: White (255, 255, 255) = wire/nozzle (category 0); Green (0, 255, 0) = material/melt pool (category 1)
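
For readers consuming these masks, the color convention can be decoded back into per-pixel category IDs roughly as follows. This is an illustrative sketch, not code from `roboflow_to_annotationimage.py`; the function name is hypothetical, and it assumes the mask is loaded in RGB channel order with any other color treated as background.

```python
import numpy as np

# Colors from the convention above (RGB order assumed).
CATEGORY_COLORS = {
    0: (255, 255, 255),  # wire / nozzle
    1: (0, 255, 0),      # material / melt pool
}


def color_mask_to_categories(mask_rgb):
    """Return an (H, W) integer array of category IDs, -1 for background."""
    mask_rgb = np.asarray(mask_rgb)
    out = np.full(mask_rgb.shape[:2], -1, dtype=np.int64)
    for cat_id, color in CATEGORY_COLORS.items():
        # A pixel belongs to the category when all three channels match.
        out[np.all(mask_rgb == np.array(color), axis=-1)] = cat_id
    return out
```

Note that OpenCV's `cv2.imread` returns BGR; both colors used here are symmetric under an R/B swap, so the decoding works either way for this particular palette.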

Key script: roboflow_to_annotationimage.py

DatasetVariants/ — Cross-Validation & Preprocessing

Creates 5 custom cross-validation dataset splits and applies IR-specific preprocessing.

Key scripts:

Script	Purpose
`datasetcombos.py`	Generate cross-validation splits with maximally dissimilar training sets
`preprocess.py`	Apply BM3D denoising and normalization to IR datasets

Cross-validation strategy: Generates 5 dataset versions with 70/30 train/test splits, selecting video combinations that maximize diversity (most dissimilar training sets) across folds.
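
One way to picture this selection is a greedy search over candidate training sets that keeps the overlap between chosen sets low. The sketch below is an assumption about how such a strategy could work, not the algorithm in `datasetcombos.py`; `pick_dissimilar_splits` is a hypothetical name, similarity is measured with the Jaccard index between training sets, and the greedy choice is a heuristic rather than a guaranteed optimum.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of video IDs."""
    return len(a & b) / len(a | b)


def pick_dissimilar_splits(videos, n_splits=5, train_fraction=0.7):
    """Greedily pick training sets that overlap as little as possible."""
    videos = sorted(videos)
    n_train = round(train_fraction * len(videos))
    candidates = [frozenset(c) for c in combinations(videos, n_train)]
    chosen = [candidates[0]]
    while len(chosen) < n_splits and len(chosen) < len(candidates):
        remaining = [c for c in candidates if c not in chosen]
        # Pick the candidate whose worst-case similarity to the
        # already chosen training sets is lowest.
        best = min(remaining, key=lambda c: max(jaccard(c, s) for s in chosen))
        chosen.append(best)
    # Everything not in the training set becomes the test set.
    return [(sorted(train), sorted(set(videos) - train)) for train in chosen]
```

Enumerating every combination is only practical for a small number of videos per domain; a larger pool would need sampling instead.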

IR preprocessing variants created:

irPOLYMERglobaldepthnorm{01-05} — per-pixel depth normalization + BM3D denoising
irPOLYMERglobalnorm{01-05} — global min-max normalization + BM3D denoising
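
As a rough illustration of the global min-max variant, the sketch below scales a set of frames to 8-bit using a single minimum and maximum shared by all frames, so relative intensities stay comparable across the sequence. It assumes "global" means "computed over all frames passed in"; the function name is hypothetical, and the BM3D denoising step is deliberately left out.

```python
import numpy as np


def global_minmax_normalize(frames):
    """Scale frames to uint8 using one min/max shared by all frames."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    lo, hi = stack.min(), stack.max()
    if hi == lo:
        # Constant input: avoid division by zero and return all zeros.
        return np.zeros(stack.shape, dtype=np.uint8)
    scaled = (stack - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)
```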

irPOLYMERpreprocess/ — IR Image Preprocessing

Specialized preprocessing pipeline for infrared polymer imaging. Used for algorithm exploration and benchmarking before integration into DatasetVariants.

Key scripts:

Script	Purpose
`main.py`	Compare denoising algorithms (NL-means, wavelet, TV Chambolle, BM3D)
`global.py`	Normalization experiments with visualization (`--plot`)
`preprocess.py`	Fixed pipeline: CLAHE + unsharp mask + BM3D denoising

WAAMlabeledDataset/ — LWAM Dataset Preparation

Prepares the Laser Wire Arc Additive Manufacturing (LWAM) dataset for SAM 2 training and evaluation.

Key scripts:

Script	Purpose
`makeannotationimages.py`	Convert Roboflow COCO RLE masks to PNG annotation images
`createpeftsam2ftdir.py`	Reorganize into SAM 2-compatible VOC-style directory layout
`create_prompts.py`	Interactive prompt creation for test videos
`create_all_prompts.py`	Batch prompt creation for 1, 3, and 5 clicks per object

Output directory structure:

MAZAK_SAM2_Roboflow_Frames/
├── JPEGImages/{train,test,val}/{video_id}/00000.jpg, 00001.jpg, ...
├── Annotations/{train,test,val}/{video_id}/00000.png, 00001.png, ...
└── JPEGImages/test/prompts/sam2_prompt.pkl

Fine-Tuning Strategies

Three adaptation strategies are compared:

Baseline — Pre-trained SAM 2.1 checkpoints used directly without any domain adaptation. Tests zero-shot generalization to additive manufacturing domains.
LoRA (Low-Rank Adaptation) — Lightweight adaptation using the PEFT library. Injects low-rank trainable matrices while freezing the pre-trained weights. Tested at ranks 2, 4, 16, and 32 to evaluate the trade-off between adaptation capacity and parameter efficiency.
Full Fine-Tune — All model weights are updated during training. Provides maximum adaptation capacity at the cost of storing a complete model copy per dataset.
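
To give a feel for the parameter-efficiency trade-off, the sketch below counts trainable parameters for a single linear layer. In LoRA the frozen weight W (d_out x d_in) is augmented with a low-rank update B A, where A is rank x d_in and B is d_out x rank, so the trainable count grows linearly with the rank. The 768-wide layer is a hypothetical example, not a dimension taken from SAM 2.1.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the low-rank pair A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters in the dense weight matrix updated by full fine-tuning."""
    return d_in * d_out


if __name__ == "__main__":
    d_in = d_out = 768  # hypothetical layer width, for illustration only
    total = full_params(d_in, d_out)
    for rank in (2, 4, 16, 32):
        n = lora_trainable_params(d_in, d_out, rank)
        print(f"rank {rank:>2}: {n} trainable ({n / total:.2%} of the dense layer)")
```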

Datasets

Domain	ID	Videos per Fold	Description	Imaging
TIG	TIG01–05	5	Tungsten Inert Gas welding	Visible
LWAM	MAZAK01–05	5	Laser Wire Arc Additive Manufacturing	Visible
PAW	PLASMA01–05	5	Plasma Arc Welding	Visible
visPOLYMER	visPOLYMER01–05	5	Polymer processing	Visible
irPOLYMER	irPOLYMER01–05	5	Polymer processing	Infrared

Segmentation categories (2 per dataset):

Category 0 (white mask): Wire / nozzle
Category 1 (green mask): Material / melt pool

All datasets use VOC-style directory layout with JPEGImages/ and Annotations/ directories, 5-fold custom cross-validation, and 70/30 train/test splits.

Evaluation Metrics

Evaluation follows the DAVIS 2017 semi-supervised video object segmentation benchmark:

Metric	Description
J (IoU)	Jaccard Index — intersection-over-union between predicted and ground truth masks
F (Boundary F-score)	Contour-based F-measure
J&F	Combined score — mean of J and F
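
For reference, the region metric J and the combined score can be sketched in a few lines. This is an illustrative implementation, not the code in `metrics.py`; the function names are hypothetical, and returning 1.0 when both masks are empty is an assumption about the convention in use. The boundary measure F is more involved (it matches contour pixels within a tolerance) and is not reproduced here.

```python
import numpy as np


def jaccard_index(pred, gt):
    """Intersection-over-union between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treated here as perfect agreement (assumption).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def j_and_f(j, f):
    """Combined score: arithmetic mean of J and F."""
    return (j + f) / 2.0
```

In a video setting, J is typically computed per frame and per object and then averaged over the sequence.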

Usage

1. Prepare Annotations

# Convert Roboflow annotations to VOC-style masks
cd Datasets
python roboflow_to_annotationimage.py

2. Create Cross-Validation Splits

cd DatasetVariants
python datasetcombos.py

3. Preprocess IR Data (irPOLYMER only)

cd DatasetVariants
python preprocess.py

4. Create Interactive Prompts

cd SAM2inference
python create_prompts.py

5. Run Inference

cd SAM2inference

# Baseline (pre-trained SAM 2.1)
python baseline_inference.py

# LoRA fine-tuned models
python lora_inference.py

# Fully fine-tuned models
python fullfinetune_inference.py

6. Evaluate

cd SAM2inference

# Compute all metrics (baseline + LoRA + full fine-tune)
python metrics.py --metrics all

# Compute only one strategy
python metrics.py --metrics baseline
python metrics.py --metrics lora
python metrics.py --metrics fullfinetune

# Generate plots after computing metrics
python metrics.py --metrics all --plot

# Combine cross-validation folds (e.g., MAZAK01–05 → MAZAK)
python metrics.py --metrics all --plot --combine-variants

# Use a specific LoRA initialization scheme (default: "default")
python metrics.py --metrics lora --init pissa
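
Conceptually, combining variants groups per-fold results by their dataset name with the two-digit fold suffix removed. The sketch below assumes the combination is a plain mean over folds, which is a guess rather than documented behavior of `--combine-variants`; `combine_variants` is a hypothetical helper.

```python
import re
from collections import defaultdict
from statistics import mean


def combine_variants(scores):
    """Average per-fold scores, e.g. MAZAK01..MAZAK05 -> MAZAK."""
    grouped = defaultdict(list)
    for name, value in scores.items():
        # Strip a trailing two-digit fold index such as "01" or "05".
        base = re.sub(r"\d{2}$", "", name)
        grouped[base].append(value)
    return {base: mean(values) for base, values in grouped.items()}
```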

--datasets filter: Select which datasets to include using underscore-separated abbreviation codes. The default includes all datasets: L_P_T_ir_irG_irD_vis.

Code	Dataset
`L`	MAZAK (LWAM)
`P`	PLASMA (PAW)
`T`	TIG
`ir`	irPOLYMER
`irG`	irPOLYMERglobalnorm
`irD`	irPOLYMERglobaldepthnorm
`vis`	visPOLYMER

# Only TIG and MAZAK
python metrics.py --metrics all --datasets T_L --plot

# Only infrared variants
python metrics.py --metrics lora --datasets ir_irG_irD --plot

# Single dataset
python metrics.py --metrics all --datasets vis --plot
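
The filter string can be thought of as a simple lookup over the codes in the table above. The following sketch shows one way to parse it; `parse_dataset_filter` is a hypothetical function, and how `metrics.py` actually handles unknown or repeated codes is not documented here.

```python
# Code-to-dataset mapping copied from the table above.
DATASET_CODES = {
    "L": "MAZAK",
    "P": "PLASMA",
    "T": "TIG",
    "ir": "irPOLYMER",
    "irG": "irPOLYMERglobalnorm",
    "irD": "irPOLYMERglobaldepthnorm",
    "vis": "visPOLYMER",
}


def parse_dataset_filter(spec="L_P_T_ir_irG_irD_vis"):
    """Turn an underscore-separated code string into dataset names."""
    names = []
    for code in spec.split("_"):
        if code not in DATASET_CODES:
            raise ValueError(f"unknown dataset code: {code!r}")
        name = DATASET_CODES[code]
        if name not in names:  # ignore repeated codes, keep first-seen order
            names.append(name)
    return names
```

Because codes are split on underscores and matched exactly, `ir`, `irG`, and `irD` do not collide with one another.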

Citation

If you use this code in your research, please cite:

@article{,
  title={},
  author={Wetzel, Jon Calvin and others},
  journal={TBD},
  year={2026}
}

License

This project builds on SAM 2 by Meta AI, licensed under the Apache License 2.0.

Acknowledgments

This work was developed as a collaboration between Oak Ridge National Laboratory (ORNL) and the University of Tennessee, Knoxville (UTK).

