ORNLxUTK (Oak Ridge National Laboratory x MARCI Lab @ University of Tennessee, Knoxville)
This repository contains the code and data pipelines for evaluating SAM 2 (Segment Anything Model 2) on domain-specific video segmentation tasks in additive manufacturing. We systematically compare three adaptation strategies — baseline (zero-shot), LoRA fine-tuning, and full fine-tuning — across five additive manufacturing video domains: TIG, LWAM, PAW, visible-light polymer (visPOLYMER), and infrared polymer imaging (irPOLYMER).
Each domain includes multiple video sequences with per-frame segmentation annotations for two object categories: melt pool (material region) and feed wire / nozzle. All experiments run on cross-validated dataset splits and evaluate four SAM 2.1 model sizes (tiny, small, base-plus, large), with LoRA ranks of 2, 4, 16, and 32 for the LoRA strategy. Evaluation follows the DAVIS 2017 semi-supervised video object segmentation benchmark protocol.
This is a monorepo with five git submodules:
DomainSpecific/
├── sam2/ # Meta's SAM 2 framework (forked, with training extensions)
├── SAM2inference/ # Inference, evaluation, and metrics pipelines
├── Datasets/ # Roboflow/COCO annotation conversion to VOC-style masks
├── DatasetVariants/ # Cross-validation splits and IR preprocessing
├── irPOLYMERpreprocess/ # IR-specific preprocessing (BM3D denoising, normalization)
├── WAAMlabeledDataset/ # LWAM dataset preparation and prompt creation
├── pyproject.toml # Root project configuration
└── README.md
- Python >= 3.11 (3.11.11 recommended for SAM2inference)
- CUDA-capable GPU with PyTorch >= 2.8.0
- uv package manager
# Clone with all submodules
git clone --recurse-submodules https://github.com/ORNLxUTK/DomainSpecific.git
cd DomainSpecific
# Install root project (installs sam2 as editable dependency)
uv sync
# Install submodule-specific dependencies
cd SAM2inference && uv sync && cd ..
cd irPOLYMERpreprocess && uv sync && cd ..
cd WAAMlabeledDataset && uv sync && cd ..

┌──────────────────────────────────────────────┐
│ Data Preparation │
└──────────────────────────────────────────────┘
Raw Roboflow/COCO Data ──► Datasets/                 ──► DatasetVariants/
(JSON annotations)         (annotation conversion        (5 custom cross-validation
                            to VOC-style PNG masks)       splits + IR preprocessing)
┌──────────────────────────────────────────────┐
│ Training & Inference │
└──────────────────────────────────────────────┘
Prepared Datasets ──► sam2/training/           ──► SAM2inference/
                      (fine-tune SAM 2.1:          (run inference with
                       LoRA or full weights)        baseline, LoRA, or
                                                    full fine-tuned models)
┌──────────────────────────────────────────────┐
│ Evaluation │
└──────────────────────────────────────────────┘
Predictions + Ground Truth ──► SAM2inference/metrics.py
                               (IoU, Boundary F-score, J&F)
                           ──► Tables & Plots
sam2/ is a fork of facebookresearch/sam2 extended with:
- LoRA fine-tuning support via the PEFT library (including PiSSA initialization)
- Full fine-tuning training scripts for custom datasets
- Cross-validation training configurations
- SLURM job submission scripts for ablation studies
SAM 2.1 model sizes:
| Model | Parameters | Checkpoint |
|---|---|---|
| Tiny | 38.9M | sam2.1_hiera_tiny.pt |
| Small | 46M | sam2.1_hiera_small.pt |
| Base-Plus | 80.8M | sam2.1_hiera_base_plus.pt |
| Large | 224.4M | sam2.1_hiera_large.pt |
SAM2inference/ runs inference and computes metrics across all model variants.
Key scripts:
| Script | Purpose |
|---|---|
| baseline_inference.py | Run pre-trained SAM 2.1 checkpoints (no adaptation) |
| lora_inference.py | Run LoRA fine-tuned models |
| fullfinetune_inference.py | Run fully fine-tuned models |
| metrics.py | Compute IoU, Boundary F-score; generate tables and plots |
| create_prompts.py | Interactive point prompt creation (OpenCV GUI) |
| sav_benchmark.py | SAV dataset evaluation framework |
Prompt creation: Point prompts are created interactively by clicking on the first frame of each video using an OpenCV GUI. Left-click adds positive points (object location); right-click adds negative points (background). Prompts are saved as pickle files.
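The sketch below shows one way recorded clicks could be turned into a point prompt and pickled. The label values (1 for a positive point, 0 for a negative point) follow SAM 2's point-prompt convention. The `clicks_to_prompt` helper, the dictionary keys, and the file name are hypothetical and may not match what create_prompts.py actually writes.

```python
import pickle
import tempfile
from pathlib import Path


def clicks_to_prompt(clicks):
    """Convert (x, y, button) clicks into point coordinates and labels.

    button is "left" (positive point) or "right" (negative point).
    """
    points, labels = [], []
    for x, y, button in clicks:
        if button not in ("left", "right"):
            raise ValueError(f"unknown mouse button: {button!r}")
        points.append([float(x), float(y)])
        labels.append(1 if button == "left" else 0)
    # Hypothetical on-disk layout: the real pickle structure may differ.
    return {"points": points, "labels": labels}


def save_prompt(prompt, path):
    with open(path, "wb") as f:
        pickle.dump(prompt, f)


def load_prompt(path):
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    prompt = clicks_to_prompt([(120, 85, "left"), (300, 40, "right")])
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example_prompt.pkl"  # hypothetical file name
        save_prompt(prompt, path)
        print(load_prompt(path))
```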
Inference output structure:
SAM2images/{dataset}/JPEGImages/test/
├── baselineinference/{video}/{model_size}/
├── lorainferenceeva/{video}/{model_size}/{lora_rank}/
└── fullfinetuneinference/{video}/{model_size}/
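For scripts that need to locate predictions, the layout above can be expressed as a small path helper. This is a sketch only: the function name is hypothetical, and it assumes the `{lora_rank}` directory is named with the bare integer and `{model_size}` with a plain string such as `tiny`.

```python
from pathlib import PurePosixPath

# Strategy keys mirror the values accepted by `metrics.py --metrics`.
STRATEGY_DIRS = {
    "baseline": "baselineinference",
    "lora": "lorainferenceeva",
    "fullfinetune": "fullfinetuneinference",
}


def inference_output_dir(dataset, strategy, video, model_size, lora_rank=None):
    """Build the output directory for one (dataset, strategy, video, model) run."""
    if strategy not in STRATEGY_DIRS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if (strategy == "lora") != (lora_rank is not None):
        raise ValueError("lora_rank must be given for 'lora' and only for 'lora'")
    out = (
        PurePosixPath("SAM2images") / dataset / "JPEGImages" / "test"
        / STRATEGY_DIRS[strategy] / video / model_size
    )
    if lora_rank is not None:
        out = out / str(lora_rank)  # assumed naming of the rank directory
    return out


if __name__ == "__main__":
    print(inference_output_dir("TIG01", "lora", "video_a", "tiny", lora_rank=4))
```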
Datasets/ converts Roboflow COCO-format annotations (polygon segmentations in JSON) to VOC-style PNG segmentation masks.
- Input: _annotations.coco.json files with polygon segmentations
- Output: 3-channel PNG masks with semantic colors
- Color convention: White (255, 255, 255) = wire/nozzle (category 0); Green (0, 255, 0) = material/melt pool (category 1)
Key script: roboflow_to_annotationimage.py
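The conversion can be sketched as follows. This is not the code in roboflow_to_annotationimage.py: it is a NumPy-only illustration that rasterizes COCO polygons (flat `[x0, y0, x1, y1, ...]` lists) with an even-odd test at pixel centers and paints them with the colors listed above. The function names are hypothetical, later annotations are assumed to overwrite earlier ones, and writing the PNG is left out. In practice a library routine such as Pillow's `ImageDraw.polygon` would likely be used instead.

```python
import numpy as np

# Color convention from this README:
# category 0 = wire/nozzle (white), category 1 = material/melt pool (green).
CATEGORY_COLORS = {0: (255, 255, 255), 1: (0, 255, 0)}


def polygon_to_bool_mask(flat_xy, height, width):
    """Rasterize one COCO polygon using the even-odd rule at pixel centers."""
    pts = np.asarray(flat_xy, dtype=float).reshape(-1, 2)
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    for (x0, y0), (x1, y1) in zip(pts, np.roll(pts, -1, axis=0)):
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (y0 > py) != (y1 > py)
        x_at = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (px < x_at)
    return inside


def annotations_to_color_mask(annotations, height, width):
    """Paint every polygon of every annotation into an H x W x 3 uint8 mask."""
    mask = np.zeros((height, width, 3), dtype=np.uint8)
    for ann in annotations:
        color = CATEGORY_COLORS[ann["category_id"]]
        for polygon in ann["segmentation"]:
            mask[polygon_to_bool_mask(polygon, height, width)] = color
    return mask
```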
DatasetVariants/ creates 5 custom cross-validation dataset splits and applies IR-specific preprocessing.
Key scripts:
| Script | Purpose |
|---|---|
| datasetcombos.py | Generate cross-validation splits with maximally dissimilar training sets |
| preprocess.py | Apply BM3D denoising and normalization to IR datasets |
Cross-validation strategy: Generates 5 dataset versions with 70/30 train/test splits, selecting video combinations that maximize diversity (most dissimilar training sets) across folds.
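One way to pick dissimilar training sets is sketched below. It is an illustration under stated assumptions, not the algorithm in datasetcombos.py: dissimilarity is measured with Jaccard overlap between training sets, and folds are chosen greedily, each time taking the candidate with the lowest total overlap with the folds already chosen. The function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets of video names (1.0 = identical)."""
    return len(a & b) / len(a | b)


def pick_dissimilar_splits(videos, n_folds=5, train_fraction=0.7):
    """Greedily choose n_folds train/test splits with low mutual overlap."""
    videos = sorted(videos)
    n_train = round(len(videos) * train_fraction)
    candidates = [frozenset(c) for c in combinations(videos, n_train)]
    if len(candidates) < n_folds:
        raise ValueError("not enough distinct training sets for the requested folds")
    chosen = [candidates[0]]
    while len(chosen) < n_folds:
        remaining = [c for c in candidates if c not in chosen]
        # Lowest summed overlap with already chosen folds; ties broken by name.
        best = min(
            remaining,
            key=lambda c: (sum(jaccard(c, s) for s in chosen), sorted(c)),
        )
        chosen.append(best)
    return [(sorted(train), sorted(set(videos) - train)) for train in chosen]


if __name__ == "__main__":
    names = [f"video_{i:02d}" for i in range(10)]  # hypothetical video names
    for train, test in pick_dissimilar_splits(names):
        print("train:", train, "| test:", test)
```

The exhaustive enumeration is only practical for a small number of videos per domain; a larger pool would need sampling instead.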
IR preprocessing variants created:
- irPOLYMERglobaldepthnorm{01-05}: per-pixel depth normalization + BM3D denoising
- irPOLYMERglobalnorm{01-05}: global min-max normalization + BM3D denoising
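As an illustration of the global min-max variant, the sketch below rescales a stack of IR frames to 8-bit using a single minimum and maximum taken over the whole stack. Whether preprocess.py computes these statistics per video or per dataset is not stated here, so the scope is an assumption, as is the function name. The BM3D denoising step is omitted.

```python
import numpy as np


def global_minmax_normalize(frames):
    """Rescale a (T, H, W) stack to uint8 using one min/max for the whole stack."""
    frames = np.asarray(frames, dtype=np.float32)
    lo, hi = frames.min(), frames.max()
    if hi == lo:
        # Constant input: nothing to stretch, return all zeros.
        return np.zeros(frames.shape, dtype=np.uint8)
    scaled = (frames - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)


if __name__ == "__main__":
    stack = np.array([[[20.0, 30.0]], [[40.0, 60.0]]])  # two 1x2 frames
    print(global_minmax_normalize(stack))
```

Because one range is shared by all frames, relative brightness between frames is preserved, unlike per-frame normalization.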
irPOLYMERpreprocess/ is a specialized preprocessing pipeline for infrared polymer imaging, used for algorithm exploration and benchmarking before integration into DatasetVariants.
Key scripts:
| Script | Purpose |
|---|---|
| main.py | Compare denoising algorithms (NL-means, wavelet, TV Chambolle, BM3D) |
| global.py | Normalization experiments with visualization (--plot) |
| preprocess.py | Fixed pipeline: CLAHE + unsharp mask + BM3D denoising |
WAAMlabeledDataset/ prepares the Laser Wire Arc Additive Manufacturing (LWAM) dataset for SAM 2 training and evaluation.
Key scripts:
| Script | Purpose |
|---|---|
| makeannotationimages.py | Convert Roboflow COCO RLE masks to PNG annotation images |
| createpeftsam2ftdir.py | Reorganize into SAM 2-compatible VOC-style directory layout |
| create_prompts.py | Interactive prompt creation for test videos |
| create_all_prompts.py | Batch prompt creation for 1, 3, and 5 clicks per object |
Output directory structure:
MAZAK_SAM2_Roboflow_Frames/
├── JPEGImages/{train,test,val}/{video_id}/00000.jpg, 00001.jpg, ...
├── Annotations/{train,test,val}/{video_id}/00000.png, 00001.png, ...
└── JPEGImages/test/prompts/sam2_prompt.pkl
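Given this layout, frames and annotation masks can be paired by their shared zero-padded stem. The helper below is a hypothetical sketch, not a function from this repository. It assumes every mask has the same stem as its frame and silently skips frames that have no mask.

```python
import tempfile
from pathlib import Path


def list_frame_mask_pairs(root, split, video_id):
    """Return (frame_path, mask_path) pairs for one video, sorted by frame name."""
    frames_dir = Path(root) / "JPEGImages" / split / video_id
    masks_dir = Path(root) / "Annotations" / split / video_id
    pairs = []
    for frame in sorted(frames_dir.glob("*.jpg")):
        mask = masks_dir / (frame.stem + ".png")
        if mask.exists():
            pairs.append((frame, mask))
    return pairs


if __name__ == "__main__":
    # Build a throwaway directory tree to demonstrate the pairing.
    with tempfile.TemporaryDirectory() as root:
        frames = Path(root) / "JPEGImages" / "test" / "demo_video"
        masks = Path(root) / "Annotations" / "test" / "demo_video"
        frames.mkdir(parents=True)
        masks.mkdir(parents=True)
        (frames / "00000.jpg").touch()
        (masks / "00000.png").touch()
        print(list_frame_mask_pairs(root, "test", "demo_video"))
```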
Three adaptation strategies are compared:
- Baseline: Pre-trained SAM 2.1 checkpoints used directly without any domain adaptation. Tests zero-shot generalization to additive manufacturing domains.
- LoRA (Low-Rank Adaptation): Lightweight adaptation using the PEFT library. Injects low-rank trainable matrices while freezing the pre-trained weights. Tested at ranks 2, 4, 16, and 32 to evaluate the trade-off between adaptation capacity and parameter efficiency.
- Full Fine-Tune: All model weights are updated during training. Provides maximum adaptation capacity at the cost of storing a complete model copy per dataset.
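To make the LoRA capacity/efficiency trade-off concrete, the sketch below counts trainable parameters for a single linear layer. It uses the standard LoRA formulation, in which a frozen weight W of shape (d_out, d_in) receives a low-rank update B·A, with B of shape (d_out, r) and A of shape (r, d_in). The 768×768 layer size is a hypothetical example, not a dimension taken from SAM 2.1, and the helper names are illustrative.

```python
RANKS = (2, 4, 16, 32)  # LoRA ranks evaluated in this repository


def lora_params(d_in, d_out, rank):
    """Trainable parameters of one adapter: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Parameters of the dense weight matrix itself (bias ignored)."""
    return d_in * d_out


if __name__ == "__main__":
    d_in = d_out = 768  # hypothetical layer size
    total = full_params(d_in, d_out)
    for r in RANKS:
        added = lora_params(d_in, d_out, r)
        share = 100 * added / total
        print(f"rank {r:>2}: {added:>6} trainable params ({share:.2f}% of the dense layer)")
```

The adapter size grows linearly with the rank, whereas full fine-tuning updates every entry of W.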
| Domain | ID | Videos per Fold | Description | Imaging |
|---|---|---|---|---|
| TIG | TIG01–05 | 5 | Tungsten Inert Gas welding | Visible |
| LWAM | MAZAK01–05 | 5 | Laser Wire Arc Manufacturing | Visible |
| PAW | PLASMA01–05 | 5 | Plasma arc welding | Visible |
| visPOLYMER | visPOLYMER01–05 | 5 | Polymer processing | Visible |
| irPOLYMER | irPOLYMER01–05 | 5 | Polymer processing | Infrared |
Segmentation categories (2 per dataset):
- Category 0 (white mask): Wire / nozzle
- Category 1 (green mask): Material / melt pool
All datasets use VOC-style directory layout with JPEGImages/ and Annotations/ directories, 5-fold custom cross-validation, and 70/30 train/test splits.
Evaluation follows the DAVIS 2017 semi-supervised video object segmentation benchmark:
| Metric | Description |
|---|---|
| J (IoU) | Jaccard Index — intersection-over-union between predicted and ground truth masks |
| F (Boundary F-score) | Contour-based F-measure |
| J&F | Combined score — mean of J and F |
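A minimal sketch of the region metric and the combined score, assuming binary masks of equal shape. The boundary F-score is not implemented here and is taken as a given number. Treating two empty masks as a perfect match is an assumption about how metrics.py handles that case. The function names are illustrative.

```python
import numpy as np


def jaccard_index(pred, gt):
    """J: intersection-over-union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as a perfect match (assumption)
    return float(np.logical_and(pred, gt).sum() / union)


def j_and_f(j, f):
    """J&F: arithmetic mean of the region score J and the boundary score F."""
    return (j + f) / 2.0


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    gt = [[1, 0], [0, 0]]
    j = jaccard_index(pred, gt)
    print(j, j_and_f(j, 0.7))  # 0.7 is a made-up boundary score
```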
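# Convert Roboflow annotations to VOC-style masks
cd Datasets
python roboflow_to_annotationimage.py

# Generate cross-validation splits (from the repository root)
cd DatasetVariants
python datasetcombos.py

# Apply BM3D denoising and normalization to IR datasets (from the repository root)
cd DatasetVariants
python preprocess.py

# Create point prompts interactively (from the repository root)
cd SAM2inference
python create_prompts.py

# Run inference (from the repository root)
cd SAM2inference
# Baseline (pre-trained SAM 2.1)
python baseline_inference.py
# LoRA fine-tuned models
python lora_inference.py
# Fully fine-tuned models
python fullfinetune_inference.py

# Compute metrics (from the repository root)
cd SAM2inference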
# Compute all metrics (baseline + LoRA + full fine-tune)
python metrics.py --metrics all
# Compute only one strategy
python metrics.py --metrics baseline
python metrics.py --metrics lora
python metrics.py --metrics fullfinetune
# Generate plots after computing metrics
python metrics.py --metrics all --plot
# Combine cross-validation folds (e.g., MAZAK01–05 → MAZAK)
python metrics.py --metrics all --plot --combine-variants
# Use a specific LoRA initialization scheme (default: "default")
python metrics.py --metrics lora --init pissa

--datasets filter: Select which datasets to include using underscore-separated abbreviation codes. The default includes all datasets: L_P_T_ir_irG_irD_vis.
| Code | Dataset |
|---|---|
| L | MAZAK (LWAM) |
| P | PLASMA (PAW) |
| T | TIG |
| ir | irPOLYMER |
| irG | irPOLYMERglobalnorm |
| irD | irPOLYMERglobaldepthnorm |
| vis | visPOLYMER |
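The code table above maps directly to a lookup. The sketch below shows how an underscore-separated filter string could be expanded into dataset names. The function name and the error handling are assumptions, not the parser used by metrics.py.

```python
# Abbreviation codes and dataset names from the table above.
DATASET_CODES = {
    "L": "MAZAK",
    "P": "PLASMA",
    "T": "TIG",
    "ir": "irPOLYMER",
    "irG": "irPOLYMERglobalnorm",
    "irD": "irPOLYMERglobaldepthnorm",
    "vis": "visPOLYMER",
}

DEFAULT_FILTER = "L_P_T_ir_irG_irD_vis"


def parse_dataset_filter(spec=DEFAULT_FILTER):
    """Expand a filter such as "T_L" into ["TIG", "MAZAK"], preserving order."""
    names = []
    for code in spec.split("_"):
        if code not in DATASET_CODES:
            raise ValueError(f"unknown dataset code: {code!r}")
        names.append(DATASET_CODES[code])
    return names


if __name__ == "__main__":
    print(parse_dataset_filter("T_L"))
    print(parse_dataset_filter())
```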
# Only TIG and MAZAK
python metrics.py --metrics all --datasets T_L --plot
# Only infrared variants
python metrics.py --metrics lora --datasets ir_irG_irD --plot
# Single dataset
python metrics.py --metrics all --datasets vis --plot

If you use this code in your research, please cite:
@article{,
title={},
author={Wetzel, Jon Calvin and others},
journal={TBD},
year={2026}
}

This project builds on SAM 2 by Meta AI, licensed under the Apache License 2.0.
This work was developed as a collaboration between Oak Ridge National Laboratory (ORNL) and the University of Tennessee, Knoxville (UTK).