This project is a FastAPI + browser-based demo for finding the common object across a group of images. It combines mask proposals, per-mask feature extraction, graph-based consensus scoring, and object labeling to return one selected mask per image along with group-level similarity metrics.
The repository currently works as a technical preview with two operating modes:
- Fallback mode, for quick local demos without heavyweight model downloads
- Full model mode, using SAM, DINOv2, and CLIP when resources are available
Given a set of at least 3 images, the pipeline:
- preprocesses each uploaded image
- generates candidate masks for each image
- extracts an embedding for every candidate mask
- selects one consensus object per image using a mask-aware graph solver
- labels the selected object in each image
- returns masks, scores, and group-level summary statistics
backend/
    app.py                  FastAPI app and pipeline orchestration
    requirements.txt        Python dependencies
    core/
        utils.py            Image decoding and preprocessing
        graph_solver.py     Mask-aware consensus solver
    models/
        sam_wrapper.py      SAM mask generator + fallback masks
        dinov2_wrapper.py   DINOv2 embeddings + fallback features
        clip_wrapper.py     CLIP labeling + heuristic fallback
frontend/
    index.html              Main UI
    assets/
        css/style.css       Frontend styling
        js/app.js           Upload flow, API calls, result rendering
The backend is implemented in backend/app.py.
- GET /health returns a simple health payload
- POST /api/process-group runs the full multi-image pipeline
- POST /predict exists but is only a placeholder
The main endpoint validates:
- minimum 3 uploaded images
- iou_threshold in [0, 1]
- nms_threshold in [0, 1]
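The checks above can be sketched as a small framework-independent function. This is only an illustration of the stated rules; the function name, the error messages, and the choice to collect errors in a list are assumptions, not the code in backend/app.py.

```python
from typing import List, Sequence

MIN_IMAGES = 3  # minimum group size stated in this README


def validate_request(
    images: Sequence[bytes], iou_threshold: float, nms_threshold: float
) -> List[str]:
    """Return human-readable validation errors (empty list when valid).

    Hypothetical helper: name and messages are illustrative only.
    """
    errors: List[str] = []
    if len(images) < MIN_IMAGES:
        errors.append(f"expected at least {MIN_IMAGES} images, got {len(images)}")
    for name, value in (
        ("iou_threshold", iou_threshold),
        ("nms_threshold", nms_threshold),
    ):
        # The chained comparison is False for NaN, so NaN is rejected too.
        if not 0.0 <= value <= 1.0:
            errors.append(f"{name} must be in [0, 1], got {value}")
    return errors
```

In a FastAPI handler, a non-empty result would typically be turned into a 4xx response before any model code runs.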
The wrappers are designed to degrade gracefully when full models are unavailable:
- SAMProposer uses simple fallback masks by default unless SAM_GC_FORCE_SAM=1
- DinoV2Wrapper uses fallback handcrafted features unless SAM_GC_USE_DINO=1
- CLIPObjectLabeler tries to load CLIP by default, but falls back to heuristic labels if loading fails
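As a rough sketch of how these switches could be read, the snippet below maps the two documented environment variables to a mode summary. The helper names are hypothetical and the wrappers may parse the variables differently; only the value "1" is treated as enabled because that is the only value this README documents.

```python
import os


def flag_enabled(name: str) -> bool:
    """Treat only the documented value "1" as enabled (assumption)."""
    return os.environ.get(name, "") == "1"


def runtime_mode() -> dict:
    """Summarise which path each wrapper would take under the documented flags."""
    return {
        "sam": "full" if flag_enabled("SAM_GC_FORCE_SAM") else "fallback",
        "dino": "full" if flag_enabled("SAM_GC_USE_DINO") else "fallback",
        # CLIP is attempted by default and falls back only if loading fails,
        # so this sketch always reports "attempt" for it.
        "clip": "attempt",
    }
```

Printing the result of runtime_mode() at startup is one way to see which mode is active before uploading images.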
The solver in backend/core/graph_solver.py is a mask-aware consensus method, not a plain embedding-only baseline. It combines:
- semantic similarity between mask embeddings
- area compatibility
- SAM score compatibility
- unary plausibility scoring for individual masks
- coordinate-ascent optimization over per-image candidate selections
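To make the coordinate-ascent idea concrete, here is a minimal sketch that selects one candidate per image by maximising unary scores plus pairwise cosine similarity. It is not the code in backend/core/graph_solver.py: the objective, the function names, and the initialisation are assumptions, and the area and SAM score compatibility terms are left out for brevity.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def objective(embeddings, unary, selection: List[int]) -> float:
    """Unary scores of the selected masks plus similarity over all image pairs."""
    total = sum(unary[i][k] for i, k in enumerate(selection))
    for i in range(len(selection)):
        for j in range(i + 1, len(selection)):
            total += cosine(embeddings[i][selection[i]], embeddings[j][selection[j]])
    return total


def coordinate_ascent(embeddings, unary, max_sweeps: int = 10) -> List[int]:
    """Re-optimise one image's choice at a time until no choice changes."""
    # Initialise each image with its best unary candidate (an assumption).
    selection = [max(range(len(u)), key=u.__getitem__) for u in unary]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(selection)):
            best_k = selection[i]
            best_val = objective(embeddings, unary, selection)
            for k in range(len(unary[i])):
                trial = selection[:i] + [k] + selection[i + 1:]
                val = objective(embeddings, unary, trial)
                if val > best_val + 1e-12:
                    best_k, best_val = k, val
            if best_k != selection[i]:
                selection[i] = best_k
                changed = True
        if not changed:
            break
    return selection
```

Each sweep can only increase the objective, so the loop terminates at a local optimum; restarting from several initial selections is a common way to reduce sensitivity to the starting point.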
The frontend is a static single-page interface in frontend/index.html with logic in frontend/assets/js/app.js.
It supports:
- drag-and-drop image upload
- image preview cards
- IoU threshold control
- NMS threshold control
- presence detection toggle
- pipeline step indicators
- result metrics and common object label
- mask overlay rendering for the selected consensus mask
The frontend resolves the backend URL in this order:
- ?api=<backend-url> query parameter
- saved localStorage value SAM_GC_API_BASE
- default http://localhost:8000
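The precedence above is implemented in the frontend JavaScript; the following Python sketch only mirrors the same order so it can be reasoned about or tested in isolation. The function name and the use of a plain mapping in place of localStorage are assumptions.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit

DEFAULT_API_BASE = "http://localhost:8000"
STORAGE_KEY = "SAM_GC_API_BASE"


def resolve_api_base(
    page_url: str, storage: Optional[Mapping[str, str]] = None
) -> str:
    """Apply the documented order: ?api= parameter, saved value, default."""
    query = parse_qs(urlsplit(page_url).query)
    # parse_qs drops blank values, so an empty ?api= falls through.
    from_query = query.get("api", [""])[0].strip()
    if from_query:
        return from_query
    saved = (storage or {}).get(STORAGE_KEY, "").strip()
    if saved:
        return saved
    return DEFAULT_API_BASE
```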
From the repository root:
cd backend
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
You can download the pretrained SAM weights from https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
cd backend
uvicorn app:app --reload --host 0.0.0.0 --port 8000
Open frontend/index.html in a browser.
If your browser blocks local file API usage or you prefer a local web server, you can serve the frontend folder with any simple static server.
Fallback mode is the easiest way to run the project. In this mode:
- SAM uses fallback mask generation
- DINOv2 uses fallback embeddings
- CLIP will try to load, then fall back if unavailable
Start normally:
cd backend
uvicorn app:app --reload --host 0.0.0.0 --port 8000
Use full model mode when you have the required model resources and want higher-fidelity results.
cd backend
$env:SAM_GC_FORCE_SAM="1"
$env:SAM_GC_USE_DINO="1"
$env:SAM_GC_USE_CLIP="1"
uvicorn app:app --reload --host 0.0.0.0 --port 8000
Notes:
- SAM checkpoint is expected at backend/models/sam_vit_h_4b8939.pth
- DINOv2 is loaded through Torch Hub
- CLIP is loaded through Hugging Face Transformers
- first-time model loading may require internet access and local cache creation
- CPU-only full mode may be slow
Example response from GET /health:
{
"status": "healthy"
}
Multipart form fields for POST /api/process-group:
- images: repeated image file input, minimum 3 files
- iou_threshold: float, default 0.88
- nms_threshold: float, default 0.45
- presence_detection: boolean, default true
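For scripted use without the browser UI, a request with these fields can be assembled using only the Python standard library. This is a sketch under assumptions: the helper names are hypothetical, the boolean is sent as the string true or false, and the response is assumed to be JSON.

```python
import json
import uuid
from typing import List, Tuple
from urllib.request import Request, urlopen

FileSpec = Tuple[str, bytes, str]  # (filename, raw bytes, MIME type)


def build_multipart(fields: dict, files: List[FileSpec]) -> Tuple[bytes, str]:
    """Encode form fields and image files as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    for filename, data, mime in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="images"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        ).encode()
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def process_group(
    api_base: str,
    files: List[FileSpec],
    iou_threshold: float = 0.88,
    nms_threshold: float = 0.45,
    presence_detection: bool = True,
) -> dict:
    """POST the image group to /api/process-group and return the decoded JSON."""
    fields = {
        "iou_threshold": iou_threshold,
        "nms_threshold": nms_threshold,
        "presence_detection": str(presence_detection).lower(),
    }
    body, content_type = build_multipart(fields, files)
    request = Request(
        f"{api_base}/api/process-group",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running locally, a call such as process_group("http://localhost:8000", files) with three or more (filename, bytes, MIME type) tuples would exercise the full pipeline.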
Typical response fields:
- status
- image_count
- winning_masks
- winning_indices
- consensus_scores
- global_affinity
- group_cohesion
- nodes_evaluated
- graph_solver_time
- masks_per_image
- similar_object_labels
- similar_object_confidences
- common_object_label
- thresholds
- process_time
- solver_name
- objective_value
- restarts
- valid_candidates_per_image
The IoU threshold control is passed to the backend as iou_threshold. Higher values keep fewer, stronger mask candidates.
The NMS threshold control is passed as nms_threshold. Lower values suppress overlapping masks more aggressively.
When presence detection is disabled, smaller masks are filtered more aggressively in the SAM wrapper.
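The effect of nms_threshold can be illustrated with a small greedy non-maximum suppression sketch. Whether the SAM wrapper compares masks or bounding boxes is not stated here, so the mask-IoU formulation, the pixel-set representation, and the function names below are assumptions chosen for clarity.

```python
from typing import FrozenSet, List, Tuple

Mask = FrozenSet[Tuple[int, int]]  # set of (row, col) pixels; illustrative only


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets (0 when both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def greedy_nms(masks: List[Mask], scores: List[float], nms_threshold: float) -> List[int]:
    """Visit masks by descending score; drop any whose IoU with a kept mask
    exceeds nms_threshold. Returns the indices of the kept masks."""
    order = sorted(range(len(masks)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        if all(mask_iou(masks[i], masks[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept
```

For two masks that overlap with IoU of about 0.33, a threshold of 0.45 keeps both, while a threshold of 0.2 drops the lower-scoring one, which is the sense in which lower values suppress more aggressively.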
If you run the backend on another machine or a tunnelled environment, open the frontend with:
frontend/index.html?api=http://your-backend-url:8000
Example:
http://127.0.0.1:5500/?api=https://example.ngrok-free.app
If you want to run the backend on Google Colab and use the frontend locally, this is the clearest flow:
Set the runtime to GPU if you want to use full SAM/DINOv2/CLIP mode with better performance.
Run this in a Colab cell:
!git clone https://github.com/vd-0711/EE655_CourseProject.git
%cd /content/EE655_CourseProject/backend
Install the dependencies plus pyngrok:
!pip install -r requirements.txt pyngrok
!wget -P /content/EE655_CourseProject/backend/models/ https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
Replace "YOUR_NGROK_TOKEN" with your own token from ngrok:
import os
from pyngrok import ngrok
# Get a free token at https://dashboard.ngrok.com/get-started/your-authtoken
os.environ["NGROK_AUTHTOKEN"] = "YOUR_NGROK_TOKEN"
ngrok.set_auth_token(os.environ["NGROK_AUTHTOKEN"])
os.environ["SAM_GC_FORCE_SAM"] = "1"
os.environ["SAM_GC_USE_DINO"] = "1"
os.environ["SAM_GC_USE_CLIP"] = "1"
public_url = ngrok.connect(8000).public_url
print("Public API URL:", public_url)
Copy the printed public_url. You will use it in the frontend later.
!uvicorn app:app --host 0.0.0.0 --port 8000
After Colab prints the public API URL:
- start your frontend locally
- open the frontend in your browser
- append ?api= to the frontend URL
- paste the Colab/ngrok public URL after it
Example:
http://127.0.0.1:5500/?api=https://abc123.ngrok-free.app
This tells the local frontend to send requests to the backend running in Colab instead of http://localhost:8000.
- no automated tests are included yet
- fallback mode is convenient for demos, but not equivalent to full-model quality
- full-model setup is only partially automated because weights and first-run downloads are external
- the solver is a practical mask-aware consensus implementation, not a fully productionized research pipeline
- add unit and integration tests for the API
- add startup logging that clearly reports active runtime mode
- add a reproducible model setup script
- improve failure-state messaging in the frontend
- add result export for masks and metrics
cd backend
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
uvicorn app:app --reload --host 0.0.0.0 --port 8000
Then open the frontend, upload at least 3 images, and run the pipeline.