Add VANTAGE + TAR reproduction kit for cosmos3 reasoner (vlmevalkit) by kding1 · Pull Request #245 · NVIDIA/cosmos

kding1 · 2026-06-28T22:09:21Z

Summary

Adds evaluation/cosmos3/reasoner/vlmevalkit/ — an NVIDIA-customized build of VLMEvalKit that makes the Cosmos3 Reasoner scores on the NVIDIA Metropolis "smart infra" benchmarks publicly reproducible. The reproduction logic lives in cosmos_eval/; the rest of the tree is the stock VLMEvalKit inference + scoring engine it drives.

Benchmarks shipped (9 total):

TAR — Traffic Anomaly Reasoning (AETCBench_all)
8 × VANTAGE: VANTAGE_VQA, VANTAGE_DVC, VANTAGE_Temporal, VANTAGE_EventVerification, VANTAGE_2DGrounding, VANTAGE_2DPointing, VANTAGE_SOT, VANTAGE_Astro2D

What's included

cosmos_eval/ — the reproduction kit (stdlib-only, no extra eval logic):

data/<domain>/<bench>.json — per-benchmark dataset config (model-independent)
models/<family>.json — per-family model layer (cosmos)
manifest.json — per-benchmark run.py flags
run_all.py — parallel launcher (compose → run.py → score)
parse_score.py — standalone score reporter that matches the internal pipeline

Plus the VLMEvalKit engine (run.py, vlmeval/, …); the upstream toolkit README is preserved as README-vlmevalkit.md.
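To make the compose → run.py → score flow concrete, here is a minimal stdlib-only sketch of how a launcher of this shape could work. It is not the code in run_all.py: the merge rule (model layer overrides dataset config), the manifest layout (benchmark name mapped to a dict of flags), and the helper names `compose`, `run_py_args`, and `run_all` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def compose(dataset_cfg, model_cfg):
    """Overlay a model layer on a dataset config (shallow merge, model keys win).

    Assumption: the real kit may merge differently; this only illustrates layering.
    """
    merged = dict(dataset_cfg)
    merged.update(model_cfg)
    return merged


def run_py_args(bench, flags):
    """Turn one hypothetical manifest entry into an argument list for run.py."""
    args = ["run.py", "--data", bench]
    for name, value in flags.items():
        args += [f"--{name}", str(value)]
    return args


def run_all(manifest, runner, concurrency=8):
    """Run every benchmark in the manifest through `runner`, in parallel.

    `runner` receives the argument list; a real launcher would pass it to
    subprocess.run and then score the output. Results are keyed by benchmark.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {
            bench: pool.submit(runner, run_py_args(bench, flags))
            for bench, flags in manifest.items()
        }
        return {bench: fut.result() for bench, fut in futures.items()}
```

Passing the runner in as a callable keeps the sketch testable without a deployed model; swapping in a function that shells out to run.py would turn it into an actual launcher.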

Reproducing scores

Deploy the model behind an OpenAI-compatible endpoint, set the COSMOS_* env vars, then:

python cosmos_eval/run_all.py --model cosmos --concurrency 8 --work-dir ./out

Full setup in evaluation/cosmos3/reasoner/vlmevalkit/README.md.
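A small pre-flight wrapper can catch a missing environment before any benchmark starts. The sketch below only checks that at least one COSMOS_-prefixed variable is set and then runs the command shown above; the exact variable names are defined in the README and are deliberately not guessed here. The helper names `cosmos_env`, `build_command`, and `launch` are hypothetical.

```python
import os
import subprocess
import sys


def cosmos_env(environ):
    """Return the variables in `environ` whose names start with COSMOS_."""
    return {k: v for k, v in environ.items() if k.startswith("COSMOS_")}


def build_command(model="cosmos", concurrency=8, work_dir="./out"):
    """Build the run_all.py invocation from the PR description."""
    return [
        sys.executable,
        "cosmos_eval/run_all.py",
        "--model", model,
        "--concurrency", str(concurrency),
        "--work-dir", work_dir,
    ]


def launch(environ=None):
    """Refuse to start if no COSMOS_* variable is set, else run the launcher."""
    environ = os.environ if environ is None else environ
    found = cosmos_env(environ)
    if not found:
        raise RuntimeError("no COSMOS_* variables set; see the README for setup")
    # Print names only, so endpoint keys do not end up in logs.
    print("Using:", ", ".join(sorted(found)))
    return subprocess.run(build_command(), check=True, env=dict(environ))
```

Calling `launch()` from the vlmevalkit directory would then be equivalent to running the command by hand, with the added environment check.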

Signed-off-by: Ke Ding <keding@nvidia.com>

Commit b9ee576: Add VANTAGE + TAR reproduction kit for cosmos3 reasoner (vlmevalkit)

kding1 wants to merge 1 commit into NVIDIA:main from kding1:vlmevalkit-vantage-and-tar
