Add VANTAGE + TAR reproduction kit for cosmos3 reasoner (vlmevalkit) #245

Open
kding1 wants to merge 1 commit into NVIDIA:main from kding1:vlmevalkit-vantage-and-tar

kding1 commented Jun 28, 2026

Summary

Adds evaluation/cosmos3/reasoner/vlmevalkit/ — an NVIDIA-customized build of VLMEvalKit that makes the Cosmos3 Reasoner scores on the NVIDIA Metropolis "smart infra" benchmarks publicly reproducible. The reproduction logic lives in cosmos_eval/; the rest of the tree is the stock VLMEvalKit inference + scoring engine it drives.

Benchmarks shipped (9 total):

  • TAR — Traffic Anomaly Reasoning (AETCBench_all)
  • 8 × VANTAGE: VANTAGE_VQA, VANTAGE_DVC, VANTAGE_Temporal,
    VANTAGE_EventVerification, VANTAGE_2DGrounding, VANTAGE_2DPointing,
    VANTAGE_SOT, VANTAGE_Astro2D

What's included

cosmos_eval/: the reproduction kit (standard-library-only Python; it adds no evaluation logic of its own):

  • data/<domain>/<bench>.json: per-benchmark dataset config (model-independent)
  • models/<family>.json: per-family model layer (cosmos)
  • manifest.json: per-benchmark run.py flags
  • run_all.py: parallel launcher (compose → run.py → score)
  • parse_score.py: standalone score reporter whose results match those of the internal scoring pipeline
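To illustrate how a dataset layer, a model-family layer, and the manifest might combine in the compose step, here is a minimal Python sketch. The schemas are assumptions throughout. The layer contents, the top-level "model"/"data" layout of the composed file, and the --config / --work-dir / --mode flags passed to run.py are hypothetical placeholders, not the kit's actual format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layer contents: the real schemas of data/<domain>/<bench>.json,
# models/<family>.json and manifest.json are not shown in this PR.
DATASET_LAYER = {"VANTAGE_VQA": {"dataset": "VANTAGE_VQA"}}
MODEL_LAYER = {"cosmos": {"model": "cosmos", "temperature": 0.0}}
MANIFEST = {"VANTAGE_VQA": ["--mode", "all"]}  # hypothetical per-benchmark flags


def compose(bench: str, family: str) -> dict:
    """Pair one model-independent dataset layer with one model-family layer.

    Assumes a composed config with top-level "model" and "data" sections.
    """
    return {
        "model": {family: MODEL_LAYER[family]},
        "data": {bench: DATASET_LAYER[bench]},
    }


def write_and_build_cmd(bench: str, family: str, out_dir: Path) -> list[str]:
    """Write the composed config to disk and return a run.py argv for it."""
    cfg_path = out_dir / f"{family}_{bench}.json"
    cfg_path.write_text(json.dumps(compose(bench, family), indent=2))
    # --config / --work-dir are illustrative; check run.py for the real flags.
    return ["python", "run.py", "--config", str(cfg_path),
            "--work-dir", str(out_dir), *MANIFEST.get(bench, [])]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_and_build_cmd("VANTAGE_VQA", "cosmos", Path(tmp)))
```

Because the dataset layer is model-independent, adding another model family should only require a new models/<family>.json. None of the benchmark configs would need to change.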

The PR also includes the VLMEvalKit engine (run.py, vlmeval/, …); the upstream toolkit README is preserved as README-vlmevalkit.md.

Reproducing scores

Deploy the model behind an OpenAI-compatible endpoint, set the COSMOS_* env vars, then:

python cosmos_eval/run_all.py --model cosmos --concurrency 8 --work-dir ./out
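The following rough sketch shows the launcher pattern behind this command. It assumes that --concurrency bounds how many benchmark pipelines run at once, although the PR does not say whether it instead limits requests to the endpoint. Each pipeline runs its compose → run.py → score stages in order. The stage commands here are hypothetical placeholders that only echo, standing in for the real subprocess calls.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def stage_commands(bench: str) -> list[list[str]]:
    """Hypothetical stage commands for one benchmark.

    Placeholders: a real launcher would compose a config, call
    `python run.py ...`, then run the scorer. Here each stage just echoes.
    """
    return [
        [sys.executable, "-c", f"print('compose {bench}')"],
        [sys.executable, "-c", f"print('run.py {bench}')"],
        [sys.executable, "-c", f"print('score {bench}')"],
    ]


def run_pipeline(bench: str) -> tuple[str, list[str]]:
    """Run one benchmark's stages in order; a failing stage raises."""
    outputs = []
    for cmd in stage_commands(bench):
        done = subprocess.run(cmd, capture_output=True, text=True, check=True)
        outputs.append(done.stdout.strip())
    return bench, outputs


def run_all(benchmarks: list[str], concurrency: int) -> dict[str, list[str]]:
    """Run benchmark pipelines in parallel, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return dict(pool.map(run_pipeline, benchmarks))


if __name__ == "__main__":
    results = run_all(["AETCBench_all", "VANTAGE_VQA", "VANTAGE_SOT"], concurrency=8)
    for bench, lines in results.items():
        print(bench, lines)
```

Threads are sufficient here because the heavy work happens in child processes and on the model endpoint. With check=True, a failing stage raises CalledProcessError, and pool.map re-raises that error when the results are collected.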

Full setup instructions are in evaluation/cosmos3/reasoner/vlmevalkit/README.md.
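Every run depends on the COSMOS_* environment variables, so a launcher could check them before starting any benchmark. A minimal sketch follows. The exact variable names are documented in the README and are not listed in this PR, so the function only checks the COSMOS_ prefix. Any concrete name used below is hypothetical.

```python
import os
from collections.abc import Mapping


def cosmos_env(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect COSMOS_* variables; fail early if none are set or any is empty."""
    found = {k: v for k, v in environ.items() if k.startswith("COSMOS_")}
    if not found:
        raise RuntimeError("No COSMOS_* environment variables are set; "
                           "see the README for the required ones.")
    empty = sorted(k for k, v in found.items() if not v.strip())
    if empty:
        raise RuntimeError(f"Empty COSMOS_* variables: {', '.join(empty)}")
    return found
```

For example, cosmos_env({"COSMOS_API_BASE": "http://localhost:8000/v1", "PATH": "/usr/bin"}) returns only the COSMOS_API_BASE entry, where COSMOS_API_BASE is a hypothetical name.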
