Reproducing navtest PDMS for the released AutoVLA_PDMS_89 checkpoint — we get 83.69 vs reported 89.11 #48

@yux11180821

Hi, thanks for releasing the code and checkpoints. We're reproducing the NAVSIM navtest results and would appreciate clarification on the eval pipeline.

Using your released AutoVLA_PDMS_89.ckpt with the navsim PDM scoring in this repo (v1 PDMS = NC·DAC·(5·TTC+5·EP+2·C)/12, DDC weight 0), we get PDMS = 83.69 on the full navtest split (12,146 scenes) rather than the reported 89.11. The gap is concentrated in the simulation/map-dependent metrics, while the geometry-only metrics closely match Table 1:

- NC 97.8 (paper 98.4), Comfort 99.8 (paper 99.9): match
- DAC 92.0 (paper 95.6), EP 78.0 (paper 81.9), TTC 93.7 (paper 98.0): ~4 pts lower each

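For reference, here is a minimal sketch of the per-scene v1 formula quoted above, together with the per-metric gaps. It is illustrative only and is not the repo's scoring code: the function name `pdms_v1` and the dictionary layout are hypothetical, and the inputs are simply the split-level numbers listed above. As far as we understand, the reported PDMS is the mean of per-scene scores, so plugging split-level means into the formula only gives a rough value and will not reproduce 83.69 or 89.11 exactly.

```python
def pdms_v1(nc, dac, ttc, ep, comfort):
    """Per-scene v1 PDMS: NC * DAC * (5*TTC + 5*EP + 2*C) / 12, inputs in [0, 1]."""
    return nc * dac * (5.0 * ttc + 5.0 * ep + 2.0 * comfort) / 12.0


# Split-level sub-scores in percent, copied from the comparison above.
ours = {"NC": 97.8, "DAC": 92.0, "TTC": 93.7, "EP": 78.0, "C": 99.8}
paper = {"NC": 98.4, "DAC": 95.6, "TTC": 98.0, "EP": 81.9, "C": 99.9}

# Per-metric gap (ours minus paper), rounded to one decimal place.
deltas = {name: round(ours[name] - paper[name], 1) for name in ours}

if __name__ == "__main__":
    for name, gap in deltas.items():
        print(f"{name}: {gap:+.1f}")
```
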
Since the checkpoint/trajectories are deterministic, this points to the scoring inputs. Could you share:

1. The exact navsim commit and nuplan-devkit version used for the navtest 89.11 result?
2. How the navtest metric_cache was built (the metric_caching command/config), or could you release the metric_cache you used?
3. Separately: the paper states SFT LR = 1e-5, but the released config uses 2e-5. Which one produced the 80.54 SFT number?

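To make the version comparison easier on both sides, something like the following could be run in each environment. This is only a sketch: the helper name `describe_env` is hypothetical, and the distribution names `nuplan-devkit` and `navsim` are assumptions about how the packages are installed (a missing name is reported as `None`).

```python
import subprocess
from importlib import metadata


def describe_env(repo_dir=".", packages=("nuplan-devkit", "navsim")):
    """Collect the git commit of repo_dir and the installed versions of packages."""
    info = {}
    try:
        # `git -C <dir> rev-parse HEAD` prints the checked-out commit hash.
        info["commit"] = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        info["commit"] = None  # not a git checkout, or git is not installed
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed under this (assumed) name
    return info


if __name__ == "__main__":
    print(describe_env())
```
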
Thanks!
