Hi, thanks for releasing the code and checkpoints. We're reproducing the NAVSIM navtest results and would appreciate clarification on the eval pipeline.
Using your released AutoVLA_PDMS_89.ckpt with the navsim PDM scoring in this repo (v1 PDMS = NC·DAC·(5·TTC + 5·EP + 2·C)/12, DDC weight 0), we get PDMS = 83.69 on the full navtest split (12,146 scenes), not 89.11. The gap is concentrated in the simulation/map-dependent metrics, while the geometry-only metrics agree with Table 1 to within 0.6 points:
NC 97.8 (paper 98.4), Comfort 99.8 (paper 99.9): match to within 0.6 and 0.1 points
DAC 92.0 (paper 95.6), EP 78.0 (paper 81.9), TTC 93.7 (paper 98.0): 3.6, 3.9 and 4.3 points lower, respectively
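For reference, this is how we read the v1 aggregation quoted above. It is a minimal sketch, not the repo's scoring code: `pdms_v1` and `split_pdms` are hypothetical names of ours, sub-scores are assumed to be in [0, 1], and we assume the split-level PDMS is the mean of per-scene scores.

```python
from statistics import mean


def pdms_v1(nc, dac, ttc, ep, comfort):
    """Per-scene v1 PDMS: NC * DAC * (5*TTC + 5*EP + 2*C) / 12, inputs in [0, 1]."""
    return nc * dac * (5 * ttc + 5 * ep + 2 * comfort) / 12


def split_pdms(scenes):
    """Mean of per-scene scores (our assumption about how the split is aggregated)."""
    return mean(pdms_v1(**s) for s in scenes)


# Two made-up scenes: one perfect, one that fails DAC, TTC and EP together.
scenes = [
    dict(nc=1.0, dac=1.0, ttc=1.0, ep=1.0, comfort=1.0),
    dict(nc=1.0, dac=0.0, ttc=0.0, ep=0.0, comfort=1.0),
]

# Mean of the per-scene products.
per_scene_mean = split_pdms(scenes)

# Product formula applied to the per-metric means of the same two scenes.
metric_means = {k: mean(s[k] for s in scenes) for k in scenes[0]}
from_means = pdms_v1(**metric_means)

# Same formula applied to the split-level means listed above (rough check only).
ours_from_means = pdms_v1(nc=0.978, dac=0.920, ttc=0.937, ep=0.780, comfort=0.998)
paper_from_means = pdms_v1(nc=0.984, dac=0.956, ttc=0.980, ep=0.819, comfort=0.999)
```

On the toy scenes, `per_scene_mean` is 0.5 while `from_means` is about 0.29, so applying the formula to split-level means understates the score when failures are correlated across metrics. For the same reason, `ours_from_means` (about 0.79) and `paper_from_means` (about 0.86) are not expected to reproduce 83.69 or 89.11. They only show that the sub-metric differences are large enough to account for a gap of this size.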
Since the checkpoint and its output trajectories are deterministic, this points to the scoring inputs rather than the model. Could you share:
The exact navsim commit and nuplan-devkit version used to produce the navtest PDMS of 89.11?
How the navtest metric_cache was built (the metric_caching command/config), or could you release the metric_cache you used?
(Separately) The paper states an SFT learning rate of 1e-5, but the released config uses 2e-5. Which one produced the 80.54 SFT number?
Thanks!