Inquiry regarding Table 2 Reproduction and Evaluation Prompts #23

Hello,

I am currently working on reproducing the results presented in Table 2 of the DriveBench paper. I have two specific questions regarding the experimental setup:

1. Data Specification for Inference

Could you clarify whether the inference results for all models were obtained using drivebench-test.json or drivebench-test-final.json?

Additionally, I would appreciate it if you could explain the motivation behind adding the test-final version, particularly for handling single-image cases.

2. Evaluation Prompt Consistency

The PERCEPTION_VQA_PROMPT in the repository appears to differ from the version shown in Figure 23 of the paper. Could you please confirm which version is correct?

Since the paper mentions various prompt types (e.g., rubric-aware, context-aware), could you specify which evaluation prompt was used to generate the results in Table 2?

Thank you for your time and for sharing this valuable research.
