Release exact train/validation/test split manifests for reproducibility #14

@LifeIsSoSolong

Hi SkillOpt team,

Thank you for releasing SkillOpt. I would like to request the exact benchmark data artifacts used in the paper, especially the train/validation/test split manifests or stable sample IDs.

Motivation

The code and paper make the high-level experimental protocol fairly clear. However, without the exact splits, the reported numbers cannot be reproduced directly. Readers can still run a protocol-level reproduction on reconstructed splits, but those results may not be directly comparable to the individual numbers reported in the paper's tables.

SearchQA example

Using SearchQA as a concrete example:

  • One commonly used public Hugging Face version, lucadiliello/searchqa, contains 117,384 training examples and 16,980 validation examples, with no separate public test split.
  • In configs/searchqa/default.yaml, SkillOpt sets train_size: 400, split_mode: split_dir, split_ratio: "2:1:7", and split_dir: data/searchqa_split.
  • From this, readers can infer an intended 400/200/1400 train/selection/test split, but cannot know:
    • which raw SearchQA examples were selected;
    • whether the examples came from the upstream train split, validation split, or a merged pool;
    • what random seed, filtering, or preprocessing was used;
    • whether empty or invalid examples were removed;
    • whether the exact same split was fixed across all target models and baselines.
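The 400/200/1400 inference above comes from scaling the 2:1:7 ratio so that its first part equals train_size. A minimal sketch of that arithmetic follows. It assumes SkillOpt treats train_size as the size of the first ratio component; this is my reading of the config, not behavior confirmed in the repository, and split_sizes is a hypothetical helper name.

```python
def split_sizes(train_size: int, ratio: str) -> tuple:
    """Scale an "a:b:c" ratio so that its first part equals train_size.

    Assumption (not confirmed by SkillOpt): train_size fixes the size of
    the first ratio component, and the other parts scale proportionally.
    """
    parts = [int(p) for p in ratio.split(":")]
    if train_size % parts[0] != 0:
        raise ValueError("train_size is not a multiple of the first ratio part")
    unit = train_size // parts[0]  # examples per ratio unit
    return tuple(p * unit for p in parts)


print(split_sizes(400, "2:1:7"))  # (400, 200, 1400)
```

Under this reading the three splits total 2,000 examples, which is why the choice of upstream pool and sampling seed matters so much for reproducing the exact cells.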

Requested artifacts

Would it be possible to release one of the following?

  1. The exact data/searchqa_split/ directory used for the paper.
  2. A split manifest with stable sample IDs, upstream split names, original indices, and the preprocessing script.
  3. Prepared dataset repositories, for example on Hugging Face, containing the exact train/validation/test splits.
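To make option 2 concrete, here is one possible shape for such a manifest, sketched under assumptions: the field names (sample_id, split, upstream_split, upstream_index), the SHA-256 content hash used as a stable ID, and the seeded shuffle of a merged pool are all hypothetical choices for illustration, not SkillOpt's actual procedure. The point is that each record pins an example to its upstream source and index, so anyone can rebuild the exact split from the public dataset.

```python
import hashlib
import json
import random


def stable_id(question: str, answer: str) -> str:
    """Hypothetical content-derived ID; survives re-ordering of the upstream data.

    Note: it depends on the exact text after preprocessing.
    """
    digest = hashlib.sha256(f"{question}\x1f{answer}".encode("utf-8")).hexdigest()
    return digest[:16]


def build_manifest(pool, sizes, seed):
    """Assign pool examples to splits with a seeded shuffle.

    pool:  list of dicts with keys question, answer, upstream_split, upstream_index
    sizes: dict of split name -> count, consumed in insertion order
    """
    if sum(sizes.values()) > len(pool):
        raise ValueError("pool is smaller than the requested splits")
    order = list(range(len(pool)))
    random.Random(seed).shuffle(order)  # deterministic for a fixed seed
    manifest, cursor = [], 0
    for split_name, n in sizes.items():
        for i in order[cursor:cursor + n]:
            ex = pool[i]
            manifest.append({
                "sample_id": stable_id(ex["question"], ex["answer"]),
                "split": split_name,
                "upstream_split": ex["upstream_split"],
                "upstream_index": ex["upstream_index"],
            })
        cursor += n
    return manifest


def write_jsonl(rows, path):
    """Write one JSON object per line, with sorted keys for stable diffs."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, sort_keys=True) + "\n")


# Toy pool standing in for a merged SearchQA train + validation pool.
pool = [
    {"question": f"q{i}", "answer": f"a{i}",
     "upstream_split": "train" if i < 15 else "validation", "upstream_index": i}
    for i in range(20)
]
manifest = build_manifest(pool, {"train": 4, "validation": 2, "test": 14}, seed=0)
```

Calling write_jsonl(manifest, "manifest.jsonl") on the real pool would yield a small, diffable file that answers every question listed in the SearchQA example: which examples, from which upstream split, under which seed.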

It would also be very helpful to release exact split manifests for the other benchmarks as well (SpreadsheetBench, OfficeQA, DocVQA, LiveMathematicianBench, and ALFWorld), since several configs point to local data/..._split or data/ablation_splits/... paths that are not included in the repository.

This is related to #13, which asks about the SearchQA split configuration, and #10, which asks for releasing benchmark datasets/artifacts. My request is specifically about the exact split manifests or sample IDs needed to reproduce the paper's reported results.

Thanks again for the work.