Release exact train/validation/test split manifests for reproducibility

Hi SkillOpt team,

Thank you for releasing SkillOpt. I would like to request the exact benchmark data artifacts used in the paper, especially the train/validation/test split manifests or stable sample IDs.

## Motivation

The code and paper make the high-level experimental protocol fairly clear. However, without the exact splits, the reported numbers cannot be reproduced directly. Readers can still run a protocol-level reproduction on reconstructed splits, but those results are not directly comparable to the individual numbers reported in the paper's tables.

## SearchQA example

Using SearchQA as a concrete example:

- One commonly used public Hugging Face version, `lucadiliello/searchqa`, contains 117,384 training examples and 16,980 validation examples, with no separate public test split.
- In `configs/searchqa/default.yaml`, SkillOpt sets `train_size: 400`, `split_mode: split_dir`, `split_ratio: "2:1:7"`, and `split_dir: data/searchqa_split`.
- From this, readers can infer an intended 400/200/1400 train/selection/test split (2,000 examples in total), but cannot know:
  - which raw SearchQA examples were selected;
  - whether the examples came from the upstream train split, validation split, or a merged pool;
  - what random seed, filtering, or preprocessing was used;
  - whether empty or invalid examples were removed;
  - whether the exact same split was fixed across all target models and baselines.
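For reference, the 400/200/1400 figure above comes from the following arithmetic. This is only a sketch of how I read the config, assuming `train_size` fixes the first component of `split_ratio` and the other two scale proportionally; the helper name `split_sizes` is hypothetical and SkillOpt's loader may compute the sizes differently.

```python
def split_sizes(train_size: int, split_ratio: str) -> tuple[int, int, int]:
    """Scale a "train:selection:test" ratio so the train part equals train_size.

    Assumption: this is one reading of the config, not SkillOpt's actual logic.
    """
    train_part, sel_part, test_part = (int(p) for p in split_ratio.split(":"))
    # Refuse ratios that would produce fractional split sizes.
    if (train_size * sel_part) % train_part or (train_size * test_part) % train_part:
        raise ValueError("split_ratio does not divide train_size evenly")
    selection_size = train_size * sel_part // train_part
    test_size = train_size * test_part // train_part
    return train_size, selection_size, test_size


# Values taken from configs/searchqa/default.yaml as quoted above.
print(split_sizes(400, "2:1:7"))  # (400, 200, 1400)
```

Even if this reading is right, it only fixes the sizes. It says nothing about which 2,000 examples were drawn from the 134,364 available in the public train and validation splits.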

## Requested artifacts

Would it be possible to release one of the following?

1. The exact `data/searchqa_split/` directory used for the paper.
2. A split manifest with stable sample IDs, upstream split names, original indices, and the preprocessing script.
3. Prepared dataset repositories, for example on Hugging Face, containing the exact train/validation/test splits.
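To make option 2 concrete, here is a rough sketch of what a manifest could look like, written as JSON Lines with one row per example. All field names (`sample_id`, `split`, `upstream_split`, `original_index`) and the helper functions are hypothetical suggestions, not an existing SkillOpt format; the content hash is just one possible way to get an ID that survives reordering of the upstream dataset.

```python
import hashlib
import json


def stable_id(question: str, answer: str) -> str:
    """Derive a content-based ID so it does not depend on row order."""
    payload = json.dumps(
        {"question": question, "answer": answer},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def manifest_row(split: str, upstream_split: str, original_index: int,
                 question: str, answer: str) -> dict:
    """Build one manifest entry (hypothetical schema)."""
    return {
        "sample_id": stable_id(question, answer),
        "split": split,                    # train / selection / test
        "upstream_split": upstream_split,  # e.g. upstream train or validation
        "original_index": original_index,  # row index in the upstream split
    }


def to_jsonl(rows: list[dict]) -> str:
    """Serialize manifest rows as JSON Lines text."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows) + "\n"


# Illustrative placeholder example, not real SearchQA data.
rows = [manifest_row("train", "train", 0, "example question", "example answer")]
print(to_jsonl(rows), end="")
```

A file like this would be small enough to commit to the repository, and together with the preprocessing script it would let readers rebuild the exact splits from the public upstream data.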

It would also be very helpful to release the same exact split manifests for the other benchmarks (`SpreadsheetBench`, `OfficeQA`, `DocVQA`, `LiveMathematicianBench`, and `ALFWorld`), since several configs point to local `data/..._split` or `data/ablation_splits/...` paths that are not included in the repository.

This is related to #13, which asks about the SearchQA split configuration, and #10, which asks for releasing benchmark datasets/artifacts. My request is specifically about the exact split manifests or sample IDs needed to reproduce the paper's reported results.

Thanks again for the work.

