Hi SkillOpt team,
Thank you for releasing SkillOpt. I would like to request the exact benchmark data artifacts used in the paper, especially the train/validation/test split manifests or stable sample IDs.
Motivation
The code and paper make the high-level experimental protocol fairly clear. However, without the exact splits, it is difficult to reproduce the reported numbers directly. Readers can still run a protocol-level reproduction on reconstructed splits, but the results may not be comparable to the paper's reported cells.
SearchQA example
Using SearchQA as a concrete example:
- One commonly used public Hugging Face version, lucadiliello/searchqa, contains 117,384 training examples and 16,980 validation examples, with no separate public test split.
- In configs/searchqa/default.yaml, SkillOpt sets train_size: 400, split_mode: split_dir, split_ratio: "2:1:7", and split_dir: data/searchqa_split.
- From this, readers can infer an intended 400/200/1400 train/selection/test split, but cannot know:
  - which raw SearchQA examples were selected;
  - whether the examples came from the upstream train split, the upstream validation split, or a merged pool;
  - what random seed, filtering, or preprocessing was used;
  - whether empty or invalid examples were removed;
  - whether the exact same split was fixed across all target models and baselines.
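To spell out that inference: I am assuming that train_size fixes the first part of split_ratio and the other parts scale proportionally. This is my reading of the config, not something the repository documents. Under that assumption the split sizes work out as follows, which also implies a pool of 2,000 examples in total. The helper name is hypothetical.

```python
from fractions import Fraction


def infer_split_sizes(train_size: int, split_ratio: str) -> tuple[int, ...]:
    """Scale an "a:b:c" ratio so that its first part equals train_size.

    Assumption (not confirmed by SkillOpt): train_size fixes the first
    component of split_ratio and the remaining parts scale proportionally.
    """
    parts = [int(p) for p in split_ratio.split(":")]
    scale = Fraction(train_size, parts[0])
    sizes = []
    for p in parts:
        size = scale * p
        # Refuse ratios that would require fractional example counts.
        if size.denominator != 1:
            raise ValueError(f"{split_ratio} does not scale to whole counts from {train_size}")
        sizes.append(int(size))
    return tuple(sizes)


print(infer_split_sizes(400, "2:1:7"))  # (400, 200, 1400)
```

The arithmetic only recovers the sizes; it says nothing about which examples went into each split, which is the part that cannot be reconstructed.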
Requested artifacts
Would it be possible to release one of the following?
- The exact data/searchqa_split/ directory used for the paper.
- A split manifest with stable sample IDs, upstream split names, original indices, and the preprocessing script.
- Prepared dataset repositories, for example on Hugging Face, containing the exact train/validation/test splits.
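For the manifest option, something along the following lines would already be enough. This is only a hypothetical sketch: the field names, the content-hash ID scheme, and the seeded shuffle-then-slice assignment are my own assumptions for illustration, not SkillOpt's actual procedure.

```python
import hashlib
import json
import random

# Hypothetical manifest schema; field names are illustrative only.


def stable_id(question: str, answer: str) -> str:
    """Content hash, so the ID survives upstream re-exports or reordering."""
    payload = json.dumps([question, answer], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def build_manifest(pool, sizes, seed):
    """pool: list of (upstream_split, upstream_index, question, answer).
    sizes: dict mapping assigned split name -> count, in assignment order."""
    if sum(sizes.values()) > len(pool):
        raise ValueError("pool is smaller than the requested splits")
    order = list(range(len(pool)))
    random.Random(seed).shuffle(order)  # seeded, so the split is reproducible
    rows, cursor = [], 0
    for split_name, count in sizes.items():
        for i in order[cursor:cursor + count]:
            upstream_split, upstream_index, question, answer = pool[i]
            rows.append({
                "sample_id": stable_id(question, answer),
                "assigned_split": split_name,
                "upstream_split": upstream_split,
                "upstream_index": upstream_index,
                "seed": seed,
            })
        cursor += count
    return rows


# Toy pool standing in for the real SearchQA examples.
toy_pool = [("train", i, f"question {i}", f"answer {i}") for i in range(20)]
manifest = build_manifest(toy_pool, {"train": 4, "selection": 2, "test": 14}, seed=0)
for row in manifest[:3]:
    print(json.dumps(row))
```

Because sample_id is derived from content, the manifest would stay usable even if the Hugging Face copy is re-exported in a different order, while upstream_split and upstream_index still allow a direct lookup.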
It would also be very helpful to release the same exact split manifests for the other benchmarks (SpreadsheetBench, OfficeQA, DocVQA, LiveMathematicianBench, and ALFWorld), since several configs point to local data/..._split or data/ablation_splits/... paths that are not included in the repository.
This is related to #13, which asks about the SearchQA split configuration, and #10, which asks for releasing benchmark datasets/artifacts. My request is specifically about the exact split manifests or sample IDs needed to reproduce the paper's reported results.
Thanks again for the work.