orailix/ride

RIDE: An Open Dataset and Benchmark for Train Delay Prediction

RIDE is an open dataset and benchmark for train delay prediction over the Belgian railway network. It provides a reusable relational data release, model-ready benchmark datasets with shared train/test splits, and a common evaluation protocol for comparing train delay prediction models.

Example railway network snapshot from RIDE

Overview

RIDE is organized around four components:

  • Silver: a reusable relational dataset over train events, journeys, railway infrastructure, and weather observations.
  • Gold: a shared benchmark core with fixed train/test snapshots, prediction instances, target values, and a test evaluation table. Built on this shared core, Gold provides four model-ready datasets for downstream models: tabular, sequential, GNN, and graph-event. The Gold release is available in Lite and Standard tiers.
  • Evaluation protocol: all models are evaluated on the same prediction instances from the Gold core, using unified metrics (MAE and RMSE) together with breakdowns by prediction horizon and by delay change.
  • Benchmark: a comparison, under this evaluation protocol, of non-learning methods (a Translation baseline and a Graph-event model), a statistical learning method (XGBoost), and deep learning models (MLP, LSTM, Transformer, and GNN).
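As a rough illustration of the evaluation protocol, the sketch below computes MAE and RMSE over aligned prediction instances, plus the same two metrics per group, which is one way to produce horizon and delay-change breakdowns. It is not the repository's evaluation code: the function names, the 10-minute horizon bucket, and the definition of "delay change" used here (target delay compared with the last observed delay) are assumptions made for illustration, and the data is a toy example rather than RIDE data.

```python
import math
from collections import defaultdict


def mae(errors):
    """Mean absolute error, in the unit of the inputs (seconds here)."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def evaluate(y_true, y_pred, groups):
    """Overall MAE/RMSE plus (MAE, RMSE) per group label.

    y_true, y_pred: delays (s) for the same, aligned prediction instances.
    groups: one label per instance, e.g. a horizon bucket or a delay-change
    category (hypothetical labels chosen by the caller).
    """
    if not y_true or not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must be non-empty and aligned")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    buckets = defaultdict(list)
    for e, g in zip(errors, groups):
        buckets[g].append(e)
    return {
        "MAE": mae(errors),
        "RMSE": rmse(errors),
        "by_group": {g: (mae(es), rmse(es)) for g, es in buckets.items()},
    }


# Toy data (not from RIDE): target delays, predictions, prediction horizons,
# and the last observed delay at prediction time, all in seconds.
truth = [60, 120, 0, 300]
pred = [90, 100, 0, 240]
horizon_s = [300, 420, 900, 1500]
observed = [60, 60, 0, 120]

# Hypothetical groupings: a 10-minute horizon split, and whether the delay
# changed between the last observation and the target.
by_horizon = ["<10 min" if h < 600 else ">=10 min" for h in horizon_s]
by_change = ["unchanged" if t == o else "changed" for t, o in zip(truth, observed)]

print(evaluate(truth, pred, by_horizon)["MAE"])  # 27.5
print(evaluate(truth, pred, by_change)["by_group"]["changed"][0])  # 40.0
```

Passing the grouping as a plain list of labels keeps the metric code independent of how horizons or delay changes are bucketed, so other breakdowns can reuse the same function.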

RIDE Dataset Releases

Asset          Description
Silver         Reusable relational dataset for downstream dataset construction.
Gold Lite      Smaller benchmark tier for fast experimentation.
Gold Standard  Full benchmark tier used for the main paper results.

Repository Structure

Path         Description
src/         Reusable Python code for source downloads, dataset construction, benchmark models, and evaluation utilities.
configs/     Dataset pipeline settings, selected benchmark model configurations, and Optuna search spaces.
manifests/   Executable Bronze and Silver table specifications: sources, outputs, transforms, checks, and field metadata.
scripts/     Command-line entry points for data download/build steps, benchmark training/evaluation, hyperparameter search, and figure generation.
docs/        Task-oriented guides for setup, repository structure, dataset download, extension, and paper reproducibility.
notebooks/   Interactive walkthroughs for inspecting Silver, understanding Gold, and running a benchmark training/evaluation flow.
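The manifests/ entries bundle sources, outputs, transforms, checks, and field metadata into one executable specification per table. The Python sketch below shows one hypothetical way such a specification could be made executable; the TableSpec class, its field names, the toy train_events table, and the station_present check are all illustrative assumptions, and the actual manifest format in manifests/ may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TableSpec:
    """Hypothetical, simplified stand-in for one table manifest."""
    name: str
    sources: list[str]  # upstream tables or raw inputs (illustrative)
    output: str  # where the built table would be written (illustrative)
    transform: Callable[[list[dict]], list[dict]]
    checks: list[Callable[[list[dict]], bool]] = field(default_factory=list)
    fields: dict[str, str] = field(default_factory=dict)  # column -> description

    def build(self, raw_rows):
        """Apply the transform, then run every check; fail loudly on any miss."""
        rows = self.transform(raw_rows)
        failed = [check.__name__ for check in self.checks if not check(rows)]
        if failed:
            raise ValueError(f"{self.name}: failed checks {failed}")
        return rows


def drop_missing_delay(rows):
    """Toy transform: keep only rows that have a delay value."""
    return [r for r in rows if r.get("delay_s") is not None]


def station_present(rows):
    """Toy check: every row names a station."""
    return all(r.get("station") for r in rows)


spec = TableSpec(
    name="train_events",
    sources=["raw_events"],
    output="silver/train_events",
    transform=drop_missing_delay,
    checks=[station_present],
    fields={"station": "Stop name", "delay_s": "Delay in seconds"},
)
rows = spec.build([
    {"station": "Bruxelles-Midi", "delay_s": 60},
    {"station": "Gent-Sint-Pieters", "delay_s": None},
])
print(len(rows))  # 1
```

Keeping checks next to the transform means a table cannot be built without also being validated, which is the property an "executable specification" is meant to provide.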

What do you want to do?

  • Get Started
  • Extend RIDE
  • Reproduce the Paper

Benchmark Results

Main test-set results on the Gold Standard tier. MAE and RMSE are in seconds; ± denotes the standard deviation over 10 seeds.

Model         MAE (s)         RMSE (s)
Translation   96.65           233.42
Graph-event   88.41           232.48
MLP           77.20 ± 0.04    203.21 ± 0.40
XGBoost       76.58 ± 0.01    203.46 ± 0.02
LSTM          74.62 ± 0.27    202.63 ± 0.77
Transformer   74.54 ± 0.25    195.39 ± 0.59
GNN           73.62 ± 0.19    194.56 ± 0.88
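To read the benchmark table in relative terms, the short script below computes each model's MAE and RMSE reduction with respect to the Translation baseline, using the seed means copied from the table (the ± spreads are ignored). Using Translation as the reference point is an illustrative choice, not one made in the results above.

```python
# Gold Standard test-set MAE/RMSE in seconds, copied from the results table
# (seed means only; the ± standard deviations are omitted).
results = {
    "Translation": (96.65, 233.42),
    "Graph-event": (88.41, 232.48),
    "MLP": (77.20, 203.21),
    "XGBoost": (76.58, 203.46),
    "LSTM": (74.62, 202.63),
    "Transformer": (74.54, 195.39),
    "GNN": (73.62, 194.56),
}


def relative_reduction(baseline, value):
    """Percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline


base_mae, base_rmse = results["Translation"]
for model, (m, r) in results.items():
    print(f"{model:12s} MAE -{relative_reduction(base_mae, m):5.1f}%  "
          f"RMSE -{relative_reduction(base_rmse, r):5.1f}%")
```

For example, this gives roughly a 23.8% MAE reduction and a 16.6% RMSE reduction for the GNN relative to Translation.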

Citation

Preprint: RIDE: An Open Dataset and Benchmark for Train Delay Prediction

@misc{elliker2026rideopendatasetbenchmark,
      title={RIDE: An Open Dataset and Benchmark for Train Delay Prediction},
      author={Clément Elliker and Mathis Le Bail and Clément Mantoux and Jesse Read and Sonia Vanier},
      year={2026},
      eprint={2606.05070},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2606.05070},
}

License

The source code in this repository is released under the MIT license; see LICENSE.

The released RIDE datasets are distributed under CC BY 4.0; see DATA_LICENSE.md. RIDE is derived from Infrabel Open Data (CC0) and Open-Meteo API data (CC BY 4.0).
