OR-Space

A full-lifecycle workspace benchmark for industrial optimization agents.

OR-Space evaluates whether language-model agents can perform reliable operations research work inside executable, multi-file workspaces. Each instance separates business requirements, structured parameter files, code artifacts, solver state, and evaluation targets instead of flattening the optimization problem into one prompt.

[Figure: Overview of the OR-Space Build, Revise, and Explain benchmark]

Links

| Resource | Location |
| --- | --- |
| Dataset | huggingface.co/datasets/Chenyu-Zhou/OR-Space |
| Code repository | github.com/0xzhouchenyu/OR-Space |
| Paper | arXiv link coming with the public manuscript release |

Benchmark

OR-Space contains 100 industrial optimization topologies, each rendered as three task views of the same underlying mathematical problem:

| Task | What the agent receives | What is evaluated |
| --- | --- | --- |
| Build | Business documents, tabular data, and an empty src/ scaffold | Whether the agent can write solver-ready code from heterogeneous files |
| Revise | Original workspace, revised requirements, updated data, and legacy heuristic code | Whether the agent can preserve valid logic while implementing changed requirements |
| Explain | Original and revised workspaces plus recorded solver artifacts | Whether the agent can ground an explanation in code, data, solver state, and OR theory |

Build and Revise are scored by executing the submitted solver program and matching the reference objective value within 1% relative error. Explain is scored with exact-match checklist items plus rubric-based judgments for reasoning, grounding, answer quality, and hallucination control.
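
The 1% relative-error rule for Build and Revise can be sketched as below. This is an illustration, not the benchmark's own scorer: the function name `objective_matches`, the handling of non-finite values, and the fallback for a zero reference objective are all assumptions.

```python
import math


def objective_matches(submitted: float, reference: float, rel_tol: float = 0.01) -> bool:
    """Return True when `submitted` is within `rel_tol` relative error of `reference`."""
    # Non-finite values (NaN, inf) never count as a match.
    if not (math.isfinite(submitted) and math.isfinite(reference)):
        return False
    if reference == 0.0:
        # Assumption: relative error is undefined at zero, so fall back to an
        # absolute tolerance of the same size. The benchmark may differ here.
        return abs(submitted) <= rel_tol
    return abs(submitted - reference) / abs(reference) <= rel_tol


print(objective_matches(1005.0, 1000.0))  # 0.5% away from the reference
print(objective_matches(1020.0, 1000.0))  # 2% away from the reference
```

Dividing by the absolute value of the reference keeps the check symmetric for minimization problems with negative objectives.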

Quick Start

Download the release from Hugging Face:

pip install -U huggingface_hub pandas
python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Chenyu-Zhou/OR-Space",
    repo_type="dataset",
    local_dir="OR-Space",
)
PY
unzip -q OR-Space/build-revise-explain_workspaces.zip -d OR-Space

Inspect the task index:

python - <<'PY'
import pandas as pd

index = pd.read_csv("OR-Space/metadata/workspace_index.csv")
print(index.groupby("task_type").size())
print(index.head()[["workspace_id", "task_type", "workspace_path"]])
PY

The expanded workspaces follow this pattern:

build-revise-explain_workspaces/
  build_workspaces/instance_1/
    docs/
    data/
    src/
    metadata.json
  revise_workspaces/instance_1/
    original/
    revised/
    metadata.json
  explain_workspaces/instance_1/
    original/
    revised/
    solver_artifacts/
    metadata.json
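
A minimal sketch for enumerating the extracted workspaces, assuming the layout above. The helper name `list_workspaces` is hypothetical, and treating any subdirectory that holds a `metadata.json` as an instance is an assumption rather than a documented rule.

```python
from pathlib import Path

# Directory names copied from the layout above.
TASK_DIRS = {
    "build": "build_workspaces",
    "revise": "revise_workspaces",
    "explain": "explain_workspaces",
}


def list_workspaces(root):
    """Map each task name to the sorted instance directories found under `root`."""
    root = Path(root)
    found = {}
    for task, dirname in TASK_DIRS.items():
        task_dir = root / dirname
        if not task_dir.is_dir():
            found[task] = []
            continue
        # Assumption: an instance is any subdirectory that holds a metadata.json.
        found[task] = sorted(
            p.name
            for p in task_dir.iterdir()
            if p.is_dir() and (p / "metadata.json").is_file()
        )
    return found


if __name__ == "__main__":
    workspaces = list_workspaces("OR-Space/build-revise-explain_workspaces")
    for task, instances in workspaces.items():
        print(task, len(instances))
```

Names are sorted as strings, so `instance_10` would sort before `instance_2`; sort on the numeric suffix if ordering matters.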

What This Repo Contains

The public GitHub repository hosts the project page and the supplementary code. The full dataset package is published through the Hugging Face dataset repository.

.
  README.md
  LICENSE
  figs/                     Project-page figures
  01_build/                 Build workspace generation utilities
  02_revise_modeling/       Revise workspace generation utilities
  03_revise_business/       Business-voice rewriting utilities
  04_difficulty_judge/      Difficulty judging utilities
  05_business_quality_rubric/
  06_static_diff/           Static revision-diff analysis

Main Paper Findings

| Finding | Result |
| --- | --- |
| Workspace construction remains hard | The best Build score is 72.0% Pass@1 |
| Revision context is model-dependent | Legacy heuristic code helps strong models but hurts weaker models |
| Explanation is a distinct capability | Explain scores are weakly correlated with Build and Revise success |

These results should be interpreted as benchmark evidence about synthetic, executable OR workspaces, not as a deployment certificate for production optimization systems.

Release Policy

For reproducibility, cite a Hugging Face Hub tag or commit SHA rather than a moving main branch. Planned public tags are:

  • neurips2026-submission: paper submission snapshot
  • v1.0: first public archival release
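
Pinning can be done with the `revision` argument of `huggingface_hub.snapshot_download`. The sketch below assumes the planned tags have been published. The helper names `pinned_download_kwargs` and `download_pinned` are hypothetical, and rejecting `main` outright is a design choice made here, not a rule of the release.

```python
def pinned_download_kwargs(revision, local_dir="OR-Space"):
    """Build snapshot_download arguments pinned to a tag or commit SHA."""
    if not revision or revision == "main":
        raise ValueError("pin a tag or commit SHA, not the moving main branch")
    return {
        "repo_id": "Chenyu-Zhou/OR-Space",
        "repo_type": "dataset",
        "revision": revision,
        "local_dir": local_dir,
    }


def download_pinned(revision, local_dir="OR-Space"):
    """Download the dataset at a fixed revision and return the local path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**pinned_download_kwargs(revision, local_dir))


# Example (needs network access, and the tag must already exist on the Hub):
# download_pinned("v1.0")
```

Recording the resolved commit SHA next to reported results makes a run reproducible even if a tag is later moved.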

Citation

@misc{zhou2026orspace,
  title = {OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents},
  author = {Zhou, Chenyu and Lu, Xinyun and Zhao, Jiangyue and Lin, Jianghao and Ge, Dongdong and Ye, Yinyu},
  year = {2026},
  note = {Dataset: https://huggingface.co/datasets/Chenyu-Zhou/OR-Space}
}

License

The dataset release is for non-commercial research use under CC BY-NC 4.0-compatible terms, following the inherited license constraints of the IndustryOR seed topologies. Proprietary solver binaries, commercial API credentials, and third-party model services are not redistributed.
