OR-Space

A full-lifecycle workspace benchmark for industrial optimization agents.

OR-Space evaluates whether language-model agents can perform reliable operations research work inside executable, multi-file workspaces. Each instance separates business requirements, structured parameter files, code artifacts, solver state, and evaluation targets instead of flattening the optimization problem into one prompt.

Links

Resource	Location
Dataset	huggingface.co/datasets/Chenyu-Zhou/OR-Space
Code repository	github.com/0xzhouchenyu/OR-Space
Paper	arXiv link coming with the public manuscript release

Benchmark

OR-Space contains 100 industrial optimization topologies, each rendered as three task views of the same underlying mathematical problem:

Task	What the agent receives	What is evaluated
Build	Business documents, tabular data, and an empty `src/` scaffold	Whether the agent can write solver-ready code from heterogeneous files
Revise	Original workspace, revised requirements, updated data, and legacy heuristic code	Whether the agent can preserve valid logic while implementing changed requirements
Explain	Original and revised workspaces plus recorded solver artifacts	Whether the agent can ground an explanation in code, data, solver state, and OR theory

Build and Revise are scored by executing the submitted solver program and matching the reference objective value within 1% relative error. Explain is scored with exact-match checklist items plus rubric-based judgments for reasoning, grounding, answer quality, and hallucination control.
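The 1% relative-error criterion for Build and Revise can be sketched as below. This is only an illustration of the stated rule, not the benchmark's actual scorer: the function name `objective_matches`, the `tol` parameter, and the handling of NaN and zero reference values are assumptions.

```python
import math


def objective_matches(submitted: float, reference: float, tol: float = 0.01) -> bool:
    """Return True if `submitted` is within `tol` relative error of `reference`.

    Hypothetical helper: the official OR-Space scorer may treat edge cases
    (zero reference, infeasible runs, NaN) differently.
    """
    if math.isnan(submitted) or math.isnan(reference):
        return False  # a missing or invalid objective never matches
    if reference == 0.0:
        # Assumption: relative error is undefined here, so fall back to an
        # absolute tolerance of the same size.
        return abs(submitted) <= tol
    return abs(submitted - reference) / abs(reference) <= tol


print(objective_matches(1005.0, 1000.0))  # 0.5% off the reference -> True
print(objective_matches(1020.0, 1000.0))  # 2% off the reference -> False
```

Dividing by `abs(reference)` keeps the check symmetric for minimization problems with negative objective values.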

Quick Start

Download the release from Hugging Face:

pip install -U huggingface_hub pandas
python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Chenyu-Zhou/OR-Space",
    repo_type="dataset",
    local_dir="OR-Space",
)
PY
unzip -q OR-Space/build-revise-explain_workspaces.zip -d OR-Space

Inspect the task index:

python - <<'PY'
import pandas as pd

index = pd.read_csv("OR-Space/metadata/workspace_index.csv")
print(index.groupby("task_type").size())
print(index.head()[["workspace_id", "task_type", "workspace_path"]])
PY

The expanded workspaces follow this pattern:

build-revise-explain_workspaces/
  build_workspaces/instance_1/
    docs/
    data/
    src/
    metadata.json
  revise_workspaces/instance_1/
    original/
    revised/
    metadata.json
  explain_workspaces/instance_1/
    original/
    revised/
    solver_artifacts/
    metadata.json
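
Given the layout above, the instances available for each task can be enumerated with a short script. This is a sketch under assumptions: it relies only on the directory pattern shown here (`<task>_workspaces/instance_<n>/metadata.json`), the helper names are hypothetical, and it requires Python 3.9 or later.

```python
from pathlib import Path


def _instance_number(name: str) -> int:
    """Extract the trailing integer from names like `instance_12`."""
    suffix = name.rsplit("_", 1)[-1]
    return int(suffix) if suffix.isdigit() else -1


def list_workspaces(root: str) -> dict[str, list[str]]:
    """Map each task type to the instance directories found under `root`.

    Only directories that contain a `metadata.json` file are counted.
    """
    found: dict[str, list[str]] = {}
    for task_dir in sorted(Path(root).glob("*_workspaces")):
        if not task_dir.is_dir():
            continue
        task = task_dir.name.removesuffix("_workspaces")
        instances = [
            p.name
            for p in task_dir.iterdir()
            if p.is_dir() and (p / "metadata.json").is_file()
        ]
        # Sort numerically so instance_10 follows instance_9, not instance_1.
        instances.sort(key=_instance_number)
        found[task] = instances
    return found
```

Calling `list_workspaces("OR-Space/build-revise-explain_workspaces")` after the unzip step should return one entry each for `build`, `revise`, and `explain`, assuming the archive expands to that path.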

What This Repo Contains

The public GitHub repository hosts the project page and the supplementary code. The full dataset package is published through the Hugging Face dataset repository.

.
  README.md
  LICENSE
  figs/                     Project-page figures
  01_build/                 Build workspace generation utilities
  02_revise_modeling/       Revise workspace generation utilities
  03_revise_business/       Business-voice rewriting utilities
  04_difficulty_judge/      Difficulty judging utilities
  05_business_quality_rubric/
  06_static_diff/           Static revision-diff analysis

Main Paper Findings

Finding	Result
Workspace construction remains hard	The best Build score is 72.0% Pass@1
Revision context is model-dependent	Legacy heuristic code helps strong models but hurts weaker models
Explanation is a distinct capability	Explain scores are weakly correlated with Build and Revise success

These results should be interpreted as benchmark evidence about synthetic, executable OR workspaces, not as a deployment certificate for production optimization systems.

Release Policy

For reproducibility, cite a Hugging Face Hub tag or commit SHA rather than the moving main branch. Planned public tags are:

neurips2026-submission: paper submission snapshot
v1.0: first public archival release
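
Pinning can also be applied at download time. The sketch below passes a tag or commit SHA through the `revision` argument of `snapshot_download`; the wrapper `download_pinned` and its rejection of branch names are hypothetical additions, and the tags listed above are planned, so they may not exist yet.

```python
def download_pinned(revision: str, local_dir: str = "OR-Space") -> str:
    """Download the dataset at a fixed tag or commit SHA.

    Hypothetical wrapper: it only refuses obviously moving references and
    then forwards `revision` to `huggingface_hub.snapshot_download`.
    """
    if revision in {"", "main", "master", "HEAD"}:
        raise ValueError("pin a tag or commit SHA, not a moving branch")
    # Imported lazily so the check above works without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Chenyu-Zhou/OR-Space",
        repo_type="dataset",
        revision=revision,
        local_dir=local_dir,
    )
```

For example, `download_pinned("v1.0")` would fetch the archival release once that tag is published, and the returned path is the local snapshot directory.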

Citation

@misc{zhou2026orspace,
  title = {OR-Space: A Full-Lifecycle Workspace Benchmark for Industrial Optimization Agents},
  author = {Zhou, Chenyu and Lu, Xinyun and Zhao, Jiangyue and Lin, Jianghao and Ge, Dongdong and Ye, Yinyu},
  year = {2026},
  note = {Dataset: https://huggingface.co/datasets/Chenyu-Zhou/OR-Space}
}

License

The dataset release is for non-commercial research use under CC BY-NC 4.0-compatible terms, following the inherited license constraints of the IndustryOR seed topologies. Proprietary solver binaries, commercial API credentials, and third-party model services are not redistributed.
