This repository is the artifact for the paper:
SVN: Shape Value Numbering for Comprehensive and Practical Safety Assessment
Submitted to OOPSLA 2026 / CGO 2027
It contains the benchmark suite, evaluation scripts, and build orchestration to reproduce the results (RQ1–RQ4) presented in the paper.
.
├── choreo/ # Choreo compiler (git submodule → GitHub)
├── benchmark/
│ ├── choreo/ # 310 Choreo (.co) benchmark cases (15 categories)
│ ├── mlir/ # MLIR tensor+scf comparison cases + manifest
│ ├── memref/ # MLIR memref comparison cases
│ ├── iree/ # IREE comparison cases
│ ├── triton/ # Triton comparison cases
│ └── results/ # (generated locally, not committed)
├── scripts/ # Data collection, plotting, and automation
│ ├── reproduce_all.sh # ★ One-command reproduction script
│ ├── choreo_assertion_stats.py # RQ1/RQ2: assessment statistics
│ ├── choreo_compile_overhead.py # RQ3: compile-time overhead
│ ├── choreo_runtime_entry.py # RQ4: runtime assertion overhead
│ ├── visualize_results.py # Terminal + HTML report generation
│ ├── collect_all_stats.py # Cross-system comparison
│ ├── plot_safety_figures.py # Paper figure generation
│ └── ...
├── latex/
│ └── oopsla26-dsvn/ # Paper sources
├── Makefile # Build targets
└── README.md # This file
| Tool | Version | Notes |
|---|---|---|
| GCC / G++ | >= 9.0 | C++17 support required |
| CMake | >= 3.16 | Build system |
| Ninja | any | ninja-build package |
| Python | >= 3.8 | For statistics and plotting scripts |
| matplotlib | any | Optional: for PNG figures and HTML report |
| Git | any | Submodule checkout |
| flex/bison | >= 2.6/3.8 | Auto-downloaded if missing (see below) |
| CUDA | >= 12.0 | Optional: RQ4 runtime overhead + GPU tests |
Flex and Bison are auto-downloaded and compiled from source during CMake configuration if the system versions are missing or too old.
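As a rough pre-flight check before cloning, the minimum versions from the table above can be compared against what is installed. The sketch below is not part of the artifact: the helper names (`parse_version`, `meets_minimum`, `check_tool`) are hypothetical, and it assumes each tool prints a dotted version number when run with `--version`.

```python
import re
import shutil
import subprocess

# Minimum versions copied from the prerequisites table above.
MINIMUMS = {
    "gcc": (9, 0),
    "cmake": (3, 16),
    "flex": (2, 6),
    "bison": (3, 8),
}


def parse_version(text):
    """Return the first dotted version in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """True if `found` is at least `minimum`; a missing version never passes."""
    return found is not None and found[:len(minimum)] >= minimum


def check_tool(name, minimum):
    """Run `<name> --version` and compare the reported version to `minimum`."""
    if shutil.which(name) is None:
        return False
    result = subprocess.run([name, "--version"], capture_output=True, text=True)
    return meets_minimum(parse_version(result.stdout), minimum)


if __name__ == "__main__":
    for tool, minimum in MINIMUMS.items():
        status = "ok" if check_tool(tool, minimum) else "missing or too old"
        print(f"{tool:6s} >= {'.'.join(map(str, minimum))}: {status}")
```

A "missing or too old" result for flex or bison is not fatal, since the CMake configuration step downloads and builds them in that case.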
git clone --recursive <this-repo-url>
cd svn-artifact
bash scripts/reproduce_all.sh

This will:
- Initialize the Choreo submodule and its dependencies (cutlass, gtest)
- Build Choreo from source (~1 minute)
- Run compile-time tests (check + cli)
- Collect RQ1/RQ2 assessment statistics (310 cases × 15 categories)
- Measure RQ3 compile-time overhead (152 dynamic cases)
- Measure RQ4 runtime assertion overhead (if CUDA GPU is available)
- Print a comparison table against the paper values
- Generate an interactive HTML report (benchmark/results/report.html)
Results are written to benchmark/results/.
The script produces:
- Terminal: Rich summary tables with per-category breakdowns for all RQs
- benchmark/results/report.html: Self-contained HTML with interactive Chart.js graphs
- benchmark/results/figures/: PNG figures for each RQ (requires matplotlib)
- benchmark/results/choreo_stats.csv: Raw RQ1/RQ2 data
- benchmark/results/choreo_compile_overhead.csv: Raw RQ3 data
- benchmark/results/choreo_runtime_entry.csv: Raw RQ4 data (if GPU available)
# 1. Build Choreo
make choreo-build
# 2. Run compile-time tests
make choreo-test
# 3. Collect assessment statistics (RQ1/RQ2)
make choreo-stats
# 4. Measure compile-time overhead (RQ3)
make choreo-cto
# 5. (Optional, requires CUDA GPU) Measure runtime overhead (RQ4)
export CUDA_HOME=/usr/local/cuda
export CUTE_HOME=$(pwd)/choreo/extern/cutlass
python3 scripts/choreo_runtime_entry.py --reps 5
# 6. Generate visualization
python3 scripts/visualize_results.py
# 7. (Optional) Cross-system comparison — requires MLIR baseline
make mlir-clone && make mlir-build
python3 scripts/collect_all_stats.py

The cross-system comparison (Choreo vs MLIR vs IREE vs Triton) requires building the MLIR tools:
make mlir-clone # shallow-clone llvm-project release/22.x
make mlir-build # build mlir-opt, mlir-translate, FileCheck (~30 min)

Then re-run bash scripts/reproduce_all.sh without --skip-mlir.
If a CUDA-capable GPU is available:
export CUDA_HOME=/usr/local/cuda
export CUTE_HOME=$(pwd)/choreo/extern/cutlass
cd choreo && bash tests/lit.sh tests/ && cd ..

| Metric | Paper | Expected (current compiler) |
|---|---|---|
| Cases compiled | 291/310 | ≥299/310 (improved) |
| Generated | 11,524 | ~12,000 (improved) |
| Discharged | 10,693 | ~11,150 (improved) |
| Runtime surviving | 831 | ~828 |
| ADR | 92.8% | ~93.1% |
The current compiler fixes cases that previously failed to compile, generating more assessments with a slightly higher discharge rate.
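The rows of the table above are related by simple arithmetic, which gives a quick sanity check on locally reproduced numbers. The sketch below assumes that ADR is the fraction of generated assessments that are discharged and that the runtime-surviving count is the remainder. Both assumptions are consistent with the paper column, but the function name `assessment_stats` is hypothetical and not part of the artifact scripts.

```python
def assessment_stats(generated, discharged):
    """Return (runtime-surviving count, ADR in percent).

    Assumes ADR = discharged / generated and
    surviving = generated - discharged.
    """
    surviving = generated - discharged
    adr_percent = 100.0 * discharged / generated
    return surviving, adr_percent


# Paper values from the table above.
surviving, adr = assessment_stats(generated=11524, discharged=10693)
print(surviving, round(adr, 1))  # 831 92.8
```

The same function can be applied to the totals found in a local run to confirm that the three counts and the reported ADR agree with each other.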
| Metric | Paper | Expected |
|---|---|---|
| CTO | 4.7% | ~5% (machine-dep.) |
| Cases | 152 | 152 |
The aggregate CTO varies slightly across machines due to hardware differences. Per-category trends are stable.
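To make the distinction between the aggregate figure and per-case values concrete, here is a minimal sketch. It assumes CTO is the relative increase in compile time with assessment enabled over a baseline without it, and that the aggregate is computed over summed times. The exact definition used by scripts/choreo_compile_overhead.py may differ, and the timings below are made-up illustrative values, not measurements.

```python
def per_case_cto(baseline_s, checked_s):
    """Relative compile-time overhead of one case, in percent (assumed definition)."""
    return 100.0 * (checked_s - baseline_s) / baseline_s


def aggregate_cto(baseline_times, checked_times):
    """Overhead over the summed compile times of all cases, in percent."""
    total_base = sum(baseline_times)
    total_checked = sum(checked_times)
    return 100.0 * (total_checked - total_base) / total_base


# Illustrative timings in seconds (hypothetical, not from the paper).
baseline = [1.0, 2.0, 1.0]
checked = [1.05, 2.1, 1.05]

for b, c in zip(baseline, checked):
    print(f"case overhead: {per_case_cto(b, c):.1f}%")
print(f"aggregate overhead: {aggregate_cto(baseline, checked):.1f}%")
```

Under this definition, long-running cases dominate the aggregate, which is one reason the aggregate can shift between machines while per-category trends remain stable.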
| Metric | Paper | Expected |
|---|---|---|
| Median | <0.1% | ~0% (negligible) |
| Cases | ~152 | 152 |
| Range | — | [-4%, +4%] (noise) |
Entry-level runtime assertions (host-side integer comparisons before kernel launch) impose negligible overhead. The median is consistently near zero.
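The median and range in the table above can be read as summary statistics over per-case relative overheads. The following sketch shows one way to compute them, assuming each case yields a pair of timings (without and with entry-level assertions). The function names and sample timings are hypothetical and do not reflect the actual layout of choreo_runtime_entry.csv.

```python
import statistics


def relative_overhead(baseline_ms, asserted_ms):
    """Relative runtime overhead of one case, in percent."""
    return 100.0 * (asserted_ms - baseline_ms) / baseline_ms


def summarize(pairs):
    """Return (median, min, max) of per-case overheads, in percent."""
    overheads = [relative_overhead(b, a) for b, a in pairs]
    return statistics.median(overheads), min(overheads), max(overheads)


# Illustrative (baseline_ms, asserted_ms) pairs; made-up values, not measurements.
samples = [(10.0, 10.4), (10.0, 9.6), (10.0, 10.0)]
median, low, high = summarize(samples)
print(f"median {median:.1f}%, range [{low:.1f}%, {high:.1f}%]")
```

Negative per-case values arise from measurement noise, so the median is a more robust indicator than any single case.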
| Component | Version | Source |
|---|---|---|
| Choreo | svn-artifacts | github.com/LancerLab/croqtile |
| LLVM/MLIR | release/22.x | github.com/llvm/llvm-project |
| IREE | v3.10.0 | pre-compiled or scripts/fetch_mlir_baselines.sh |
| Triton | v3.6.0 | scripts/fetch_mlir_baselines.sh |
| CUTLASS | v4.2.1 | via Choreo submodule |
| GoogleTest | latest | via Choreo submodule |
See individual component licenses. The benchmark cases and evaluation scripts in this repository are provided for artifact evaluation purposes.