Add --thinking flag to orchestrator for reasoning depth control by ScuttleBot · Pull Request #40 · pinchbench/scripts

ScuttleBot · 2026-06-03T02:36:00Z

Summary

Adds a --thinking flag to orchestrate_vultr.py so users can set the reasoning depth when benchmarking models that support it (e.g., mercury-2).

How it works

Orchestrator: --thinking medium writes the level to /root/benchmark_thinking.txt on each Vultr instance
Bench runner: reads the file and passes --thinking <level> to benchmark.py
Benchmark: already supports --thinking and passes it through to OpenClaw
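The runner step is bash in the PR. It builds a THINKING_ARG array and expands it as "${THINKING_ARG[@]}". The sketch below expresses the same decision in Python so it can be tested locally. The function name thinking_args is hypothetical, and the sketch assumes the file holds a single level on one line. An empty result means benchmark.py gets no --thinking flag, which matches the "omitted means unchanged" behavior described in the notes.

```python
from __future__ import annotations

from pathlib import Path

# Path taken from the PR description; the reader logic itself is a hypothetical
# Python rendering of what bench_runner.sh does in bash.
THINKING_FILE = Path("/root/benchmark_thinking.txt")


def thinking_args(path: Path = THINKING_FILE) -> list[str]:
    """Return ["--thinking", level] if the handoff file holds a level, else []."""
    try:
        level = path.read_text().strip()
    except FileNotFoundError:
        # No file: the orchestrator was run without --thinking.
        return []
    # An empty or whitespace-only file is treated the same as a missing one.
    return ["--thinking", level] if level else []
```

The returned list is meant to be spliced into the benchmark.py argument vector, which mirrors how an empty bash array expands to nothing.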

Usage

# Single model with medium reasoning
uv run orchestrate_vultr.py --models openrouter/inception/mercury-2 --thinking medium

# Batch with high reasoning
uv run orchestrate_vultr.py --models model1 model2 --count 2 --thinking high

Valid levels

off, minimal, low, medium, high, xhigh, adaptive
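A minimal argparse sketch of how the orchestrator could restrict --thinking to these levels follows. It assumes the flag is declared with choices. The names build_parser and THINKING_LEVELS are illustrative, and only --models is declared alongside it.

```python
import argparse

# Levels as listed in the PR description; the real orchestrator may declare them differently.
THINKING_LEVELS = ("off", "minimal", "low", "medium", "high", "xhigh", "adaptive")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="orchestrate_vultr.py")
    parser.add_argument("--models", nargs="+", required=True)
    parser.add_argument(
        "--thinking",
        choices=THINKING_LEVELS,
        default=None,  # None: skip the handoff file and keep the model default
        help="Reasoning depth forwarded to benchmark.py on each instance",
    )
    return parser
```

With default=None, the orchestrator can skip writing benchmark_thinking.txt entirely when the flag is absent. An unknown level such as "extreme" makes argparse exit with status 2 at argument parsing, before any remote work starts.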

Notes

benchmark.py already supported --thinking; this PR only wires it through the orchestrator
If --thinking is omitted, behavior is unchanged (the model's default reasoning level applies)
Follows the same file-based pattern as --no-fail-fast and --official-key

Passes --thinking through to Vultr instances via benchmark_thinking.txt, which bench_runner.sh reads and passes to benchmark.py. Example: uv run orchestrate_vultr.py --models model1 --thinking medium

kilo-code-bot · 2026-06-03T02:37:02Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Solid implementation. The file-based handoff pattern is consistent with the existing --no-fail-fast and --official-key handling. Escaping embedded single quotes with the '\'' idiom (close the quote, emit an escaped quote, reopen) is correct, the atomic write (.tmp + mv) prevents partial reads, and the bash array expansion "${THINKING_ARG[@]}" is idiomatic and safe.
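The quoting and atomic-write properties checked in this review can be sketched in Python. The sketch quotes a value for POSIX sh with the '\'' idiom and writes to a .tmp file before renaming it, so bench_runner.sh never reads a half-written file. sh_single_quote and thinking_write_cmd are hypothetical helpers. The exact command the orchestrator sends over SSH (printf plus mv here) is an assumption.

```python
def sh_single_quote(value: str) -> str:
    """Quote for POSIX sh: wrap in '...' and replace each ' with '\\'' (close, escaped quote, reopen)."""
    return "'" + value.replace("'", "'\\''") + "'"


def thinking_write_cmd(level: str, path: str = "/root/benchmark_thinking.txt") -> str:
    """Build a shell command that writes level to path.tmp, then renames it over path."""
    tmp = path + ".tmp"
    q_tmp, q_path = sh_single_quote(tmp), sh_single_quote(path)
    # printf '%s\n' writes the value verbatim plus a newline; mv replaces the target in one step.
    return f"printf '%s\\n' {sh_single_quote(level)} > {q_tmp} && mv {q_tmp} {q_path}"
```

The .tmp file sits in the same directory as the target, so mv is a rename within one filesystem. That rename is atomic: a reader sees either the previous file (or none) or the complete new one. Python's shlex.quote yields an equivalent but differently spelled result ('"'"' instead of '\''), so either approach is safe.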

Files Reviewed (3 files)

bench_runner.sh — thinking file read and arg forwarding to benchmark.py
orchestrate_vultr.py — --thinking CLI arg, SSH file write with atomic rename
docs/benchmark-observability-plan.md — documentation only, no code issues

Reviewed by claude-4.6-sonnet-20260217 · 97,911 tokens

Commit 2fdd6af · Add --thinking flag to orchestrator for reasoning depth control

ScuttleBot wants to merge 1 commit into main from fix-model-page-404
