Add --thinking flag to orchestrator for reasoning depth control #40

Open
ScuttleBot wants to merge 1 commit into main from fix-model-page-404

@ScuttleBot
Contributor

Summary

Adds --thinking flag to orchestrate_vultr.py so users can set reasoning depth when benchmarking models that support it (e.g., mercury-2).

How it works

  1. Orchestrator: --thinking medium writes the level to /root/benchmark_thinking.txt on each Vultr instance
  2. Bench runner: reads the file and passes --thinking <level> to benchmark.py
  3. Benchmark: already supports --thinking and passes it through to OpenClaw
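
As a rough illustration of step 1, here is a minimal sketch of how an orchestrator could build the remote shell command that writes the level file. It is not the code from this PR: the function name `build_thinking_write_command` is hypothetical, and it uses Python's `shlex.quote` for escaping, which may differ from the quoting style the real script uses. Only the file path comes from the description above.

```python
import shlex

# Path named in the PR description.
THINKING_FILE = "/root/benchmark_thinking.txt"


def build_thinking_write_command(level: str, path: str = THINKING_FILE) -> str:
    """Return a shell command that writes `level` to `path` atomically.

    Hypothetical helper for illustration; the real orchestrator may differ.
    """
    tmp = shlex.quote(path + ".tmp")
    final = shlex.quote(path)
    # Write to a temporary file first, then rename it into place, so a
    # reader never sees a partially written file.
    return f"printf '%s\\n' {shlex.quote(level)} > {tmp} && mv {tmp} {final}"


if __name__ == "__main__":
    # The resulting string would be run on the instance, e.g. over SSH.
    print(build_thinking_write_command("medium"))
```

Quoting the level before interpolating it keeps an unexpected value from being interpreted by the remote shell, and the rename step is what makes the write atomic on the same filesystem.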

Usage

# Single model with medium reasoning
uv run orchestrate_vultr.py --models openrouter/inception/mercury-2 --thinking medium

# Batch with high reasoning
uv run orchestrate_vultr.py --models model1 model2 --count 2 --thinking high

Valid levels

off, minimal, low, medium, high, xhigh, adaptive
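
One way to enforce this list at the command line is an `argparse` `choices` constraint. The sketch below is an assumption about how such a parser could look, not the parser in `orchestrate_vultr.py`; `build_parser` and `THINKING_LEVELS` are hypothetical names, and only the flag names and level values are taken from this PR.

```python
import argparse

# Levels listed in the PR description.
THINKING_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh", "adaptive"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a validated, optional --thinking flag (illustrative)."""
    parser = argparse.ArgumentParser(description="Orchestrator sketch")
    parser.add_argument("--models", nargs="+", required=True)
    parser.add_argument(
        "--thinking",
        choices=THINKING_LEVELS,
        default=None,  # None means "flag omitted": keep the model default
        help="Reasoning depth to forward to the benchmark",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--models", "model1", "--thinking", "medium"])
    print(args.thinking)
```

With `choices` set, an unknown level is rejected by `argparse` before any instance is provisioned.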

Notes

  • The benchmark.py script already had --thinking support — this just wires it through the orchestrator
  • If --thinking is omitted, behavior is unchanged (uses model default)
  • Follows the same file-based pattern as --no-fail-fast and --official-key
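
The "omitted means unchanged" behavior in the notes above can be sketched on the reading side as follows. The actual runner is a shell script (`bench_runner.sh`); this Python version only mirrors the logic for illustration, and the helper name `thinking_args` is hypothetical.

```python
from pathlib import Path


def thinking_args(path: Path) -> list[str]:
    """Return extra benchmark arguments based on the level file (illustrative).

    A missing or empty file yields no arguments, so the benchmark runs
    with the model's default reasoning behavior.
    """
    if not path.is_file():
        return []
    level = path.read_text().strip()
    if not level:
        return []
    return ["--thinking", level]


if __name__ == "__main__":
    # Path named in the PR description; the file is absent on most machines.
    extra = thinking_args(Path("/root/benchmark_thinking.txt"))
    print(["benchmark.py", *extra])
```

Returning a list rather than a string keeps the flag and its value as separate arguments, which is the same idea as forwarding a shell array to the benchmark command.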

Passes --thinking through to Vultr instances via benchmark_thinking.txt,
which bench_runner.sh reads and passes to benchmark.py.

Example:
  uv run orchestrate_vultr.py --models model1 --thinking medium
@kilo-code-bot

kilo-code-bot Bot commented Jun 3, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Solid implementation. The file-based handoff pattern is consistent with existing --no-fail-fast and --official-key handling. Shell escaping with single-quote doubling ('\'') is correct, the atomic write (.tmp + mv) prevents partial reads, and the bash array expansion "${THINKING_ARG[@]}" is idiomatic and safe.

Files Reviewed (2 files)
  • bench_runner.sh — thinking file read and arg forwarding to benchmark.py
  • orchestrate_vultr.py — --thinking CLI arg, SSH file write with atomic rename
  • docs/benchmark-observability-plan.md — documentation only, no code issues

Reviewed by claude-4.6-sonnet-20260217 · 97,911 tokens
