Add --reasoning flag for models that support thinking/reasoning levels by ScuttleBot · Pull Request #393 · pinchbench/skill

ScuttleBot · 2026-06-02T18:28:00Z

Adds a --reasoning CLI argument to benchmark.py that gets passed through to the OpenClaw agent's models.json configuration. This enables benchmarking models like inception/mercury-2 with different reasoning levels (low, medium, high).

Usage

# Run mercury-2 with medium reasoning
python3 scripts/benchmark.py --model openrouter/inception/mercury-2 --reasoning medium

Changes

benchmark.py: add --reasoning argument, pass to ensure_agent_exists()
lib_agent.py: accept reasoning param, apply to model config for both custom endpoints and standard OpenRouter flow

The reasoning value is set on the model entry in models.json and passed through to the provider API (e.g. OpenRouter's reasoning parameter).
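
A minimal sketch of the wiring described above, for illustration only. The function names build_parser and apply_reasoning and the "reasoning" key on the model entry are assumptions rather than the PR's actual code; the low/medium/high choices come from the description.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a --reasoning option (hypothetical stand-in for benchmark.py)."""
    parser = argparse.ArgumentParser(description="Benchmark runner sketch")
    parser.add_argument("--model", required=True, help="Model identifier")
    parser.add_argument(
        "--reasoning",
        choices=["low", "medium", "high"],
        default=None,
        help="Optional reasoning level to record on the model entry",
    )
    return parser


def apply_reasoning(model_entry: dict, reasoning: Optional[str]) -> dict:
    """Return a copy of the model entry with the reasoning level set, if one was given.

    The "reasoning" key name is an assumption about the models.json schema.
    """
    updated = dict(model_entry)
    if reasoning is not None:
        updated["reasoning"] = reasoning
    return updated


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--model", "openrouter/inception/mercury-2", "--reasoning", "medium"]
)
entry = apply_reasoning({"id": args.model}, args.reasoning)
```

Leaving the default as None keeps the model entry unchanged when the flag is omitted, so existing runs would behave as before.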

kilo-code-bot · 2026-06-02T18:29:20Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

The new commit adds _apply_reasoning_to_model and _set_agent_thinking_default helpers, plus wires the reasoning param through both the custom-endpoint and OpenRouter paths. The implementation is solid:

Error handling is consistent throughout (try/except with logger.warning fallbacks)
The custom-endpoint path sets reasoning directly on the model entry dict
The OpenRouter path correctly calls _apply_reasoning_to_model with fallback model-entry creation
_set_agent_thinking_default safely reads and writes the OpenClaw global config using setdefault to avoid KeyErrors
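
The pattern the review describes (setdefault for nested keys, try/except with a logger.warning fallback) might look roughly like the sketch below. The function name set_thinking_default, the config layout, and the key names "agents", "defaults", and "thinkingDefault" are all hypothetical; the real OpenClaw config schema is not shown in this thread.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def set_thinking_default(config_path: Path, level: str) -> bool:
    """Write a default thinking level into a JSON config file.

    Returns True on success and False if the file could not be read or written.
    All key names here are assumptions made for illustration.
    """
    try:
        # Start from an empty config when the file does not exist yet.
        config = json.loads(config_path.read_text()) if config_path.exists() else {}
        # setdefault creates missing nested dicts instead of raising KeyError.
        agents = config.setdefault("agents", {})
        defaults = agents.setdefault("defaults", {})
        defaults["thinkingDefault"] = level
        config_path.write_text(json.dumps(config, indent=2))
        return True
    except (OSError, json.JSONDecodeError, AttributeError) as exc:
        # AttributeError covers a config whose top level is not a JSON object.
        logger.warning("Could not update %s: %s", config_path, exc)
        return False
```

Returning a boolean rather than raising lets the caller continue the benchmark run when the global config is unavailable, which matches the warning-and-fallback behavior the review mentions.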

Files Reviewed (2 files)

scripts/benchmark.py
scripts/lib_agent.py

Reviewed by claude-4.6-sonnet-20260217 · 116,618 tokens


ScuttleBot · 2026-06-02T22:04:03Z

Closing this PR: PinchBench already has a --thinking flag that correctly sets thinking levels via OpenClaw's --thinking CLI option. Verified: running with --thinking medium produces transcripts with thinkingLevel: medium. No additional --reasoning flag is needed.
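
One way to repeat that check on a saved transcript is sketched below. It assumes the transcript is stored as JSON Lines and that thinkingLevel appears as a string-valued key somewhere in each record; neither detail is confirmed in this thread, and the function names are hypothetical.

```python
import json
from typing import Any, Iterator, Set


def find_thinking_levels(node: Any) -> Iterator[str]:
    """Yield every string value stored under a "thinkingLevel" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "thinkingLevel" and isinstance(value, str):
                yield value
            else:
                yield from find_thinking_levels(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_thinking_levels(item)


def thinking_levels_in_transcript(jsonl_text: str) -> Set[str]:
    """Collect the distinct thinking levels found in JSON Lines transcript text."""
    levels: Set[str] = set()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON instead of failing the whole check.
            continue
        levels.update(find_thinking_levels(record))
    return levels
```

A run started with --thinking medium would then be expected to yield a set containing only "medium", if the assumptions about the transcript format hold.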

olearycrew force-pushed the add-reasoning-flag branch 2 times, most recently from d3a0766 to 6b9abf0 Compare June 2, 2026 21:19

olearycrew force-pushed the add-reasoning-flag branch from 6b9abf0 to 48962f2 Compare June 2, 2026 21:35

ScuttleBot closed this Jun 2, 2026

ScuttleBot wants to merge 1 commit into main from add-reasoning-flag
