Skip to content

Add --reasoning flag for models that support thinking/reasoning levels#393

Closed
ScuttleBot wants to merge 1 commit into
mainfrom
add-reasoning-flag
Closed

Add --reasoning flag for models that support thinking/reasoning levels#393
ScuttleBot wants to merge 1 commit into
mainfrom
add-reasoning-flag

Conversation

@ScuttleBot
Copy link
Copy Markdown
Contributor

Adds a --reasoning CLI argument to benchmark.py that gets passed through to the OpenClaw agent's models.json configuration. This enables benchmarking models like inception/mercury-2 with different reasoning levels (low, medium, high).

Usage

# Run mercury-2 with medium reasoning
python3 scripts/benchmark.py --model openrouter/inception/mercury-2 --reasoning medium

Changes

  • benchmark.py: add --reasoning argument, pass to ensure_agent_exists()
  • lib_agent.py: accept reasoning param, apply to model config for both custom endpoints and standard OpenRouter flow

The reasoning value is set on the model entry in models.json and passed through to the provider API (e.g. OpenRouter's reasoning parameter).

@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented Jun 2, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

The new commit adds _apply_reasoning_to_model and _set_agent_thinking_default helpers, plus wires the reasoning param through both the custom-endpoint and OpenRouter paths. The implementation is solid:

  • Error handling is consistent throughout (try/except with logger.warning fallbacks)
  • The custom-endpoint path sets reasoning directly on the model entry dict
  • The OpenRouter path correctly calls _apply_reasoning_to_model with fallback model-entry creation
  • _set_agent_thinking_default safely reads and writes the OpenClaw global config using setdefault to avoid KeyErrors
Files Reviewed (2 files)
  • scripts/benchmark.py
  • scripts/lib_agent.py

Reviewed by claude-4.6-sonnet-20260217 · 116,618 tokens

@olearycrew olearycrew force-pushed the add-reasoning-flag branch 2 times, most recently from d3a0766 to 6b9abf0 Compare June 2, 2026 21:19
Adds a --reasoning CLI argument to benchmark.py that gets passed through
to the OpenClaw agent's models.json configuration. This enables benchmarking
models like inception/mercury-2 with different reasoning levels (low, medium, high).

Changes:
- benchmark.py: add --reasoning argument, pass to ensure_agent_exists()
- lib_agent.py: accept reasoning param, apply to model config for both
custom endpoints and standard OpenRouter flow
@olearycrew olearycrew force-pushed the add-reasoning-flag branch from 6b9abf0 to 48962f2 Compare June 2, 2026 21:35
@ScuttleBot ScuttleBot closed this Jun 2, 2026
@ScuttleBot
Copy link
Copy Markdown
Contributor Author

Closing this PR — PinchBench already has a --thinking flag that correctly sets thinking levels via OpenClaw's --thinking CLI option. Verified: running with --thinking medium produces transcripts with thinkingLevel: medium. No additional --reasoning flag is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants