Feature: expose a reasoning-disable knob for chat completions providers (Qwen3 / DeepSeek-R1 thinking mode) #137

@Eridanus117

Summary

Reasoning-capable chat models (the Qwen3 series, DeepSeek-R1, etc.) emit a long internal reasoning chain by default. Wiki generation is mostly "read code, then write a markdown description": it gains little from chain-of-thought but still pays the time cost, with often 60-80% of completion tokens going to reasoning_content rather than to the output.

repowise exposes no per-provider option to disable reasoning, so users must either accept the slowdown or fork the LLM provider code.

Reproduction

  • Point OPENAI_BASE_URL at a runtime that serves a reasoning model (e.g. Ollama with a recent Qwen3 build, or vLLM/SGLang).
  • Run repowise init and, during the generation phase, compare completion_tokens with the length of the actual output text: most tokens are reasoning, not wiki content.
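To put a number on the comparison above, a small helper can compute the share of completion tokens spent on reasoning. This is only a sketch: `reasoning_share` is a hypothetical name, and it assumes the backend reports `completion_tokens_details.reasoning_tokens` in its usage payload the way the OpenAI API does. Backends that omit the field yield `None`, in which case the share has to be estimated from the output text length instead.

```python
from typing import Optional


def reasoning_share(usage: dict) -> Optional[float]:
    """Return the fraction of completion tokens spent on reasoning.

    Returns None when the backend does not report reasoning tokens
    or when no completion tokens were produced.
    """
    completion = usage.get("completion_tokens") or 0
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens")
    if not completion or reasoning is None:
        return None
    return reasoning / completion


# Illustrative numbers only, not measurements from repowise.
example_usage = {
    "completion_tokens": 1000,
    "completion_tokens_details": {"reasoning_tokens": 700},
}
print(reasoning_share(example_usage))  # 0.7
```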

Suggested fix

Add an opt-in flag to GenerationConfig (or per-provider config) such as:

from dataclasses import dataclass

@dataclass
class GenerationConfig:
    ...
    # When True, providers should pass backend-specific kwargs to disable
    # the model's reasoning chain (e.g. extra_body / chat_template_kwargs /
    # reasoning_effort=minimal, depending on the backend).
    disable_reasoning: bool = False

Then each LLM provider translates the flag to its own backend syntax. For OpenAI-compatible chat completions, the common patterns are:

  • vLLM / SGLang serving Qwen3: extra_body={"chat_template_kwargs": {"enable_thinking": False}}
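  • OpenAI reasoning models: reasoning={"effort": "minimal"} on the Responses API, or reasoning_effort="minimal" on chat completions (the lowest accepted value depends on the model; o-series models accept "low" at the lowest, not "minimal")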
  • Some proxies expose vendor-specific envelopes
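The translation step could look roughly like the sketch below. It is not repowise's actual provider code: `reasoning_kwargs` and the backend labels (`"vllm"`, `"sglang"`, `"openai"`) are hypothetical names chosen for illustration, and the `GenerationConfig` here is a stand-in that carries only the proposed field. The `reasoning_effort` value is also an assumption, since the accepted values vary by model.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class GenerationConfig:
    # Stand-in for the real config class; only the proposed flag is shown.
    disable_reasoning: bool = False


def reasoning_kwargs(config: GenerationConfig, backend: str) -> Dict[str, Any]:
    """Map the generic flag to extra kwargs for a chat completions call.

    `backend` is a hypothetical label; unknown backends get no extra kwargs,
    so the request is sent unchanged rather than failing.
    """
    if not config.disable_reasoning:
        return {}
    if backend in ("vllm", "sglang"):
        # Qwen3 chat templates read enable_thinking from chat_template_kwargs.
        return {"extra_body": {"chat_template_kwargs": {"enable_thinking": False}}}
    if backend == "openai":
        # Assumed value; some models only accept "low" as the minimum.
        return {"reasoning_effort": "minimal"}
    return {}


print(reasoning_kwargs(GenerationConfig(disable_reasoning=True), "vllm"))
```

A provider would then splat the result into its existing call, e.g. `client.chat.completions.create(model=..., messages=..., **reasoning_kwargs(config, backend))`. Returning an empty dict for the default and for unknown backends keeps the flag strictly opt-in.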

Locally, we observed a 3.9× speedup with reasoning disabled when translating ~3800 wiki pages (a related batch task using the same chat completions endpoint), with no measurable quality regression in the structure or technical content of the output markdown.

Happy to send a PR adding the flag and a default OpenAIProvider mapping; backend-specific mappings can be added incrementally.
