
feat(core): add token_budget parameter to limit retry token usage#2296

Open
Jwrede wants to merge 1 commit into 567-labs:main from Jwrede:feat/token-budget

Conversation


@Jwrede Jwrede commented May 2, 2026

Summary

Refs #2056

Adds an optional token_budget parameter that stops retries when cumulative token usage exceeds the specified limit. This mitigates the retry amplification vector where adversarial or malformed LLM outputs cause repeated validation failures, leading to unbounded context growth and cost amplification.

Usage

client.chat.completions.create(
    response_model=MyModel,
    max_retries=10,
    token_budget=5000,  # stop retrying after 5000 total tokens used
)

When the budget is exceeded between retry attempts, the retry loop stops and raises InstructorRetryException (same as when max_retries is exhausted).

Implementation

  • Adds token_budget and usage_container parameters to initialize_retrying()
  • Uses a closure-based tenacity stop condition that checks cumulative usage (supports both OpenAI total_tokens and Anthropic input_tokens + output_tokens)
  • Extracts token_budget from kwargs in both retry_sync and retry_async (same pattern as existing timeout)
  • No breaking changes -- parameter is optional, behavior is unchanged when not set
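The closure-based stop condition can be sketched roughly as follows. This is illustrative, not the PR's actual code: `stop_on_token_budget` and the `usage_container` dict shape are assumed names. tenacity calls a stop condition with the current `RetryCallState` and halts retrying once it returns `True`, so a plain closure works here.

```python
from types import SimpleNamespace

def stop_on_token_budget(token_budget, usage_container):
    """Build a tenacity-style stop condition that trips once cumulative
    token usage exceeds token_budget. (Sketch; names are illustrative.)"""
    def _stop(retry_state):
        usage = usage_container.get("usage")
        if token_budget is None or usage is None:
            return False  # no budget configured, or no usage recorded yet
        # OpenAI-style usage objects expose total_tokens; Anthropic-style
        # ones expose input_tokens / output_tokens instead.
        total = getattr(usage, "total_tokens", None)
        if total is None:
            total = getattr(usage, "input_tokens", 0) + getattr(usage, "output_tokens", 0)
        return total > token_budget
    return _stop
```

Because tenacity's built-in stop conditions support `|` (which ORs them via `stop_any`), a condition like this can be combined with the existing attempt limit, e.g. `stop=stop_after_attempt(max_retries) | stop_on_token_budget(budget, container)`.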

Test plan

  • test_token_budget_stops_retries -- verifies retries stop before max_retries when budget exceeded
  • test_token_budget_not_set_retries_normally -- verifies default behavior unchanged
  • test_token_budget_success_before_limit -- verifies successful responses return normally
  • Existing retry tests pass (test_retry_json_mode.py)
  • ruff check + format clean
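The first three test cases can be approximated without a live client. The sketch below (illustrative only, not the PR's actual test code) drives a hand-rolled retry loop with a fake completion whose failed attempts each add a fixed token cost:

```python
from types import SimpleNamespace

def run_with_retries(attempt_fn, max_retries, token_budget=None):
    """Minimal stand-in for the retry loop: return on success, raise when
    max_retries is exhausted or cumulative tokens exceed token_budget."""
    total_tokens = 0
    for attempt in range(1, max_retries + 1):
        result, usage = attempt_fn()
        total_tokens += usage.total_tokens
        if result is not None:
            return result, attempt
        if token_budget is not None and total_tokens > token_budget:
            raise RuntimeError(f"budget exceeded after {attempt} attempts")
    raise RuntimeError(f"exhausted {max_retries} attempts")

def always_fails():
    # Each failed validation attempt costs 2000 tokens of context growth.
    return None, SimpleNamespace(total_tokens=2000)
```

With `max_retries=10` and `token_budget=5000`, `always_fails` trips the budget on attempt 3 (6000 > 5000) rather than running all 10 attempts; with no budget set, the loop behaves as before.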

Add optional token_budget parameter that stops retries when cumulative
token usage across all attempts exceeds the specified limit. This
mitigates the retry amplification attack surface described in 567-labs#2056,
where adversarial LLM outputs can cause unbounded context growth and
cost amplification through repeated validation failures.

Usage:
  client.chat.completions.create(
      response_model=MyModel,
      max_retries=10,
      token_budget=5000,  # stop retrying after 5000 total tokens
  )

The budget is enforced via a tenacity stop condition that checks
cumulative usage (OpenAI total_tokens or Anthropic input+output)
between retry attempts.

Refs 567-labs#2056

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>