
feat(core): add token_budget parameter to limit retry token usage#2296

Open
Jwrede wants to merge 1 commit into 567-labs:main from Jwrede:feat/token-budget

Conversation


@Jwrede Jwrede commented May 2, 2026

Summary

Refs #2056

Adds an optional token_budget parameter that stops retries when cumulative token usage exceeds the specified limit. This mitigates the retry amplification vector where adversarial or malformed LLM outputs cause repeated validation failures, leading to unbounded context growth and cost amplification.

Usage

client.chat.completions.create(
    response_model=MyModel,
    max_retries=10,
    token_budget=5000,  # stop retrying after 5000 total tokens used
)

When the budget is exceeded between retry attempts, the retry loop stops and raises InstructorRetryException (same as when max_retries is exhausted).

Implementation

  • Adds token_budget and usage_container parameters to initialize_retrying()
  • Uses a closure-based tenacity stop condition that checks cumulative usage (supports both OpenAI total_tokens and Anthropic input_tokens + output_tokens)
  • Extracts token_budget from kwargs in both retry_sync and retry_async (same pattern as existing timeout)
  • No breaking changes -- parameter is optional, behavior is unchanged when not set
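The closure-based stop condition can be sketched roughly as follows. This is illustrative, not the PR's actual code: `stop_on_token_budget` and the `usage_container` dict shape are assumed names. tenacity calls a stop condition with the current `RetryCallState` and halts retrying once it returns `True`, so a plain closure works here.

```python
from types import SimpleNamespace

def stop_on_token_budget(token_budget, usage_container):
    """Build a tenacity-style stop condition that trips once cumulative
    token usage exceeds token_budget. (Sketch; names are illustrative.)"""
    def _stop(retry_state):
        usage = usage_container.get("usage")
        if token_budget is None or usage is None:
            return False  # no budget configured, or no usage recorded yet
        # OpenAI-style usage objects expose total_tokens; Anthropic-style
        # ones expose input_tokens / output_tokens instead.
        total = getattr(usage, "total_tokens", None)
        if total is None:
            total = getattr(usage, "input_tokens", 0) + getattr(usage, "output_tokens", 0)
        return total > token_budget
    return _stop
```

Because tenacity's built-in stop conditions support `|` (which ORs them via `stop_any`), a condition like this can be combined with the existing attempt limit, e.g. `stop=stop_after_attempt(max_retries) | stop_on_token_budget(budget, container)`.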

Test plan

  • test_token_budget_stops_retries -- verifies retries stop before max_retries when budget exceeded
  • test_token_budget_not_set_retries_normally -- verifies default behavior unchanged
  • test_token_budget_success_before_limit -- verifies successful responses return normally
  • Existing retry tests pass (test_retry_json_mode.py)
  • ruff check + format clean
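The first three test cases can be approximated without a live client. The sketch below (illustrative only, not the PR's actual test code) drives a hand-rolled retry loop with a fake completion whose failed attempts each add a fixed token cost:

```python
from types import SimpleNamespace

def run_with_retries(attempt_fn, max_retries, token_budget=None):
    """Minimal stand-in for the retry loop: return on success, raise when
    max_retries is exhausted or cumulative tokens exceed token_budget."""
    total_tokens = 0
    for attempt in range(1, max_retries + 1):
        result, usage = attempt_fn()
        total_tokens += usage.total_tokens
        if result is not None:
            return result, attempt
        if token_budget is not None and total_tokens > token_budget:
            raise RuntimeError(f"budget exceeded after {attempt} attempts")
    raise RuntimeError(f"exhausted {max_retries} attempts")

def always_fails():
    # Each failed validation attempt costs 2000 tokens of context growth.
    return None, SimpleNamespace(total_tokens=2000)
```

With `max_retries=10` and `token_budget=5000`, `always_fails` trips the budget on attempt 3 (6000 > 5000) rather than running all 10 attempts; with no budget set, the loop behaves as before.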

Add optional token_budget parameter that stops retries when cumulative
token usage across all attempts exceeds the specified limit. This
mitigates the retry amplification attack surface described in 567-labs#2056,
where adversarial LLM outputs can cause unbounded context growth and
cost amplification through repeated validation failures.

Usage:
  client.chat.completions.create(
      response_model=MyModel,
      max_retries=10,
      token_budget=5000,  # stop retrying after 5000 total tokens
  )

The budget is enforced via a tenacity stop condition that checks
cumulative usage (OpenAI total_tokens or Anthropic input+output)
between retry attempts.

Refs 567-labs#2056

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>