feat(core): add token_budget parameter to limit retry token usage #2296
Open
Jwrede wants to merge 1 commit into 567-labs:main from
Add optional token_budget parameter that stops retries when cumulative token usage across all attempts exceeds the specified limit. This mitigates the retry amplification attack surface described in 567-labs#2056, where adversarial LLM outputs can cause unbounded context growth and cost amplification through repeated validation failures.

Usage:

    client.chat.completions.create(
        response_model=MyModel,
        max_retries=10,
        token_budget=5000,  # stop retrying after 5000 total tokens
    )

The budget is enforced via a tenacity stop condition that checks cumulative usage (OpenAI total_tokens or Anthropic input+output) between retry attempts.

Refs 567-labs#2056

Signed-off-by: Jonathan Wrede <wrede.jonathan00@gmail.com>
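The stop condition described in the commit message can be sketched roughly as follows. This is a standard-library illustration only, not the code in this PR: `tokens_used`, `UsageContainer`, `StopWhenBudgetExceeded`, and the two usage dataclasses are hypothetical names, and the only assumption taken from the description is that OpenAI-style usage reports `total_tokens` while Anthropic-style usage reports `input_tokens` and `output_tokens`.

```python
from dataclasses import dataclass
from typing import Any


def tokens_used(usage: Any) -> int:
    """Return the tokens reported by one response's usage object.

    Assumption: OpenAI-style usage exposes `total_tokens`; Anthropic-style
    usage exposes `input_tokens` and `output_tokens` instead.
    """
    total = getattr(usage, "total_tokens", None)
    if total is not None:
        return int(total)
    input_tokens = getattr(usage, "input_tokens", 0) or 0
    output_tokens = getattr(usage, "output_tokens", 0) or 0
    return int(input_tokens) + int(output_tokens)


@dataclass
class UsageContainer:
    """Hypothetical accumulator shared by the retry loop and the stop check."""

    total: int = 0

    def add(self, usage: Any) -> None:
        self.total += tokens_used(usage)


@dataclass
class StopWhenBudgetExceeded:
    """Tenacity-style stop predicate: called with the retry state, returns a bool."""

    token_budget: int
    container: UsageContainer

    def __call__(self, retry_state: Any = None) -> bool:
        # "Exceeds" is read here as strictly greater than the budget.
        return self.container.total > self.token_budget


# Minimal stand-ins for provider usage objects (illustration only).
@dataclass
class OpenAIUsage:
    total_tokens: int


@dataclass
class AnthropicUsage:
    input_tokens: int
    output_tokens: int


container = UsageContainer()
stop = StopWhenBudgetExceeded(token_budget=5000, container=container)

container.add(OpenAIUsage(total_tokens=3000))
print(stop())  # False: 3000 tokens used so far

container.add(AnthropicUsage(input_tokens=1500, output_tokens=700))
print(stop())  # True: 5200 tokens exceed the 5000 budget
```

In tenacity itself, a stop condition that subclasses `tenacity.stop.stop_base` can be combined with `stop_after_attempt` using `|`, which would let a budget check sit alongside the attempt limit. Whether the PR wires it up that way is not stated here.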
Summary
Refs #2056

Adds an optional `token_budget` parameter that stops retries when cumulative token usage exceeds the specified limit. This mitigates the retry amplification vector where adversarial or malformed LLM outputs cause repeated validation failures, leading to unbounded context growth and cost amplification.

Usage
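To make the effect concrete without calling a real provider, here is a small simulation of a call whose output never validates. It is a sketch under stated assumptions, not the library's retry loop: `simulate_failing_retries` and `RetriesStopped` are hypothetical names, the per-attempt token counts are made up, and it assumes each failed attempt appends the rejected output and the validation error to the context.

```python
from typing import Optional


class RetriesStopped(Exception):
    """Local stand-in for the exception raised when retrying ends."""

    def __init__(self, attempts: int, total_tokens: int) -> None:
        super().__init__(f"stopped after {attempts} attempts, {total_tokens} tokens")
        self.attempts = attempts
        self.total_tokens = total_tokens


def simulate_failing_retries(
    max_retries: int,
    token_budget: Optional[int] = None,
    base_prompt_tokens: int = 400,  # made-up size of the initial prompt
    output_tokens: int = 300,       # made-up size of each rejected output
    error_tokens: int = 100,        # made-up size of each validation error
) -> None:
    """Simulate attempts that always fail validation, then raise RetriesStopped."""
    prompt_tokens = base_prompt_tokens
    total = 0
    for attempt in range(1, max_retries + 1):
        # Usage of this attempt: the whole prompt plus the new output.
        total += prompt_tokens + output_tokens
        # Budget check between attempts.
        if token_budget is not None and total > token_budget:
            raise RetriesStopped(attempt, total)
        # Assumed context growth: rejected output and error are fed back.
        prompt_tokens += output_tokens + error_tokens
    raise RetriesStopped(max_retries, total)


for budget in (None, 5000):
    try:
        simulate_failing_retries(max_retries=10, token_budget=budget)
    except RetriesStopped as exc:
        print(budget, exc.attempts, exc.total_tokens)
# prints:
# None 10 25000
# 5000 4 5200
```

Under these assumed numbers, ten attempts cost 25000 tokens in total because every prompt is larger than the last, while a budget of 5000 ends the loop after the fourth attempt.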
When the budget is exceeded between retry attempts, the retry loop stops and raises `InstructorRetryException` (same as when `max_retries` is exhausted).

Implementation
- Adds `token_budget` and `usage_container` parameters to `initialize_retrying()`
- Tracks cumulative usage across attempts (OpenAI `total_tokens` and Anthropic `input_tokens + output_tokens`)
- Reads `token_budget` from kwargs in both `retry_sync` and `retry_async` (same pattern as existing `timeout`)

Test plan
- `test_token_budget_stops_retries` -- verifies retries stop before `max_retries` when the budget is exceeded
- `test_token_budget_not_set_retries_normally` -- verifies default behavior is unchanged
- `test_token_budget_success_before_limit` -- verifies successful responses return normally
- [...] `test_retry_json_mode.py`)
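The three named scenarios in the test plan can be illustrated against a scripted stand-in for the retry loop. This is not the PR's test code: `run_attempts`, `RetriesExhausted`, and the `check_*` functions are hypothetical, and the token counts are arbitrary.

```python
from typing import Optional, Sequence, Tuple


class RetriesExhausted(Exception):
    """Local stand-in for the library's retry exception."""

    def __init__(self, attempts: int, budget_exceeded: bool) -> None:
        super().__init__(f"gave up after {attempts} attempts")
        self.attempts = attempts
        self.budget_exceeded = budget_exceeded


def run_attempts(
    script: Sequence[Tuple[int, bool]],
    max_retries: int,
    token_budget: Optional[int] = None,
) -> Tuple[int, int]:
    """Replay scripted attempts given as (tokens_used, validated) pairs.

    Returns (attempts_made, total_tokens) on the first validated attempt.
    Raises RetriesExhausted when the attempts or the budget run out.
    """
    total = 0
    attempts = 0
    for tokens, validated in script[:max_retries]:
        attempts += 1
        total += tokens
        if validated:
            return attempts, total
        if token_budget is not None and total > token_budget:
            raise RetriesExhausted(attempts, budget_exceeded=True)
    raise RetriesExhausted(attempts, budget_exceeded=False)


ALWAYS_FAIL = [(2000, False)] * 5


def check_budget_stops_retries() -> None:
    try:
        run_attempts(ALWAYS_FAIL, max_retries=5, token_budget=5000)
    except RetriesExhausted as exc:
        # 2000 + 2000 + 2000 = 6000 > 5000, so the third attempt is the last.
        assert exc.attempts == 3 and exc.budget_exceeded
    else:
        raise AssertionError("expected RetriesExhausted")


def check_no_budget_retries_normally() -> None:
    try:
        run_attempts(ALWAYS_FAIL, max_retries=5)
    except RetriesExhausted as exc:
        assert exc.attempts == 5 and not exc.budget_exceeded
    else:
        raise AssertionError("expected RetriesExhausted")


def check_success_before_limit() -> None:
    script = [(2000, False), (2000, True)]
    assert run_attempts(script, max_retries=5, token_budget=5000) == (2, 4000)


check_budget_stops_retries()
check_no_budget_retries_normally()
check_success_before_limit()
```

Each check mirrors one listed test name: the budget cuts retries short, the default path is untouched when no budget is set, and a response that validates under the budget is returned as usual.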