
Strip believed-free models from pricing block to prevent cost attribution #108

Merged: BillJr99 merged 1 commit into main from claude/zen-bardeen-dw5608 on Jun 19, 2026

@BillJr99
Owner

Summary

This PR fixes an issue where models marked as "believed_free" were still included in the pricing block, so the cost accounting system attributed per-token costs to free-tier requests. Those costs caused compute_cost() to record spend on supposedly free models, which eventually moved them into cost_observed_free_tier and silently broke free routing.

Changes

  • Removed stale Cloudflare pricing entries: Deleted three Cloudflare Workers model pricing entries (llama-2-7b-chat-fp16, llama-2-7b-chat-int8, and mistral-7b-instruct-v0.1) from the providers.json pricing block; each was also listed as "believed_free"
  • Enhanced pricing merge logic: Updated _merge_pricing() in scripts/update_free_models.py to strip every model marked as "believed_free" from the final pricing block after merging the LiteLLM baseline with per-source overrides
  • Improved documentation: Added a detailed docstring explaining why believed-free models must be excluded from the pricing block, noting that providers such as Cloudflare have quota-limited free tiers rather than zero-priced tiers

Implementation Details

The fix works by:

  1. Collecting every model ID currently listed as "believed_free" across all providers
  2. Filtering those IDs out of the merged pricing dictionary before it is written to sidecar["pricing"]

Without a pricing entry, free-tier requests go through the free-tier cost accounting path instead of being attributed per-token costs, so these models are no longer misclassified in cost tracking.
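The two filtering steps can be sketched as follows. This is a minimal illustration, not the code in scripts/update_free_models.py: it assumes a hypothetical layout in which each provider entry may carry a `believed_free` list of model IDs and the merged pricing block is a dict keyed by the same IDs. The helper name `strip_believed_free` is made up for the example.

```python
from typing import Any


def strip_believed_free(
    merged_pricing: dict[str, Any],
    providers: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical helper: drop pricing entries that any provider lists as believed_free.

    Assumed schema: providers maps a provider name to a dict that may contain
    a "believed_free" list of model IDs in the same format as the pricing keys.
    """
    # Step 1: collect believed-free model IDs across all providers.
    believed_free: set[str] = set()
    for provider in providers.values():
        believed_free.update(provider.get("believed_free", []))

    # Step 2: build a new dict without those IDs (the input is not mutated).
    return {
        model_id: price
        for model_id, price in merged_pricing.items()
        if model_id not in believed_free
    }
```

Returning a new dict rather than deleting keys in place keeps the merged input reusable, for example for logging what was stripped.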

https://claude.ai/code/session_01XLHnQxLYrMzpm5Ar83ihNC

…e run

Some providers (Cloudflare being the clearest case) have a non-zero
per-token price in the LiteLLM cost map because they publish a paid-tier
rate, even though their free plan is quota-limited rather than
zero-priced. _merge_pricing() was writing those LiteLLM-sourced prices
into the pricing block on every run, causing compute_cost() to attribute
a cost to every free-tier request — which eventually lands the model in
cost_observed_free_tier and silently breaks free routing.
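The failure mode can be illustrated with a toy model. Everything below is hypothetical: `toy_compute_cost` is a stand-in for compute_cost() that multiplies token counts by LiteLLM-style `input_cost_per_token` / `output_cost_per_token` rates, and `record_free_tier_request` stands in for whatever logic adds a model to cost_observed_free_tier. Neither is the project's actual code, and the rates are invented.

```python
from typing import Optional

# Hypothetical pricing shape: model ID -> LiteLLM-style per-token rates.
Pricing = dict[str, dict[str, float]]


def toy_compute_cost(pricing: Pricing, model: str,
                     prompt_tokens: int, completion_tokens: int) -> Optional[float]:
    """Hypothetical stand-in: cost of one request, or None if the model is unpriced."""
    rates = pricing.get(model)
    if rates is None:
        return None  # no pricing entry, so nothing is attributed to the request
    return (prompt_tokens * rates["input_cost_per_token"]
            + completion_tokens * rates["output_cost_per_token"])


def record_free_tier_request(pricing: Pricing, model: str,
                             prompt_tokens: int, completion_tokens: int,
                             cost_observed_free_tier: set[str]) -> None:
    """Hypothetical check: flag a supposedly free model if its request has a cost."""
    cost = toy_compute_cost(pricing, model, prompt_tokens, completion_tokens)
    if cost:  # None and 0.0 both mean "no cost observed"
        cost_observed_free_tier.add(model)
```

Under this toy model, a paid-tier rate copied into the pricing block gets a free-tier model flagged on its first request, while deleting the pricing entry is enough to keep it unflagged.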

Fix: after merging the LiteLLM baseline with per-source overrides,
strip any key that appears in any provider's believed_free list before
writing to sidecar["pricing"]. The filter runs on every update so newly
believed-free models are cleaned up automatically and models removed from
believed_free re-appear in pricing on the next run.
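Rebuilding and filtering on every run is self-correcting because the pricing block is recomputed from the baseline plus overrides each time and filtered only at the end, so current membership in believed_free alone decides whether a key survives. In the sketch below, `merge_pricing` is a hypothetical stand-in for _merge_pricing(), and the override semantics (whole per-model entries replaced, later sources winning) are an assumption.

```python
from typing import Any, Iterable


def merge_pricing(
    litellm_baseline: dict[str, Any],
    overrides: Iterable[dict[str, Any]],
    believed_free: set[str],
) -> dict[str, Any]:
    """Hypothetical stand-in for _merge_pricing(): rebuild, then filter.

    Assumption: each override dict replaces whole per-model entries and
    later overrides take precedence over earlier ones.
    """
    merged = dict(litellm_baseline)  # start from the baseline on every run
    for source in overrides:
        merged.update(source)
    # Filter last, so no override can reintroduce a believed-free model.
    return {k: v for k, v in merged.items() if k not in believed_free}


baseline = {"a": {"input_cost_per_token": 1e-7}, "b": {"input_cost_per_token": 2e-7}}
overrides = [{"b": {"input_cost_per_token": 3e-7}}]

run_1 = merge_pricing(baseline, overrides, believed_free={"a"})  # "a" stripped
run_2 = merge_pricing(baseline, overrides, believed_free=set())  # "a" is back
```

Because no state is carried between runs, dropping "a" from believed_free restores its baseline price on the next run without any manual edit to the pricing block.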

Also remove the three Cloudflare entries already in providers.json that
were in both the pricing block and believed_free (llama-2-7b-chat-fp16,
llama-2-7b-chat-int8, mistral-7b-instruct-v0.1). The fourth entry
(@hf/thebloke/codellama-7b-instruct-awq) is not believed_free and is
left intact.
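The overlap removed from providers.json points to an invariant that could be checked directly: no model ID should appear both in the pricing block and in any believed_free list. The sketch below assumes a hypothetical sidecar layout with a top-level "pricing" mapping and a "providers" mapping whose entries may carry a "believed_free" list; the real providers.json layout may differ, and the function name is made up.

```python
from typing import Any


def priced_and_believed_free(sidecar: dict[str, Any]) -> set[str]:
    """Hypothetical check: model IDs that are both priced and believed_free.

    Assumed layout: sidecar["pricing"] maps model IDs to price entries and
    sidecar["providers"][name]["believed_free"] lists model IDs.
    """
    believed_free: set[str] = set()
    for provider in sidecar.get("providers", {}).values():
        believed_free.update(provider.get("believed_free", []))
    # Iterating a dict yields its keys, so this intersects priced IDs with believed_free.
    return set(sidecar.get("pricing", {})) & believed_free
```

Assuming pricing keys and believed_free entries share one ID format, running this on the pre-fix file would report the three Cloudflare entries but not @hf/thebloke/codellama-7b-instruct-awq; an empty result after each update confirms the filter did its job.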

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XLHnQxLYrMzpm5Ar83ihNC
BillJr99 merged commit 7da17d7 into main on Jun 19, 2026
4 checks passed