fix: gate prompt-cache features by capability, not exact model name #7985

Open
sumleo wants to merge 1 commit into BasedHardware:main from sumleo:fix/cache-feature-capability-gate
sumleo commented Jun 17, 2026

OpenAI prompt-cache features in backend/utils/llm/clients.py were gated by exact model names:

  • prompt_cache_key routing was limited to _CACHE_KEY_MODELS = {'gpt-5.4', 'gpt-5.4-mini'} (in get_llm).
  • prompt_cache_retention: "24h" was gated by model == 'gpt-5.1' (in both _get_or_create_openai_llm and _create_byok_client).

These exact-name gates are brittle: they silently stop applying the moment a model is renamed or a new family member ships, and they are already inconsistent with OpenAI's matrix (e.g. gpt-5.4 is 24h-retention-eligible but never received retention under the old gpt-5.1-only check).
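To make the brittleness concrete, here is a minimal sketch of the old gating as the description states it. The set and the `model == 'gpt-5.1'` comparison come from the text above; the wrapper function names and the dated model name `gpt-5.4-2026-01-01` are made up for illustration and are not taken from clients.py.

```python
# Old exact-name gates, as quoted in the PR description.
_CACHE_KEY_MODELS = {'gpt-5.4', 'gpt-5.4-mini'}


def old_gets_cache_key(model: str) -> bool:
    # Hypothetical wrapper around the exact-name set membership check.
    return model in _CACHE_KEY_MODELS


def old_gets_retention(model: str) -> bool:
    # Hypothetical wrapper around the `model == 'gpt-5.1'` check.
    return model == 'gpt-5.1'


# 'gpt-5.4-2026-01-01' is an invented name standing in for a renamed model.
for name in ('gpt-5.1', 'gpt-5.4', 'gpt-5.4-mini', 'gpt-5.4-2026-01-01'):
    print(name, old_gets_cache_key(name), old_gets_retention(name))
# gpt-5.1 False True
# gpt-5.4 True False
# gpt-5.4-mini True False
# gpt-5.4-2026-01-01 False False
```

Under these gates gpt-5.4 never gets retention, and any renamed family member loses both features without an error.
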

Fix

Detect the capability by model family instead of matching exact names:

  • _supports_prompt_cache_key(model) — gpt-4o / gpt-4.1 / gpt-5.x / o-series
  • _supports_cache_retention(model) — gpt-5.x / o-series

All three gate sites now use these helpers, so every cache-capable model gets the feature regardless of point release. Behavior for existing models is preserved (gpt-5.1 still gets 24h retention; gpt-5.4 / gpt-5.4-mini still get prompt_cache_key), and gpt-5.4 now also correctly receives retention.

Tests

Added regression coverage in tests/unit/test_prompt_cache_integration.py:

  • test_renamed_gpt5_model_still_gets_cache_features — a future/renamed gpt-5 family model still gets prompt_cache_retention=24h and is eligible for prompt_cache_key.
  • test_non_cache_capable_model_is_unchanged — Gemini-style names get neither; gpt-4.1-mini gets routing but not 24h retention.
  • test_get_llm_binds_cache_key_for_cache_capable_models — get_llm binds prompt_cache_key for cache-capable models.

Existing source-coupled assertions in test_prompt_caching.py / test_prompt_cache_optimization.py were updated to validate the capability-based wiring (including a check that gpt-5.1 stays retention-capable).

Ran locally (targeted, via the existing stubbed unit-test harness):

pytest tests/unit/test_prompt_cache_optimization.py \
       tests/unit/test_prompt_caching.py \
       tests/unit/test_prompt_cache_integration.py
# 53 passed

Files formatted with black --line-length 120 --skip-string-normalization.

prompt_cache_key was gated by an exact-name set (_CACHE_KEY_MODELS =
{'gpt-5.4', 'gpt-5.4-mini'}) and prompt_cache_retention='24h' was gated
by model == 'gpt-5.1'. Both are brittle: they silently stop applying the
moment a model is renamed or a new family member ships, and they are
already inconsistent with OpenAI's matrix (e.g. gpt-5.4 is 24h-eligible
but never received retention).

Replace the exact-name gates with capability detection by model family:

- _supports_prompt_cache_key(): gpt-4o / gpt-4.1 / gpt-5.x / o-series
- _supports_cache_retention(): gpt-5.x / o-series

prompt_cache_key routing (get_llm) and prompt_cache_retention (both the
default and BYOK OpenAI factories) now key off these helpers, so every
cache-capable model gets the feature regardless of point release.

Add regression tests: a renamed gpt-5 family model still gets retention
and cache_key routing; non-cache-capable models (e.g. Gemini) are
unaffected; gpt-5.1 stays retention-capable. Existing source-coupled
assertions updated to the capability-based wiring.

greptile-apps Bot commented Jun 17, 2026

Contributor

Greptile Summary

This PR replaces three brittle exact-model-name gates for OpenAI prompt-cache features with two capability-based prefix helpers (_supports_prompt_cache_key / _supports_cache_retention), so entire model families are covered regardless of point-release renaming. Three new regression tests and updates to two existing test files accompany the change.

  • clients.py defines _CACHE_KEY_MODEL_PREFIXES and _CACHE_RETENTION_MODEL_PREFIXES tuples and replaces all three gate sites (_create_byok_client, _get_or_create_openai_llm, get_llm) with the new helpers; this also newly enables these features for the entire gpt-4o, gpt-4.1, and o-series families.
  • The test suite adds a stub-exec harness (_load_clients_namespace) and three focused tests covering a hypothetical renamed gpt-5 model, a non-cache-capable model, and get_llm cache-key binding.

Confidence Score: 4/5

Safe to merge; the capability-based helpers are correct for all current model names, and all three gate sites are updated consistently.

The core logic is sound: existing models (gpt-5.1, gpt-5.4, gpt-5.4-mini) retain their previous behavior, gpt-5.4 correctly gains the previously missing 24h retention, and new families (gpt-4o, gpt-4.1, o-series) are onboarded as documented. The o-series prefixes (o1, o3, o4) are still individually listed rather than covered by a single root, so a future o5 or o6 model would silently miss both features until manually added — the same class of forward-compatibility gap the PR addresses for the gpt-5 family.

backend/utils/llm/clients.py — the _CACHE_KEY_MODEL_PREFIXES and _CACHE_RETENTION_MODEL_PREFIXES tuples for the o-series entries.

Important Files Changed

Filename Overview
backend/utils/llm/clients.py Replaces exact-model-name gates with capability-based prefix helpers; all three cache-feature call sites updated correctly. O-series prefixes are still individually enumerated, which is partially brittle for future models.
backend/tests/unit/test_prompt_cache_integration.py Adds three new regression tests covering renamed gpt-5 models, non-cache-capable models, and get_llm cache-key binding; uses existing exec-based stub harness correctly.
backend/tests/unit/test_prompt_cache_optimization.py Updates source-level assertion from _CACHE_KEY_MODELS to _supports_prompt_cache_key; change is correct and consistent with the refactor.
backend/tests/unit/test_prompt_caching.py Replaces two duplicate gpt-5.1 regex tests with capability-based assertions; the prefix-tuple extraction test correctly validates that both gpt-5.1 and gpt-5.4 are retention-capable.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_llm / _create_byok_client / _get_or_create_openai_llm] --> B{_supports_cache_retention?}
    B -- Yes --> C[extra_body: prompt_cache_retention=24h]
    B -- No --> D[No retention parameter]
    A --> E{cache_key provided AND _supports_prompt_cache_key?}
    E -- Yes --> F[result.bind prompt_cache_key=cache_key]
    E -- No --> G[Return plain LLM]

    subgraph Prefixes
        H["_CACHE_RETENTION_MODEL_PREFIXES\n('gpt-5', 'o1', 'o3', 'o4')"]
        I["_CACHE_KEY_MODEL_PREFIXES\n('gpt-5', 'gpt-4.1', 'gpt-4o', 'o1', 'o3', 'o4')"]
    end

    B -.->|model.startswith| H
    E -.->|model.startswith| I
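
The flow above can be approximated in plain Python. This sketch deliberately avoids any client library: it builds a plain dict instead of constructing an LLM object or calling `result.bind`, and `build_openai_kwargs` is a hypothetical name that does not exist in clients.py. The prefix tuples are copied from the Prefixes subgraph.

```python
from typing import Any, Dict, Optional

_CACHE_RETENTION_MODEL_PREFIXES = ('gpt-5', 'o1', 'o3', 'o4')
_CACHE_KEY_MODEL_PREFIXES = ('gpt-5', 'gpt-4.1', 'gpt-4o', 'o1', 'o3', 'o4')


def build_openai_kwargs(model: str, cache_key: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for the three gate sites, returning request kwargs."""
    kwargs: Dict[str, Any] = {'model': model}
    # Retention branch: only retention-capable families get the 24h setting.
    if model.startswith(_CACHE_RETENTION_MODEL_PREFIXES):
        kwargs['extra_body'] = {'prompt_cache_retention': '24h'}
    # Routing branch: requires both a cache key and a cache-key-capable model.
    if cache_key and model.startswith(_CACHE_KEY_MODEL_PREFIXES):
        kwargs['prompt_cache_key'] = cache_key
    return kwargs


print(build_openai_kwargs('gpt-4.1-mini', cache_key='chat'))
# {'model': 'gpt-4.1-mini', 'prompt_cache_key': 'chat'}
```

Note that the two branches are independent: a model can get routing without retention, as the gpt-4.1-mini call shows.
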

Reviews (1): Last reviewed commit: "fix: gate prompt-cache features by capab..."

Comment on lines +413 to +416
_CACHE_KEY_MODEL_PREFIXES = ('gpt-5', 'gpt-4.1', 'gpt-4o', 'o1', 'o3', 'o4')

# Family prefixes whose models support 24h prompt-cache retention.
_CACHE_RETENTION_MODEL_PREFIXES = ('gpt-5', 'o1', 'o3', 'o4')

Contributor

P2 O-series prefixes remain individually enumerated

The o-series entries (o1, o3, o4) are still listed one by one, the same pattern this PR correctly fixes for the gpt-5 family. OpenAI skipped o2 entirely and has been shipping new o-series models (o1 → o3 → o4) at a steady pace; a future o5 or o6 model would silently receive neither prompt_cache_key routing nor prompt_cache_retention until someone manually adds the prefix. Consolidating to a single 'o' prefix may be too broad (it would also match non-OpenAI model names that start with 'o'), and a narrower shared root such as 'o1-' / 'o3-' would not cover new models either. A pragmatic middle ground would be to add a comment flagging this and to pair each new o-series release with a prefix update, or to derive the check from a small set such as ('gpt-5', 'gpt-4.1', 'gpt-4o', 'o') with an additional digit guard on the 'o' entry.
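
One possible shape for the digit guard, offered as a sketch rather than a proposed patch: keep the gpt prefixes as they are and match the o-series with a regular expression. The names `_GPT_KEY_PREFIXES`, `_O_SERIES_RE`, and `supports_prompt_cache_key_guarded` are hypothetical, and the sketch assumes every future `o<digits>` model supports the feature, which may not hold.

```python
import re

# gpt-family prefixes taken from the tuple under review.
_GPT_KEY_PREFIXES = ('gpt-5', 'gpt-4.1', 'gpt-4o')

# 'o', one or more digits, then either end of string or a '-' suffix.
# This rejects names like 'ollama-x' or 'omni-moderation-latest'.
_O_SERIES_RE = re.compile(r'o\d+(-|$)')


def supports_prompt_cache_key_guarded(model: str) -> bool:
    """Hypothetical variant of the cache-key gate with an o-series digit guard."""
    if model.startswith(_GPT_KEY_PREFIXES):
        return True
    # re.match anchors at the start of the string.
    return _O_SERIES_RE.match(model) is not None


print(supports_prompt_cache_key_guarded('o5'))        # True
print(supports_prompt_cache_key_guarded('o4-mini'))   # True
print(supports_prompt_cache_key_guarded('ollama-x'))  # False
```

The trade-off is that a future o-series model lacking the feature would be opted in by default, so the explicit tuple plus a reminder comment remains the more conservative choice.
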

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@kodjima33 kodjima33 left a comment

Collaborator

Backend prompt-cache gating by capability — approve only, Nik's LLM area

sumleo commented Jun 18, 2026

Author

Hi @josancamon19, gentle nudge on this when you have a moment. It's a small, self-contained prompt-caching fix, and I'm happy to rebase or tweak anything if that would make review easier. Thanks for the project and your time!
