fix: gate prompt-cache features by capability, not exact model name by sumleo · Pull Request #7985 · BasedHardware/omi

sumleo · 2026-06-17T01:05:18Z

OpenAI prompt-cache features in backend/utils/llm/clients.py were gated by exact model names:

prompt_cache_key routing was limited to _CACHE_KEY_MODELS = {'gpt-5.4', 'gpt-5.4-mini'} (in get_llm).
prompt_cache_retention: "24h" was gated by model == 'gpt-5.1' (in both _get_or_create_openai_llm and _create_byok_client).

These exact-name gates are brittle: they silently stop applying the moment a model is renamed or a new family member ships, and they are already inconsistent with OpenAI's matrix (e.g. gpt-5.4 is 24h-retention-eligible but never received retention under the old gpt-5.1-only check).
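To make the failure mode concrete, here is a minimal reconstruction of the old gates. The set and the equality check come from the description above; the two wrapper functions and the `gpt-5.9-hypothetical` name are illustrative assumptions, not code from the repository.

```python
# Reconstruction of the old exact-name gates, for illustration only.
_CACHE_KEY_MODELS = {'gpt-5.4', 'gpt-5.4-mini'}


def old_gets_cache_key(model: str) -> bool:
    """Old gate: prompt_cache_key only for names listed in the set."""
    return model in _CACHE_KEY_MODELS


def old_gets_retention(model: str) -> bool:
    """Old gate: 24h retention only for the literal name 'gpt-5.1'."""
    return model == 'gpt-5.1'


if __name__ == '__main__':
    # 'gpt-5.9-hypothetical' is a made-up name standing in for a future release.
    for name in ('gpt-5.1', 'gpt-5.4', 'gpt-5.4-mini', 'gpt-5.9-hypothetical'):
        print(name, old_gets_cache_key(name), old_gets_retention(name))
```

Under these gates gpt-5.4 never receives retention and a new family member receives neither feature, which is the inconsistency the description calls out.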

Fix

Detect the capability by model family instead of matching exact names:

_supports_prompt_cache_key(model) — gpt-4o / gpt-4.1 / gpt-5.x / o-series
_supports_cache_retention(model) — gpt-5.x / o-series

All three gate sites now use these helpers, so every cache-capable model gets the feature regardless of point release. Behavior for existing models is preserved (gpt-5.1 still gets 24h retention; gpt-5.4 / gpt-5.4-mini still get prompt_cache_key), and gpt-5.4 now also correctly receives retention.

Tests

Added regression coverage in tests/unit/test_prompt_cache_integration.py:

test_renamed_gpt5_model_still_gets_cache_features — a future/renamed gpt-5 family model still gets prompt_cache_retention=24h and is eligible for prompt_cache_key.
test_non_cache_capable_model_is_unchanged — Gemini-style names get neither; gpt-4.1-mini gets routing but not 24h retention.
test_get_llm_binds_cache_key_for_cache_capable_models — get_llm binds prompt_cache_key for cache-capable models.

Existing source-coupled assertions in test_prompt_caching.py / test_prompt_cache_optimization.py were updated to validate the capability-based wiring (including a check that gpt-5.1 stays retention-capable).

Ran locally (targeted, via the existing stubbed unit-test harness):

pytest tests/unit/test_prompt_cache_optimization.py \
       tests/unit/test_prompt_caching.py \
       tests/unit/test_prompt_cache_integration.py
# 53 passed

Files formatted with black --line-length 120 --skip-string-normalization.

prompt_cache_key was gated by an exact-name set (_CACHE_KEY_MODELS = {'gpt-5.4', 'gpt-5.4-mini'}) and prompt_cache_retention='24h' was gated by model == 'gpt-5.1'. Both are brittle: they silently stop applying the moment a model is renamed or a new family member ships, and they are already inconsistent with OpenAI's matrix (e.g. gpt-5.4 is 24h-eligible but never received retention).

Replace the exact-name gates with capability detection by model family:

- _supports_prompt_cache_key(): gpt-4o / gpt-4.1 / gpt-5.x / o-series
- _supports_cache_retention(): gpt-5.x / o-series

prompt_cache_key routing (get_llm) and prompt_cache_retention (both the default and BYOK OpenAI factories) now key off these helpers, so every cache-capable model gets the feature regardless of point release.

Add regression tests: a renamed gpt-5 family model still gets retention and cache_key routing; non-cache-capable models (e.g. Gemini) are unaffected; gpt-5.1 stays retention-capable. Existing source-coupled assertions updated to the capability-based wiring.

greptile-apps · 2026-06-17T01:15:01Z

Greptile Summary

This PR replaces three brittle exact-model-name gates for OpenAI prompt-cache features with two capability-based prefix helpers (_supports_prompt_cache_key / _supports_cache_retention), so entire model families are covered regardless of point-release renaming. Three new regression tests and updates to two existing test files accompany the change.

clients.py defines _CACHE_KEY_MODEL_PREFIXES and _CACHE_RETENTION_MODEL_PREFIXES tuples and replaces all three gate sites (_create_byok_client, _get_or_create_openai_llm, get_llm) with the new helpers; this also newly enables these features for the entire gpt-4o, gpt-4.1, and o-series families.
The test suite adds a stub-exec harness (_load_clients_namespace) and three focused tests covering a hypothetical renamed gpt-5 model, a non-cache-capable model, and get_llm cache-key binding.
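For readers unfamiliar with the stub-exec approach, the general idea can be sketched as below. This is not the project's `_load_clients_namespace`; the function name `load_namespace`, the inline `SOURCE` string, and the `some_heavy_sdk` module are all hypothetical stand-ins. The sketch execs module source in a fresh namespace while heavy imports are temporarily replaced with mocks.

```python
import sys
from unittest.mock import MagicMock

# Stand-in for a module's source text; the import mimics a heavy dependency.
SOURCE = """
from some_heavy_sdk import Client  # hypothetical module, stubbed at load time

_CACHE_RETENTION_MODEL_PREFIXES = ('gpt-5', 'o1', 'o3', 'o4')


def _supports_cache_retention(model):
    return model.startswith(_CACHE_RETENTION_MODEL_PREFIXES)
"""


def load_namespace(source, stub_modules):
    """Exec source with the named modules stubbed, then restore sys.modules."""
    missing = object()
    saved = {name: sys.modules.get(name, missing) for name in stub_modules}
    try:
        for name in stub_modules:
            sys.modules[name] = MagicMock(name=name)
        namespace = {'__name__': 'clients_sketch'}
        exec(compile(source, '<clients_sketch>', 'exec'), namespace)
        return namespace
    finally:
        # Put sys.modules back exactly as it was before the call.
        for name, original in saved.items():
            if original is missing:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = original


ns = load_namespace(SOURCE, ['some_heavy_sdk'])
```

Tests can then call `ns['_supports_cache_retention']` directly without installing or importing the real dependency.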

Confidence Score: 4/5

Safe to merge; the capability-based helpers are correct for all current model names, and all three gate sites are updated consistently.

The core logic is sound: existing models (gpt-5.1, gpt-5.4, gpt-5.4-mini) retain their previous behavior, gpt-5.4 correctly gains the previously missing 24h retention, and new families (gpt-4o, gpt-4.1, o-series) are onboarded as documented. The o-series prefixes (o1, o3, o4) are still individually listed rather than covered by a single root, so a future o5 or o6 model would silently miss both features until manually added — the same class of forward-compatibility gap the PR addresses for the gpt-5 family.

backend/utils/llm/clients.py — the _CACHE_KEY_MODEL_PREFIXES and _CACHE_RETENTION_MODEL_PREFIXES tuples for the o-series entries.

Important Files Changed

Filename	Overview
backend/utils/llm/clients.py	Replaces exact-model-name gates with capability-based prefix helpers; all three cache-feature call sites updated correctly. O-series prefixes are still individually enumerated, which is partially brittle for future models.
backend/tests/unit/test_prompt_cache_integration.py	Adds three new regression tests covering renamed gpt-5 models, non-cache-capable models, and get_llm cache-key binding; uses existing exec-based stub harness correctly.
backend/tests/unit/test_prompt_cache_optimization.py	Updates source-level assertion from _CACHE_KEY_MODELS to _supports_prompt_cache_key; change is correct and consistent with the refactor.
backend/tests/unit/test_prompt_caching.py	Replaces two duplicate gpt-5.1 regex tests with capability-based assertions; the prefix-tuple extraction test correctly validates that both gpt-5.1 and gpt-5.4 are retention-capable.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_llm / _create_byok_client / _get_or_create_openai_llm] --> B{_supports_cache_retention?}
    B -- Yes --> C[extra_body: prompt_cache_retention=24h]
    B -- No --> D[No retention parameter]
    A --> E{cache_key provided AND _supports_prompt_cache_key?}
    E -- Yes --> F[result.bind prompt_cache_key=cache_key]
    E -- No --> G[Return plain LLM]

    subgraph Prefixes
        H["_CACHE_RETENTION_MODEL_PREFIXES\n('gpt-5', 'o1', 'o3', 'o4')"]
        I["_CACHE_KEY_MODEL_PREFIXES\n('gpt-5', 'gpt-4.1', 'gpt-4o', 'o1', 'o3', 'o4')"]
    end

    B -.->|model.startswith| H
    E -.->|model.startswith| I
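
The flow in the chart can be sketched in plain Python. `FakeLLM`, `build_openai_kwargs`, and `get_llm_sketch` are hypothetical stand-ins for the real client class and factories in clients.py, and the helper bodies are assumed; only the prefix tuples, the `extra_body` retention value, and the `bind(prompt_cache_key=...)` step come from this page.

```python
from typing import Any, Dict, Optional

_CACHE_KEY_MODEL_PREFIXES = ('gpt-5', 'gpt-4.1', 'gpt-4o', 'o1', 'o3', 'o4')
_CACHE_RETENTION_MODEL_PREFIXES = ('gpt-5', 'o1', 'o3', 'o4')


def _supports_prompt_cache_key(model: str) -> bool:
    return model.startswith(_CACHE_KEY_MODEL_PREFIXES)


def _supports_cache_retention(model: str) -> bool:
    return model.startswith(_CACHE_RETENTION_MODEL_PREFIXES)


class FakeLLM:
    """Hypothetical stand-in for the real chat client; records its arguments."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs
        self.bound: Dict[str, Any] = {}

    def bind(self, **kwargs: Any) -> 'FakeLLM':
        # Return a copy carrying the extra per-call arguments.
        clone = FakeLLM(**self.kwargs)
        clone.bound = {**self.bound, **kwargs}
        return clone


def build_openai_kwargs(model: str) -> Dict[str, Any]:
    """Constructor arguments, adding 24h retention only for eligible families."""
    kwargs: Dict[str, Any] = {'model': model}
    if _supports_cache_retention(model):
        kwargs['extra_body'] = {'prompt_cache_retention': '24h'}
    return kwargs


def get_llm_sketch(model: str, cache_key: Optional[str] = None) -> FakeLLM:
    """Bind prompt_cache_key only when a key is given and the family supports it."""
    llm = FakeLLM(**build_openai_kwargs(model))
    if cache_key and _supports_prompt_cache_key(model):
        return llm.bind(prompt_cache_key=cache_key)
    return llm
```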


Reviews (1): Last reviewed commit: "fix: gate prompt-cache features by capab..."

greptile-apps · 2026-06-17T01:15:04Z

+_CACHE_KEY_MODEL_PREFIXES = ('gpt-5', 'gpt-4.1', 'gpt-4o', 'o1', 'o3', 'o4')
+
+# Family prefixes whose models support 24h prompt-cache retention.
+_CACHE_RETENTION_MODEL_PREFIXES = ('gpt-5', 'o1', 'o3', 'o4')


O-series prefixes remain individually enumerated

The o-series entries (o1, o3, o4) are still listed one by one, the same pattern this PR fixes for the gpt-5 family. OpenAI skipped o2 entirely and has shipped new o-series models (o1 → o3 → o4) at a steady pace, so a future o5 or o6 model would silently receive neither prompt_cache_key routing nor prompt_cache_retention until someone manually adds its prefix. Consolidating to a single 'o' prefix may be too broad (it could match non-OpenAI models), and a narrower shared root such as 'o1-' or 'o3-' would not help either. A pragmatic middle ground would be to add a comment flagging this so that each new o-series release is paired with a prefix update, or to derive the o-series check from a small set such as ('gpt-5', 'gpt-4.1', 'gpt-4o', 'o') with an additional digit guard.
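
One way the digit-guard idea could look, shown for the cache-key check only. The regex, the `_GPT_CACHE_KEY_PREFIXES` name, and the function name are assumptions for illustration, not part of the PR, and `ollama-llama3` is a made-up non-OpenAI name.

```python
import re

# GPT family prefixes taken from the PR; the o-series is handled by the regex.
_GPT_CACHE_KEY_PREFIXES = ('gpt-5', 'gpt-4.1', 'gpt-4o')

# 'o' followed immediately by a digit: matches o1, o3-mini, o4-mini, o5, ...
# but not names such as 'ollama-llama3' where a letter follows the 'o'.
_O_SERIES_PATTERN = re.compile(r'o\d')


def supports_prompt_cache_key_guarded(model: str) -> bool:
    """Prefix check for GPT families plus a digit-guarded o-series check."""
    if model.startswith(_GPT_CACHE_KEY_PREFIXES):
        return True
    # re.match anchors at the start of the string.
    return _O_SERIES_PATTERN.match(model) is not None
```

The trade-off is that any future name of the form `o<digit>...` is assumed cache-capable before anyone has confirmed it against OpenAI's documentation, so this trades silent under-coverage for possible over-coverage.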

kodjima33

Backend prompt-cache gating by capability — approve only, Nik's LLM area

sumleo · 2026-06-18T03:09:40Z

Hi @josancamon19, gentle nudge on this when you have a moment. It's a small, self-contained prompt-caching fix, and I'm happy to rebase or tweak anything if that would make review easier. Thanks for the project and your time!

greptile-apps Bot reviewed Jun 17, 2026

kodjima33 approved these changes Jun 17, 2026

sumleo wants to merge 1 commit into BasedHardware:main from sumleo:fix/cache-feature-capability-gate