A small PHP library for caching LLM responses so you don't keep paying to get the same answer back. It does two things: exact match caching (hash the prompt, look it up) and semantic caching (compare embeddings when the exact match misses).

    $reply = PromptCache::remember($prompt, function () use ($client, $prompt) {
        return $client->chat($prompt);
    });

First call runs the closure. Second call gives you the saved reply. That's it.
I had a side project that talked to OpenAI a lot, and the bill was
getting silly considering most of the prompts were variations of the
same handful of questions. I wanted something like Cache::remember()
but for LLM stuff. I couldn't find anything that wasn't tied to a
specific SDK or buried inside some big agent framework, so I made one.
- PHP 8.2 or newer
- PDO with the sqlite driver (for the default storage)
- predis/predis if you want to use Redis instead
- cURL if you use the OpenAI or Ollama embedding providers

    composer require prompt-cache/prompt-cache

The first time you call it, it creates a SQLite file at
storage/prompt-cache.sqlite and you're done. No config to write
unless you want to.

    use PromptCache\PromptCache;

    $prompt = 'Summarise this article in three bullet points: ...';

    // $myOpenAi has to be captured too, or it is undefined inside the closure.
    $summary = PromptCache::remember($prompt, function () use ($myOpenAi, $prompt) {
        return $myOpenAi->chat($prompt); // or anthropic, mistral, whatever
    });

The prompt gets normalised (extra whitespace flattened, timestamps and UUIDs replaced with placeholders) before it's hashed, so two prompts that only differ by formatting still hit the same cache row.
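To make the normalise-then-hash step concrete, here is a minimal sketch. The function names (`normalisePrompt`, `promptKey`), the placeholder strings, the regexes and the choice of SHA-256 are all assumptions made for the example; the library's real normaliser may well differ.

```php
<?php
// Hypothetical sketch: not the library's real normaliser.

function normalisePrompt(string $prompt): string
{
    // Replace UUIDs (8-4-4-4-12 hex digits) with a placeholder.
    $out = preg_replace(
        '/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/i',
        '<uuid>',
        $prompt
    );

    // Replace ISO 8601 style timestamps, e.g. 2024-05-01T12:30:00Z.
    $out = preg_replace(
        '/\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?(\.\d+)?(Z|[+-]\d{2}:?\d{2})?/',
        '<timestamp>',
        $out
    );

    // Collapse runs of whitespace into single spaces and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $out));
}

function promptKey(string $prompt): string
{
    // Assumed hash function; any stable digest would do for a cache key.
    return hash('sha256', normalisePrompt($prompt));
}

$a = "Summarise   order 123e4567-e89b-12d3-a456-426614174000\n from 2024-05-01T12:30:00Z";
$b = 'Summarise order 9b2d7c1e-0f4a-4c3b-8d2e-5a6f7b8c9d0e from 2025-01-15 08:00:00';

// Both prompts normalise to the same string, so they share a key.
var_dump(promptKey($a) === promptKey($b));
```

The trade-off with this kind of normalisation is that two prompts differing only in a timestamp or ID get the same answer back, which is usually what you want but not always.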
When the exact hash doesn't match, semantic() looks at previously
stored prompts and picks the closest one. If the similarity is above
the threshold (0.92 by default), you get the old answer back.

    $reply = PromptCache::semantic(
        $prompt,
        fn () => $client->chat($prompt)
    );

Tighten or loosen the threshold per call if you want:

    $reply = PromptCache::semantic(
        $prompt,
        fn () => $client->chat($prompt),
        0.88
    );

I deliberately didn't depend on any LLM SDK. You hand me a closure, I either run it or I don't. Use whatever client you like.
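For a feel of what the threshold means, here is a rough sketch of a closest-match lookup using cosine similarity. The function names, the shape of the stored entries and the inclusive comparison are assumptions for illustration, not the library's internals.

```php
<?php
// Hypothetical sketch of a threshold lookup; not the library's real code.

/** Cosine similarity between two equal-length vectors. */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a zero vector has no direction to compare
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Return the stored reply whose embedding is closest to $query,
 * or null when nothing reaches the threshold.
 *
 * @param array<int, array{embedding: float[], reply: string}> $entries
 */
function closestReply(array $query, array $entries, float $threshold = 0.92): ?string
{
    $bestScore = -1.0;
    $bestReply = null;
    foreach ($entries as $entry) {
        $score = cosineSimilarity($query, $entry['embedding']);
        if ($score > $bestScore) {
            $bestScore = $score;
            $bestReply = $entry['reply'];
        }
    }
    // Whether the real comparison is inclusive is a guess.
    return $bestScore >= $threshold ? $bestReply : null;
}

// Toy 3-dimensional "embeddings", purely for illustration.
$entries = [
    ['embedding' => [1.0, 0.0, 0.0], 'reply' => 'reply A'],
    ['embedding' => [0.0, 1.0, 0.0], 'reply' => 'reply B'],
];

var_dump(closestReply([0.9, 0.1, 0.0], $entries)); // close to A, so a hit
var_dump(closestReply([0.6, 0.6, 0.0], $entries)); // halfway between, so a miss
```

A linear scan like this is fine for a few thousand rows. Lowering the threshold trades more hits for a higher chance of returning an answer to a question that wasn't quite the one asked.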
For streaming responses you get a Generator back. First call streams from upstream while quietly stitching the chunks together for the cache. Second call replays from the cache, same chunked interface, no upstream traffic.
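The "stream once, replay later" idea can be sketched with a generator that passes chunks through while collecting them. The `cachedStream` function and the in-memory array standing in for a store are hypothetical, and storing the chunks as a list (rather than one joined string) is an assumption.

```php
<?php
// Hypothetical sketch of "stream once, replay later"; not the library's real code.

/**
 * @param callable(): iterable<string> $upstream
 * @param array<string, string[]> $cache in-memory stand-in for a real store
 */
function cachedStream(string $key, callable $upstream, array &$cache): Generator
{
    if (isset($cache[$key])) {
        // Replay: same chunked interface, no upstream call.
        yield from $cache[$key];
        return;
    }

    $chunks = [];
    foreach ($upstream() as $chunk) {
        $chunks[] = $chunk; // collect the chunk for the cache
        yield $chunk;       // and pass it straight through to the caller
    }

    // Only reached when the caller consumed the whole stream.
    $cache[$key] = $chunks;
}

$calls = 0;
$upstream = function () use (&$calls) {
    $calls++;
    yield 'Hello';
    yield ', ';
    yield 'world';
};

$cache = [];
$first  = implode('', iterator_to_array(cachedStream('greeting', $upstream, $cache), false));
$second = implode('', iterator_to_array(cachedStream('greeting', $upstream, $cache), false));

// Same text both times, and the upstream closure only ran once.
var_dump($first === $second, $calls);
```

One consequence of this shape is that a stream abandoned halfway never reaches the cache, which avoids replaying truncated replies. With the library itself, the call looks like this: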

    foreach (PromptCache::stream($prompt, fn () => $client->stream($prompt)) as $chunk) {
        echo $chunk;
    }

I added counters mostly because I wanted to see for myself how much
I was actually saving. There's a stats() method that gives you a
running total:

    print_r(PromptCache::stats());
    /*
    Array (
        [requests] => 1200
        [exact_hits] => 400
        [semantic_hits] => 300
        [misses] => 500
        [tokens_saved] => 2838282
        [estimated_usd_saved] => 482.22
    )
    */

The token count is a back-of-the-envelope figure (4 chars per token), not a real tokeniser, so treat the dollar number as a rough sanity check, not a billing reconciliation.
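The arithmetic behind that kind of estimate is simple enough to sketch. The function names, the rounding, and the price of $10 per million tokens are made up for the example; only the 4-characters-per-token ratio comes from the description above.

```php
<?php
// Hypothetical sketch of a 4-characters-per-token estimate; not the library's real code.

function estimateTokens(string $text, int $charsPerToken = 4): int
{
    // strlen counts bytes, which is close enough for a rough figure.
    return (int) ceil(strlen($text) / $charsPerToken);
}

function estimateUsdSaved(int $tokens, float $usdPerMillionTokens): float
{
    return round($tokens / 1_000_000 * $usdPerMillionTokens, 2);
}

$reply  = str_repeat('a', 1000);            // a 1000-character cached reply
$tokens = estimateTokens($reply);           // 1000 / 4 = 250
$usd    = estimateUsdSaved(2_000_000, 10.0); // made-up price per million tokens

var_dump($tokens, $usd);
```

Real tokenisers split text very differently for code, non-English text and long numbers, which is why the figure is only good for a sanity check.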
If you want to see what it's actually doing:

    PromptCache::debug(true);

You'll see hits, misses, similarity scores and embedding timings
written to STDERR (or error_log() outside of CLI). Or pass a
callable and route the events yourself:

    PromptCache::debug(function ($event, $data) {
        Log::info("prompt-cache.$event", $data);
    });

Three storage options ship with the package. Pick whichever fits.

| Driver | Class | When to use it |
|---|---|---|
| SQLite | PromptCache\Stores\SqliteStore | Default. No setup. Good for most apps. |
| File | PromptCache\Stores\FileStore | One JSON file. Handy for shipping a warm cache. |
| Redis | PromptCache\Stores\RedisStore | When you have multiple workers sharing a cache. |

Want a different backend? Implement PromptCache\Contracts\Store. It
has eight methods, none of them surprising.

Embedding providers for the semantic cache:

| Provider | Class | Notes |
|---|---|---|
| Null | PromptCache\Embeddings\NullEmbeddingProvider | Local CRC32 thing. No keys needed. Rough quality. |
| OpenAI | PromptCache\Embeddings\OpenAIEmbeddingProvider | Uses text-embedding-3-small by default. |
| Ollama | PromptCache\Embeddings\OllamaEmbeddingProvider | Hits a local Ollama server. |

The Null provider exists so the semantic API works out of the box without anyone having to wire up an API key. It's not great. Use one of the real ones when it matters.
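To show why a key-free, hash-based embedding is only rough, here is one way such a thing could work: hash each word into a bucket with crc32() and count. This is a guess at the general technique (feature hashing), with a hypothetical `crc32Embedding` function and an arbitrary 64 dimensions; it is not the Null provider's actual code.

```php
<?php
// Hypothetical sketch of a CRC32-based embedding; not the library's real provider.

/**
 * Hash each word into one of $dims buckets, count occurrences,
 * then scale the vector to unit length.
 *
 * @return float[]
 */
function crc32Embedding(string $text, int $dims = 64): array
{
    $vector = array_fill(0, $dims, 0.0);
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        // Mask the sign bit so the bucket index is never negative.
        $vector[(crc32($word) & 0x7fffffff) % $dims] += 1.0;
    }

    $norm = sqrt(array_sum(array_map(fn ($v) => $v * $v, $vector)));
    if ($norm == 0.0) {
        return $vector; // no words, nothing to scale
    }
    return array_map(fn ($v) => $v / $norm, $vector);
}

$a = crc32Embedding('How do I reset my password?');
$b = crc32Embedding('my password: how do I reset?');

// Same bag of words, so the vectors are identical regardless of word order.
var_dump($a === $b);
```

An approach like this only sees shared words, so "reset my password" and "I forgot my login" look unrelated, and unrelated words can collide in the same bucket. A real embedding model captures meaning rather than spelling.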
In Laravel, the service provider is auto-discovered. If you want to tweak settings, publish the config:

    php artisan vendor:publish --tag=prompt-cache-config

Then use the facade wherever:

    use PromptCache;

    $reply = PromptCache::semantic($prompt, fn () => $client->chat($prompt));

Or set things from .env:

    PROMPT_CACHE_DRIVER=redis
    PROMPT_CACHE_EMBEDDINGS=openai
    OPENAI_API_KEY=sk-...
    PROMPT_CACHE_THRESHOLD=0.9

There's a folder of runnable scripts under examples/:
- 01_exact_cache.php - the most basic case
- 02_semantic_cache.php - rephrased prompt that still hits
- 03_streaming.php - stream, cache, replay
- 04_stats_and_debug.php - counters and the debug logger
- 05_openai_real.php - real OpenAI embeddings, needs OPENAI_API_KEY

Run them with php examples/01_exact_cache.php from the package
root. They use a tiny autoloader so they work without composer install.

To run the tests:

    composer install
    vendor/bin/pest

The suite covers the exact cache, semantic cache, cosine similarity, streaming, sqlite persistence, the file store, the normaliser and the stats counters.
It's a cache. There's no agent framework here, no RAG helpers, no chain-of-thought thing. Bring your own LLM library and put this in front of it.
MIT. See LICENSE.