prefix-cache

Here are 9 public repositories matching this topic...

jjang-ai / vmlx

vMLX - JANGTQ Uber Compressed MLX Models - L2 Disk Cache (survives restart) + L1 Paged (very fast TTFT) + Hybrid SSM Scheduler + Continuous Batching, and more.

macbook persistent-memory mlx openai-api llm lmstudio anthropic-api mcp-server kvcache-optimization kvcache-compression openclaw kvcache-reuse openclaw-agent prefix-cache mlxllm mlxstudio vmlx omlx omlx-alternative

Updated Jun 10, 2026
Python

Venkat2811 / wombatkv

Object-storage-native KV cache for LLM inference & RL. Cross-restart, cross-conversation, cross-engine via shared S3 bucket.

caching machine-learning metal amd s3 inference pytorch nvidia object-storage ds4 kv-cache llm vllm sglang prefix-cache

Updated Jun 8, 2026
Rust

tzz123-hub / DevWhale

DevWhale: an AI-driven desktop development workbench, closely tailored to DeepSeek V4 with targeted cache optimizations. Electron + React + TypeScript, with streaming agent chat, Monaco editor, xterm.js terminal, multimodal input for 60+ file formats, and multi-model switching.

electron monaco-editor deepseek coding-agents tool-calling deepseek-api prefix-cache deepseek-v4-pro

Updated Jun 6, 2026
TypeScript

jianzhichun / permafrost

Freeze Claude Code's prompt prefix so DeepSeek's automatic cache always hits — alignment proxy + coalescing + keepalive, installable as a CC plugin. Measured 64% cheaper on real Claude Code traffic.

proxy cost-optimization cache-optimization llm deepseek claude-code claude-code-plugin prompt-cache prefix-cache

Updated Jun 10, 2026
Python

qujing226 / mini-llm-serve

A Go-based LLM serving control plane that models token-aware scheduling, request lifecycle, streaming metrics, prefix cache, and KV block pressure around mock inference backends.

scheduler distributed-computing inference golang-server mlsys ai-infra dynamic-batching llm-serving llm-inference connectrpc prefix-cache prefill-decode

Updated Jun 10, 2026
Go

weksbwrx62862 / deepseek-cache-optimizer

DeepSeek Cache Optimizer v1.1: Reasonix four pillars + semantic compression (hit rate +30%)

mimo cache-optimization ai-agent cost-reduction deepseek hermes-agent prefix-cache hermes-plugin

Updated May 26, 2026
Python

armanas / BCR-memory-2

Correctness-fixed Rust/PyO3 flat-array DFA prefix cache — rewrite of BCR-memory v1 with regression tests for four bugs and an SGLang/vLLM head-to-head harness.

rust prefix-trie pyo3 kv-cache llm vllm sglang prefix-cache

Updated Apr 17, 2026
Python

huiliyi37 / Tianshu

Terminal coding agent — intelligent context management, multi-model coordination, DeepSeek V4 prefix cache optimization. TypeScript + Ink 6 TUI.

typescript terminal tui ink ai-agent deepseek context-management coding-agent prefix-cache

Updated Jun 9, 2026
TypeScript

vltech55 / bastion-gateway

Production LLM gateway: OpenAI-compatible API in front of OpenAI, Anthropic, and Bedrock. Ordered-fallback routing with per-provider circuit breakers, Redis prefix cache, Prometheus + Grafana, Kubernetes.

kubernetes grafana routing fallback prometheus bedrock circuit-breaker observability openai-proxy openai-compatible-api llm-gateway anthropic-proxy prefix-cache

Updated Jun 9, 2026
Python
