vMLX - JANGTQ ultra-compressed MLX models - L2 disk cache (survives restarts) + L1 paged cache (fast TTFT) + hybrid SSM scheduler + continuous batching, and more.
Updated Jun 10, 2026 - Python
Object-storage-native KV cache for LLM inference & RL. Cross-restart, cross-conversation, cross-engine via shared S3 bucket.
DevWhale - AI-driven desktop development workbench, closely tailored to DeepSeek V4 with targeted cache optimizations. Electron + React + TypeScript. Streaming Agent chat, Monaco editor, xterm.js terminal, multimodal input across 60+ file formats, multi-model switching.
Freeze Claude Code's prompt prefix so DeepSeek's automatic cache always hits — alignment proxy + coalescing + keepalive, installable as a CC plugin. Measured 64% cheaper on real Claude Code traffic.
A Go-based LLM serving control plane that models token-aware scheduling, request lifecycle, streaming metrics, prefix cache, and KV block pressure around mock inference backends.
DeepSeek cache optimizer v1.1 - Reasonix four pillars + semantic compression (hit rate +30%)
Correctness-fixed Rust/PyO3 flat-array DFA prefix cache — rewrite of BCR-memory v1 with regression tests for four bugs and an SGLang/vLLM head-to-head harness.
Terminal coding agent — intelligent context management, multi-model coordination, DeepSeek V4 prefix cache optimization. TypeScript + Ink 6 TUI.
Production LLM gateway: OpenAI-compatible API in front of OpenAI, Anthropic, and Bedrock. Ordered-fallback routing with per-provider circuit breakers, Redis prefix cache, Prometheus + Grafana, Kubernetes.