Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy core — optional GPU acceleration via PyTorch (CUDA/MPS) or MLX (Metal).
-
Updated
Apr 1, 2026 - Python
Vector compression with TurboQuant codecs for embeddings, retrieval, and KV-cache. 10x compression, pure NumPy core — optional GPU acceleration via PyTorch (CUDA/MPS) or MLX (Metal).
Compress embeddings, retrieval vectors, and KV-cache with TurboQuant codecs for 10x smaller storage and NumPy-first AI workloads
Compress MLX KV cache on Apple Silicon with TurboQuant mixed-precision and fused Metal kernels for lower memory use and fast decode
Add a description, image, and links to the semafold topic page so that developers can more easily learn about it.
To associate your repository with the semafold topic, visit your repo's landing page and select "manage topics."