Stop recomputing embeddings. Start shipping faster.
Website • Documentation • Skelf Research
EmbedCache is a Rust library and REST API that generates text embeddings locally and caches the results. No external API calls, no per-token billing, no rate limits. Just fast, local embeddings with 22+ models.
Building RAG apps, semantic search, or anything with embeddings? You've probably hit these problems:
- Recomputing the same embeddings every time you restart your app
- Paying for API calls to embed text you've already processed
- Waiting on rate limits when you need to embed thousands of documents
- Vendor lock-in to a specific embedding provider
EmbedCache fixes all of this. Embeddings are generated locally using FastEmbed and cached in SQLite. A URL you process once is served from the cache on every later request, even across restarts.
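The caching described above follows a get-or-compute pattern. The sketch below is a minimal illustration under stated assumptions: an in-memory `HashMap` stands in for the SQLite table, and `EmbeddingCache`, `get_or_embed`, and the toy embedding closure are hypothetical names, not embedcache's API. The key includes the model name because the same text embedded by two different models yields different vectors.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the SQLite-backed cache.
/// Keys are (model name, text); values are embedding vectors.
struct EmbeddingCache {
    entries: HashMap<(String, String), Vec<f32>>,
}

impl EmbeddingCache {
    fn new() -> Self {
        EmbeddingCache { entries: HashMap::new() }
    }

    /// Return the cached vector for (model, text), calling `embed` only on a miss.
    fn get_or_embed<F>(&mut self, model: &str, text: &str, embed: F) -> Vec<f32>
    where
        F: FnOnce(&str) -> Vec<f32>,
    {
        self.entries
            .entry((model.to_string(), text.to_string()))
            .or_insert_with(|| embed(text))
            .clone()
    }
}

fn main() {
    let mut cache = EmbeddingCache::new();
    let mut calls = 0;

    // Toy "model": a 2-dimensional vector derived from the text length
    for _ in 0..3 {
        let v = cache.get_or_embed("BGESmallENV15", "Hello world", |t| {
            calls += 1;
            vec![t.len() as f32, 1.0]
        });
        assert_eq!(v, vec![11.0, 1.0]);
    }

    // Only the first lookup computed the embedding; the other two were cache hits.
    // prints "embedder calls: 1"
    println!("embedder calls: {}", calls);
}
```

A real persistent cache would also need a stable key and a serialized vector format; the in-memory map sidesteps both.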
- 22+ embedding models - BGE, MiniLM, Nomic, E5 multilingual, and more
- Local inference - No API keys, no costs, no rate limits
- Automatic caching - SQLite-backed, survives restarts
- LLM-powered chunking - Optional semantic chunking via Ollama/OpenAI
- Dual interface - Use as a Rust library or REST API
- Built-in docs - Swagger, ReDoc, RapiDoc, Scalar
Install and run the REST server:

```bash
cargo install embedcache
embedcache  # starts the server; the examples below assume localhost:8081
```

```bash
# Generate embeddings
curl -X POST http://localhost:8081/v1/embed \
  -H "Content-Type: application/json" \
  -d '{"text": ["Hello world", "Semantic search is cool"]}'

# Process a URL (fetches, chunks, embeds, caches)
curl -X POST http://localhost:8081/v1/process \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'
```

Or use it as a Rust library:

```toml
[dependencies]
embedcache = "0.1"
tokio = { version = "1", features = ["full"] }
# The example below also imports `fastembed` directly, so list it here as well,
# at a version compatible with the one embedcache depends on.
```

```rust
use embedcache::{Embedder, FastEmbedder};
use fastembed::{EmbeddingModel, InitOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Local FastEmbed model: inference runs on this machine, no API key needed
    let embedder = FastEmbedder {
        options: InitOptions::new(EmbeddingModel::BGESmallENV15),
    };

    let texts = vec![
        "First document to embed".to_string(),
        "Second document to embed".to_string(),
    ];

    // One embedding vector per input text
    let embeddings = embedder.embed(&texts).await?;
    println!(
        "Generated {} embeddings of {} dimensions",
        embeddings.len(),
        embeddings[0].len()
    );
    Ok(())
}
```

| Endpoint | Method | Description |
|---|---|---|
| /v1/embed | POST | Generate embeddings for an array of texts |
| /v1/process | POST | Fetch a URL, then chunk, embed, and cache its content |
| /v1/params | GET | List available models and chunkers |
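The `/v1/embed` endpoint can also be called without any extra crates. The sketch below builds the same JSON body as the curl example and sends it over a raw HTTP/1.1 connection using only the standard library. `escape_json`, `build_embed_request`, and `send` are hypothetical helpers, the `localhost:8081` address is taken from the examples above, and the response is printed as-is because its exact JSON shape is not documented here. A real client would more likely use an HTTP crate.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escape a string for inclusion inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a raw HTTP/1.1 POST request for /v1/embed with a {"text": [...]} body.
fn build_embed_request(host: &str, texts: &[&str]) -> String {
    let items: Vec<String> = texts
        .iter()
        .map(|t| format!("\"{}\"", escape_json(t)))
        .collect();
    let body = format!("{{\"text\":[{}]}}", items.join(","));
    format!(
        "POST /v1/embed HTTP/1.1\r\n\
         Host: {}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\
         \r\n\
         {}",
        host,
        body.len(), // byte length, as HTTP requires
        body
    )
}

/// Send the request and read until the server closes the connection.
fn send(host: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let host = "localhost:8081";
    let request = build_embed_request(host, &["Hello world", "Semantic search is cool"]);
    match send(host, &request) {
        Ok(response) => println!("{}", response),
        Err(e) => eprintln!("could not reach embedcache at {}: {}", host, e),
    }
}
```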
Interactive docs at /swagger, /redoc, /rapidoc, or /scalar.
Create a .env file or set environment variables:
```bash
SERVER_HOST=127.0.0.1
SERVER_PORT=8081
DB_PATH=cache.db
ENABLED_MODELS=BGESmallENV15,AllMiniLML6V2

# Optional: LLM-powered chunking
LLM_PROVIDER=ollama
LLM_MODEL=llama3
LLM_BASE_URL=http://localhost:11434
```

| Model | Dimensions | Use Case |
|---|---|---|
| AllMiniLML6V2 | 384 | Fast, general purpose |
| BGESmallENV15 | 384 | Best quality/speed balance |
| BGEBaseENV15 | 768 | Higher quality |
| BGELargeENV15 | 1024 | Highest quality |
| MultilingualE5Base | 768 | 100+ languages |
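How embedcache parses these variables internally isn't shown here. The sketch below is one plausible way to turn them into a typed config and to validate `ENABLED_MODELS` against the dimensions listed in the table. `Config`, `load_config`, and `model_dimensions` are hypothetical, and the fallback values are assumptions copied from the example `.env` above rather than documented defaults. It reads a plain key/value map so it can be tested without touching the process environment.

```rust
use std::collections::HashMap;
use std::env;

/// Output dimensions for the models listed in the table above.
fn model_dimensions(name: &str) -> Option<usize> {
    match name {
        "AllMiniLML6V2" | "BGESmallENV15" => Some(384),
        "BGEBaseENV15" | "MultilingualE5Base" => Some(768),
        "BGELargeENV15" => Some(1024),
        _ => None,
    }
}

/// Hypothetical typed view of the settings in the example .env file.
#[derive(Debug)]
struct Config {
    host: String,
    port: u16,
    db_path: String,
    models: Vec<(String, usize)>, // (model name, embedding dimensions)
}

/// Build a Config from key/value pairs, falling back to the example values.
fn load_config(vars: &HashMap<String, String>) -> Result<Config, String> {
    let get = |key: &str, default: &str| {
        vars.get(key).cloned().unwrap_or_else(|| default.to_string())
    };

    let port = get("SERVER_PORT", "8081")
        .parse::<u16>()
        .map_err(|e| format!("invalid SERVER_PORT: {}", e))?;

    // Reject unknown model names instead of failing later at embed time
    let mut models = Vec::new();
    for name in get("ENABLED_MODELS", "BGESmallENV15,AllMiniLML6V2").split(',') {
        let name = name.trim();
        let dims = model_dimensions(name).ok_or_else(|| format!("unknown model: {}", name))?;
        models.push((name.to_string(), dims));
    }

    Ok(Config {
        host: get("SERVER_HOST", "127.0.0.1"),
        port,
        db_path: get("DB_PATH", "cache.db"),
        models,
    })
}

fn main() {
    // Read the real process environment (a .env loader would populate it first)
    let vars: HashMap<String, String> = env::vars().collect();
    match load_config(&vars) {
        Ok(cfg) => println!("{:?}", cfg),
        Err(e) => eprintln!("config error: {}", e),
    }
}
```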
| Strategy | Description |
|---|---|
| words | Split by whitespace (fast, always available) |
| llm-concept | LLM identifies semantic boundaries |
| llm-introspection | LLM analyzes, then chunks (highest quality) |
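The `words` strategy is described only as splitting by whitespace. A minimal sketch of such a chunker follows, assuming the requested chunk size is a maximum number of words per chunk (the unit embedcache actually uses is not stated here). `chunk_words` is a hypothetical standalone function, not embedcache's implementation.

```rust
/// Split `content` on whitespace and group the words into chunks of at most `size` words.
fn chunk_words(content: &str, size: usize) -> Vec<String> {
    // Treat a size of 0 as 1, since `chunks(0)` would panic
    let size = size.max(1);
    let words: Vec<&str> = content.split_whitespace().collect();
    words.chunks(size).map(|group| group.join(" ")).collect()
}

fn main() {
    let chunks = chunk_words("the quick  brown\nfox jumps over", 4);
    // prints ["the quick brown fox", "jumps over"]
    println!("{:?}", chunks);
}
```

Runs of whitespace, including newlines, collapse to single spaces in the output, which is usually harmless for embedding.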
To add a custom chunking strategy, implement the ContentChunker trait:
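```rust
// Requires the `async-trait` crate as a dependency
use async_trait::async_trait;
use embedcache::ContentChunker;

/// Splits content into sentences on ". " boundaries
struct SentenceChunker;

#[async_trait]
impl ContentChunker for SentenceChunker {
    // The requested chunk size is ignored: each sentence becomes one chunk
    async fn chunk(&self, content: &str, _size: usize) -> Vec<String> {
        content
            .split(". ")
            .map(str::trim)
            .filter(|s| !s.is_empty()) // drop empty fragments
            .map(String::from)
            .collect()
    }

    // Identifier for this chunking strategy
    fn name(&self) -> &str {
        "sentences"
    }
}
```

- First request: ~100-500ms (model loading)
- Subsequent requests: ~10-50ms per text
- Cache hits: <5ms

Memory usage depends on enabled models (~200MB-800MB each).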
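Splitting on `". "`, as in the `SentenceChunker` example above, misses sentences that end in `?` or `!` and strips the period from every sentence except the last. A slightly more careful splitter, written as a plain synchronous function that a `chunk` implementation could call, might look like this. `split_sentences` is hypothetical, and it still treats abbreviations such as "e.g." as sentence ends.

```rust
/// Split text into sentences after '.', '!' or '?' followed by whitespace (or end of text),
/// keeping the terminating punctuation. Abbreviations like "e.g." are not handled.
fn split_sentences(content: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = content.chars().peekable();

    while let Some(c) = chars.next() {
        current.push(c);
        // A boundary is terminal punctuation not immediately followed by a non-space,
        // so "1.5" and "..." mid-word do not split
        let at_boundary = matches!(c, '.' | '!' | '?')
            && chars.peek().map_or(true, |next| next.is_whitespace());
        if at_boundary {
            let sentence = current.trim();
            if !sentence.is_empty() {
                sentences.push(sentence.to_string());
            }
            current.clear();
        }
    }

    // Keep any trailing text that lacks final punctuation
    let rest = current.trim();
    if !rest.is_empty() {
        sentences.push(rest.to_string());
    }
    sentences
}

fn main() {
    let text = "Caching helps. Does it scale? Yes! Mostly";
    // prints ["Caching helps.", "Does it scale?", "Yes!", "Mostly"]
    println!("{:?}", split_sentences(text));
}
```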
Build docs locally:
```bash
cd documentation
pip install -r requirements.txt
mkdocs serve
```

```
src/
├── chunking/    # Text chunking (word, LLM-based)
├── embedding/   # Embedding generation (FastEmbed)
├── handlers/    # HTTP endpoints
├── cache/       # SQLite caching
├── models/      # Data types
└── utils/       # Hash generation, URL fetching
```
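`utils/` is listed as handling hash generation, but the exact key scheme isn't documented here. For a persistent SQLite cache, keys must stay stable across restarts and Rust versions, which rules out `std::collections::hash_map::DefaultHasher` (the standard library docs say its hashes should not be relied upon across releases). The sketch below assumes a key should cover the model, the chunking strategy, and the content (for example the URL or the fetched text). It uses 64-bit FNV-1a only because that fits in a few lines of std-only code; `fnv1a64` and `cache_key` are hypothetical names, and a production cache would more likely use a cryptographic hash such as SHA-256 to make collisions negligible.

```rust
/// 64-bit FNV-1a hash (offset basis and prime from the FNV specification).
fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical cache key over model, chunker and content. Each part is followed
/// by a 0x00 byte so that ("ab", "c") and ("a", "bc") hash different inputs.
fn cache_key(model: &str, chunker: &str, content: &str) -> String {
    let mut input = Vec::new();
    for part in [model, chunker, content] {
        input.extend_from_slice(part.as_bytes());
        input.push(0);
    }
    // 16 lowercase hex digits, suitable as a TEXT primary key
    format!("{:016x}", fnv1a64(&input))
}

fn main() {
    let url = "https://example.com/article";
    let k1 = cache_key("BGESmallENV15", "words", url);
    let k2 = cache_key("BGEBaseENV15", "words", url);
    // Changing the model changes the hashed input, so the key changes too
    // (barring a hash collision) and vectors from different models never mix
    println!("{}\n{}", k1, k2);
}
```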
```bash
git clone https://github.com/skelfresearch/embedcache
cd embedcache
cargo build
cargo test
```

PRs welcome. Please open an issue first for major changes.
GPL-3.0. See LICENSE.
Built by Skelf Research with FastEmbed and Actix-web.
embedcache is built by Skelf Research, an independent UK AI research lab publishing production-grade open-source projects.
Related projects: memista (vector search for Rust) · polymathy (answer-engine service) · slorg (search that thinks first)
Released under MIT / Apache-2.0. © Skelf Research Limited.