EmbedCache

Stop recomputing embeddings. Start shipping faster.

Website • Documentation • Skelf Research

EmbedCache is a Rust library and REST API that generates text embeddings locally and caches the results. No external API calls, no per-token billing, no rate limits. Just fast, local embeddings with 22+ models.

Why EmbedCache?

Building RAG apps, semantic search, or anything with embeddings? You've probably hit these problems:

Recomputing the same embeddings every time you restart your app
Paying for API calls to embed text you've already processed
Waiting on rate limits when you need to embed thousands of documents
Vendor lock-in to a specific embedding provider

EmbedCache addresses all of these. Embeddings are generated locally with FastEmbed and cached in SQLite, so a URL is processed once and later requests for it are served from the cache instead of being recomputed.
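The caching idea can be sketched in a few lines. This is a minimal, std-only illustration, not EmbedCache's actual code: the key scheme (a hash of model name plus text), the in-memory `HashMap` standing in for the SQLite table, and the `EmbeddingCache` / `get_or_compute` names are all hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory stand-in for the SQLite-backed cache.
struct EmbeddingCache {
    entries: HashMap<u64, Vec<f32>>,
    misses: usize, // how many times the embedder actually ran
}

impl EmbeddingCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), misses: 0 }
    }

    /// Key on (model, text) so the same text embedded by two models is cached separately.
    fn key(model: &str, text: &str) -> u64 {
        let mut h = DefaultHasher::new();
        model.hash(&mut h);
        text.hash(&mut h);
        h.finish()
    }

    /// Return the cached vector, or run `embed` once and store its result.
    fn get_or_compute<F>(&mut self, model: &str, text: &str, embed: F) -> Vec<f32>
    where
        F: FnOnce(&str) -> Vec<f32>,
    {
        let key = Self::key(model, text);
        if let Some(v) = self.entries.get(&key) {
            return v.clone(); // cache hit: no recomputation
        }
        self.misses += 1;
        let v = embed(text);
        self.entries.insert(key, v.clone());
        v
    }
}

fn main() {
    let mut cache = EmbeddingCache::new();
    // Fake embedder for illustration only: a 2-d vector derived from the text length.
    let fake = |t: &str| vec![t.len() as f32, 1.0];
    let a = cache.get_or_compute("BGESmallENV15", "Hello world", fake);
    let b = cache.get_or_compute("BGESmallENV15", "Hello world", fake);
    assert_eq!(a, b);
    println!("misses: {}", cache.misses); // prints "misses: 1"
}
```

`DefaultHasher`'s algorithm is unspecified and may change between Rust releases, so a cache that persists across restarts would want a fixed hash function (for example SHA-256) for its keys.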

Features

22+ embedding models - BGE, MiniLM, Nomic, E5 multilingual, and more
Local inference - No API keys, no costs, no rate limits
Automatic caching - SQLite-backed, survives restarts
LLM-powered chunking - Optional semantic chunking via Ollama/OpenAI
Dual interface - Use as a Rust library or REST API
Built-in docs - Swagger, ReDoc, RapiDoc, Scalar

Quick Start

As a Service

cargo install embedcache
# Start the server, then run the requests below from another terminal
embedcache

# Generate embeddings
curl -X POST http://localhost:8081/v1/embed \
  -H "Content-Type: application/json" \
  -d '{"text": ["Hello world", "Semantic search is cool"]}'

# Process a URL (fetches, chunks, embeds, caches)
curl -X POST http://localhost:8081/v1/process \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/article"}'

As a Library

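[dependencies]
embedcache = "0.1"
# The example below also imports fastembed directly; add it too, using the version embedcache depends on
tokio = { version = "1", features = ["full"] }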

use embedcache::{FastEmbedder, Embedder};
use fastembed::{InitOptions, EmbeddingModel};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure a FastEmbed-backed embedder with the BGE small English v1.5 model
    let embedder = FastEmbedder {
        options: InitOptions::new(EmbeddingModel::BGESmallENV15),
    };

    let texts = vec![
        "First document to embed".to_string(),
        "Second document to embed".to_string(),
    ];

    // One embedding vector per input text
    let embeddings = embedder.embed(&texts).await?;
    println!("Generated {} embeddings of {} dimensions",
             embeddings.len(), embeddings[0].len());
    Ok(())
}
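The vectors returned above can be compared directly for semantic search. A minimal std-only sketch, assuming each embedding is a `Vec<f32>` (as fastembed returns); `cosine_similarity` and `rank` are hypothetical helpers, not part of EmbedCache.

```rust
/// Cosine similarity of two equal-length vectors; returns 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Return document indices sorted from most to least similar to the query.
fn rank(query: &[f32], docs: &[Vec<f32>]) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    // Sort descending by score; total_cmp gives a total order even for NaN
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(i, _)| i).collect()
}

fn main() {
    let docs = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    let query = vec![0.9, 0.1];
    println!("{:?}", rank(&query, &docs)); // prints [0, 2, 1]
}
```

In practice `query` would be the embedding of a search string and `docs` the embeddings of your documents, all produced by the same model.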

API Endpoints

Endpoint	Method	Description
`/v1/embed`	POST	Generate embeddings for text array
`/v1/process`	POST	Fetch URL, chunk, embed, and cache
`/v1/params`	GET	List available models and chunkers

Interactive API docs are served at /swagger, /redoc, /rapidoc, and /scalar.

Configuration

Create a .env file (see sample.env in the repository for an example) or set environment variables:

SERVER_HOST=127.0.0.1
SERVER_PORT=8081
DB_PATH=cache.db
ENABLED_MODELS=BGESmallENV15,AllMiniLML6V2

# Optional: LLM-powered chunking
LLM_PROVIDER=ollama
LLM_MODEL=llama3
LLM_BASE_URL=http://localhost:11434
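One way a Rust program might read these settings, falling back to the example values above when a variable is unset. This is a hypothetical sketch: `Config` and `from_lookup` are illustrative names, and the fallback defaults are assumed rather than taken from EmbedCache's source. The lookup function is injected so the logic can be tested without touching the real environment.

```rust
use std::env;

/// Hypothetical server settings mirroring the variables listed above.
#[derive(Debug)]
struct Config {
    host: String,
    port: u16,
    db_path: String,
    enabled_models: Vec<String>,
    llm_provider: Option<String>, // LLM chunking is optional
}

impl Config {
    /// Build a config from any key lookup (e.g. one backed by `std::env::var`).
    fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        let port_str = get("SERVER_PORT").unwrap_or_else(|| "8081".to_string());
        let port = port_str
            .parse::<u16>()
            .map_err(|e| format!("invalid SERVER_PORT {port_str:?}: {e}"))?;
        // Comma-separated list; surrounding spaces and empty entries are ignored
        let enabled_models = get("ENABLED_MODELS")
            .unwrap_or_else(|| "BGESmallENV15,AllMiniLML6V2".to_string())
            .split(',')
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect();
        Ok(Config {
            host: get("SERVER_HOST").unwrap_or_else(|| "127.0.0.1".to_string()),
            port,
            db_path: get("DB_PATH").unwrap_or_else(|| "cache.db".to_string()),
            enabled_models,
            llm_provider: get("LLM_PROVIDER"),
        })
    }
}

fn main() {
    // Read from the real process environment.
    match Config::from_lookup(|k| env::var(k).ok()) {
        Ok(cfg) => println!("{cfg:?}"),
        Err(e) => eprintln!("config error: {e}"),
    }
}
```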

Supported Models

Model	Dimensions	Use Case
`AllMiniLML6V2`	384	Fast, general purpose
`BGESmallENV15`	384	Best quality/speed balance
`BGEBaseENV15`	768	Higher quality
`BGELargeENV15`	1024	Highest quality
`MultilingualE5Base`	768	100+ languages

See the documentation for all 22+ models.

Chunking Strategies

Strategy	Description
`words`	Split by whitespace (fast, always available)
`llm-concept`	LLM identifies semantic boundaries
`llm-introspection`	LLM analyzes then chunks (highest quality)
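The `words` strategy can be approximated in a few lines. This is a sketch, not EmbedCache's implementation: it assumes the chunk size means a maximum number of whitespace-separated words per chunk, which the table does not specify, and `word_chunks` is a hypothetical name.

```rust
/// Split `content` on whitespace and group the words into chunks of at most `size` words.
/// Runs of whitespace are collapsed to single spaces in the output.
fn word_chunks(content: &str, size: usize) -> Vec<String> {
    let size = size.max(1); // slice::chunks panics on 0, so treat 0 as 1
    let words: Vec<&str> = content.split_whitespace().collect();
    words.chunks(size).map(|c| c.join(" ")).collect()
}

fn main() {
    let chunks = word_chunks("the quick brown fox jumps over the lazy dog", 4);
    for c in &chunks {
        println!("{c}");
    }
}
```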

Custom Chunkers

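Implement the ContentChunker trait (the example uses the async-trait crate for the async method):

use embedcache::ContentChunker;
use async_trait::async_trait;

/// Splits text into sentences on ". "; the size hint is ignored.
struct SentenceChunker;

#[async_trait]
impl ContentChunker for SentenceChunker {
    async fn chunk(&self, content: &str, _size: usize) -> Vec<String> {
        // split_inclusive keeps the ". " delimiter, so each sentence keeps its period
        content.split_inclusive(". ")
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty()) // drop empty fragments
            .collect()
    }

    // Identifier for this chunker
    fn name(&self) -> &str { "sentences" }
}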
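The example above ignores its size argument. A hypothetical variant (`pack_sentences` is an illustrative name, and reading `size` as a word limit is an assumption) greedily packs whole sentences into chunks of at most `size` words and could be called from `chunk` instead. It still treats ". " as the only sentence boundary, and a single sentence longer than `size` becomes its own oversized chunk.

```rust
/// Greedily pack ". "-delimited sentences into chunks of at most `size` words.
/// A sentence longer than `size` is kept whole as its own chunk.
fn pack_sentences(content: &str, size: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut current_words = 0;

    for sentence in content.split_inclusive(". ").map(str::trim).filter(|s| !s.is_empty()) {
        let n = sentence.split_whitespace().count();
        // Flush the current chunk if adding this sentence would exceed the limit.
        if !current.is_empty() && current_words + n > size {
            chunks.push(current.join(" "));
            current.clear();
            current_words = 0;
        }
        current.push(sentence);
        current_words += n;
    }
    if !current.is_empty() {
        chunks.push(current.join(" "));
    }
    chunks
}

fn main() {
    let text = "One two three. Four five. Six seven eight nine.";
    for c in pack_sentences(text, 5) {
        println!("{c}");
    }
}
```

Packing whole sentences keeps each chunk semantically intact, at the cost of chunk sizes that vary below the limit.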

Performance

First request: ~100-500ms (model loading)
Subsequent requests: ~10-50ms per text
Cache hits: <5ms

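Memory usage depends on which models are enabled (roughly 200MB-800MB per model). All figures are approximate and vary with hardware, model size, and input length.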

Documentation

Build docs locally:

cd documentation
pip install -r requirements.txt
mkdocs serve

Project Structure

src/
├── chunking/          # Text chunking (word, LLM-based)
├── embedding/         # Embedding generation (FastEmbed)
├── handlers/          # HTTP endpoints
├── cache/             # SQLite caching
├── models/            # Data types
└── utils/             # Hash generation, URL fetching

Contributing

git clone https://github.com/skelfresearch/embedcache
cd embedcache
cargo build
cargo test

PRs welcome. Please open an issue first for major changes.

License

GPL-3.0. See LICENSE.

Links

Built by Skelf Research with FastEmbed and Actix-web.

Part of Skelf Research

embedcache is built by Skelf Research — an independent UK AI research lab publishing production-grade open-source projects.

🌐 Website · 📚 Documentation · 🔬 All projects · 🤗 Hugging Face

Related projects: memista (vector search for Rust) · polymathy (answer-engine service) · slorg (search that thinks first)
