Tiny OpenAI-compatible /v1/embeddings HTTP API, backed by fastembed. Drop-in for any client that already speaks the OpenAI embeddings API; runs the model locally with no per-call cost.

    uv sync
    uv run embed-server  # listens on 127.0.0.1:8001

Environment variables: EMBED_HOST (default 127.0.0.1), EMBED_PORT (default 8001).
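
As a rough illustration of how the two variables and their defaults fit together, here is a minimal sketch. The helper name `read_bind_address` is hypothetical and is not taken from this repo; only the variable names and default values come from the text above.

```python
import os


def read_bind_address(env=os.environ):
    """Return (host, port) from EMBED_HOST / EMBED_PORT.

    Hypothetical helper: falls back to the defaults documented above
    (127.0.0.1 and 8001) when a variable is unset.
    """
    host = env.get("EMBED_HOST", "127.0.0.1")
    port = int(env.get("EMBED_PORT", "8001"))  # environment values are strings
    return host, port


if __name__ == "__main__":
    print(read_bind_address())
```

For example, starting the server with `EMBED_HOST=0.0.0.0 EMBED_PORT=9000 uv run embed-server` would be expected to bind on all interfaces at port 9000, which also exposes it beyond localhost.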

    import openai
    client = openai.OpenAI(base_url="http://127.0.0.1:8001/v1", api_key="local")
    r = client.embeddings.create(model="jinaai/jina-embeddings-v2-base-es", input="hola")
    print(len(r.data[0].embedding))  # 768

Or with curl:

    curl -s http://127.0.0.1:8001/v1/embeddings \
      -H 'Content-Type: application/json' \
      -d '{"model":"jinaai/jina-embeddings-v2-base-es","input":"hola"}' | jq '.data[0].embedding | length'

Models are loaded on first request and cached in memory. The first call to a model triggers a download into ~/.cache/fastembed; subsequent calls are fast.
See the fastembed supported models list for valid model values.
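
The load-on-first-request behaviour can be pictured with a small in-memory cache. This is only a sketch of the general pattern, not the server's actual code: `ModelCache` and `fake_loader` are hypothetical names, and in a real server the loader would presumably be something like `fastembed.TextEmbedding(model_name=name)`.

```python
class ModelCache:
    """Load each model once, on first use, and keep it in memory."""

    def __init__(self, loader):
        self._loader = loader  # callable: model name -> model object
        self._models = {}

    def get(self, name):
        # Only the first request for a given name pays the load cost.
        if name not in self._models:
            self._models[name] = self._loader(name)
        return self._models[name]

    def loaded(self):
        """Names of the models currently held in memory."""
        return sorted(self._models)


# Stand-in loader so the sketch runs without downloading anything.
load_calls = []


def fake_loader(name):
    load_calls.append(name)
    return object()


cache = ModelCache(fake_loader)
first = cache.get("jinaai/jina-embeddings-v2-base-es")
second = cache.get("jinaai/jina-embeddings-v2-base-es")
```

Here `second` is the same object as `first` and the loader ran only once, which is why only the first call to a model is slow.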
| Method | Path | Notes |
|---|---|---|
| POST | /v1/embeddings | OpenAI-compatible request/response |
| GET | /health | {"status":"ok","models_loaded":[...],"ts":...} |
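
Both endpoints can also be called without the openai package. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the server follows the OpenAI response shape (a `data` list whose items carry `embedding` and, usually, `index`) and accepts a list of strings as `input`, as the OpenAI API does.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8001"  # default host and port from above


def build_embeddings_request(model, texts, base_url=BASE_URL):
    """Build a POST /v1/embeddings request with a JSON body."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_vectors(response_body):
    """Return the embedding vectors from an OpenAI-style response, in input order."""
    items = sorted(response_body["data"], key=lambda d: d.get("index", 0))
    return [d["embedding"] for d in items]


def fetch_json(request_or_url):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request_or_url) as resp:
        return json.load(resp)
```

With the server running, `fetch_json(f"{BASE_URL}/health")` should return the health object from the table, and `extract_vectors(fetch_json(build_embeddings_request("jinaai/jina-embeddings-v2-base-es", ["hola", "adiós"])))` should return one vector per input.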
See deploy/embed-server.service. Install:

    sudo cp deploy/embed-server.service /etc/systemd/system/
    sudo systemctl daemon-reload
    sudo systemctl enable --now embed-server

Edit WorkingDirectory, User, and the Environment=PATH in the unit to match the host. The unit assumes uv is on PATH and the repo is checked out at WorkingDirectory.
MIT