A high-performance, self-hostable translation API compatible with Google Cloud Translate. Built on Meta's NLLB-200 and optimized with DeepSpeed for efficient GPU inference.
This project provides a robust, private, and cost-effective alternative to commercial translation APIs.
- Cost Efficiency: Run on your own GPU infrastructure. Ideal for high-volume translation tasks.
- Data Privacy: No external API calls, so your content never leaves your control.
- Drop-in Compatibility: Implements the standard `POST /language/translate/v2` API surface. Switch existing applications simply by changing the base URL.
- Advanced Models: Uses Meta's NLLB-200 (No Language Left Behind), which supports 200+ languages.
- High Performance: Optimized for throughput with DeepSpeed and tensor parallelism, capable of handling heavy concurrent loads.
Designed to work with existing Google Cloud Translate client libraries and integrations: only the base URL changes.

Before:

```
https://translation.googleapis.com/language/translate/v2
```

After:

```
http://localhost:8000/language/translate/v2
```
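The same switch from a client's point of view: a minimal sketch using only the Python standard library. The helpers `build_translate_request` and `translate` are hypothetical (not part of this project), and the response parsing assumes the server returns the Google v2 shape `{"data": {"translations": [{"translatedText": ...}]}}`.

```python
import json
import urllib.request

# Assumption: the local server started by the Docker quick start.
BASE_URL = "http://localhost:8000"


def build_translate_request(texts, target, source=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for the v2 translate endpoint.

    Only `base_url` differs between Google's endpoint and this server;
    the path and JSON body keep the Google v2 request shape.
    """
    body = {"q": texts, "target": target}
    if source is not None:
        body["source"] = source
    return urllib.request.Request(
        url=f"{base_url}/language/translate/v2",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(texts, target, source=None, base_url=BASE_URL):
    """Send the request and return the translated strings in input order.

    Assumes a Google v2-style response body:
    {"data": {"translations": [{"translatedText": "..."}, ...]}}
    """
    req = build_translate_request(texts, target, source, base_url)
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return [t["translatedText"] for t in payload["data"]["translations"]]
```

For example, `translate(["Hello world!", "Self hosting rulez"], target="fr", source="en")`. Pointing `base_url` back at Google's hosted endpoint would additionally require authentication (such as an API key), which this sketch omits.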
This command launches the API on port 8000 using the 600M model in FP16 (set `NLLB_MODEL_SIZE=600M-distilled` to use the distilled checkpoint instead):

```bash
docker pull ivanvmoreno/open-translate:latest
docker run --gpus all -p 8000:8000 \
  -e NLLB_MODEL_SIZE=600M \
  -e DTYPE=fp16 \
  ivanvmoreno/open-translate:latest
```

Note: The first run downloads the model weights, which may take some time depending on your internet speed.
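Because the first start downloads weights, requests sent right after `docker run` may fail. A minimal polling sketch, assuming the server only answers `GET /language/translate/v2/languages` with HTTP 200 once the model is ready; `wait_until_ready` is a hypothetical helper, not part of the project.

```python
import time
import urllib.request


def wait_until_ready(base_url="http://localhost:8000", timeout=600.0,
                     interval=5.0, opener=urllib.request.urlopen):
    """Poll the languages endpoint until it returns HTTP 200.

    Returns True once the server answers, or False if `timeout` seconds pass
    first. `opener` is injectable so the loop can be tested without a server.
    """
    url = f"{base_url}/language/translate/v2/languages"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeouts and HTTP errors (URLError and
            # HTTPError subclass OSError): assume the server is still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` after `docker run` blocks for up to ten minutes by default; raise `timeout` for larger models on slow connections.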
Compatible with Google Cloud Translation API v2.
`POST /language/translate/v2`

Single Translation:

```bash
curl -X POST "http://localhost:8000/language/translate/v2" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "Hello world!",
    "target": "es"
  }'
```

Batch Translation: Send arrays of strings to maximize GPU throughput.
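```bash
curl -X POST "http://localhost:8000/language/translate/v2" \
  -H "Content-Type: application/json" \
  -d '{
    "q": ["Hello world!", "Self hosting rulez"],
    "target": "fr",
    "source": "en",
    "max_new_tokens": 128
  }'
```

`max_new_tokens` is not part of the Google v2 request schema; it is an extension accepted by this server.

`POST /language/translate/v2/detect`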
```bash
curl -X POST "http://localhost:8000/language/translate/v2/detect" \
  -H "Content-Type: application/json" \
  -d '{"q": "Hola mundo"}'
```

`GET /language/translate/v2/languages`
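```bash
curl "http://localhost:8000/language/translate/v2/languages"
```

The server is configured through environment variables (pass them with `-e` to `docker run`):

| Variable | Default | Description |
|---|---|---|
| `NLLB_MODEL_SIZE` | `1.3B-distilled` | Model size: `600M`, `600M-distilled`, `1.3B`, `1.3B-distilled`, or `3.3B` |
| `NLLB_MODEL_ID` | (none) | Hugging Face model ID override |
| `TP_SIZE` | `auto` | Tensor-parallel size (number of GPUs the model is sharded across) |
| `DTYPE` | `fp16` | `fp16`, `bf16`, or `fp32` |
| `MAX_BATCH_SIZE` | `32` | Maximum number of sentences processed in parallel |
| `HOST` | `0.0.0.0` | Bind host |
| `PORT` | `8000` | Bind port |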
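For very large inputs, a client can split the work into batches no larger than `MAX_BATCH_SIZE` and send them concurrently. This is a minimal sketch under assumptions: whether the server already splits oversized requests is not documented here, `translate_batch` is a hypothetical callable that POSTs one batch and returns its translations in order, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunked(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split `items` into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def translate_many(
    texts: Sequence[str],
    translate_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int = 32,  # mirrors the MAX_BATCH_SIZE default
    workers: int = 4,
) -> List[str]:
    """Translate a long list by sending fixed-size batches concurrently.

    `pool.map` yields results in submission order, so the flattened
    output lines up one-to-one with `texts`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(translate_batch, chunked(texts, batch_size))
        return [t for batch in results for t in batch]
```

Keeping each request at or below `MAX_BATCH_SIZE` is a conservative choice; concurrent requests then give the server several full batches to schedule instead of one oversized one.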
We support standard ISO 639-1 (e.g., `es`, `en`) and BCP-47 (e.g., `zh-TW`, `pt-BR`) codes, automatically mapping them to NLLB's internal FLORES-200 language codes.
For a full list of over 200 supported languages and their codes, see LANGUAGES.md.
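The mapping step can be pictured as a lookup from public language tags to FLORES-200 codes such as `eng_Latn`. This is an illustrative sketch with a tiny hypothetical table (`FLORES_CODES`, `to_nllb_code`), not the project's actual mapping; treating bare `zh` as Simplified Chinese is an assumption.

```python
# Illustrative subset only; the real mapping covers 200+ languages
# (see LANGUAGES.md) and may make different choices.
FLORES_CODES = {
    "en": "eng_Latn",
    "es": "spa_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "pt": "por_Latn",
    "zh": "zho_Hans",  # assumption: bare "zh" means Simplified Chinese
    "zh-cn": "zho_Hans",
    "zh-tw": "zho_Hant",
}


def to_nllb_code(tag: str) -> str:
    """Map an ISO 639-1 / BCP-47 tag to an NLLB (FLORES-200) code.

    The full tag is tried first so that zh-TW keeps its script; otherwise
    the primary language subtag is used, so pt-BR falls back to pt.
    """
    norm = tag.strip().replace("_", "-").lower()
    if norm in FLORES_CODES:
        return FLORES_CODES[norm]
    primary = norm.split("-", 1)[0]
    if primary in FLORES_CODES:
        return FLORES_CODES[primary]
    raise ValueError(f"unsupported language code: {tag!r}")
```

Trying the full tag before the primary subtag matters for languages where the region implies a different script, as with `zh-TW` versus `zh-CN`.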
Approximate GPU memory requirements by model size and `DTYPE`:

| Model Size | FP16 / BF16 | FP32 |
|---|---|---|
| `600M` / `600M-distilled` | ~3 GB | ~5 GB |
| `1.3B` / `1.3B-distilled` | ~5 GB | ~9 GB |
| `3.3B` | ~9 GB | ~15 GB |
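A quick sanity check on these figures: the weights alone need roughly parameters × bytes per parameter. This sketch uses nominal parameter counts taken from the model names (real checkpoints may differ slightly); the gap to the table values presumably covers activations, the CUDA context, and framework overhead.

```python
# Bytes per parameter for each DTYPE value.
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "fp32": 4}

# Nominal parameter counts implied by the model names.
PARAMS = {"600M": 600e6, "1.3B": 1.3e9, "3.3B": 3.3e9}


def weight_memory_gb(size: str, dtype: str) -> float:
    """Weights-only lower bound in GB (1e9 bytes).

    Distilled variants share the parameter count of their base size.
    """
    base = size.replace("-distilled", "")
    return PARAMS[base] * BYTES_PER_PARAM[dtype] / 1e9
```

For FP16 this gives 1.2 GB, 2.6 GB, and 6.6 GB for the three sizes, compared with ~3, ~5, and ~9 GB in the table; FP32 doubles the weights-only figure.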