A self-hosted personal search engine. Point it at your files, email, chats and scanned documents, and get one search box that finds anything across all of them. Highlighted snippets, hybrid ranking, and optional semantic search.
- One search box, many sources. Filesystem, IMAP email, Telegram, Paperless-ngx — all queried together with per-source filters, highlighted snippets, and source-aware ranking.
- Hybrid search. BM25 full-text retrieval and optional dense-vector retrieval are merged with reciprocal rank fusion (RRF), then re-scored by a dedicated reranker. Works in plain BM25 mode out of the box; light up the rest by configuring an embedding provider in Settings.
- Ask: grounded answers, not just links. A chat mode that answers questions from your own data with inline citations back to the source, multi-turn memory, agentic follow-up search, and multi-modal reading of images and PDFs. Bring your own model — Anthropic, OpenAI, or local Ollama. See Ask: grounded answers.
- Plug-in connectors. Each source is a Go package that implements Connector (fetch + cursor-based incremental sync). Adding a new one is a day's work.
- Scheduled syncs. Cron-backed scheduler runs connectors automatically; every sync is recorded with progress streamed live over Server-Sent Events and a cancellation hook.
- Conversation browser. Telegram and IMAP threads are indexed as conversation windows instead of one-message-per-document, so embeddings actually have context. A dedicated chat-style viewer lets you jump around inside a thread after a hit.
- Multi-user, auth-scoped. Username + bcrypt password, JWT sessions, two roles (admin/user). Every search result is scoped to the calling user — shared connectors (e.g. the family NAS) are visible to everyone, personal connectors (your email) are not.
- Single container to run. One binary serves the React frontend and the API. The only hard dependencies are Postgres and OpenSearch.
Requirements: Docker Engine 24+ with the Compose plugin. That's it.
git clone https://github.com/yasen-pavlov/nexus.git
cd nexus
cp .env.example .env
# Generate the two required secrets and drop them into .env
echo "NEXUS_ENCRYPTION_KEY=$(openssl rand -hex 32)" >> .env
echo "NEXUS_JWT_SECRET=$(openssl rand -base64 48)" >> .env
docker compose --profile app up -d
Open http://localhost:8080. The first account you create becomes the admin.
A bundled testdata/ directory is mounted read-only as the default data source
so you have something to search right away. Point NEXUS_DATA_PATH at your
real files once you're ready.
By default Nexus runs in BM25-only mode. To enable semantic search without sending anything to a cloud provider:
docker compose --profile app --profile ollama up -d
Then open Settings → Embeddings and pick Ollama. You can also point it at
OpenAI, Voyage, or Cohere — the same UI, just a different provider.
| Connector | What it indexes | Incremental sync |
|---|---|---|
| Filesystem | Any directory tree. Text, markdown, PDF, Office docs, images (OCR via Tika). | Mtime + content hash |
| IMAP | Any IMAP mailbox (iCloud, Gmail app passwords, Fastmail…). Bodies are cleaned — tracking redirects and RFC 3676 signatures are dropped before embedding. | UIDNEXT + UID cursor |
| Telegram | Private chats, groups, and channels you're a member of. Messages are grouped into 30-minute conversation windows for richer embeddings. Attachments download to the local binary cache. | Last seen message ID per chat |
| Paperless-ngx | Scanned documents and OCR text from your Paperless instance. | modified__gt timestamp cursor |
All credentials are encrypted with AES-256-GCM using NEXUS_ENCRYPTION_KEY
before they touch the database.
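For readers unfamiliar with the scheme, here is a minimal sketch of AES-256-GCM with a 64-character hex key, using only the Go standard library. The function names and the "nonce prepended to ciphertext" layout are assumptions made for illustration; the README does not say how Nexus stores the nonce.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD from a 64-character hex key.
func newGCM(hexKey string) (cipher.AEAD, error) {
	key, err := hex.DecodeString(hexKey)
	if err != nil {
		return nil, fmt.Errorf("key is not valid hex: %w", err)
	}
	if len(key) != 32 {
		return nil, errors.New("key must decode to 32 bytes")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt returns nonce || ciphertext; a fresh random nonce is used per call.
func encrypt(hexKey string, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(hexKey)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits the nonce off the front and authenticates the rest.
func decrypt(hexKey string, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(hexKey)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, body := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, body, nil)
}

func main() {
	// Demo key only; a real key comes from `openssl rand -hex 32`.
	key := "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f"
	sealed, err := encrypt(key, []byte("imap-password"))
	if err != nil {
		panic(err)
	}
	plain, err := decrypt(key, sealed)
	fmt.Println(string(plain), err)
}
```

Because GCM is authenticated, a modified ciphertext or a different key makes `decrypt` return an error instead of garbage, which is why losing the key means losing every stored credential.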
A query goes through three stages:
- Retrieval. Both BM25 (OpenSearch, with per-language analyzers for English, Bulgarian and German) and dense vectors (if an embedding provider is configured) run in parallel.
- Fusion. Reciprocal rank fusion merges the two ranked lists. RRF is pure rank math, not a relevance score — nothing is filtered here.
- Reranking. Top candidates are deduped and sent to a cross-encoder (Voyage rerank-2 or Cohere rerank-3). The reranker returns a calibrated relevance score, which is filterable: results below the floor (default 0.12) are dropped.
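The fusion and filtering stages above can be sketched in a few lines. This is an illustration, not the project's code: the constant k = 60 is the value commonly used with RRF and is an assumption here, and the reranker scores in `main` are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked ID lists with reciprocal rank fusion:
// each list contributes 1/(k+rank) for a document at 1-based rank.
func fuseRRF(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Sort by fused score, breaking ties by ID so the output is deterministic.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

// applyFloor keeps only candidates whose reranker score reaches the floor.
func applyFloor(ids []string, rerank map[string]float64, floor float64) []string {
	kept := []string{}
	for _, id := range ids {
		if rerank[id] >= floor {
			kept = append(kept, id)
		}
	}
	return kept
}

func main() {
	bm25 := []string{"invoice.pdf", "mail-42", "notes.md"}
	dense := []string{"mail-42", "chat-7", "invoice.pdf"}
	fused := fuseRRF(60, bm25, dense)
	fmt.Println(fused) // nothing is dropped at this stage

	// Hypothetical reranker scores; 0.12 is the default floor quoted above.
	rerank := map[string]float64{"mail-42": 0.81, "invoice.pdf": 0.40, "chat-7": 0.05, "notes.md": 0.20}
	fmt.Println(applyFloor(fused, rerank, 0.12))
}
```

Note that the fused value only orders candidates. Filtering happens on the reranker score, because that is the one number in the pipeline meant to be comparable across queries.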
On top of that, a source-aware scoring layer applies:
- a half-life per source (email decays fast, filesystem slowly),
- a recency floor so old documents stop bleeding score forever, and
- a trust weight per source (Paperless > IMAP > Telegram by default).
All three constants live next to the connector definition, so adding a new source requires one change.
Search gives you a ranked list of sources. Ask gives you an answer — generated from those same sources and showing its work.
- Grounded + cited. Every answer is written from chunks retrieved out of your index, with inline citations that link back to the exact source. Click a citation to see the snippet it came from; nothing is asserted without one.
- Conversational. Multi-turn chats with history. A cheap "rewriter" model resolves follow-ups ("and the previous one?") into self-contained queries, and skips retrieval entirely when a question is answerable from the conversation so far.
- Agentic. When the initial evidence is thin, the model can call a nexus_search tool to run further searches mid-answer (capped per turn), and an optional, flag-gated nexus_open_attachment tool to pull a specific document in full.
- Multi-modal. For vision-capable models, retrieved images and PDFs are attached to the prompt, so the model can read charts, scans, and signatures that text extraction loses. Anthropic and OpenAI read PDFs natively.
- Bring your own model. Anthropic (with native citations), OpenAI, and local Ollama, picked per chat from an admin-curated allowlist. Provider keys are encrypted at rest; configure them under Settings → Ask.
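The "capped per turn" behaviour of the agentic loop can be pictured as follows. Everything here (the `step` type, the `mustAnswer` flag, the function names) is a hypothetical stand-in for the real orchestrator and provider adapters; it only shows how a cap forces a final answer.

```go
package main

import "fmt"

// step is what a (hypothetical) model turn produces: either a final answer
// or a request to search again with a new query.
type step struct {
	Answer string
	Search string
}

// modelFunc and searchFunc stand in for the LLM provider and the index.
// mustAnswer tells the model that no further tool calls are allowed.
type modelFunc func(evidence []string, mustAnswer bool) step
type searchFunc func(query string) []string

// answerWithTools lets the model request further searches at most maxCalls
// times, then forces a final answer from whatever evidence was gathered.
func answerWithTools(model modelFunc, search searchFunc, evidence []string, maxCalls int) (string, int) {
	for calls := 0; ; calls++ {
		s := model(evidence, calls >= maxCalls)
		if s.Search == "" || calls >= maxCalls {
			return s.Answer, calls
		}
		evidence = append(evidence, search(s.Search)...)
	}
}

func main() {
	// Toy model: keeps searching until it has three chunks or is told to stop.
	model := func(evidence []string, mustAnswer bool) step {
		if len(evidence) < 3 && !mustAnswer {
			return step{Search: "more context"}
		}
		return step{Answer: fmt.Sprintf("answer grounded in %d chunks", len(evidence))}
	}
	search := func(q string) []string { return []string{"chunk for " + q} }

	answer, calls := answerWithTools(model, search, []string{"initial chunk"}, 2)
	fmt.Println(answer, calls)
}
```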
Answers stream over Server-Sent Events; retrieval stays scoped to the calling user exactly like search, including tool-issued searches. Thumbs feedback and per-turn token/latency stats are recorded for tuning.
For regression-testing the pipeline, make rag-eval runs a golden question set
through the live orchestrator and grades each answer — citation correctness
plus an LLM-as-judge for faithfulness, relevance, and abstention — then writes a
markdown report diffed against the previous baseline.
Everything is an environment variable prefixed with NEXUS_. Anything marked
required must be in your .env; everything else has a sensible default.
| Variable | Required | Default | Purpose |
|---|---|---|---|
| NEXUS_ENCRYPTION_KEY | yes | — | 64 hex chars (32 bytes) for AES-256-GCM. Lose it, lose every credential. |
| NEXUS_JWT_SECRET | yes* | random per boot | Signs session tokens. Set it, or every restart logs everyone out. |
| NEXUS_DATABASE_URL | yes | — | Postgres connection string. Set in compose automatically. |
| NEXUS_OPENSEARCH_URL | no | http://localhost:9200 | OpenSearch endpoint. |
| NEXUS_OPENSEARCH_USERNAME | no | — | Basic-auth user. Empty = no auth (default). See OpenSearch authentication. |
| NEXUS_OPENSEARCH_PASSWORD | no | — | Basic-auth password. |
| NEXUS_OPENSEARCH_CA_FILE | no | — | PEM CA bundle to verify the OpenSearch server cert. |
| NEXUS_OPENSEARCH_INSECURE_SKIP_VERIFY | no | false | Skip TLS verification (demo certs over a private bridge). |
| NEXUS_TIKA_URL | no | http://localhost:9998 | Apache Tika endpoint for rich binary extraction / OCR. |
| NEXUS_OLLAMA_URL | no | http://localhost:11434 | Ollama endpoint for local embeddings. |
| NEXUS_PORT | no | 8080 | HTTP port the app listens on. |
| NEXUS_LOG_LEVEL | no | info | info or debug. |
| NEXUS_CORS_ORIGINS | no | http://localhost:5173 | Comma-separated allowed origins. |
| NEXUS_BINARY_STORE_PATH | no | /var/lib/nexus/binaries (compose) / temp dir (local) | On-disk cache for Telegram/IMAP attachments and file binaries. |
| NEXUS_FS_ROOT_PATH | no | — | On first boot, seeds a shared Filesystem connector at this path. |
| NEXUS_FS_PATTERNS | no | *.txt,*.md | Comma-separated glob patterns for the seeded Filesystem connector. |
| NEXUS_EMBEDDING_PROVIDER | no | (configured via UI) | ollama, openai, voyage, or cohere; forces the provider. |
| NEXUS_EMBEDDING_MODEL | no | provider-specific | Overrides the default model for the provider above. |
| NEXUS_EMBEDDING_API_KEY | no | — | API key for OpenAI/Voyage/Cohere. |
| NEXUS_RERANK_PROVIDER | no | (configured via UI) | voyage or cohere. |
| NEXUS_RERANK_MODEL | no | provider-specific | Overrides the default reranker model. |
| NEXUS_RERANK_API_KEY | no | falls back to NEXUS_EMBEDDING_API_KEY when the provider matches | API key for the reranker. |
| NEXUS_LLM_ANTHROPIC_API_KEY | no | (configured via UI) | Enables Claude models for Ask (native citations + PDF). |
| NEXUS_LLM_OPENAI_API_KEY | no | (configured via UI) | Enables GPT models for Ask. |
| NEXUS_LLM_OLLAMA_URL | no | falls back to NEXUS_OLLAMA_URL | Dedicated Ollama endpoint for Ask (local models). |
| NEXUS_LLM_DEFAULT_MODEL | no | first-boot picks the cheapest configured | Provider-prefixed default model, e.g. anthropic:claude-sonnet-4-6. |
* Not strictly required: omitting it works, but every restart then invalidates every session, which is not what you want in production.
Provider credentials and most of the scoring knobs are also editable live from the Settings UI without restarting the container.
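To make the "required versus defaulted" convention concrete, here is a small sketch of loading a few of these variables. The `Config` struct and helper names are hypothetical; the variable names and defaults are the ones from the table, and the key check simply enforces the "64 hex chars" rule stated there.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"os"
)

// Config holds a small subset of the settings from the table above.
type Config struct {
	EncryptionKey string
	OpenSearchURL string
	Port          string
	LogLevel      string
}

// getenv returns the value of key, or fallback when it is unset or empty.
func getenv(lookup func(string) string, key, fallback string) string {
	if v := lookup(key); v != "" {
		return v
	}
	return fallback
}

// loadConfig reads NEXUS_* variables through lookup (os.Getenv in production)
// and rejects a missing or malformed encryption key.
func loadConfig(lookup func(string) string) (Config, error) {
	cfg := Config{
		EncryptionKey: lookup("NEXUS_ENCRYPTION_KEY"),
		OpenSearchURL: getenv(lookup, "NEXUS_OPENSEARCH_URL", "http://localhost:9200"),
		Port:          getenv(lookup, "NEXUS_PORT", "8080"),
		LogLevel:      getenv(lookup, "NEXUS_LOG_LEVEL", "info"),
	}
	key, err := hex.DecodeString(cfg.EncryptionKey)
	if err != nil || len(key) != 32 {
		return Config{}, errors.New("NEXUS_ENCRYPTION_KEY must be 64 hex characters")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("listening on port", cfg.Port)
}
```

Passing the lookup function in, instead of calling `os.Getenv` directly, keeps the loader testable without touching the process environment.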
By default OpenSearch runs without authentication. The app's own JWT/role layer scopes what users see, but that protection lives in Nexus — anything that can reach OpenSearch directly bypasses it. Two layers guard against that:
- Network isolation (default, always on). The app reaches OpenSearch over the private compose network (opensearch:9200). The published host port is bound to 127.0.0.1, so it is reachable for local make dev and debugging but never exposed to your LAN. For a single-host homelab deployment this is usually enough. To drop the host port entirely, remove the ports: block from the opensearch service.
- Security plugin + basic auth (opt-in). Layer on the secure overlay to run OpenSearch with the security plugin (HTTPS + a credential gate):

      # set NEXUS_OPENSEARCH_PASSWORD in .env to a strong password first
      docker compose -f docker-compose.yml -f docker-compose.secure.yml --profile app up -d

  OpenSearch 2.12+ requires a strong admin password (≥ 8 chars, mixed case + digit + special, "strong" zxcvbn entropy) or it refuses to boot. The overlay uses the bundled demo certificates, so the app connects with TLS verification disabled (NEXUS_OPENSEARCH_INSECURE_SKIP_VERIFY=true): a real credential gate over the private bridge, not a public CA chain. To verify the certificate instead, point NEXUS_OPENSEARCH_CA_FILE at a CA bundle; note the demo cert's SANs do not include the opensearch hostname, so CA verification requires certificates regenerated with a matching SAN.
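On the client side, the three OpenSearch options (basic auth, CA bundle, skip-verify) map onto the Go standard library roughly as in the sketch below. The helper names are hypothetical and this is not the project's actual client wiring; only the environment variable names come from this README.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// tlsConfig mirrors the two options described above: verify against a CA
// bundle when caFile is set, or skip verification when insecure is true.
func tlsConfig(caFile string, insecure bool) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS12, InsecureSkipVerify: insecure}
	if caFile == "" {
		return cfg, nil
	}
	pem, err := os.ReadFile(caFile)
	if err != nil {
		return nil, fmt.Errorf("read CA file: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		return nil, errors.New("no certificates found in CA file")
	}
	cfg.RootCAs = pool
	return cfg, nil
}

// basicAuthTransport adds credentials to every request when a username is set.
type basicAuthTransport struct {
	user, pass string
	next       http.RoundTripper
}

func (t basicAuthTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if t.user != "" {
		req = req.Clone(req.Context()) // never mutate the caller's request
		req.SetBasicAuth(t.user, t.pass)
	}
	return t.next.RoundTrip(req)
}

func main() {
	cfg, err := tlsConfig(os.Getenv("NEXUS_OPENSEARCH_CA_FILE"),
		os.Getenv("NEXUS_OPENSEARCH_INSECURE_SKIP_VERIFY") == "true")
	if err != nil {
		panic(err)
	}
	client := &http.Client{Transport: basicAuthTransport{
		user: os.Getenv("NEXUS_OPENSEARCH_USERNAME"),
		pass: os.Getenv("NEXUS_OPENSEARCH_PASSWORD"),
		next: &http.Transport{TLSClientConfig: cfg},
	}}
	fmt.Println("client ready, skip verify:", cfg.InsecureSkipVerify, client != nil)
}
```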
Requirements: Go 1.26+, Node.js 24+, Docker.
cp .env.example .env
# fill in NEXUS_ENCRYPTION_KEY / NEXUS_JWT_SECRET as above
make dev # starts Postgres/OpenSearch/Tika in Docker, runs the Go app locally
cd web && npm install && npm run dev # starts Vite dev server at :5173 (proxies /api to :8080)
The bundled Makefile also has:
make up # full stack in Docker (app + deps)
make down # stop everything
make test # unit + integration tests
make lint # golangci-lint
make coverage # integration tests with coverage (floored at 90%)
make build # build binary to bin/nexus
make rag-eval # grade the Ask pipeline against the golden set
Frontend-only targets (run inside web/):
npm run build # type check + Vite build
npm run lint # eslint
npm test # Vitest unit tests
npm run test:e2e # Playwright end-to-end
npm run coverage:all # V8 + monocart merge, floors at 85/90/75/70
Integration tests spin up their dependencies via testcontainers-go, so a local
run needs no setup beyond a working Docker socket. See
CLAUDE.md for the full architecture notes.
┌───────────────┐     ┌──────────────────────────────────────┐     ┌──────────────┐
│  React SPA    │ ───▶│  chi HTTP API (Go, single binary)    │ ───▶│  PostgreSQL  │
└───────────────┘     │                                      │     │  (app state) │
                      │  ┌────────────────────────────────┐  │     └──────────────┘
                      │  │  Connectors                    │  │
                      │  │  • Filesystem  • IMAP          │  │     ┌──────────────┐
                      │  │  • Telegram    • Paperless-ngx │  │ ───▶│  OpenSearch  │
                      │  │                                │  │     │ (BM25 + kNN) │
                      │  │  → chunk → embed → index       │  │     └──────────────┘
                      │  └────────────────────────────────┘  │
                      │                                      │     ┌──────────────┐
                      │  Search: BM25 + vector → RRF →       │ ───▶│  Tika        │
                      │          reranker → source scoring   │     │ (extraction) │
                      │                                      │     └──────────────┘
                      │  Scheduler: robfig/cron per connector│     ┌──────────────┐
                      │  Sync runs: DB-backed + SSE streams  │ ───▶│  Embedder /  │
                      └──────────────────────────────────────┘     │  Reranker    │
                                                                   │ (Ollama/API) │
                                                                   └──────────────┘
- cmd/nexus/: entry point, wiring, graceful shutdown.
- internal/api/: HTTP handlers, connector manager, static file serving.
- internal/connector/: connector interface + Filesystem / IMAP / Telegram / Paperless-ngx implementations.
- internal/pipeline/: fetch → extract → chunk → embed → index.
- internal/search/: OpenSearch client, hybrid retrieval, highlighting.
- internal/llm/: provider adapters (Anthropic / OpenAI / Ollama) + the model catalog and registry.
- internal/rag/: the Ask orchestrator: rewrite → retrieve → generate → tool-loop, citation handling, multi-modal attachment; internal/rag/eval/ is the make rag-eval harness.
- internal/scheduler/: cron-based automatic sync.
- internal/store/: PostgreSQL access layer (no ORM; raw SQL via pgx).
- web/: React + TypeScript + Vite frontend.
Binary releases and multi-arch Docker images (linux/amd64, linux/arm64)
are produced automatically when a v* tag is pushed.
- Docker image: ghcr.io/yasen-pavlov/nexus:vX.Y.Z (and :latest).
- Binaries: attached to each GitHub Release for Linux (amd64/arm64), macOS (amd64/arm64), and Windows (amd64).
This is a personal project but issues and PRs are welcome. Before sending a PR:
- Run make lint && make test && cd web && npm test && npm run build.
- Backend coverage must stay at 90%+ and frontend at 85% statements / 90% lines.
- One-line comments only; explain why when the code isn't obvious.
MIT © Yasen Pavlov.
