Nexus

A self-hosted personal search engine. Point it at your files, email, chats and scanned documents, and get one search box that finds anything across all of them. Highlighted snippets, hybrid ranking, and optional semantic search.

What it does

One search box, many sources. Filesystem, IMAP email, Telegram, Paperless-ngx — all queried together with per-source filters, highlighted snippets, and source-aware ranking.
Hybrid search. BM25 full-text retrieval and optional dense-vector retrieval are merged with reciprocal rank fusion (RRF), then re-scored by a dedicated reranker. Works in plain BM25 mode out of the box; configure an embedding provider in Settings to enable vector retrieval, and a rerank provider to enable reranking.
Ask: grounded answers, not just links. A chat mode that answers questions from your own data with inline citations back to the source, multi-turn memory, agentic follow-up search, and multi-modal reading of images and PDFs. Bring your own model — Anthropic, OpenAI, or local Ollama. See Ask: grounded answers.
Plug-in connectors. Each source is a Go package that implements Connector (fetch + cursor-based incremental sync). Adding a new one is a day's work.
Scheduled syncs. Cron-backed scheduler runs connectors automatically; every sync is recorded with progress streamed live over Server-Sent Events and a cancellation hook.
Conversation browser. Telegram and IMAP threads are indexed as conversation windows instead of one-message-per-document, so embeddings actually have context. A dedicated chat-style viewer lets you jump around inside a thread after a hit.
Multi-user, auth-scoped. Username + bcrypt password, JWT sessions, two roles (admin/user). Every search result is scoped to the calling user — shared connectors (e.g. the family NAS) are visible to everyone, personal connectors (your email) are not.
Single container to run. One binary serves the React frontend and the API. The only hard dependencies are Postgres and OpenSearch.

Quick start

Requirements: Docker Engine 24+ with the Compose plugin, plus openssl to generate the secrets below.

git clone https://github.com/yasen-pavlov/nexus.git
cd nexus
cp .env.example .env

# Generate the two required secrets and drop them into .env
echo "NEXUS_ENCRYPTION_KEY=$(openssl rand -hex 32)" >> .env
echo "NEXUS_JWT_SECRET=$(openssl rand -base64 48)"  >> .env

docker compose --profile app up -d

Open http://localhost:8080. The first account you create becomes the admin.

A bundled testdata/ directory is mounted read-only as the default data source so you have something to search right away. Point NEXUS_DATA_PATH at your real files once you're ready.

Using local embeddings (optional)

By default Nexus runs in BM25-only mode. To enable semantic search without sending anything to a cloud provider:

docker compose --profile app --profile ollama up -d

Then open Settings → Embeddings and pick Ollama. You can also point it at OpenAI, Voyage, or Cohere — the same UI, just a different provider.

Data sources

Connector	What it indexes	Incremental sync
Filesystem	Any directory tree. Text, markdown, PDF, Office docs, images (OCR via Tika).	Mtime + content hash
IMAP	Any IMAP mailbox (iCloud, Gmail app passwords, Fastmail…). Bodies are cleaned — tracking redirects and RFC 3676 signatures are dropped before embedding.	`UIDNEXT` + UID cursor
Telegram	Private chats, groups, and channels you're a member of. Messages are grouped into 30-minute conversation windows for richer embeddings. Attachments download to the local binary cache.	Last seen message ID per chat
Paperless-ngx	Scanned documents and OCR text from your Paperless instance.	`modified__gt` timestamp cursor

All credentials are encrypted with AES-256-GCM using NEXUS_ENCRYPTION_KEY before they touch the database.
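A minimal Go sketch of AES-256-GCM credential sealing with the standard library, assuming the 64-hex-character key format documented for `NEXUS_ENCRYPTION_KEY` and a random nonce prepended to each ciphertext. The storage layout and the `newGCM`/`seal`/`open` helpers are illustrative, not necessarily what Nexus writes to Postgres.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"strings"
)

// newGCM builds an AES-256-GCM AEAD from a 64-hex-char (32-byte) key.
func newGCM(hexKey string) (cipher.AEAD, error) {
	key, err := hex.DecodeString(hexKey)
	if err != nil {
		return nil, fmt.Errorf("key is not hex: %w", err)
	}
	if len(key) != 32 {
		return nil, fmt.Errorf("key must be 32 bytes, got %d", len(key))
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext under a fresh random nonce and returns nonce||ciphertext.
func seal(aead cipher.AEAD, plaintext []byte) ([]byte, error) {
	nonce := make([]byte, aead.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce and decrypts; it fails if the data was modified.
func open(aead cipher.AEAD, sealed []byte) ([]byte, error) {
	n := aead.NonceSize()
	if len(sealed) < n {
		return nil, errors.New("ciphertext too short")
	}
	return aead.Open(nil, sealed[:n], sealed[n:], nil)
}

func main() {
	// Demo key only; a real deployment reads NEXUS_ENCRYPTION_KEY.
	aead, err := newGCM(strings.Repeat("ab", 32))
	if err != nil {
		panic(err)
	}
	sealed, err := seal(aead, []byte("imap-app-password"))
	if err != nil {
		panic(err)
	}
	plain, err := open(aead, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes stored, decrypted %q\n", len(sealed), plain)
}
```

Because GCM authenticates as well as encrypts, a wrong key or a tampered row fails loudly in `open` instead of yielding garbage, which is also why losing the key means losing every stored credential.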

How search works

A query goes through three stages:

Retrieval. Both BM25 (OpenSearch, with per-language analyzers for English, Bulgarian and German) and dense vectors (if an embedding provider is configured) run in parallel.
Fusion. Reciprocal rank fusion merges the two ranked lists. RRF is pure rank math, not a relevance score — nothing is filtered here.
Reranking. Top candidates are deduped and sent to a cross-encoder (Voyage rerank-2 or Cohere rerank-3). The reranker returns a calibrated relevance score, which is filterable — results below the floor (default 0.12) are dropped.
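The fusion and floor steps can be sketched in a few lines of Go. RRF scores each document as the sum over lists of 1/(k + rank), with ranks starting at 1; the README does not give Nexus's `k`, so this sketch assumes 60, the value commonly used with RRF. Each input list is assumed to hold unique IDs. `fuseRRF`, `scored`, and `applyFloor` are hypothetical names, and the reranker itself is out of scope here.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is an assumed damping constant; Nexus's actual value is not documented.
const rrfK = 60.0

// fuseRRF merges ranked ID lists with reciprocal rank fusion. A document
// absent from a list simply gets no contribution from that list.
func fuseRRF(lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (rrfK + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

// scored is a reranked result carrying the reranker's calibrated score.
type scored struct {
	ID    string
	Score float64
}

// applyFloor drops reranked results whose score is below floor.
func applyFloor(results []scored, floor float64) []scored {
	var kept []scored
	for _, r := range results {
		if r.Score >= floor {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	bm25 := []string{"a", "b", "c"}
	vector := []string{"c", "a", "d"}
	fmt.Println(fuseRRF(bm25, vector)) // [a c b d]

	reranked := []scored{{"a", 0.81}, {"c", 0.40}, {"b", 0.05}}
	fmt.Println(applyFloor(reranked, 0.12)) // [{a 0.81} {c 0.4}]
}
```

Note that the floor is applied only to reranker output: RRF values depend on list lengths and `k`, so a fixed threshold on them would not mean anything.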

On top of that, a source-aware scoring layer applies:

a half-life per source (email decays fast, filesystem slowly),
a recency floor so old documents stop bleeding score forever, and
a trust weight per source (Paperless > IMAP > Telegram by default).

All three constants live next to the connector definition, so adding a new source means editing only one place.

Ask: grounded answers

Search gives you a ranked list of sources. Ask gives you an answer — generated from those same sources and showing its work.

Grounded + cited. Every answer is written from chunks retrieved out of your index, with inline citations that link back to the exact source. Click a citation to see the snippet it came from; nothing is asserted without one.
Conversational. Multi-turn chats with history. A cheap "rewriter" model resolves follow-ups ("and the previous one?") into self-contained queries, and skips retrieval entirely when a question is answerable from the conversation so far.
Agentic. When the initial evidence is thin, the model can call a nexus_search tool to run further searches mid-answer (capped per turn), and an optional, flag-gated nexus_open_attachment tool to pull a specific document in full.
Multi-modal. For vision-capable models, retrieved images and PDFs are attached to the prompt, so the model can read charts, scans, and signatures that text extraction loses. Anthropic and OpenAI read PDFs natively.
Bring your own model. Anthropic (with native citations), OpenAI, and local Ollama, picked per chat from an admin-curated allowlist. Provider keys are encrypted at rest; configure them under Settings → Ask.

Answers stream over Server-Sent Events; retrieval stays scoped to the calling user exactly like search, including tool-issued searches. Thumbs feedback and per-turn token/latency stats are recorded for tuning.

For regression-testing the pipeline, make rag-eval runs a golden question set through the live orchestrator and grades each answer — citation correctness plus an LLM-as-judge for faithfulness, relevance, and abstention — then writes a markdown report diffed against the previous baseline.
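The README does not define "citation correctness" precisely. One plausible reading, sketched in Go, is that every citation in an answer must point at a chunk that was actually retrieved for that turn. The `citationStats` helper is hypothetical and covers only this mechanical check, not the LLM-as-judge scores.

```go
package main

import "fmt"

// citationStats returns the fraction of cited chunk IDs that were actually
// retrieved for the turn, plus the IDs that were not ("dangling").
// An answer with no citations scores 0, since every claim should carry one.
func citationStats(cited, retrieved []string) (precision float64, dangling []string) {
	if len(cited) == 0 {
		return 0, nil
	}
	have := make(map[string]bool, len(retrieved))
	for _, id := range retrieved {
		have[id] = true
	}
	valid := 0
	for _, id := range cited {
		if have[id] {
			valid++
		} else {
			dangling = append(dangling, id)
		}
	}
	return float64(valid) / float64(len(cited)), dangling
}

func main() {
	p, bad := citationStats([]string{"c1", "c2", "c9"}, []string{"c1", "c2", "c3"})
	fmt.Printf("precision=%.2f dangling=%v\n", p, bad) // precision=0.67 dangling=[c9]
}
```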

Configuration

Everything is an environment variable prefixed with NEXUS_. Anything marked required must be in your .env; everything else has a sensible default.

Variable	Required	Default	Purpose
`NEXUS_ENCRYPTION_KEY`	yes	—	64 hex chars (32 bytes) for AES-256-GCM. Lose it, lose every credential.
`NEXUS_JWT_SECRET`	yes*	random per boot	Signs session tokens. Set it, or every restart logs everyone out.
`NEXUS_DATABASE_URL`	yes	—	Postgres connection string. Set in compose automatically.
`NEXUS_OPENSEARCH_URL`	no	`http://localhost:9200`	OpenSearch endpoint.
`NEXUS_OPENSEARCH_USERNAME`	no	—	Basic-auth user. Empty = no auth (default). See OpenSearch authentication.
`NEXUS_OPENSEARCH_PASSWORD`	no	—	Basic-auth password.
`NEXUS_OPENSEARCH_CA_FILE`	no	—	PEM CA bundle to verify the OpenSearch server cert.
`NEXUS_OPENSEARCH_INSECURE_SKIP_VERIFY`	no	`false`	Skip TLS verification (demo certs over a private bridge).
`NEXUS_TIKA_URL`	no	`http://localhost:9998`	Apache Tika endpoint for rich binary extraction / OCR.
`NEXUS_OLLAMA_URL`	no	`http://localhost:11434`	Ollama endpoint for local embeddings.
`NEXUS_PORT`	no	`8080`	HTTP port the app listens on.
`NEXUS_LOG_LEVEL`	no	`info`	`info` or `debug`.
`NEXUS_CORS_ORIGINS`	no	`http://localhost:5173`	Comma-separated allowed origins.
`NEXUS_BINARY_STORE_PATH`	no	`/var/lib/nexus/binaries` (compose) / temp dir (local)	On-disk cache for Telegram/IMAP attachments and file binaries.
`NEXUS_FS_ROOT_PATH`	no	—	On first boot, seeds a shared Filesystem connector at this path.
`NEXUS_FS_PATTERNS`	no	`.txt,.md`	Comma-separated glob patterns for the seeded Filesystem connector.
`NEXUS_EMBEDDING_PROVIDER`	no	(configured via UI)	`ollama` \| `openai` \| `voyage` \| `cohere` — forces the provider.
`NEXUS_EMBEDDING_MODEL`	no	provider-specific	Overrides the default model for the provider above.
`NEXUS_EMBEDDING_API_KEY`	no	—	API key for OpenAI/Voyage/Cohere.
`NEXUS_RERANK_PROVIDER`	no	(configured via UI)	`voyage` \| `cohere`.
`NEXUS_RERANK_MODEL`	no	provider-specific	Overrides the default reranker model.
`NEXUS_RERANK_API_KEY`	no	falls back to `NEXUS_EMBEDDING_API_KEY` when the provider matches	API key for the reranker.
`NEXUS_LLM_ANTHROPIC_API_KEY`	no	(configured via UI)	Enables Claude models for Ask (native citations + PDF).
`NEXUS_LLM_OPENAI_API_KEY`	no	(configured via UI)	Enables GPT models for Ask.
`NEXUS_LLM_OLLAMA_URL`	no	falls back to `NEXUS_OLLAMA_URL`	Dedicated Ollama endpoint for Ask (local models).
`NEXUS_LLM_DEFAULT_MODEL`	no	first-boot picks the cheapest configured	Provider-prefixed default model, e.g. `anthropic:claude-sonnet-4-6`.

* Not strictly required: omitting it works, but every restart invalidates every session, which is not what you want in production.

Provider credentials and most of the scoring knobs are also editable live from the Settings UI without restarting the container.
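A minimal Go sketch of loading a few of these variables with the documented defaults, assuming an injectable lookup function (`os.LookupEnv` in production) so it can be exercised without touching the real environment. The `config` struct, `envOr`, and `loadConfig` are hypothetical; only the variable names, defaults, and the random-per-boot JWT fallback come from the table.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
)

// config holds a hypothetical subset of the NEXUS_* settings.
type config struct {
	EncryptionKey []byte
	JWTSecret     string
	DatabaseURL   string
	OpenSearchURL string
	Port          string
	LogLevel      string
}

// lookupFunc has the same signature as os.LookupEnv.
type lookupFunc func(string) (string, bool)

// envOr returns the variable's value, or def when it is unset or empty.
func envOr(lookup lookupFunc, key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// loadConfig applies the documented defaults and rejects missing required values.
func loadConfig(lookup lookupFunc) (config, error) {
	c := config{
		JWTSecret:     envOr(lookup, "NEXUS_JWT_SECRET", ""),
		DatabaseURL:   envOr(lookup, "NEXUS_DATABASE_URL", ""),
		OpenSearchURL: envOr(lookup, "NEXUS_OPENSEARCH_URL", "http://localhost:9200"),
		Port:          envOr(lookup, "NEXUS_PORT", "8080"),
		LogLevel:      envOr(lookup, "NEXUS_LOG_LEVEL", "info"),
	}
	key, err := hex.DecodeString(envOr(lookup, "NEXUS_ENCRYPTION_KEY", ""))
	if err != nil || len(key) != 32 {
		return config{}, errors.New("NEXUS_ENCRYPTION_KEY must be 64 hex characters")
	}
	c.EncryptionKey = key
	if c.DatabaseURL == "" {
		return config{}, errors.New("NEXUS_DATABASE_URL is required")
	}
	if c.LogLevel != "info" && c.LogLevel != "debug" {
		return config{}, fmt.Errorf("NEXUS_LOG_LEVEL must be info or debug, got %q", c.LogLevel)
	}
	if c.JWTSecret == "" {
		// Documented fallback: a random secret per boot, so every restart
		// invalidates every session.
		b := make([]byte, 32)
		if _, err := rand.Read(b); err != nil {
			return config{}, err
		}
		c.JWTSecret = hex.EncodeToString(b)
	}
	return c, nil
}

func main() {
	c, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("listening on :" + c.Port + ", OpenSearch at " + c.OpenSearchURL)
}
```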

OpenSearch authentication

By default OpenSearch runs without authentication. The app's own JWT/role layer scopes what users see, but that protection lives in Nexus — anything that can reach OpenSearch directly bypasses it. Two layers guard against that:

Network isolation (default, always on). The app reaches OpenSearch over the private compose network (opensearch:9200). The published host port is bound to 127.0.0.1, so it is reachable for local make dev and debugging but never exposed to your LAN. For a single-host homelab deployment this is usually enough. To drop the host port entirely, remove the ports: block from the opensearch service.
Security plugin + basic auth (opt-in). Layer on the secure overlay to run OpenSearch with the security plugin (HTTPS + a credential gate):
```
# set NEXUS_OPENSEARCH_PASSWORD in .env to a strong password first
docker compose -f docker-compose.yml -f docker-compose.secure.yml --profile app up -d
```
OpenSearch 2.12+ requires a strong admin password (≥ 8 chars, mixed case + digit + special, "strong" zxcvbn entropy) or it refuses to boot. The overlay uses the bundled demo certificates, so the app connects with TLS verification disabled (NEXUS_OPENSEARCH_INSECURE_SKIP_VERIFY=true) — a real credential gate over the private bridge, not a public CA chain. To verify the certificate instead, point NEXUS_OPENSEARCH_CA_FILE at a CA bundle; note the demo cert's SANs do not include the opensearch hostname, so CA verification requires certificates regenerated with a matching SAN.

Development

Requirements: Go 1.26+, Node.js 24+, Docker.

cp .env.example .env
# fill in NEXUS_ENCRYPTION_KEY / NEXUS_JWT_SECRET as above

make dev                    # starts Postgres/OpenSearch/Tika in Docker, runs the Go app locally
cd web && npm install && npm run dev   # starts Vite dev server at :5173 (proxies /api to :8080)

The bundled Makefile also has:

make up           # full stack in Docker (app + deps)
make down         # stop everything
make test         # unit + integration tests
make lint         # golangci-lint
make coverage     # integration tests with coverage (floored at 90%)
make build        # build binary to bin/nexus
make rag-eval     # grade the Ask pipeline against the golden set

Frontend-only targets (run inside web/):

npm run build           # type check + Vite build
npm run lint            # eslint
npm test                # Vitest unit tests
npm run test:e2e        # Playwright end-to-end
npm run coverage:all    # V8 + monocart merge, floors at 85/90/75/70

Integration tests spin up their dependencies via testcontainers-go, so a local run needs no setup beyond a working Docker socket. See CLAUDE.md for the full architecture notes.

Architecture

 ┌───────────────┐     ┌──────────────────────────────────────┐     ┌──────────────┐
 │ React SPA     │ ───▶│ chi HTTP API (Go, single binary)     │ ───▶│ PostgreSQL   │
 └───────────────┘     │                                      │     │ (app state)  │
                       │  ┌────────────────────────────────┐  │     └──────────────┘
                       │  │  Connectors                    │  │
                       │  │  • Filesystem  • IMAP          │  │     ┌──────────────┐
                       │  │  • Telegram    • Paperless-ngx │  │ ───▶│ OpenSearch   │
                       │  │                                │  │     │ (BM25 + kNN) │
                       │  │  → chunk → embed → index       │  │     └──────────────┘
                       │  └────────────────────────────────┘  │
                       │                                      │     ┌──────────────┐
                       │  Search: BM25 + vector → RRF →       │ ───▶│ Tika         │
                       │          reranker → source scoring   │     │ (extraction) │
                       │                                      │     └──────────────┘
                       │  Scheduler: robfig/cron per connector│     ┌──────────────┐
                       │  Sync runs: DB-backed + SSE streams  │ ───▶│ Embedder /   │
                       └──────────────────────────────────────┘     │ Reranker     │
                                                                    │ (Ollama/API) │
                                                                    └──────────────┘

cmd/nexus/ — entry point, wiring, graceful shutdown.
internal/api/ — HTTP handlers, connector manager, static file serving.
internal/connector/ — connector interface + Filesystem / IMAP / Telegram / Paperless-ngx implementations.
internal/pipeline/ — fetch → extract → chunk → embed → index.
internal/search/ — OpenSearch client, hybrid retrieval, highlighting.
internal/llm/ — provider adapters (Anthropic / OpenAI / Ollama) + the model catalog and registry.
internal/rag/ — the Ask orchestrator: rewrite → retrieve → generate → tool-loop, citation handling, multi-modal attachment; internal/rag/eval/ is the make rag-eval harness.
internal/scheduler/ — cron-based automatic sync.
internal/store/ — PostgreSQL access layer (no ORM; raw SQL via pgx).
web/ — React + TypeScript + Vite frontend.

Releases

Binary releases and multi-arch Docker images (linux/amd64, linux/arm64) are produced automatically when a v* tag is pushed.

Docker image: ghcr.io/yasen-pavlov/nexus:vX.Y.Z (and :latest).
Binaries: attached to each GitHub Release — Linux (amd64/arm64), macOS (amd64/arm64), Windows (amd64).

Contributing

This is a personal project but issues and PRs are welcome. Before sending a PR:

Run make lint && make test && cd web && npm test && npm run build.
Backend coverage must stay at 90%+ and frontend at 85% statements / 90% lines.
One-line comments only; explain why when the code isn't obvious.
