GitHub - pjmarz/LUMINAL: the child of light

Note: this repo is a showcase of the LUMINAL setup. It's not a turnkey deployment and isn't meant to be cloned and run as-is.

🎯 Overview

LUMINAL is a self-hosted AI stack. It runs on a Proxmox VM with an NVIDIA GPU and stitches together workflow automation, local LLM inference, RAG, and smart-home control in one Docker Compose project.

The goal: run a real AI product entirely on self-hosted hardware. Models, auth, data, everything. No hosted inference APIs, no commercial accounts, nothing leaving the network.

Midnight is a custom assistant written on top of OpenWebUI that talks to the HELIOS media library through 7 Python tools (Plex, Radarr, Sonarr, Tautulli, Bazarr, SABnzbd, Seerr). It answers questions by calling live APIs instead of making stuff up.

🧩 Architecture

Everything runs as Docker containers on a Proxmox VM. The stack breaks down like this:

Auth — Cloudflare Access sits in front. Google OAuth via trusted headers. No local passwords.
Interface — OpenWebUI is the frontend. It hosts Midnight, does RAG against Qdrant, and sends LLM calls to Ollama.
Inference & data — Ollama runs three local LLMs with GPU passthrough. Qdrant holds the RAG vectors. n8n handles visual workflow automation.
Physical world — Home Assistant plus Matter Server, both on host networking so mDNS device discovery works.

Midnight talks to a separate media stack (HELIOS) over HTTP APIs. LUMINAL is the brain, HELIOS is the library.

System Components

Category	Component	Role in the system
🤖 AI Services	n8n	Visual workflow engine. Glue for cross-service automation.
🤖 AI Services	OpenWebUI	Chat interface and tool runtime. Hosts Midnight, handles RAG.
🤖 AI Services	Ollama	Local LLM inference with GPU acceleration.
🧠 AI Infrastructure	Qdrant	Vector DB for OpenWebUI's RAG.
🧠 AI Infrastructure	SearXNG	Metasearch backend for OpenWebUI's web-search RAG.
🧠 AI Infrastructure	Docker	Containers, named volumes, GPU passthrough.
🏠 Home Automation	Home Assistant	Device control hub. Runs on host network for discovery.
🏠 Home Automation	Matter Server	Matter protocol bridge for HA.
🔐 Security	Cloudflare Access	Zero Trust SSO. Google OAuth in front of OpenWebUI.

🧠 AI Models

Three models get pulled on first boot and cached on disk. Each does something different:

llama3.1:8b (4.9 GB) — fast general-purpose model. Good for quick chat and simple tool calls.
gemma4:e4b (9.6 GB) — multimodal with native tool use. This is what Midnight runs on.
gpt-oss:20b (~13 GB on disk, 20B params) — heavier reasoning when capability matters more than latency.

All three share one Ollama instance and one GPU.
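
The first-boot pull amounts to comparing what Ollama already has cached against the three models listed above. Here is a minimal sketch of that comparison, assuming the response shape of Ollama's `GET /api/tags` endpoint (a `models` list whose entries carry a `name`). The helper name `missing_models` is hypothetical, and the repo's own puller containers may work differently.

```python
import json

WANTED = ["llama3.1:8b", "gemma4:e4b", "gpt-oss:20b"]


def missing_models(tags_json: str, wanted=WANTED) -> list[str]:
    """Return the wanted models that are absent from an /api/tags response."""
    cached = {m["name"] for m in json.loads(tags_json).get("models", [])}
    return [name for name in wanted if name not in cached]


# Example response body, trimmed to the one field this sketch reads.
sample = json.dumps({"models": [{"name": "llama3.1:8b"}]})
print(missing_models(sample))  # prints ['gemma4:e4b', 'gpt-oss:20b']
```

Anything this returns would still need pulling. An empty list means the on-disk cache already covers all three.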

🌙 Midnight Media Assistant

Midnight is a custom AI assistant built on top of OpenWebUI that queries the HELIOS media library. It uses function tools for everything. No question gets answered from what the model "knows" if a tool could answer from live data.

Setup

Component	Description
Base Model	gemma4:e4b via Ollama
Interface	OpenWebUI with a custom system prompt
Tools	7 Python function tools
Knowledge	RAG-indexed reference docs

Tools (`midnight/`)

Tool	What it does
`midnight_plex_tool`	Library search, recently added, episode details, cast, actor/director lookup
`midnight_radarr_tool`	Movie details, genres, synopses
`midnight_sonarr_tool`	TV show details, upcoming episodes
`midnight_tautulli_tool`	Watch history, current activity, most watched
`midnight_bazarr_tool`	Subtitle status and history
`midnight_sabnzbd_tool`	Download queue and history
`midnight_seerr_tool`	Content requests and search
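
To give a feel for what one of these tools might look like, here is a simplified sketch in the spirit of `midnight_sabnzbd_tool`. It assumes the Open WebUI convention of a `Tools` class whose typed, docstringed methods are exposed to the model, and the SABnzbd `mode=queue` API with `filename` and `percentage` fields on each queue slot. The base URL, API key, method name, and the injectable `fetch` argument are all hypothetical. The real tools live in `midnight/` and will differ.

```python
import json
import urllib.parse
import urllib.request


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


class Tools:
    """Hypothetical, simplified stand-in for a download-queue tool."""

    def __init__(self, base_url="http://sabnzbd:8080", api_key="CHANGE_ME",
                 fetch=http_get_json):
        # base_url and api_key are placeholders, not values from the repo.
        self.base_url = base_url
        self.api_key = api_key
        self.fetch = fetch  # injectable so the method can be exercised offline

    def get_download_queue(self) -> str:
        """List what is currently downloading, or say the queue is empty."""
        query = urllib.parse.urlencode(
            {"mode": "queue", "output": "json", "apikey": self.api_key}
        )
        data = self.fetch(f"{self.base_url}/api?{query}")
        slots = data.get("queue", {}).get("slots", [])
        if not slots:
            return "Nothing is downloading right now."
        return "\n".join(f"{s['filename']} ({s['percentage']}%)" for s in slots)
```

The method returns plain text built only from the API response, so the model has live data to quote rather than anything to guess at. Real Open WebUI tools usually keep settings such as the API key in a `Valves` model. Constructor arguments are used here only to keep the sketch free of dependencies.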

How it behaves

Always calls a tool. Never answers a library question from model knowledge.
Normalizes curly quotes and special characters before sending to APIs.
Pulls real episode synopses from Plex instead of guessing plot summaries.
Returns the actual Plex "added on" date, not the file's download timestamp.
Says "I don't see that in the library" when something isn't there, instead of inventing a plausible answer.
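
The quote normalization in the list above can be sketched as a small translation table. This is an illustration only: the function name `normalize_query` and the exact set of characters handled are assumptions, not the repo's implementation.

```python
import unicodedata

# Typographic punctuation mapped to the ASCII forms an API is likely to match.
_PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
})


def normalize_query(text: str) -> str:
    """Replace curly quotes and similar characters, then collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.translate(_PUNCTUATION).split())


print(normalize_query("It\u2019s a Stunterful Life"))  # prints It's a Stunterful Life
```

Without this step, a title typed with a curly apostrophe may fail an exact-match lookup even though the episode is in the library.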

Sample prompts

"What movies do we have with Tom Hanks?"
"What's new in the library?"
"What's the Bob's Burgers episode 'It's a Stunterful Life' about?"
"Show me Christmas movies"
"What's currently downloading?"
"Who's watching right now?"

See midnight/README.md for the full system prompt and tool docs.

🏗️ Design Decisions

Why things are set up the way they are.

Cloudflare Access instead of local accounts

OpenWebUI doesn't handle login itself in this setup. Cloudflare Access sits in front, redirects to Google, and passes the authenticated email via a trusted header (Cf-Access-Authenticated-User-Email). OpenWebUI auto-creates the user from that header. No local passwords to manage, and access policy lives in one place instead of scattered across services.

OpenWebUI trusts that header regardless of source IP, which is only safe if nothing untrusted can reach the port. Here cloudflared runs on a separate LAN host and connects to OpenWebUI over the network, so the port has to stay published on the LAN and binding to loopback isn't an option. Closing the direct-access/header-spoofing gap therefore means restricting port 3000 to the tunnel host at the firewall, using a DOCKER-USER iptables allowlist because Docker's published ports bypass ufw. FORWARDED_ALLOW_IPS pins which upstream uvicorn trusts for X-Forwarded-* headers as defense-in-depth.
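
To make the spoofing risk concrete, the sketch below shows the same allowlist idea expressed at the application level: the email header is honored only when the connection comes from the tunnel host. This is purely illustrative. In this setup the restriction is enforced at the firewall, not in code, and the function name and the address `192.0.2.10` are hypothetical.

```python
import ipaddress

EMAIL_HEADER = "Cf-Access-Authenticated-User-Email"


def trusted_email(headers: dict, peer_ip: str, allowed_peers: list[str]):
    """Return the header's email only when the peer is an allowed tunnel host."""
    peer = ipaddress.ip_address(peer_ip)
    if not any(peer in ipaddress.ip_network(net) for net in allowed_peers):
        return None  # direct LAN access: the header could be spoofed
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get(EMAIL_HEADER.lower())


headers = {"Cf-Access-Authenticated-User-Email": "user@example.com"}
print(trusted_email(headers, "192.0.2.10", ["192.0.2.10"]))  # prints user@example.com
print(trusted_email(headers, "192.0.2.55", ["192.0.2.10"]))  # prints None
```

The second call is the case the firewall rule exists for: any LAN client can send that header, so where the connection came from has to be part of the trust decision.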

Docker Secrets, not env vars

Credentials (n8n encryption key, JWT secret, OpenWebUI session key) are mounted as files via Docker Secrets. They don't show up in env, process dumps, or compose logs. The plaintext files live in a locked-down system directory outside the repo.
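
From inside a container, a secret mounted this way is just a file. A minimal sketch of reading one, assuming Docker's default mount point of `/run/secrets/<name>`. The helper name and the secret name in the usage note are hypothetical.

```python
from pathlib import Path


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Read a Docker secret mounted as a file, stripping surrounding whitespace."""
    return (Path(secrets_dir) / name).read_text(encoding="utf-8").strip()
```

A service would call something like `read_secret("jwt_secret")` at startup. Because the value never passes through an environment variable, it stays out of `docker inspect` output and process listings.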

Centralized config at `/etc/LUMINAL/`

The real env.sh and secrets/ directory live at /etc/LUMINAL/, symlinked into the project and gitignored. direnv picks them up on cd into the project, so interactive shells, cron jobs, and Docker Compose all see the same values without explicit sourcing. The pattern came after almost committing secrets one too many times.

External named volumes

Every piece of persistent state (n8n workflows, Ollama model cache, Qdrant indices, chat history, HA config) lives in an external Docker named volume. Containers get torn down and recreated without losing anything. Upgrades stop feeling risky.

Non-destructive rebuild script

scripts/docker-rebuild.sh pulls new images first, then runs docker compose up -d so only the services whose images actually changed get recreated. Everything else keeps running. It also runs a health check that skips the one-shot Ollama pullers (they're supposed to exit), retries transient failures, and returns 0/1/2 exit codes so cron can alert properly. --dry-run shows what would change without touching anything.
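
The health-check logic described above (skip one-shot services, retry, map the result to an exit code) can be sketched as follows. The real script is shell, and the source does not say what each code means, so the mapping used here (0 healthy, 1 unhealthy after retries, 2 state could not be read) is an assumption, as are the function and service names.

```python
import time


def stack_exit_code(get_states, one_shot=(), retries=3, delay=0.0) -> int:
    """Poll service states and map the result to an exit code.

    get_states() returns {service_name: state}. ASSUMED code meanings:
    0 = all healthy, 1 = still unhealthy after all retries,
    2 = the state could not be read at all.
    """
    for attempt in range(retries):
        try:
            states = get_states()
        except OSError:
            return 2
        # One-shot services are expected to exit, so leave them out.
        bad = [n for n, s in states.items()
               if n not in one_shot and s != "healthy"]
        if not bad:
            return 0
        if attempt < retries - 1:
            time.sleep(delay)  # give transient failures a chance to clear
    return 1
```

Distinct codes let a cron wrapper tell "the stack is unhealthy" apart from "the check itself failed", which usually call for different alerts.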

Every long-running service declares its own Docker healthcheck (Qdrant probes its port via bash's /dev/tcp since its image ships no HTTP client; the rest hit /healthz-style endpoints), and startup ordering is gated on condition: service_healthy instead of fixed sleeps. So the script's health check gets a true signal from the whole stack, and the model pullers wait for Ollama to actually be serving before they run.
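
A port probe of the kind used for Qdrant only asks whether a TCP connection succeeds, with no HTTP involved. The Python equivalent is sketched below for illustration. The actual healthcheck runs in bash inside the container, and the function name is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("localhost", 6333)` would check Qdrant's default HTTP port. The port the repo actually probes is not stated here. An open port is a weaker signal than an HTTP health endpoint, but it is enough to gate startup ordering when the image ships no HTTP client.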

Anti-hallucination prompt engineering

Midnight's system prompt assumes the model will hallucinate if allowed to. Every question has to go through a tool call. The prompt explicitly bans answering from model knowledge when a tool could answer instead. It normalizes curly quotes in input and uses RAG against MIDNIGHT_REFERENCE.md to pick the right tool. Trade-off: Midnight is occasionally too strict and refuses things it could reasonably answer. Better than made-up movie titles.

GPU passthrough

Ollama runs all LLM inference on the NVIDIA GPU at hardware speed — no API costs, no rate limits, nothing leaving the box. OpenWebUI runs the :cuda image so its RAG side (embeddings, reranking, Whisper STT) is GPU-accelerated too; it does not run LLM inference itself — that's Ollama's job. Both reserve the GPU in the compose file (the plain :latest OpenWebUI image is CPU-only and would silently ignore the reservation).

📜 Changelog

Version history and evolution in CHANGELOG.md.
