A proof-of-concept agentic RAG application over the CoMSES Computational Model Library, built on Temporal.io.
A 5-minute setup and demo video is available on YouTube: https://www.youtube.com/watch?v=sfjV-Id7-vg

A POC that lets researchers ask natural-language questions across computational model data — metadata, documentation, and source code (see sample_data/) — and get answers with paragraph-level citations back to the source material.
The agent itself is a Temporal workflow (AgentWorkflow) whose tools can be either Temporal activities (fast, mostly side-effect-free) or Temporal child workflows (multi-step, durable, with their own progress events).
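
The split between the two tool kinds can be pictured as a dispatch on a per-tool "kind" field. The dependency-free Python sketch below illustrates that idea. It is not the repo's code: ToolKind, ToolSpec, dispatch_tool, the runner callables, and the tool names are all hypothetical. In a real Temporal Python workflow, the two runners would wrap the SDK's workflow.execute_activity and workflow.execute_child_workflow. Here they are faked so the sketch runs without a Temporal server.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any, Awaitable, Callable


class ToolKind(Enum):
    ACTIVITY = "activity"  # fast, mostly side-effect-free
    CHILD_WORKFLOW = "child_workflow"  # multi-step, durable, emits its own progress


@dataclass(frozen=True)
class ToolSpec:
    name: str
    kind: ToolKind


# A runner takes a tool name plus arguments and awaits the tool's result.
Runner = Callable[[str, dict[str, Any]], Awaitable[Any]]


async def dispatch_tool(
    spec: ToolSpec,
    args: dict[str, Any],
    run_activity: Runner,
    run_child_workflow: Runner,
) -> Any:
    """Route one tool call to the execution primitive matching its kind."""
    if spec.kind is ToolKind.ACTIVITY:
        return await run_activity(spec.name, args)
    return await run_child_workflow(spec.name, args)


# Fake runners standing in for the Temporal SDK calls (hypothetical).
async def fake_activity(name: str, args: dict[str, Any]) -> str:
    return f"activity:{name}({sorted(args)})"


async def fake_child_workflow(name: str, args: dict[str, Any]) -> str:
    return f"child_workflow:{name}({sorted(args)})"


async def main() -> list[str]:
    tools = [
        ToolSpec("analyze_intent", ToolKind.ACTIVITY),  # hypothetical tool names
        ToolSpec("hybrid_search", ToolKind.CHILD_WORKFLOW),
    ]
    return [
        await dispatch_tool(t, {"query": "ant foraging"}, fake_activity, fake_child_workflow)
        for t in tools
    ]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping the kind on the tool spec lets the agent loop treat every tool uniformly. Each tool can then trade the low overhead of an activity against the durability and progress reporting of a child workflow without changing the caller.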
- Not production-ready. No auth hardening, no rate-limiting at the public edge, etc.
- Not a search box or a chatbot wrapper around a single model — it decomposes queries, resolves relevant models (with optional human-in-the-loop), and runs hybrid (dense + sparse) vector search before generating cited answers.
intent.md files at the repository root and in each module document the reasoning behind each major decision:
- intent.md: system-level rationale (agentic RAG over CoMSES, layered code structure, Temporal, worker split, event sourcing, LiteLLM proxy)
- src/modules/agent/intent.md: the conversation runtime (AgentWorkflow, three tool types, transactional outbox, context propagation)
- src/modules/ingestion/intent.md: the write side (marker-pdf, synthetic Q&A enrichment, hybrid embeddings with dense + BM42 sparse vectors, tree-sitter for code)
- src/modules/retrieval/intent.md: the read side (intent analysis, query decomposition, model relevance + HITL, hybrid RRF search, source attribution, near-real-time progress)
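
The "hybrid RRF search" on the read side presumably fuses the dense and sparse (BM42) hit lists with Reciprocal Rank Fusion. RRF combines rankings by position rather than by raw scores, which are not comparable across retrievers: each chunk scores the sum of 1 / (k + rank) over the lists it appears in. The Python sketch below illustrates the technique and is not the repo's implementation. It assumes each retriever returns chunk IDs best-first and uses k = 60, the constant from the original RRF paper. Qdrant's Query API can also perform RRF fusion server-side, which the project may use instead.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several best-first ranked ID lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense = ["chunk-a", "chunk-b", "chunk-c"]  # hypothetical dense-vector hits
    sparse = ["chunk-c", "chunk-a", "chunk-d"]  # hypothetical BM42 sparse hits
    for doc_id, score in reciprocal_rank_fusion([dense, sparse]):
        print(f"{doc_id}: {score:.5f}")
```

A chunk that ranks reasonably well in both lists (chunk-a here) beats one that tops only a single list, which is the behavior hybrid search is after.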
Software
- Linux or WSL2 (macOS should work but is untested)
- Docker + Docker Compose (for the infrastructure stack)
./setup.sh installs the other missing CLI tools automatically; Docker must be installed yourself.
Hardware
- 16 GB RAM minimum; 24 GB+ recommended (PyTorch + marker-pdf + embedding models share host memory)
- ~10 GB disk for ML model weights and Docker images
- GPU (optional, recommended): an NVIDIA GPU with CUDA for faster PDF parsing and embeddings.
Verified on a Windows 11 laptop under WSL2 with 32 GB RAM and 8 GB VRAM (NVIDIA RTX 2000 Ada Generation).
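
To compare a host against these RAM figures before running setup, the following small Python sketch uses POSIX sysconf, which is available on Linux and WSL2. The function names and the 5% allowance are hypothetical and not taken from the hardware_preflight phase. Under WSL2, the reported value is the memory assigned to the Linux VM, not the Windows host's total.

```python
import os

GIB = 1024 ** 3
MINIMUM_GB = 16  # minimum from the Hardware section
RECOMMENDED_GB = 24  # recommended from the Hardware section


def total_ram_bytes() -> int:
    """Physical memory visible to this OS (or VM, e.g. a WSL2 guest)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def ram_posture(total_bytes: int, allowance: float = 0.95) -> str:
    """Classify RAM against the guidance above.

    `allowance` is a hypothetical fudge factor: the kernel reports slightly
    less than the installed capacity, so a nominal 16 GB machine still counts.
    """
    gib = total_bytes / GIB
    if gib >= RECOMMENDED_GB * allowance:
        return "recommended"
    if gib >= MINIMUM_GB * allowance:
        return "minimum"
    return "below minimum"


if __name__ == "__main__":
    total = total_ram_bytes()
    print(f"{total / GIB:.1f} GiB visible: {ram_posture(total)}")
```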
LLM access
- An API key from at least one provider (OpenAI, Anthropic, OpenRouter, Groq, or Google), or a local Ollama instance reachable at OLLAMA_HOST. setup.sh probes the keys you supply and auto-picks the first live profile.
- Embeddings (dense + sparse BM42) run locally via FastEmbed by default, so no separate embeddings API is needed. A GPU is highly recommended for computing embeddings.
./setup.sh
The script bootstraps everything in phases: toolchain install, .env generation with auto-generated secrets, Docker stack startup (Postgres, Qdrant, Redis, MinIO, Temporal, LiteLLM), database migrations, model prewarming, and sample-data ingestion. Along the way it prompts for an LLM API key and the LLM/embeddings configuration, and asks you to start the workers. When it finishes, you'll have the chat UI at http://localhost:5173 and a sample dataset to query.
Run ./setup.sh --help to list the per-phase verbs (re-run a phase, recreate, etc.).
Each phase is idempotent (sentinel-gated) and resumable: a re-run picks up at the first incomplete or invalidated phase.
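
setup.sh is a shell script, so the following is only a Python illustration of the sentinel-gated, resumable pattern, not its actual logic. It assumes one sentinel file per completed phase that stores a hash of the phase's inputs, so a changed input invalidates that phase. The state directory, the input strings, and the helpers run_phases and fingerprint are hypothetical. Only the two phase names come from the table below.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable

# (phase name, action to run, text describing the phase's inputs)
Phase = tuple[str, Callable[[], None], str]


def fingerprint(inputs: str) -> str:
    """Hash a phase's inputs so a changed input invalidates its sentinel."""
    return hashlib.sha256(inputs.encode()).hexdigest()


def run_phases(phases: list[Phase], state_dir: Path) -> list[str]:
    """Run each phase unless its sentinel exists and matches the current inputs.

    Returns the names of the phases that actually ran on this invocation.
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    ran: list[str] = []
    for name, action, inputs in phases:
        sentinel = state_dir / f"{name}.done"
        digest = fingerprint(inputs)
        if sentinel.exists() and sentinel.read_text() == digest:
            continue  # already completed with identical inputs
        action()  # if this raises, no sentinel is written and a re-run retries it
        sentinel.write_text(digest)  # mark complete only after success
        ran.append(name)
    return ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp)
        phases: list[Phase] = [
            ("toolchain", lambda: None, "uv node pnpm"),
            ("litellm_config_seed", lambda: None, "profile=cloud-openrouter"),
        ]
        print(run_phases(phases, state))  # ['toolchain', 'litellm_config_seed']
        print(run_phases(phases, state))  # []
        phases[1] = ("litellm_config_seed", lambda: None, "profile=local")
        print(run_phases(phases, state))  # ['litellm_config_seed']
```

Writing the sentinel only after the action succeeds is what makes the run resumable: a failure stops the loop at that phase, and the next invocation skips everything before it.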
| # | Phase | What it does |
|---|---|---|
| 1 | toolchain | Detects required CLIs (uv, node, pnpm, zellij, shellcheck, jq) and installs anything missing via the official installers. Docker is required by later phases but is not auto-installed here; install it yourself if absent. |
| 2 | uv_sync | Runs uv sync --group pdf (and --group gpu when an NVIDIA GPU is detected). The first run downloads ~2 GB (PyTorch + marker-pdf), plus ~600 MB of cuDNN/cuBLAS wheels on GPU hosts. |
| 3 | hardware_preflight | Warn-only RAM / swap / CPU / GPU posture check. Suggests .env overrides for low-memory hosts (e.g. INGEST_WORKER_MAX_CONCURRENT_ACTIVITIES=2); never hard-fails. |
| 4 | env_bootstrap | Creates .env from .env.example (or appends new keys to an existing one) and generates per-deployment secrets (LITELLM_MASTER_KEY, MINIO_ROOT_PASSWORD, QDRANT_API_KEY, DB passwords, UI passwords). |
| 5 | app_hostnames | Prompts for the public host the browser will use (default localhost; an FQDN or IP for remote VMs, see Deploying CoMSES AgentSpace on a remote VM). Writes CORS_ALLOWED_ORIGINS, MINIO_EXTERNAL_ENDPOINT, VITE_API_BASE_URL, VITE_WS_BASE_URL, VITE_HOST, and VITE_ALLOWED_HOSTS consistently. Validates the input against RFC 1123. |
| 6 | env_triage | Detects a sibling Temporal stack already running on the same ports (7233 / 8080 / 9090 / 8085 / 16686) and refuses to start if it finds one. |
| 7 | provider_keys | Probes every supported LLM provider (OpenAI, Anthropic, Groq, OpenRouter, xAI, Google, GPUStack) and prompts for a chat-provider key when none are alive. |
| 8 | embedding_backend | Picks the dense embedding backend by writing EMBEDDING_DENSE_BACKEND to .env: fastembed (in-process default), ollama-container (Docker), or `cloud-<provider>` (any LiteLLM-supported embedding provider). Derives EMBEDDING_DENSE_PROVIDER / EMBEDDING_DENSE_MODEL / OLLAMA_API_BASE from the choice. Sparse (BM42) is always local. |
| 9 | litellm_config_seed | Picks a profile (e.g. cloud-openrouter, local) and seeds config/litellm/litellm_config.yaml from `config/litellm/profiles/<profile>.yaml`. Preserves user edits by detecting a seed marker; re-seeds (with a backup) when the profile or embedding backend changes, or when run with --reseed. |
| 10 | litellm_config_review | Prints a banner of the seeded role → model mappings and pauses so you can inspect and edit the YAML before launch. Skipped under --auto-confirm or in non-interactive mode. |
| 11 | marker_prewarm | Pre-downloads the marker-pdf layout / OCR / text-recognition models (~1.5 GB) into ~/.cache/huggingface/ so the first PDF ingest doesn't stall. |
| 12 | fastembed_prewarm | Pre-downloads the dense + sparse (BM42) embedding models locally. Dense is skipped when EMBEDDING_DENSE_BACKEND is ollama-container or cloud-*; sparse is always local. |
| 13 | docker_up | Brings up the Temporal stack (docker-compose.temporal.yml), then the infra stack (compose.yml), then health-checks 8 services in order: temporal-postgresql → temporal → comses-rag-db → litellm-db → redis → minio → qdrant → litellm-proxy. |
| 14 | ollama_prewarm | Runs only when EMBEDDING_DENSE_BACKEND=ollama-container. Waits for the ollama-pull-llama init container to finish and ensures nomic-embed-text is pulled inside the ollama container. No-op for the fastembed and cloud-* backends. |
| 15 | litellm_key | Calls POST /key/generate on the running LiteLLM proxy to mint a virtual API key and writes it to LITELLM_PROXY_API_KEY in .env. |
| 16 | litellm_routing_probe | Per-role smoke calls (smart / default / fast / long / embed) against the proxy. Hard-fails if no chat role responds 2xx or if embed returns no vector. |
| 17 | migrations | Runs make db-check, then make db-upgrade, to bring the comses-rag-db schema to the latest Alembic head. |
| 18 | hosts_file | Validates that the Docker DNS names the workers connect to (minio, redis, qdrant, ollama, litellm-proxy, litellm-db, comses-rag-db) resolve from the host. If any are missing, offers [a]uto sudo / [m]anual / [s]kip to append 127.0.0.1 … to /etc/hosts. |
| 19 | workers | Prompts you to start the 10-pane Zellij worker layout in a second terminal (make w) and polls each worker's metrics port (10090–10099) until ready. |
| 20 | sample_data | Stages and ingests two bundled CoMSES codebases through the full pipeline (marker-pdf → fastembed → Qdrant + Postgres + MinIO). |
| 21 | dashboard | Prints the final dashboard: service URLs + credentials, Temporal CLI hint, Zellij attach command, sample-data summary, and a "Try it" pointer at the configured host. |
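
The RFC-1123 check in phase 5 can be illustrated with a short Python sketch. This is not the script's code: is_valid_public_host is a hypothetical name, and accepting IPv4/IPv6 literals via the standard ipaddress module is an assumption based on the phase accepting an FQDN or IP. Hostname labels must be 1 to 63 letters, digits, or hyphens and must not start or end with a hyphen (RFC 1123 allows a leading digit). The whole name is capped at 253 characters, and trailing dots are rejected here.

```python
import ipaddress
import re

# One DNS label: 1-63 alphanumerics or hyphens, no leading/trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_public_host(host: str) -> bool:
    """Return True for an IP literal or an RFC-1123-style hostname."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        pass  # not an IP literal; fall through to hostname rules
    if not host or len(host) > 253:
        return False
    # fullmatch (not match + $) so a trailing newline cannot slip through
    return all(_LABEL.fullmatch(label) for label in host.split("."))


if __name__ == "__main__":
    for candidate in ["localhost", "agentspace.example.org", "203.0.113.7", "-bad-.example", "under_score.example"]:
        print(f"{candidate!r}: {is_valid_public_host(candidate)}")
```
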
| Service | URL | Credentials |
|---|---|---|
| Chat UI | http://localhost:5173 | API key dev-key-1 (from API_KEY_MAPPING in .env) |
| FastAPI | http://localhost:8000 | — |
| Temporal UI | http://localhost:8080 | — |
| Grafana | http://localhost:8085 | admin / $GRAFANA_ADMIN_PASSWORD |
| LiteLLM UI | http://localhost:4000/ui | admin / $LITELLM_PROXY_UI_PASSWORD |
| Jaeger | http://localhost:16686 | — |
| Prometheus | http://localhost:9090 | — |
| Qdrant dashboard | http://localhost:6333/dashboard | $QDRANT_API_KEY |
| MinIO Console | http://localhost:9001 | minio_admin / $MINIO_ROOT_PASSWORD |
| pgAdmin | http://localhost:8888 | $PGADMIN_DEFAULT_EMAIL / $PGADMIN_DEFAULT_PASSWORD |
| Databasus | http://localhost:4005 | — |
$VAR references are auto-generated values written into .env by the env_bootstrap phase; setup.sh also prints them once on completion. Look them up in .env rather than here.
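
When a script needs one of these generated values, a small reader like the Python sketch below is enough. It assumes .env uses plain KEY=VALUE lines, optionally prefixed with export, with # comment lines and optional surrounding quotes. It does not handle inline comments, multi-line values, or variable expansion. read_env_value is a hypothetical helper, and python-dotenv's dotenv_values is a fuller alternative.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_env_value(env_path: Path, key: str) -> str | None:
    """Return the value for `key` from a simple KEY=VALUE .env file, or None.

    If the key appears more than once, the last occurrence wins, matching
    what sourcing the file in a shell would do.
    """
    found: str | None = None
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        name, _, value = line.partition("=")  # split on the first '=' only
        if name.removeprefix("export ").strip() != key:
            continue
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        found = value
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = Path(tmp) / ".env"
        env.write_text('# generated\nGRAFANA_ADMIN_PASSWORD="s3cret"\nQDRANT_API_KEY=abc123\n')
        print(read_env_value(env, "GRAFANA_ADMIN_PASSWORD"))  # s3cret
```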
Temporal CLI
docker exec -it temporal-admin-tools temporal workflow list
Workers (Zellij)
zellij attach comses-workers
Sample data
Two actual models from the CoMSES Model Library are ingested on the first run of setup.sh:
- 761c91b8-897b-4e59-8b5f-83715d6c9471: MicroAnts 2.5
- dd847e79-bb37-43e1-ae3a-27de57573376: Ants Digging Networks
Try it
Open http://localhost:5173, log in with API key dev-key-1, and ask a multi-part question — e.g. "What ant-foraging models are in the library, and how do they differ?"
Deploying CoMSES AgentSpace on a remote VM
See deployment/README.md for the full recipe: SSH-tunnel mode (recommended for solo development) and HTTPS-via-Caddy mode (for sharing a public demo URL).
The repo enforces quality gates via qlty (ruff + mypy + deptry, plus a pre-commit / pre-push hook pair). qlty is a standalone Rust CLI, not a Python package, and setup.sh does not install it because it isn't a runtime dependency. Install it yourself, then wire up the hooks:
curl -fsSL https://qlty.sh | sh # one-time install → ~/.qlty/bin/
echo 'export PATH="$HOME/.qlty/bin:$PATH"' >> ~/.bashrc # or ~/.zshrc — persist on PATH
make install-git-hooks # symlink .git/hooks/{pre-commit,pre-push}
make install-git-hooks symlinks .git/hooks/pre-commit → .qlty/hooks/pre-commit.sh and .git/hooks/pre-push → .qlty/hooks/pre-push.sh. The repo ships custom, version-controlled hooks (biome + .env.example host-path guard + uv run mypy + tsc --noEmit alongside qlty check), so make install-git-hooks deliberately does not call qlty githooks install: that subcommand regenerates .qlty/hooks/*.sh from defaults and would wipe the custom logic.
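
One way to confirm the hooks are wired as described is to resolve each .git/hooks entry and compare it with the versioned script. The Python sketch below does this check. check_hook_links is a hypothetical helper, not part of the repo. It takes the hook-to-script mapping from the paragraph above and assumes it is run from the repository root.

```python
from pathlib import Path

# Hook name -> versioned script it should point to (mapping from the text above).
EXPECTED_HOOKS = {
    "pre-commit": ".qlty/hooks/pre-commit.sh",
    "pre-push": ".qlty/hooks/pre-push.sh",
}


def check_hook_links(repo_root: Path) -> dict[str, bool]:
    """Report, per hook, whether .git/hooks/<name> is a symlink to the expected script."""
    results: dict[str, bool] = {}
    for hook, target in EXPECTED_HOOKS.items():
        link = repo_root / ".git" / "hooks" / hook
        expected = (repo_root / target).resolve()
        # resolve() follows the link, so relative and absolute symlinks both pass
        results[hook] = link.is_symlink() and link.resolve() == expected
    return results


if __name__ == "__main__":
    for hook, ok in check_hook_links(Path.cwd()).items():
        print(f"{hook}: {'ok' if ok else 'missing or not linked to the qlty script'}")
```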
make d # start infrastructure (Postgres, Qdrant, Redis, MinIO, Temporal, LiteLLM)
make w # start all 10 Temporal workers (Zellij layout) + the chat app (backend + frontend)
make k # stop infra
make kw # kill all workers + chat app
make test # unit tests (fast, mocked)
make test-integration # integration tests (PMR containers)
make check # ruff + mypy + deptry + qlty
Module-specific development notes live in the per-module READMEs: backend/, frontend/, shared/, and shared/worker_base/.
Contributions are welcome.
- Temporal: the durable workflow engine that is the execution backbone of the ingestion workflows, the agent runtime, every retrieval tool, and the event-streaming outbox
- marker-pdf: layout-aware PDF parsing for academic model documentation
- Zellij: the terminal multiplexer that hosts the 10-pane worker layout launched by make w
This project is released under the MIT License.
⚠️ Caveat: GPL-3.0 dependency. The PDF ingestion pipeline depends on marker-pdf (and its sub-dependency surya-ocr), both licensed under GPL-3.0-or-later. While this project's own source code is MIT-licensed, anyone distributing the combined application with marker-pdf included must comply with GPL-3.0 obligations for that combined work.