devlille/cortex

cortex

Deployment repo for a self-hosted Ollama instance. The image ships with two pre-pulled models:

  • qwen2.5:3b (default) — general chat, drafting, summarisation
  • gemma3:1b — small/fast for classification, tagging, extraction

This repo only owns the Ollama deployment. The application code that calls it (the /orgs/{orgSlug}/ai/chat API) lives in partners-connect/server — see the spec at partners-connect/server/specs/026-llm-chat-integration/.

What's here

cortex/
├── ollama/
│   └── Dockerfile               # builds an Ollama image with qwen2.5:3b + gemma3:1b pre-pulled
├── Dockerfile.deploy            # consumed by Clever Cloud, pulls the image from GHCR
├── docker-compose.yml           # local smoke test
├── .github/workflows/
│   ├── build-push-ollama.yaml   # CI: build & push to ghcr.io on every main push
│   └── deploy-ollama.yaml       # CD: deploy to Clever Cloud
└── DEPLOYMENT.md                # one-time Clever Cloud setup runbook

Local smoke test

docker compose up --build

The host-side port is 11435 locally, so it does not collide with a native Ollama install on the dev machine (which uses 11434). Inside the compose network, services still reach Ollama at http://ollama:11434. In production (Clever Cloud), Ollama is forced to listen on 8080 via the OLLAMA_HOST env var (set by the deploy workflow), so that Clever Cloud's healthcheck, which polls 0.0.0.0:8080, succeeds.

First build takes ~4–5 minutes (pulls Ollama, then pulls qwen2.5:3b ~1.9 GB and gemma3:1b ~815 MB into the image).

Once running:

# List models baked into the image
curl -s http://127.0.0.1:11435/api/tags | jq

# Test inference
curl -s -X POST http://127.0.0.1:11435/api/generate \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen2.5:3b","prompt":"Reply with one word: hello","stream":false}'
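Because the first build and startup can take a few minutes, it can help to wait for the API before sending requests. The following is a minimal sketch, not part of this repo: wait_for_ollama is a hypothetical helper, and the retry count and delay are assumptions. It polls the same /api/tags endpoint used above.

```shell
# Hypothetical helper: poll the Ollama API until it answers or the retry budget runs out.
# Arguments (all optional): base URL, number of attempts, delay in seconds between attempts.
wait_for_ollama() {
  local base_url="${1:-http://127.0.0.1:11435}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output, -f makes curl exit non-zero on HTTP errors
    if curl -sf "${base_url}/api/tags" >/dev/null; then
      echo "Ollama is ready at ${base_url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Ollama did not respond at ${base_url} after ${attempts} attempts" >&2
  return 1
}
```

Typical use after docker compose up -d: run wait_for_ollama, then the curl commands above.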

Stop:

docker compose down

Deployment pipeline

git push main ─▶ build-push-ollama.yaml ─▶ ghcr.io/<owner>/<repo>-ollama:<sha>
                                       └▶ ghcr.io/<owner>/<repo>-ollama:latest

manual trigger ─▶ deploy-ollama.yaml ─▶ Clever Cloud (single instance)

The image builds automatically on every push to main. Deploys are manual: run gh workflow run "CD - Deploy Ollama to Clever Cloud" -f image_tag=latest, or trigger the workflow from the GitHub Actions UI.
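To make the tag scheme concrete, here is a small sketch that derives the two image references from a repository slug and a commit SHA. image_refs is a hypothetical helper, not something the CI workflow is known to contain, and whether the workflow uses the full or a shortened SHA is an assumption. The slug is lowercased because Docker image references must be lowercase.

```shell
# Hypothetical helper: print the two image references for a repo slug and commit SHA.
# $1 = "<owner>/<repo>", $2 = commit SHA (full or short; an assumption here).
image_refs() {
  local repo_lc
  # Image references must be lowercase, so normalise the slug first
  repo_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf 'ghcr.io/%s-ollama:%s\n' "$repo_lc" "$2"
  printf 'ghcr.io/%s-ollama:latest\n' "$repo_lc"
}

# GITHUB_REPOSITORY and GITHUB_SHA are set by GitHub Actions; the fallbacks are placeholders.
image_refs "${GITHUB_REPOSITORY:-devlille/cortex}" "${GITHUB_SHA:-0000000}"
```

Passing the SHA-tagged reference as image_tag instead of latest would pin a deploy to one specific build.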

First-time setup

See DEPLOYMENT.md for the one-time Clever Cloud setup (create apps, configure network group, set GitHub secrets).

Changing the default model

Edit the ollama pull lines in ollama/Dockerfile. Push to main to rebuild the image, then re-trigger the deploy workflow. Models get baked into the image, so a larger model means a larger image (and longer pulls on Clever Cloud).
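For orientation, ollama pull talks to a running Ollama server, so a build step that bakes models in usually starts ollama serve in the background first. The sketch below shows that general pattern as a shell function. It is an assumption about how such a step can look, not a copy of ollama/Dockerfile, and pull_models and its 30-second startup budget are hypothetical.

```shell
# Hypothetical build-time helper: start a temporary server, pull each model, stop the server.
# Usage: pull_models qwen2.5:3b gemma3:1b
pull_models() {
  local server_pid tries=0 model
  ollama serve >/dev/null 2>&1 &
  server_pid=$!
  # Wait until the server answers; give up after roughly 30 seconds
  until ollama list >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge 30 ]; then
      echo "ollama server did not start" >&2
      kill "$server_pid" 2>/dev/null || true
      return 1
    fi
    sleep 1
  done
  for model in "$@"; do
    ollama pull "$model" || { kill "$server_pid" 2>/dev/null || true; return 1; }
  done
  # Stop the temporary server so the build step can finish
  kill "$server_pid" 2>/dev/null || true
  wait "$server_pid" 2>/dev/null || true
  return 0
}
```

With a step like this, changing the default model means changing the list of model tags passed to the pull commands.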

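Models on disk cost no RAM until they are loaded, so baking several models into the image grows the image size but does not multiply runtime memory. As long as only one model is loaded at a time (Ollama can keep several loaded when they fit in memory; the OLLAMA_MAX_LOADED_MODELS setting caps this), RAM only needs to fit the largest single model in use.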

Sizing guide for the Clever Cloud instance (runtime RAM, with 8K context):

Model                  Disk     Runtime RAM   Min instance
gemma3:1b              815 MB   ~2.0 GB       S (~2 GB), tight
qwen2.5:1.5b           986 MB   ~1.8 GB       S (~2 GB)
qwen2.5:3b (default)   1.9 GB   ~3.5 GB       M (~4 GB)
llama3.2:3b            2.0 GB   ~3.5 GB       M (~4 GB)
gemma3:4b              3.3 GB   ~5 GB         L (~8 GB)
qwen2.5:7b             4.7 GB   ~6 GB         L (~8 GB)
For runtime-pulled models (not baked in), attach a Clever Cloud FS Bucket add-on mounted at /root/.ollama so models persist across restarts.

Security note

The Ollama API has no authentication. The Clever Cloud Ollama app must NOT have a public domain — only the partners-connect server (on the same network group) should be able to reach it. See DEPLOYMENT.md.
