Conversational assistant for CertMate (source).
Embedded local LLM (LM Studio by default) + a 1:1 mapping of CertMate's REST API as
LLM-callable tools. Read tools auto-execute; write tools queue a pending_action with a
human-readable summary and require an explicit confirmation from the UI.
Part of the CertMate ecosystem:
- CertMate — open-source SSL certificate management (API + UI).
- certmate-tools — free, privacy-first, client-side TLS / certificate / ACME diagnostics.
- nis2-public — NIS2 continuous posture management & remediation.
Enterprise / high-scale — multi-tenant, mTLS, white-label and NIS2-aligned deployments are available through CertMate-ng (source-available, BSL 1.1, EU-built). Contact fabrizio.salmi@gmail.com.
Set AGENT_MODE to choose between:
- full (default): sidecar to a real CertMate instance. All tools available: live state reads, write commands with confirm, admin (/reindex), RAG over docs. Used for self-hosters running their own CertMate.
- docs_only: public docs assistant. No CertMate API connection. Only docs_search + /help + /docs available; the LLM's system prompt is also adjusted to tell it there is no live state. Used for agent.certmate.org-style deployments where anyone can ask questions about CertMate features and configuration.
Mode-dependent behavior:
| | full | docs_only |
|---|---|---|
| Tools registered | 23 | 1 (docs_search) |
| Slash commands | all | /help, /docs, /ask |
| CertMate API client | opened per turn | not opened |
| /health.certmate | live check | "status": "disabled" |
| System prompt | "you can read live state + docs" | "no live state, docs only" |
| Widget header badge | hidden | DOCS ONLY |
The widget discovers the mode by hitting /health on mount and adapts
its autocomplete + intro hint accordingly.
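A client other than the bundled widget can apply the same discovery step. The sketch below is only an illustration: it assumes the /health response is a JSON object whose certmate entry carries "status": "disabled" in docs_only mode, as listed in the table above. The function name and the sample payloads are hypothetical.

```python
def detect_mode(health: dict) -> str:
    """Infer the agent mode from a parsed /health payload.

    Assumption: docs_only deployments report {"certmate": {"status": "disabled"}};
    any other payload is treated as full mode.
    """
    certmate = health.get("certmate")
    if isinstance(certmate, dict) and certmate.get("status") == "disabled":
        return "docs_only"
    return "full"


# Hypothetical payloads, for illustration only.
print(detect_mode({"certmate": {"status": "disabled"}}))
print(detect_mode({"certmate": {"status": "ok"}}))
```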
user ──► widget (vanilla web component)
│ SSE
▼
FastAPI agent ──► LM Studio (chat + embeddings, OpenAI-compatible)
│
└─► CertMate REST API (Bearer token)
- Single LLM endpoint, configurable via env (LM Studio by default).
- Read tools run inline; write tools save a pending_action and return a confirm_token. The widget posts to /tools/execute with the token to actually run them.
- sqlite for conversations / pending actions / audit log.
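The confirm flow described above can be sketched in a few lines. This is a minimal in-memory illustration, not the agent's code: the class name, the TTL value, and the run callback are assumptions, and the agent itself stores pending actions in sqlite.

```python
import secrets
import time


class PendingActions:
    """In-memory stand-in for the pending_action store (hypothetical)."""

    def __init__(self, ttl_seconds: float = 300.0):  # TTL value is an assumption
        self.ttl = ttl_seconds
        self._items = {}

    def queue(self, tool: str, args: dict, summary: str) -> str:
        """Store a write call and return the confirm token handed to the UI."""
        token = secrets.token_urlsafe(16)
        self._items[token] = (tool, args, summary, time.monotonic() + self.ttl)
        return token

    def execute(self, token: str, run):
        """Run the queued call once; unknown, reused, or expired tokens are refused."""
        item = self._items.pop(token, None)
        if item is None:
            raise KeyError("unknown or already used token")
        tool, args, _summary, expires_at = item
        if time.monotonic() > expires_at:
            raise KeyError("token expired")
        return run(tool, args)


store = PendingActions()
token = store.queue("cert_renew", {"domain": "example.com"}, "Renew example.com")
# The UI would show the summary, then confirm by sending the token back.
result = store.execute(token, lambda tool, args: f"{tool}:{args['domain']}")
```

Popping the token before running the call makes each confirmation single-use, so a replayed request cannot trigger the write twice.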
cp .env.example .env
# set CERTMATE_URL + CERTMATE_TOKEN, point LMSTUDIO_URL at your LM Studio server
uv pip install -e . # or: pip install -e .
python -m agent.main

Open http://127.0.0.1:8765/widget/ for the standalone test page.
Embed in CertMate or any page:
<script type="module" src="http://127.0.0.1:8765/widget/certmate-agent.js"></script>
<link rel="stylesheet" href="http://127.0.0.1:8765/widget/certmate-agent.css">
<certmate-agent endpoint="http://127.0.0.1:8765"></certmate-agent>

| Method | Path | What |
|---|---|---|
| GET | /health | agent + LM Studio + CertMate health |
| GET | /models | check the configured chat/embed models are loaded |
| POST | /chat | SSE stream; body {message, history?} |
| POST | /tools/execute | confirm a queued write; body {token} |
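Because /chat answers with a Server-Sent Events stream, a non-browser client has to split the response into events itself. The sketch below implements the standard SSE framing only; the event names and the JSON payload in the sample are assumptions, not the agent's documented schema.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream content, for illustration only.
sample = [
    "event: status\n",
    "data: served via openrouter\n",
    "\n",
    'data: {"delta": "Hi"}\n',
    "\n",
]
events = list(parse_sse(sample))
```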
Configured by default with google/gemma-4-e2b (chat) and
text-embedding-embeddinggemma-300m (embeddings).
Gemma's small variants are "thinking" models — they spend tokens on internal reasoning before
producing output. If you see empty assistant replies, raise AGENT_MAX_TOKENS (try 2048+), or
swap to an instruct model with reliable native tool-calling such as qwen/qwen3-8b.
Read (auto-executed): system_overview, system_health, cert_list, cert_get,
cert_deployment_status, cert_dns_alias_check, settings_get, dns_providers_info,
dns_accounts_list, dns_account_get, backups_list, storage_info, client_certs_list.
Write (require confirm): cert_create, cert_renew, cert_auto_renew_toggle, cert_deploy,
cache_clear, backup_create, dns_account_add.
Destructive (require confirm + extra UI warning): backup_delete, dns_account_delete.
| Command | What |
|---|---|
| /help (/?) | List all commands |
| /health | CertMate service health |
| /status (/overview) | Health + cert count + certs expiring within 30d |
| /expiring [days] | Certificates expiring within N days (default 30) |
| /list (/certs, /ls) | All managed certificates |
| /cert <domain> | Details for one certificate |
| /providers (/dns) | Supported + configured DNS providers |
| /accounts [provider] | Configured DNS accounts |
| /backups | Available backups |
| /renew <domain> [--force] | Renew (queues a confirm) |
| /deploy <domain> | Run deploy hook (queues a confirm) |
| /cache-clear | Clear server cache (queues a confirm) |
| /docs <query> (/ask) | Search the CertMate docs (RAG) |
| /reindex [repo] [branch] | Rebuild the docs index (admin only) |
Slash commands bypass the LLM entirely — sub-200ms response, deterministic output, and write commands reuse the same confirm-token flow as LLM-emitted tool calls.
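To make the deterministic path concrete, here is a rough sketch of how a slash command could be recognised and its aliases resolved before any LLM call. The alias map mirrors the command table; the function name and the whitespace-based argument splitting are assumptions, and the real parser may differ.

```python
# Aliases taken from the command table; values are the canonical command names.
ALIASES = {
    "?": "help",
    "overview": "status",
    "certs": "list",
    "ls": "list",
    "dns": "providers",
    "ask": "docs",
}


def parse_slash(text: str):
    """Return (command, args) for a slash command, or None for free text."""
    text = text.strip()
    if not text.startswith("/") or len(text) == 1:
        return None  # not a command: hand the message to the LLM instead
    name, *args = text[1:].split()
    name = name.lower()
    return ALIASES.get(name, name), args


print(parse_slash("/expiring 14"))
print(parse_slash("what does dns alias mode do?"))
```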
The agent ships with an indexer that pulls README.md and docs/*.md from
the CertMate GitHub repo, chunks by markdown headings, and embeds with the
local text-embedding-embeddinggemma-300m. Queries embed with the same
model and cosine-rank in pure Python (no numpy / vector DB).
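As a rough illustration of the two steps named above (heading-based chunking and pure-Python cosine ranking), consider the sketch below. The function names are hypothetical, the vectors are toy values rather than real embeddings, and the heading detection is naive: a line starting with # inside a code fence would also be treated as a heading.

```python
import math
import re


def chunk_by_headings(markdown: str):
    """Split markdown into (heading, body) chunks at ATX headings (#, ##, ...)."""
    chunks, heading, body = [], "", []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line):
            if heading or any(s.strip() for s in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if heading or any(s.strip() for s in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, items, top_k=3):
    """Return the top_k (score, payload) pairs, best first; items are (vector, payload)."""
    scored = [(cosine(query_vec, vec), payload) for vec, payload in items]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


chunks = chunk_by_headings("# Install\npip install\n## DNS\nproviders\n")
best = rank([1.0, 0.0], [([0.0, 1.0], "DNS"), ([1.0, 0.0], "Install")], top_k=1)
```

A linear scan like this is adequate for a few hundred chunks, which is why no vector database is needed at this scale.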
python -m agent.rag.indexer # defaults: fabriziosalmi/certmate@main
python -m agent.rag.indexer --repo X/Y --branch main
python -m agent.rag.indexer --paths README.md,docs/api.md

Index is written to docs_index/index.pkl (~2 MB for 271 chunks). The
agent loads it lazily at first docs_search call — restart not required
after rebuild.
Both /docs <query> (slash, sub-50ms after embed) and the LLM tool
docs_search use the same path. The system prompt instructs the LLM to
call docs_search for any conceptual / how-to question, which keeps
small models like gemma-4-e2b grounded.
Set AGENT_ADMIN_TOKEN=<secret> to enable admin-only commands like
/reindex. Clients prove admin status by sending one of:
- HTTP header: X-Agent-Admin: <secret> (preferred)
- JSON body field: "admin_token": "<secret>"
When empty, all admin commands are disabled (404-equivalent: refused with an explanatory error in the chat). Comparison is constant-time.
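The check described above might look like the following sketch, which uses the standard library's hmac.compare_digest for the constant-time comparison. The function name and signature are assumptions; only the rules (empty secret disables admin, header preferred over body field) come from the text.

```python
import hmac


def is_admin(configured: str, header_token=None, body_token=None) -> bool:
    """True only if a non-empty configured secret matches the presented token."""
    if not configured:
        return False  # empty AGENT_ADMIN_TOKEN disables all admin commands
    presented = header_token or body_token or ""  # header takes precedence
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(presented.encode(), configured.encode())


print(is_admin("s3cret", header_token="s3cret"))
print(is_admin("", header_token=""))
```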
In the widget, set the attribute only on admin-facing pages:
<certmate-agent endpoint="…" admin-token="MY_SECRET"></certmate-agent>

Optional. Off by default (stateless: client passes history each turn).
Set AGENT_PERSIST_CONVERSATIONS=true to:
- mount /conversations/<session_id> (GET to fetch, DELETE to clear)
- store user + assistant messages in sqlite, keyed by session_id
- on each turn the server loads history server-side and ignores the client's history field; this survives page reloads, multi-tab use, and works when the agent is behind a load balancer
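The server-side history described above can be sketched with the standard sqlite3 module. The table name conversation_messages appears elsewhere in this README; the column layout and function names are assumptions made for illustration.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the messages table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversation_messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT NOT NULL, "
        "role TEXT NOT NULL, content TEXT NOT NULL, "
        "created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def append(conn, session_id, role, content):
    """Store one message for a session."""
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO conversation_messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def history(conn, session_id):
    """Return the session's messages in insertion order (the GET handler's job)."""
    rows = conn.execute(
        "SELECT role, content FROM conversation_messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [{"role": role, "content": content} for role, content in rows]


def clear(conn, session_id):
    """Delete a session's messages (the DELETE handler's job)."""
    with conn:
        conn.execute("DELETE FROM conversation_messages WHERE session_id = ?", (session_id,))


conn = open_store()
append(conn, "s1", "user", "list my certs")
append(conn, "s1", "assistant", "You have 3 certificates.")
```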
In the widget, opt in with the persist attribute (generates a per-host
session_id in localStorage and adds a "New session" button):
<certmate-agent endpoint="…" persist></certmate-agent>

A single asyncio task runs every AGENT_CLEANUP_INTERVAL_SECONDS
(default 1 hour) and prunes:
- expired pending_actions (always, regardless of persistence)
- conversation_messages older than AGENT_CONVERSATION_TTL_DAYS (only when AGENT_PERSIST_CONVERSATIONS=true)
Set AGENT_CLEANUP_INTERVAL_SECONDS=0 to disable. A pass runs on boot
so a long-stopped instance cleans backlog before serving traffic.
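A periodic task with an immediate first pass can be written as below. This is a sketch under assumptions: the function names are hypothetical, prune stands in for the real pruning queries, and it assumes that an interval of 0 also skips the boot pass, which the text does not state.

```python
import asyncio


async def cleanup_loop(interval_seconds: float, prune, stop: asyncio.Event) -> None:
    """Run prune() once immediately, then every interval_seconds until stop is set."""
    if interval_seconds <= 0:
        return  # interval 0 disables the task (assumed to include the boot pass)
    while not stop.is_set():
        prune()
        try:
            # Sleep for the interval, but wake up early on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval_seconds)
        except asyncio.TimeoutError:
            pass


async def demo() -> int:
    """Run three passes with a tiny interval, then stop."""
    stop, passes = asyncio.Event(), []

    def prune():
        passes.append(1)
        if len(passes) == 3:
            stop.set()

    await cleanup_loop(0.01, prune, stop)
    return len(passes)
```

Waiting on the stop event instead of calling asyncio.sleep lets the task exit promptly at shutdown rather than after the full interval.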
Set OPENROUTER_API_KEY to enable a fallback chat provider. The agent
tries the primary LM Studio first and falls back to OpenRouter only when
the primary errors out (connection, timeout, 5xx). A small circuit
breaker trips the primary after LLM_PRIMARY_FAILURE_THRESHOLD
consecutive failures and keeps it tripped for
LLM_PRIMARY_COOLDOWN_SECONDS before retrying.
Embeddings always stay on the primary (only LM Studio runs the embedding
model). The widget receives an extra status event "served via
openrouter" when the fallback handled the turn, so you can see it in the
chat log.
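The fallback and circuit-breaker behaviour can be illustrated with a small sketch. Class and function names are hypothetical, the default threshold and cooldown are placeholder values, and only ConnectionError and TimeoutError are treated as primary failures here; a real client would also map HTTP 5xx responses to a failure.

```python
import time


class CircuitBreaker:
    """Trips after N consecutive failures and stays tripped for a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_seconds=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # cf. LLM_PRIMARY_FAILURE_THRESHOLD
        self.cooldown_seconds = cooldown_seconds    # cf. LLM_PRIMARY_COOLDOWN_SECONDS
        self.clock = clock
        self.failures = 0
        self.tripped_at = None

    def allow_primary(self) -> bool:
        if self.tripped_at is None:
            return True
        if self.clock() - self.tripped_at >= self.cooldown_seconds:
            self.tripped_at = None  # cooldown over: retry the primary
            self.failures = 0
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.tripped_at = self.clock()


def chat(breaker, primary, fallback, prompt):
    """Return (provider, reply); use the fallback only if the primary is tripped or errors."""
    if breaker.allow_primary():
        try:
            reply = primary(prompt)
            breaker.record(True)
            return "primary", reply
        except (ConnectionError, TimeoutError):
            breaker.record(False)
    return "fallback", fallback(prompt)
```

Injecting the clock keeps the breaker testable without real waiting.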
Pre-built workflows + manifest ship the public docs assistant
(agent.certmate.org-style) end-to-end:
The rebuild-docs-index workflow runs every Monday 06:00 UTC and on manual trigger. It pulls
fabriziosalmi/certmate@main, chunks + embeds the docs, and publishes the
resulting pickle as an index-latest GitHub Release.
Required repo secrets (Settings → Secrets and variables → Actions):
| Secret | Example value |
|---|---|
| INDEX_EMBED_URL | https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/ai/v1 |
| INDEX_EMBED_API_KEY | Cloudflare API token (Workers AI scope) |
| INDEX_EMBED_MODEL | @cf/baai/bge-base-en-v1.5 (768-dim) |
OpenAI works too:
| Secret | Example value |
|---|---|
| INDEX_EMBED_URL | https://api.openai.com/v1 |
| INDEX_EMBED_API_KEY | sk-... |
| INDEX_EMBED_MODEL | text-embedding-3-small |
Important: the same embedding model must be configured on the runtime that serves the index. The store warns loudly if they don't match.
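A mismatch guard of this kind could be as simple as the sketch below. It assumes the index records the name of the embedding model it was built with, which this README does not confirm; the function name and message are hypothetical.

```python
import warnings


def check_embed_model(index_model: str, runtime_model: str) -> bool:
    """Warn and return False when the index and the runtime use different models."""
    if index_model != runtime_model:
        warnings.warn(
            f"index built with {index_model!r} but runtime embeds with "
            f"{runtime_model!r}; similarity scores will be meaningless"
        )
        return False
    return True


print(check_embed_model("text-embedding-3-small", "text-embedding-3-small"))
```

Vectors from different embedding models live in unrelated spaces (and often have different dimensions), so ranking across them gives arbitrary results rather than an error.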
The Fly app runs AGENT_MODE=docs_only, fetches the published index on
cold start via AGENT_INDEX_BOOTSTRAP_URL, and uses Cloudflare Workers
AI (or OpenRouter) for chat. Setup once:
fly apps create certmate-agent # pick any name
# Edit fly.toml: replace REPLACE_ME in LMSTUDIO_URL with your CF account id
fly secrets set LMSTUDIO_API_KEY=<token> # CF Workers AI token
fly volumes create certmate_agent_data --size 1 --region fra
fly deploy # one-shot bootstrap

Get a deploy token for GitHub:
fly tokens create deploy -x 99999h
# paste into repo secret FLY_API_TOKEN

After that, the deploy workflow runs automatically on:
- push to main (rebuilds the image)
- successful rebuild-docs-index (just restarts machines so they re-bootstrap the index, no image rebuild)
- manual dispatch
Point your DNS:
agent.certmate.org → CNAME → <app-name>.fly.dev
Fly handles TLS issuance + renewal — fitting for the CertMate ecosystem.
| Layer | Updates when | Cost |
|---|---|---|
| Docker image | code/config changes (rare) | 0 |
| Index artifact | weekly cron or doc push | $0.001/run on OpenAI; free on CF WAI |
| Fly machine | code change or index refresh | scale-to-zero, ~free under low traffic |
docker/Dockerfile + docker/docker-compose.example.yml provided.
sqlite db + RAG index live under /data (volume). Build the index from
inside the container the first time:
docker compose -f docker/docker-compose.example.yml up -d
docker compose -f docker/docker-compose.example.yml exec certmate-agent \
  python -m agent.rag.indexer

After that you can rebuild via the /reindex admin command from the widget
(no shell access needed) — hot-swap, no restart.
