feat(platform): track and alert on response-time SLAs (#1924) #1939
base: main
```diff
@@ -19,14 +19,15 @@ Tale does not ship a log shipper. The driver swap is the supported integration p
 
 ## Metrics
 
-The Caddy proxy exposes two metrics paths gated by a single bearer token:
+The Caddy proxy exposes three metrics paths gated by a single bearer token:
 
-| Path | Source | What's inside |
-| ------------------- | --------------- | ----------------------------------------------------------- |
-| `/metrics/platform` | `tale-platform` | HTTP latency, route counters, Node process metrics |
-| `/metrics/convex` | `tale-convex` | 261 built-in Convex metrics, plus the RAG and crawl timings |
+| Path | Source | What's inside |
+| -------------------- | --------------- | ----------------------------------------------------------------------------------- |
+| `/metrics/platform` | `tale-platform` | HTTP latency, route counters, Node process metrics, response-time SLA target gauges |
+| `/metrics/convex` | `tale-convex` | 261 built-in Convex metrics, plus the RAG and crawl timings |
+| `/metrics/sla-rules` | `tale-platform` | Generated Prometheus recording + alerting rules for the response-time SLAs |
 
-Knowledge work (RAG search, document ingestion, web crawling) runs inside the Convex backend now, so its timings ride the `/metrics/convex` series rather than a separate endpoint. Set `METRICS_BEARER_TOKEN` in `.env` to enable the two endpoints; leave it unset to keep them returning 401 to every request. Anything other than the two listed paths returns 401 too, so a misrouted scraper does not accidentally see the platform's internal health endpoints.
+Knowledge work (RAG search, document ingestion, web crawling) runs inside the Convex backend now, so its timings ride the `/metrics/convex` series rather than a separate endpoint. Set `METRICS_BEARER_TOKEN` in `.env` to enable these endpoints; leave it unset to keep them returning 401 to every request. The `/metrics/sla-rules` path is a read-only YAML rules file you load into Prometheus, not a scrape target — the thresholds it carries are documented in [Operations](/self-hosted/operate/observability/operations). Anything other than the listed paths returns 401 too, so a misrouted scraper does not accidentally see the platform's internal health endpoints.
```
Contributor
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Tighten the 401 scope here. The Caddy proxy only returns 401 for unknown `/metrics/*` URLs; non-metrics routes still use their normal handlers.
```diff
 
 A working Prometheus scrape stanza:
 
```
```diff
@@ -19,14 +19,15 @@ Tale ne ship pas de log shipper. L'échange de driver est le point d'intégratio
 
 ## Métriques
 
-Le proxy Caddy expose deux chemins de métriques derrière un seul bearer token :
+Le proxy Caddy expose trois chemins de métriques derrière un seul bearer token :
 
-| Chemin | Source | Ce qui est dedans |
-| ------------------- | --------------- | ---------------------------------------------------------------- |
-| `/metrics/platform` | `tale-platform` | Latence HTTP, compteurs de routes, métriques de processus Node |
-| `/metrics/convex` | `tale-convex` | 261 métriques Convex intégrées, plus les timings RAG et de crawl |
+| Chemin | Source | Ce qui est dedans |
+| -------------------- | --------------- | ------------------------------------------------------------------------------------------------------- |
+| `/metrics/platform` | `tale-platform` | Latence HTTP, compteurs de routes, métriques de processus Node, gauges de cible SLA de temps de réponse |
+| `/metrics/convex` | `tale-convex` | 261 métriques Convex intégrées, plus les timings RAG et de crawl |
+| `/metrics/sla-rules` | `tale-platform` | Rules Prometheus de recording + alerting générées pour les SLA de temps de réponse |
 
-Le travail de connaissances (recherche RAG, ingestion de documents, crawling web) tourne désormais dans le backend Convex, donc ses timings empruntent la série `/metrics/convex` plutôt qu'un endpoint séparé. Mets `METRICS_BEARER_TOKEN` dans `.env` pour activer les deux endpoints ; laisse-le non défini pour qu'ils retournent 401 à chaque requête. Tout sauf les deux chemins listés retourne aussi 401, donc un scraper mal routé ne voit pas accidentellement les endpoints de santé internes de la plateforme.
+Le travail de connaissances (recherche RAG, ingestion de documents, crawling web) tourne désormais dans le backend Convex, donc ses timings empruntent la série `/metrics/convex` plutôt qu'un endpoint séparé. Mets `METRICS_BEARER_TOKEN` dans `.env` pour activer ces endpoints ; laisse-le non défini pour qu'ils retournent 401 à chaque requête. Le chemin `/metrics/sla-rules` est un fichier YAML de rules en lecture seule que tu charges dans Prometheus, pas une cible de scrape — les seuils qu'il porte sont documentés dans [Opérations](/fr/self-hosted/operate/observability/operations). Tout sauf les chemins listés retourne aussi 401, donc un scraper mal routé ne voit pas accidentellement les endpoints de santé internes de la plateforme.
```
Contributor
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Tighten the 401 scope here. The Caddy proxy only returns 401 for unknown `/metrics/*` URLs; non-metrics routes still use their normal handlers.
🧰 Tools: 🪛 LanguageTool [typographical] ~30-~30: Il manque une espace après le point ("A space is missing after the period"). (ESPACE_APRES_POINT)
```diff
 
 Une stanza de scrape Prometheus qui marche :
 
```
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Tighten the 401 scope here.
The Caddy proxy only returns 401 for unknown `/metrics/*` URLs; non-metrics routes still use their normal handlers. Please narrow this wording so it doesn't read like a site-wide auth rule. Based on the Caddy matcher in `services/proxy/Caddyfile`, this only applies to `/metrics` paths.
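
One way to check the scope described in this comment against a running deployment is to probe a few paths with and without the bearer token and compare status codes. The following is a minimal sketch, not part of the PR: the helper names `probe` and `probe_all`, the base URL, and the `/metrics/unknown` path are hypothetical, and it assumes the proxy accepts the token as a standard `Authorization: Bearer` header.

```python
import urllib.error
import urllib.request
from typing import Dict, Iterable, Optional


def probe(base_url: str, path: str, token: Optional[str] = None,
          timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to base_url + path."""
    request = urllib.request.Request(base_url.rstrip("/") + path)
    if token is not None:
        # Assumption: the proxy expects a standard Bearer Authorization header.
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is the result we want.
        return error.code


def probe_all(base_url: str, paths: Iterable[str],
              token: Optional[str] = None) -> Dict[str, int]:
    """Probe each path and return a mapping of path to status code."""
    return {path: probe(base_url, path, token) for path in paths}


# Example usage (hypothetical host; run against your own deployment):
#   statuses = probe_all("https://tale.example.com",
#                        ["/metrics/platform", "/metrics/unknown", "/"])
#   for path, status in statuses.items():
#       print(path, status)
```

If the reviewer's reading of the Caddy matcher holds, the unauthenticated `/metrics/...` probes should come back 401, while a non-metrics path such as `/` should return whatever its normal handler produces.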