13 changes: 7 additions & 6 deletions docs/de/self-hosted/configuration/observability-config.md
@@ -19,14 +19,15 @@ Tale bringt keinen Log-Shipper mit. Der Driver-Tausch ist der unterstützte Inte

## Metriken

Der Caddy-Proxy exponiert zwei Metric-Pfade, gegated von einem einzigen Bearer-Token:
Der Caddy-Proxy exponiert drei Metric-Pfade, gegated von einem einzigen Bearer-Token:

| Pfad | Quelle | Was drinsteckt |
| ------------------- | --------------- | --------------------------------------------------------------- |
| `/metrics/platform` | `tale-platform` | HTTP-Latenz, Route-Counter, Node-Prozessmetriken |
| `/metrics/convex` | `tale-convex` | 261 eingebaute Convex-Metriken, plus die RAG- und Crawl-Timings |
| Pfad | Quelle | Was drinsteckt |
| -------------------- | --------------- | ----------------------------------------------------------------------------- |
| `/metrics/platform` | `tale-platform` | HTTP-Latenz, Route-Counter, Node-Prozessmetriken, Antwortzeit-SLA-Ziel-Gauges |
| `/metrics/convex` | `tale-convex` | 261 eingebaute Convex-Metriken, plus die RAG- und Crawl-Timings |
| `/metrics/sla-rules` | `tale-platform` | Generierte Prometheus-Recording- + Alerting-Rules für die Antwortzeit-SLAs |

Wissens-Arbeit (RAG-Suche, Dokument-Ingestion, Web-Crawling) läuft jetzt im Convex-Backend, also reiten ihre Timings auf der `/metrics/convex`-Reihe statt auf einem separaten Endpoint. Setze `METRICS_BEARER_TOKEN` in `.env`, um die zwei Endpoints zu aktivieren; lass es unset, damit sie jeder Anfrage 401 zurückgeben. Alles ausser den zwei gelisteten Pfaden gibt ebenfalls 401 zurück, damit ein fehlgerouteter Scraper die internen Health-Endpoints der Plattform nicht versehentlich sieht.
Wissens-Arbeit (RAG-Suche, Dokument-Ingestion, Web-Crawling) läuft jetzt im Convex-Backend, also reiten ihre Timings auf der `/metrics/convex`-Reihe statt auf einem separaten Endpoint. Setze `METRICS_BEARER_TOKEN` in `.env`, um diese Endpoints zu aktivieren; lass es unset, damit sie jeder Anfrage 401 zurückgeben. Der `/metrics/sla-rules`-Pfad ist eine schreibgeschützte YAML-Rules-Datei, die du in Prometheus lädst, kein Scrape-Target — die Schwellen darin sind in [Operations](/de/self-hosted/operate/observability/operations) dokumentiert. Alles ausser den gelisteten Pfaden gibt ebenfalls 401 zurück, damit ein fehlgerouteter Scraper die internen Health-Endpoints der Plattform nicht versehentlich sieht.


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Tighten the 401 scope here.

The Caddy proxy only returns 401 for unknown /metrics/* URLs; non-metrics routes still use their normal handlers. Please narrow this wording so it doesn't read like a site-wide auth rule. Based on the Caddy matcher in services/proxy/Caddyfile, this only applies to /metrics paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/de/self-hosted/configuration/observability-config.md` at line 30, the
statement about 401 responses is too broad and implies a site-wide
authentication rule. Clarify the wording of the sentence starting with "Alles
ausser den gelisteten Pfaden gibt ebenfalls 401 zurück" to specifically indicate
that this 401 behavior applies only to unknown `/metrics/*` paths based on the
Caddy proxy matcher in services/proxy/Caddyfile, not to all non-metrics routes.
Ensure the documentation makes clear that non-metrics routes continue to use
their normal handlers without being affected by this authentication rule.


Eine funktionierende Prometheus-Scrape-Stanza:

47 changes: 47 additions & 0 deletions docs/de/self-hosted/operate/observability/operations.md
@@ -49,6 +49,53 @@ Wenn eine Page landet, folgen die ersten fünf Minuten jedes Mal derselben Form.

Ein `tale-knowledge-db`-Ausfall ist ein warn, kein page. Der Web-Crawl-Plan absorbiert Stunden von Downtime ohne Benutzerwirkung, und die Dokument-Ingestion versucht es erneut, statt Arbeit zu verwerfen — Uploads sitzen in „indexing", bis die Korpus-Datenbank zurück ist. Die Wissens-Suche liefert in der Zwischenzeit leer, aber Chats, die kein Wissen abrufen, arbeiten weiter. Fang das im warn-Band und fix es zu Geschäftszeiten.

## Antwortzeit-SLAs

Zwei Antwortzeit-Budgets werden als erstklassige Signale verfolgt: interaktive Dialog-Eingabe und langlaufende Operationen wie Evaluierungen. Beide werden als **Mittelwert** über ein gleitendes Fenster verifiziert — die vertragliche Zahl ist ein Durchschnitt, keine Obergrenze pro Anfrage — und beide sind so verdrahtet, dass Prometheus alarmiert, sobald der Durchschnitt über das Budget driftet.

| Budget | Statistik | Ziel | Fenster | Zugrundeliegende Serie |
| --------------- | ---------- | ----- | ------- | ----------------------------- |
| Dialog-Eingabe | Mittelwert | ~1 s | 30 Min | `tale_dialog_ttft_seconds` |
| Lange Operation | Mittelwert | ~40 s | 6 Std | `tale_long_operation_seconds` |

Jedes Ziel reitet zudem auf dem Plattform-Metrik-Endpoint als `tale_sla_target_seconds{sla,statistic}`, sodass ein Grafana-Panel die Budget-Linie direkt aus Prometheus zeichnet, statt sie fest zu verdrahten. Die zugrundeliegenden Latenz-Serien sind die Convex-Funktions-Ausführungs-Histogramme auf `/metrics/convex`; relabel oder record sie auf die Namen oben, damit die Rules auflösen. Die Plattform liefert die fertigen Recording- und Alerting-Rules unter `/metrics/sla-rules` (hinter demselben Bearer-Token wie die anderen Metrik-Pfade) — hole sie einmal und referenziere die Datei unter `rule_files:`, oder füge das Äquivalent ein:

```yaml
groups:
  # Recording rules: mean latency = rate of the histogram sum / rate of the histogram count.
  - name: tale-sla-recording
    rules:
      - record: tale_sla_dialog_ttft:mean30m
        expr: rate(tale_dialog_ttft_seconds_sum[30m]) / rate(tale_dialog_ttft_seconds_count[30m])
        labels:
          sla: dialog_ttft
      - record: tale_sla_long_operation:mean6h
        expr: rate(tale_long_operation_seconds_sum[6h]) / rate(tale_long_operation_seconds_count[6h])
        labels:
          sla: long_operation
  # Alerts: fire only after the mean has stayed over budget for the whole `for:` duration.
  - name: tale-sla-alerts
    rules:
      - alert: TaleSlaDialogTtftBreached
        expr: tale_sla_dialog_ttft:mean30m > 1
        for: 15m
        labels:
          severity: warn
          sla: dialog_ttft
        annotations:
          summary: 'Dialog input response time: mean response time over 30m exceeds the 1s SLA'
          description: Mean time-to-first-token for an interactive chat / dialog turn.
      - alert: TaleSlaLongOperationBreached
        expr: tale_sla_long_operation:mean6h > 40
        for: 30m
        labels:
          severity: warn
          sla: long_operation
        annotations:
          summary: 'Long operation response time: mean response time over 6h exceeds the 40s SLA'
          description: Mean end-to-end time for long-running operations such as evaluations.
```

Ein Breach hier ist ein **warn**, kein page: ein driftender Durchschnitt ist eine Degradation, die zu Geschäftszeiten zu verfolgen ist, und die `for:`-Fenster warten bewusst eine kurze Spitze aus, bevor sie feuern. Das ~1-s-Dialog-Budget versöhnt sich mit dem lockereren ~3-s-Warm-Time-to-First-Token im manuellen Performance-Plan — jene ~3 s sind eine Obergrenze pro Anfrage für ein einzelnes kaltes, Auto-geroutetes erstes Token inklusive Modell- und Netzwerk-Zeit, während die ~1 s hier der Steady-State-Mittelwert über Dialog-Turns ist, sodass gelegentliche erste Tokens, die die Obergrenze erreichen, mit einem Sub-Sekunden-Mittelwert vereinbar sind. Den 1-s-Mittelwert auf Live-Anbietern zu halten, kann noch die Backend-Overhead-Optimierung brauchen, die im Feature-Issue verfolgt wird; dieser Alert bestätigt, ob das Ziel erreicht ist.

## Wo das hingehört

Die Signale oben sind die proaktive Seite des Betreibens einer Tale-Instanz; die reaktive Seite ist [Troubleshooting](/de/self-hosted/operate/observability/troubleshooting), und die Konfiguration, die die Metriken in Prometheus bekommt, ist [Observability-Konfiguration](/de/self-hosted/configuration/observability-config). Hast du `METRICS_BEARER_TOKEN` noch nicht gesetzt, ist jede Schwelle oben unbeobachtet — fang dort an.
13 changes: 7 additions & 6 deletions docs/en/self-hosted/configuration/observability-config.md
@@ -19,14 +19,15 @@ Tale does not ship a log shipper. The driver swap is the supported integration p

## Metrics

The Caddy proxy exposes two metrics paths gated by a single bearer token:
The Caddy proxy exposes three metrics paths gated by a single bearer token:

| Path | Source | What's inside |
| ------------------- | --------------- | ----------------------------------------------------------- |
| `/metrics/platform` | `tale-platform` | HTTP latency, route counters, Node process metrics |
| `/metrics/convex` | `tale-convex` | 261 built-in Convex metrics, plus the RAG and crawl timings |
| Path | Source | What's inside |
| -------------------- | --------------- | ----------------------------------------------------------------------------------- |
| `/metrics/platform` | `tale-platform` | HTTP latency, route counters, Node process metrics, response-time SLA target gauges |
| `/metrics/convex` | `tale-convex` | 261 built-in Convex metrics, plus the RAG and crawl timings |
| `/metrics/sla-rules` | `tale-platform` | Generated Prometheus recording + alerting rules for the response-time SLAs |

Knowledge work (RAG search, document ingestion, web crawling) runs inside the Convex backend now, so its timings ride the `/metrics/convex` series rather than a separate endpoint. Set `METRICS_BEARER_TOKEN` in `.env` to enable the two endpoints; leave it unset to keep them returning 401 to every request. Anything other than the two listed paths returns 401 too, so a misrouted scraper does not accidentally see the platform's internal health endpoints.
Knowledge work (RAG search, document ingestion, web crawling) runs inside the Convex backend now, so its timings ride the `/metrics/convex` series rather than a separate endpoint. Set `METRICS_BEARER_TOKEN` in `.env` to enable these endpoints; leave it unset to keep them returning 401 to every request. The `/metrics/sla-rules` path is a read-only YAML rules file you load into Prometheus, not a scrape target — the thresholds it carries are documented in [Operations](/self-hosted/operate/observability/operations). Anything other than the listed paths returns 401 too, so a misrouted scraper does not accidentally see the platform's internal health endpoints.


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Tighten the 401 scope here.

The Caddy proxy only returns 401 for unknown /metrics/* URLs; non-metrics routes still use their normal handlers. Please narrow this wording so it doesn't read like a site-wide auth rule. Based on the Caddy matcher in services/proxy/Caddyfile, this only applies to /metrics paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/en/self-hosted/configuration/observability-config.md` at line 30, the
documentation statement about returning 401 for unlisted paths is too broad and
misleading. Clarify that the 401 response only applies to unknown paths under
the `/metrics/*` endpoint, not to non-metrics routes across the entire site.
Revise the sentence that currently reads "Anything other than the listed paths
returns 401 too..." to explicitly specify that this behavior is limited to
`/metrics` paths, and that non-metrics routes continue to use their normal
handlers as determined by the Caddy proxy matcher.


A working Prometheus scrape stanza:

47 changes: 47 additions & 0 deletions docs/en/self-hosted/operate/observability/operations.md
@@ -49,6 +49,53 @@ When a page lands, the first five minutes follow the same shape every time.

A `tale-knowledge-db` outage is a warn, not a page. The web-crawl schedule absorbs hours of downtime without user impact, and document ingestion retries rather than dropping work — uploads sit in "indexing" until the corpus database is back. Knowledge search returns empty in the meantime, but chats that do not retrieve knowledge keep working. Catch this in the warn band and fix it in business hours.

## Response-time SLAs

Two response-time budgets are tracked as first-class signals: interactive dialog input and long-running operations such as evaluations. Both are verified as a **mean** over a rolling window — the contractual figure is an average, not a per-request ceiling — and both are wired so Prometheus alerts the moment the average drifts past budget.

| Budget | Statistic | Target | Window | Underlying series |
| -------------- | --------- | ------ | ------ | ----------------------------- |
| Dialog input | mean | ~1 s | 30 m | `tale_dialog_ttft_seconds` |
| Long operation | mean | ~40 s | 6 h | `tale_long_operation_seconds` |

Each target also rides the platform metrics endpoint as `tale_sla_target_seconds{sla,statistic}`, so a Grafana panel draws the budget line straight from Prometheus instead of hard-coding it. The underlying latency series are the Convex function-execution histograms on `/metrics/convex`; relabel or record them to the names above so the rules resolve. The platform serves the ready-made recording and alerting rules at `/metrics/sla-rules` (behind the same bearer token as the other metrics paths) — fetch it once and reference the file under `rule_files:`, or paste the equivalent:

```yaml
groups:
  # Recording rules: mean latency = rate of the histogram sum / rate of the histogram count.
  - name: tale-sla-recording
    rules:
      - record: tale_sla_dialog_ttft:mean30m
        expr: rate(tale_dialog_ttft_seconds_sum[30m]) / rate(tale_dialog_ttft_seconds_count[30m])
        labels:
          sla: dialog_ttft
      - record: tale_sla_long_operation:mean6h
        expr: rate(tale_long_operation_seconds_sum[6h]) / rate(tale_long_operation_seconds_count[6h])
        labels:
          sla: long_operation
  # Alerts: fire only after the mean has stayed over budget for the whole `for:` duration.
  - name: tale-sla-alerts
    rules:
      - alert: TaleSlaDialogTtftBreached
        expr: tale_sla_dialog_ttft:mean30m > 1
        for: 15m
        labels:
          severity: warn
          sla: dialog_ttft
        annotations:
          summary: 'Dialog input response time: mean response time over 30m exceeds the 1s SLA'
          description: Mean time-to-first-token for an interactive chat / dialog turn.
      - alert: TaleSlaLongOperationBreached
        expr: tale_sla_long_operation:mean6h > 40
        for: 30m
        labels:
          severity: warn
          sla: long_operation
        annotations:
          summary: 'Long operation response time: mean response time over 6h exceeds the 40s SLA'
          description: Mean end-to-end time for long-running operations such as evaluations.
```
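
For the fetch-once route, the saved file only needs to be listed in the Prometheus configuration. A minimal sketch, assuming the rules were downloaded to a hypothetical path `/etc/prometheus/rules/tale-sla.yml` (the path and file name are illustrative, not something Tale prescribes):

```yaml
# prometheus.yml (fragment)
# Assumption: the YAML served at /metrics/sla-rules was saved to this
# hypothetical location; adjust the path to your own layout.
rule_files:
  - /etc/prometheus/rules/tale-sla.yml
```

Prometheus reads rule files at startup and on a configuration reload, so picking up regenerated rules means fetching the file again. Running `promtool check rules` against the saved file before reloading is a cheap way to catch a truncated or malformed download.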

A breach here is a **warn**, not a page: a drifting average is a degradation to chase in business hours, and the `for:` windows deliberately wait out a short spike before firing. The ~1 s dialog budget reconciles with the looser ~3 s warm time-to-first-token in the manual performance plan — that ~3 s is a per-request ceiling for a single cold, Auto-routed first token including model and network time, whereas the ~1 s here is the steady-state mean across dialog turns, so occasional first tokens reaching the ceiling are consistent with a sub-second mean. Holding the 1 s mean on live providers may still need the backend-overhead optimization tracked on the feature issue; this alert is what confirms whether the target is met.
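
If alerts flow through Alertmanager, the `severity: warn` label set by the SLA alert rules is what can keep these breaches off the pager. A sketch under that assumption; the receiver names `oncall-pager` and `business-hours-queue` are hypothetical, and the notification integrations are deliberately left out:

```yaml
# alertmanager.yml (fragment)
route:
  receiver: oncall-pager              # hypothetical default receiver for pages
  routes:
    - matchers:
        - severity="warn"             # label set by the SLA alert rules
      receiver: business-hours-queue  # hypothetical non-paging receiver
receivers:
  - name: oncall-pager                # attach a paging integration here
  - name: business-hours-queue        # attach a chat or ticket integration here
```

Matching on the label rather than on alert names means any further `warn`-level rule lands in the same queue without touching the routing tree.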

## Where this fits

The signals above are the proactive side of operating a Tale instance; the reactive side is [Troubleshooting](/self-hosted/operate/observability/troubleshooting), and the configuration that gets the metrics into Prometheus is [Observability config](/self-hosted/configuration/observability-config). If you have not yet set `METRICS_BEARER_TOKEN`, every threshold above is unmonitored — start there.
13 changes: 7 additions & 6 deletions docs/fr/self-hosted/configuration/observability-config.md
@@ -19,14 +19,15 @@ Tale ne ship pas de log shipper. L'échange de driver est le point d'intégratio

## Métriques

Le proxy Caddy expose deux chemins de métriques derrière un seul bearer token :
Le proxy Caddy expose trois chemins de métriques derrière un seul bearer token :

| Chemin | Source | Ce qui est dedans |
| ------------------- | --------------- | ---------------------------------------------------------------- |
| `/metrics/platform` | `tale-platform` | Latence HTTP, compteurs de routes, métriques de processus Node |
| `/metrics/convex` | `tale-convex` | 261 métriques Convex intégrées, plus les timings RAG et de crawl |
| Chemin | Source | Ce qui est dedans |
| -------------------- | --------------- | ------------------------------------------------------------------------------------------------------- |
| `/metrics/platform` | `tale-platform` | Latence HTTP, compteurs de routes, métriques de processus Node, gauges de cible SLA de temps de réponse |
| `/metrics/convex` | `tale-convex` | 261 métriques Convex intégrées, plus les timings RAG et de crawl |
| `/metrics/sla-rules` | `tale-platform` | Rules Prometheus de recording + alerting générées pour les SLA de temps de réponse |

Le travail de connaissances (recherche RAG, ingestion de documents, crawling web) tourne désormais dans le backend Convex, donc ses timings empruntent la série `/metrics/convex` plutôt qu'un endpoint séparé. Mets `METRICS_BEARER_TOKEN` dans `.env` pour activer les deux endpoints ; laisse-le non défini pour qu'ils retournent 401 à chaque requête. Tout sauf les deux chemins listés retourne aussi 401, donc un scraper mal routé ne voit pas accidentellement les endpoints de santé internes de la plateforme.
Le travail de connaissances (recherche RAG, ingestion de documents, crawling web) tourne désormais dans le backend Convex, donc ses timings empruntent la série `/metrics/convex` plutôt qu'un endpoint séparé. Mets `METRICS_BEARER_TOKEN` dans `.env` pour activer ces endpoints ; laisse-le non défini pour qu'ils retournent 401 à chaque requête. Le chemin `/metrics/sla-rules` est un fichier YAML de rules en lecture seule que tu charges dans Prometheus, pas une cible de scrape — les seuils qu'il porte sont documentés dans [Opérations](/fr/self-hosted/operate/observability/operations). Tout sauf les chemins listés retourne aussi 401, donc un scraper mal routé ne voit pas accidentellement les endpoints de santé internes de la plateforme.


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Tighten the 401 scope here.

The Caddy proxy only returns 401 for unknown /metrics/* URLs; non-metrics routes still use their normal handlers. Please narrow this wording so it doesn't read like a site-wide auth rule. Based on the Caddy matcher in services/proxy/Caddyfile, this only applies to /metrics paths.

🧰 Tools
🪛 LanguageTool

[typographical] ~30-~30: Il manque une espace après le point.
Context: ...éparé. Mets METRICS_BEARER_TOKEN dans .env pour activer ces endpoints ; laisse-le...

(ESPACE_APRES_POINT)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/fr/self-hosted/configuration/observability-config.md` at line 30, the
current documentation states that all paths except those listed return 401,
which reads like a site-wide authentication rule. Revise this sentence to
clarify that 401 responses apply only to `/metrics` paths, not the entire site.
Tighten the wording to explicitly state that unknown `/metrics/*` URLs return
401 while non-metrics routes continue to use their normal handlers, making it
clear this auth restriction is scoped only to the metrics endpoints as
implemented in the Caddy proxy configuration.


Une stanza de scrape Prometheus qui marche :
