Miccolomi/mongot-doctor

🇮🇹 Documentazione in italiano disponibile: README-it.md

🔬 MongoDB Search Diagnostics

Before: raw Prometheus metrics, scattered kubectl logs, opaque index definitions, no clear picture of what's wrong. After: one dashboard, one health status, one list of actions.

mongot-doctor turns complex MongoDB Search cluster data into an instant diagnosis. It is designed for SREs, MongoDB operators, and platform engineers running MongoDB Search on Kubernetes.


What does it do?

  • Detects stuck search nodes, indexing lag, OOMKilled events, and configuration drift
  • Analyzes search query efficiency, scan ratios, and HNSW graph traversal in real time
  • Alerts you before problems become outages — predictive oplog window, cardinality warnings, stall detection
  • Built-in SRE Advisor runs 15 automated checks every collection cycle and ranks findings by severity
  • Automatic Search Diagnosis interprets cluster health instantly — Health Summary, Warnings, Recommendations in one panel
  • Log Intelligence parses mongot JSON logs automatically and detects errors, failures, and connection issues across configurable time windows
  • Search Index Inspector analyzes every Search index definition — mapping quality, field count, dynamic mapping overuse, and index health — with actionable suggestions
  • Status Report exports a full cluster snapshot in Text, Markdown, or JSON — shareable from the dashboard or from the CLI, ready for tickets, runbooks, and automation

No agents to install. No extra infrastructure. Just point it at your cluster and go.


Tip

Get a full diagnostic in one command

python3 mongot_doctor.py --uri "mongodb://..." --namespace mongodb --report

Prints a complete cluster snapshot — pods, search metrics, JVM heap, Lucene merges, oplog window, SRE findings, and index health — straight to your terminal. Add --format markdown or --format json to export to Confluence, GitHub Issues, or your alerting pipeline.



✨ Key Features

  • 🧠 SRE Advisor: 15 automated checks, severity-ranked (crit → warn → pass), served via /api/advisor (see the deep dive below)
  • 📡 Real-time Search QPS & Latency — delta-based computation across Prometheus scrape cycles, separate for $search and $vectorSearch
  • 🎯 Search Efficiency (Scan Ratio) — EMA-smoothed candidates_examined / results_returned, separate ratio for text and vector search, with cardinality detection
  • 🧬 HNSW Visited Nodes — early warning for ANN CPU saturation before latency becomes visible
  • Index Build ETA — animated progress bar, docs/sec speed, stall detection, dynamic ETA
  • 🔍 Robust Pod Discovery — 4-level hierarchy resilient to MCK upgrades and naming variations
  • 🌊 Sync Pipeline Analyzer — real-time DB → Change Stream → RAM → Lucene pipeline visualization with bottleneck identification
  • ⏱️ Predictive Oplog Window — warn at 40%, crit at 70% window consumed to prevent forced Initial Sync
  • 🩺 Universal K8s Diagnostics — Helm releases, MCK/K8s versions, PVCs, OOMKilled events, live log streaming
  • 📜 Log Management & Export — live terminal, download filtered by time window and severity
  • Background Collector & Rate Engine — daemon thread, < 1ms API response from in-memory cache, counter-reset safe
  • 🔌 Stable Versioned API: /api/v1/search_metrics with a fixed schema, safe for external consumers
  • 🔒 Security — optional Basic Auth, CSP headers, K8s name input validation, configurable CORS
  • 🩻 Automatic Search Diagnosis — real-time cluster health panel: Health Summary / Warnings / Recommendations; also available via /api/diagnose and --diagnose CLI (exit 0/1/2 for CI pipelines)
  • 🪵 Log Intelligence — on-demand mongot JSON log analysis with configurable time window (1h / 24h / 7d / 30d); detects errors, OOM, TLS/auth issues, connection failures, index failures, change stream problems
  • 🔎 Search Index Inspector — inspects every Search index definition: dynamic mapping detection, field count analysis, BUILDING/FAILED status, over-indexed collections; available via /api/indexes/inspect and --inspect-indexes CLI
  • 📋 Status Report — full cluster snapshot in Text (ASCII), Markdown, or JSON; one-click download and copy from the dashboard; --report CLI flag for CI/automation pipelines

🚀 Installation & Setup

Prerequisites: kubectl configured and pointing to your cluster. A MongoDB connection string with read access on local (oplog) and your target collections.

Prometheus required: mongot-doctor reads mongot metrics via Prometheus. Prometheus is not enabled by default — you must explicitly configure it in your Kubernetes operator:

  • MongoDB Enterprise Operator (MCK): enable the spec.prometheus section in your MongoDB resource — Enterprise guide
  • MongoDB Community Operator: enable the spec.prometheus section in your MongoDBCommunity resource — Community guide

Mode 1 — Local (Mac / PC)

Use this mode for development, demos, or when you prefer running the monitor outside the cluster.

1. Clone and install

git clone https://github.com/Miccolomi/mongot-doctor.git
cd mongot-doctor
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Start

python3 mongot_doctor.py \
  --uri "mongodb://USER:PASSWORD@HOST:PORT/admin?replicaSet=RS&authSource=admin&authMechanism=SCRAM-SHA-256" \
  --namespace mongodb \
  --port 5050

Open your browser at: http://localhost:5050

CLI options

| Parameter | Default | Description |
| --- | --- | --- |
| --uri | | MongoDB connection string |
| --namespace | all | Kubernetes namespace to monitor |
| --port | 5050 | HTTP port for the dashboard |
| --interval | 5 | Collection interval in seconds |
| --auth | | Basic Auth, format user:password |
| --in-cluster | false | K8s auth via ServiceAccount (in-cluster only) |
| --host | 0.0.0.0 | Flask binding address |
| --allowed-origins | localhost | CORS allowed origins (space-separated) |

Mode 2 — Kubernetes (in-cluster)

Use this mode for a permanent deployment inside the cluster. The monitor runs as a pod and uses a ServiceAccount with RBAC to access the Kubernetes API.

1. Build the Docker image

docker build -t mongot-doctor:latest .

For a private registry (Docker Hub, ECR, GCR):

docker build -t <your-registry>/mongot-doctor:1.0.0 .
docker push <your-registry>/mongot-doctor:1.0.0

Update the image: field in k8s/deployment.yaml accordingly.

⚠️ Important: after every code update, rebuild and restart the deployment:

docker build -t mongot-doctor:latest .
kubectl rollout restart deployment/mongot-doctor -n mongodb

2. Configure the MongoDB URI

The connection to mongod is required for oplog, index, and compliance checks. mongot is always discovered automatically via Kubernetes — no URI needed for it.

Edit k8s/secret.yaml based on where your mongod is running:

# Scenario A: mongod inside the cluster (MCK). Look up the internal Service DNS name first:
kubectl get svc -n mongodb   # look for a ClusterIP on port 27017

# Scenario A: in-cluster (MCK)
stringData:
  MONGODB_URI: "mongodb://USER:PASSWORD@<rs-name>-svc.<namespace>.svc.cluster.local/admin?replicaSet=<RS>&tls=true&tlsAllowInvalidCertificates=true&authSource=admin&authMechanism=SCRAM-SHA-256"

# Scenario B — Atlas (SRV)
# MONGODB_URI: "mongodb+srv://USER:PASSWORD@cluster0.xxxxx.mongodb.net/admin?authSource=admin&authMechanism=SCRAM-SHA-256"

# Scenario C — External replica set with DNS-resolvable hostnames
# MONGODB_URI: "mongodb://USER:PASSWORD@host1:27017,host2:27017/admin?replicaSet=RS&tls=true&authSource=admin&authMechanism=SCRAM-SHA-256"

authMechanism=SCRAM-SHA-256 is required by MongoDB 7+ with MCK.
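
Credentials that contain characters such as @, : or / must be percent-encoded before they are placed in the URI. The following is a minimal sketch of assembling the Scenario C style string in Python. The helper name build_mongodb_uri and its parameters are hypothetical and not part of mongot-doctor.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(user, password, hosts, replica_set,
                      auth_source="admin", tls=False):
    """Assemble a mongodb:// URI with percent-encoded credentials.

    Hypothetical helper for illustration; mongot-doctor only needs the
    resulting string (passed via --uri or the MONGODB_URI secret).
    """
    params = {
        "replicaSet": replica_set,
        "authSource": auth_source,
        "authMechanism": "SCRAM-SHA-256",
    }
    if tls:
        params["tls"] = "true"
    query = "&".join(f"{key}={value}" for key, value in params.items())
    # quote_plus escapes reserved characters such as '@', ':' and '/'.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{','.join(hosts)}/admin?{query}"


uri = build_mongodb_uri(
    "monitor", "p@ss:word/1", ["host1:27017", "host2:27017"], "RS", tls=True
)
print(uri)
```

Without the encoding step, a password containing @ would be read by the driver as the start of the host list.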

3. Apply manifests

kubectl apply -f k8s/rbac.yaml        # ServiceAccount + ClusterRole
kubectl apply -f k8s/secret.yaml      # MongoDB URI
kubectl apply -f k8s/deployment.yaml  # Deployment
kubectl apply -f k8s/service.yaml     # NodePort

| File | Description |
| --- | --- |
| k8s/rbac.yaml | ServiceAccount + ClusterRole with minimal permissions (includes pods/proxy) |
| k8s/secret.yaml | MongoDB URI as a K8s Secret |
| k8s/deployment.yaml | Deployment with liveness and readiness probes on /healthz |
| k8s/service.yaml | NodePort Service to expose the dashboard |

Namespace: all manifests default to mongodb. Update namespace: in all 4 files if yours is different.

4. Access the dashboard

kubectl get svc mongot-doctor -n mongodb
# Example: 5050:31855/TCP  →  NodePort = 31855
  • Docker Desktop: http://localhost:<NODE_PORT>
  • Remote cluster (GKE, EKS, on-prem): http://<NODE_IP>:<NODE_PORT> (see kubectl get nodes -o wide)

On Docker Desktop with MCK, the internal DNS (<rs>-svc.mongodb.svc.cluster.local) is reachable directly from the pod. Do not use hostnames from the host's /etc/hosts — they are not resolvable from inside the cluster.


🔌 API Endpoints

| Endpoint | Description |
| --- | --- |
| / | HTML Dashboard |
| /metrics | Full JSON snapshot (from cache) |
| /api/v1/search_metrics | Stable versioned API with a fixed schema for external consumers |
| /api/advisor | SRE findings in JSON (crit → warn → pass) |
| /healthz | Liveness probe: always returns 200 if Flask is running |
| /healthcheck | Detailed status (MongoDB ping, K8s API, cache age) |
| /api/logs/<ns>/<pod> | Last 50 lines of pod logs |
| /api/download_logs/<ns>/<pod> | Download logs (?time=1h&level=error) |
| /api/diagnose | Structured diagnosis: health, warnings, recommendations |
| /api/logs/analyze/<ns>/<pod> | Log Intelligence: pattern analysis (?window= accepts 1h, 24h, 7d, or 30d) |
| /api/indexes/inspect | Search Index Inspector: mapping quality and health report |
| /api/report | Status Report: full cluster snapshot (?format= accepts text, markdown, or json) |

🏗️ Project Structure

mongot_doctor.py        # App Factory + CLI entry point
background.py            # BackgroundCollector (thin orchestrator, daemon thread)
advisor.py               # SRE Advisor engine (15 checks, pure Python)
report.py                # Status Report builder (Text / Markdown / JSON formatters)
security.py              # Input validation, security headers, Basic Auth
state.py                 # Shared mutable state (clients, cache, lock)

engine/
  rate_calculator.py     # Delta/rate engine: QPS, latency, scan ratio EMA, HNSW, ETA
                         # Counter reset safety, spike guard, first-cycle protection

collectors/
  kubernetes.py          # K8s discovery (pods, CRDs, PVCs, services, helm)
  mongodb.py             # MongoDB collectors (vitals, oplog, indexes)
  prometheus.py          # Prometheus scraper with dual fallback
  index_inspector.py     # Search Index Inspector (mapping analysis, observation engine)
  log_analyzer.py        # Log Intelligence (JSON log parsing, 8 pattern matchers)

routes/
  api.py                 # API Blueprint (/metrics, /api/v1/search_metrics, /api/advisor, /api/logs)
  frontend.py            # Frontend Blueprint (/, /favicon.ico)

frontend/
  templates/
    dashboard.html       # Jinja2 template
  static/
    css/main.css
    js/
      utils.js           # Utilities (formatBytes, pill, gaugeRing, …)
      logs.js            # Live log management
      advisor.js         # Advisor renderer + Log Intelligence
      pipeline.js        # Sync Pipeline Analyzer
      index_inspector.js # Search Index Inspector panel
      report.js          # Status Report modal (tabs, copy, download)
      render.js          # Main renderer + polling

tests/
  conftest.py
  test_advisor.py        # tests — every SRE check
  test_background.py     # tests — collector and cache
  test_frontend.py       # tests — dashboard, CSS, JS, API
  test_security.py       # tests — validation, headers, auth

🧪 Running Tests

source venv/bin/activate
python3 -m pytest tests/ -v

🔬 SRE Advisor — Deep Dive

Every collection cycle runs a set of Python checks against the cluster and index state. Findings are sorted by severity (crit → warn → pass) and served via /api/advisor.

Checks overview

| # | Check | Thresholds |
| --- | --- | --- |
| 1 | Disk Space (200% Rule) | warn if free < 200% of used; crit if disk ≥ 90% (mongot enters read-only) |
| 2 | Index Consolidation | warn if more than one index of the same type on the same collection (fullText + vectorSearch is valid: Hybrid Search) |
| 3 | I/O Bottleneck | crit if disk queue > 10 AND lag > 5s simultaneously |
| 4 | CPU & QPS | crit if CPU > 80%; warn if QPS > 10 × cores |
| 5 | Memory Starvation (Page Faults) | warn > 500/s; crit > 1000/s |
| 6 | OOMKilled & MMap Risk | crit if JVM heap ≥ 90% of pod limit or OOMKilled detected |
| 7 | CRD Operator Status | crit if CRD is not in Running phase |
| 8 | Storage Class Performance | warn if PVC uses standard, hostpath, or slow |
| 9 | Operator Versioning | warn if operator image uses :latest tag |
| 10 | Predictive Oplog Window | warn > 40% consumed; crit > 70% consumed (prevents forced Initial Sync) |
| 11 | Search Auth | crit if skipAuthenticationToSearchIndexManagementServer=true (mongod↔mongot without authentication) |
| 12 | Search TLS Mode | crit if searchTLSMode=disabled; warn if allowTLS/preferTLS; pass if requireTLS |
| 13 | Search Efficiency (Scan Ratio) | warn > 50:1; crit > 500:1; predictive warning if high ratio + low latency (cardinality problem) |
| 14 | Vector Search Efficiency | same thresholds as scan ratio but computed separately for $vectorSearch |
| 15 | HNSW Visited Nodes | warn > 1000 nodes/query; crit > 5000 (early warning for ANN CPU saturation) |
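
To make the threshold logic concrete, here is a minimal sketch of two of the checks above (5 and 10) plus the crit → warn → pass ordering. The function names and the finding dictionaries are assumptions for illustration; the actual structure of advisor.py is not shown in this document.

```python
# Lower number sorts first, giving the crit -> warn -> pass order.
SEVERITY_ORDER = {"crit": 0, "warn": 1, "pass": 2}


def check_oplog_window(consumed_pct):
    """Check 10: warn above 40% of the window consumed, crit above 70%."""
    if consumed_pct > 70:
        return "crit"
    if consumed_pct > 40:
        return "warn"
    return "pass"


def check_page_faults(faults_per_sec):
    """Check 5: warn above 500 page faults/s, crit above 1000/s."""
    if faults_per_sec > 1000:
        return "crit"
    if faults_per_sec > 500:
        return "warn"
    return "pass"


def rank_findings(findings):
    """Sort findings by severity; sorted() is stable, so ties keep their order."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["status"]])


findings = rank_findings([
    {"title": "Memory Starvation (Page Faults)", "status": check_page_faults(120)},
    {"title": "Predictive Oplog Window", "status": check_oplog_window(82.0)},
])
print([f"{f['status']}: {f['title']}" for f in findings])
```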

📡 Search QPS & Real-Time Latency

The 🔎 Search Commands panel shows throughput metrics computed as deltas between successive Prometheus scrape cycles:

  • $search QPS and $vectorSearch QPS displayed prominently (requests/second)
  • Average latency computed as Δlatency_sum / Δcount — actual per-query latency, not a peak
  • Max latency — historical peak from the Prometheus counter
  • Failure counters for $search and $vectorSearch

QPS values activate from the second collection cycle onward (a time delta is required).
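
The delta computation can be sketched as follows, assuming each scrape yields a monotonically increasing request count and a cumulative latency sum. The dictionary keys count and latency_sum are hypothetical names, not the real Prometheus metric names.

```python
def search_rates(prev, curr, elapsed_sec):
    """Compute QPS and average latency between two scrapes.

    prev/curr are dicts with cumulative 'count' and 'latency_sum' values
    (hypothetical keys). Returns None on the first cycle, when no previous
    sample exists yet.
    """
    if prev is None or elapsed_sec <= 0:
        return None
    d_count = curr["count"] - prev["count"]
    d_sum = curr["latency_sum"] - prev["latency_sum"]
    qps = d_count / elapsed_sec
    # Average latency of the queries that ran in this interval only.
    avg_latency = d_sum / d_count if d_count > 0 else None
    return {"qps": qps, "avg_latency_sec": avg_latency}


first = search_rates(None, {"count": 100, "latency_sum": 2.0}, 5.0)
second = search_rates(
    {"count": 100, "latency_sum": 2.0},
    {"count": 150, "latency_sum": 2.6},
    5.0,
)
print(first, second)
```

Dividing the two deltas gives the mean latency of queries in the last interval, which is why the value reacts to recent changes instead of averaging over the whole process lifetime.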

🎯 Search Efficiency — Scan Ratio (EMA-smoothed)

scan_ratio = candidates_examined / results_returned is the true indicator of search query efficiency. Latency alone is not enough: a 50ms query with 200k candidates examined will become a timeout as the dataset grows.

Two separate ratios are computed: one for $search (mongot_query_candidates_examined_total with fallback to mongot_query_documents_scanned) and one dedicated for $vectorSearch (mongot_vector_query_candidates_examined_total).

To avoid false positives under low traffic (e.g. 1 result / 500 candidates from a single query), the ratio is EMA-smoothed (α = 0.3) with a guard: if Δresults < 10 the EMA is not updated.
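
A minimal sketch of that smoothing step, using the documented α = 0.3 and Δresults < 10 guard. Seeding the EMA with the first valid ratio is an assumption; the function name is hypothetical.

```python
ALPHA = 0.3              # smoothing factor from the text above
MIN_DELTA_RESULTS = 10   # low-traffic guard from the text above


def update_scan_ratio_ema(prev_ema, delta_candidates, delta_results):
    """Return the new EMA of candidates_examined / results_returned.

    The previous value is returned unchanged when too few results were
    produced in the interval to give a meaningful ratio.
    """
    if delta_results < MIN_DELTA_RESULTS:
        return prev_ema
    ratio = delta_candidates / delta_results
    if prev_ema is None:
        return ratio  # assumption: seed the EMA with the first valid sample
    return ALPHA * ratio + (1 - ALPHA) * prev_ema


ema = update_scan_ratio_ema(None, 500, 1)     # guarded: stays None
ema = update_scan_ratio_ema(ema, 1000, 100)   # first valid sample: 10.0
ema = update_scan_ratio_ema(ema, 4000, 100)   # 0.3 * 40 + 0.7 * 10
print(ema)
```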

| Ratio | Meaning |
| --- | --- |
| < 5 | Excellent: highly selective index |
| 5 – 50 | Normal |
| 50 – 500 | Inefficient query: review index or analyzer |
| > 500 | Critical: index or query is seriously problematic |

Predictive cardinality detection: if scan_ratio > 50 but latency < 100ms, the Advisor emits a warning — the index is non-selective but the dataset is still small enough to hide the cost. This signal is not provided by Ops Manager.

Zero-results anti-pattern: if results_returned = 0 but candidates_examined > 0, a specific warning is raised. Common causes: post-search $match too restrictive, scoring threshold too high, misconfigured pipeline.
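
The severity thresholds and the two extra signals described above could be combined roughly like this. It is a sketch under the documented thresholds (50:1, 500:1, 100ms); the function names and the warning identifiers are hypothetical.

```python
def classify_scan_ratio(ratio):
    """Map a smoothed scan ratio to a severity: warn > 50:1, crit > 500:1."""
    if ratio > 500:
        return "crit"
    if ratio > 50:
        return "warn"
    return "pass"


def efficiency_warnings(ratio, avg_latency_sec, delta_candidates, delta_results):
    """Return extra warning ids (hypothetical names) for one interval."""
    warnings = []
    # Non-selective index whose cost is still hidden by a small dataset.
    if (ratio is not None and ratio > 50
            and avg_latency_sec is not None and avg_latency_sec < 0.1):
        warnings.append("cardinality")
    # Work was done but nothing was returned to the client.
    if delta_results == 0 and delta_candidates > 0:
        warnings.append("zero_results")
    return warnings


print(classify_scan_ratio(120.0), efficiency_warnings(120.0, 0.02, 2400, 20))
```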

🧬 HNSW Visited Nodes — Early Warning CPU Saturation

mongot_vector_search_hnsw_visited_nodes (fallback: mongot_vector_search_graph_nodes_visited) measures how many nodes in the HNSW graph are traversed per $vectorSearch query. It is an early warning for CPU saturation: load increases before latency becomes visible.

| Visited nodes | Meaning |
| --- | --- |
| < 200 | Excellent |
| 200 – 1000 | Normal |
| > 1000 | Costly query: monitor CPU |
| > 5000 | ANN inefficient: CPU saturation imminent |

High values indicate ANN is degrading toward brute-force, typically due to excessive efSearch, poor graph connectivity, or oversized embedding dimensions. The check is optional: skipped if the metric is not exposed by the installed mongot version.

⏳ Index Build ETA

During an Initial Sync or bulk index build, a dedicated "⚙️ Index Build in Progress" panel appears with:

  • Animated progress bar — green > 75%, orange < 75%, red if stalled
  • Document counter — processed / total with percentage
  • Speed in docs/sec (computed as a delta between collection cycles)
  • Dynamic ETA in h/m/s format or "INDEX BUILD STALLED" warning if speed drops below 100 docs/s for at least 30 seconds

The panel is only shown while an Initial Sync is active (initial_sync_in_progress > 0).
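
The speed, ETA, and stall rules above can be sketched with a small tracker that is fed one sample per collection cycle. The class name, the return shape, and the exact "0h 1m 0s" rendering are assumptions; only the 100 docs/s and 30 second thresholds come from the text.

```python
STALL_SPEED = 100.0    # docs/s, from the text above
STALL_SECONDS = 30.0   # how long the speed must stay low, from the text above


def format_eta(seconds):
    """Render a duration as h/m/s (the exact format is an assumption)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


class IndexBuildTracker:
    """Hypothetical tracker: one update() call per collection cycle."""

    def __init__(self):
        self.last_processed = None
        self.last_time = None
        self.slow_since = None

    def update(self, processed, total, now):
        if self.last_processed is None:
            # First cycle: no delta available yet.
            self.last_processed, self.last_time = processed, now
            return {"docs_per_sec": None, "eta": None, "stalled": False}
        elapsed = now - self.last_time
        speed = (processed - self.last_processed) / elapsed if elapsed > 0 else 0.0
        self.last_processed, self.last_time = processed, now
        if speed < STALL_SPEED:
            if self.slow_since is None:
                self.slow_since = now
        else:
            self.slow_since = None
        stalled = (self.slow_since is not None
                   and now - self.slow_since >= STALL_SECONDS)
        eta = None
        if not stalled and speed > 0:
            eta = format_eta((total - processed) / speed)
        return {"docs_per_sec": speed, "eta": eta, "stalled": stalled}


tracker = IndexBuildTracker()
tracker.update(0, 65000, now=0.0)
print(tracker.update(5000, 65000, now=5.0))
```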

🔍 Robust Pod Discovery (4-level hierarchy)

Pod discovery uses a hierarchy resilient to rolling upgrades, scaling events, and naming variations across MCK versions:

  1. Official MCK label app.kubernetes.io/component=search — most reliable
  2. Container name mongot — stable fallback across MCK versions
  3. Container image — contains mongodb-enterprise-search or mongot
  4. Pod name (last resort) — heuristic, excludes mongod and monitor

The monitor pod itself is always excluded via app: mongot-doctor.
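
As an illustration, the hierarchy can be expressed as a function that returns the level at which a pod matched. Pods are modelled here as plain dictionaries instead of Kubernetes client objects, and the level 4 name heuristic (dash-separated tokens) is an assumption; the real logic lives in collectors/kubernetes.py and may differ.

```python
def discovery_level(pod):
    """Return 1-4 for the hierarchy level that matched, or None.

    `pod` is a plain dict: {"name": str, "labels": dict,
    "containers": [{"name": str, "image": str}]} (hypothetical shape).
    """
    labels = pod.get("labels", {})
    if labels.get("app") == "mongot-doctor":
        return None  # the monitor pod itself is always excluded
    if labels.get("app.kubernetes.io/component") == "search":
        return 1
    containers = pod.get("containers", [])
    if any(c.get("name") == "mongot" for c in containers):
        return 2
    if any("mongodb-enterprise-search" in c.get("image", "")
           or "mongot" in c.get("image", "") for c in containers):
        return 3
    # Last resort: name heuristic on dash-separated tokens (assumption).
    tokens = set(pod.get("name", "").split("-"))
    if tokens & {"search", "mongot"} and not tokens & {"mongod", "monitor"}:
        return 4
    return None


pods = [
    {"name": "rs-search-0", "labels": {"app.kubernetes.io/component": "search"}},
    {"name": "doctor-abc", "labels": {"app": "mongot-doctor"},
     "containers": [{"name": "app", "image": "mongot-doctor:latest"}]},
]
print([discovery_level(p) for p in pods])
```

The monitor exclusion runs first because the monitor's own image name contains "mongot" and would otherwise match at level 3.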

⚡ Background Collector & Rate Engine

Data collection runs on a separate daemon thread at a configurable interval. The /metrics endpoint always responds in < 1ms from the in-memory cache — the dashboard never blocks on external calls.

All delta/rate computation logic is isolated in engine/rate_calculator.py, separated from the collection loop:

  • background.py is a thin orchestrator: scrape → compute_pod_rates() → cache update
  • engine/rate_calculator.py contains QPS, average latency, scan ratio EMA, HNSW, ETA — independently testable
  • Counter reset safety: _safe_delta() returns None on negative delta (counter reset after mongot pod restart); spike guard discards QPS > 50,000/s; first cycle (last_s=None) skips all computation silently — no spurious spikes on startup
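
The guards in the last bullet can be sketched as follows. This models the described behaviour only; it is not the source of _safe_delta() in engine/rate_calculator.py, and the names safe_delta and guarded_qps are hypothetical.

```python
MAX_QPS = 50_000.0  # spike guard threshold from the text above


def safe_delta(current, previous):
    """Return current - previous, or None if it cannot be trusted.

    None covers both the first cycle (no previous sample) and a negative
    delta, which indicates a counter reset after a pod restart.
    """
    if current is None or previous is None:
        return None
    delta = current - previous
    return delta if delta >= 0 else None


def guarded_qps(current, previous, elapsed_sec):
    """Compute QPS from a counter pair, discarding implausible spikes."""
    delta = safe_delta(current, previous)
    if delta is None or elapsed_sec <= 0:
        return None
    qps = delta / elapsed_sec
    return qps if qps <= MAX_QPS else None


print(guarded_qps(600, 100, 5.0))   # normal interval
print(guarded_qps(5, 100, 5.0))     # counter reset: no value this cycle
```

Returning None instead of 0 lets the dashboard distinguish "no data for this cycle" from "no traffic".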

🔌 Stable API (/api/v1/search_metrics)

Versioned JSON endpoint with a fixed schema, decoupled from internal Prometheus metric names:

{
  "schema_version": "1",
  "timestamp": "...",
  "collect_ms": 42,
  "pods": {
    "mongot-pod-0": {
      "pod":        { "namespace", "node", "phase", "all_ready", "total_restarts" },
      "qps":        { "search": 1.5, "vectorsearch": 0.3 },
      "latency_sec":{ "search_avg", "search_max", "vectorsearch_avg", "vectorsearch_max" },
      "failures":   { "search": 0, "vectorsearch": 0 },
      "efficiency": { "search_scan_ratio", "vectorsearch_scan_ratio", "hnsw_visited_nodes", "zero_results_with_candidates" },
      "indexing":   { "replication_lag_sec", "initial_sync_active", "updates_per_sec", "eta" }
    }
  }
}

Safe for external consumers (CI performance gates, Grafana dashboards, alerting tools) — the backend can evolve without breaking the API contract.
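
As an example of such a consumer, the sketch below fetches the endpoint and lists pods whose $search scan ratio exceeds a limit, which could back a CI performance gate. The function names and the default limit are assumptions; only the endpoint path and the field names come from the schema above. If Basic Auth is enabled, an Authorization header would also be needed.

```python
import json
from urllib.request import urlopen


def fetch_search_metrics(base_url):
    """GET /api/v1/search_metrics from a running mongot-doctor instance."""
    with urlopen(f"{base_url}/api/v1/search_metrics", timeout=5) as resp:
        return json.load(resp)


def pods_over_scan_ratio(payload, limit=50.0):
    """Return the sorted pod names whose search_scan_ratio exceeds `limit`."""
    offenders = []
    for pod_name, data in payload.get("pods", {}).items():
        ratio = data.get("efficiency", {}).get("search_scan_ratio")
        # The ratio may be null until enough traffic has been observed.
        if ratio is not None and ratio > limit:
            offenders.append(pod_name)
    return sorted(offenders)


# Offline example payload; in practice use
# fetch_search_metrics("http://localhost:5050").
sample = {
    "schema_version": "1",
    "pods": {
        "mongot-pod-0": {"efficiency": {"search_scan_ratio": 12.0}},
        "mongot-pod-1": {"efficiency": {"search_scan_ratio": 180.5}},
        "mongot-pod-2": {"efficiency": {"search_scan_ratio": None}},
    },
}
print(pods_over_scan_ratio(sample))
```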


🩻 Automatic Search Diagnosis

Every collection cycle, the diagnosis engine interprets the full cluster state and presents it in three columns directly in the dashboard:

  • Health Summary: all passing checks
  • Warnings & Critical — failing checks with detail message
  • Recommendations — actionable next steps derived from each finding

The health status (HEALTHY / DEGRADED / CRITICAL) is immediately visible at the top of the panel.

API

GET /api/diagnose
{
  "health": "critical",
  "summary": { "pass": 12, "warn": 2, "crit": 1 },
  "critical": [{ "title": "OOMKilled & MMap Risk", "detail": "..." }],
  "warnings":  [{ "title": "Disk Space (200% Rule)", "detail": "..." }],
  "healthy":   [{ "title": "CRD Operator Status" }, ...],
  "recommendations": ["Increase memory limit...", "Check disk usage..."]
}

CLI

Run a single diagnostic cycle and exit — useful in CI/CD pipelines:

python3 mongot_doctor.py --diagnose \
  --uri "mongodb://..." --namespace mongodb

Exit codes: 0 = healthy, 1 = degraded, 2 = critical.
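
In a pipeline, the exit code can be turned into a deploy decision. The sketch below wraps the documented command with subprocess; the helper names and the fail_on policy are hypothetical, and treating an unexpected exit code as blocking is a design choice of this example, not documented behaviour.

```python
import subprocess
import sys

HEALTH_BY_EXIT = {0: "healthy", 1: "degraded", 2: "critical"}


def run_diagnose(uri, namespace, script="mongot_doctor.py"):
    """Run one diagnostic cycle via the documented CLI flags."""
    result = subprocess.run(
        [sys.executable, script, "--diagnose", "--uri", uri,
         "--namespace", namespace]
    )
    return result.returncode


def should_block_deploy(returncode, fail_on="critical"):
    """Decide whether the pipeline should stop for this exit code."""
    if returncode not in HEALTH_BY_EXIT:
        return True  # unexpected code (for example a crash): fail closed
    threshold = 2 if fail_on == "critical" else 1
    return returncode >= threshold


for code in (0, 1, 2):
    print(HEALTH_BY_EXIT[code], should_block_deploy(code))
```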


🪵 Log Intelligence

On-demand analysis of mongot JSON logs directly from the dashboard. Parses the structured log format ({"t":..., "s":..., "n":..., "msg":..., "attr":...}) and detects known failure patterns.

Configurable time window

| Window | Description |
| --- | --- |
| 1h | Last hour: quick triage |
| 24h | Last 24 hours (default) |
| 7d | Last 7 days: trend analysis |
| 30d | Last 30 days: long-term issues |

Up to 2,000 JSON lines are analyzed per request (memory guard).

Detected patterns

| Pattern | Severity | Detection |
| --- | --- | --- |
| Out of Memory | 🔴 crit | OutOfMemoryError in msg or attr |
| Errors & Fatals | 🔴 crit | s == "ERROR" or "FATAL" |
| TLS / Auth Issues | 🔴 crit | ssl/tls/auth/certificate in msg + ERROR/WARN |
| MongoDB Connection Issues | 🟡 warn | org.mongodb.driver class + Exception/Removing server |
| Index Failures | 🟡 warn | index/lucene class + fail/corrupt/invalid |
| Replication / Change Stream | 🟡 warn | changestream class + lag/timeout/fail |
| Initial Sync Activity | 🔵 info | initialsync class |
| General Warnings | 🟡 warn | s == "WARN" |
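
A reduced sketch of this kind of matcher, covering three of the eight patterns. It assumes one finding per line and that the most recent 2,000 lines are kept; both are simplifications, and the finding ids other than "errors" are hypothetical.

```python
import json

MAX_LINES = 2000  # memory guard from the text above


def classify(entry):
    """Return a finding id for one parsed log entry, or None."""
    severity = entry.get("s", "")
    text = f"{entry.get('msg', '')} {json.dumps(entry.get('attr', {}))}"
    if "OutOfMemoryError" in text:
        return "oom"
    if severity in ("ERROR", "FATAL"):
        return "errors"
    if severity == "WARN":
        return "warnings"
    return None


def analyze(lines):
    """Count findings over the last MAX_LINES lines, skipping non-JSON lines."""
    counts = {}
    analyzed = 0
    for line in lines[-MAX_LINES:]:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text lines (stack traces, banners) are ignored
        if not isinstance(entry, dict):
            continue
        analyzed += 1
        finding = classify(entry)
        if finding:
            counts[finding] = counts.get(finding, 0) + 1
    return {"lines_analyzed": analyzed, "counts": counts}


report = analyze([
    '{"t": "x", "s": "ERROR", "n": "a", "msg": "Connection refused", "attr": {}}',
    '{"s": "INFO", "msg": "java.lang.OutOfMemoryError: Java heap space"}',
    "not a json line",
    '{"s": "WARN", "msg": "slow merge"}',
])
print(report)
```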

API

GET /api/logs/analyze/<namespace>/<pod>?window=24h
{
  "pod": "my-replica-set-search-0",
  "window": "24h",
  "lines_analyzed": 350,
  "findings": [
    {
      "id": "errors",
      "name": "Errors & Fatals",
      "severity": "crit",
      "count": 3,
      "description": "ERROR or FATAL log entries detected...",
      "examples": ["[2026-03-05T14:09:07] Connection refused — ..."]
    }
  ]
}

🔎 Search Index Inspector

Many teams create Search indexes without fully understanding their cost: dynamic mapping that indexes every field, stale BUILDING indexes, oversized explicit mappings with dozens of unused fields. mongot-doctor analyzes every index definition automatically and tells you exactly what to fix.

What it checks

| Check | Severity | Condition |
| --- | --- | --- |
| FAILED state | 🔴 crit | Index is in FAILED status |
| Not queryable | 🔴 crit | queryable: false (index not serving queries) |
| Dynamic mapping | 🟡 warn | mappings.dynamic: true (every document field is indexed) |
| BUILDING state | 🟡 warn | Index still building; queries may not return full results |
| Empty mapping | 🟡 warn | dynamic: false and zero fields mapped (index returns nothing) |
| Large static mapping | 🟡 warn | More than 20 explicit fields; review unused ones |
| Over-indexed collection | 🟡 warn | More than 3 Search indexes on the same collection |

Why dynamic mapping matters

With dynamic: true, mongot indexes every field in every document. This is convenient during development, but in production it causes:

  • Index size far exceeding the actual data size (observed 10–50x in real clusters)
  • Longer indexing lag — more fields to process per write
  • Higher JVM heap pressure — more Lucene segments to manage
  • Opaque resource consumption — teams don't know what they're actually indexing

The inspector detects this and suggests migrating to a static mapping with only the fields used in search queries.
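
The per-index checks from the table can be sketched as a function over an index definition. The function names and argument shape are assumptions, only top-level fields are counted, and the per-collection "over-indexed" check is left out because it needs all indexes of a collection at once.

```python
def inspect_index(status, queryable, definition, index_type="fullText"):
    """Return a list of {"level", "msg"} observations for one Search index."""
    observations = []
    if status == "FAILED":
        observations.append({"level": "crit", "msg": "Index is in FAILED status"})
    if not queryable:
        observations.append({"level": "crit", "msg": "Index is not serving queries"})
    if status == "BUILDING":
        observations.append({"level": "warn", "msg": "Index still building"})
    if index_type == "fullText":
        # Mapping checks only make sense for full-text definitions.
        mappings = definition.get("mappings", {})
        dynamic = bool(mappings.get("dynamic", False))
        field_count = len(mappings.get("fields", {}))
        if dynamic:
            observations.append({"level": "warn", "msg": "Dynamic mapping enabled"})
        elif field_count == 0:
            observations.append({"level": "warn", "msg": "Empty mapping"})
        if field_count > 20:
            observations.append({"level": "warn", "msg": "Large static mapping"})
    return observations


def health_of(observations):
    """Collapse observations into healthy / degraded / critical."""
    levels = {o["level"] for o in observations}
    if "crit" in levels:
        return "critical"
    if "warn" in levels:
        return "degraded"
    return "healthy"


obs = inspect_index("READY", True, {"mappings": {"dynamic": True}})
print(health_of(obs), [o["msg"] for o in obs])
```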

API

GET /api/indexes/inspect
{
  "summary": {
    "total_indexes": 4,
    "clean": 2,
    "warns": 2,
    "crits": 0,
    "health": "degraded"
  },
  "indexes": [
    {
      "ns": "mydb.products",
      "name": "default",
      "type": "fullText",
      "status": "READY",
      "queryable": true,
      "num_docs": 125432,
      "mapping_dynamic": true,
      "field_count": 0,
      "observations": [
        {
          "level": "warn",
          "msg": "Dynamic mapping enabled — every document field is indexed",
          "suggestion": "Restrict mapping to specific fields to reduce index size and improve performance"
        }
      ]
    }
  ]
}

CLI

Run a full inspection from the terminal — no dashboard needed:

python3 mongot_doctor.py --uri "mongodb://..." --inspect-indexes

Example output:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  MongoDB Search — Index Inspector
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Collection: mydb.products
  Index: default  [fullText]  READY
  Docs: 125,432
  Mapping: dynamic ⚠
  ⚠ Dynamic mapping enabled — every document field is indexed
    → Restrict mapping to specific fields to reduce index size

Collection: mydb.orders
  Index: default  [fullText]  READY
  Docs: 89,210
  Mapping: static (7 fields)
  ✔ No issues detected

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  2 index(es)  |  0 critical, 1 warnings, 1 clean
  Health: DEGRADED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Exit codes: 0 = healthy, 1 = degraded, 2 = critical.

Web UI

The inspector panel appears automatically above the main grid on page load and shows a card per index. A ↺ Refresh button lets you re-run the inspection on demand. If MongoDB is not configured, the panel displays a graceful "not connected" message instead of an error.


📋 Status Report

mongot-doctor can generate a full cluster snapshot covering pods, search metrics, JVM heap, Lucene merges, oplog, SRE findings, and index health — in three formats suited for different audiences and tools.

Formats

| Format | Use case |
| --- | --- |
| Text | Human-readable ASCII report: paste into a ticket, Slack, or runbook |
| Markdown | Tables and emoji: rendered in GitHub issues, Confluence, Notion |
| JSON | Machine-readable: ingest into alerting tools, Grafana, CI/CD pipelines |

Web UI

Click the 📋 Report button in the dashboard header to open the report modal. Three tabs switch between formats instantly. Each format panel includes:

  • Copy — copies the full report to clipboard with visual feedback
  • Download — saves the report as a file (mongot-report-<timestamp>.txt|md|json)

The modal closes with its close button or the Escape key.

API

GET /api/report?format=text
GET /api/report?format=markdown
GET /api/report?format=json

Text and Markdown are returned as text/plain. JSON is returned as application/json.

Example JSON schema:

{
  "generated_at": "2026-03-17T10:00:00Z",
  "health": "degraded",
  "pods": [...],
  "per_pod_metrics": {
    "my-replica-set-search-0": {
      "search_commands": { "search_qps": 1.5, "search_avg_latency_sec": 0.012, ... },
      "jvm": { "heap_used_bytes": 1073741824, "heap_max_bytes": 4294967296, ... },
      "lucene_merges": { "running_merges": 2, "merging_docs": 45000, ... },
      "indexing": { "change_stream_lag_sec": 0.4, "initial_sync_in_progress": 0, ... }
    }
  },
  "oplog": { "window_hours": 72.5, "used_pct": 12.3 },
  "advisor_findings": [...],
  "indexes": [...],
  "errors": [...]
}

CLI

Generate a report without starting the web server:

python3 mongot_doctor.py --uri "mongodb://..." --namespace mongodb \
  --report --format text

python3 mongot_doctor.py --uri "mongodb://..." --namespace mongodb \
  --report --format markdown

python3 mongot_doctor.py --uri "mongodb://..." --namespace mongodb \
  --report --format json > report.json

--format defaults to text if omitted. The output is printed to stdout — redirect to a file as needed.


📄 License

MIT License — free to use, modify, and distribute. See LICENSE for the full text.


🤝 Support the Project

If you find this useful:

  • Star the repo — it helps others discover the project
  • 🧵 Share it with your team — SREs, MongoDB operators, platform engineers
  • 🐛 Report issues: open an issue on GitHub

