🔬 MongoDB Search Diagnostics

🇮🇹 Italian documentation available: README-it.md

Before: raw Prometheus metrics, scattered kubectl logs, opaque index definitions, no clear picture of what's wrong. After: one dashboard, one health status, one list of actions.

mongot-doctor turns complex MongoDB Search cluster data into an instant diagnosis. It is designed for SREs, MongoDB operators, and platform engineers running MongoDB Search on Kubernetes.

What does it do?

Detects stuck search nodes, indexing lag, OOMKilled events, and configuration drift
Analyzes search query efficiency, scan ratios, and HNSW graph traversal in real time
Alerts you before problems become outages — predictive oplog window, cardinality warnings, stall detection
Built-in SRE Advisor runs 15 automated checks every collection cycle and ranks findings by severity
Automatic Search Diagnosis interprets cluster health instantly — Health Summary, Warnings, Recommendations in one panel
Log Intelligence parses mongot JSON logs automatically and detects errors, failures, and connection issues across configurable time windows
Search Index Inspector analyzes every Search index definition — mapping quality, field count, dynamic mapping overuse, and index health — with actionable suggestions
Status Report exports a full cluster snapshot in Text, Markdown, or JSON — shareable from the dashboard or from the CLI, ready for tickets, runbooks, and automation

No agents to install. No extra infrastructure. Just point it at your cluster and go.

Tip

Get a full diagnostic in one command

python3 mongot_doctor.py --uri "mongodb://..." --namespace mongodb --report

Prints a complete cluster snapshot — pods, search metrics, JVM heap, Lucene merges, oplog window, SRE findings, and index health — straight to your terminal. Add --format markdown or --format json to export to Confluence, GitHub Issues, or your alerting pipeline.

📋 Table of Contents

✨ Key Features
🚀 Installation & Setup
- Mode 1 — Local
- Mode 2 — Kubernetes
🔌 API Endpoints
🏗️ Project Structure
🧪 Running Tests
🔬 SRE Advisor — Deep Dive
🩻 Automatic Search Diagnosis
🪵 Log Intelligence
🔎 Search Index Inspector
📋 Status Report

✨ Key Features

🧠 SRE Advisor — 15 automated checks, severity-ranked (crit → warn → pass), served via /api/advisor — see deep dive below
📡 Real-time Search QPS & Latency — delta-based computation across Prometheus scrape cycles, separate for $search and $vectorSearch
🎯 Search Efficiency (Scan Ratio) — EMA-smoothed candidates_examined / results_returned, separate ratio for text and vector search, with cardinality detection
🧬 HNSW Visited Nodes — early warning for ANN CPU saturation before latency becomes visible
⏳ Index Build ETA — animated progress bar, docs/sec speed, stall detection, dynamic ETA
🔍 Robust Pod Discovery — 4-level hierarchy resilient to MCK upgrades and naming variations
🌊 Sync Pipeline Analyzer — real-time DB → Change Stream → RAM → Lucene pipeline visualization with bottleneck identification
⏱️ Predictive Oplog Window — warn above 40% and crit above 70% of the oplog window consumed, to prevent a forced Initial Sync
🩺 Universal K8s Diagnostics — Helm releases, MCK/K8s versions, PVCs, OOMKilled events, live log streaming
📜 Log Management & Export — live terminal, plus log downloads filtered by time window and severity
⚡ Background Collector & Rate Engine — daemon thread, < 1ms API response from in-memory cache, counter-reset safe
🔌 Stable Versioned API — /api/v1/search_metrics with fixed schema, safe for external consumers
🔒 Security — optional Basic Auth, CSP headers, K8s name input validation, configurable CORS
🩻 Automatic Search Diagnosis — real-time cluster health panel: Health Summary / Warnings / Recommendations; also available via /api/diagnose and --diagnose CLI (exit 0/1/2 for CI pipelines)
🪵 Log Intelligence — on-demand mongot JSON log analysis with configurable time window (1h / 24h / 7d / 30d); detects errors, OOM, TLS/auth issues, connection failures, index failures, change stream problems
🔎 Search Index Inspector — inspects every Search index definition: dynamic mapping detection, field count analysis, BUILDING/FAILED status, over-indexed collections; available via /api/indexes/inspect and --inspect-indexes CLI
📋 Status Report — full cluster snapshot in Text (ASCII), Markdown, or JSON; one-click download and copy from the dashboard; --report CLI flag for CI/automation pipelines

🚀 Installation & Setup

Prerequisites: kubectl configured and pointing at your cluster, plus a MongoDB connection string with read access to the local database (oplog) and to your target collections.

Prometheus metrics required: mongot-doctor reads mongot's Prometheus metrics, which are not enabled by default. You must enable them explicitly in your Kubernetes operator:

MongoDB Enterprise Operator (MCK): enable the spec.prometheus section in your MongoDB resource — Enterprise guide

MongoDB Community Operator: enable the spec.prometheus section in your MongoDBCommunity resource — Community guide

Mode 1 — Local (Mac / PC)

Use this mode for development, demos, or when you prefer running the monitor outside the cluster.

1. Clone and install

git clone https://github.com/Miccolomi/mongot-doctor.git
cd mongot-doctor
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

2. Start

python3 mongot_doctor.py \
  --uri "mongodb://USER:PASSWORD@HOST:PORT/admin?replicaSet=RS&authSource=admin&authMechanism=SCRAM-SHA-256" \
  --namespace mongodb \
  --port 5050

Open your browser at: http://localhost:5050

CLI options

Parameter	Default	Description
`--uri`	—	MongoDB connection string
`--namespace`	all	Kubernetes namespace to monitor
`--port`	`5050`	HTTP port for the dashboard
`--interval`	`5`	Collection interval in seconds
`--auth`	—	Basic Auth — format `user:password`
`--in-cluster`	`false`	K8s auth via ServiceAccount (in-cluster only)
`--host`	`0.0.0.0`	Flask binding address
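`--allowed-origins`	localhost	CORS allowed origins (space-separated)
`--diagnose`	`false`	Run one diagnostic cycle and exit with code 0/1/2
`--inspect-indexes`	`false`	Run the Search Index Inspector once and exit with code 0/1/2
`--report`	`false`	Print a Status Report to stdout and exit
`--format`	`text`	Status Report format: text, markdown, or json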

Mode 2 — Kubernetes (in-cluster)

Use this mode for a permanent deployment inside the cluster. The monitor runs as a pod and uses a ServiceAccount with RBAC to access the Kubernetes API.

1. Build the Docker image

docker build -t mongot-doctor:latest .

For a private registry (Docker Hub, ECR, GCR):

docker build -t <your-registry>/mongot-doctor:1.0.0 .
docker push <your-registry>/mongot-doctor:1.0.0

Update the image: field in k8s/deployment.yaml accordingly.

⚠️ Important: after every code update, rebuild and restart the deployment:
docker build -t mongot-doctor:latest .
kubectl rollout restart deployment/mongot-doctor -n mongodb

2. Configure the MongoDB URI

The connection to mongod is required for oplog, index, and compliance checks. mongot is always discovered automatically via Kubernetes — no URI needed for it.

Edit k8s/secret.yaml based on where your mongod is running:

Scenario A (mongod inside the cluster, MCK): use the internal Service DNS name. List the Services to find it:

kubectl get svc -n mongodb   # look for a ClusterIP Service on port 27017

Set MONGODB_URI under stringData. Scenarios B and C are commented-out alternatives; keep exactly one MONGODB_URI line uncommented:

stringData:
  # Scenario A: in-cluster (MCK)
  MONGODB_URI: "mongodb://USER:PASSWORD@<rs-name>-svc.<namespace>.svc.cluster.local/admin?replicaSet=<RS>&tls=true&tlsAllowInvalidCertificates=true&authSource=admin&authMechanism=SCRAM-SHA-256"

  # Scenario B: Atlas (SRV)
  # MONGODB_URI: "mongodb+srv://USER:PASSWORD@cluster0.xxxxx.mongodb.net/admin?authSource=admin&authMechanism=SCRAM-SHA-256"

  # Scenario C: external replica set with DNS-resolvable hostnames
  # MONGODB_URI: "mongodb://USER:PASSWORD@host1:27017,host2:27017/admin?replicaSet=RS&tls=true&authSource=admin&authMechanism=SCRAM-SHA-256"

authMechanism=SCRAM-SHA-256 is required by MongoDB 7+ with MCK.

3. Apply manifests

kubectl apply -f k8s/rbac.yaml        # ServiceAccount + ClusterRole
kubectl apply -f k8s/secret.yaml      # MongoDB URI
kubectl apply -f k8s/deployment.yaml  # Deployment
kubectl apply -f k8s/service.yaml     # NodePort

File	Description
`k8s/rbac.yaml`	ServiceAccount + ClusterRole with minimal permissions (includes `pods/proxy`)
`k8s/secret.yaml`	MongoDB URI as a K8s Secret
`k8s/deployment.yaml`	Deployment with liveness and readiness probes on `/healthz`
`k8s/service.yaml`	NodePort Service to expose the dashboard

Namespace: all manifests default to mongodb. Update namespace: in all 4 files if yours is different.

4. Access the dashboard

kubectl get svc mongot-doctor -n mongodb
# Example: 5050:31855/TCP  →  NodePort = 31855

Docker Desktop: http://localhost:<NODE_PORT>
Remote cluster (GKE, EKS, on-prem): http://<NODE_IP>:<NODE_PORT> (see kubectl get nodes -o wide)

On Docker Desktop with MCK, the internal DNS name (<rs>-svc.mongodb.svc.cluster.local) is reachable directly from the pod. Do not use hostnames from your workstation's /etc/hosts: they cannot be resolved from inside the cluster.

🔌 API Endpoints

Endpoint	Description
`/`	HTML Dashboard
`/metrics`	Full JSON snapshot (from cache)
`/api/v1/search_metrics`	Stable versioned API — fixed schema for external consumers
`/api/advisor`	SRE findings in JSON (crit → warn → pass)
`/healthz`	Liveness probe — always returns 200 if Flask is running
`/healthcheck`	Detailed status (MongoDB ping, K8s API, cache age)
`/api/logs/<ns>/<pod>`	Last 50 lines of pod logs
`/api/download_logs/<ns>/<pod>`	Download logs (`?time=1h&level=error`)
`/api/diagnose`	Structured diagnosis: health, warnings, recommendations
`/api/logs/analyze/<ns>/<pod>`	Log Intelligence — pattern analysis (`?window=1h|24h|7d|30d`)
`/api/indexes/inspect`	Search Index Inspector — mapping quality and health report
`/api/report?format=text|markdown|json`	Status Report — full cluster snapshot in the requested format

🏗️ Project Structure

mongot_doctor.py         # App Factory + CLI entry point
background.py            # BackgroundCollector (thin orchestrator, daemon thread)
advisor.py               # SRE Advisor engine (15 checks, pure Python)
status_report.py         # Status Report builder (Text / Markdown / JSON formatters)
security.py              # Input validation, security headers, Basic Auth
state.py                 # Shared mutable state (clients, cache, lock)

engine/
  rate_calculator.py     # Delta/rate engine: QPS, latency, scan ratio EMA, HNSW, ETA
                         # Counter reset safety, spike guard, first-cycle protection

collectors/
  kubernetes.py          # K8s discovery (pods, CRDs, PVCs, services, helm)
  mongodb.py             # MongoDB collectors (vitals, oplog, indexes)
  prometheus.py          # Prometheus scraper with dual fallback
  index_inspector.py     # Search Index Inspector (mapping analysis, observation engine)
  log_analyzer.py        # Log Intelligence (JSON log parsing, 8 pattern matchers)

routes/
  api.py                 # API Blueprint (/metrics, /api/v1/search_metrics, /api/advisor, /api/logs)
  frontend.py            # Frontend Blueprint (/, /favicon.ico)

frontend/
  templates/
    dashboard.html       # Jinja2 template
  static/
    css/main.css
    js/
      utils.js           # Utilities (formatBytes, pill, gaugeRing, …)
      logs.js            # Live log management
      advisor.js         # Advisor renderer + Log Intelligence
      pipeline.js        # Sync Pipeline Analyzer
      index_inspector.js # Search Index Inspector panel
      report.js          # Status Report modal (tabs, copy, download)
      render.js          # Main renderer + polling

tests/
  conftest.py
  test_advisor.py        # tests — every SRE check
  test_background.py     # tests — collector and cache
  test_frontend.py       # tests — dashboard, CSS, JS, API
  test_security.py       # tests — validation, headers, auth

🧪 Running Tests

source venv/bin/activate
python3 -m pytest tests/ -v

🔬 SRE Advisor — Deep Dive

Every collection cycle runs a set of Python checks against the cluster and index state. Findings are sorted by severity (crit → warn → pass) and served via /api/advisor.

Checks overview

#	Check	Thresholds
1	Disk Space (200% Rule)	warn if free space < 200% of used space; crit if disk usage ≥ 90% (mongot enters read-only mode)
2	Index Consolidation	warn if more than one index of the same type on the same collection (fullText + vectorSearch is valid: Hybrid Search)
3	I/O Bottleneck	crit if disk queue > 10 AND lag > 5s simultaneously
4	CPU & QPS	crit if CPU > 80%; warn if QPS > 10 × cores
5	Memory Starvation (Page Faults)	warn > 500/s; crit > 1000/s
6	OOMKilled & MMap Risk	crit if JVM heap ≥ 90% of pod limit or OOMKilled detected
7	CRD Operator Status	crit if CRD is not in `Running` phase
8	Storage Class Performance	warn if PVC uses `standard`, `hostpath`, or `slow`
9	Operator Versioning	warn if operator image uses `:latest` tag
10	Predictive Oplog Window	warn > 40% consumed; crit > 70% consumed — prevents forced Initial Sync
11	Search Auth	crit if `skipAuthenticationToSearchIndexManagementServer=true` — mongod↔mongot without authentication
12	Search TLS Mode	crit if `searchTLSMode=disabled`; warn if `allowTLS`/`preferTLS`; pass if `requireTLS`
13	Search Efficiency (Scan Ratio)	warn > 50:1; crit > 500:1; predictive warning if high ratio + low latency (cardinality problem)
14	Vector Search Efficiency	same thresholds as scan ratio but computed separately for `$vectorSearch`
15	HNSW Visited Nodes	warn > 1000 nodes/query; crit > 5000 — early warning for ANN CPU saturation
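To make the threshold table concrete, here is a minimal sketch of two of the checks and the crit → warn → pass ordering. It is not the project's advisor.py: the Finding type and the function names are hypothetical, disk usage is assumed to be used/total capacity, and the oplog check assumes the caller already knows what percentage of the window is consumed.

```python
from dataclasses import dataclass

# Sort order for findings: crit first, then warn, then pass.
SEVERITY_ORDER = {"crit": 0, "warn": 1, "pass": 2}


@dataclass
class Finding:
    title: str
    severity: str  # "crit", "warn" or "pass"
    detail: str


def check_disk_space(used_bytes: int, total_bytes: int) -> Finding:
    """200% rule: free space should be at least twice the used space."""
    title = "Disk Space (200% Rule)"
    free_bytes = total_bytes - used_bytes
    usage_pct = 100.0 * used_bytes / total_bytes
    if usage_pct >= 90.0:
        return Finding(title, "crit", f"disk {usage_pct:.1f}% full (mongot read-only at 90%)")
    if free_bytes < 2 * used_bytes:
        return Finding(title, "warn", "free space is below 200% of used space")
    return Finding(title, "pass", "enough free space")


def check_oplog_window(consumed_pct: float) -> Finding:
    """Predictive oplog window: warn above 40% consumed, crit above 70%."""
    if consumed_pct > 70.0:
        severity = "crit"
    elif consumed_pct > 40.0:
        severity = "warn"
    else:
        severity = "pass"
    return Finding("Predictive Oplog Window", severity,
                   f"{consumed_pct:.1f}% of the oplog window consumed")


def sort_findings(findings: list[Finding]) -> list[Finding]:
    # sorted() is stable, so findings of equal severity keep their order.
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])


if __name__ == "__main__":
    ranked = sort_findings([
        check_oplog_window(55.0),   # warn
        check_disk_space(20, 100),  # pass: 80 free >= 2 * 20 used
        check_disk_space(92, 100),  # crit: 92% full
    ])
    for finding in ranked:
        print(finding.severity, finding.title, "|", finding.detail)
```

Returning plain records and sorting once at the end keeps each check independently testable, which matches the "pure Python" description of the advisor.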

📡 Search QPS & Real-Time Latency

The 🔎 Search Commands panel shows throughput metrics computed as deltas between successive Prometheus scrape cycles:

$search QPS and $vectorSearch QPS displayed prominently (requests/second)
Average latency computed as Δlatency_sum / Δcount: the mean per-query latency over the last interval, not a peak
Max latency — historical peak reported by the Prometheus metric
Failure counters for $search and $vectorSearch

QPS values appear from the second collection cycle onward, since a rate needs two samples and the time delta between them.
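The delta arithmetic behind QPS and average latency can be sketched as follows. This assumes each scrape yields a cumulative query count and a cumulative latency sum in seconds, the usual shape of a Prometheus summary or histogram; the Sample type and function name are hypothetical and not taken from engine/rate_calculator.py.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Sample:
    timestamp: float    # scrape time, seconds since epoch
    count: float        # cumulative number of queries (counter)
    latency_sum: float  # cumulative latency in seconds (counter)


def qps_and_avg_latency(prev: Optional[Sample],
                        curr: Sample) -> Tuple[Optional[float], Optional[float]]:
    """Return (qps, avg_latency_sec) for the interval between two scrapes."""
    if prev is None:
        # First cycle: no previous sample, so no rate can be computed yet.
        return None, None
    dt = curr.timestamp - prev.timestamp
    d_count = curr.count - prev.count
    d_sum = curr.latency_sum - prev.latency_sum
    if dt <= 0 or d_count < 0 or d_sum < 0:
        # Clock problem or counter reset: skip this interval.
        return None, None
    qps = d_count / dt
    # Mean latency of only the queries served during this interval.
    avg = d_sum / d_count if d_count > 0 else None
    return qps, avg
```

For example, two scrapes 5 seconds apart with counts 1000 and 1010 and latency sums 12.0 s and 12.5 s give 2 QPS and a 50 ms average, independent of the much larger cumulative totals.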

🎯 Search Efficiency — Scan Ratio (EMA-smoothed)

scan_ratio = candidates_examined / results_returned is a more direct indicator of search query efficiency than latency. Latency alone is not enough: a 50ms query that examines 200k candidates is likely to approach timeouts as the dataset grows.

Two separate ratios are computed: one for $search (mongot_query_candidates_examined_total, falling back to mongot_query_documents_scanned) and one dedicated to $vectorSearch (mongot_vector_query_candidates_examined_total).

To avoid false positives under low traffic (e.g. 1 result / 500 candidates from a single query), the ratio is EMA-smoothed (α = 0.3) with a guard: if Δresults < 10 the EMA is not updated.

Ratio	Meaning
< 5	Excellent — highly selective index
5 – 50	Normal
50 – 500	Inefficient query — review index or analyzer
> 500	Critical — index or query is seriously problematic

Predictive cardinality detection: if scan_ratio > 50 but latency < 100ms, the Advisor emits a warning — the index is non-selective but the dataset is still small enough to hide the cost. This signal is not provided by Ops Manager.

Zero-results anti-pattern: if results_returned = 0 but candidates_examined > 0, a specific warning is raised. Common causes: post-search $match too restrictive, scoring threshold too high, misconfigured pipeline.
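A minimal sketch of the EMA update with its low-traffic guard, the threshold bands, and the two predictive signals described above. It assumes the inputs are per-interval deltas of the candidates-examined and results-returned counters and that α weights the newest sample; the class and function names are hypothetical.

```python
from typing import List, Optional

ALPHA = 0.3             # EMA smoothing factor from the text
MIN_DELTA_RESULTS = 10  # guard: skip EMA updates under low traffic


class ScanRatioTracker:
    def __init__(self) -> None:
        self.ema: Optional[float] = None

    def update(self, d_candidates: float, d_results: float) -> Optional[float]:
        if d_results < MIN_DELTA_RESULTS:
            return self.ema  # too few results: keep the previous value
        ratio = d_candidates / d_results
        if self.ema is None:
            self.ema = ratio  # first meaningful interval seeds the EMA
        else:
            self.ema = ALPHA * ratio + (1 - ALPHA) * self.ema
        return self.ema


def classify_scan_ratio(ratio: float) -> str:
    if ratio < 5:
        return "excellent"
    if ratio <= 50:
        return "normal"
    if ratio <= 500:
        return "inefficient"
    return "critical"


def efficiency_warnings(ema_ratio: Optional[float], avg_latency_sec: Optional[float],
                        d_candidates: float, d_results: float) -> List[str]:
    warnings = []
    # Non-selective index whose cost is still hidden by a small dataset.
    if (ema_ratio is not None and ema_ratio > 50
            and avg_latency_sec is not None and avg_latency_sec < 0.100):
        warnings.append("cardinality: high scan ratio with low latency")
    # Work was done but nothing came back.
    if d_results == 0 and d_candidates > 0:
        warnings.append("zero results with candidates examined")
    return warnings
```

The guard means a single 1-result / 500-candidate query leaves the EMA untouched, while sustained traffic moves it by 30% of the gap per cycle.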

🧬 HNSW Visited Nodes — Early Warning CPU Saturation

mongot_vector_search_hnsw_visited_nodes (fallback: mongot_vector_search_graph_nodes_visited) measures how many nodes in the HNSW graph are traversed per $vectorSearch query. It is an early warning for CPU saturation: load increases before latency becomes visible.

Visited nodes	Meaning
< 200	Excellent
200 – 1000	Normal
> 1000	Costly query — monitor CPU
> 5000	ANN inefficient — CPU saturation imminent

High values indicate that ANN search is degrading toward brute force, typically because of an excessive search breadth (numCandidates in $vectorSearch, which plays the role of HNSW efSearch), poor graph connectivity, or oversized embedding dimensions. The check is optional: it is skipped if the installed mongot version does not expose the metric.
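The metric fallback and the thresholds can be sketched as below. The two metric names come from the text; treating the metric as a cumulative counter divided by the number of $vectorSearch queries in the interval is an assumption, as are the function names.

```python
from typing import Mapping, Optional

# Metric names from the text, in lookup order (primary, then fallback).
HNSW_METRICS = (
    "mongot_vector_search_hnsw_visited_nodes",
    "mongot_vector_search_graph_nodes_visited",
)


def pick_metric(scraped: Mapping[str, float]) -> Optional[float]:
    """Return the first HNSW metric present in a scrape, or None."""
    for name in HNSW_METRICS:
        if name in scraped:
            return scraped[name]
    return None


def visited_per_query(prev: Mapping[str, float], curr: Mapping[str, float],
                      d_queries: float) -> Optional[float]:
    """Average visited nodes per $vectorSearch query over one interval.

    Assumes the metric is a cumulative counter of visited nodes.
    """
    p, c = pick_metric(prev), pick_metric(curr)
    if p is None or c is None or d_queries <= 0 or c < p:
        return None  # metric missing, no traffic, or counter reset
    return (c - p) / d_queries


def hnsw_severity(visited: Optional[float]) -> str:
    if visited is None:
        return "skipped"  # metric not exposed by this mongot version
    if visited > 5000:
        return "crit"
    if visited > 1000:
        return "warn"
    return "pass"
```

Returning "skipped" rather than "pass" keeps a missing metric distinguishable from a healthy graph.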

⏳ Index Build ETA

During an Initial Sync or bulk index build, a dedicated "⚙️ Index Build in Progress" panel appears with:

Animated progress bar — green > 75%, orange < 75%, red if stalled
Document counter — processed / total with percentage
Speed in docs/sec (computed as a delta between collection cycles)
Dynamic ETA in h/m/s format or "INDEX BUILD STALLED" warning if speed drops below 100 docs/s for at least 30 seconds

The panel is only shown while an Initial Sync is active (initial_sync_in_progress > 0).
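The speed, ETA, and stall rules can be sketched as a small stateful tracker. This is an illustration under assumptions: the class name and output dict are hypothetical, and a slow period is assumed to start at the scrape that opened the first slow interval.

```python
from typing import Optional

STALL_SPEED = 100.0    # docs/s threshold from the text
STALL_SECONDS = 30.0   # how long speed must stay low before flagging a stall


def format_hms(seconds: float) -> str:
    s = int(round(seconds))
    h, rem = divmod(s, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m}m {s}s"


class BuildProgress:
    def __init__(self) -> None:
        self.prev_docs: Optional[int] = None
        self.prev_ts: Optional[float] = None
        self.slow_since: Optional[float] = None

    def update(self, processed: int, total: int, ts: float) -> dict:
        speed = None
        if self.prev_ts is not None and ts > self.prev_ts:
            speed = max(processed - self.prev_docs, 0) / (ts - self.prev_ts)
            if speed < STALL_SPEED:
                if self.slow_since is None:
                    self.slow_since = self.prev_ts  # slow interval began here
            else:
                self.slow_since = None  # back to normal speed
        self.prev_docs, self.prev_ts = processed, ts

        stalled = (self.slow_since is not None
                   and ts - self.slow_since >= STALL_SECONDS)
        pct = 100.0 * processed / total if total else 0.0
        if stalled:
            eta = "INDEX BUILD STALLED"
        elif speed:
            eta = format_hms((total - processed) / speed)
        else:
            eta = None  # first cycle or zero speed: no estimate yet
        return {"pct": pct, "speed": speed, "eta": eta, "stalled": stalled}
```

Because the ETA is recomputed from the latest interval's speed each cycle, it adapts as the build speeds up or slows down, which is what "dynamic ETA" refers to.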

🔍 Robust Pod Discovery (4-level hierarchy)

Pod discovery uses a hierarchy resilient to rolling upgrades, scaling events, and naming variations across MCK versions:

Official MCK label app.kubernetes.io/component=search — most reliable
Container name mongot — stable fallback across MCK versions
Container image — contains mongodb-enterprise-search or mongot
Pod name (last resort) — heuristic, excludes mongod and monitor

The monitor pod itself is always excluded via its app: mongot-doctor label.
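The fallback order can be sketched over a simplified pod representation. The dict shape, the level names, the "first level with any match wins" behavior, and the name-token heuristic are all assumptions for illustration; the real collector works against the Kubernetes API.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical simplified pod shape:
# {"name": str, "labels": {str: str}, "containers": [{"name": str, "image": str}]}


def _containers(pod: dict) -> list:
    return pod.get("containers", [])


def by_label(pod: dict) -> bool:
    return pod.get("labels", {}).get("app.kubernetes.io/component") == "search"


def by_container_name(pod: dict) -> bool:
    return any(c.get("name") == "mongot" for c in _containers(pod))


def by_image(pod: dict) -> bool:
    return any("mongodb-enterprise-search" in c.get("image", "")
               or "mongot" in c.get("image", "")
               for c in _containers(pod))


def by_pod_name(pod: dict) -> bool:
    # Last-resort heuristic on dash-separated name tokens.
    tokens = pod.get("name", "").split("-")
    if "mongod" in tokens or "monitor" in tokens:
        return False
    return "search" in tokens or "mongot" in tokens


LEVELS: List[Tuple[str, Callable[[dict], bool]]] = [
    ("label", by_label),
    ("container_name", by_container_name),
    ("image", by_image),
    ("pod_name", by_pod_name),
]


def discover_search_pods(pods: Iterable[dict]) -> Tuple[str, List[dict]]:
    """Return the first level that matches any pod, and the matching pods."""
    # The monitor's own pod is always excluded, whatever level matches.
    candidates = [p for p in pods
                  if p.get("labels", {}).get("app") != "mongot-doctor"]
    for level, matches in LEVELS:
        found = [p for p in candidates if matches(p)]
        if found:
            return level, found
    return "none", []
```

Reporting which level matched is useful in itself: a cluster that only resolves at "pod_name" is a hint that labels or container names changed.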

⚡ Background Collector & Rate Engine

Data collection runs on a separate daemon thread at a configurable interval. The /metrics endpoint always responds in < 1ms from the in-memory cache — the dashboard never blocks on external calls.

All delta/rate computation logic is isolated in engine/rate_calculator.py, separated from the collection loop:

background.py is a thin orchestrator: scrape → compute_pod_rates() → cache update
engine/rate_calculator.py contains QPS, average latency, scan ratio EMA, HNSW, ETA — independently testable
Counter reset safety: _safe_delta() returns None on negative delta (counter reset after mongot pod restart); spike guard discards QPS > 50,000/s; first cycle (last_s=None) skips all computation silently — no spurious spikes on startup
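The three safety rules can be sketched as follows. safe_delta mirrors the described behavior of _safe_delta() but its signature is an assumption, and guarded_qps is a hypothetical helper showing where the 50,000/s spike guard sits.

```python
from typing import Optional

MAX_QPS = 50_000.0  # spike guard threshold from the text


def safe_delta(curr: Optional[float], last: Optional[float]) -> Optional[float]:
    """Delta between two counter readings, or None if it cannot be trusted."""
    if curr is None or last is None:
        return None  # first cycle: nothing to compare against
    delta = curr - last
    # A negative delta means the counter restarted (e.g. mongot pod restart).
    return None if delta < 0 else delta


def guarded_qps(curr_count: Optional[float], last_count: Optional[float],
                dt: float) -> Optional[float]:
    delta = safe_delta(curr_count, last_count)
    if delta is None or dt <= 0:
        return None
    qps = delta / dt
    # Discard implausible spikes instead of showing them on the dashboard.
    return None if qps > MAX_QPS else qps
```

Returning None rather than 0 lets the dashboard show "no data" for a cycle instead of a misleading drop to zero.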

🔌 Stable API (`/api/v1/search_metrics`)

Versioned JSON endpoint with a fixed schema, decoupled from internal Prometheus metric names:

{
  "schema_version": "1",
  "timestamp": "...",
  "collect_ms": 42,
  "pods": {
    "mongot-pod-0": {
      "pod":        { "namespace", "node", "phase", "all_ready", "total_restarts" },
      "qps":        { "search": 1.5, "vectorsearch": 0.3 },
      "latency_sec":{ "search_avg", "search_max", "vectorsearch_avg", "vectorsearch_max" },
      "failures":   { "search": 0, "vectorsearch": 0 },
      "efficiency": { "search_scan_ratio", "vectorsearch_scan_ratio", "hnsw_visited_nodes", "zero_results_with_candidates" },
      "indexing":   { "replication_lag_sec", "initial_sync_active", "updates_per_sec", "eta" }
    }
  }
}

Safe for external consumers (CI performance gates, Grafana dashboards, alerting tools) — the backend can evolve without breaking the API contract.
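As an example of an external consumer, a CI performance gate could fetch the endpoint and check a few fields. The field paths follow the schema above; the function names, the base URL, and the 200 ms / 50:1 limits are hypothetical choices for illustration.

```python
import json
import urllib.request
from typing import List


def fetch_search_metrics(base_url: str, timeout: float = 5.0) -> dict:
    """GET /api/v1/search_metrics from a running mongot-doctor."""
    url = base_url.rstrip("/") + "/api/v1/search_metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def gate(payload: dict, max_avg_latency_sec: float = 0.2,
         max_scan_ratio: float = 50.0) -> List[str]:
    """Return a list of violations; an empty list means the gate passes."""
    if payload.get("schema_version") != "1":
        raise ValueError(f"unsupported schema_version: {payload.get('schema_version')!r}")
    problems = []
    for pod_name, pod in payload.get("pods", {}).items():
        avg = pod.get("latency_sec", {}).get("search_avg")
        if avg is not None and avg > max_avg_latency_sec:
            problems.append(f"{pod_name}: $search avg latency {avg:.3f}s")
        ratio = pod.get("efficiency", {}).get("search_scan_ratio")
        if ratio is not None and ratio > max_scan_ratio:
            problems.append(f"{pod_name}: $search scan ratio {ratio:.0f}:1")
    return problems
```

Usage might look like `gate(fetch_search_metrics("http://localhost:5050"))`. Checking schema_version first makes the gate fail loudly if the contract ever changes. If the dashboard runs with --auth, the request also needs an Authorization header, which this sketch does not add.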

🩻 Automatic Search Diagnosis

Every collection cycle, the diagnosis engine interprets the full cluster state and presents it in three columns directly in the dashboard:

Health Summary — all passing checks listed as ✔
Warnings & Critical — failing checks with detail message
Recommendations — actionable next steps derived from each finding

The health status (HEALTHY / DEGRADED / CRITICAL) is immediately visible at the top of the panel.

API

GET /api/diagnose

{
  "health": "degraded",
  "summary": { "pass": 12, "warn": 2, "crit": 1 },
  "critical": [{ "title": "OOMKilled & MMap Risk", "detail": "..." }],
  "warnings":  [{ "title": "Disk Space (200% Rule)", "detail": "..." }],
  "healthy":   [{ "title": "CRD Operator Status" }, ...],
  "recommendations": ["Increase memory limit...", "Check disk usage..."]
}

CLI

Run a single diagnostic cycle and exit — useful in CI/CD pipelines:

python3 mongot_doctor.py --diagnose \
  --uri "mongodb://..." --namespace mongodb

Exit codes: 0 = healthy, 1 = degraded, 2 = critical.
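When mongot-doctor is already running (for example in-cluster), a pipeline step can reuse the same exit-code contract through /api/diagnose instead of starting a new CLI process. A minimal sketch; the function names and default base URL are assumptions, and treating an unknown health value as critical is a deliberately conservative choice of this sketch.

```python
import json
import urllib.request

# Same contract as the --diagnose CLI: 0 = healthy, 1 = degraded, 2 = critical.
EXIT_CODES = {"healthy": 0, "degraded": 1, "critical": 2}


def exit_code_for(diagnosis: dict) -> int:
    # Unknown or missing health is treated as critical (conservative choice).
    return EXIT_CODES.get(str(diagnosis.get("health", "")).lower(), 2)


def main(base_url: str = "http://localhost:5050") -> int:
    url = base_url.rstrip("/") + "/api/diagnose"
    with urllib.request.urlopen(url, timeout=10) as resp:
        diagnosis = json.load(resp)
    for recommendation in diagnosis.get("recommendations", []):
        print("->", recommendation)
    return exit_code_for(diagnosis)
```

Wrapped in a script as `sys.exit(main())`, this gives a CI job the same 0/1/2 behavior as --diagnose.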

🪵 Log Intelligence

On-demand analysis of mongot JSON logs directly from the dashboard. Parses the structured log format ({"t":..., "s":..., "n":..., "msg":..., "attr":...}) and detects known failure patterns.

Configurable time window

Window	Description
`1h`	Last hour — quick triage
`24h`	Last 24 hours — default
`7d`	Last 7 days — trend analysis
`30d`	Last 30 days — long-term issues

Up to 2,000 JSON lines are analyzed per request (memory guard).

Detected patterns

Pattern	Severity	Detection
Out of Memory	🔴 crit	`OutOfMemoryError` in `msg` or `attr`
Errors & Fatals	🔴 crit	`s == "ERROR"` or `"FATAL"`
TLS / Auth Issues	🔴 crit	`ssl`/`tls`/`auth`/`certificate` in `msg` + ERROR/WARN
MongoDB Connection Issues	🟡 warn	`org.mongodb.driver` class + `Exception`/`Removing server`
Index Failures	🟡 warn	`index`/`lucene` class + `fail`/`corrupt`/`invalid`
Replication / Change Stream	🟡 warn	`changestream` class + `lag`/`timeout`/`fail`
Initial Sync Activity	🔵 info	`initialsync` class
General Warnings	🟡 warn	`s == "WARN"`
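The pattern table can be approximated with a few string checks per parsed line. This is a sketch, not collectors/log_analyzer.py: it assumes the "n" field carries the logger/class name that the "class" column refers to, and that the keywords are matched in "msg"; the finding ids are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, List

MAX_LINES = 2000  # memory guard from the text


def classify(entry: dict) -> List[str]:
    sev = entry.get("s", "")
    msg = str(entry.get("msg", ""))
    attr = json.dumps(entry.get("attr", {}))
    cls = str(entry.get("n", "")).lower()  # assumption: "n" is the logger/class name
    low = msg.lower()
    hits = []
    if "OutOfMemoryError" in msg or "OutOfMemoryError" in attr:
        hits.append("oom")
    if sev in ("ERROR", "FATAL"):
        hits.append("errors")
    if sev in ("ERROR", "WARN") and any(k in low for k in ("ssl", "tls", "auth", "certificate")):
        hits.append("tls_auth")
    if "org.mongodb.driver" in cls and ("exception" in low or "removing server" in low):
        hits.append("mongodb_connection")
    if ("index" in cls or "lucene" in cls) and any(k in low for k in ("fail", "corrupt", "invalid")):
        hits.append("index_failures")
    if "changestream" in cls and any(k in low for k in ("lag", "timeout", "fail")):
        hits.append("change_stream")
    if "initialsync" in cls:
        hits.append("initial_sync")
    if sev == "WARN":
        hits.append("warnings")
    return hits


def analyze(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    analyzed = 0
    for line in lines:
        if analyzed >= MAX_LINES:
            break
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        analyzed += 1
        counts.update(classify(entry))
    return counts
```

One line can match several patterns (an ERROR mentioning a certificate counts as both an error and a TLS/auth issue), which mirrors how the table's categories overlap.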

API

GET /api/logs/analyze/<namespace>/<pod>?window=24h

{
  "pod": "my-replica-set-search-0",
  "window": "24h",
  "lines_analyzed": 350,
  "findings": [
    {
      "id": "errors",
      "name": "Errors & Fatals",
      "severity": "crit",
      "count": 3,
      "description": "ERROR or FATAL log entries detected...",
      "examples": ["[2026-03-05T14:09:07] Connection refused — ..."]
    }
  ]
}

🔎 Search Index Inspector

Many teams create Search indexes without fully understanding their cost: dynamic mapping that indexes every field, stale BUILDING indexes, oversized explicit mappings with dozens of unused fields. mongot-doctor analyzes every index definition automatically and tells you exactly what to fix.

What it checks

Check	Severity	Condition
FAILED state	🔴 crit	Index is in `FAILED` status
Not queryable	🔴 crit	`queryable: false` — index not serving queries
Dynamic mapping	🟡 warn	`mappings.dynamic: true` — every document field is indexed
BUILDING state	🟡 warn	Index still building — queries may not return full results
Empty mapping	🟡 warn	`dynamic: false` and zero fields mapped — index returns nothing
Large static mapping	🟡 warn	More than 20 explicit fields — review unused ones
Over-indexed collection	🟡 warn	More than 3 Search indexes on the same collection
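The rules in the table can be sketched as plain functions over index descriptions. The sketch assumes documents shaped like $listSearchIndexes output (name, type, status, queryable, latestDefinition) with an "ns" key added by the caller; it counts top-level mapped fields only, and the function names and messages are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Observation = Tuple[str, str]  # (level, message)


def inspect_index(idx: dict) -> List[Observation]:
    """Apply the per-index rules from the table to one index description."""
    obs: List[Observation] = []
    status = idx.get("status")
    if status == "FAILED":
        obs.append(("crit", "Index is in FAILED status"))
    if idx.get("queryable") is False:
        obs.append(("crit", "Index is not queryable"))
    if status == "BUILDING":
        obs.append(("warn", "Index still building; results may be incomplete"))
    # Mapping rules apply to full-text indexes; vector indexes use a field list.
    if idx.get("type") != "vectorSearch":
        mappings = idx.get("latestDefinition", {}).get("mappings", {})
        fields = mappings.get("fields", {})
        if mappings.get("dynamic"):
            obs.append(("warn", "Dynamic mapping enabled: every document field is indexed"))
        elif not fields:
            obs.append(("warn", "Empty static mapping: no fields are indexed"))
        elif len(fields) > 20:
            obs.append(("warn", f"Large static mapping ({len(fields)} fields): review unused ones"))
    return obs


def over_indexed_collections(indexes: List[dict]) -> List[str]:
    """Namespaces with more than 3 Search indexes."""
    per_ns = Counter(idx["ns"] for idx in indexes)
    return sorted(ns for ns, count in per_ns.items() if count > 3)
```

The collection-level rule is kept separate because it needs every index at once, while the other rules look at a single definition.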

Why dynamic mapping matters

With dynamic: true, mongot indexes every field in every document. This is convenient during development, but in production it causes:

Index size far exceeding the actual data size (observed 10–50x in real clusters)
Longer indexing lag — more fields to process per write
Higher JVM heap pressure — more Lucene segments to manage
Opaque resource consumption — teams don't know what they're actually indexing

The inspector detects this and suggests migrating to a static mapping with only the fields used in search queries.
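The suggested migration can be sketched as a helper that turns the fields your queries actually use into a static mapping. The helper name is hypothetical and it handles top-level fields only (nested paths need nested document mappings); the output uses the standard mappings.dynamic / mappings.fields shape of a Search index definition.

```python
from typing import Dict


def build_static_mapping(fields: Dict[str, str]) -> dict:
    """Build a Search index definition that indexes only the given fields.

    fields maps a top-level field name to a Search field type,
    e.g. {"title": "string", "category": "token", "price": "number"}.
    """
    return {
        "mappings": {
            "dynamic": False,  # index nothing beyond the listed fields
            "fields": {name: {"type": ftype} for name, ftype in fields.items()},
        }
    }
```

The resulting dict could then be used as the definition when creating or updating the index, for example through PyMongo's SearchIndexModel (PyMongo 4.5 or later); check the driver documentation for your version before relying on that call.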

API

GET /api/indexes/inspect

{
  "summary": {
    "total_indexes": 4,
    "clean": 2,
    "warns": 2,
    "crits": 0,
    "health": "degraded"
  },
  "indexes": [
    {
      "ns": "mydb.products",
      "name": "default",
      "type": "fullText",
      "status": "READY",
      "queryable": true,
      "num_docs": 125432,
      "mapping_dynamic": true,
      "field_count": 0,
      "observations": [
        {
          "level": "warn",
          "msg": "Dynamic mapping enabled — every document field is indexed",
          "suggestion": "Restrict mapping to specific fields to reduce index size and improve performance"
        }
      ]
    }
  ]
}

CLI

Run a full inspection from the terminal — no dashboard needed:

python3 mongot_doctor.py --uri "mongodb://..." --inspect-indexes

Example output:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  MongoDB Search — Index Inspector
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Collection: mydb.products
  Index: default  [fullText]  READY
  Docs: 125,432
  Mapping: dynamic ⚠
  ⚠ Dynamic mapping enabled — every document field is indexed
    → Restrict mapping to specific fields to reduce index size

Collection: mydb.orders
  Index: default  [fullText]  READY
  Docs: 89,210
  Mapping: static (7 fields)
  ✔ No issues detected

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  2 index(es)  |  0 critical, 1 warnings, 1 clean
  Health: DEGRADED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Exit codes: 0 = healthy, 1 = degraded, 2 = critical.

Web UI

The inspector panel appears automatically above the main grid on page load and shows a card per index. A ↺ Refresh button lets you re-run the inspection on demand. If MongoDB is not configured, the panel displays a graceful "not connected" message instead of an error.

📋 Status Report

mongot-doctor can generate a full cluster snapshot covering pods, search metrics, JVM heap, Lucene merges, oplog, SRE findings, and index health — in three formats suited for different audiences and tools.

Formats

Format	Use case
Text	Human-readable ASCII report — paste into a ticket, Slack, or runbook
Markdown	Tables and emoji — rendered in GitHub issues, Confluence, Notion
JSON	Machine-readable — ingest into alerting tools, Grafana, CI/CD pipelines

Web UI

Click the 📋 Report button in the dashboard header to open the report modal. Three tabs switch between formats instantly. Each format panel includes:

Copy — copies the full report to clipboard with visual feedback
Download — saves the report as a file (mongot-report-<timestamp>.txt|md|json)

The modal closes with the ✕ button or the Escape key.

API

GET /api/report?format=text
GET /api/report?format=markdown
GET /api/report?format=json

Text and Markdown are returned as text/plain. JSON is returned as application/json.

Example JSON schema:

{
  "generated_at": "2026-03-17T10:00:00Z",
  "health": "degraded",
  "pods": [...],
  "per_pod_metrics": {
    "my-replica-set-search-0": {
      "search_commands": { "search_qps": 1.5, "search_avg_latency_sec": 0.012, ... },
      "jvm": { "heap_used_bytes": 1073741824, "heap_max_bytes": 4294967296, ... },
      "lucene_merges": { "running_merges": 2, "merging_docs": 45000, ... },
      "indexing": { "change_stream_lag_sec": 0.4, "initial_sync_in_progress": 0, ... }
    }
  },
  "oplog": { "window_hours": 72.5, "used_pct": 12.3 },
  "advisor_findings": [...],
  "indexes": [...],
  "errors": [...]
}

CLI

Generate a report without starting the web server:

python3 mongot_doctor.py --uri "mongodb://..." --namespace mongodb \
  --report --format text

python3 mongot_doctor.py --uri "mongodb://..." --namespace mongodb \
  --report --format markdown

python3 mongot_doctor.py --uri "mongodb://..." --namespace mongodb \
  --report --format json > report.json

--format defaults to text if omitted. The output is printed to stdout — redirect to a file as needed.
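A JSON report saved this way can be post-processed with a few lines of Python. This sketch reads the fields shown in the example schema (health, per_pod_metrics.*.jvm, oplog); the function names are hypothetical, and the heap percentage is relative to heap_max_bytes, not to the pod memory limit used by the OOMKilled check.

```python
import json
from typing import List


def load_report(path: str) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_report(report: dict) -> List[str]:
    lines = [f"health: {report.get('health')}"]
    for pod, metrics in report.get("per_pod_metrics", {}).items():
        jvm = metrics.get("jvm", {})
        used, maximum = jvm.get("heap_used_bytes"), jvm.get("heap_max_bytes")
        if used is not None and maximum:
            lines.append(f"{pod}: JVM heap {100.0 * used / maximum:.1f}% used")
    oplog = report.get("oplog", {})
    if "used_pct" in oplog:
        lines.append(f"oplog: {oplog.get('window_hours')} h window, {oplog['used_pct']}% used")
    return lines
```

For example, `print("\n".join(summarize_report(load_report("report.json"))))` turns the file produced by the last command above into a short summary suitable for a chat message.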

📄 License

MIT License — free to use, modify, and distribute. See LICENSE for the full text.

🤝 Support the Project

If you find this useful:

⭐ Star the repo — it helps others discover the project
🧵 Share it with your team — SREs, MongoDB operators, platform engineers
🐛 Report issues — open an issue on GitHub
