feat: replace Prometheus monitoring with Netdata + ntfy.sh alerts #1
Merged
10-task phased plan (Phase 1: additive, Phase 2: removal). Replaces prometheus + node-exporter + postgres-exporter with one netdata agent; keeps Loki + Alloy + Grafana for logs. Alerts go to public ntfy.sh with random topic stored in .env. Phase 1 leaves the deployment in a working dual-stack state so the user can validate Netdata for 24h before Phase 2 removes the old Prometheus services.
Adds netdata agent (v1.99.0) with full host visibility, Docker socket for container auto-discovery, named volumes for persistent state and resource limits (256m/1.0 default). Service is added but not yet started in this commit - configs come in subsequent tasks.
No other service in the project sets container_name explicitly - all
rely on Compose's default ${COMPOSE_PROJECT_NAME}_<service>_1 naming.
Forcing 'container_name: netdata' would break multi-stack hosts (dev +
prod on one machine) with 'container name already in use' errors.
Netdata's node-label-in-dashboard is set by 'hostname:' not
'container_name:' - that line stays.
netdata.conf disables registry + binds 0.0.0.0:19999 (reverse-proxied
via nginx /netdata/ - not exposed on host). postgres.conf builds DSN
from ${PG_*} env vars (works for both internal and external DB modes).
health_alarm_notify.conf is shell-sourced override that enables only
ntfy channel and routes all roles to ${NTFY_SERVER}/${NTFY_TOPIC}.
ensure-config-files.sh now recursively copies defaults/netdata/ to BPP_CONFIGS_DIR/netdata/ (copy_if_missing - non-destructive). init-configs.sh generates random NTFY_TOPIC (openssl rand -hex 16) for existing deployments missing the var, and ensures DJANGO_BPP_NTFY_SERVER defaults to https://ntfy.sh. Topic is a secret (anyone with the URL reads alerts), so it's logged once during the migration with a 'subscribe in app' hint.
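The topic migration described above could look roughly like the sketch below. The function name `ensure_ntfy_topic` and the `/dev/urandom` fallback are assumptions for illustration, not taken from init-configs.sh; only `openssl rand -hex 16` and the append-only, log-once behaviour come from the commit message.

```shell
# Hypothetical helper: append NTFY_TOPIC to an env file only when it is missing.
ensure_ntfy_topic() {
    env_file=$1
    # An existing topic is left untouched so phone subscriptions keep working.
    if grep -q '^NTFY_TOPIC=' "$env_file" 2>/dev/null; then
        return 0
    fi
    # 16 random bytes -> 32 hex characters. The od fallback is an assumption
    # for hosts without openssl.
    topic=$(openssl rand -hex 16 2>/dev/null || od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    [ -n "$topic" ] || return 1
    printf 'NTFY_TOPIC=%s\n' "$topic" >> "$env_file"
    # The topic is a secret, so it is shown exactly once, with the subscribe hint.
    printf 'Generated NTFY_TOPIC, subscribe in app: https://ntfy.sh/%s\n' "$topic"
}
```

Running it twice against the same file leaves a single `NTFY_TOPIC=` line, which is the idempotence the migration relies on.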
Same pattern as /grafana/ - auth_request to /_bpp_superuser_auth gates access, trailing-slash proxy_pass strips the /netdata/ prefix. WebSocket headers enabled (Netdata uses WS for live charts), buffering disabled for stream-style data.
make ntfy-test - sends test push to NTFY_TOPIC from .env
(confirms phone subscription works)
make health-netdata - curl /api/v1/info via nginx and direct
make logs-netdata - tail netdata container logs
make netdata-shell - exec bash in netdata container
scripts/grant-pg-monitor.sh detects internal vs external DB mode. Internal: execs psql in dbserver and runs GRANT pg_monitor. External: prints the SQL for the DBA to run manually. Idempotent - GRANT can be re-run safely.
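A minimal sketch of the internal/external split, assuming the mode and role are passed in explicitly (the real script detects the mode itself). The function name, the `internal`/`external` labels, and the `postgres` superuser are hypothetical.

```shell
# Hypothetical sketch: either run the GRANT in the dbserver container or print it.
grant_pg_monitor() {
    mode=$1   # "internal" or "external" (assumed labels)
    role=$2   # database role that Netdata connects as
    sql="GRANT pg_monitor TO \"$role\";"
    if [ "$mode" = "external" ]; then
        # External DB: nothing is executed, the DBA gets the statement to run.
        printf 'Run as a superuser on the external database:\n%s\n' "$sql"
    else
        # Internal DB: assumes a Compose service named dbserver and a postgres superuser.
        docker compose exec -T dbserver psql -U postgres -v ON_ERROR_STOP=1 -c "$sql"
    fi
}
```

Re-running is safe because granting a role that is already granted does not fail in PostgreSQL.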
Three issues caught by cross-task Phase 1 review:
- nginx /netdata/ used trailing-slash proxy_pass, stripping the URI prefix - Netdata then generated /static/* asset URLs that the browser resolved to root (Django 404). Switch to regex location with named capture, preserve prefix, add X-Forwarded-Url for autodetection. Also add /netdata -> /netdata/ redirect for typed URLs without slash.
- make health-netdata curled via nginx, hit auth_request, got a 302 redirect, displayed as 'HTTP 302', which looked like a failure. Drop the nginx hop - the container-direct check is the meaningful signal.
- ensure-config-files.sh recursive copy was catching .gitkeep and leaking it to the user's configs dir. Exclude with -not -name.
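The .gitkeep fix can be sketched as a non-destructive recursive copy. This is an illustration under assumptions: `copy_tree_if_missing` is a hypothetical name, and the source directory is expected without a trailing slash.

```shell
# Copy every file under src_dir into dst_dir, skipping files that already
# exist in dst_dir and skipping .gitkeep placeholders.
copy_tree_if_missing() {
    src_dir=$1
    dst_dir=$2
    # "! -name" is the POSIX spelling of the "-not -name" filter.
    find "$src_dir" -type f ! -name .gitkeep | while IFS= read -r src; do
        rel=${src#"$src_dir"/}          # path relative to src_dir
        dst=$dst_dir/$rel
        [ -e "$dst" ] && continue       # never overwrite user-edited files
        mkdir -p "$(dirname "$dst")"
        cp "$src" "$dst"
    done
}
```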
ntfy-test, health-netdata, logs-netdata, netdata-shell, grant-pg-monitor - all referenced in CLAUDE.md as the make-target source of truth but missing from 'make help' until now.
…porter
Netdata (added in Phase 1) replaces all three: host metrics (node-exporter), Postgres stats (postgres-exporter via go.d/postgres), and time-series storage (prometheus). One container, ~200MB RAM, preconfigured alerts, push to ntfy.sh - vs three containers and zero alerts.
Changes:
- docker-compose.monitoring.yml: remove 3 services + prometheus_data volume
- defaults/grafana/provisioning/datasources/datasources.yaml.tpl: remove Prometheus datasource, promote Loki to default
- defaults/grafana/provisioning/dashboards/: delete disk-usage.json, http-performance.json, errors.json (all 3 referenced Prometheus - Netdata has equivalent built-in dashboards)
- scripts/ensure-config-files.sh: drop prometheus seeding
- scripts/configure-resources.sh: drop prometheus tunable, add netdata
- scripts/upgrade-postgres.sh: drop postgres-exporter stop/restart lines
- scripts/init-configs.sh + defaults/docker-compose.local_overrides.yml: drop node-exporter macOS override (entire file - no other content), also drop the include in docker-compose.yml and the .gitignore entry
- docker-compose.database.external.yml: drop postgres-exporter from explanatory comments
- docker-compose.yml: update volume-list comment
defaults/prometheus/ kept as historical artifact - delete later if no rollback path needed.
Existing deployments: prometheus_data volume becomes orphan, will be cleaned by 'make prune-orphan-volumes'. PROMETHEUS_*, NODE_EXPORTER_*, PG_EXPORTER_* env vars in user .env files are harmless (Docker Compose ignores unreferenced vars). No migration needed.
- Delete defaults/grafana/provisioning/dashboards/postgresql-health.json (referenced the removed Prometheus datasource in 64 places; Netdata Postgres collector dashboards cover the same metrics at /netdata/).
- Flip DJANGO_BPP_ENABLE_PROMETHEUS default from true to false in docker-compose.application.yml. Nothing scrapes /metrics anymore, so the django-prometheus middleware is pure overhead. Existing deployments that set the var in .env keep their value (backwards compat).
Reflects the architectural change: netdata replaces prometheus + node-exporter + postgres-exporter (Phase 2 commit 16ae9c1 / 8cf921f). Loki + Alloy + Grafana stay for logs.
Updated sections:
- Architecture Overview > Services > Monitoring
- Architecture Overview > Data Flow (metrics path)
- Monitoring Access (URLs + new ntfy info)
- Logging (drop Prometheus retention, add Netdata tiered retention)
- Make Targets (new ntfy/netdata commands)
- Resource Limits (drop prometheus/exporters, add netdata)
tests/test_makefile.sh asserted presence of prometheus dir + config, which Phase 2 (commit 16ae9c1) removed. Switched assertions to the new netdata structure: netdata.conf, go.d/postgres.conf, health_alarm_notify.conf, plus health.d/ directory. CI test_init_configs_creates_structure, test_init_configs_copies_templates, and test_init_configs_no_overwrite now reflect post-migration state.
v1.99.0 did not exist on Docker Hub (planning oversight - I picked a placeholder version without verifying). Netdata jumped from v1.47 straight to v2.0 - no v1.99 release line. v2 split [global] into [global] + [db], so the dbengine directives move: - memory mode -> [db] mode - page cache size -> [db] dbengine page cache size - dbengine multihost disk -> [db] dbengine tier 0 retention size space v2 maintains backwards-compat with v1 directive names (logs deprecation warnings) but cleaner to use current idiom upfront. Sizing kept identical: 512MB tier 0 retention, 32MB page cache, 1s update interval.
NETDATA_DISABLE_CLOUD=1 turns off the agent-side Cloud client entirely. Without it, Netdata v2's first-time dashboard pops a 'please connect your agent / docker exec netdata...' dialog even though the user is already authed via authserver+nginx. DO_NOT_TRACK and DISABLE_TELEMETRY (already set) only suppress anonymous-stats phone-home, not the Cloud claim prompt - those are separate code paths.
tests/test_makefile.sh Test 14e ran nginx container that created letsencrypt cert files as root (container default user). Teardown's bare rm failed on Ubuntu GHA (runner user != root) with 'Permission denied' - the test itself passed all assertions, only cleanup exit code was non-zero. macOS Docker Desktop uses a user-mapped VM, so files appear as runner-owned and rm works there. Ubuntu runs native Docker with no user mapping. This pre-existing failure has been red on main for every commit since the LE test was added. Fix: try sudo rm first (GHA has passwordless sudo), fall back to plain rm (no-op since files already gone, or harmless error message on dev machine without sudo). Applied to all rm sites in Test 14 (LE certs) and Test 15 (runtime ssl/) that touch dirs populated by docker run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
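The teardown fix amounts to a two-step removal. A minimal sketch, assuming a helper named `cleanup_dir` (hypothetical) and adding `sudo -n` so that a machine without passwordless sudo does not stop at a password prompt:

```shell
# Remove a directory that a container may have populated as root.
cleanup_dir() {
    dir=$1
    # First try with sudo (passwordless on GitHub Actions runners); -n makes
    # sudo fail instead of prompting. Fall back to a plain rm otherwise.
    sudo -n rm -rf "$dir" 2>/dev/null || rm -rf "$dir"
}
```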
Two bugs caught on the deployment host:
1) scripts/grant-pg-monitor.sh sourced .env as bash, breaking on
shell-unfriendly values like 'EMAIL=Name <addr@domain>' (the < is
parsed as redirect). Switched to grep-based extraction matching
the get_env_var helper pattern from init-configs.sh.
2) defaults/netdata/go.d/postgres.conf used ${PG_*} env var
placeholders, assuming go.d.plugin would substitute them. It does
not (verified on v2.10.3): the literal string went through to
the URL parser which choked on ':${PG_PORT}'. Reworked as a
template (.tpl) rendered host-side by ensure-config-files.sh
on every make up/refresh - so password changes in .env propagate
on next deploy. Removed now-useless PG_* env vars from compose
(NTFY_* stay - those ARE used because health_alarm_notify.conf
is bash-sourced).
Auto-generated file lives at \$BPP_CONFIGS_DIR/netdata/go.d/postgres.conf
with a clear DO-NOT-EDIT header.
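For the first bug, the grep-based extraction might look like the sketch below. The name `get_env_var` follows the helper mentioned above, but this body and its argument order are assumptions rather than the code in init-configs.sh.

```shell
# Read one KEY=value entry from an env file without sourcing the file,
# so characters such as < or > in values are never interpreted by the shell.
get_env_var() {
    key=$1
    env_file=$2
    # Take the last matching line and keep everything after the first "=".
    grep "^${key}=" "$env_file" | tail -n 1 | cut -d= -f2-
}
```

For example, `get_env_var EMAIL .env` returns `Name <addr@domain>` as plain text for the value that broke the sourced version.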
Test 4 (init-configs copies templates) was asserting that $CONFIG_DIR/netdata/go.d/postgres.conf exists after init-configs. After the .tpl rendering refactor, that file is generated from the .tpl by ensure-config-files.sh - but only when .env exists, and init-configs.sh invoked ensure-config-files BEFORE creating .env (intentionally - to seed the directory layout first). Fix: invoke ensure-config-files.sh a second time at the end of init-configs, after .env is fully populated. Idempotent - just re-renders postgres.conf (and any other .env-dependent configs we add later).
…_log to Loki
Adds nginx monitoring in Netdata and the nginx access_log in Grafana/Loki.
nginx (stub_status) -> netdata:
- default.conf.template: internal server { listen 8090; /stub_status }
(port not published in compose, reachable only via netdata->webserver:8090)
- defaults/netdata/go.d/nginx.conf: collector for live connection metrics
access_log -> Loki + web_log:
- 00-log-format.conf: bpp_access format (combined + request_time/
upstream_response_time/request_length), loaded in the http context
- vhost.conf.template: two access_log sinks in bpp_access format:
/dev/stdout (-> Alloy -> Loki) AND a file on the nginx_access_log volume
- defaults/netdata/go.d/web_log.conf: collector for access-log metrics
(HTTP status codes, latency) + 5xx/latency alerts; log_type auto + escape hatch
- infrastructure.yml: mount 00-log-format.conf, nginx_access_log volume,
rotation script + Ofelia label (04:10)
- monitoring.yml: nginx_access_log mounted read-only into netdata
- scripts/nginx-access-log-rotate.sh: mv to .1 + nginx -s reopen (the Docker log
driver rotates only stdout/stderr, not files)
Post-migration cleanup (Prometheus -> Netdata):
- datasources.yaml.tpl: deleteDatasources Prometheus (removes the dead
datasource from grafana_data on upgraded installations)
Tests + docs:
- test_makefile.sh: assertions for nginx.conf/web_log.conf
- CLAUDE.md: sections on go.d collectors, nginx access log, data flow
Verified: nginx -t (real nginx:1.29.7 container), docker compose
config (merge of 7 files, the volume resolves cross-file), make init-configs
(the new go.d files are copied).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
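The rotation step (mv to .1, then reopen) can be sketched as below. The function name and the idea of passing the reopen command as arguments are assumptions made so the sketch runs without nginx; the real script is scripts/nginx-access-log-rotate.sh.

```shell
# Rotate one log file: keep a single previous generation, then ask the
# writer to reopen its log so it stops writing to the renamed file.
rotate_access_log() {
    log_file=$1
    shift
    [ -f "$log_file" ] || return 0      # nothing to rotate yet
    mv -f "$log_file" "$log_file.1"
    # Remaining arguments are the reopen command, e.g. nginx -s reopen.
    "$@"
}
```

Inside the webserver container this would be called as `rotate_access_log <access log path> nginx -s reopen`; until the reopen signal arrives, nginx keeps writing to the renamed file through its open file descriptor, so no lines are lost.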
Two complementary channels for tracking slow PostgreSQL queries, both reusing existing infrastructure (Loki + Grafana + PostgreSQL datasource):
- log_min_duration_statement=1000: every query >1s logged to dbserver log -> Alloy -> Loki (90d retention). Grafana dashboard 'Slow queries (log)' renders via LogQL with regex extraction of duration and query text. Natural time-windowing via UI time picker.
- pg_stat_statements: aggregated stats per normalized query (calls, mean/total/stddev exec time, rows). Grafana dashboard 'Top 100 queries (pg_stat_statements)' via existing PostgreSQL datasource. Manual pg_stat_statements_reset() for rolling time windows.
Bootstrap: make pg-monitoring-setup
- ALTER SYSTEM SET log_min_duration_statement = 1000 + reload
- Append pg_stat_statements to shared_preload_libraries (preserving existing libs), restart dbserver, CREATE EXTENSION
- Idempotent, detects external DB mode (prints SQL for DBA)
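The "append while preserving existing libs" step is mostly string handling. A minimal sketch, assuming a hypothetical helper `append_preload_lib` that receives the current setting as text and returns the new one:

```shell
# Return the shared_preload_libraries value with lib appended exactly once.
append_preload_lib() {
    current=$1
    lib=$2
    # Compare against a space-free, comma-delimited copy of the current list.
    case ",$(printf '%s' "$current" | tr -d ' ')," in
        *",$lib,"*) printf '%s\n' "$current" ;;            # already present: unchanged
        ",,")       printf '%s\n' "$lib" ;;                # empty list: just the new lib
        *)          printf '%s,%s\n' "$current" "$lib" ;;  # append to existing libs
    esac
}
```

Because the already-present case returns the input unchanged, calling it on every run keeps the bootstrap idempotent.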
Commit 566b146 added defaults/webserver/00-log-format.conf (log_format bpp_access ...) and wired it into production via docker-compose.infrastructure.yml. The test helper _run_nginx_t builds its own nginx container with explicit mounts and didn't propagate the new file - nginx -t failed with 'unknown log format bpp_access' in 6 different test 14 / 15 variants. Fix: mount the same file in the test container, matching the production configuration. Also create and mount /var/log/nginx-shared/ so the access_log file destination in vhost.conf.template can be opened. Pure test plumbing - no production behavior change.
…erride
- Mount host root (/:/host/root:ro,rslave) so diskspace.plugin reports used/avail/% for ALL host partitions (df), not just container fs. No NETDATA_HOST_PREFIX needed: the image knows the /host/root convention.
- Remove custom healthcheck: it called `wget --spider`, but the netdata image ships no wget (only curl/nc), so the container was ALWAYS reported unhealthy despite a working agent. Image's built-in HEALTHCHECK /usr/sbin/health.sh is correct and maintained upstream.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…boards
Error Monitoring:
- data link on "Error Count Over Time": clicking a server's series sets var-service and filters the "Error Logs" panel
- "Error Logs" made taller (h 16 -> 24) + enableInfiniteScrolling
Top 100 queries (pg_stat_statements):
- companion bar chart "Top 15 by mean execution time"; clicking a bar sets the qid variable and narrows the table to that queryid
- the table honors $qid (empty = all 100); a qid field at the top allows resetting it
- pg_stat_statements has no time axis, so the filter is by queryid (snapshot)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Dashboard "PostgreSQL: Storage & tables" (uid postgresql-storage): database size, largest tables/indexes (top 20), dead tuples & autovacuum, estimated table and index bloat. Datasource: grafana-postgresql-datasource. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ensure-config-files.sh: Grafana dashboards (grafana/provisioning/dashboards/*) are now force-synced from defaults/ on every make up/refresh/run (copy_always, overwriting only when the content differs). Previously copy_if_missing skipped existing files, so an updated dashboard did not reach a live deployment without a manual cp. User-tunable configs (loki/netdata/alloy) stay copy_if_missing. Docs: CLAUDE.md + README describe the force-sync and the full dashboard set (Error Monitoring with the server cross-filter, companion bar chart + click filter on pg_stat_statements, Storage & tables). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
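A force-sync that overwrites only on a content difference can be sketched with cmp. The name `copy_always` matches the helper mentioned in the commit message, but this body is an assumption, not the code in ensure-config-files.sh.

```shell
# Overwrite dst from src only when the content differs or dst is missing.
copy_always() {
    src=$1
    dst=$2
    # cmp -s exits 0 only for byte-identical files; a missing dst also exits non-zero.
    if cmp -s "$src" "$dst"; then
        return 0
    fi
    mkdir -p "$(dirname "$dst")"
    cp "$src" "$dst"
}
```

Skipping identical files leaves their modification time alone, so unchanged dashboards do not look freshly modified on every deploy.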
…Host) The @bpp_login redirect built its URL from $http_host. Under HTTP/3 (QUIC) there is no Host: header, only the :authority pseudo-header, so $http_host is EMPTY and the browser received a 302 to https:///__external_auth/login/?next=https:///... (no domain). Firefox switched to h3 after Alt-Svc and hit the bug; Safari (still on h2) worked. $host takes its value from :authority/Host/server_name, so it is correct under every protocol. We enable h3 in vhost.conf.template (listen 443 quic + Alt-Svc), so this is a real regression for every /grafana /netdata /dozzle /flower request with an expired session. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ce table The dashboard also shows INFO/WARN logs, not only errors -> "Error Monitoring" renamed to "Log Monitoring" (the uid stays error-monitoring so as not to orphan the provisioned dashboard or break bookmarks). Top chart "Log volume by level over time": split by detected_level (stacked bars, per-level colors: error=red, warn=orange, info=green, debug=blue). Clicking a level series sets var-level. New table panel "By service (click to filter)": line count per server in the time range; clicking a row sets var-service. The table is not filtered by $service (it stays a full menu for switching) but respects container/level. Bottom panel renamed to "Logs". Result: filtering by server (click in the table) AND by level (click on a chart series), plus a visual distinction between levels on the chart. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ighten netdata ACL
- ntfy-test: do not print the secret NTFY_TOPIC to stdout (history/CI/tee)
- health-netdata: curl instead of wget (the netdata image has no wget -> it always failed)
- netdata.conf: allow badges/streaming from = Docker networks + localhost instead of * (single agent, no parent/child; * let any container inject metrics)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/Netdata
The Grafana datasource connected as the APPLICATION user (RW in production), and GF_USERS_AUTO_ASSIGN_ORG_ROLE=Admin + the SQL panel meant any logged-in user could run arbitrary DML/DDL. Now there is a separate read-only role bpp_monitor (pg_monitor + pg_read_all_data, no DDL/DML).
- scripts/create-monitoring-user.sh (NEW): idempotent CREATE/ALTER ROLE + grants. Internal: psql via docker exec as superuser, PGPASSWORD passed via -e (not in argv), ON_ERROR_STOP=1. External: prints the SQL. --soft: does not block make up when the DB is not up yet. Password validated against [A-Za-z0-9] (SQL literal).
- datasources.yaml.tpl + postgres.conf.tpl: connect as bpp_monitor (NO fallback to the Django user; the role must exist).
- ensure-config-files.sh: self-heals secrets (DJANGO_BPP_PG_MONITOR_PASSWORD, NTFY_TOPIC), append-only -> git pull && make up works on an old .env without manual steps. _esc now also escapes backslashes. postgres.conf is rendered atomically (tmp+mv) + chmod 600 (password in the DSN).
- pg-monitoring-setup.sh + grant-pg-monitor.sh: external mode detected via BPP_DATABASE_COMPOSE (not by presence of the service; the sentinel is also named dbserver). PGPASSWORD + ON_ERROR_STOP. shared_preload_libraries validated before ALTER. grant-pg-monitor -> alias for create-monitoring-user.
- up/refresh: call create-monitoring-user.sh --soft (the role must exist).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
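Two of the hardening steps in this commit, the [A-Za-z0-9] password check and the atomic 600-mode render, can be sketched as follows. Both function names are hypothetical, and the sketch writes whatever arrives on stdin rather than rendering the real template.

```shell
# Accept only non-empty passwords made of [A-Za-z0-9], which are safe to
# embed in a SQL string literal without escaping.
is_safe_password() {
    case $1 in
        ''|*[!A-Za-z0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Write stdin to target atomically: fill a temp file in the same directory,
# restrict it to the owner, then rename it over the target.
write_secret_file() {
    target=$1
    tmp=$(mktemp "$target.XXXXXX") || return 1
    chmod 600 "$tmp"
    cat > "$tmp" && mv -f "$tmp" "$target"
}
```

Creating the temp file next to the target keeps the final mv on one filesystem, so a reader sees either the old file or the complete new one.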
…tpl + review fixes
Continues the Prometheus->Netdata migration; bundles in-flight datasource work with code-review fixes on PR #1.
bpp_monitor (security):
- Drop pg_read_all_data; keep only pg_monitor. Grafana auto-promotes every authenticated user to Admin and exposes an ad-hoc SQL panel, so a data-read grant would let any Grafana user read employee PII. All shipped dashboards query stat-views / catalog / size functions only (verified) - pg_monitor suffices; the Netdata postgres collector needs only it too.
- pg-monitoring-setup external mode now also emits the bpp_monitor role SQL (was: printed slow-query SQL and exited before creating the monitor user).
- ensure-config-files: warn loudly when the postgres.conf render is skipped, so a stale pre-migration DSN (app superuser) cannot silently persist.
datasource / config rendering (in-flight):
- Force-sync datasources.yaml.tpl (copy_always) so upgrades pick up the bpp_monitor switch + deleteDatasources: Prometheus cleanup.
- Extract generate-grafana-datasources.sh (reads .env from disk, atomic render).
- _ensure_secret treats empty 'VAR=' as missing; default PG port 5432.
- NTFY_SERVER: $(or $(strip ...)) fallback for old .env; qid filter uses ${qid:sqlstring} (no ::bigint crash on non-numeric input).
cleanup:
- Remove `make health-netdata`: Netdata has a built-in image HEALTHCHECK, and the wrapper masked curl failure through the head pipe (always exited 0).
- Remove prometheus.yml + stale health-netdata / grant-pg-monitor doc refs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
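The masked-failure problem comes from how a pipeline reports its status: without pipefail, the shell returns the exit code of the last command, so in `curl ... | head` a curl error is hidden by a successful head. A minimal sketch of one way around it, using a hypothetical helper name:

```shell
# Run a command, fail if it fails, and only then print the first line of
# its output. The command's exit status is checked before any pipe is used.
first_line_or_fail() {
    out=$("$@") || return 1
    printf '%s\n' "$out" | head -n 1
}
```

With this shape, a wrapper such as `first_line_or_fail curl -fsS <url>` exits non-zero when curl cannot reach the agent.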
README still described the old Prometheus stack. Sync it to what ships now:
- add /netdata/ to the monitoring access paths
- config-dir tree: drop prometheus/, add loki/ + netdata/ (go.d, health.d, ntfy)
- "Monitoring i logi": add `make logs-netdata` + `make ntfy-test`
- configure-resources high-risk list: prometheus -> netdata
- services table: replace prometheus row with netdata (metrics + ntfy push)
- server-move section: prometheus_data -> netdata_lib + netdata_cache volumes
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ewhere
Test 7 wrote a custom marker into netdata.conf and asserted it survived re-init (copy_if_missing). netdata.conf is now force-synced (rendered from netdata.conf.tpl for the registry-announce URL), so the marker is overwritten by design and the assertion failed in CI. Test preservation on health_alarm_notify.conf instead, which stays copy_if_missing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
- Replaces prometheus + node-exporter + postgres-exporter (3 containers, ~700MB RAM, zero preconfigured alerts) with a single netdata agent (1 container, ~200MB RAM, hundreds of built-in alerts, 1s resolution)
- Alerts go to public ntfy.sh (random topic stored in .env as NTFY_TOPIC; no Slack, no email, no PagerDuty)
- Keeps loki + alloy + grafana untouched as the log search stack (180d retention for nginx access log, 90d for app, etc.)

What's added
- defaults/netdata/{netdata.conf, go.d/postgres.conf, health_alarm_notify.conf, health.d/}
- nginx location /netdata/ behind existing authserver (regex with named capture, handles subpath proxying correctly)
- init-configs.sh migration: auto-generates NTFY_TOPIC (random 32-hex) for existing deployments, prints subscribe URL once
- mk/monitoring.mk with make ntfy-test, health-netdata, logs-netdata, netdata-shell, grant-pg-monitor
- scripts/grant-pg-monitor.sh: auto-detects internal vs external dbserver mode

What's removed
- … prometheus_data volume
- Grafana dashboards (disk-usage, http-performance, errors, postgresql-health): Netdata has equivalent built-ins
- DJANGO_BPP_ENABLE_PROMETHEUS default flipped to false (django-prometheus middleware was pure overhead)
- local_overrides.yml (only purpose was disabling node-exporter)
- defaults/prometheus/ directory kept as historical artifact (delete in follow-up if no rollback needed).

Backwards compatibility
- .env files without NTFY_TOPIC parse cleanly (${NTFY_TOPIC:-} default in compose)
- make init-configs migrates existing deployments (idempotent: won't regenerate topic and break phone subscriptions)
- PROMETHEUS_* / NODE_EXPORTER_* / PG_EXPORTER_* env vars are harmless (Compose ignores unreferenced vars)
- prometheus_data Docker volume becomes orphan after deploy, cleaned by make prune-orphan-volumes

Test plan
- git checkout feat/netdata-monitoring
- make init-configs: verify NTFY_TOPIC appears in .env, subscribe URL printed
- make refresh: verify netdata pulls and starts; prometheus/exporter containers gone
- make ntfy-test: confirm push notification arrives on phone
- make grant-pg-monitor: confirm idempotent (run twice)
- https://<host>/netdata/: verify dashboard loads through authserver
- make health-netdata: confirm agent reports healthy
- … pg_stat_* data (no permission errors)
- … /grafana/)
- make prune-orphan-volumes to remove prometheus_data

Plan & history
Full implementation plan: docs/superpowers/plans/2026-05-31-netdata-monitoring.md (in branch)
13 commits, organized as: plan → Phase 1 (additive, 8 commits) → Phase 2 (removal, 2 commits) → docs polish (2 commits).
🤖 Generated with Claude Code