feat: replace Prometheus monitoring with Netdata + ntfy.sh alerts by mpasternak · Pull Request #1 · iplweb/bpp-deploy

mpasternak · 2026-05-31T12:55:33Z

Summary

Replace prometheus + node-exporter + postgres-exporter (3 containers, ~700MB RAM, zero preconfigured alerts) with a single netdata agent (1 container, ~200MB RAM, hundreds of built-in alerts, 1s resolution)
Alerts push directly to phone via ntfy.sh (random per-deployment topic stored in .env as NTFY_TOPIC — no Slack, no email, no PagerDuty)
Keep loki + alloy + grafana untouched as the log search stack (180d retention for nginx access log, 90d for app, etc.)

What's added

defaults/netdata/{netdata.conf, go.d/postgres.conf, health_alarm_notify.conf, health.d/}
Nginx location /netdata/ behind existing authserver (regex with named capture — handles subpath proxying correctly)
init-configs.sh migration: auto-generates NTFY_TOPIC (random 32-hex) for existing deployments, prints subscribe URL once
mk/monitoring.mk with make ntfy-test, health-netdata, logs-netdata, netdata-shell, grant-pg-monitor
scripts/grant-pg-monitor.sh — auto-detects internal vs external dbserver mode

What's removed

Three monitoring services + prometheus_data volume
4 Grafana dashboards that depended on Prometheus (disk-usage, http-performance, errors, postgresql-health) — Netdata has equivalent built-ins
Prometheus datasource in Grafana provisioning (Loki promoted to default)
DJANGO_BPP_ENABLE_PROMETHEUS default flipped to false (django-prometheus middleware was pure overhead)
macOS local_overrides.yml (only purpose was disabling node-exporter)

defaults/prometheus/ directory kept as historical artifact (delete in follow-up if no rollback needed).

Backwards compatibility

Old .env files without NTFY_TOPIC parse cleanly (${NTFY_TOPIC:-} default in compose)
make init-configs migrates existing deployments (idempotent — won't regenerate topic and break phone subscriptions)
Stale PROMETHEUS_* / NODE_EXPORTER_* / PG_EXPORTER_* env vars are harmless (Compose ignores unreferenced vars)
prometheus_data Docker volume becomes orphan after deploy — cleaned by make prune-orphan-volumes

Test plan

Plan & history

Full implementation plan: docs/superpowers/plans/2026-05-31-netdata-monitoring.md (in branch)

13 commits, organized as: plan → Phase 1 (additive, 8 commits) → Phase 2 (removal, 2 commits) → docs polish (2 commits).

🤖 Generated with Claude Code

10-task phased plan (Phase 1: additive, Phase 2: removal). Replaces prometheus + node-exporter + postgres-exporter with one netdata agent; keeps Loki + Alloy + Grafana for logs. Alerts go to public ntfy.sh with random topic stored in .env. Phase 1 leaves the deployment in a working dual-stack state so the user can validate Netdata for 24h before Phase 2 removes the old Prometheus services.

Adds the netdata agent (v1.99.0) with full host visibility, the Docker socket for container auto-discovery, named volumes for persistent state, and resource limits (256m/1.0 default). The service is added but not yet started in this commit - configs come in subsequent tasks.

No other service in the project sets container_name explicitly - all rely on Compose's default ${COMPOSE_PROJECT_NAME}_<service>_1 naming. Forcing 'container_name: netdata' would break multi-stack hosts (dev + prod on one machine) with 'container name already in use' errors. Netdata's node-label-in-dashboard is set by 'hostname:' not 'container_name:' - that line stays.

netdata.conf disables registry + binds 0.0.0.0:19999 (reverse-proxied via nginx /netdata/ - not exposed on host). postgres.conf builds DSN from ${PG_*} env vars (works for both internal and external DB modes). health_alarm_notify.conf is shell-sourced override that enables only ntfy channel and routes all roles to ${NTFY_SERVER}/${NTFY_TOPIC}.

ensure-config-files.sh now recursively copies defaults/netdata/ to BPP_CONFIGS_DIR/netdata/ (copy_if_missing - non-destructive). init-configs.sh generates random NTFY_TOPIC (openssl rand -hex 16) for existing deployments missing the var, and ensures DJANGO_BPP_NTFY_SERVER defaults to https://ntfy.sh. Topic is a secret (anyone with the URL reads alerts), so it's logged once during the migration with a 'subscribe in app' hint.
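The idempotent topic migration described above can be sketched roughly as follows. This is a minimal illustration, not the code from init-configs.sh: the function names `ensure_ntfy_topic` and `random_hex32` are hypothetical, and the `od` fallback is an assumption added only so the sketch runs without openssl.

```shell
# Sketch only: ensure_ntfy_topic and random_hex32 are hypothetical names.
random_hex32() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -hex 16          # 16 random bytes -> 32 hex characters
    else
        # Fallback is an assumption of this sketch, not part of the PR.
        od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
    fi
}

ensure_ntfy_topic() {
    local env_file="$1" topic
    # Idempotent: an existing topic is never regenerated, so phone
    # subscriptions keep working across repeated runs.
    if grep -q '^NTFY_TOPIC=' "$env_file" 2>/dev/null; then
        return 0
    fi
    topic="$(random_hex32)" || return 1
    printf 'NTFY_TOPIC=%s\n' "$topic" >> "$env_file"
    # The topic is a secret, so it is shown exactly once, at creation time.
    printf 'Subscribe in the ntfy app to topic: %s\n' "$topic"
}
```

Calling `ensure_ntfy_topic .env` twice leaves a single NTFY_TOPIC line in the file and prints the subscribe hint only on the first call.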

Same pattern as /grafana/ - auth_request to /_bpp_superuser_auth gates access, trailing-slash proxy_pass strips the /netdata/ prefix. WebSocket headers enabled (Netdata uses WS for live charts), buffering disabled for stream-style data.

make ntfy-test - sends test push to NTFY_TOPIC from .env (confirms phone subscription works)
make health-netdata - curl /api/v1/info via nginx and direct
make logs-netdata - tail netdata container logs
make netdata-shell - exec bash in netdata container

scripts/grant-pg-monitor.sh detects internal vs external DB mode. Internal: execs psql in dbserver and runs GRANT pg_monitor. External: prints the SQL for the DBA to run manually. Idempotent - GRANT can be re-run safely.

Three issues caught by cross-task Phase 1 review:
- nginx /netdata/ used trailing-slash proxy_pass, stripping the URI prefix - Netdata then generated /static/* asset URLs that browser resolved to root (Django 404). Switch to regex location with named capture, preserve prefix, add X-Forwarded-Url for autodetection. Also add /netdata -> /netdata/ redirect for typed URLs without slash.
- make health-netdata curled via nginx, hit auth_request, got 302 redirect, displayed as 'HTTP 302' looking like failure. Drop the nginx hop - container-direct check is the meaningful signal.
- ensure-config-files.sh recursive copy was catching .gitkeep and leaking it to user's configs dir. Exclude with -not -name.
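The .gitkeep fix in the third item could look roughly like this. It is a sketch under assumptions: `copy_tree_if_missing` is a hypothetical name and the real ensure-config-files.sh may be structured differently. Only the `-not -name` exclusion and the non-destructive copy behaviour come from the commit messages.

```shell
# Hypothetical helper: recursive, non-destructive copy of a defaults tree.
copy_tree_if_missing() {
    local src="$1" dst="$2" rel
    # Recreate the directory layout first (e.g. an empty health.d/).
    (cd "$src" && find . -type d) | while IFS= read -r rel; do
        mkdir -p "$dst/$rel"
    done
    # Copy regular files, skipping .gitkeep placeholders and existing targets.
    (cd "$src" && find . -type f -not -name '.gitkeep') | while IFS= read -r rel; do
        [ -e "$dst/$rel" ] || cp "$src/$rel" "$dst/$rel"
    done
}
```

Because directories are created separately from files, an otherwise empty health.d/ still appears in the target even though its .gitkeep placeholder is skipped.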

ntfy-test, health-netdata, logs-netdata, netdata-shell, grant-pg-monitor - all referenced in CLAUDE.md as the make-target source of truth but missing from 'make help' until now.

…porter

Netdata (added in Phase 1) replaces all three: host metrics (node-exporter), Postgres stats (postgres-exporter via go.d/postgres), and time-series storage (prometheus). One container, ~200MB RAM, preconfigured alerts, push to ntfy.sh - vs three containers and zero alerts.

Changes:
- docker-compose.monitoring.yml: remove 3 services + prometheus_data volume
- defaults/grafana/provisioning/datasources/datasources.yaml.tpl: remove Prometheus datasource, promote Loki to default
- defaults/grafana/provisioning/dashboards/: delete disk-usage.json, http-performance.json, errors.json (all 3 referenced Prometheus - Netdata has equivalent built-in dashboards)
- scripts/ensure-config-files.sh: drop prometheus seeding
- scripts/configure-resources.sh: drop prometheus tunable, add netdata
- scripts/upgrade-postgres.sh: drop postgres-exporter stop/restart lines
- scripts/init-configs.sh + defaults/docker-compose.local_overrides.yml: drop node-exporter macOS override (entire file - no other content), also drop the include in docker-compose.yml and the .gitignore entry
- docker-compose.database.external.yml: drop postgres-exporter from explanatory comments
- docker-compose.yml: update volume-list comment

defaults/prometheus/ kept as historical artifact - delete later if no rollback path needed.

Existing deployments: prometheus_data volume becomes orphan, will be cleaned by 'make prune-orphan-volumes'. PROMETHEUS_*, NODE_EXPORTER_*, PG_EXPORTER_* env vars in user .env files are harmless (Docker Compose ignores unreferenced vars). No migration needed.

- Delete defaults/grafana/provisioning/dashboards/postgresql-health.json (referenced the removed Prometheus datasource in 64 places; Netdata Postgres collector dashboards cover the same metrics at /netdata/).
- Flip DJANGO_BPP_ENABLE_PROMETHEUS default from true to false in docker-compose.application.yml. Nothing scrapes /metrics anymore, django-prometheus middleware is pure overhead. Existing deployments that set the var in .env keep their value (backwards compat).

Reflects the architectural change: netdata replaces prometheus + node-exporter + postgres-exporter (Phase 2 commit 16ae9c1 / 8cf921f). Loki + Alloy + Grafana stay for logs.

Updated sections:
- Architecture Overview > Services > Monitoring
- Architecture Overview > Data Flow (metrics path)
- Monitoring Access (URLs + new ntfy info)
- Logging (drop Prometheus retention, add Netdata tiered retention)
- Make Targets (new ntfy/netdata commands)
- Resource Limits (drop prometheus/exporters, add netdata)

tests/test_makefile.sh asserted presence of prometheus dir + config, which Phase 2 (commit 16ae9c1) removed. Switched assertions to the new netdata structure: netdata.conf, go.d/postgres.conf, health_alarm_notify.conf, plus health.d/ directory. CI test_init_configs_creates_structure, test_init_configs_copies_templates, and test_init_configs_no_overwrite now reflect post-migration state.

v1.99.0 did not exist on Docker Hub (planning oversight - I picked a placeholder version without verifying). Netdata jumped from v1.47 straight to v2.0 - no v1.99 release line.

v2 split [global] into [global] + [db], so the dbengine directives move:
- memory mode -> [db] mode
- page cache size -> [db] dbengine page cache size
- dbengine multihost disk space -> [db] dbengine tier 0 retention size

v2 maintains backwards-compat with v1 directive names (logs deprecation warnings) but it is cleaner to use the current idiom upfront. Sizing kept identical: 512MB tier 0 retention, 32MB page cache, 1s update interval.

NETDATA_DISABLE_CLOUD=1 turns off the agent-side Cloud client entirely. Without it, Netdata v2's first-time dashboard pops a 'please connect your agent / docker exec netdata...' dialog even though the user is already authed via authserver+nginx. DO_NOT_TRACK and DISABLE_TELEMETRY (already set) only suppress anonymous-stats phone-home, not the Cloud claim prompt - those are separate code paths.

tests/test_makefile.sh Test 14e ran nginx container that created letsencrypt cert files as root (container default user). Teardown's bare rm failed on Ubuntu GHA (runner user != root) with 'Permission denied' - the test itself passed all assertions, only cleanup exit code was non-zero.

macOS Docker Desktop uses a user-mapped VM, so files appear as runner-owned and rm works there. Ubuntu runs native Docker with no user mapping. This pre-existing failure has been red on main for every commit since the LE test was added.

Fix: try sudo rm first (GHA has passwordless sudo), fall back to plain rm (no-op since files already gone, or harmless error message on dev machine without sudo). Applied to all rm sites in Test 14 (LE certs) and Test 15 (runtime ssl/) that touch dirs populated by docker run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
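A minimal sketch of the "sudo rm first, plain rm as fallback" teardown might look like the following. The helper name `force_rm` is hypothetical, and the use of `sudo -n` (fail instead of prompting for a password) is an assumption of this sketch rather than something the commit message states.

```shell
# Hypothetical teardown helper for directories populated by `docker run`.
force_rm() {
    # Try a non-interactive sudo first: root-owned files created by the
    # container can then be removed on CI runners with passwordless sudo.
    # If sudo is missing or would prompt, fall back to a plain rm.
    sudo -n rm -rf -- "$@" 2>/dev/null || rm -rf -- "$@"
}
```

On a machine without sudo the first command fails quietly and the plain `rm -rf` runs, which is enough when the files are owned by the current user.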

Two bugs caught on the deployment host:

1) scripts/grant-pg-monitor.sh sourced .env as bash, breaking on shell-unfriendly values like 'EMAIL=Name <addr@domain>' (the < is parsed as redirect). Switched to grep-based extraction matching the get_env_var helper pattern from init-configs.sh.

2) defaults/netdata/go.d/postgres.conf used ${PG_*} env var placeholders, assuming go.d.plugin would substitute them. It does not (verified on v2.10.3): the literal string went through to the URL parser which choked on ':${PG_PORT}'. Reworked as a template (.tpl) rendered host-side by ensure-config-files.sh on every make up/refresh - so password changes in .env propagate on next deploy. Removed now-useless PG_* env vars from compose (NTFY_* stay - those ARE used because health_alarm_notify.conf is bash-sourced). Auto-generated file lives at $BPP_CONFIGS_DIR/netdata/go.d/postgres.conf with a clear DO-NOT-EDIT header.
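To illustrate the first fix, here is one way a grep-based lookup can read a value from .env without ever evaluating the file as shell code. It is a hypothetical stand-in: the real get_env_var helper in init-configs.sh is not shown in this PR page, and its argument order and quoting rules may differ.

```shell
# Hypothetical stand-in for a get_env_var-style helper.
get_env_var() {
    local name="$1" env_file="$2" line
    # Take the last matching assignment; the file is only read as text,
    # so characters such as < > $ or spaces in values are harmless.
    line="$(grep -E "^${name}=" "$env_file" | tail -n 1)"
    [ -n "$line" ] || return 1
    # Strip everything up to and including the first '='.
    printf '%s\n' "${line#*=}"
}
```

With a line such as `EMAIL=Name <addr@domain>` in the file, `get_env_var EMAIL .env` prints the value verbatim, whereas sourcing the same file would try to open a file named `addr@domain` for input.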

Test 4 (init-configs copies templates) was asserting that $CONFIG_DIR/netdata/go.d/postgres.conf exists after init-configs. After the .tpl rendering refactor, that file is generated from the .tpl by ensure-config-files.sh - but only when .env exists, and init-configs.sh invoked ensure-config-files BEFORE creating .env (intentionally - to seed the directory layout first). Fix: invoke ensure-config-files.sh a second time at the end of init-configs, after .env is fully populated. Idempotent - just re-renders postgres.conf (and any other .env-dependent configs we add later).

…_log to Loki

Adds nginx monitoring in Netdata and the access_log in Grafana/Loki.

nginx (stub_status) -> netdata:
- default.conf.template: internal server { listen 8090; /stub_status } (port not published in compose, reachable only via netdata->webserver:8090)
- defaults/netdata/go.d/nginx.conf: collector for live connection metrics

access_log -> Loki + web_log:
- 00-log-format.conf: bpp_access format (combined + request_time/upstream_response_time/request_length), loaded in the http context
- vhost.conf.template: two access_log sinks in bpp_access format: /dev/stdout (-> Alloy -> Loki) AND a file on the nginx_access_log volume
- defaults/netdata/go.d/web_log.conf: collector for metrics from the access log (HTTP codes, latency) + 5xx/latency alerts; log_type auto + escape hatch
- infrastructure.yml: mount 00-log-format.conf, nginx_access_log volume, rotation script + Ofelia label (04:10)
- monitoring.yml: nginx_access_log mounted read-only into netdata
- scripts/nginx-access-log-rotate.sh: mv .1 + nginx -s reopen (the Docker log driver does not rotate files, only stdout/stderr)

cleanup after the Prometheus->Netdata migration:
- datasources.yaml.tpl: deleteDatasources Prometheus (removes the dead datasource from grafana_data on upgraded installations)

tests + docs:
- test_makefile.sh: assertions for nginx.conf/web_log.conf
- CLAUDE.md: sections on go.d collectors, nginx access log, data flow

Verified: nginx -t (real nginx:1.29.7 container), docker compose config (merge of 7 files, the volume resolves cross-file), make init-configs (new go.d files are copied).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
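The "mv .1 + nginx -s reopen" rotation can be sketched as below. This is an illustration under assumptions, not the contents of scripts/nginx-access-log-rotate.sh: the function name `rotate_access_log` is hypothetical, and the reopen command is passed in as arguments so the sketch stays independent of where nginx runs.

```shell
# Hypothetical rotation helper: first argument is the log file, the
# remaining arguments are the command that makes nginx reopen its logs.
rotate_access_log() {
    local log="$1"
    shift
    [ -f "$log" ] || return 0
    # After the rename nginx keeps writing to the same open file (now *.1) ...
    mv -f "$log" "$log.1" || return 1
    # ... until it is told to reopen its logs, which recreates "$log".
    "$@"
}
```

A call might look like `rotate_access_log "$LOG" nginx -s reopen`, where `$LOG` is the access log path on the shared volume (the exact file name is not given in the PR). Keeping a single `.1` generation means each run overwrites the previous rotated file.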

Two complementary channels for tracking slow PostgreSQL queries, both reusing existing infrastructure (Loki + Grafana + PostgreSQL datasource):
- log_min_duration_statement=1000: every query >1s logged to dbserver log -> Alloy -> Loki (90d retention). Grafana dashboard 'Slow queries (log)' renders via LogQL with regex extraction of duration and query text. Natural time-windowing via UI time picker.
- pg_stat_statements: aggregated stats per normalized query (calls, mean/total/stddev exec time, rows). Grafana dashboard 'Top 100 queries (pg_stat_statements)' via existing PostgreSQL datasource. Manual pg_stat_statements_reset() for rolling time windows.

Bootstrap: make pg-monitoring-setup
- ALTER SYSTEM SET log_min_duration_statement = 1000 + reload
- Append pg_stat_statements to shared_preload_libraries (preserving existing libs), restart dbserver, CREATE EXTENSION
- Idempotent, detects external DB mode (prints SQL for DBA)
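The "append while preserving existing libs" step is essentially a small string operation on the current shared_preload_libraries value. The sketch below shows one way to do it idempotently. The name `add_preload_lib` is hypothetical and the actual pg-monitoring-setup script may compute the value differently.

```shell
# Hypothetical helper: print the new comma-separated library list.
add_preload_lib() {
    local current="$1" lib="$2"
    # Compare against a space-free copy wrapped in commas so that the
    # library name only matches as a whole list element.
    case ",$(printf '%s' "$current" | tr -d ' ')," in
        *",$lib,"*) printf '%s\n' "$current" ;;        # already present
        ",,")       printf '%s\n' "$lib" ;;            # list was empty
        *)          printf '%s\n' "$current,$lib" ;;   # append to the end
    esac
}
```

The result would then be fed to an `ALTER SYSTEM SET shared_preload_libraries = ...` statement, followed by the dbserver restart mentioned above, since this setting is only read at server start.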

Commit 566b146 added defaults/webserver/00-log-format.conf (log_format bpp_access ...) and wired it into production via docker-compose.infrastructure.yml. The test helper _run_nginx_t builds its own nginx container with explicit mounts and didn't propagate the new file - nginx -t failed with 'unknown log format bpp_access' in 6 different test 14 / 15 variants. Fix: mount the same file in the test container, matching the production configuration. Also create and mount /var/log/nginx-shared/ so the access_log file destination in vhost.conf.template can be opened. Pure test plumbing - no production behavior change.

…erride

- Mount host root (/:/host/root:ro,rslave) so diskspace.plugin reports used/avail/% for ALL host partitions (df), not just container fs. No NETDATA_HOST_PREFIX needed - image knows the /host/root convention.
- Remove custom healthcheck: it called `wget --spider`, but the netdata image ships no wget (only curl/nc), so the container was ALWAYS reported unhealthy despite a working agent. Image's built-in HEALTHCHECK /usr/sbin/health.sh is correct and maintained upstream.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…boards

Error Monitoring:
- data link on "Error Count Over Time": clicking a server's series sets var-service and filters the "Error Logs" panel
- "Error Logs" made taller (h 16 -> 24) + enableInfiniteScrolling

Top 100 queries (pg_stat_statements):
- companion bar chart "Top 15 by mean execution time"; clicking a bar sets the qid variable and narrows the table to that queryid
- the table honors $qid (empty = all 100); a qid field at the top allows resetting it
- pg_stat_statements has no time axis, so the filter is by queryid (a snapshot)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Dashboard "PostgreSQL: Storage & tables" (uid postgresql-storage): database size, largest tables/indexes (top 20), dead tuples & autovacuum, estimated table and index bloat. Datasource grafana-postgresql-datasource.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ensure-config-files.sh: Grafana dashboards (grafana/provisioning/dashboards/*) are now force-synced from defaults/ on every make up/refresh/run (copy_always, overwriting only when the content differs). Previously copy_if_missing skipped existing files, so an updated dashboard did not reach a live deployment without a manual cp. User-tunable configs (loki/netdata/alloy) stay copy_if_missing.

Docs: CLAUDE.md + README describe the force-sync and the full set of dashboards (Error Monitoring with the server cross-filter, companion bar chart + click filter on pg_stat_statements, Storage & tables).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
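A force-sync that overwrites only when the content differs can be expressed with `cmp -s`. The sketch below borrows the name copy_always from the commit message, but the body is an assumption about how such a helper might work, not the code from ensure-config-files.sh.

```shell
# Sketch of a copy_always-style helper (implementation is hypothetical).
copy_always() {
    local src="$1" dst="$2"
    # cmp -s prints nothing and exits non-zero when the files differ
    # or the destination does not exist yet.
    if ! cmp -s "$src" "$dst" 2>/dev/null; then
        cp "$src" "$dst"
    fi
}
```

Skipping the copy for identical files keeps the destination's modification time stable, so repeated runs do not look like changes to anything watching the directory.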

…Host)

The @bpp_login redirect built its URL from $http_host. Under HTTP/3 (QUIC) there is no Host: header, only the :authority pseudo-header, so $http_host is EMPTY and the browser received a 302 to https:///__external_auth/login/?next=https:///... (no domain). Firefox switched to h3 after Alt-Svc and hit the bug; Safari (still on h2) worked. $host takes its value from :authority/Host/server_name, so it is correct across all protocols. We enable h3 in vhost.conf.template (listen 443 quic + Alt-Svc), so this is a real regression for every /grafana /netdata /dozzle /flower request with an expired session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ce table

The dashboard also covers INFO/WARN, not only errors -> "Error Monitoring" renamed to "Log Monitoring" (uid stays error-monitoring so as not to orphan the provisioned dashboard or break bookmarks).

Top chart "Log volume by level over time": split by detected_level (stacked bars, per-level colors: error=red, warn=orange, info=green, debug=blue). Clicking a level series sets var-level.

New table panel "By service (click to filter)": line count per server within the time range; clicking a row sets var-service. The table is not filtered by $service (it stays a full menu for switching) but respects container/level.

Bottom panel renamed to "Logs".

Result: filtering by server (click in the table) AND by level (click on a chart series), plus visual distinction of levels on the chart.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ighten netdata ACL

- ntfy-test: do not print the secret NTFY_TOPIC to stdout (history/CI/tee)
- health-netdata: curl instead of wget (the netdata image has no wget -> it always failed)
- netdata.conf: allow badges/streaming from = Docker networks + localhost instead of * (single agent, no parent/child; * let any container inject metrics)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…/Netdata

The Grafana datasource connected as the APPLICATION user (RW on production), and GF_USERS_AUTO_ASSIGN_ORG_ROLE=Admin + the SQL panel meant any logged-in user could run arbitrary DML/DDL. Now there is a separate read-only role bpp_monitor (pg_monitor + pg_read_all_data, no DDL/DML).

- scripts/create-monitoring-user.sh (NEW): idempotent CREATE/ALTER ROLE + grants. Internal: psql via docker exec as superuser, PGPASSWORD passed via -e (not in argv), ON_ERROR_STOP=1. External: prints the SQL. --soft: does not block make up when the DB is not up yet. Password validated against [A-Za-z0-9] (SQL literal).
- datasources.yaml.tpl + postgres.conf.tpl: connect as bpp_monitor (NO fallback to the Django user - the role must exist).
- ensure-config-files.sh: self-heals secrets (DJANGO_BPP_PG_MONITOR_PASSWORD, NTFY_TOPIC) append-only -> git pull && make up on an old .env works without manual steps. _esc now escapes backslashes. postgres.conf rendered atomically (tmp+mv) + chmod 600 (password in the DSN).
- pg-monitoring-setup.sh + grant-pg-monitor.sh: external mode detected via BPP_DATABASE_COMPOSE (not the presence of the service - the sentinel is also named dbserver). PGPASSWORD + ON_ERROR_STOP. shared_preload_libraries validated before ALTER. grant-pg-monitor -> alias for create-monitoring-user.
- up/refresh: call create-monitoring-user.sh --soft (the role must exist).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
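Two of the hardening steps in this commit, the [A-Za-z0-9] password check and the atomic tmp+mv render with mode 600, can be sketched as small helpers. Both names (`is_safe_password`, `write_secret_file`) are hypothetical and the real scripts may differ. The sketch only illustrates the technique.

```shell
# Hypothetical check: accept only non-empty alphanumeric passwords, so the
# value can be embedded in a SQL string literal without escaping.
is_safe_password() {
    case "$1" in
        ''|*[!A-Za-z0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical writer: stdin -> "$1", atomically and with mode 600.
write_secret_file() {
    local target="$1" tmp
    # The temp file lives next to the target so mv is a same-filesystem rename.
    tmp="$(mktemp "$target.XXXXXX")" || return 1
    # Restrict permissions before any secret is written.
    if chmod 600 "$tmp" && cat > "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}
```

A renderer would pipe its output into `write_secret_file`, so a reader never observes a half-written file: it sees either the old content or the complete new one.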

…tpl + review fixes

Continues the Prometheus->Netdata migration; bundles in-flight datasource work with code-review fixes on PR #1.

bpp_monitor (security):
- Drop pg_read_all_data; keep only pg_monitor. Grafana auto-promotes every authenticated user to Admin and exposes an ad-hoc SQL panel, so a data-read grant would let any Grafana user read employee PII. All shipped dashboards query stat-views / catalog / size functions only (verified) - pg_monitor suffices; the Netdata postgres collector needs only it too.
- pg-monitoring-setup external mode now also emits the bpp_monitor role SQL (was: printed slow-query SQL and exited before creating the monitor user).
- ensure-config-files: warn loudly when the postgres.conf render is skipped, so a stale pre-migration DSN (app superuser) cannot silently persist.

datasource / config rendering (in-flight):
- Force-sync datasources.yaml.tpl (copy_always) so upgrades pick up the bpp_monitor switch + deleteDatasources: Prometheus cleanup.
- Extract generate-grafana-datasources.sh (reads .env from disk, atomic render).
- _ensure_secret treats empty 'VAR=' as missing; default PG port 5432.
- NTFY_SERVER: $(or $(strip ...)) fallback for old .env; qid filter uses ${qid:sqlstring} (no ::bigint crash on non-numeric input).

cleanup:
- Remove `make health-netdata`: Netdata has a built-in image HEALTHCHECK, and the wrapper masked curl failure through the head pipe (always exited 0).
- Remove prometheus.yml + stale health-netdata / grant-pg-monitor doc refs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
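The masked failure mentioned in the cleanup notes is a general shell pitfall: a pipeline reports the exit status of its last command. The sketch below reproduces it with a generic command instead of curl, and shows one portable way around it. Both function names are hypothetical, and this is not the removed make target.

```shell
# Pitfall: the pipeline's status is that of `head`, so a failing
# producer is still reported as success.
masked_check() {
    "$@" | head -n 5
}

# One fix that works in plain POSIX sh: run the command first, remember
# its status, and only then trim the output.
strict_check() {
    local out status
    out="$("$@")"
    status=$?
    printf '%s\n' "$out" | head -n 5
    return "$status"
}
```

In bash, `set -o pipefail` is another option, but capturing the status explicitly also works in shells that lack it.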

README still described the old Prometheus stack. Sync it to what ships now:
- add /netdata/ to the monitoring access paths
- config-dir tree: drop prometheus/, add loki/ + netdata/ (go.d, health.d, ntfy)
- "Monitoring i logi": add `make logs-netdata` + `make ntfy-test`
- configure-resources high-risk list: prometheus -> netdata
- services table: replace prometheus row with netdata (metrics + ntfy push)
- server-move section: prometheus_data -> netdata_lib + netdata_cache volumes

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ewhere Test 7 wrote a custom marker into netdata.conf and asserted it survived re-init (copy_if_missing). netdata.conf is now force-synced (rendered from netdata.conf.tpl for the registry-announce URL), so the marker is overwritten by design and the assertion failed in CI. Test preservation on health_alarm_notify.conf instead, which stays copy_if_missing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mpasternak and others added 30 commits May 31, 2026 11:06

feat(netdata): script + make target for pg_monitor grant

f80b07a

docs(make): document new netdata + ntfy targets in make help

67616bb

mpasternak and others added 3 commits May 31, 2026 22:14

mpasternak merged commit 28b3515 into main May 31, 2026
5 checks passed

mpasternak deleted the feat/netdata-monitoring branch May 31, 2026 21:14

mpasternak merged 33 commits into main from feat/netdata-monitoring
