Telemon

Lightweight, self-managing system health monitor with intelligent alerts. Zero maintenance. Zero spam.

Telemon is a single-file Bash script that monitors your Linux server — CPU, memory, disk, containers, services, ports, SSL certs, hardware health, databases, and more — and only alerts when something actually changes. No spam, just signal. It runs via cron every 5 minutes and requires zero ongoing maintenance.

🚀 One-Line Install

curl -fsSL https://raw.githubusercontent.com/SwordfishTrumpet/telemon/main/install.sh | bash

That's it. The installer prompts for Telegram credentials and configures everything automatically.

For silent/CI/CD installs (no prompts):

# Set the variables on the bash side of the pipe; a prefix on curl would not reach the installer
curl -fsSL https://raw.githubusercontent.com/SwordfishTrumpet/telemon/main/install.sh \
  | TELEGRAM_BOT_TOKEN="xxx" TELEGRAM_CHAT_ID="yyy" bash -s -- --silent

📖 Full installation options | 🔧 Manual install

Why Telemon?

Key Strengths

Strength	Description
Zero Dependencies	Core monitoring works with just `bash` + `curl`. All optional features auto-detect and gracefully skip if tools are missing.
Stateful Alert Tracking	Only alerts on state changes (OK→WARNING→CRITICAL). Confirmation count + per-key cooldowns prevent false alarms and spam.
Self-Managing	Self-rotating logs, automatic stale lock cleanup, retry queues for failed alerts. Runs indefinitely without maintenance.
Security-First	Secrets never passed on command lines, input validation, SSRF protection, atomic file writes with symlink protection, HTML escaping.
Battle-Tested	Portable across GNU Linux and BSD, handles edge cases (hung commands, overlapping runs, flapping checks).
Auto-Discovery	Scans your system and suggests configuration for detected hardware, services, databases, and applications.
Enterprise Features	Fleet monitoring (multi-server), predictive resource exhaustion, config drift detection, audit logging, auto-remediation, maintenance windows.

How Alerting Works

Telemon uses stateful tracking — it remembers the previous state of each check and only notifies on transitions. This eliminates false alarms and alert spam:

Check 1: CPU=85% → count=1/3, silent (collecting evidence)
Check 2: CPU=88% → count=2/3, silent (still collecting)
Check 3: CPU=87% → count=3/3, ALERT SENT (confirmed problem)
Check 4: CPU=87% → silent (already alerted)
Check 5: CPU=40% → RESOLVED ALERT SENT (problem cleared)

Confirmation count: Problem must persist for N consecutive checks (default: 3 checks, about 15 minutes at the 5-minute interval)
Rate limiting: Per-key cooldown prevents flapping floods (default: 15 min)
Recovery alerts: Get notified when problems clear (CRITICAL → OK transitions trigger alerts)
Retry queue: Failed Telegram alerts retry next cycle

Dependencies

Core (Always Required)

Dependency	Purpose	Can Skip?
`bash` 4.0+	Script execution	No
`curl`	Telegram API, HTTP checks	No
`/proc/*`	CPU, memory, I/O, network metrics	No (Linux-specific)

Alert Channels (At least one recommended)

Channel	Dependency	Required For
Telegram	`curl`	Primary alerts
Webhook	`python3`	Slack/Discord/ntfy/n8n integration
Email	`curl` (SMTP) or `sendmail`/`msmtp`	Email alerts

Optional Checks (Auto-Detect, Gracefully Skip)

Check	Dependency	Detection
Docker containers	`docker`	Auto-enabled if command found
PM2 processes	`pm2`, `python3`	Auto-enabled if both found
NVMe/SMART health	`smartctl`	Auto-detected via `telemon-admin.sh discover`
CPU temperature	`sensors` (lm-sensors)	Auto-detected via discover
GPU (NVIDIA)	`nvidia-smi`	Auto-detected via discover
GPU (Intel)	`intel_gpu_top`	Auto-detected via discover
UPS/Battery	`upower` or `apcaccess`	Auto-detected via discover
DNS	`dig`/`nslookup`/`host`	First available used
MySQL/MariaDB	`mysql`/`mariadb`	Auto-detected via discover
PostgreSQL	`psql`	Auto-detected via discover
Redis	`redis-cli`	Auto-detected via discover
SQLite3	`sqlite3`	Config-driven
ODBC	`isql` (unixODBC)	Config-driven
File integrity	`sha256sum`/`shasum`/`openssl`	Any available

Administrative (Optional)

Tool	Purpose	Fallback
`flock` (util-linux)	Atomic lock file	PID file mechanism
`python3`	Webhooks, JSON export, escalation	Features disabled
`awk`	Predictive alerts	Pure Bash math
`logrotate`	Log rotation	Self-rotating logs

Design Philosophy

"Graceful skip if dependency missing"

Every optional check follows this pattern:

if ! command -v mytool &>/dev/null; then
    log "INFO" "myfeature check: mytool not installed — skipping"
    return
fi

Bottom line: Telemon runs on virtually any Linux system with just bash and curl. All advanced features are opt-in and auto-detect available tools.

Features

Core System Monitoring

CPU Load — 1-minute load average as percentage of available cores
Memory — Available memory percentage (inverted thresholds: lower = worse)
Disk Space — Per-partition monitoring, auto-filters tmpfs/overlay/snap
Swap Usage — Swap partition monitoring, gracefully skips if no swap
I/O Wait — CPU time spent waiting for disk I/O (stateful differential sampling)
Zombie Processes — Detects processes stuck in Z state
Internet Connectivity — Ping-based reachability with configurable target
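
The "differential sampling" behind the I/O wait check can be illustrated with a small sketch. It assumes the check keeps the aggregate `cpu` line from `/proc/stat` between runs and compares it with the current one; the function name `iowait_pct` is hypothetical and not taken from Telemon's source.

```bash
#!/usr/bin/env bash
# Sketch only: I/O wait percentage between two samples of the aggregate "cpu"
# line from /proc/stat. Field order after the label:
#   user nice system idle iowait irq softirq steal guest guest_nice

# iowait_pct PREV_LINE CUR_LINE -> prints an integer percentage
iowait_pct() {
    local -a prev cur
    read -r -a prev <<< "$1"
    read -r -a cur <<< "$2"
    local i prev_total=0 cur_total=0
    # Sum user..steal (indexes 1-8); guest time is already counted in user.
    for i in 1 2 3 4 5 6 7 8; do
        prev_total=$(( prev_total + ${prev[i]:-0} ))
        cur_total=$(( cur_total + ${cur[i]:-0} ))
    done
    local delta_total=$(( cur_total - prev_total ))
    local delta_iowait=$(( ${cur[5]:-0} - ${prev[5]:-0} ))
    # No elapsed ticks (or a counter reset): report 0 rather than divide by zero.
    if (( delta_total <= 0 )); then
        echo 0
        return
    fi
    echo $(( delta_iowait * 100 / delta_total ))
}
```

A caller would pass the line saved by the previous run and a fresh one, for example from `grep '^cpu ' /proc/stat`. Because the counters are cumulative since boot, a single sample alone says little about current I/O pressure.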

Process & Service Monitoring

System Processes — Monitors via pgrep with systemctl fallback
Failed Systemd Services — System-wide scan for failed units
Docker Containers — Status and health checks (gracefully skips if unavailable)
PM2 Processes — Node.js process monitoring via pm2 jlist

Website & Endpoint Monitoring

HTTP/HTTPS Health — Availability, HTTP status codes, response times
SSL Certificate Expiry — Cross-platform via openssl with date parsing fallback
TCP Port Checks — Reachability testing via /dev/tcp
DNS Resolution — Health checking via dig, nslookup, or host
DNS Record Validation — Verify A, AAAA, MX, TXT, CNAME, NS, SOA, PTR, SRV, CAA records

Extended Monitoring

CPU Temperature — Thermal monitoring via lm-sensors
GPU Monitoring — NVIDIA via nvidia-smi or Intel via intel_gpu_top
UPS / Battery — Charge level monitoring via upower or apcaccess
Network Bandwidth — Interface throughput monitoring
NVMe / SMART Health — Critical warning byte, endurance wear, temperature, media errors
Log Pattern Matching — Watch log files for regex patterns
File Integrity — SHA256 checksum monitoring for critical files
Config Drift Detection — Rich change tracking with unified diffs
Cron Job Heartbeats — Detect stale cron jobs via heartbeat file age

Predictive & Fleet Features

Predictive Resource Exhaustion — Linear regression to alert before disk/memory runs out
Fleet Monitoring — Multi-server heartbeat aggregation via shared directory
Auto-Remediation — Automatically restart failed systemd services
Maintenance Windows — Flag file or scheduled recurring windows

Plugin System

Directory-Based Plugins — Place executable scripts in checks.d/
Simple Output Format — Plugins output STATE|KEY|DETAIL
Security-First — Timeout protection, symlinks skipped, output validated

Database Health Checks

MySQL/MariaDB — Connection check and replication lag monitoring
PostgreSQL — Connection check and streaming replication lag
Redis — Connection check, authentication, master/replica status
SQLite3 — File integrity, size thresholds, corruption detection
ODBC — Universal support for SQL Server, Oracle, DB2, etc.

Alert Channels & Intelligence

Multi-Channel — Telegram (primary), webhooks (Slack/Discord/ntfy), email
Retry/Queue — Failed Telegram alerts queue to disk and retry
Rate Limiting — Per-key cooldown prevents alert floods
Escalation — Separate webhook for unresolved alerts after N minutes
Top Processes — Auto-capture CPU/memory hogs in alerts

Exports & Integrations

Prometheus — Textfile export for node_exporter
JSON Status — Machine-readable status API
Static HTML Status Page — Self-contained dashboard
Health Digest — Scheduled full health summaries
Audit Logging — Structured JSON logs for compliance

Quick Install (One-Liner)

Interactive Install (Recommended for First Time)

curl -fsSL https://raw.githubusercontent.com/SwordfishTrumpet/telemon/main/install.sh | bash

Or install to a custom directory:

curl -fsSL https://raw.githubusercontent.com/SwordfishTrumpet/telemon/main/install.sh | bash -s -- /opt/telemon

Silent/Automated Install (CI/CD, Ansible, Cloud Init)

# Basic silent install (auto-detects Docker/PM2)
# Variables are set on the bash side of the pipe so the installer can read them
curl -fsSL https://raw.githubusercontent.com/SwordfishTrumpet/telemon/main/install.sh | \
  TELEGRAM_BOT_TOKEN="123456789:ABCdefGHIjklMNOpqrSTUvwxyz" \
  TELEGRAM_CHAT_ID="123456789" \
  bash -s -- --silent

# Advanced silent install with all options
curl -fsSL https://raw.githubusercontent.com/SwordfishTrumpet/telemon/main/install.sh | \
  TELEGRAM_BOT_TOKEN="xxx" \
  TELEGRAM_CHAT_ID="yyy" \
  SERVER_LABEL="web-prod-01" \
  ENABLE_DOCKER=true \
  ENABLE_PM2=true \
  ENABLE_SITES=true \
  SITE_URLS="https://example.com https://api.example.com" \
  bash -s -- --silent

Silent Mode Features:

✅ No interactive prompts — perfect for automation
✅ Auto-detects Docker and PM2 (enables if found)
✅ Uses sensible defaults for all settings
✅ Merges with existing .env if present (safe for updates)
✅ Fails gracefully with error codes for CI/CD

Silent Mode Environment Variables:

Variable	Required	Default	Description
`TELEGRAM_BOT_TOKEN`	Yes	—	Your Telegram bot token
`TELEGRAM_CHAT_ID`	Yes	—	Your Telegram chat ID
`SERVER_LABEL`	No	`hostname`	Server name in alerts
`ENABLE_DOCKER`	No	`auto`	`auto`/`true`/`false`
`ENABLE_PM2`	No	`auto`	`auto`/`true`/`false`
`ENABLE_SITES`	No	`false`	Enable website monitoring
`SITE_URLS`	No	—	Space-separated URLs
`TELEMON_SILENT`	No	`false`	Alternative to `--silent` flag
`TELEMON_SYSTEMD`	No	`false`	Alternative to `--systemd` flag

Systemd Timer Install (Alternative to Cron)

# Interactive install with systemd timer
curl -fsSL https://raw.githubusercontent.com/SwordfishTrumpet/telemon/main/install.sh | bash -s -- --systemd

# Silent install with systemd timer
curl -fsSL https://raw.githubusercontent.com/SwordfishTrumpet/telemon/main/install.sh \
  | TELEGRAM_BOT_TOKEN="xxx" TELEGRAM_CHAT_ID="yyy" bash -s -- --silent --systemd

Systemd Features:

✅ Works on systems without crontab
✅ Auto-detects user vs system install
✅ Uses user systemd by default (no root required)
✅ Journal integration for logging (journalctl -u telemon; add --user for a user-level install)

What the Installer Does

Downloads the latest Telemon files from GitHub
Configures your Telegram credentials
Sets up optional monitoring (Docker, PM2, websites — auto-detected)
Installs a cron job or systemd timer (runs every 5 minutes)
Validates the configuration and sends a test alert

Installer Options

bash install.sh [OPTIONS] [INSTALL_DIR]

Options:
  --silent      Non-interactive mode (uses env vars for config)
  --systemd     Use systemd timer instead of cron
  --skip-test   Skip the test notification at the end
  --help, -h    Show help message

Examples:
  bash install.sh                          # Interactive, default dir
  bash install.sh /opt/telemon             # Interactive, custom dir
  bash install.sh --silent                 # Silent mode
  bash install.sh --systemd                # Use systemd timer
  bash install.sh --silent --systemd       # Silent + systemd

Requirements for One-Line Install

Linux server with curl, bash, and standard /proc filesystem
Your Telegram bot token and chat ID (see below)

Quick Start (Manual Install)

Prerequisites

Linux server (Ubuntu, Debian, CentOS/RHEL, Alpine)
Bash 4.0+, curl
Telegram bot token and chat ID (see below)

Installation

# 1. Clone the repository
git clone https://github.com/SwordfishTrumpet/telemon.git
cd telemon

# 2. Copy the example config and edit it
cp .env.example .env
nano .env  # Add your Telegram bot token and chat ID

# 3. Run the installer
bash install.sh

That's it. Telemon sends you a test message and then monitors silently until something needs your attention.

Auto-Discovery

Telemon can automatically detect services, hardware, and infrastructure and suggest configuration:

bash telemon-admin.sh discover

Scans your system and generates .env suggestions for:

Category	Detected Items
Hardware	NVMe drives, NVIDIA/Intel GPUs, UPS (APC/NUT/upower), lm-sensors, RAID (mdadm, ZFS, LVM)
Infrastructure	Docker Swarm, Kubernetes, Proxmox VE, KVM/QEMU, NFS/SMB mounts, WireGuard, Tailscale, HAProxy
Databases	MySQL/MariaDB, PostgreSQL, Redis (only if servers are running, not just clients)
Applications	RabbitMQ, Mosquitto (MQTT), Fail2ban, CrowdSec
Core Services	Docker containers, PM2 processes, Nginx, Apache, Systemd services
Smart Thresholds	CPU and memory thresholds based on your actual hardware specs

Discovery Output Example

=== Hardware ===
✓ NVMe drives detected (2): /dev/nvme0n1, /dev/nvme1n1
✓ NVIDIA GPU detected: NVIDIA GeForce RTX 3080
✓ lm-sensors configured

=== Infrastructure ===
✓ Docker Swarm (manager node)
✓ ZFS pools detected: tank, rpool

=== Databases ===
✓ MySQL/MariaDB server running
✓ Redis server running

=== Smart Thresholds ===
✓ Thresholds suggested based on system specs: 64GB RAM, 16 cores

===============================================
Suggested Configuration
===============================================

# NVMe health monitoring
ENABLE_NVME_CHECK=true

# NVIDIA GPU monitoring  
ENABLE_GPU_CHECK=true

# CPU temperature monitoring
ENABLE_TEMP_CHECK=true

# Smart Thresholds (based on system specs: 64GB RAM, 16 cores)
MEM_THRESHOLD_WARN=10
MEM_THRESHOLD_CRIT=5
CPU_THRESHOLD_WARN=80
CPU_THRESHOLD_CRIT=90

# MySQL/MariaDB (detected running)
DB_MYSQL_HOST="localhost"
DB_MYSQL_PORT="3306"
...

Simply copy the suggested lines into your .env file, customize as needed, and validate with bash telemon.sh --validate.

Configuration

All configuration lives in .env. Key principles:

Every check has an ENABLE_* flag (core checks default true, extended checks default false)
Thresholds follow *_THRESHOLD_WARN / *_THRESHOLD_CRIT pattern
Lists are space-separated strings
See .env.example for all options with documentation

Minimal Config

# Telegram credentials (required)
TELEGRAM_BOT_TOKEN="your-bot-token"
TELEGRAM_CHAT_ID="your-chat-id"

Everything else has sensible defaults. Core checks (CPU, memory, disk, swap, I/O wait, zombies, internet, processes, systemd) are enabled by default.

Enable/Disable Checks

# Core checks (default: true)
ENABLE_CPU_CHECK=true
ENABLE_MEMORY_CHECK=true
ENABLE_DISK_CHECK=true
ENABLE_SWAP_CHECK=true
ENABLE_IOWAIT_CHECK=true
ENABLE_ZOMBIE_CHECK=true
ENABLE_INTERNET_CHECK=true
ENABLE_SYSTEM_PROCESSES=true
ENABLE_FAILED_SYSTEMD_SERVICES=true

# Service checks (default: false)
ENABLE_DOCKER_CONTAINERS=false
ENABLE_PM2_PROCESSES=false
ENABLE_SITE_MONITOR=false
ENABLE_NVME_CHECK=false

# Extended checks (default: false)
ENABLE_TCP_PORT_CHECK=false
ENABLE_TEMP_CHECK=false
ENABLE_DNS_CHECK=false
ENABLE_DNS_RECORD_CHECK=false
ENABLE_GPU_CHECK=false
ENABLE_UPS_CHECK=false
ENABLE_NETWORK_CHECK=false
ENABLE_LOG_CHECK=false
ENABLE_INTEGRITY_CHECK=false
ENABLE_CRON_CHECK=false

# Fleet monitoring (default: false)
ENABLE_HEARTBEAT=false
ENABLE_FLEET_CHECK=false

# Predictive alerts (default: false)
ENABLE_PREDICTIVE_ALERTS=false

# Exports (default: false)
ENABLE_PROMETHEUS_EXPORT=false
ENABLE_JSON_STATUS=false

# Audit logging (default: false)
ENABLE_AUDIT_LOGGING=false

Thresholds

# CPU: % of available cores (1-min load avg)
CPU_THRESHOLD_WARN=70
CPU_THRESHOLD_CRIT=80

# Memory: % free remaining (inverted — lower = worse)
MEM_THRESHOLD_WARN=15
MEM_THRESHOLD_CRIT=10

# Disk: % used
DISK_THRESHOLD_WARN=85
DISK_THRESHOLD_CRIT=90

# Swap: % used
SWAP_THRESHOLD_WARN=50
SWAP_THRESHOLD_CRIT=80

# I/O Wait: % CPU time
IOWAIT_THRESHOLD_WARN=30
IOWAIT_THRESHOLD_CRIT=50

# Zombies: process count
ZOMBIE_THRESHOLD_WARN=5
ZOMBIE_THRESHOLD_CRIT=20

# Internet connectivity
PING_TARGET="8.8.8.8"
PING_FAIL_THRESHOLD=3

# CPU temperature (°C)
TEMP_THRESHOLD_WARN=75
TEMP_THRESHOLD_CRIT=90

# GPU temperature (°C) — NVIDIA only
GPU_TEMP_THRESHOLD_WARN=80
GPU_TEMP_THRESHOLD_CRIT=95

# Intel GPU thresholds
GPU_INTEL_UTIL_THRESHOLD_WARN=80
GPU_INTEL_UTIL_THRESHOLD_CRIT=95
GPU_INTEL_TEMP_THRESHOLD_WARN=80
GPU_INTEL_TEMP_THRESHOLD_CRIT=95

# Network bandwidth (Mbit/s)
NETWORK_THRESHOLD_WARN=800
NETWORK_THRESHOLD_CRIT=950

# Battery/UPS charge (%) — inverted: lower = worse
UPS_THRESHOLD_WARN=30
UPS_THRESHOLD_CRIT=10
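
To make the normal versus inverted threshold semantics concrete, here is a minimal sketch of how a value could be mapped to a state. The function name `classify` and the inclusive comparisons are assumptions for illustration; Telemon's own boundary handling may differ.

```bash
#!/usr/bin/env bash
# Sketch only: map an integer reading to OK / WARNING / CRITICAL.
# classify VALUE WARN CRIT [inverted]
#   normal:   higher is worse (CPU, disk, swap, temperature)
#   inverted: lower is worse  (free memory, battery charge)
classify() {
    local value="$1" warn="$2" crit="$3" mode="${4:-normal}"
    if [[ "$mode" == "inverted" ]]; then
        if (( value <= crit )); then
            echo "CRITICAL"
        elif (( value <= warn )); then
            echo "WARNING"
        else
            echo "OK"
        fi
    else
        if (( value >= crit )); then
            echo "CRITICAL"
        elif (( value >= warn )); then
            echo "WARNING"
        else
            echo "OK"
        fi
    fi
}
```

With the defaults listed above, `classify 85 70 80` yields CRITICAL for CPU, while `classify 12 15 10 inverted` yields WARNING for memory, since 12% free is below the warning level but still above the critical one.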

Alert Channels

# Telegram (required)
TELEGRAM_BOT_TOKEN="your-bot-token"
TELEGRAM_CHAT_ID="your-chat-id"

# Webhook — JSON POST to any URL (optional, requires python3)
WEBHOOK_URL="https://hooks.slack.com/services/xxx/yyy/zzz"

# Email — plain text via sendmail/msmtp (optional)
EMAIL_TO="admin@example.com"
EMAIL_FROM="telemon@myserver.com"

# Escalation — separate webhook for unresolved alerts (optional, requires python3)
ESCALATION_WEBHOOK_URL="https://hooks.slack.com/services/aaa/bbb/ccc"
ESCALATION_AFTER_MIN=30

Alert Tuning

# Consecutive checks required before alerting (default: 3)
CONFIRMATION_COUNT=3

# Per-key cooldown between alerts (default: 900s = 15 min)
ALERT_COOLDOWN_SEC=900

# Top processes included in CPU/memory alerts
TOP_PROCESS_COUNT=5

# Command timeout for external tools (seconds)
CHECK_TIMEOUT=30

What to Monitor

# System processes (checked via pgrep/systemctl)
CRITICAL_SYSTEM_PROCESSES="sshd cron nginx"

# Docker containers (use names from: docker ps --format '{{.Names}}')
CRITICAL_CONTAINERS="redis nginx myapp"

# PM2 processes
CRITICAL_PM2_PROCESSES="api worker scheduler"

# Websites / endpoints
CRITICAL_SITES="https://example.com https://api.example.com|max_response_ms=3000"

# TCP ports
CRITICAL_PORTS="localhost:22 db-server:5432 192.168.1.1:443"

# DNS check domain
DNS_CHECK_DOMAIN="example.com"

# Network interface (auto-detected if empty)
NETWORK_INTERFACE=""

# Log pattern matching
LOG_WATCH_FILES="/var/log/syslog /var/log/auth.log"
LOG_WATCH_PATTERNS="OOM|error|panic"
LOG_WATCH_LINES=100

# File integrity monitoring
INTEGRITY_WATCH_FILES="/etc/passwd /etc/ssh/sshd_config"

# Configuration drift detection
ENABLE_DRIFT_DETECTION=true
DRIFT_WATCH_FILES="/etc/nginx/nginx.conf /etc/ssh/sshd_config"
DRIFT_IGNORE_PATTERN="^[+-]?\s*#"
DRIFT_MAX_DIFF_LINES=20
DRIFT_SENSITIVE_FILES="/etc/shadow /etc/gshadow"

# Cron heartbeat tracking (name:touchfile:max_age_minutes)
CRON_WATCH_JOBS="backup:/tmp/backup_heartbeat:1440 report:/tmp/report_heartbeat:60"

# NVMe device
NVME_DEVICE="/dev/nvme0n1"
NVME_TEMP_THRESHOLD_WARN=70
NVME_TEMP_THRESHOLD_CRIT=80

# Auto-restart failed systemd services
AUTO_RESTART_SERVICES="nginx sshd"
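
The cron heartbeat entries above (`name:touchfile:max_age_minutes`) only work if each job refreshes its touch file when it succeeds. A small wrapper is one way to do that; `run_with_heartbeat` is a hypothetical helper, not part of Telemon.

```bash
#!/usr/bin/env bash
# Sketch only: run a command and refresh its heartbeat file on success.
# run_with_heartbeat HEARTBEAT_FILE COMMAND [ARGS...]
run_with_heartbeat() {
    local heartbeat_file="$1"
    shift
    # touch runs only if the command exits 0, so a failing job lets the
    # heartbeat age until it exceeds max_age_minutes.
    "$@" && touch "$heartbeat_file"
}
```

A nightly backup job could then call `run_with_heartbeat /tmp/backup_heartbeat /usr/local/bin/backup.sh` (the script path is a placeholder), which matches the `backup:/tmp/backup_heartbeat:1440` entry.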

Maintenance Windows

# Flag file — touch to silence, rm when done
MAINT_FLAG_FILE="/tmp/telemon_maint"

# Scheduled recurring windows (semicolon-separated)
MAINT_SCHEDULE="Sun 02:00-04:00;Sat 03:00-05:00"
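
As an illustration of the schedule format, the sketch below tests whether a given day and time fall inside any window. The function name `in_maint_window`, the exclusive end time, and the lack of support for windows that cross midnight are all assumptions of this sketch.

```bash
#!/usr/bin/env bash
# Sketch only: is DAY/HH:MM inside any window of a MAINT_SCHEDULE-style string?
# in_maint_window SCHEDULE DAY HH:MM   (returns 0 if inside a window)
in_maint_window() {
    local schedule="$1" day="$2" now="${3/:/}"
    local -a windows
    local window w_day range start end
    IFS=';' read -r -a windows <<< "$schedule"
    for window in "${windows[@]}"; do
        read -r w_day range <<< "$window"
        [[ "$w_day" == "$day" ]] || continue
        start="${range%-*}"
        end="${range#*-}"
        start="${start/:/}"
        end="${end/:/}"
        # 10# forces base 10 so values like 0830 are not parsed as octal.
        if (( 10#$now >= 10#$start && 10#$now < 10#$end )); then
            return 0
        fi
    done
    return 1
}
```

A caller might use `in_maint_window "$MAINT_SCHEDULE" "$(LC_ALL=C date +%a)" "$(date +%H:%M)"`; forcing the C locale keeps the day abbreviation in English.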

Exports

# Prometheus textfile export
ENABLE_PROMETHEUS_EXPORT=true
PROMETHEUS_TEXTFILE_DIR="/var/lib/node_exporter/textfile_collector"

# JSON status file
ENABLE_JSON_STATUS=true
JSON_STATUS_FILE="/opt/telemon/status.json"

Predictive Resource Exhaustion

ENABLE_PREDICTIVE_ALERTS=true
PREDICT_HORIZON_HOURS=24
PREDICT_DATAPOINTS=48
PREDICT_MIN_DATAPOINTS=12

Telemon uses linear regression on historical datapoints to predict when a resource will reach 100%. If the trend line projects exhaustion within PREDICT_HORIZON_HOURS (24 hours in the example above), a WARNING alert fires.
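
The projection can be sketched as an ordinary least-squares fit. This example assumes datapoints arrive as `hours percent` pairs and uses `awk` for the arithmetic; the function name `hours_to_full` and the input format are hypothetical, and the format of Telemon's own trend file is not shown here.

```bash
#!/usr/bin/env bash
# Sketch only: least-squares line through (hours, percent-used) samples,
# then the time from the last sample until the line reaches 100%.
# Reads "HOURS PERCENT" pairs on stdin; prints hours remaining,
# "never" for a flat or falling trend, "unknown" with too little data.
hours_to_full() {
    awk '
        NF >= 2 {
            n++
            sx += $1; sy += $2
            sxy += $1 * $2; sxx += $1 * $1
            last_x = $1
        }
        END {
            denom = n * sxx - sx * sx
            if (n < 2 || denom == 0) { print "unknown"; exit }
            slope = (n * sxy - sx * sy) / denom
            intercept = (sy - slope * sx) / n
            if (slope <= 0) { print "never"; exit }
            remaining = (100 - intercept) / slope - last_x
            if (remaining < 0) remaining = 0
            printf "%.1f\n", remaining
        }'
}
```

For samples of 50%, 60%, and 70% taken one hour apart, the fitted slope is 10 points per hour, so the projection is 3.0 hours from the last sample. A caller would compare that figure against PREDICT_HORIZON_HOURS.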

Fleet Monitoring

# Server identity — used in alert headers and heartbeat files
SERVER_LABEL="web-prod-01"

# Heartbeat sender (all instances)
ENABLE_HEARTBEAT=true
HEARTBEAT_MODE="file"
HEARTBEAT_DIR="/shared/telemon/heartbeats"

# Fleet monitor (one designated instance)
ENABLE_FLEET_CHECK=true
FLEET_HEARTBEAT_DIR="/shared/telemon/heartbeats"
FLEET_STALE_THRESHOLD_MIN=15
FLEET_CRITICAL_MULTIPLIER=2
FLEET_EXPECTED_SERVERS="web-prod-01 db-prod-01 api-staging"

How it works:

Every instance writes a heartbeat file after each run
One instance monitors the directory and alerts on stale/missing servers
If a server's heartbeat goes stale → WARNING/CRITICAL alert
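
The stale check can be sketched as follows. The interpretation of FLEET_CRITICAL_MULTIPLIER (CRITICAL once the age reaches threshold × multiplier) is an assumption, as are the function names; `stat -c %Y` is the GNU form, and BSD systems would need `stat -f %m` instead.

```bash
#!/usr/bin/env bash
# Sketch only: classify a heartbeat by its age in minutes.
# heartbeat_state AGE_MIN STALE_MIN MULTIPLIER
heartbeat_state() {
    local age_min="$1" stale_min="$2" multiplier="$3"
    if (( age_min >= stale_min * multiplier )); then
        echo "CRITICAL"
    elif (( age_min >= stale_min )); then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# file_age_min FILE -> age of FILE in whole minutes (GNU stat)
file_age_min() {
    local mtime now
    mtime=$(stat -c %Y "$1") || return 1
    now=$(date +%s)
    echo $(( (now - mtime) / 60 ))
}
```

With the values above (threshold 15, multiplier 2), a heartbeat that is 20 minutes old would be WARNING and one that is 30 minutes old would be CRITICAL.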

Plugin System

# Enable plugin system
ENABLE_PLUGINS=true
# CHECKS_DIR="/opt/telemon/custom-checks"

Plugins output STATE|KEY|DETAIL:

OK|my_check|Everything is working
WARNING|my_check|Resource at 85%
CRITICAL|my_check|Service not responding

See Plugin Examples below.
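
When writing a plugin, it can help to check the output line before deploying it. The sketch below validates the `STATE|KEY|DETAIL` shape; the function name `parse_plugin_line` and the allowed key characters (letters, digits, underscore) are assumptions, not Telemon's actual validation rules.

```bash
#!/usr/bin/env bash
# Sketch only: validate one plugin output line and split it into fields.
# parse_plugin_line LINE -> sets PLUGIN_STATE, PLUGIN_KEY, PLUGIN_DETAIL
# Returns 1 if the line does not match STATE|KEY|DETAIL.
parse_plugin_line() {
    local line="$1"
    # [|] matches a literal pipe; DETAIL may itself contain pipes.
    local re='^(OK|WARNING|CRITICAL)[|]([A-Za-z0-9_]+)[|](.*)$'
    if [[ "$line" =~ $re ]]; then
        PLUGIN_STATE="${BASH_REMATCH[1]}"
        PLUGIN_KEY="${BASH_REMATCH[2]}"
        PLUGIN_DETAIL="${BASH_REMATCH[3]}"
        return 0
    fi
    return 1
}
```

For example, `parse_plugin_line "$(./checks.d/my-plugin.sh)"` succeeds only when the plugin printed a well-formed line, which makes format mistakes visible before the next scheduled run.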

Database Health Checks

ENABLE_DATABASE_CHECKS=true

# MySQL/MariaDB
DB_MYSQL_HOST="localhost"
DB_MYSQL_PORT="3306"
DB_MYSQL_USER="telemon"
DB_MYSQL_PASS="secret"
DB_MYSQL_NAME="mysql"

# PostgreSQL
DB_POSTGRES_HOST="localhost"
DB_POSTGRES_PORT="5432"
DB_POSTGRES_USER="telemon"
DB_POSTGRES_PASS="secret"
DB_POSTGRES_NAME="postgres"

# Redis
DB_REDIS_HOST="localhost"
DB_REDIS_PORT="6379"
DB_REDIS_PASS=""

# SQLite3
DB_SQLITE_PATHS="/var/lib/app/data.db"
DB_SQLITE_SIZE_THRESHOLD_WARN=500
DB_SQLITE_SIZE_THRESHOLD_CRIT=1000

ODBC Database Connections

ENABLE_ODBC_CHECKS=true
ODBC_CONNECTIONS="mssql_prod oracle_dw"

# DSN-based
ODBC_MSSQL_PROD_DSN="MSSQL-Production-DSN"
ODBC_MSSQL_PROD_USER="telemon"
ODBC_MSSQL_PROD_PASS="secure_password"
ODBC_MSSQL_PROD_QUERY="SELECT 1"

# Connection string-based
ODBC_ORACLE_DW_DRIVER="Oracle ODBC Driver"
ODBC_ORACLE_DW_SERVER="oracle-dw.example.com:1521/ORCLDW"
ODBC_ORACLE_DW_USER="monitor"
ODBC_ORACLE_DW_PASS="secure_password"
ODBC_ORACLE_DW_QUERY="SELECT 1 FROM DUAL"

DNS Record Monitoring

ENABLE_DNS_RECORD_CHECK=true
DNS_CHECK_RECORDS="example.com:A:93.184.216.34,_dmarc.example.com:TXT:v=DMARC1*,example.com:MX:*"
DNS_CHECK_NAMESERVER=""
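
Each DNS_CHECK_RECORDS entry has the form `name:TYPE:expected`, where `*` acts as a wildcard. As a rough sketch of how such an entry could be compared with a resolved value, the example below uses Bash glob matching. The function name `record_matches` and the glob semantics are assumptions; Telemon's real matching rules may be stricter.

```bash
#!/usr/bin/env bash
# Sketch only: does ACTUAL satisfy the expected value of a name:TYPE:expected entry?
# record_matches ENTRY ACTUAL   (returns 0 on match)
record_matches() {
    local entry="$1" actual="$2"
    local name type expected
    # Only the first two colons split fields, so IPv6 values keep their colons.
    IFS=':' read -r name type expected <<< "$entry"
    # The unquoted right-hand side makes [[ == ]] treat $expected as a glob.
    [[ "$actual" == $expected ]]
}
```

Here `record_matches "_dmarc.example.com:TXT:v=DMARC1*" "v=DMARC1; p=reject"` succeeds because the pattern matches any value starting with `v=DMARC1`. Note that other glob characters in the expected value, such as `?` or `[`, would also be interpreted as patterns in this sketch.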

Enhanced Audit Logging

ENABLE_AUDIT_LOGGING=true
AUDIT_LOG_FILE="/var/log/telemon_audit.log"
AUDIT_EVENTS="all"

Paths

STATE_FILE="/tmp/telemon_sys_alert_state"
LOG_FILE="/opt/telemon/telemon.log"
LOG_LEVEL="INFO"
LOG_MAX_SIZE_MB=10
LOG_MAX_BACKUPS=5
BACKUP_KEEP_COUNT=5

Tip: For production, move STATE_FILE out of /tmp to a persistent path like /var/lib/telemon/state.

Common Configurations

Docker Host (Proxmox, NAS, home server)

ENABLE_DOCKER_CONTAINERS=true
CRITICAL_SYSTEM_PROCESSES="sshd dockerd"
CRITICAL_CONTAINERS="redis nginx myapp"

Web Server (Nginx/Apache + SSL)

ENABLE_SITE_MONITOR=true
SITE_CHECK_SSL=true
SITE_SSL_WARN_DAYS=14
CRITICAL_SYSTEM_PROCESSES="sshd nginx"
CRITICAL_SITES="https://example.com|max_response_ms=5000"

Node.js App Server (PM2-managed)

ENABLE_PM2_PROCESSES=true
ENABLE_SITE_MONITOR=true
CRITICAL_SYSTEM_PROCESSES="sshd"
CRITICAL_PM2_PROCESSES="api worker scheduler"

Full-Stack Server (everything enabled)

ENABLE_DOCKER_CONTAINERS=true
ENABLE_SITE_MONITOR=true
ENABLE_TCP_PORT_CHECK=true
ENABLE_TEMP_CHECK=true
ENABLE_DNS_CHECK=true
ENABLE_NETWORK_CHECK=true
ENABLE_INTEGRITY_CHECK=true
ENABLE_CRON_CHECK=true
ENABLE_PROMETHEUS_EXPORT=true
ENABLE_JSON_STATUS=true
ENABLE_HEARTBEAT=true
ENABLE_FLEET_CHECK=true
ENABLE_PREDICTIVE_ALERTS=true

SERVER_LABEL="prod-01"
CRITICAL_SYSTEM_PROCESSES="sshd cron nginx"
CRITICAL_CONTAINERS="redis postgres myapp"
AUTO_RESTART_SERVICES="nginx"
FLEET_HEARTBEAT_DIR="/shared/telemon/heartbeats"

How It Works

Confirmation Count

Alerts only fire after a problem persists:

Check 1: CPU=85% (CRITICAL) → count=1/3, no alert
Check 2: CPU=88% (CRITICAL) → count=2/3, no alert
Check 3: CPU=87% (CRITICAL) → count=3/3, ALERT SENT
Check 4: CPU=87% (CRITICAL) → count=3/3, silent
Check 5: CPU=40% (OK)       → RESOLVED ALERT SENT

Set CONFIRMATION_COUNT=1 for immediate alerts.
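
The sequence above can be modelled as a small state transition. This is a sketch under stated assumptions: the function name `next_state` is hypothetical, the counter restarts whenever the observed state changes (including WARNING to CRITICAL), and recovery is announced only for problems that were confirmed.

```bash
#!/usr/bin/env bash
# Sketch only: one step of confirmation-count tracking.
# next_state PREV OBSERVED [CONFIRM]
#   PREV is "STATE:count" as stored in the state file.
#   Prints "STATE:count ACTION" where ACTION is alert, resolved, or none.
next_state() {
    local prev="$1" observed="$2" confirm="${3:-3}"
    local prev_state="${prev%%:*}" prev_count="${prev##*:}"
    local count action="none"

    if [[ "$observed" == "OK" ]]; then
        # Announce recovery only if the earlier problem had been confirmed.
        if [[ "$prev_state" != "OK" ]] && (( prev_count >= confirm )); then
            action="resolved"
        fi
        echo "OK:0 $action"
        return
    fi

    if [[ "$observed" == "$prev_state" ]]; then
        count=$(( prev_count + 1 ))
    else
        count=1
    fi
    if (( count == confirm )); then
        action="alert"
    elif (( count > confirm )); then
        count=$confirm   # cap the counter so later checks stay silent
    fi
    echo "$observed:$count $action"
}
```

Feeding it the five checks from the example produces `CRITICAL:1 none`, `CRITICAL:2 none`, `CRITICAL:3 alert`, `CRITICAL:3 none`, and finally `OK:0 resolved`.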

Alert Rate Limiting

Per-key cooldown prevents alert floods:

12:00 — CPU goes CRITICAL → alert sent
12:05 — CPU resolves to OK → recovery alert sent
12:10 — CPU goes CRITICAL again → cooldown active, no alert
12:15 — CPU still CRITICAL → cooldown expired, alert sent

Controlled by ALERT_COOLDOWN_SEC (default: 900s). Set to 0 to disable.
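
In code, the cooldown decision reduces to comparing timestamps. The sketch below assumes the time of the last alert for a key is stored as a Unix epoch, with 0 meaning "never alerted"; the function name `cooldown_active` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: should an alert for this key be suppressed right now?
# cooldown_active LAST_ALERT_EPOCH NOW_EPOCH [COOLDOWN_SEC]
# Returns 0 (suppress) while the cooldown is running, 1 otherwise.
cooldown_active() {
    local last="$1" now="$2" cooldown="${3:-900}"
    # cooldown=0 disables rate limiting; last=0 means no alert was sent yet.
    (( cooldown > 0 && last > 0 && now - last < cooldown ))
}
```

In the timeline above, the 12:10 check is 600 seconds after the 12:00 alert, so the cooldown is still active; at 12:15 the full 900 seconds have elapsed and the alert goes out.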

Alert Dispatch Chain

Normal cycle:     dispatch_with_retry() → Telegram (queue on fail) + Webhook + Email
Digest mode:      dispatch_alert()      → Telegram + Webhook + Email (no retry)
Escalation:       check_escalation()    → Escalation webhook only

State File

Default: /tmp/telemon_sys_alert_state

Format: key=STATE:count

cpu=CRITICAL:3
mem=OK:0
disk_root=WARNING:2
container_redis=OK:0

Related files:

File	Purpose
`${STATE_FILE}`	Current check states
`${STATE_FILE}.detail`	State detail text (HTML)
`${STATE_FILE}.queue`	Queued alerts from failed Telegram sends
`${STATE_FILE}.cooldown`	Per-key alert rate limiting
`${STATE_FILE}.escalation`	Escalation tracking
`${STATE_FILE}.trend`	Predictive trend data

CLI Reference

# Run a full monitoring check cycle
bash telemon.sh

# Validate configuration
bash telemon.sh --validate

# Validate + send test Telegram message
bash telemon.sh --test

# Send health digest summary
bash telemon.sh --digest

# Generate static HTML status page
bash telemon.sh --generate-status-page

# Show help
bash telemon.sh --help

Admin Utility

bash telemon-admin.sh status          # Show installation status
bash telemon-admin.sh validate        # Validate configuration
bash telemon-admin.sh backup          # Create backup
bash telemon-admin.sh restore <path>  # Restore from backup
bash telemon-admin.sh reset-state     # Reset alert state
bash telemon-admin.sh digest          # Send health digest
bash telemon-admin.sh fleet-status    # Show fleet overview
bash telemon-admin.sh logs            # View last 50 log lines
bash telemon-admin.sh logs 100        # View last 100 lines
bash telemon-admin.sh discover        # Auto-discover services

Update & Uninstall

bash update.sh           # Update to latest version
bash update.sh --check   # Check for updates
bash uninstall.sh        # Remove cron/systemd, keep config
bash uninstall.sh --full # Remove everything

Alternative Deployment

Systemd Timer

# Install with systemd timer
curl -fsSL https://raw.githubusercontent.com/SwordfishTrumpet/telemon/main/install.sh | bash -s -- --systemd

# Manual setup (user systemd)
mkdir -p ~/.config/systemd/user/
cp systemd/telemon.timer ~/.config/systemd/user/
cp systemd/telemon@.service ~/.config/systemd/user/telemon.service
systemctl --user daemon-reload
systemctl --user enable telemon.timer
systemctl --user start telemon.timer

See systemd/README.md for detailed reference.

Docker

# Build and run with docker-compose
docker-compose up -d

# Or build manually
docker build -t telemon .
docker run -v "$(pwd)/.env:/opt/telemon/.env:ro" telemon

Plugin Examples

Disk Usage Check

#!/usr/bin/env bash
# checks.d/custom-disk-check.sh

# -P forces POSIX single-line output so long device names do not wrap
USAGE=$(df -P /data 2>/dev/null | awk 'NR==2 {print $5}' | tr -d '%')

if [[ -z "$USAGE" ]]; then
    echo "CRITICAL|data_disk|Mount /data not found"
elif [[ "$USAGE" -ge 90 ]]; then
    echo "CRITICAL|data_disk|Disk /data at ${USAGE}%"
elif [[ "$USAGE" -ge 80 ]]; then
    echo "WARNING|data_disk|Disk /data at ${USAGE}%"
else
    echo "OK|data_disk|Disk /data at ${USAGE}%"
fi

HTTP Service Health

#!/usr/bin/env bash
# checks.d/api-health.sh

HEALTH=$(curl -s --max-time 5 http://localhost:8080/health 2>/dev/null)

if [[ -z "$HEALTH" ]]; then
    echo "CRITICAL|api_health|API not responding"
elif echo "$HEALTH" | grep -q '"status":"ok"'; then
    echo "OK|api_health|API healthy"
else
    echo "WARNING|api_health|API degraded"
fi

Plugin tips:

Make it executable: chmod +x checks.d/my-plugin.sh
Handle missing dependencies
Keep checks under CHECK_TIMEOUT (default 30s)
Output exactly: STATE|KEY|DETAIL

Testing & Debugging

Validation

bash telemon.sh --validate
bash telemon.sh --test  # Send test alerts

Debug Logging

# Edit .env:
LOG_LEVEL="DEBUG"

# Run manually
bash telemon.sh 2>&1 | tee /tmp/telemon-debug.log

Understanding Log Files

Telemon produces two log files with different purposes:

Log File	Purpose	Rotation	Level Control
`telemon.log`	Main monitoring activity, check results, alerts	Self-rotating (`LOG_MAX_SIZE_MB`)	Respects `LOG_LEVEL` setting
`telemon_cron.log`	Cron stderr output, lock contention messages	Not rotated — managed by cron	Only WARN/ERROR from lock mechanism

Why two log files?

telemon.log is written via the log() function with level filtering and rotation
telemon_cron.log captures stderr from cron, including early-stage messages before the log() function is available (e.g., lock contention)

Managing log growth:

# Check log sizes
ls -lh telemon*.log

# Truncate cron log if it grows too large
: > telemon_cron.log

# Enable logrotate (system-level)
sudo cp telemon-logrotate.conf /etc/logrotate.d/telemon

Reset State

bash telemon-admin.sh reset-state

Common Issues

Issue	Solution
Telegram not sending	Check bot token, chat ID, internet connectivity
SMTP auth fails	Verify password, check if 2FA requires app password
Docker not detected	Ensure user is in `docker` group
Plugin not loading	Check file is executable, check output format
State file errors	Ensure the `STATE_FILE` directory (`/tmp` by default) is writable, check disk space

Getting Telegram Credentials

Step 1: Create a Bot

Open Telegram and message @BotFather
Send /newbot
Follow prompts — pick a name and username (must end in bot)
Copy the token (e.g., 123456789:ABCdefGHIjklMNOpqrSTUvwxyz)

Step 2: Get Your Chat ID

Option A (Fastest):

Message @userinfobot
Copy the number it replies with

Option B:

Message your bot (send anything)
Visit https://api.telegram.org/bot<TOKEN>/getUpdates
Find "chat":{"id":123456789

Step 3: Test

curl -X POST "https://api.telegram.org/bot<TOKEN>/sendMessage" \
  -d "chat_id=<CHAT_ID>" -d "text=Test from Telemon"

Operating System Support

Distribution	Status	Notes
Ubuntu 20.04+	✅ Fully supported	Primary development target
Debian 11+	✅ Fully supported
CentOS/RHEL 8+	✅ Supported	May need EPEL
Alpine Linux	⚠️ Partial	BusyBox tools may differ
macOS	❌ Not supported	Requires Linux `/proc`
Windows WSL	⚠️ Partial	Some `/proc` metrics may differ

Why Linux only? Telemon reads from Linux-specific interfaces: /proc/loadavg, /proc/meminfo, /proc/stat, /proc/net/dev.

Documentation

Quick Reference — Command cheat sheet
Troubleshooting Guide — Common issues and solutions
Systemd Setup — Running with systemd instead of cron

License

MIT License — see LICENSE.

Made with code for headless servers everywhere.
