Lightweight monitoring solution for a mixed-architecture K3s cluster (AMD64 desktop + ARM64 Raspberry Pi Zero 2W).
Example K3s cluster monitoring dashboard showing node metrics, resource usage, and system health
┌─────────────────────────────────────────────────────────────┐
│                    Desktop Node (amd64)                     │
│  ┌──────────────┐  ┌────────────┐  ┌─────────────────────┐  │
│  │  Prometheus  │  │  Grafana   │  │    Alertmanager     │  │
│  │  (Storage)   │◄─┤ (NodePort) │  │                     │  │
│  │   90d/50GB   │  │   :32000   │  │    Alert Routing    │  │
│  └──────┬───────┘  └────────────┘  └─────────────────────┘  │
│         │                                                   │
│         │ Scrapes (60s interval)                            │
│  ┌──────▼───────┐  ┌──────────────┐                         │
│  │ node-exporter│  │kube-state-   │                         │
│  │              │  │   metrics    │                         │
│  └──────────────┘  └──────────────┘                         │
└─────────────────────────────────────────────────────────────┘
          │
          │ Scrapes
          ▼
┌─────────────────────────────────────────────────────────────┐
│                Raspberry Pi Zero 2W (arm64)                 │
│  ┌──────────────┐                                           │
│  │ node-exporter│  (30Mi/50Mi memory limits)                │
│  │              │                                           │
│  └──────────────┘                                           │
└─────────────────────────────────────────────────────────────┘
- Prometheus v2.51.2: Metrics collection and storage (90 days or 50GB retention)
- Grafana v10.4.2: Visualization dashboards (exposed at NodePort 32000)
- Alertmanager v0.27.0: Alert routing and grouping
- node-exporter v1.7.0: Node-level metrics (CPU, memory, disk, network)
- kube-state-metrics v2.12.0: Cluster-level resource metrics
- CPU usage per core and total
- Memory usage (total, available, buffers, cache)
- Disk usage and I/O statistics
- Network traffic (receive/transmit bytes, errors, drops)
- System load averages
- Pod status and resource usage
- Deployment status and replicas
- Node status and capacity
- Persistent volume usage
- ConfigMap and Secret counts
- Container CPU usage
- Container memory working set
- Container network traffic
- Container filesystem usage
| Alert | Threshold | Duration | Severity |
|---|---|---|---|
| NodeDown | Node exporter unreachable | 2m | critical |
| HighMemoryUsage | Memory >85% | 5m | warning |
| CriticalMemoryUsage | Memory >95% | 2m | critical |
| HighCPUUsage | CPU >80% | 10m | warning |
| DiskSpaceLow | Root FS <15% free | 5m | warning |
| DiskSpaceCritical | Root FS <5% free | 2m | critical |
| PrometheusStorageHigh | TSDB >45GB | 5m | warning |
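To illustrate how a row of this table could map onto a Prometheus alerting rule, the sketch below prints a rule for HighMemoryUsage. The helper name `print_high_memory_rule`, the group name, and the PromQL expression are assumptions built from standard node-exporter metrics; the rules shipped in manifests/observability/prometheus/rules.yaml may be written differently.

```shell
# Hypothetical helper: prints an alerting rule matching the HighMemoryUsage row
# (memory >85% for 5m, severity warning). The expression is an assumption,
# not a copy of the project's rules.yaml.
print_high_memory_rule() {
  cat <<'EOF'
groups:
  - name: node-alerts
    rules:
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          description: "Memory usage on {{ $labels.instance }} has been above 85% for 5 minutes"
EOF
}

print_high_memory_rule
```

The `for` field corresponds to the Duration column and the `severity` label to the Severity column.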
- K3s cluster running (control plane + worker nodes)
- kubectl configured to access the cluster
- Desktop node with:
  - /mnt/prometheus-data directory (will be created automatically)
  - /mnt/grafana-data directory (will be created automatically)
  - Label: kubernetes.io/arch=amd64
Before deploying, create a Kubernetes Secret for the Grafana admin credentials:
kubectl create namespace monitoring # skip if already exists
kubectl create secret generic grafana-admin-credentials \
--namespace monitoring \
--from-literal=admin-user=admin \
--from-literal=admin-password=<your-secure-password>
Note: Replace <your-secure-password> with a strong password. This secret is referenced by manifests/observability/grafana/deployment.yaml.
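One way to avoid typing a password by hand is to generate it. The sketch below is a minimal example; `gen_password` and `create_grafana_secret` are hypothetical helper names, and the second simply wraps the kubectl command shown above.

```shell
# Hypothetical helper: emits a random alphanumeric password of the given
# length (default 24) read from /dev/urandom.
gen_password() {
  local length="${1:-24}"
  LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c "$length"
  echo
}

# Hypothetical wrapper around the secret creation command from this README.
create_grafana_secret() {
  local password="$1"
  kubectl create secret generic grafana-admin-credentials \
    --namespace monitoring \
    --from-literal=admin-user=admin \
    --from-literal=admin-password="$password"
}

# Usage (prints the password once so it can be stored somewhere safe):
#   pw="$(gen_password 32)"; create_grafana_secret "$pw"; echo "$pw"
```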
Apply in order (dependencies matter):
# Navigate to project root
cd k3_cluster
# Apply namespace first
kubectl apply -f manifests/observability/namespace.yaml
# Apply RBAC
kubectl apply -f manifests/observability/kube-state-metrics/rbac.yaml
kubectl apply -f manifests/observability/prometheus/rbac.yaml
# Apply ConfigMaps
kubectl apply -f manifests/observability/prometheus/config.yaml
kubectl apply -f manifests/observability/prometheus/rules.yaml
kubectl apply -f manifests/observability/alertmanager/config.yaml
kubectl apply -f manifests/observability/grafana/config.yaml
# Apply workloads
kubectl apply -f manifests/observability/node-exporter/
kubectl apply -f manifests/observability/kube-state-metrics/
kubectl apply -f manifests/observability/prometheus/
kubectl apply -f manifests/observability/alertmanager/
kubectl apply -f manifests/observability/grafana/
Or apply everything at once (-R is needed so kubectl descends into the component subdirectories):
kubectl apply -R -f manifests/observability/
Check all pods are running:
kubectl get pods -n monitoring -o wide
Expected output:
NAME READY STATUS NODE
node-exporter-xxxxx 1/1 Running <control-plane-node> (desktop)
node-exporter-yyyyy 1/1 Running pi2 (Pi Zero)
kube-state-metrics-xxxxx 1/1 Running <control-plane-node>
prometheus-xxxxx 1/1 Running <control-plane-node>
alertmanager-xxxxx 1/1 Running <control-plane-node>
grafana-xxxxx 1/1 Running <control-plane-node>
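Instead of polling the pod list by hand, `kubectl wait` can block until the pods are ready. This is a minimal sketch; the function name `wait_for_monitoring` and the 180s default timeout are assumptions.

```shell
# Hypothetical helper: blocks until every pod in the namespace reports Ready,
# or fails once the timeout expires (defaults: monitoring, 180s).
wait_for_monitoring() {
  local namespace="${1:-monitoring}"
  local timeout="${2:-180s}"
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$namespace" --timeout="$timeout"
}

# Usage:
#   wait_for_monitoring            # default namespace and timeout
#   wait_for_monitoring monitoring 5m
```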
Check services:
kubectl get svc -n monitoring
Port-forward to Prometheus:
kubectl port-forward -n monitoring svc/prometheus 9090:9090
Open a browser at http://localhost:9090/targets
All targets should show state UP:
- prometheus (1/1 up)
- node-exporter (2/2 up - desktop + Pi)
- kube-state-metrics (1/1 up)
- kubelet (2/2 up)
- cadvisor (2/2 up)
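The same check can be made from a terminal through the Prometheus HTTP API. The sketch below assumes the port-forward to localhost:9090 is still running; `list_down_targets` is a hypothetical helper name.

```shell
# Hypothetical helper: asks Prometheus for every target whose `up` series is 0.
# An empty "result" array in the JSON response means all targets are up.
list_down_targets() {
  local base_url="${1:-http://localhost:9090}"
  curl -fsS --get --data-urlencode 'query=up == 0' "$base_url/api/v1/query"
}

# Usage:
#   list_down_targets
```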
Grafana is exposed via NodePort 32000. Access it at:
http://<CONTROL_PLANE_IP>:32000
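Before opening the browser, Grafana's health endpoint can confirm that the NodePort is reachable. This is a small sketch; `grafana_health` is a hypothetical helper, and the host argument stands in for your control plane IP.

```shell
# Hypothetical helper: queries Grafana's /api/health endpoint on the NodePort.
grafana_health() {
  local host="$1"           # control plane IP or hostname
  local port="${2:-32000}"  # NodePort used in this README
  curl -fsS "http://$host:$port/api/health"
}

# Usage:
#   grafana_health <CONTROL_PLANE_IP>
```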
Default credentials (change on first login):
- Username: admin
- Password: set via Kubernetes Secret (see Security Setup above)
You'll be prompted to change the password on first login.
In Grafana UI:
- Click "+" (Create) → Import
- Enter dashboard ID and click Load
- Select Prometheus as the datasource
- Click Import
Recommended Dashboards:
| Dashboard ID | Name | Description |
|---|---|---|
| 15661 | K3s Cluster Monitoring | Optimized for K3s, shows cluster overview |
| 1860 | Node Exporter Full | Comprehensive node metrics (CPU, memory, disk, network) |
| 13824 | Raspberry Pi Monitoring | ARM-specific metrics and Pi health |
| 12006 | Kubernetes API Server | API server performance |
| 315 | Kubernetes Cluster Monitoring | Simple cluster overview |
Import via JSON (alternative):
- Download dashboard JSON from https://grafana.com/grafana/dashboards/[ID]
- In Grafana: Create → Import → Upload JSON file
Check that data directories were created:
# On desktop node
ls -la /mnt/prometheus-data
ls -la /mnt/grafana-data
Prometheus should create TSDB blocks:
ls -la /mnt/prometheus-data/
# Should show: chunks_head/, queries.active, wal/
Edit manifests/observability/prometheus/config.yaml:
global:
  scrape_interval: 60s # Change to 30s or 120s as needed
Apply changes:
kubectl apply -f manifests/observability/prometheus/config.yaml
kubectl rollout restart -n monitoring deployment/prometheus
Edit manifests/observability/prometheus/deployment.yaml:
args:
  - '--storage.tsdb.retention.time=90d' # Change to 60d, 180d, etc.
  - '--storage.tsdb.retention.size=50GB' # Change to 100GB, etc.
Apply:
kubectl apply -f manifests/observability/prometheus/deployment.yaml
Edit manifests/observability/alertmanager/config.yaml to add notification channels:
Slack Example:
receivers:
  - name: 'critical-receiver'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        title: 'K3s Cluster Alert'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
Email Example:
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'your-email@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'your-username'
        auth_password: 'your-password'
Webhook Example:
receivers:
  - name: 'default-receiver'
    webhook_configs:
      - url: 'http://your-webhook-endpoint.com/alerts'
Apply changes:
kubectl apply -f manifests/observability/alertmanager/config.yaml
kubectl rollout restart -n monitoring deployment/alertmanager
Check pod logs:
kubectl logs -n monitoring -l app=prometheus
Common issues:
- RBAC permissions: Verify Prometheus ServiceAccount has ClusterRole
- Network policies: Ensure pods can communicate
- K3s endpoints: Check that kubelet metrics are reachable through the API server at kubernetes.default.svc:443
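The RBAC item can be checked with `kubectl auth can-i`. The sketch below is an assumption-laden example: the ServiceAccount name `prometheus` and the resource list are guesses, so use the names defined in manifests/observability/prometheus/rbac.yaml. `check_prometheus_rbac` is a hypothetical helper.

```shell
# Hypothetical helper: reports whether the ServiceAccount may list the
# resources that Prometheus service discovery typically needs.
# The default ServiceAccount name "prometheus" is an assumption.
check_prometheus_rbac() {
  local sa="system:serviceaccount:monitoring:${1:-prometheus}"
  local resource
  for resource in nodes pods services endpoints; do
    printf '%s: ' "$resource"
    kubectl auth can-i list "$resource" --as="$sa" --all-namespaces
  done
}

# Usage:
#   check_prometheus_rbac              # uses the assumed name "prometheus"
#   check_prometheus_rbac my-prom-sa   # custom ServiceAccount name
```

Each line should end in `yes`; a `no` points at a missing ClusterRole rule or binding.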
Check memory limits:
kubectl describe pod -n monitoring -l app=node-exporter
If OOMKilled, the container exceeded its memory limit; raise the limit in manifests/observability/node-exporter/daemonset.yaml:
resources:
  limits:
    memory: 64Mi # Example value, raised from 50Mi
Check datasource connection:
- In Grafana: Configuration → Data Sources → Prometheus
- Click Save & Test
- Should show: "Data source is working"
If failing:
- Verify Prometheus service DNS: kubectl get svc -n monitoring prometheus
- Check Prometheus is running: kubectl get pods -n monitoring -l app=prometheus
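If both checks pass but Grafana still cannot reach Prometheus, a throwaway pod can test the service from inside the cluster. This sketch assumes the service listens on port 9090 (as in the port-forward command) and uses the public curlimages/curl image; the pod name `prom-check` and the function name are hypothetical.

```shell
# Hypothetical helper: runs a one-off curl pod in the monitoring namespace and
# calls Prometheus' /-/healthy endpoint via the in-cluster service name.
check_prometheus_from_cluster() {
  kubectl run prom-check --rm -i --restart=Never \
    --namespace monitoring \
    --image=curlimages/curl --command -- \
    curl -fsS http://prometheus:9090/-/healthy
}

# Usage:
#   check_prometheus_from_cluster
```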
Check current usage:
kubectl top nodes
kubectl top pods -n monitoring
Reduce scrape frequency:
Edit prometheus config to increase scrape_interval to 120s or 180s.
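To see what a longer interval buys, the arithmetic below counts how many samples each series stores per day. It is a simple illustration; `samples_per_day` is a hypothetical helper.

```shell
# Samples stored per series per day for a scrape interval given in seconds.
samples_per_day() {
  echo $(( 86400 / $1 ))
}

for interval in 30 60 120 180; do
  echo "${interval}s -> $(samples_per_day "$interval") samples/series/day"
done
```

Going from 60s to 120s halves the sample count, and with it roughly the ingest work and storage growth.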
Disable cAdvisor metrics:
Remove the cadvisor scrape job from prometheus/config.yaml if not needed.
Check Prometheus storage:
kubectl port-forward -n monitoring svc/prometheus 9090:9090
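# Visit http://localhost:9090/tsdb-status
Compaction: Prometheus compacts TSDB blocks automatically in the background, and no signal forces it. SIGHUP only reloads the configuration:
# Exec into Prometheus pod
kubectl exec -n monitoring -it deployment/prometheus -- sh
# Reload the configuration (does not trigger compaction)
kill -HUP 1
Reduce retention: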
Lower --storage.tsdb.retention.time or --storage.tsdb.retention.size in manifests/observability/prometheus/deployment.yaml.
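A rough disk estimate helps when choosing a retention value. The sketch below applies the sizing rule from the Prometheus storage documentation (retention seconds × ingested samples per second × bytes per sample, with about 1-2 bytes per sample). The series count of 10,000 is an assumed figure, and `estimate_tsdb_bytes` is a hypothetical helper.

```shell
# estimate_tsdb_bytes RETENTION_DAYS ACTIVE_SERIES SCRAPE_INTERVAL_S [BYTES_PER_SAMPLE]
# Prints an approximate on-disk size in bytes (integer arithmetic).
estimate_tsdb_bytes() {
  local days="$1"
  local series="$2"
  local interval="$3"
  local bytes_per_sample="${4:-2}"
  echo $(( days * 86400 * series / interval * bytes_per_sample ))
}

# 90 days, 10,000 active series (assumed), 60s scrape interval
estimate_tsdb_bytes 90 10000 60   # prints 2592000000 (about 2.4 GiB)
```

Check the real series count on the TSDB status page before relying on the result.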
- Overview Dashboard: Import 15661 for cluster health at-a-glance
- Node Details: Import 1860 for deep-dive into node performance
- Pi-Specific: Import 13824 to monitor ARM-specific issues
- Custom Panels: Create dashboard for your specific workloads
Monitor alert frequency for 1-2 weeks, then adjust:
- Too many alerts: Increase thresholds or durations
- Missing issues: Decrease thresholds
- Flapping: Increase the for duration in alert rules
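One way to measure alert frequency is to query the synthetic `ALERTS` series that Prometheus records for active alerts. The sketch below counts firing samples per alert name over a window, which is a proxy for how long each alert fired rather than an exact count of incidents. `alert_firing_counts` is a hypothetical helper and assumes a port-forward to localhost:9090.

```shell
# Hypothetical helper: per alert name, counts samples in the "firing" state
# over the given window (default 14d) via the Prometheus HTTP API.
alert_firing_counts() {
  local window="${1:-14d}"
  local base_url="${2:-http://localhost:9090}"
  curl -fsS --get \
    --data-urlencode "query=sum by (alertname) (count_over_time(ALERTS{alertstate=\"firing\"}[$window]))" \
    "$base_url/api/v1/query"
}

# Usage:
#   alert_firing_counts        # last 14 days
#   alert_firing_counts 7d     # last 7 days
```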
For Pi Zero stability:
- Keep scrape_interval at 60s or higher
- Monitor node-exporter memory usage
- Limit cAdvisor metrics to essentials
- Consider disabling kube-state-metrics if not needed
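Limiting cAdvisor metrics is usually done with `metric_relabel_configs` on the cadvisor scrape job. The sketch below prints one possible keep-list covering the container metrics named in this README; the exact metric selection is an assumption and may not match what the shipped config keeps. `print_cadvisor_filter` is a hypothetical helper.

```shell
# Hypothetical helper: prints a metric_relabel_configs fragment that keeps only
# a handful of container metrics and drops everything else from the job.
print_cadvisor_filter() {
  cat <<'EOF'
metric_relabel_configs:
  - source_labels: [__name__]
    regex: container_(cpu_usage_seconds_total|memory_working_set_bytes|network_(receive|transmit)_bytes_total|fs_usage_bytes)
    action: keep
EOF
}

print_cadvisor_filter
```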
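Prometheus local storage is not meant for durable long-term retention: data lives on a single node and is dropped once it passes the retention window. For important metrics: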
- Use Grafana snapshots for dashboards
- Export queries to JSON
- Consider Prometheus remote_write to long-term storage if needed
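For the remote_write option, the Prometheus configuration only needs a `remote_write` section with a target URL. The sketch below prints a minimal fragment; the endpoint URL is a placeholder, and `print_remote_write_stub` is a hypothetical helper. Authentication and queue tuning are left out.

```shell
# Hypothetical helper: prints a minimal remote_write fragment for
# prometheus/config.yaml. The default URL is a placeholder, not a real endpoint.
print_remote_write_stub() {
  local url="${1:-https://metrics.example.com/api/v1/write}"
  cat <<EOF
remote_write:
  - url: "$url"
EOF
}

print_remote_write_stub
```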
| Component | Desktop Node | Pi Zero Node |
|---|---|---|
| Prometheus | 1-2 GB | - |
| Grafana | 256-512 MB | - |
| Alertmanager | 128-256 MB | - |
| kube-state-metrics | 50-100 MB | - |
| node-exporter | 30-50 MB | 30-50 MB |
| Total | ~2-3 GB | ~30-50 MB |
- Total RAM: 512 MB
- OS + kubelet: ~150-200 MB
- node-exporter: ~50 MB
- Your workloads: ~200-260 MB available
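The workload figure follows from simple subtraction. The sketch below reproduces it using the upper estimates from the list; `pi_workload_budget` is a hypothetical helper.

```shell
# Memory left for workloads: total minus OS/kubelet minus node-exporter (MB).
pi_workload_budget() {
  local total_mb="$1"
  local os_mb="$2"
  local exporter_mb="$3"
  echo $(( total_mb - os_mb - exporter_mb ))
}

# 512 MB total, 200 MB for OS + kubelet, 50 MB for node-exporter
pi_workload_budget 512 200 50   # prints 262
```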
K3s exposes kubelet metrics through the API server proxy at /api/v1/nodes/{node}/proxy/metrics (container metrics are served separately under /metrics/cadvisor).
The configuration automatically uses K3s-compatible paths.
K3s embeds cAdvisor in kubelet. Access via /api/v1/nodes/{node}/proxy/metrics/cadvisor.
Metrics are filtered to essentials to reduce load.
Prometheus uses its in-cluster ServiceAccount to access the Kubernetes API via kubernetes.default.svc:443.
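The proxy paths above can be inspected directly with `kubectl get --raw`, which is useful for confirming that a node serves metrics before debugging Prometheus itself. In this sketch `peek_node_metrics` is a hypothetical helper, and `pi2` is the Pi node name used in the expected output earlier.

```shell
# Hypothetical helper: prints the first lines of a node's kubelet metrics via
# the API server proxy. Pass "/cadvisor" as second argument for container metrics.
peek_node_metrics() {
  local node="$1"
  local suffix="${2:-}"
  kubectl get --raw "/api/v1/nodes/$node/proxy/metrics$suffix" | head -n 20
}

# Usage:
#   peek_node_metrics pi2             # kubelet metrics
#   peek_node_metrics pi2 /cadvisor   # cAdvisor container metrics
```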
To scrape application metrics:
- Instrument your app with a Prometheus client library
- Expose a /metrics endpoint in your pod
- Add a scrape config to prometheus/config.yaml:
- job_name: 'my-app'
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: my-app-name
    - source_labels: [__meta_kubernetes_pod_container_port_number]
      action: keep
      regex: "8080" # Your metrics port
For issues or questions:
- Check pod logs: kubectl logs -n monitoring <pod-name>
- Describe resources: kubectl describe -n monitoring <resource>
- Prometheus status: http://localhost:9090 (port-forward first)
- Grafana explore: http://<CONTROL_PLANE_IP>:32000/explore
