Xander is a local Kubernetes telemetry demo for spotting pod resource pressure and hidden dependencies. It collects container metrics from a local k3s cluster, stores them in SQLite, builds rolling aggregates, and exposes a small HTTP API and a Streamlit UI for quick investigation.
- telemetry-collector/: DaemonSet collector for pod discovery, cgroup metrics, events, and SQLite storage.
- aggregation-engine/: Reads collector metrics and writes rolling aggregate JSON files.
- telemetry-api/: Small read-only API over the collector metrics database.
- agent/: Python analysis agent that loads context files and produces analysis reports (supports analyze, daemon, watch).
- streamlit_app.py: Lightweight interactive UI for exploring metrics and aggregates.
- Go 1.21+
- Python 3.8+
- Docker, used by k3d and for local collector image builds
- kubectl
- k3d for the local k3s cluster
- sqlite3 (for manual inspection)
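A quick way to confirm that the command-line prerequisites are on your PATH is a small Python check. This is a minimal sketch and not part of the repo: the `missing_tools` helper is hypothetical, the executable names (`go`, `python3`) are assumed from the list above, and it only checks presence, not versions.

```python
import shutil

# Executable names assumed from the prerequisites list above.
REQUIRED_TOOLS = ["go", "python3", "docker", "kubectl", "k3d", "sqlite3"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH (presence only, no version check)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All prerequisite tools found on PATH.")
```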
The easiest way to get a working local environment is the root Makefile:
cd /path/to/xander
make up
This installs Python deps into a virtualenv, downloads Go modules, creates or reuses a single-node k3d cluster, deploys the noisy-neighbor scenario, and deploys the collector.
Typical next steps:
source .venv/bin/activate
streamlit run streamlit_app.py
Open the Streamlit UI and, in the sidebar, point it at /tmp/collector-metrics.db (or the path you copied from the collector pod).
To sanity-check that the scenario is producing pod metrics:
make verify-scenario
These sections describe how to run each component individually for development and testing.
- Collector (k8s): Build and deploy the collector image into your local cluster.
cd telemetry-collector
make docker-build
make deploy-k3
- Copy the metrics DB from the collector pod for local use:
POD_NAME=$(kubectl get pods -n telemetry-system -l app=telemetry-collector -o jsonpath='{.items[0].metadata.name}')
kubectl cp telemetry-system/$POD_NAME:/tmp/metrics.db ./metrics.db
sqlite3 ./metrics.db "SELECT COUNT(*) FROM metrics;"
- Aggregation engine:
cd aggregation-engine
make run # 1-minute windows
make run-5m # 5-minute windows
By default the engine reads ../telemetry-collector/metrics.db and writes aggregates_<window>.json.
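To look at what the engine wrote, the aggregate files can be loaded from Python. This is a minimal sketch: the `load_aggregates` helper is hypothetical, it assumes the files sit in the directory you pass in, and it makes no assumption about the JSON schema beyond the `aggregates_<window>.json` naming mentioned above.

```python
import glob
import json
import os


def load_aggregates(directory="."):
    """Load every aggregates_<window>.json file in `directory`, keyed by window label."""
    results = {}
    pattern = os.path.join(directory, "aggregates_*.json")
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        # Strip the fixed prefix and suffix to recover the <window> part.
        window = name[len("aggregates_"):-len(".json")]
        with open(path, "r", encoding="utf-8") as handle:
            results[window] = json.load(handle)
    return results


if __name__ == "__main__":
    for window, payload in load_aggregates().items():
        # The schema is not documented here, so only report the top-level shape.
        size = len(payload) if isinstance(payload, (list, dict)) else 1
        print(f"{window}: {type(payload).__name__} with {size} top-level entries")
```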
- Telemetry API:
cd telemetry-api
make run DB=../telemetry-collector/metrics.db
Endpoints include GET /cluster-summary, GET /top-risk, and GET /incidents.
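The endpoints above can be queried from Python with only the standard library. This is a sketch under assumptions: the base URL and port are placeholders (this README does not state where the API listens), the helpers `endpoint_url`, `fetch_json`, and `print_endpoints` are hypothetical, and the responses are assumed to be JSON.

```python
import json
from urllib.request import urlopen

# Assumption: replace with the address the API actually listens on.
BASE_URL = "http://localhost:8080"
ENDPOINTS = ["/cluster-summary", "/top-risk", "/incidents"]


def endpoint_url(base_url, path):
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(base_url, path, timeout=5.0):
    """GET one endpoint and decode the body, assuming it is JSON."""
    with urlopen(endpoint_url(base_url, path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def print_endpoints(base_url=BASE_URL):
    """Print each endpoint's response, or the error if the request fails."""
    for path in ENDPOINTS:
        try:
            print(path, fetch_json(base_url, path))
        except (OSError, ValueError) as exc:
            # OSError covers connection and HTTP errors; ValueError covers non-JSON bodies.
            print(path, "failed:", exc)
```

With the API running, calling `print_endpoints()` prints one line per endpoint.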
- Agent (analysis CLI):
The agent/ package includes a small CLI entrypoint at agent/main.py.
cd agent
source ../.venv/bin/activate # or use your Python env
# Analyze a specific context file
python main.py analyze --context-file /path/to/context_123.json
# or analyze latest context in the configured directory
python main.py analyze --latest
# Run as a daemon monitoring for new context files (writes analyses to `analyses/` next to contexts)
python main.py daemon --poll-interval 60
The analyze command supports --output-format (markdown or json) and --output-file.
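To make the `--latest` and `daemon` behaviour concrete, here is an illustration of the general pattern: pick the newest context file, and on each poll report only files not seen before. This is not the agent's actual implementation. The helpers `latest_context` and `unseen_contexts` are hypothetical, and the `context_*.json` naming is assumed from the example path above.

```python
import glob
import os


def _context_files(directory):
    """List context_*.json files in `directory`, oldest first by modification time."""
    paths = glob.glob(os.path.join(directory, "context_*.json"))
    return sorted(paths, key=os.path.getmtime)


def latest_context(directory):
    """Return the most recently modified context file, or None if there are none."""
    paths = _context_files(directory)
    return paths[-1] if paths else None


def unseen_contexts(directory, seen):
    """Return context files not yet in `seen` (oldest first) and record them as seen."""
    fresh = [path for path in _context_files(directory) if path not in seen]
    seen.update(fresh)
    return fresh
```

A polling loop would call `unseen_contexts(directory, seen)` once per interval with the same `seen` set and analyze whatever it returns.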
Run the dashboard from the repo root (after activating the .venv created by start-proj.sh):
source .venv/bin/activate
streamlit run streamlit_app.py
The sidebar lets you point the UI at a local metrics.db file and toggle live charting.
- If Docker is not reachable, ensure it is running and that your user has permission to access the daemon.
- If k3d is missing, install it first and rerun make up.
- The collector stores the live DB inside the pod at /tmp/metrics.db. Copy it out for local analysis.
- make sync-db merges collector DBs if a multi-node cluster already exists.
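Once the collector DB has been copied out of the pod, a short Python check can confirm it contains data. This is a minimal sketch: the `table_row_counts` helper is hypothetical and discovers tables from the file itself, so it does not assume any schema. Note that `sqlite3.connect` creates an empty file if the path does not exist, so double-check the path first.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in the SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names cannot break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

For a copied collector DB, `table_row_counts("./metrics.db")` should include the `metrics` table that the `sqlite3` command earlier in this README queries.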
Contributions are welcome.
For development, use make up, make down, and make clean from the repo root.