wattson

Your machine's personal assistant — a DL-workload-aware system monitor.


A terminal UI for the bits of system monitoring that matter when you're running deep-learning workloads: GPU utilisation per training job, thermal headroom, throttling alerts, and the hardware details you forget every time someone asks "wait, what model GPU is in this rig?"

Status: v0.0.9 — Trends screen now uses real braille line charts (via textual-plotext / plotext) instead of sparkline bars, and Windows CPU temperature is read from WMI ACPI thermal zones when psutil has no Windows backend.

Planned features

Shipped

  • CPU usage, frequency, model, core count (v0.0.1)
  • NVIDIA GPU utilisation, VRAM, temperature with graceful no-GPU fallback (v0.0.1)
  • System memory + swap (v0.0.1)
  • Disk usage per partition (v0.0.1)
  • Live-refreshing TUI (textual) (v0.0.1)
  • GPU-aware process table — top processes by VRAM + CPU, with which-GPU attribution (v0.0.2)
  • GPU clocks + power draw — graphics/memory clock MHz, current/cap watts (v0.0.3)
  • GPU throttle alerts — surfaces active reasons (PowerCap, Thermal, HW slowdown, …) in yellow when present (v0.0.3)
  • CPU temperature — Linux/macOS via psutil sensors; Windows shows n/a until WMI/OHM backend lands (v0.0.3)
  • Kill selected process: k on a row → confirmation modal → SIGTERM via psutil, with status notification (v0.0.3)
  • Hardware-inventory screen: i opens a full-screen view of driver/CUDA versions, GPU UUID / serial / PCIe gen+width, CPU cache + AVX flags, OS / kernel / Python (v0.0.4)
  • Process criticality flagging: a marker plus bold cyan styling on processes that hold VRAM (likely training jobs), sustain CPU > 50 %, or own > 10 % of system memory (v0.0.4)
  • Trends screen: t opens live sparklines for CPU %, memory %, and per-GPU util / temp / power over the last 60 s. Backed by a ring buffer in wattson.history (v0.0.5)
  • Watchdog mode — each tick, threshold checks for hot CPU/GPU, memory pressure, VRAM pressure, and active GPU throttle reasons. Events go to ~/.wattson/events.jsonl (JSONL, rate-limited 1 / category / 60 s); session count surfaces in the header sub-title; w opens the tailing screen (v0.0.6)
  • Process priority control: n on the selected row opens Low / Normal / High buttons (or l / n / h shortcuts); psutil maps to nice values on Unix and PRIORITY_CLASS on Windows. High typically needs admin and surfaces an Access denied toast (v0.0.7)
  • GPU power-limit control: p opens a modal showing the current draw / cap / driver-reported min–max range and accepts a target wattage. Apply requires admin; failure path surfaces a clear error toast (v0.0.7)
  • Line-chart Trends: t now renders real braille line charts via textual-plotext / plotext rather than sparkline bars. Per-metric colours (CPU cyan · GPU green · memory blue · temps red · power magenta) with current / min / max in each row's label (v0.0.9)
  • Windows CPU temperature — falls back to WMI MSAcpi_ThermalZoneTemperature when psutil has no Windows backend. (Works on systems with ACPI thermal zones; modern laptops often need LibreHardwareMonitor — TBD.) (v0.0.9)
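
The Trends entry above mentions a ring buffer covering the last 60 s. A minimal sketch of that idea, using only the standard library, might look like the following. The class name MetricHistory, its methods, and the one-sample-per-second interval are assumptions for illustration, not the actual contents of wattson.history.

```python
from collections import deque


class MetricHistory:
    """Fixed-size ring buffer of recent samples for one metric (hypothetical name)."""

    def __init__(self, window_s: int = 60, interval_s: float = 1.0) -> None:
        # Assumption: one sample per refresh tick, so capacity = window / interval.
        self._samples = deque(maxlen=int(window_s / interval_s))

    def push(self, value: float) -> None:
        # Once the deque is full, append() discards the oldest sample.
        self._samples.append(value)

    def values(self) -> list:
        # Oldest-to-newest copy, suitable for handing to a chart widget.
        return list(self._samples)

    def summary(self) -> tuple:
        """Return (current, min, max); raises ValueError when empty."""
        if not self._samples:
            raise ValueError("no samples yet")
        return self._samples[-1], min(self._samples), max(self._samples)


cpu = MetricHistory(window_s=60, interval_s=1.0)
for tick in range(100):  # push 100 synthetic ticks into a 60-slot buffer
    cpu.push(float(tick))
```

After the loop only the newest 60 samples remain, so memory stays bounded no matter how long the monitor runs, and summary() yields the current / min / max triple that a row label could display.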
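
The Watchdog entry above describes a JSONL event log rate-limited to one event per category per 60 s. The sketch below shows one way such a limiter could work. The EventLog class, the record fields (ts, category, message), the category names, and the readings are all invented for illustration; the real schema of ~/.wattson/events.jsonl may differ. The demo writes to a temporary directory rather than the real log path.

```python
import json
import tempfile
import time
from pathlib import Path


class EventLog:
    """Append-only JSONL log with per-category rate limiting (hypothetical name)."""

    def __init__(self, path: Path, min_interval_s: float = 60.0) -> None:
        self.path = path
        self.min_interval_s = min_interval_s
        self._last_written = {}  # category -> timestamp of last written event
        self.session_count = 0   # events written during this session

    def emit(self, category: str, message: str, now=None) -> bool:
        """Write one event unless this category already fired within the interval."""
        now = time.monotonic() if now is None else now
        last = self._last_written.get(category)
        if last is not None and now - last < self.min_interval_s:
            return False  # suppressed by the rate limit
        self._last_written[category] = now
        self.session_count += 1
        self.path.parent.mkdir(parents=True, exist_ok=True)
        record = {"ts": time.time(), "category": category, "message": message}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")  # one JSON object per line
        return True


# Demo with made-up readings; `now` is passed explicitly to make timing deterministic.
log_path = Path(tempfile.mkdtemp()) / "events.jsonl"
log = EventLog(log_path)
results = [
    log.emit("gpu_hot", "GPU0 at 91 C", now=0.0),          # written
    log.emit("gpu_hot", "GPU0 at 92 C", now=30.0),         # suppressed: same category, < 60 s
    log.emit("mem_pressure", "RAM at 95 %", now=30.0),     # written: different category
    log.emit("gpu_hot", "GPU0 at 93 C", now=61.0),         # written: interval elapsed
]
```

Keying the limiter on category means a persistently hot GPU cannot flood the log, while an unrelated memory-pressure event still gets through immediately.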
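
The criticality rule in the list above (holds VRAM, sustains CPU > 50 %, or owns > 10 % of system memory) can be written as a small predicate. This is a sketch under assumptions: ProcSample and its field names are hypothetical, and "sustained" CPU is assumed to mean a value already averaged over recent ticks by the caller.

```python
from dataclasses import dataclass


@dataclass
class ProcSample:
    """Hypothetical per-process snapshot; field names are illustrative only."""
    pid: int
    vram_mib: float     # GPU memory held by the process
    cpu_percent: float  # CPU use, assumed pre-averaged over recent ticks
    mem_percent: float  # share of system RAM


def is_critical(p: ProcSample, cpu_threshold: float = 50.0,
                mem_threshold: float = 10.0) -> bool:
    """True when any one of the three documented conditions holds."""
    return (
        p.vram_mib > 0
        or p.cpu_percent > cpu_threshold
        or p.mem_percent > mem_threshold
    )


# Made-up samples: a job holding VRAM, an idle shell, and a memory-heavy process.
trainer = ProcSample(pid=1001, vram_mib=8192.0, cpu_percent=12.0, mem_percent=4.0)
shell = ProcSample(pid=1002, vram_mib=0.0, cpu_percent=0.5, mem_percent=0.1)
cache = ProcSample(pid=1003, vram_mib=0.0, cpu_percent=3.0, mem_percent=22.0)
flags = [is_critical(p) for p in (trainer, shell, cache)]
```

The conditions are OR-ed, so a process that holds any VRAM is flagged even when its CPU and memory use are low.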

Coming

  • CPU-affinity controls (currently priority only)
  • Per-GPU drill-in screen (focus on one GPU at a time, with picker for the power-limit dialog on multi-GPU rigs)
  • Multi-host — watch a small training cluster from one terminal
  • LibreHardwareMonitor backend for richer Windows temperatures (modern laptops hide CPU temp behind vendor EC; LHM exposes it via its own WMI namespace)

Install (dev)

git clone git@github.com:Anjanamb/wattson.git
cd wattson
python -m venv .venv
# Windows:  .venv\Scripts\Activate.ps1
# Unix:     source .venv/bin/activate
pip install -e ".[dev]"
wattson         # or: python -m wattson

Inside the TUI:

  • q — quit
  • r — force-refresh
  • ↑ / ↓ — move the row cursor in the process table
  • k — kill the selected process (with a confirmation modal — y / n / Esc)
  • n — change priority of the selected process (l / n / h for Low / Normal / High; Esc to cancel)
  • p — set GPU0 power limit (input is in watts; needs admin to apply)
  • i — open the hardware-inventory screen (i / q / Esc to return)
  • t — open the live trends screen (t / q / Esc to return)
  • w — open the watchdog event log (w / q / Esc to return)

Why "wattson"?

Watt (power) + Watson (assistant). Your hardware burns watts; wattson watches.

License

MIT — see anjanamb.github.io for more projects.
