Rent a GPU, run vLLM on it, and stop paying once the daily budget is gone. Works against Vast.ai and RunPod, with Tailscale handling the network so the pod is reachable by hostname from anywhere.
| What you get | How |
|---|---|
| Vendors | Vast.ai, RunPod (one CLI, identical command surface) |
| Inference server | vLLM, OpenAI-compatible at :8000 — any client that speaks OpenAI works |
| Network | Tailscale hostname (e.g. vllm-qwen27b-vast:8000) — no port-forward, no SSH tunnel per session |
| Cost control | Hard DAILY_BUDGET ($/day) and optional DAILY_HOURS cap with grace countdown before terminate |
| Reproducibility | Per-model + per-GPU profiles in src/profiles/*.env |
| Adding a vendor | One file: src/vendors/<name>.sh implementing create/ssh/terminate/status/logs |
New here? There's a course that walks you from zero to a personal coding agent running on a rented GPU, reachable from your laptop and your phone. Start at docs/course/README.md.
```bash
# 1. One-time: secrets + caps in src/.env (gitignored).
cp src/.env.example src/.env
# edit src/.env: VAST_API_KEY (and/or RUNPOD_API_KEY), TAILSCALE_AUTHKEY,
# DAILY_BUDGET, DAILY_HOURS

# 2. Pick a profile and a vendor.
PROFILE=qwen3.6-27b-4b-vast ./src/gpu.sh vast search   # find an offer (or set VAST_OFFER_ID in the profile)

# 3. Create the pod (provisions, installs vLLM, brings up Tailscale).
PROFILE=qwen3.6-27b-4b-vast ./src/gpu.sh vast create

# 4. In another terminal, watch the budget.
PROFILE=qwen3.6-27b-4b-vast ./src/gpu.sh vast supervise

# 5. Use it (Tailscale handles routing, so no port-forward).
curl http://vllm-qwen27b-vast:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"cyankiwi/Qwen3.6-27B-AWQ-INT4","messages":[{"role":"user","content":"hi"}]}'

# 6. Tear it down.
PROFILE=qwen3.6-27b-4b-vast ./src/gpu.sh vast terminate
```

| Path | Purpose |
|---|---|
| src/gpu.sh | Vendor dispatcher. gpu.sh <vendor> <command> → <vendor>_<command>. |
| src/lib/gpu-common.sh | Shared: env loading, spend log, supervise loop, tunnel mgmt. |
| src/vendors/runpod.sh | RunPod implementation. |
| src/vendors/vast.sh | Vast.ai implementation. |
| src/profiles/*.env | Per-model + per-GPU configs. Loaded via PROFILE=name. |
| src/setup-vllm.sh | Runs on the pod; installs vLLM, joins Tailscale, serves the model. |
| src/remote-session.sh | Mac-side tmux helper for mobile sessions over Tailscale. |
| docs/course/ | End-to-end course: zero → personal coding agent on a rented GPU. |
| docs/ | Course, cost analysis, GPU runbook, and archived spike notes. |
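
The table says profiles are loaded via PROFILE=name. As a rough illustration of what that loading could look like, here is a minimal sketch that assumes each profile is a plain KEY=value file. The function name load_profile and the PROFILE_DIR override are hypothetical and are not taken from src/lib/gpu-common.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: load src/profiles/<name>.env into the environment.
# load_profile and PROFILE_DIR are illustrative names, not the repo's API.

load_profile() {
  local name="${1:-${PROFILE:-}}"          # explicit argument wins over $PROFILE
  local dir="${PROFILE_DIR:-src/profiles}"
  local file="$dir/$name.env"

  if [ -z "$name" ]; then
    echo "PROFILE is not set" >&2
    return 1
  fi
  if [ ! -f "$file" ]; then
    echo "profile not found: $file" >&2
    return 1
  fi

  # set -a exports every variable assigned while the file is sourced,
  # so child processes (ssh, curl, a vendor CLI) inherit them.
  set -a
  # shellcheck disable=SC1090
  . "$file"
  set +a
}
```

With PROFILE=qwen3.6-27b-4b-vast in the environment, calling load_profile with no argument would read src/profiles/qwen3.6-27b-4b-vast.env under these assumptions.
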
```text
gpu.sh <vendor> create      # provision pod, install vLLM, bring up Tailscale
gpu.sh <vendor> ssh         # interactive shell on pod
gpu.sh <vendor> tunnel      # SSH tunnel localhost:8000 -> pod:8000 (fallback if Tailscale absent)
gpu.sh <vendor> status      # pod state + tunnel + vLLM health
gpu.sh <vendor> logs        # tail vLLM log on the pod
gpu.sh <vendor> spend       # today's accumulated spend + hours
gpu.sh <vendor> supervise   # cost/hours watchdog, terminates on cap with grace countdown
gpu.sh <vendor> terminate   # destroy the pod (network volume survives if used)
gpu.sh vast search          # list affordable Vast offers (works around broken CLI filter)
```

Vendors: runpod, vast. Default vendor controlled by DEFAULT_GPU_VENDOR (fallback: vast).
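
The dispatch rule (gpu.sh <vendor> <command> calls <vendor>_<command>) and the DEFAULT_GPU_VENDOR fallback can be sketched as below. This is not the actual src/gpu.sh: the dispatch function, the KNOWN_VENDORS variable, and the idea that the vendor argument may be omitted are assumptions made for illustration. The real script presumably sources src/vendors/<vendor>.sh first; that step is left out so the sketch stays self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the dispatch rule, not the real src/gpu.sh.

KNOWN_VENDORS="runpod vast"

dispatch() {
  local vendor command fn

  # If the first argument names a known vendor, consume it. Otherwise
  # fall back to DEFAULT_GPU_VENDOR, then to vast (assumed behavior).
  case " $KNOWN_VENDORS " in
    *" ${1:-} "*) vendor="$1"; shift ;;
    *)            vendor="${DEFAULT_GPU_VENDOR:-vast}" ;;
  esac

  command="${1:-}"
  if [ -z "$command" ]; then
    echo "usage: gpu.sh [vendor] <command>" >&2
    return 2
  fi
  shift

  # Map <vendor> <command> to the shell function <vendor>_<command>.
  fn="${vendor}_${command}"
  if ! command -v "$fn" >/dev/null 2>&1; then
    echo "unknown command for $vendor: $command" >&2
    return 2
  fi
  "$fn" "$@"
}
```

Under these assumptions, dispatch vast status would call vast_status, and dispatch status with DEFAULT_GPU_VENDOR=runpod would call runpod_status.
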
The supervisor is a cost/hours watchdog. When it hits the budget or hours cap, it notifies you, counts down SUPERVISOR_GRACE_SECS (default 60s) so you can finish or Ctrl+C to abort, then terminates. It does not restart things.
```bash
DAILY_BUDGET=5             # $/day before terminate
DAILY_HOURS=5              # optional uptime cap; empty = no hours cap
SUPERVISOR_POLL_SECS=30    # check interval
SUPERVISOR_GRACE_SECS=60   # countdown before terminate
```
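
The watchdog behavior described above (poll, compare against the caps, count down, terminate) could be sketched roughly as follows. Only the DAILY_* and SUPERVISOR_* variable names come from this README; cap_reached, grace_then_terminate, watch_caps, and the callback arguments are hypothetical, and the fallback values simply reuse the example settings shown above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the supervisor logic, not the repo's supervise_loop.

# Print which cap was hit ("budget" or "hours"); print nothing if none.
# Args: spend_usd budget_usd hours_up [hours_cap]. Empty hours_cap = no cap.
cap_reached() {
  awk -v s="$1" -v b="$2" -v h="$3" -v c="${4:-}" 'BEGIN {
    if (s + 0 >= b + 0)            { print "budget"; exit }
    if (c != "" && h + 0 >= c + 0) { print "hours";  exit }
  }'
}

# Count down the grace period, then run the command given as "$@".
# When run as a script, Ctrl+C during the countdown stops the script,
# so the terminate command is never reached.
grace_then_terminate() {
  local secs="$1"; shift
  while [ "$secs" -gt 0 ]; do
    printf 'terminating in %ss (Ctrl+C to abort)\n' "$secs" >&2
    sleep 1
    secs=$((secs - 1))
  done
  "$@"
}

# Poll until a cap is hit, then count down and terminate once.
# spend_fn, hours_fn, terminate_fn: names of caller-supplied functions.
watch_caps() {
  local spend_fn="$1" hours_fn="$2" terminate_fn="$3" reason
  while :; do
    reason="$(cap_reached "$("$spend_fn")" "${DAILY_BUDGET:-5}" \
                          "$("$hours_fn")" "${DAILY_HOURS:-}")"
    if [ -n "$reason" ]; then
      echo "cap reached: $reason" >&2
      grace_then_terminate "${SUPERVISOR_GRACE_SECS:-60}" "$terminate_fn"
      return
    fi
    sleep "${SUPERVISOR_POLL_SECS:-30}"
  done
}
```

The loop returns after terminating and never restarts the pod, which matches the "does not restart things" description. Comparing with awk keeps fractional dollar and hour values working, since shell arithmetic is integer-only.
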
- Drop src/vendors/<name>.sh.
- Implement <name>_create, <name>_terminate, <name>_ssh, <name>_status, <name>_logs. <name>_tunnel is optional.
- <name>_supervise and <name>_spend can be one-liners that delegate to supervise_loop and compute_spend_today in src/lib/gpu-common.sh.
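
As an illustration of the steps above, a skeleton for a made-up vendor called demo might look like this. The function bodies are placeholders, and passing the vendor name as the only argument to supervise_loop and compute_spend_today is an assumption; check src/lib/gpu-common.sh for the real signatures.

```shell
#!/usr/bin/env bash
# Hypothetical src/vendors/demo.sh: skeleton for a vendor named "demo".
# Replace each placeholder body with the vendor's real CLI or API calls.

demo_create()    { echo "demo: create not implemented" >&2; return 1; }
demo_terminate() { echo "demo: terminate not implemented" >&2; return 1; }
demo_ssh()       { echo "demo: ssh not implemented" >&2; return 1; }
demo_status()    { echo "demo: status not implemented" >&2; return 1; }
demo_logs()      { echo "demo: logs not implemented" >&2; return 1; }

# Optional: only needed as a fallback when Tailscale is absent.
# demo_tunnel()  { ...; }

# One-liners that delegate to the shared helpers in src/lib/gpu-common.sh.
# The argument passed here (the vendor name) is an assumed signature.
demo_supervise() { supervise_loop demo; }
demo_spend()     { compute_spend_today demo; }
```
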
- Course lesson 05: Pi at ~/.pi/agent/models.json pointed at the vLLM endpoint, plus Termius (phone → Tailscale → Mac → tmux) and Cline (VS Code → vLLM) configured against the same hostname. The full course (docs/course/README.md) walks through provisioning end to end.
See the gotchas: section of spec/project.yaml for the full list of known gotchas, and docs/gpu-runbook.md for vendor-specific war stories.
- Cost analysis: $100/month with 27B: how the GPU and model choices shake out on a real budget