Cloud GPU Shopper

A unified inventory and orchestration service for commodity GPU providers (Vast.ai, Blue Lobster, TensorDock). It acts as a "menu and provisioner": you select an offer, and the service provisions the instance, hands off credentials, and ensures cleanup.

Key Principle

Menu, not middleman. We provision and hand off direct access. We don't proxy traffic.

This design philosophy means Cloud GPU Shopper acts as a catalog and orchestrator, not a gateway. Once your GPU session is provisioned, you connect directly to the instance via SSH - no intermediary, no added latency, no single point of failure. Your workloads run with full performance and you maintain complete control.

Why Cloud GPU Shopper?

Managing GPU compute across multiple cloud providers is complex and risky. Cloud GPU Shopper addresses this with:

  • Unified Interface: Browse and compare GPU offers across Vast.ai, Blue Lobster, and TensorDock from a single API. No need to learn multiple provider interfaces or maintain separate integrations.

  • Built-in Safety Systems: Prevent runaway costs with automatic 12-hour session limits, orphan instance detection, and verified destruction. The service is designed with "zero orphaned instances" as the primary goal.

  • Simple Provisioning Workflow: Create a session with one API call or CLI command. Get SSH credentials immediately. Signal when done and the instance is cleaned up automatically.

Features

  • Unified Inventory: Browse GPUs across multiple providers with filtering
  • Session Management: Provision, monitor, and destroy GPU sessions
  • Safety Systems: 12-hour hard max, orphan detection, verified destruction
  • Cost Tracking: Per-session and per-consumer cost aggregation with budget alerts

Supported Providers

| Provider | Status | Features |
|---|---|---|
| Vast.ai | Implemented | Instance tags, spot pricing, Docker templates |
| Blue Lobster | Implemented | Fixed pricing, dedicated GPUs, direct SSH on port 22 |
| TensorDock | Implemented | On-demand pricing, dedicated IPs |

Blue Lobster Note: Instances run apt-get dist-upgrade on boot, which rebuilds NVIDIA DKMS kernel modules for 7-19 minutes after SSH becomes available. Cloud GPU Shopper handles this automatically with a readiness probe that waits for dpkg locks to clear and nvidia-smi to stabilize.

TensorDock Note: The ubuntu2404 image requires manual NVIDIA driver installation:

ssh user@<ip> "sudo apt-get update && sudo apt-get install -y nvidia-driver-550 && sudo reboot"

Quick Start

Prerequisites

  • Go 1.25+
  • Docker (optional, for containerized deployment)

Environment Variables

Create a .env file in the project root (automatically loaded by the server):

VASTAI_API_KEY=your-vastai-key
BLUELOBSTER_API_KEY=your-bluelobster-key
TENSORDOCK_API_TOKEN=your-tensordock-token
TENSORDOCK_AUTH_ID=your-tensordock-auth-id
DATABASE_PATH=./data/gpu-shopper.db

Or export them directly:

export VASTAI_API_KEY=your-vastai-key
export BLUELOBSTER_API_KEY=your-bluelobster-key
export TENSORDOCK_API_TOKEN=your-tensordock-token
export TENSORDOCK_AUTH_ID=your-tensordock-auth-id

Run the Server

# Build and run
go build -o bin/server ./cmd/server
./bin/server

# Or run directly
go run ./cmd/server

The server starts on http://localhost:8080.

Use the CLI

# Build CLI
go build -o bin/gpu-shopper ./cmd/cli

# List available GPUs
./bin/gpu-shopper inventory

# Provision a session
./bin/gpu-shopper provision --offer <offer-id> --consumer my-app --hours 2

# List active sessions
./bin/gpu-shopper sessions list

# Signal session complete
./bin/gpu-shopper sessions done <session-id>

# View costs
./bin/gpu-shopper costs --consumer my-app

Docker Deployment

cd deploy

# Start server only
docker-compose up -d server

# Start with monitoring (Prometheus + Grafana)
docker-compose --profile monitoring up -d

# View logs
docker-compose logs -f server

Common Use Cases

Running LLM Inference Workloads

Deploy vLLM, Ollama, or Text Generation Inference on demand:

# Find an RTX 4090 with at least 24GB VRAM under $0.50/hour
./bin/gpu-shopper inventory --gpu RTX4090 --min-vram 24 --max-price 0.50

# Provision for 4 hours
./bin/gpu-shopper provision --offer vastai-12345 --consumer llm-service --hours 4

# SSH in and start your inference server
ssh -i session-key root@192.168.1.100 "docker run -d --gpus all vllm/vllm-openai ..."

# When done, signal completion for automatic cleanup
./bin/gpu-shopper sessions done sess-abc123

Training ML Models

Spin up high-memory GPUs for training runs:

# Find A100s for training
./bin/gpu-shopper inventory --gpu A100 --min-vram 40

# Provision with longer duration
./bin/gpu-shopper provision --offer tensordock-67890 --consumer training-job-42 --hours 8

# Your training scripts connect directly via SSH

Batch GPU Processing Jobs

Process large datasets with burst GPU capacity:

# Find the cheapest available GPUs
./bin/gpu-shopper inventory --max-price 0.30

# Provision multiple sessions for parallel processing
for i in {1..4}; do
  ./bin/gpu-shopper provision --offer "$OFFER_ID" --consumer "batch-job-$i" --hours 2
done

# Sessions auto-terminate after reservation expires, or signal done when complete

Real-World Benchmark: DeepSeek-R1:32b on RTX 4090

This benchmark was performed on a TensorDock RTX 4090 provisioned through Cloud GPU Shopper.

Setup:

# Provision an RTX 4090 in Joplin, Missouri
./bin/gpu-shopper provision \
  --offer tensordock-071132ae-8c07-4d6b-9c37-041a55a85047-geforcertx4090-pcie-24gb \
  --consumer benchmark-test \
  --hours 1 \
  --save-key ~/.ssh/benchmark_key

# SSH and install Ollama
ssh -i ~/.ssh/benchmark_key user@<ip> "curl -fsSL https://ollama.com/install.sh | sh"

# Pull the model (19GB)
ssh -i ~/.ssh/benchmark_key user@<ip> "ollama pull deepseek-r1:32b"

Benchmark Results: DeepSeek-R1:32b (19GB model)

| Metric | Value |
|---|---|
| Single Request Speed | ~44 tokens/sec |
| Concurrent Throughput | ~41 tokens/sec |
| VRAM Usage | 20.4 GB / 24 GB (83%) |
| GPU Power Draw | ~360W |
| GPU Temperature | 52°C |

Performance by Task Type:

| Task | Output Tokens | Speed | Time |
|---|---|---|---|
| Short Q&A | 12 | 48.0 tok/s | 0.29s |
| Math Problem | 150 | 44.4 tok/s | 10.6s |
| Code Generation | 256 | 44.2 tok/s | 5.8s |
| Reasoning | 200 | 44.3 tok/s | 4.5s |
| Long Generation | 400 | 44.1 tok/s | 9.1s |

Cost Analysis:

  • Hourly rate: $0.44/hr (TensorDock RTX 4090)
  • Tokens per hour: ~158,400 (at 44 tok/s)
  • Cost per 1M tokens: ~$2.78 ($0.44 per 158,400 tokens, or about $0.003 per 1K tokens)
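
The arithmetic behind these figures can be sketched in a few lines of Go. The function name `costPerMillionTokens` is illustrative and not part of the project; the inputs are the hourly rate and sustained speed quoted above. At $0.44/hr and 44 tok/s the result is roughly $2.78 per million tokens.

```go
package main

import "fmt"

// costPerMillionTokens converts an hourly price and a sustained generation
// speed into USD per one million generated tokens.
// (Illustrative helper; not part of the Cloud GPU Shopper codebase.)
func costPerMillionTokens(hourlyUSD, tokensPerSec float64) float64 {
	tokensPerHour := tokensPerSec * 3600
	return hourlyUSD / tokensPerHour * 1_000_000
}

func main() {
	// Figures from the benchmark above: $0.44/hr at ~44 tok/s.
	fmt.Printf("tokens/hour: %.0f\n", 44.0*3600)
	fmt.Printf("USD per 1M tokens: %.2f\n", costPerMillionTokens(0.44, 44))
}
```

The same formula reproduces the TensorDock RTX 4090 entry for deepseek-r1:32b in the cost table further down (13 TPS at $0.44/hr gives about $9.40 per million tokens).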

Key Observations:

  1. Generation speed stays consistent (~44 tok/s) regardless of output length
  2. The 32B parameter model fits comfortably with 17% VRAM headroom
  3. GPU thermals remain cool (52°C) under sustained load
  4. Ollama serializes concurrent requests (no batching optimization)

Extended Stress Test (7 min, 67 requests):

The system maintained stable performance under sustained load:

  • Requests: 67 total, 0 errors (100% success rate)
  • Tokens Generated: ~17,000 (256 max per request)
  • Throughput: Consistent 44.13-44.29 tok/s across all requests
  • GPU Utilization: 95-97% average (brief dips during request transitions)
  • Temperature: Stabilized at 64°C (never exceeded safe levels)
  • Power Draw: Average 371W, peak 377W
  • Memory: Stable at 20,368 MiB (83% utilization)

This demonstrates the RTX 4090 handles sustained LLM inference workloads with excellent thermal and performance stability.

GPU Benchmarking

Cloud GPU Shopper includes comprehensive benchmarking infrastructure for evaluating LLM inference performance across different GPUs and providers. This enables data-driven hardware selection based on your specific model requirements.

Benchmark Matrix Results

We tested 8 models across 9 GPUs on 2 providers (49 benchmarks, 45 successful):

GPU Provider $/hr llama3.1:8b mistral:7b deepseek-r1:14b deepseek-r1:32b
RTX 5090 Vast.ai $0.21 149 TPS 72.5 TPS
RTX 5080 Vast.ai $0.12 168 TPS 89 TPS
RTX 4090 Vast.ai $0.08-0.32 94-97 TPS 44.5 TPS
RTX 4090 TensorDock $0.38-0.44 169 TPS 179 TPS 92-94 TPS 13 TPS
RTX 3090 Vast.ai $0.08 145 TPS 159 TPS 83 TPS 3.6 TPS
RTX 3090 TensorDock $0.20 80 TPS 11 TPS
RTX A6000 TensorDock $0.40 122 TPS 68 TPS
RTX 5060 Ti Vast.ai $0.06-0.07 83 TPS 89 TPS
A100 80GB Vast.ai $0.33 86 TPS 42 TPS

Key Findings:

  • RTX 3090 on Vast.ai ($0.08/hr) is the best value across the board: $0.14/M tokens for llama3.1:8b
  • RTX 5090 leads all consumer GPUs, handling 32b models 7x faster than RTX 4090
  • Vast.ai is 3-4x cheaper per hour than TensorDock for equivalent performance
  • Provider variance is significant - same GPU can differ 20-80% between providers
  • New quality metrics: TTFT (time to first token) ranging 4.4s-10.4s, match rate up to 100%

Cost Efficiency ($/Million Tokens)

GPU Provider llama3.1:8b mistral:7b deepseek-r1:14b deepseek-r1:32b
RTX 3090 Vast.ai $0.14 $0.14 $0.26 $6.20
RTX 5060 Ti Vast.ai $0.23 $0.21
RTX 5080 Vast.ai $0.19 $0.40
RTX 5090 Vast.ai $0.39 $0.80
RTX 4090 Vast.ai $0.46 $0.88
RTX 4090 TensorDock $0.65 $0.59-0.68 $1.14-1.30 $9.38
RTX 3090 TensorDock $0.69 $4.91

Benchmark API Endpoints

Query benchmark data programmatically:

# List all benchmarks
curl http://localhost:8080/api/v1/benchmarks

# Find best performing hardware for a model
curl "http://localhost:8080/api/v1/benchmarks/best?model=deepseek-r1:32b"

# Find most cost-effective hardware
curl "http://localhost:8080/api/v1/benchmarks/cheapest?model=qwen2:7b"

# Compare all hardware for a model
curl "http://localhost:8080/api/v1/benchmarks/compare?model=deepseek-r1:14b"

# Get hardware recommendations
curl "http://localhost:8080/api/v1/benchmarks/recommendations?model=qwen2:7b"
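
From Go, the same queries could be issued with the standard library. This is a minimal sketch, not the project's client library: `benchmarkURL` is a hypothetical helper, and because the response schema is not shown here the body is printed as raw text. Using `url.Values` takes care of escaping the `:` in model tags.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// benchmarkURL builds a query URL for one of the benchmark endpoints above
// ("best", "cheapest", "compare", or "recommendations").
// Hypothetical helper for illustration only.
func benchmarkURL(base, endpoint, model string) string {
	q := url.Values{}
	q.Set("model", model) // escapes the ':' in tags such as "deepseek-r1:32b"
	return fmt.Sprintf("%s/api/v1/benchmarks/%s?%s", base, endpoint, q.Encode())
}

func main() {
	u := benchmarkURL("http://localhost:8080", "best", "deepseek-r1:32b")
	fmt.Println("GET", u)

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(u)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed (is the server running?):", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read failed:", err)
		return
	}
	// The response schema is an unknown here, so print the body as-is.
	fmt.Printf("%s\n%s\n", resp.Status, body)
}
```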

Automated Benchmark Runs

The benchmark runner provisions GPU instances, uploads the benchmark script, runs tests, and collects results automatically:

# Run automated benchmarks across Vast.ai GPUs
curl -X POST http://localhost:8080/api/v1/benchmark-runs -H 'Content-Type: application/json' -d '{
  "models": ["llama3.1:8b", "deepseek-r1:14b"],
  "gpu_types": ["RTX 3090", "RTX 4090", "RTX 5060 Ti"],
  "providers": ["vastai"],
  "max_budget": 1.00
}'

# Run benchmarks across Blue Lobster GPUs
curl -X POST http://localhost:8080/api/v1/benchmark-runs -H 'Content-Type: application/json' -d '{
  "models": ["qwen2:0.5b"],
  "gpu_types": ["RTX 5090", "RTX 8000", "RTX A4000", "RTX A5000"],
  "providers": ["bluelobster"]
}'

# Monitor progress
curl http://localhost:8080/api/v1/benchmark-runs/<run-id>

Features:

  • Auto-provisions instances with correct templates (Ollama for Vast.ai, readiness probe for Blue Lobster, cloud-init for TensorDock)
  • Uploads benchmark script via SCP, starts Ollama if needed
  • Collects TTFT, match rate, TPS, GPU stats, and cost data
  • Entry-level retry (2 attempts per GPU/model combo)
  • Structured error reporting with error_type and retry_suggested
  • Fail-fast on permanent SSH errors (auth_failed, key_parse_failed)
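
A benchmark-run request body can also be built from a typed struct rather than a hand-written JSON string. The sketch below assumes only the four fields shown in the curl examples above; the struct name `benchmarkRunRequest` is hypothetical and the server may accept fields not listed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// benchmarkRunRequest mirrors the JSON fields used in the curl examples.
// The type itself is illustrative, not taken from the project source.
type benchmarkRunRequest struct {
	Models    []string `json:"models"`
	GPUTypes  []string `json:"gpu_types"`
	Providers []string `json:"providers"`
	MaxBudget float64  `json:"max_budget,omitempty"` // omitted when zero
}

// encodeRun serializes a request into the JSON body for POST /api/v1/benchmark-runs.
func encodeRun(req benchmarkRunRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeRun(benchmarkRunRequest{
		Models:    []string{"llama3.1:8b", "deepseek-r1:14b"},
		GPUTypes:  []string{"RTX 3090", "RTX 4090", "RTX 5060 Ti"},
		Providers: []string{"vastai"},
		MaxBudget: 1.00,
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(body)
}
```

The printed string can be passed to `curl -d` or sent with `http.Post` using the `application/json` content type.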

Running Your Own Benchmarks

1. Provision a GPU and install Ollama:

./bin/gpu-shopper provision -c benchmark-test -g RTX4090 -t 2 --save-key ~/.ssh/bench_key

ssh -i ~/.ssh/bench_key root@<ip> "curl -fsSL https://ollama.com/install.sh | sh"

2. Pull models and run benchmark:

# Pull models
ssh -i ~/.ssh/bench_key root@<ip> "ollama pull qwen2:7b && ollama pull deepseek-r1:14b"

# Run 5-minute benchmark per model
ssh -i ~/.ssh/bench_key root@<ip> 'MODEL=qwen2:7b DURATION=300 /tmp/bench.sh'

3. Collect and store results:

# Download benchmark results
scp -i ~/.ssh/bench_key -r root@<ip>:/tmp/benchmark_* ./results/

# Load into database
go run ./cmd/benchmark-loader -db ./data/gpu-shopper.db \
  -dir ./results/benchmark_qwen2_7b_* \
  -provider vastai -price 0.16 -location "US"

Benchmark Methodology

  • Duration: 5 minutes throughput + 5 quality prompts per model per GPU
  • Max Tokens: 500 per request
  • Concurrency: 1 (sequential requests)
  • Prompts: 6 types (reasoning, coding, knowledge, creative, instruction, throughput)
  • Quality Metrics: TTFT (time to first token), match rate (output correctness)
  • Runtime: Ollama (latest stable)
  • Metrics: TPS, TTFT, match rate, GPU utilization, temperature, power draw, error rates

See docs/BENCHMARKING.md for the complete benchmarking infrastructure documentation, collected results, and API reference. See docs/BENCHMARK_REPORT.md for the raw benchmark analysis.

CLI Reference

Global Flags

All commands support these global flags:

--server string    GPU Shopper server URL (default: $GPU_SHOPPER_URL or "http://localhost:8080")
-o, --output string    Output format: "table" or "json" (default: "table")

Tip: Set the GPU_SHOPPER_URL environment variable to avoid passing --server repeatedly:

export GPU_SHOPPER_URL=http://gpu-shopper.internal:8080

inventory

List available GPU offers from all providers.

./bin/gpu-shopper inventory [flags]

Flags:
  -p, --provider string   Filter by provider ("vastai", "bluelobster", "tensordock")
  -g, --gpu string        Filter by GPU type (e.g., "RTX4090", "A100")
      --min-vram int      Minimum VRAM in GB
      --max-price float   Maximum price per hour in USD
      --min-gpus int      Minimum number of GPUs

Example: Find cheap RTX 4090s

$ ./bin/gpu-shopper inventory -g RTX4090 --max-price 0.50

ID              PROVIDER    GPU      COUNT  VRAM   PRICE/HR  LOCATION
vastai-12345    vastai      RTX4090  1      24GB   $0.42     us-west
vastai-12346    vastai      RTX4090  1      24GB   $0.45     us-east
tensordock-789  tensordock  RTX4090  1      24GB   $0.48     eu-west

Total: 3 offers

Example: Find multi-GPU A100 instances

$ ./bin/gpu-shopper inventory -g A100 --min-gpus 4 --min-vram 40

ID              PROVIDER    GPU      COUNT  VRAM   PRICE/HR  LOCATION
vastai-99001    vastai      A100     8      80GB   $8.50     us-central
tensordock-445  tensordock  A100     4      40GB   $4.80     us-east

Total: 2 offers

Example: JSON output for scripting

./bin/gpu-shopper inventory -g RTX4090 -o json | jq '.offers[0].id'

provision

Provision a new GPU session.

./bin/gpu-shopper provision [flags]

Flags:
  -c, --consumer string     Consumer ID - identifies your application (required)
  -i, --offer string        Specific offer ID to provision
  -g, --gpu string          GPU type to auto-select cheapest offer (e.g., "RTX4090", "A100")
  -w, --workload string     Workload type (default: "llm")
                            Options: llm, llm_vllm, llm_tgi, training, batch, interactive
  -t, --hours int           Reservation hours, 1-12 (default: 2)
      --idle-timeout int    Idle timeout in minutes, 0 = disabled (default: 0)
      --storage string      Storage policy: "destroy" or "preserve" (default: "destroy")
      --save-key string     Save SSH private key to this file path

Note: Either --offer or --gpu must be provided. Using --gpu auto-selects the cheapest available offer of that GPU type.
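
The auto-select behavior can be pictured as a filter followed by a minimum-price scan. The following is a sketch under assumptions: the `offer` type and `cheapest` function are hypothetical, and the real selection logic may apply further criteria (for example offer health or VRAM).

```go
package main

import (
	"errors"
	"fmt"
)

// offer is a simplified, hypothetical stand-in for an inventory entry.
type offer struct {
	ID       string
	GPU      string
	PriceUSD float64 // price per hour
}

var errNoOffer = errors.New("no matching offer")

// cheapest returns the lowest-priced offer whose GPU type matches gpu.
func cheapest(offers []offer, gpu string) (offer, error) {
	var best offer
	found := false
	for _, o := range offers {
		if o.GPU != gpu {
			continue
		}
		if !found || o.PriceUSD < best.PriceUSD {
			best, found = o, true
		}
	}
	if !found {
		return offer{}, errNoOffer
	}
	return best, nil
}

func main() {
	offers := []offer{
		{"vastai-12346", "RTX4090", 0.45},
		{"vastai-12345", "RTX4090", 0.42},
		{"tensordock-789", "RTX4090", 0.48},
		{"tensordock-445", "A100", 4.80},
	}
	o, err := cheapest(offers, "RTX4090")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Auto-selected offer %s (%s, $%.2f/hr)\n", o.ID, o.GPU, o.PriceUSD)
}
```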

Example: Provision with auto-select

$ ./bin/gpu-shopper provision -c my-llm-service -g RTX4090 -t 4

Auto-selected offer vastai-12345 (RTX4090, $0.42/hr)

Session provisioned successfully!

Session ID:    sess-abc123
Provider:      vastai
GPU Type:      RTX4090
Status:        provisioning
Price/Hour:    $0.42
Expires At:    2026-02-02 18:00:00

SSH Connection:
  Host: 192.168.1.100
  Port: 22
  User: root

SSH Private Key (save this, shown only once):
---BEGIN---
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAA...
-----END OPENSSH PRIVATE KEY-----
---END---

Note: The session is provisioning. Check status with:
  gpu-shopper sessions get sess-abc123

Example: Provision with key file saved

./bin/gpu-shopper provision -c training-job -i tensordock-789 -t 8 \
  -w training --save-key ~/.ssh/session_key
chmod 600 ~/.ssh/session_key
ssh -i ~/.ssh/session_key root@192.168.1.100

Example: Provision for vLLM inference

./bin/gpu-shopper provision -c vllm-api -g A100 -w llm_vllm -t 6 --idle-timeout 30

sessions

Manage active GPU sessions.

./bin/gpu-shopper sessions <subcommand> [flags]

Subcommands:
  list      List all sessions
  get       Get session details
  done      Signal session completion (graceful shutdown)
  extend    Extend session reservation
  delete    Force delete a session

sessions list

./bin/gpu-shopper sessions list [flags]

Flags:
  -c, --consumer string   Filter by consumer ID
  -s, --status string     Filter by status (provisioning, running, stopping, terminated, failed)

Example:

$ ./bin/gpu-shopper sessions list -c my-app

ID           CONSUMER  PROVIDER    GPU       STATUS   PRICE/HR  EXPIRES
sess-abc123  my-app    vastai      RTX4090   running  $0.42     2026-02-02 18:00:00
sess-def456  my-app    tensordock  A100      running  $1.20     2026-02-02 20:00:00

Total: 2 sessions

sessions get

$ ./bin/gpu-shopper sessions get sess-abc123

Session ID:     sess-abc123
Consumer ID:    my-app
Provider:       vastai
GPU Type:       RTX4090
GPU Count:      1
Status:         running
Workload Type:  llm
Price/Hour:     $0.42
Created At:     2026-02-02 14:00:00
Expires At:     2026-02-02 18:00:00

SSH Connection:
  ssh -p 22 root@192.168.1.100

sessions done

$ ./bin/gpu-shopper sessions done sess-abc123
Session sess-abc123 shutdown initiated.

sessions extend

./bin/gpu-shopper sessions extend <session-id> [flags]

Flags:
  -t, --hours int   Additional hours to extend, 1-12 (default: 1)

Example:

$ ./bin/gpu-shopper sessions extend sess-abc123 -t 2
Session sess-abc123 extended by 2 hours.
New expiration: 2026-02-02 20:00:00

sessions delete

$ ./bin/gpu-shopper sessions delete sess-abc123
Session sess-abc123 destroyed.

shutdown

Shut down a GPU session (alternative to sessions done).

./bin/gpu-shopper shutdown <session-id> [flags]

Flags:
  -f, --force   Force immediate shutdown (skip graceful termination)

Example: Graceful shutdown

$ ./bin/gpu-shopper shutdown sess-abc123
Session sess-abc123 shutdown initiated.
The session will terminate gracefully.

Example: Force shutdown

$ ./bin/gpu-shopper shutdown sess-abc123 --force
Session sess-abc123 forcefully destroyed.

costs

View cost information and summaries.

./bin/gpu-shopper costs [flags]

Flags:
  -c, --consumer string   Filter by consumer ID
  -s, --session string    Get cost for specific session
  -p, --period string     Time period: "daily" or "monthly"
      --start string      Start date (YYYY-MM-DD)
      --end string        End date (YYYY-MM-DD)

Example: View all costs

$ ./bin/gpu-shopper costs

Cost Summary
============

Total Cost:    $145.67
Sessions:      28
Hours Used:    298.5

By Provider:
  vastai       $95.00
  tensordock   $50.67

By GPU Type:
  RTX4090      $85.00
  A100         $60.67

Example: Filter by consumer and date range

./bin/gpu-shopper costs -c my-app --start 2026-01-01 --end 2026-01-31

costs summary

./bin/gpu-shopper costs summary [flags]

Flags:
  -c, --consumer string   Filter by consumer ID

transfer

Transfer files to/from GPU sessions using SFTP.

./bin/gpu-shopper transfer <subcommand> [flags]

Subcommands:
  upload     Upload a file to a session
  download   Download a file from a session

Flags (all subcommands):
  -k, --key string       SSH private key file (required)
  -t, --timeout duration Transfer timeout (default: 5m)

Example: Upload model weights

./bin/gpu-shopper transfer upload ./model.bin sess-abc123:/workspace/model.bin \
  -k ~/.ssh/session_key

Example: Download training results

./bin/gpu-shopper transfer download sess-abc123:/workspace/output/checkpoint.pt \
  ./checkpoint.pt -k ~/.ssh/session_key

cleanup-orphans

Find and destroy orphan GPU instances directly from providers. Works without the API server.

This is a safety command for emergency cleanup when instances may have been orphaned due to server issues.

./bin/gpu-shopper cleanup-orphans [flags]

Flags:
      --execute           Actually destroy instances (default is dry-run)
      --force             Skip confirmation prompt when destroying
  -p, --provider string   Target specific provider ("vastai", "tensordock")

Requires environment variables:

  • VASTAI_API_KEY for Vast.ai
  • TENSORDOCK_AUTH_ID and TENSORDOCK_API_TOKEN for TensorDock

Example: Dry-run (default)

$ ./bin/gpu-shopper cleanup-orphans

Scanning for orphan instances...

Checking vastai...
  Found 2 shopper-managed instances
Checking tensordock...
  Found 1 shopper-managed instances

PROVIDER    INSTANCE ID  NAME                  STATUS   PRICE/HR  STARTED
--------    -----------  ----                  ------   --------  -------
vastai      12345        gpu-shopper-abc123    running  $0.420    2026-02-02 10:00
vastai      12346        gpu-shopper-def456    running  $0.450    2026-02-01 22:00
tensordock  td-789       gpu-shopper-session   running  $1.200    2026-02-02 08:00

Total: 3 instances, $2.070/hr combined cost

This was a dry-run. To actually destroy these instances, run:
  gpu-shopper cleanup-orphans --execute

Example: Execute cleanup

$ ./bin/gpu-shopper cleanup-orphans --execute

Scanning for orphan instances...
[... table output ...]

WARNING: You are about to destroy 3 instance(s).
Type 'yes' to confirm: yes

Destroying instances...
  Destroying vastai/12345... OK
  Destroying vastai/12346... OK
  Destroying tensordock/td-789... OK

Cleanup complete: 3 destroyed, 0 failed

Example: Target single provider with no confirmation

./bin/gpu-shopper cleanup-orphans -p vastai --execute --force

CLI Tips

Filtering inventory effectively:

# Show the first offer at or below $0.30/hr (the cheapest one if results are sorted by price)
./bin/gpu-shopper inventory --max-price 0.30 -o json | jq -r '.offers[0]'

# Compare prices across providers for same GPU
./bin/gpu-shopper inventory -g A100 | sort -t'$' -k2 -n

# Find high-VRAM GPUs for large models
./bin/gpu-shopper inventory --min-vram 48

Automation patterns:

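# Provision and capture session ID
SESSION_ID=$(./bin/gpu-shopper provision -c batch-job -g RTX4090 -t 2 -o json | jq -r '.session.id')

# Wait for the session to be running; stop early if provisioning fails
while true; do
  STATUS=$(./bin/gpu-shopper sessions get "$SESSION_ID" -o json | jq -r '.status')
  [ "$STATUS" = "running" ] && break
  if [ "$STATUS" = "failed" ] || [ "$STATUS" = "terminated" ]; then
    echo "Session $SESSION_ID ended with status: $STATUS" >&2
    exit 1
  fi
  sleep 5
done

# Run your workload...

# Clean up when done
./bin/gpu-shopper sessions done "$SESSION_ID"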

Monitor costs in real-time:

watch -n 60 './bin/gpu-shopper costs -c my-app'

API Overview

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /ready | GET | Readiness check |
| /metrics | GET | Prometheus metrics |
| /api/v1/inventory | GET | List available GPUs (supports min_cuda, template_hash_id filters) |
| /api/v1/inventory/:id | GET | Get specific offer |
| /api/v1/inventory/:id/compatible-templates | GET | Get compatible templates for offer |
| /api/v1/templates | GET | List available templates (Vast.ai) |
| /api/v1/templates/:hash_id | GET | Get specific template |
| /api/v1/sessions | POST | Create session (supports template_hash_id, disk_gb, auto_retry) |
| /api/v1/sessions | GET | List sessions |
| /api/v1/sessions/:id | GET | Get session |
| /api/v1/sessions/:id | DELETE | Force destroy session |
| /api/v1/sessions/:id/done | POST | Signal session complete |
| /api/v1/sessions/:id/extend | POST | Extend session |
| /api/v1/sessions/:id/diagnostics | GET | Post-provision runtime diagnostics |
| /api/v1/costs | GET | Get costs |
| /api/v1/costs/summary | GET | Monthly cost summary |
| /api/v1/offer-health | GET | Offer failure tracking status |
| /api/v1/benchmarks | GET | List benchmark results |
| /api/v1/benchmarks/:id | GET | Get specific benchmark |
| /api/v1/benchmarks | POST | Submit new benchmark result |
| /api/v1/benchmarks/best | GET | Best performing benchmark for model |
| /api/v1/benchmarks/cheapest | GET | Most cost-effective benchmark for model |
| /api/v1/benchmarks/compare | GET | Compare benchmarks for model across hardware |
| /api/v1/benchmarks/recommendations | GET | Hardware recommendations based on benchmarks |
| /api/v1/benchmark-runs | POST | Start automated benchmark run |
| /api/v1/benchmark-runs/:id | GET | Get benchmark run status |
| /api/v1/benchmark-runs/:id | DELETE | Cancel benchmark run |
| /api/v1/benchmark-schedules | POST | Create benchmark schedule |
| /api/v1/benchmark-schedules | GET | List benchmark schedules |
| /api/v1/benchmark-schedules/:id | PUT | Update benchmark schedule |
| /api/v1/benchmark-schedules/:id | DELETE | Delete benchmark schedule |

See docs/API.md for full API documentation with request/response examples.

Configuration Reference

Environment Variables

| Variable | Required | Description |
|---|---|---|
| VASTAI_API_KEY | Yes* | API key for Vast.ai provider |
| BLUELOBSTER_API_KEY | Yes* | API key for Blue Lobster provider |
| TENSORDOCK_API_TOKEN | Yes* | API token for TensorDock provider |
| TENSORDOCK_AUTH_ID | Yes* | Auth ID for TensorDock provider |
| DATABASE_PATH | No | SQLite database path (default: ./data/gpu-shopper.db) |
| SERVER_HOST | No | Server bind address (default: 0.0.0.0) |
| SERVER_PORT | No | Server port (default: 8080) |
| LOG_LEVEL | No | Logging level: debug, info, warn, error (default: info) |

*At least one provider must be configured.

Lifecycle Configuration

| Variable | Default | Description |
|---|---|---|
| LIFECYCLE_CHECK_INTERVAL | 60s | How often to check session status |
| HARD_MAX_HOURS | 12 | Maximum session duration before forced shutdown |
| ORPHAN_GRACE_PERIOD | 15m | Grace period before orphan detection triggers |
| RECONCILIATION_INTERVAL | 5m | How often to reconcile with providers |

Inventory Configuration

| Variable | Default | Description |
|---|---|---|
| INVENTORY_CACHE_TTL | 60s | How long to cache inventory responses |
| INVENTORY_BACKOFF_TTL | 300s | Cache TTL when provider is rate-limited |

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    CLOUD GPU SHOPPER                         │
├─────────────────────────────────────────────────────────────┤
│  REST API (Gin)  │  CLI (Cobra)  │  Background Jobs          │
├─────────────────────────────────────────────────────────────┤
│  Inventory │ Provisioner │ Lifecycle │ Cost Tracker          │
├─────────────────────────────────────────────────────────────┤
│     Vast.ai Adapter  │  Blue Lobster Adapter  │  TensorDock Adapter │
├─────────────────────────────────────────────────────────────┤
│                     SQLite Storage                           │
└─────────────────────────────────────────────────────────────┘
                              │
                    Provider API + SSH Verification
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      GPU NODE (Remote)                       │
├─────────────────────────────────────────────────────────────┤
│  Consumer Workload: vLLM, Training, Batch Jobs               │
└─────────────────────────────────────────────────────────────┘

See ARCHITECTURE.md for detailed design documentation.

Safety Systems

The service is designed with "zero orphaned instances" as the primary goal:

  1. Two-Phase Provisioning: Database record created before provider call
  2. Verified Destruction: Retries and confirms instance is gone
  3. Instance Tagging: All instances tagged for reconciliation
  4. Provider Reconciliation: Compares DB vs provider every 5 minutes
  5. 12-Hour Hard Max: Automatic shutdown (CLI override available)
  6. SSH Verification: Validates instance readiness via SSH connectivity
  7. Orphan Detection: Alerts and auto-destroys orphaned instances
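
The core of provider reconciliation (item 4) is a set comparison between what the database knows and what the provider reports. The sketch below is illustrative only: `reconcile` is a hypothetical function, and the real service presumably also applies the grace period and instance tags before acting on a mismatch.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile compares instance IDs recorded in the database with those the
// provider reports. It returns instances running at the provider with no
// database record (orphans) and database records with no instance (missing).
func reconcile(dbIDs, providerIDs []string) (orphans, missing []string) {
	inDB := make(map[string]bool, len(dbIDs))
	for _, id := range dbIDs {
		inDB[id] = true
	}
	atProvider := make(map[string]bool, len(providerIDs))
	for _, id := range providerIDs {
		atProvider[id] = true
		if !inDB[id] {
			orphans = append(orphans, id)
		}
	}
	for _, id := range dbIDs {
		if !atProvider[id] {
			missing = append(missing, id)
		}
	}
	sort.Strings(orphans)
	sort.Strings(missing)
	return orphans, missing
}

func main() {
	db := []string{"12345", "12346"}
	provider := []string{"12345", "12346", "td-789"}

	orphans, missing := reconcile(db, provider)
	fmt.Println("orphans at provider:", orphans)
	fmt.Println("sessions missing at provider:", missing)
}
```

Creating the database record before the provider call (item 1) is what keeps a crash mid-provision from producing an instance that this comparison cannot attribute to a session.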

Development

# Run tests
go test ./...

# Run tests with race detection (recommended)
go test -race ./...

# Run E2E tests
go test -tags=e2e ./test/e2e/...

# Run tests with coverage
go test -cover ./...

# Build all binaries
go build -o bin/server ./cmd/server
go build -o bin/gpu-shopper ./cmd/cli

Test Quality

All tests are designed to be:

  • Race-free: Pass with go test -race
  • Deterministic: Use require.Eventually() instead of time.Sleep()
  • Isolated: Proper cleanup with t.Cleanup() and deferred resource release
  • Time-injectable: Services support WithTimeFunc() for controlled time testing

Project Structure

├── cmd/
│   ├── server/           # API server
│   ├── cli/              # CLI tool (gpu-shopper)
│   └── benchmark-loader/ # Bulk benchmark result importer
├── internal/
│   ├── api/              # REST API handlers
│   ├── benchmark/        # Benchmark models, store, parser
│   ├── config/           # Configuration
│   ├── filetransfer/     # SCP/SFTP file transfer
│   ├── logging/          # Structured logging
│   ├── metrics/          # Prometheus metrics
│   ├── provider/         # Provider adapters (Vast.ai, Blue Lobster, TensorDock)
│   ├── service/          # Business logic
│   │   ├── benchmark/    #   Benchmark runner & scheduler
│   │   ├── cost/         #   Cost tracking & aggregation
│   │   ├── inventory/    #   Inventory cache & failure tracking
│   │   ├── lifecycle/    #   Lifecycle, reconciliation, startup recovery
│   │   └── provisioner/  #   Two-phase provisioning, SSH verification
│   ├── ssh/              # SSH verification, GPU/disk/OOM status
│   └── storage/          # SQLite persistence
├── pkg/
│   ├── models/           # Shared data models
│   └── client/           # Go client library
├── scripts/              # Benchmark & test scripts
├── deploy/               # Docker & compose files
├── test/                 # E2E, live, and mock provider tests
└── docs/                 # Documentation

Development Status

See PROGRESS.md for detailed implementation status.

Current Phase: Post-MVP Feature Development

  • MVP fully implemented with all safety systems
  • Comprehensive QA review completed (120+ issues addressed)
  • Automated benchmark infrastructure with 50 results across 9 GPUs, 8 models, 2 providers
  • Auto-retry, failure tracking, and structured error types for consumer apps
  • Benchmark scheduling, CI/CD with native ARM64 builds
  • 17 bugs tracked and resolved (5 provider-side mitigated)

Getting Help

  • API Reference: See docs/API.md for complete API documentation
  • Architecture Details: See ARCHITECTURE.md for internal design
  • Contributing: See CONTRIBUTING.md for contribution guidelines
  • Bug Reports: Open an issue on GitHub for bug reports and feature requests

License

MIT
