Feature/multi gpu by theodufort · Pull Request #84 · Andyyyy64/whichllm

theodufort · 2026-06-03T22:53:56Z

Allow simulating multiple GPUs with comma-separated names (RTX 5080,RTX 5060 Ti) or count shorthand (2x RTX 4090, 4x H100). VRAM fit uses conservative pooling — per-GPU framework overhead (~300MB each) and a utilization factor (95% homogeneous, 90% heterogeneous) are applied rather than naively summing all VRAM as one device. Speed estimation uses a conservative flat 30% overhead factor without claiming precision about the interconnect topology (PCIe vs NVLink); multi-GPU throughput is always marked low-confidence.

What

Add multi-GPU support to implement #65

Parse --gpu values: comma-separated heterogeneous GPUs and Nx count shorthand
Conservative VRAM fit that accounts for per-GPU overhead and heterogeneous split inefficiency
Low-confidence speed estimates that avoid faking PCIe/NVLink precision
Warnings surfaced for multi-GPU splits and heterogeneous configurations
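
For reviewers who want to see the two accepted `--gpu` forms concretely, here is a minimal sketch of the expansion step. The function name `parse_gpu_spec` and the regular expression are hypothetical and are not taken from this PR's diff; the sketch only assumes the syntax described above (comma-separated names, optional `Nx ` count prefix).

```python
import re

# Hypothetical pattern: "<count>x <name>", e.g. "2x RTX 4090" or "4x H100".
_COUNT_PREFIX = re.compile(r"^\s*(\d+)\s*[xX]\s+(.+?)\s*$")


def parse_gpu_spec(spec: str) -> list[str]:
    """Expand a --gpu value into one GPU name per simulated device."""
    names: list[str] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray or trailing commas
        match = _COUNT_PREFIX.match(part)
        if match:
            count, name = int(match.group(1)), match.group(2)
            if count < 1:
                raise ValueError(f"GPU count must be at least 1: {part!r}")
            names.extend([name] * count)
        else:
            names.append(part)
    if not names:
        raise ValueError("no GPU names given")
    return names


if __name__ == "__main__":
    print(parse_gpu_spec("RTX 5080,RTX 5060 Ti"))
    print(parse_gpu_spec("2x RTX 4090"))
```

Both forms can be mixed under this sketch, so `2x RTX 3090,RTX 4090` would expand to three devices. Whether the real parser allows that mix is not stated in the PR text.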

Why

Users with multiple GPUs (e.g. 2x RTX 3090, mixed RTX 4090 + 3090) need to see which models fit across their combined VRAM. The fit simulation layer comes first; speed modeling is deliberately conservative until interconnect and tensor-split assumptions can be properly validated.
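
To make the conservative fit budget concrete, the following is a rough sketch using the numbers quoted in the summary (about 300 MB of framework overhead per GPU, 95% utilization for homogeneous setups, 90% for heterogeneous ones). The function name, the `(name, vram_bytes)` input shape, the order of operations (subtract overhead first, then scale), and the rule that "heterogeneous" means differing GPU names are all assumptions for illustration, not the PR's actual code.

```python
# Values quoted in the PR text; how they are combined below is an assumption.
PER_GPU_OVERHEAD_BYTES = 300 * 1024**2
HOMOGENEOUS_UTILIZATION = 0.95
HETEROGENEOUS_UTILIZATION = 0.90


def conservative_vram_budget(gpus: list[tuple[str, int]]) -> int:
    """Return a pooled VRAM budget in bytes for (name, vram_bytes) pairs.

    Only the multi-GPU case is sketched; single-GPU handling is out of scope.
    """
    if len(gpus) < 2:
        raise ValueError("this sketch only covers two or more GPUs")
    # Charge the framework overhead once per device, never going below zero.
    usable = sum(max(vram - PER_GPU_OVERHEAD_BYTES, 0) for _, vram in gpus)
    # Assumption: a setup is heterogeneous when the GPU names differ.
    heterogeneous = len({name for name, _ in gpus}) > 1
    factor = HETEROGENEOUS_UTILIZATION if heterogeneous else HOMOGENEOUS_UTILIZATION
    return int(usable * factor)


if __name__ == "__main__":
    gib = 1024**3
    pair = [("RTX 3090", 24 * gib), ("RTX 3090", 24 * gib)]
    print(conservative_vram_budget(pair) / gib)
```

Under these assumptions a pair of 24 GiB cards yields a budget noticeably below the raw 48 GiB total, which is the point of not treating the pool as a single device.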

Testing

Tests pass (pytest)
New tests added (if applicable)
[] Tested on real hardware (if hardware-related)

Notes

NVLink detection from the initial commit was removed — it incorrectly assumed any NVIDIA GPU with compute capability >= 7.0 has NVLink (consumer GPUs like RTX 4090 do not)
vram_available_bytes still reports the raw physical total; the conservative budget is only used internally for fit decisions
Per-GPU --vram override is not yet supported for multi-GPU; the error message now says so clearly
Speed estimates for multi-GPU are always low-confidence with an explicit note about PCIe/NVLink dependence — this is intentional, not a gap
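
As a rough illustration of the speed note, the sketch below applies the flat 30% overhead to the slowest device's single-GPU estimate and always tags the result as low-confidence. The PR text does not say exactly how the factor is applied, so both the multiplicative form `slowest * (1 - 0.30)` and the slowest-GPU baseline are assumptions here, and the function name is hypothetical.

```python
# Flat overhead quoted in the PR; the way it is applied below is an assumption.
MULTI_GPU_OVERHEAD = 0.30


def estimate_multi_gpu_speed(per_gpu_tokens_per_sec: list[float]) -> tuple[float, str]:
    """Return (tokens_per_sec, confidence) for a multi-GPU split."""
    if len(per_gpu_tokens_per_sec) < 2:
        raise ValueError("this sketch only covers two or more GPUs")
    # Assumption: the slowest device bounds throughput for the whole split.
    slowest = min(per_gpu_tokens_per_sec)
    # No PCIe vs NVLink modelling: one flat penalty, always low confidence.
    return slowest * (1.0 - MULTI_GPU_OVERHEAD), "low"


if __name__ == "__main__":
    print(estimate_multi_gpu_speed([100.0, 60.0]))
```

The design choice this mirrors is that no interconnect is modelled at all, so the confidence label never rises above low regardless of the hardware.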

Allow simulating multiple GPUs with comma-separated names ("RTX 5080,RTX 5060 Ti") or count shorthand ("2x RTX 4090", "4x H100"). VRAM is pooled across all GPUs for fit determination. Speed estimation uses a tensor-parallel model where the slowest GPU is the bottleneck, with inter-GPU communication overhead (PCIe/NVLink) factored in.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The initial multi-GPU commit naively pooled all GPU VRAM as one device and assumed any NVIDIA GPU with compute capability >= 7.0 has NVLink (wrong for consumer GPUs like RTX 4090). This commit:

- Applies per-GPU framework overhead (~300MB) and a utilization factor (95% homogeneous, 90% heterogeneous) to VRAM fit checks
- Replaces the NVLink/PCIe sync model with a flat 30% overhead factor, avoiding false precision about interconnect topology
- Adds warnings for multi-GPU and heterogeneous configurations
- Fixes broken --vram error message grammar

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Andyyyy64 · 2026-06-09T03:02:47Z

Thanks @theodufort, this is a lot of careful work, and the conservative approach to VRAM pooling and speed looks right to me. It conflicts with main now, and since it touches the ranker, compatibility, and performance paths, I'd like it rebased on the latest main before I review it properly. Could you update it? I want to give the fit logic a careful read once it's clean.

theodufort and others added 2 commits May 31, 2026 13:40

theodufort mentioned this pull request Jun 3, 2026

Feature Request: Multi-GPU simulation via --gpu flag (heterogeneous setups) #65

Open

theodufort wants to merge 2 commits into Andyyyy64:main from theodufort:feature/multi-gpu
