New · Blackwell PRO 6000 added · v0.4.1

The GPU index
for local inference

Independent benchmarks across consumer cards, workstation Blackwell, Apple Silicon, and DGX Spark. Find the cheapest hardware that fits the model you actually want to run.

7
GPUs tracked
5
Models profiled
105
Benchmarks run
2h ago
Last update
01  //  Editorial picks · April 2026

The shortlist

See full index
02  //  Feature benchmark

Qwen3 32B
at Q4_K_M

Single-stream decode, 4096 context, batch=1. All numbers are median of 5 runs on bare metal.

# method
llama.cpp b4732
prompt 512 tokens
decode 512 tokens
temp 0.0
Methodology
03  //  Will it fit?

Pick a model.
See what runs it.

Hardware is wasted if it can't load the weights you care about. Start with the model — we'll tell you the cheapest GPU that fits.

Model
Quantization
Estimated VRAM required
78 GB
Compatible GPUs
4 / 7
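A ballpark VRAM figure like the one above follows a simple rule of thumb: the quantized weights dominate, plus a KV-cache term and a flat runtime overhead. A minimal sketch of that arithmetic (the effective bits-per-weight values, cache dimensions, and overhead here are illustrative assumptions, not the index's actual formula):

```python
# Rough VRAM estimator for a quantized LLM. A back-of-envelope sketch,
# not the site's real calculator. All constants are assumptions.

# Approximate *effective* bits per weight (quant metadata adds overhead).
BITS_PER_WEIGHT = {"q4_k_m": 4.5, "q8_0": 8.5, "f16": 16.0}

def estimate_vram_gb(params_b: float, quant: str,
                     context: int = 4096, n_layers: int = 64,
                     kv_dim: int = 1024, overhead_gb: float = 1.5) -> float:
    """Estimate VRAM in GB for a model with params_b billion parameters."""
    # Weights: params (1e9 * params_b) * bits / 8 bytes -> GB.
    weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    # KV cache: 2 tensors (K and V) * layers * context * kv_dim * 2 bytes (f16).
    kv_gb = 2 * n_layers * context * kv_dim * 2 / 1e9
    return weights_gb + kv_gb + overhead_gb
```

With these assumptions, a 72B model at 8-bit lands near 80 GB and a 32B model at Q4_K_M near 20 GB; the exact figure shifts with context length and the model's attention layout.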
04  //  Field notes

From the lab

All posts
05  //  Coming soon

gpuhunter CLI

Query the index from your terminal. Pipe results into your buying spreadsheet. Subscribe to price drops on cards you're tracking.

Waitlist opening Q3 2026
~/projects/inference-rig
$ gpuhunter fit qwen3-72b --quant q8 --budget 5000
→ analyzing 47 GPUs · 5 quantization levels…
→ 3 candidates within budget
┌─────────────────────┬──────┬────────┬─────────┐
│ gpu                 │ vram │  tok/s │   price │
├─────────────────────┼──────┼────────┼─────────┤
│ RTX PRO 6000        │ 96GB │   96.0 │  $8,499 │
│ M3 Ultra 256        │128GB │   44.0 │  $5,499 │
│ 2× RTX 5090         │ 64GB │  176.0 │  $3,998 │
└─────────────────────┴──────┴────────┴─────────┘
$ _
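Under the hood, a fit query like the one in the mock boils down to a filter over the index: keep GPUs with enough VRAM, keep those under budget, sort by price. A hypothetical sketch with made-up entries (not the real index or the CLI's internals):

```python
# Hypothetical "fit" query: filter a GPU index by required VRAM and
# budget, cheapest first. The entries below are illustrative only.
gpus = [
    {"gpu": "RTX PRO 6000", "vram_gb": 96, "price": 8499},
    {"gpu": "M3 Ultra 256", "vram_gb": 128, "price": 5499},
    {"gpu": "2x RTX 5090", "vram_gb": 64, "price": 3998},
    {"gpu": "RTX 4090", "vram_gb": 24, "price": 1599},
]

def fit(required_vram_gb: float, budget: float) -> list[dict]:
    """Return GPUs that can hold the model and fit the budget, cheapest first."""
    candidates = [
        g for g in gpus
        if g["vram_gb"] >= required_vram_gb and g["price"] <= budget
    ]
    return sorted(candidates, key=lambda g: g["price"])
```

Multi-GPU setups complicate the VRAM check (weights can be split across cards), which is why a real tool needs per-model sharding rules rather than a single threshold.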