# GPU Hunter

> The GPU index for local AI inference

## What is GPU Hunter?

GPU Hunter is an independent benchmark database for GPUs used in local AI inference. We test consumer GPUs, workstation cards (NVIDIA Blackwell), Apple Silicon, and NVIDIA DGX Spark to help engineers find the cheapest hardware that runs the models they need.

## Key Data Points

- GPUs tracked: 7
- Models profiled: 5+
- Benchmark methodology: llama.cpp, single-stream decode, 4096 context, batch=1, median of 5 runs

## GPU Index

- RTX PRO 6000 Blackwell: 96GB VRAM, 1792GB/s bandwidth, 142 tok/s (Qwen3 32B Q4), $8499
- GeForce RTX 5090: 32GB VRAM, 1792GB/s bandwidth, 138 tok/s (Qwen3 32B Q4), $1999
- GeForce RTX 4090: 24GB VRAM, 1008GB/s bandwidth, 96 tok/s (Qwen3 32B Q4), $1799
- GeForce RTX 3090: 24GB VRAM, 936GB/s bandwidth, 64 tok/s (Qwen3 32B Q4), $749
- NVIDIA DGX Spark: 128GB VRAM, 273GB/s bandwidth, 38 tok/s (Qwen3 32B Q4), $3999
- Apple M3 Ultra: 512GB VRAM, 819GB/s bandwidth, 72 tok/s (Qwen3 32B Q4), $9499
- Apple M4 Max: 128GB VRAM, 546GB/s bandwidth, 48 tok/s (Qwen3 32B Q4), $4699

## Model VRAM Requirements

- Qwen3 32B: Q4=19GB, Q8=36GB, FP16=64GB VRAM required
- Qwen3 72B: Q4=42GB, Q8=78GB, FP16=144GB VRAM required
- Qwen3 235B: Q4=132GB, Q8=240GB, FP16=470GB VRAM required
- Llama 3.3 70B: Q4=40GB, Q8=75GB, FP16=140GB VRAM required
- DeepSeek V3: Q4=380GB, Q8=700GB, FP16=1300GB VRAM required

## Pages

- /browse — Full GPU index with filters, sorting, and budget calculator
- /compare — Side-by-side GPU comparison tool
- /gpu/[id] — Detailed GPU benchmark pages with specs, model fit, and pricing
- /blog — Hardware benchmarks, buying guides, and technical write-ups

## How to Use

1. Pick a model you want to run (e.g., Qwen3 72B)
2. Choose your quantization (Q4, Q8, FP16)
3. GPU Hunter shows you which GPUs fit and ranks them by performance and price
4. Use the compare tool to evaluate your top picks side-by-side

## Contact

Website: https://gpuhunter.io
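The fit-and-rank workflow from the How to Use section can be sketched in a few lines. This is a minimal illustration using the data from the tables above, not GPU Hunter's actual implementation; all names and structures here are hypothetical. Note the tok/s figures are the Qwen3 32B Q4 benchmark, used only as a relative performance signal.

```python
# Hypothetical sketch of GPU Hunter's fit-and-rank step.
# Data below is copied from the GPU Index and Model VRAM
# Requirements tables on this page.

# (name, VRAM in GB, tok/s on Qwen3 32B Q4, price in USD)
GPUS = [
    ("RTX PRO 6000 Blackwell", 96, 142, 8499),
    ("GeForce RTX 5090", 32, 138, 1999),
    ("GeForce RTX 4090", 24, 96, 1799),
    ("GeForce RTX 3090", 24, 64, 749),
    ("NVIDIA DGX Spark", 128, 38, 3999),
    ("Apple M3 Ultra", 512, 72, 9499),
    ("Apple M4 Max", 128, 48, 4699),
]

# VRAM required (GB) per (model, quantization), from the table above.
VRAM_REQUIRED = {
    ("Qwen3 72B", "Q4"): 42,
    ("Qwen3 72B", "Q8"): 78,
    ("Qwen3 72B", "FP16"): 144,
}

def gpus_that_fit(model, quant):
    """Return GPUs with enough VRAM for the model, cheapest first."""
    need = VRAM_REQUIRED[(model, quant)]
    fits = [g for g in GPUS if g[1] >= need]
    return sorted(fits, key=lambda g: g[3])  # rank by price

for name, vram, tok_s, price in gpus_that_fit("Qwen3 72B", "Q4"):
    print(f"{name}: {vram}GB, {tok_s} tok/s (32B Q4), ${price}")
```

For Qwen3 72B at Q4 (42GB required), this filters out every card under 42GB and lists the remaining options in ascending price order, which mirrors what the /browse page does with its filters and budget calculator.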