Data sources & methodology
Where the numbers come from. GPU Hunter aggregates published benchmark data from community testing — we don't run our own benchmarks.
How we source data
GPU Hunter is an aggregation index, not a testing lab. We collect inference benchmark results from published community sources — primarily llama.cpp GitHub discussions, hardware review sites, and public leaderboards. We normalize this data into a consistent format so you can compare GPUs across vendors without digging through dozens of threads and spreadsheets.
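As a rough illustration of what "a consistent format" could look like, the sketch below coerces one loosely formatted source row into a fixed record shape. The `BenchmarkRecord` schema, the `normalize_row` helper, and the raw keys (`gpu`, `quant`, `ctx`, `tok_s`) are hypothetical names chosen for this example, not GPU Hunter's actual schema, and the sample values are placeholders rather than real benchmark data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRecord:
    """One benchmark result in a common shape (hypothetical schema)."""
    gpu: str
    vendor: str
    model: str
    quantization: str
    context_length: int
    tokens_per_second: float
    source: str


def normalize_row(raw: dict, source: str) -> BenchmarkRecord:
    """Coerce a loosely formatted source row into a BenchmarkRecord."""
    return BenchmarkRecord(
        gpu=" ".join(str(raw["gpu"]).split()),  # collapse stray whitespace
        vendor=str(raw["vendor"]).strip().upper(),
        model=str(raw["model"]).strip(),
        quantization=str(raw["quant"]).strip().upper(),
        context_length=int(raw["ctx"]),
        # Some sources write thousands separators, e.g. "1,234.5"
        tokens_per_second=float(str(raw["tok_s"]).replace(",", "")),
        source=source,
    )


# Placeholder row, not a real measurement.
row = {"gpu": " RTX  3090 ", "vendor": "nvidia", "model": "example-7b",
       "quant": "q8_0", "ctx": "4096", "tok_s": "1,234.5"}
record = normalize_row(row, source="community thread")
```

Keeping the source alongside each record makes it possible to trace any figure back to the thread or review it came from.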
Where the benchmarks come from
Our primary benchmark data comes from llama.cpp community testing threads on GitHub, where contributors run standardized inference tests and share results publicly.
Additional references
We cross-reference primary data with hardware review sites and aggregated leaderboards to fill gaps and validate results.
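One way such a cross-check could work is sketched below. It is an assumption for illustration only: the `cross_check` helper, its status labels, and the 15% tolerance are hypothetical and are not taken from GPU Hunter's pipeline.

```python
from typing import Optional, Tuple


def relative_gap(primary: float, reference: float) -> float:
    """Gap between two tok/s figures as a fraction of the larger one."""
    largest = max(primary, reference)
    if largest <= 0:
        return 0.0
    return abs(primary - reference) / largest


def cross_check(primary: Optional[float],
                reference: Optional[float],
                tolerance: float = 0.15) -> Tuple[str, Optional[float]]:
    """Return (status, value) for one GPU and test configuration.

    The tolerance is an assumed threshold, not a documented one.
    """
    if primary is None and reference is None:
        return ("missing", None)
    if primary is None:
        return ("filled", reference)      # gap filled from the secondary source
    if reference is None:
        return ("unverified", primary)    # nothing to compare against
    if relative_gap(primary, reference) <= tolerance:
        return ("validated", primary)
    return ("conflict", primary)          # flag for manual review
```

For example, `cross_check(None, 80.0)` returns `("filled", 80.0)`, while two figures that differ by half return a `"conflict"` status rather than being silently averaged.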
What the sources measure
The community benchmarks we source typically use the following methodology. Exact parameters vary by contributor and hardware platform.
How we process the data
Raw benchmark numbers from different sources aren't directly comparable, because throughput depends on the model, quantization, and context length tested. We normalize by mapping each GPU's reported tokens per second (tok/s) to a relative performance score, weighted by VRAM capacity and current market price. This keeps the ranking consistent even when the underlying test configurations differ slightly.
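The exact weighting is not spelled out here, so the following is only a minimal sketch of one plausible scheme: tok/s is divided by a baseline measured under the same test configuration, then scaled up for more VRAM and down for a higher price. The function names, the multiplicative form, and the reference values (24 GB, 1000 USD) are assumptions made for this example.

```python
def relative_performance(tok_s: float, baseline_tok_s: float) -> float:
    """tok/s relative to a baseline GPU on the same model, quant, and context."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline_tok_s must be positive")
    return tok_s / baseline_tok_s


def weighted_score(rel_perf: float,
                   vram_gb: float,
                   price_usd: float,
                   ref_vram_gb: float = 24.0,
                   ref_price_usd: float = 1000.0) -> float:
    """Scale relative performance by VRAM and price (assumed weighting)."""
    if price_usd <= 0:
        raise ValueError("price_usd must be positive")
    vram_factor = vram_gb / ref_vram_gb        # more VRAM raises the score
    price_factor = ref_price_usd / price_usd   # a higher price lowers it
    return rel_perf * vram_factor * price_factor


def rank(scores: dict) -> list:
    """GPU names ordered from highest score to lowest."""
    return sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, a card with half the baseline throughput, half the reference VRAM, and half the reference price scores 0.5, and the baseline card at the reference VRAM and price scores 1.0.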
Hardware prices
All prices reflect current market rates as of April 2026 — MSRP for new cards, and typical used market prices (eBay, r/hardwareswap) for older GPUs like the RTX 3090. Prices are updated periodically but may lag behind rapid market changes.
Quantization levels
We track three quantization levels to cover the full spectrum of quality vs. speed tradeoffs.
4-bit quantization using a block-wise "K-quant" scheme. Best speed-to-quality ratio for most users. ~70% smaller than FP16.
8-bit quantization. Minimal quality loss vs. full precision. ~50% smaller than FP16. Good for tasks requiring high accuracy.
Full 16-bit floating point. No quantization loss. Requires the most VRAM. Use when you need exact model fidelity.
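The size figures above follow from simple arithmetic on bits per weight, sketched below. The helper names are hypothetical, and the calculation covers model weights only. It uses nominal bit widths; real quantized formats also store per-block scale factors, so their effective bits per weight are somewhat higher and the actual saving is smaller than the nominal figure.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def reduction_vs_fp16(bits_per_weight: float) -> float:
    """Fraction by which weights shrink relative to 16-bit storage."""
    return 1.0 - bits_per_weight / 16.0


# Nominal bit widths; effective values for real formats are a bit higher.
for label, bits in [("4-bit", 4.0), ("8-bit", 8.0), ("FP16", 16.0)]:
    size = weight_memory_gb(7e9, bits)  # a 7-billion-parameter model
    saved = reduction_vs_fp16(bits)
    print(f"{label}: {size:.1f} GB, {saved:.0%} smaller than FP16")
```

Actual VRAM use is higher than these numbers, since the KV cache and runtime buffers also need memory and grow with context length.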
Research behind the methodology
These research clusters explain why GPU Hunter weighs VRAM, memory bandwidth, quantization format, and runtime support instead of ranking GPUs by raw TFLOPS alone.