
Data sources & methodology

Where the numbers come from. GPU Hunter aggregates published benchmark data from community testing — we don't run our own benchmarks.

01  //  Overview

How we source data

GPU Hunter is an aggregation index, not a testing lab. We collect inference benchmark results from published community sources — primarily llama.cpp GitHub discussions, hardware review sites, and public leaderboards. We normalize this data into a consistent format so you can compare GPUs across vendors without digging through dozens of threads and spreadsheets.
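
The sketch below shows what a "consistent format" could look like in practice. It is a minimal illustration, assuming each community result can be reduced to a GPU name, model, quantization label, and tok/s figure. The `BenchmarkRecord` class, the `normalize_row` helper, and their field names are hypothetical. They are not GPU Hunter's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRecord:
    # Hypothetical field names; this is not GPU Hunter's actual schema.
    gpu: str      # GPU name as displayed in the index
    model: str    # model family and size, e.g. "Llama 8B"
    quant: str    # quantization label, e.g. "Q4_K_M"
    tok_s: float  # token-generation speed in tokens per second
    source: str   # where the result was published


def normalize_row(row: dict) -> BenchmarkRecord:
    """Map one loosely formatted community result onto the common record."""
    return BenchmarkRecord(
        gpu=row["gpu"].strip(),
        model=row["model"].strip(),
        quant=row["quant"].strip().upper(),  # "q4_k_m" -> "Q4_K_M"
        tok_s=float(row["tok_s"]),           # accepts "95.3" or 95.3
        source=row.get("source", "unknown"),
    )


# Illustrative input only; the numbers are made up, not a real result.
rec = normalize_row({"gpu": " RTX 3090 ", "model": "Llama 8B",
                     "quant": "q4_k_m", "tok_s": "95.3"})
```

A frozen dataclass keeps normalized records immutable. This way, a result cannot be silently altered after it has been mapped from its source.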

02  //  Primary sources

Where the benchmarks come from

Our primary benchmark data comes from llama.cpp community testing threads on GitHub, where contributors run standardized inference tests and share results publicly.

CUDA GPUs: NVIDIA consumer and workstation cards
Apple Silicon: M1–M4 series, unified memory
ROCm GPUs: AMD Radeon and Instinct series
DGX Spark: NVIDIA Grace Blackwell desktop

03  //  Secondary sources

Additional references

We cross-reference primary data with hardware review sites and aggregated leaderboards to fill gaps and validate results.

Hardware Corner GPU Ranking
AwesomeAgents Home GPU Leaderboard
GPU Benchmarks on LLM Inference (XiongjieDai)

04  //  Benchmark details

What the sources measure

The community benchmarks we source typically use the following methodology. Exact parameters vary by contributor and hardware platform.

Inference engine: llama.cpp
Typical models: Llama 7B/8B, Qwen 32B
Quantization: Q4_0, Q4_K_M, Q8_0
Metric: tok/s (token generation)
Decode length: 128–512 tokens
Context: varies by source

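
The tok/s metric above refers to the decode (token generation) phase. A minimal sketch of that calculation follows. It assumes that timing covers only the generated tokens, with prompt processing excluded (llama.cpp's `llama-bench` tool reports prompt processing and token generation as separate tests). It also encodes why raw numbers from different model or quantization setups should not be compared directly. Both helper names are hypothetical.

```python
def decode_tok_s(n_generated: int, decode_seconds: float) -> float:
    """Token-generation speed: tokens generated divided by decode time only.

    Prompt-processing time is deliberately excluded; it is a separate metric.
    """
    if decode_seconds <= 0:
        raise ValueError("decode_seconds must be positive")
    return n_generated / decode_seconds


def directly_comparable(a: dict, b: dict) -> bool:
    """Raw tok/s values are only comparable when model and quantization match."""
    return a["model"] == b["model"] and a["quant"] == b["quant"]


# Illustrative timing, not a measured result: 256 tokens decoded in 2.0 s.
speed = decode_tok_s(256, 2.0)  # 128.0 tok/s
```
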
05  //  Normalization

How we process the data

Raw benchmark numbers from different sources aren't directly comparable — different models, quantizations, and context lengths produce different results. We normalize by mapping each GPU's reported tok/s to a relative performance score, weighted by VRAM capacity and current market price. This gives a consistent ranking even when the underlying test configurations differ slightly.
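
The paragraph above describes the inputs to the score but not its formula. The sketch below is one hypothetical way to combine them. Throughput is taken relative to a reference GPU tested on the same model and quantization. That ratio is then scaled up by relative VRAM and down by relative price. The multiplicative form, the exponents `w_vram` and `w_price`, and the function name are all assumptions for illustration. They are not GPU Hunter's published weighting.

```python
def relative_score(
    tok_s: float, ref_tok_s: float,
    vram_gb: float, ref_vram_gb: float,
    price_usd: float, ref_price_usd: float,
    w_vram: float = 0.5, w_price: float = 1.0,
) -> float:
    """Hypothetical normalized score; the reference GPU scores exactly 1.0.

    tok_s and ref_tok_s must come from the same model and quantization.
    w_vram and w_price are made-up weights, not GPU Hunter's.
    """
    speed = tok_s / ref_tok_s                      # relative throughput
    capacity = (vram_gb / ref_vram_gb) ** w_vram   # more VRAM raises the score
    cost = (price_usd / ref_price_usd) ** w_price  # higher price lowers it
    return speed * capacity / cost
```

Because every factor is a ratio against the same reference card, the score is unitless. A sub-1.0 or above-1.0 result can therefore be read directly as worse or better value than the reference.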

06  //  Pricing

Hardware prices

All prices reflect current market rates as of April 2026 — MSRP for new cards, and typical used market prices (eBay, r/hardwareswap) for older GPUs like the RTX 3090. Prices are updated periodically but may lag behind rapid market changes.

07  //  Model formats

Quantization levels

We track three precision levels that cover the common tradeoffs between output quality, speed, and memory use.

Q4_K_M (Recommended)

4-bit k-quant, medium variant: most weights are stored at about 4 bits, with some tensors kept at higher precision. Best speed-to-quality ratio for most users. ~70% smaller than FP16.

Q8_0 (High quality)

8-bit quantization. Minimal quality loss vs. full precision. ~50% smaller than FP16. Good for tasks requiring high accuracy.

FP16 (Full precision)

Full 16-bit floating point. No quantization loss. Requires the most VRAM. Use when you need exact model fidelity.
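
The size differences above come from bits stored per weight. Q8_0 takes 8.5 bits (8.5/16 ≈ 53% of FP16), and Q4_K_M averages roughly 4.8 to 4.9 bits (about 30% of FP16). The sketch below turns that into a weight-size estimate and a rough VRAM fit check. The Q4_K_M figure is an approximate average. The `overhead_gb` allowance for context and runtime buffers is a hypothetical placeholder, not a measured value.

```python
# Approximate storage cost in bits per weight. Q8_0 and Q4_0 follow from the
# block layout (32 weights plus one FP16 scale per block); Q4_K_M mixes several
# quant types, so ~4.85 is a rough average, not an exact figure.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85, "Q4_0": 4.5}


def weights_gb(n_params_billion: float, quant: str) -> float:
    """Size of the model weights in decimal GB (excludes KV cache and buffers)."""
    return n_params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_in_vram(n_params_billion: float, quant: str, vram_gb: float,
                 overhead_gb: float = 1.5) -> bool:
    """Rough fit check; overhead_gb is a hypothetical allowance for context and runtime."""
    return weights_gb(n_params_billion, quant) + overhead_gb <= vram_gb


# An 8B model: 16.0 GB at FP16, 8.5 GB at Q8_0, about 4.85 GB at Q4_K_M.
sizes = {q: weights_gb(8, q) for q in ("FP16", "Q8_0", "Q4_K_M")}
```

By this estimate, a 32B model at Q4_K_M (about 19.4 GB of weights) fits on a 24 GB card, while the same model at Q8_0 (34 GB) does not.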

08  //  Research backing

Research behind the methodology

These research clusters explain why GPU Hunter weighs VRAM, memory bandwidth, quantization format, and runtime support instead of ranking GPUs by raw TFLOPS alone.
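
A common back-of-the-envelope model shows why bandwidth and VRAM matter more than TFLOPS for single-stream decoding. If generating each token requires reading every weight from memory once, then memory bandwidth divided by weight size caps tok/s. The sketch below is that simplified bound, not GPU Hunter's ranking formula. It ignores KV-cache traffic, compute time, and kernel overhead, so measured speeds land below it. The function name is hypothetical. The RTX 3090 example uses its published 936 GB/s bandwidth.

```python
from typing import Optional


def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float,
                         vram_gb: float) -> Optional[float]:
    """Rough upper bound on single-stream decode speed.

    Assumes every generated token reads all model weights from VRAM once, so
    speed is capped by memory bandwidth, not TFLOPS. Ignores KV-cache reads,
    compute time, and kernel overhead, so measured tok/s will be lower.
    """
    if weights_gb > vram_gb:
        return None  # weights do not fit; offloading breaks this simple bound
    return bandwidth_gb_s / weights_gb


# RTX 3090: 936 GB/s published bandwidth, 24 GB VRAM; ~4.85 GB of Q4_K_M 8B weights.
ceiling = decode_ceiling_tok_s(936, 4.85, 24)  # roughly 193 tok/s
```

Returning `None` when the weights exceed VRAM reflects why capacity acts as a hard gate. Once a model spills into system memory, the card's own bandwidth no longer sets the pace.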
