Data sources & methodology

Where the numbers come from. GPU Hunter aggregates published benchmark data from community testing — we don't run our own benchmarks.

01 // Overview

How we source data

GPU Hunter is an aggregation index, not a testing lab. We collect inference benchmark results from published community sources — primarily llama.cpp GitHub discussions, hardware review sites, and public leaderboards. We normalize this data into a consistent format so you can compare GPUs across vendors without digging through dozens of threads and spreadsheets.

02 // Primary sources

Where the benchmarks come from

Our primary benchmark data comes from llama.cpp community testing threads on GitHub, where contributors run standardized inference tests and share results publicly.

CUDA GPUs: NVIDIA consumer and workstation cards

Apple Silicon: M1–M4 series, unified memory

ROCm GPUs: AMD Radeon and Instinct series

DGX Spark: NVIDIA Grace Blackwell desktop

03 // Secondary sources

Additional references

We cross-reference primary data with hardware review sites and aggregated leaderboards to fill gaps and validate results.

Hardware Corner GPU Ranking

AwesomeAgents Home GPU Leaderboard

GPU Benchmarks on LLM Inference (XiongjieDai)

04 // Benchmark details

What the sources measure

The community benchmarks we source typically use the following methodology. Exact parameters vary by contributor and hardware platform.

Inference engine: llama.cpp

Typical models: Llama 7B/8B, Qwen 32B

Quantization: Q4_0, Q4_K_M, Q8_0

Metric: tok/s (token generation)

Decode length: 128–512 tokens

Context: Varies by source
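
Because parameters vary by contributor, it helps to keep them attached to every result. The sketch below shows one possible record layout and a like-for-like check. The `BenchmarkRecord` schema, the `comparable` helper, and the sample values are hypothetical illustrations, not GPU Hunter's actual data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkRecord:
    """One community-reported result (hypothetical schema for illustration)."""
    gpu: str
    model: str                     # e.g. "Llama 7B"
    quant: str                     # e.g. "Q4_0", "Q4_K_M", "Q8_0"
    tok_s: float                   # token-generation throughput
    decode_tokens: int             # generated tokens, typically 128-512
    context: Optional[int] = None  # None when the source does not report it
    source: str = ""


def comparable(a: BenchmarkRecord, b: BenchmarkRecord) -> bool:
    """Treat two results as like-for-like only if model and quantization match."""
    return a.model == b.model and a.quant == b.quant


# Made-up placeholder results, not real measurements.
a = BenchmarkRecord("GPU A", "Llama 7B", "Q4_0", 100.0, 128)
b = BenchmarkRecord("GPU B", "Llama 7B", "Q4_0", 150.0, 128)
c = BenchmarkRecord("GPU C", "Llama 7B", "Q8_0", 90.0, 128)
print(comparable(a, b), comparable(a, c))
```

A stricter check could also require matching decode length and context, at the cost of discarding more community results.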

05 // Normalization

How we process the data

Raw benchmark numbers from different sources aren't directly comparable — different models, quantizations, and context lengths produce different results. We normalize by mapping each GPU's reported tok/s to a relative performance score, weighted by VRAM capacity and current market price. This gives a consistent ranking even when the underlying test configurations differ slightly.
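
As a rough illustration of this kind of normalization, the sketch below scales tok/s against a reference GPU and then blends in VRAM and price. The exact formula is not published here, so `vram_weight`, the 24 GB baseline, the per-$1000 scaling, and all sample numbers are assumptions chosen only to make the example concrete.

```python
from typing import Dict


def relative_performance(tok_s: Dict[str, float], reference_gpu: str) -> Dict[str, float]:
    """Scale each GPU's tok/s so the reference GPU scores 1.0."""
    ref = tok_s[reference_gpu]
    return {gpu: value / ref for gpu, value in tok_s.items()}


def composite_score(rel_perf: float, vram_gb: float, price_usd: float,
                    vram_weight: float = 0.3) -> float:
    """Hypothetical weighting: blend performance with VRAM, then divide by price.

    vram_weight and the 24 GB baseline are illustrative assumptions only.
    """
    vram_factor = vram_gb / 24.0
    blended = (1.0 - vram_weight) * rel_perf + vram_weight * vram_factor
    return blended / price_usd * 1000.0  # score per $1000 spent


# Made-up numbers; every entry is assumed to use the same model and quantization.
tok_s = {"GPU A": 100.0, "GPU B": 150.0}
rel = relative_performance(tok_s, reference_gpu="GPU A")
score_a = composite_score(rel["GPU A"], vram_gb=24, price_usd=1000)
score_b = composite_score(rel["GPU B"], vram_gb=16, price_usd=1500)
print(rel, round(score_a, 3), round(score_b, 3))
```

In this toy example GPU B is 1.5x faster but ends up with the lower score, because it has less VRAM and costs more. That is the kind of tradeoff a weighted ranking is meant to surface.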

06 // Pricing

Hardware prices

All prices reflect current market rates as of April 2026 — MSRP for new cards, and typical used market prices (eBay, r/hardwareswap) for older GPUs like the RTX 3090. Prices are updated periodically but may lag behind rapid market changes.

07 // Model formats

Quantization levels

We track three precision levels that span the common quality vs. speed tradeoffs.

Q4_K_M (Recommended)

4-bit k-quant, medium variant: llama.cpp's block-wise scheme that keeps selected tensors at higher precision. Best speed-to-quality ratio for most users. Roughly 70% smaller than FP16.

Q8_0 (High quality)

8-bit quantization. Minimal quality loss vs. full precision. ~50% smaller than FP16. Good for tasks requiring high accuracy.

FP16 (Full precision)

Full 16-bit floating point. No quantization loss. Requires the most VRAM. Use when you need exact model fidelity.
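
To get a feel for what these formats mean for VRAM, the sketch below estimates the size of the weights alone from a bits-per-weight figure. The `weights_size_gb` helper is hypothetical, and the Q4_K_M value is an approximate average rather than an exact constant. The estimate excludes the KV cache and runtime overhead, so real memory use is higher.

```python
# Approximate bits per weight for each format.
# FP16 is 16 by definition. Q8_0 stores 32 int8 weights plus one 16-bit scale
# per block, which works out to 8.5 bits/weight. The Q4_K_M value is an
# approximate average, since that format mixes tensor types; treat it as an
# assumption.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "FP16": 16.0}


def weights_size_gb(params_billion: float, fmt: str) -> float:
    """Estimated size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[fmt] / 8.0


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{weights_size_gb(8.0, fmt):.1f} GB for an 8B-parameter model")
```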

research backing

Research behind the methodology

These research clusters explain why GPU Hunter weighs VRAM, memory bandwidth, quantization format, and runtime support instead of ranking GPUs by raw TFLOPS alone.

Memory bandwidth and GPU kernel research

KV cache optimization papers

LLM quantization research

Full research library
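
A common back-of-the-envelope argument for weighing memory bandwidth over raw TFLOPS: at batch size 1, generating each token of a dense model requires reading roughly all of the weights from memory once, so bandwidth divided by weight size gives an upper bound on tok/s. The sketch below is a simplification under that assumption; it ignores KV cache reads, compute time, and runtime overhead, and the `decode_ceiling_tok_s` helper and the 1000 GB/s figure are hypothetical.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on single-stream decode speed for a dense model.

    Assumes every generated token streams all weights from memory once.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Hypothetical GPU with 1000 GB/s of memory bandwidth.
print(decode_ceiling_tok_s(1000.0, 16.0))  # 16 GB of weights -> 62.5
print(decode_ceiling_tok_s(1000.0, 4.0))   # 4 GB of weights  -> 250.0
```

Under this model, shrinking the weights through quantization raises the ceiling in direct proportion, which is why quantization format and bandwidth are considered together rather than compute alone.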
