Tags: rtx-3090 · buying-guide · used-gpu · local-ai · budget · nvidia · ampere · inference · hardware

The 2026 Used RTX 3090 Buyer's Guide: Mining Cards, OEM Pulls & What to Avoid

The RTX 3090 remains the best $/VRAM GPU for local AI in 2026. 24GB for under $800. Here is exactly what to look for, what to avoid, and where to buy.

April 8, 2026

TL;DR — The RTX 3090 is the best value GPU for local AI in 2026

The RTX 3090 gives you 24GB of VRAM for around $749 on the used market. That is half the original $1,499 MSRP and less than half the price of a used RTX 4090 at $1,799. It pushes 64 tok/s on Qwen3 32B Q4 — fast enough for real-time conversations, code completion, and RAG pipelines. If your budget is under $1,000 and you need to run 30B+ parameter models locally, the 3090 is the card.

This guide covers everything we have learned buying, testing, and recommending used 3090s over the last two years: what to look for, what to avoid, where to buy, and how to set it up for inference once it arrives.

GeForce RTX 3090 · NVIDIA · Consumer

VRAM: 24 GB · Bandwidth: 936 GB/s · Q4 tok/s: 64 · Price: $749

Buy on Amazon · View benchmarks

This post contains affiliate links. If you buy through our links, we may earn a commission at no extra cost to you. We only recommend hardware we have tested ourselves. See our ethics policy for details.


Table of Contents

  1. Why the RTX 3090 in 2026?
  2. What to Look For
  3. What to Avoid
  4. Where to Buy
  5. RTX 3090 vs Alternatives
  6. What Can It Actually Run?
  7. Setting Up for Inference
  8. Thermal Management
  9. The Bottom Line

Why the RTX 3090 in 2026?

Because nothing else gives you 24GB of VRAM for under $800.

The RTX 3090 launched in September 2020 at $1,499. It was designed as NVIDIA's flagship gaming card for the Ampere generation — the "BFGPU," as Jensen Huang called it. Nearly six years later, it has found a second life as the community's favorite budget inference card, and for good reason.

Here is the math. The 3090 has 24GB of GDDR6X on a 384-bit bus, delivering 936 GB/s of memory bandwidth. That is enough to push 64 tokens per second on Qwen3 32B at Q4 quantization — comfortably above the 30 tok/s threshold where conversations start to feel responsive. It handles 39 tok/s at Q8, and even 18 tok/s at FP16 for cases where you need maximum quality.

For context, the RTX 4090 — the next step up with the same 24GB VRAM — costs $1,799 used and delivers 96 tok/s on the same Qwen3 32B Q4 benchmark. That is 50% more performance for 140% more money. The 3090 wins on value by a wide margin.

The used market has also matured. The crypto mining crash flooded the market with 3090s starting in late 2022, and prices have stabilized at $700–800 since mid-2025. Supply is plentiful. You are no longer competing with miners for inventory — you are buying from them.

Three reasons the 3090 still matters in 2026:

  1. 24GB is the sweet spot. Most serious open-source models (Qwen3 32B, Llama 3.3 70B at aggressive quants, Mistral variants) fit in 24GB at useful quantization levels. The 16GB cards (RTX 4070 Ti Super, RTX 4080) cut you off from 30B+ models entirely.

  2. 936 GB/s bandwidth is adequate. Inference is memory-bandwidth-bound for autoregressive decoding. The 3090's 936 GB/s is behind the 4090's 1,008 GB/s, but not catastrophically so. You lose roughly 30% on tok/s, not 3x.

  3. The ecosystem supports it. llama.cpp, Ollama, vLLM, and every other major inference stack has been optimized on 3090s for years. You will find CUDA kernels, community benchmarks, and troubleshooting threads for every scenario.

Best GPUs for Local AI in 2026

Our complete ranking of every GPU we tested.

Read more

What to Look For

Buy cards with known history, intact fans, and triple-fan coolers. Here is how to evaluate what you are looking at.

Mining cards vs gaming cards vs OEM pulls

Not all used 3090s have the same backstory, and understanding the provenance helps you assess risk.

Mining cards are the most common on the used market. Contrary to popular belief, mining cards are often in better condition than gaming cards. Here is why: miners optimized for efficiency, not performance. A mining 3090 typically ran at 300W or less (versus 350W TDP), with stable core and memory clocks, at a constant temperature in a ventilated rig. There were no thermal cycles — the card was on 24/7 at a steady 65–75°C. That is easier on the silicon and solder than a gaming card that spikes to 83°C during a session and cools to ambient when the game closes.

The wear items on a mining card are the fans and the thermal paste. Fans running 24/7 for 18+ months will have bearing wear. Thermal paste degrades over time regardless of use. Both are replaceable for $15–30.

Gaming cards have lower hours but more thermal stress. A card with 2 years of heavy gaming use may have 3,000–5,000 hours on it. The thermal cycling means more expansion and contraction of the solder joints. The fans will be in better shape (they were not running constantly), but the paste may be equally degraded.

OEM pulls are cards removed from prebuilt systems or workstations. These are often the best finds because prebuilt systems tend to have conservative power targets, good airflow, and light-to-moderate use. Look for cards from Dell, HP, or Lenovo workstations. The catch: OEM cards sometimes have non-standard cooler designs or blower-style coolers, which are louder and run hotter. Check the cooler type before buying.

Fan condition

Fans are the number one failure point. Here is what to check:

  • Spin test. If buying in person, power the card and watch the fans. All three should spin smoothly at low RPM without wobble, grinding, or clicking. One bad fan means the bearing is going.
  • Visual inspection. Look at the fan blades for chips, cracks, or warping. Heat can deform cheap plastic blades over time.
  • Noise. A healthy fan at idle speeds is nearly silent. A hum or whine at low RPM indicates bearing wear.
  • If buying online, ask the seller for a video of the fans spinning. Any reputable seller will provide this. If they refuse, move on.

Replacement fans for most 3090 models cost $10–20 on Amazon or AliExpress. The swap takes 15 minutes and a Phillips screwdriver. This is not a dealbreaker — it is a negotiating point. A card with one dead fan should be priced $50–80 below market.

Thermal paste age

Every 3090 from 2020–2021 is running on 4–6 year old thermal paste. Even high-quality paste (Thermal Grizzly Kryonaut, Noctua NT-H1) dries out and loses conductivity after 3–4 years. Budget $10 for a tube of paste and 30 minutes to repaste the card when it arrives.

Signs of degraded thermal paste:

  • GPU temps above 85°C under sustained load with a triple-fan cooler
  • Thermal throttling (clock speeds drop during benchmarks)
  • Hot spot delta of more than 20°C above edge temperature

We repaste every used 3090 we receive. It is standard maintenance, not a red flag.

PCB revision

The RTX 3090 had minor PCB revisions during its production run, but there is no memory-vendor lottery to worry about: GDDR6X is a Micron-exclusive technology, so every 3090 ships with Micron memory chips. (The GPU die itself was fabbed on Samsung's 8nm process.) You can confirm the memory configuration with GPU-Z after installing the card.

For inference purposes, the PCB revision does not matter. Do not pay a premium for one revision over another.

Dual-fan vs triple-fan vs blower designs

This matters more than most buyers realize.

Triple-fan open-air coolers (EVGA FTW3, ASUS TUF, MSI Suprim X) are the gold standard. Three fans across a 300mm+ heatsink keep the GPU under 75°C at full load with acceptable noise levels around 35–40 dBA. These are what you want for a desktop inference setup. (The Founders Edition's dual-axial flow-through cooler uses only two fans but performs comparably.)

Dual-fan coolers (EVGA XC3, Gigabyte Eagle, some Zotac models) save PCB space but run hotter and louder. Expect 5–10°C higher temps and more fan noise. Still workable, especially if you are undervolting for inference (more on that later), but they leave less thermal headroom.

Blower-style coolers (some OEM pulls, Quadro variants) exhaust heat out the back of the case. Pros: great for multi-GPU setups or cramped cases. Cons: louder (45–55 dBA under load) and hotter (85°C+ is common). For a single-GPU inference box, avoid blowers unless your case has no airflow or you are stacking multiple GPUs.

Our recommendation: target a triple-fan card. EVGA FTW3, ASUS TUF OC, and MSI Suprim X are the three most common, best-cooled 3090 variants on the used market. The Founders Edition is also excellent but commands a $50–100 premium due to collector demand.

Warranty status

Most manufacturer warranties on 3090s have expired by now (EVGA's was 3 years, ASUS and MSI were 3–4 years). A few cards from late production runs (early 2022) may still have residual warranty. It is a nice bonus but should not drive your purchase decision.

EVGA exited the GPU market in 2022 and is no longer honoring new warranty claims. Cards from ASUS, MSI, Gigabyte, and Zotac may still be serviced if within warranty — check with the manufacturer using the serial number before purchase.


What to Avoid

Dying fans, blower coolers for single-GPU builds, modded BIOS, and prices that are too good to be true.

Cards with dying fans

If a seller says "one fan doesn't spin but the other two work fine" — this is not fine. The 3090 is a 350W card. Two fans cannot adequately cool it under sustained inference loads. You will thermal throttle, and the remaining fans will burn out faster from the extra load.

Buy it only if the price reflects the repair cost ($15 for fans + $50 discount for the hassle). Otherwise, keep scrolling.

Cheap cooler designs that run hot

Some budget AIB partners (certain Zotac Twin Edge, some Palit models) shipped 3090s with undersized heatsinks. These cards were loud at stock settings and needed aggressive fan curves or undervolting to stay under 80°C. They work, but they are not the ideal choice when better-cooled cards are available at the same price.

Look up the specific model before buying. A quick search for "[model name] thermal review" will tell you if the cooler is adequate.

Cards with modded BIOS

Some overclockers and miners flashed custom BIOS to increase power limits or change fan curves. This is detectable: GPU-Z shows the BIOS version, which you can cross-reference against the manufacturer's official BIOS repository on TechPowerUp.

A modded BIOS is not dangerous per se — it will not damage the card. But it indicates a card that was pushed beyond stock specifications, which means more wear on the VRMs and memory. You can flash the card back to stock BIOS yourself, but the accumulated wear remains.

If the seller discloses the mod and the price is right, it is fine. If the seller does not mention it and you discover it after purchase, that is a red flag about what else they are not disclosing.

Suspiciously low prices

In April 2026, the market rate for a working used RTX 3090 is $700–800, depending on the model and condition. If you see an "RTX 3090 WORKS PERFECT" listing for $450, one of three things is happening:

  1. It's a scam. Fake eBay listings with stolen photos are common. Check seller history, feedback score, and whether the listing has realistic photos.
  2. It's not a 3090. Some scammers list a 3090 but ship a 3060 or an old Quadro card. Only buy from sellers with return policies.
  3. Something is wrong with the card. VRAM errors, thermal throttling, or damaged PCB that the seller is not disclosing.

If the deal seems too good, it is. Budget $750 and get a card from a reputable seller with a return policy.


Where to Buy

Amazon Renewed or eBay with buyer protection for the safest transactions. r/hardwareswap for the best deals if you're comfortable with peer-to-peer.

Amazon

Amazon has both new-old-stock and Amazon Renewed (refurbished) RTX 3090s. Renewed cards come with a 90-day return policy, which is significant — you have three months to stress test the card and return it if anything is wrong. Prices are typically at the higher end of the range ($780–850) but the return policy is worth the premium.

Buy GeForce RTX 3090 on Amazon

eBay

The largest selection of used 3090s. Filter for sellers with 99%+ positive feedback and 100+ ratings. Use eBay's buyer protection — if the card is not as described, you get a refund. Pay with PayPal for an additional layer of protection.

Watch for auction sniping opportunities. Many 3090 auctions end at $680–720, below Buy It Now prices. Set a maximum bid of $750 and walk away.

r/hardwareswap

Reddit's hardware trading community. Prices are 10–15% below eBay because there are no platform fees. The trade-off is less buyer protection — disputes are resolved through PayPal claims rather than a platform.

Rules: always use PayPal Goods & Services (never Friends & Family), check the seller's trade history (flair system), and ask for timestamped photos. Most r/hardwareswap sellers are enthusiasts who take care of their hardware.

Local deals (Facebook Marketplace, Craigslist)

Cash deals with no buyer protection. Bring a test system — a basic PC with a PSU and motherboard — and verify the card POSTs and renders a desktop. Check GPU-Z for the correct GPU die (GA102 for the 3090) and 24GB VRAM. If the seller won't let you test it, walk away.

The advantage: lowest prices ($650–700) and no shipping risk. The disadvantage: limited selection and no recourse if the card dies a week later.


RTX 3090 vs Alternatives

The 3090 wins on $/VRAM. The 4090 wins on performance. The 5090 wins on both but costs 2.7x as much.

Here's how the RTX 3090 stacks up against other options for local AI inference:

| Spec | RTX 3090 | RTX 4090 | RTX 5090 |
|---|---|---|---|
| VRAM | 24GB GDDR6X | 24GB GDDR6X | 32GB GDDR7 |
| Bandwidth | 936 GB/s | 1,008 GB/s | 1,792 GB/s |
| Qwen3 32B Q4 | 64 tok/s | 96 tok/s | 138 tok/s |
| Qwen3 32B Q8 | 39 tok/s | 58 tok/s | 88 tok/s |
| Qwen3 32B FP16 | 18 tok/s | 31 tok/s | 44 tok/s |
| TDP | 350W | 450W | 575W |
| Used/Street Price | ~$749 | ~$1,799 | ~$1,999 |
| $/VRAM | $31.21/GB | $74.96/GB | $62.47/GB |
| Architecture | Ampere | Ada Lovelace | Blackwell |
| PCIe | Gen 4 x16 | Gen 4 x16 | Gen 5 x16 |
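The $/VRAM row is simple arithmetic, and it is worth running the numbers yourself as prices drift. A quick sketch, using the street prices and benchmark figures quoted above, that also adds a tok/s-per-dollar view:

```python
# Value math for the comparison table above. Prices are the article's
# April 2026 street prices; update them before drawing conclusions later.
cards = {
    "RTX 3090": {"price": 749,  "vram_gb": 24, "tok_s": 64},
    "RTX 4090": {"price": 1799, "vram_gb": 24, "tok_s": 96},
    "RTX 5090": {"price": 1999, "vram_gb": 32, "tok_s": 138},
}

for name, c in cards.items():
    usd_per_gb = c["price"] / c["vram_gb"]           # $/GB of VRAM
    tok_per_k = c["tok_s"] / c["price"] * 1000       # tok/s per $1,000 spent
    print(f"{name}: ${usd_per_gb:.2f}/GB, {tok_per_k:.1f} tok/s per $1k")
# RTX 3090: $31.21/GB, 85.4 tok/s per $1k
# RTX 4090: $74.96/GB, 53.4 tok/s per $1k
# RTX 5090: $62.47/GB, 69.0 tok/s per $1k
```

Note the 3090 leads on both metrics; the 5090 beats the 4090 on value despite the higher sticker price, because of the extra VRAM and throughput.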

RTX 3090 vs RTX 3090 Ti: Both have 24GB VRAM. The Ti bumps bandwidth to 1,008 GB/s (same as the 4090) and adds ~10% more CUDA cores. In practice, expect 5–10% more tok/s on the Ti. The Ti typically sells for $50–100 more than the standard 3090. If you find them at the same price, take the Ti. Otherwise, the standard 3090 is the better value.

RTX 3090 vs RTX 4070 Ti Super (16GB): The 4070 Ti Super is newer, more power-efficient, and available new for around $800. But it only has 16GB of VRAM. That is the dealbreaker. You cannot run Qwen3 32B at Q4 (19GB) on 16GB. The 3090's 24GB opens up an entire tier of models that 16GB cards cannot touch. For gaming, take the 4070 Ti Super. For AI inference, the 3090 wins.

RTX 3090 vs used RTX 4090: If you can afford $1,799, the 4090 is better in every way — 8% more memory bandwidth, far more compute, 50% more tok/s in our benchmarks, and a newer architecture with better power efficiency. But at 2.4x the price for 50% more performance, the 3090 is the better value. The 4090 makes sense if tok/s matters more than cost — interactive applications, real-time agents, or production serving.


What Can It Actually Run?

Anything up to 32B parameters at Q4, and 70B at aggressive quantization.

The RTX 3090's 24GB of VRAM determines what models fit. Here is the practical breakdown:

| Model | Quantization | VRAM Required | Fits on 3090? | Estimated tok/s |
|---|---|---|---|---|
| Qwen3 7B | Q4 | ~5GB | Yes, easily | 120+ |
| Qwen3 7B | FP16 | ~14GB | Yes | 80+ |
| Qwen3 14B | Q4 | ~9GB | Yes | 90+ |
| Qwen3 32B | Q4 | 19GB | Yes | 64 |
| Qwen3 32B | Q8 | 36GB | No | — |
| Llama 3.3 70B | Q2 | ~22GB | Tight fit | 20–25 |
| Llama 3.3 70B | Q4 | 40GB | No | — |
| Qwen3 72B | Q4 | 42GB | No | — |
| DeepSeek V3 | Q4 | 380GB | No | — |
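The fit rule the table encodes is just "weights plus a KV-cache/runtime allowance must stay under 24GB." A minimal sketch, assuming a rough 2GB allowance for a 4K–8K context (the allowance is an approximation, not a measured constant):

```python
def fits_in_vram(weights_gb, vram_gb=24.0, kv_overhead_gb=2.0):
    """Rule-of-thumb fit check: quantized weights plus a KV-cache and
    runtime allowance must fit in VRAM. The 2 GB allowance assumes a
    modest 4K-8K context; longer contexts need more (see below)."""
    return weights_gb + kv_overhead_gb <= vram_gb

print(fits_in_vram(19))  # Qwen3 32B Q4 -> True
print(fits_in_vram(36))  # Qwen3 32B Q8 -> False
print(fits_in_vram(22))  # Llama 3.3 70B Q2 -> True, with nothing to spare
```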

The sweet spot is Qwen3 32B at Q4 quantization. At 19GB, it fits comfortably in 24GB with room for KV cache context. Our benchmarks show 64 tok/s — fast enough for real-time conversation and well above the threshold for code completion tools.

For 70B models, you are limited to Q2 or IQ2 quantizations, which fit in 24GB but sacrifice noticeable quality. At Q2, Llama 3.3 70B loses coherence on complex reasoning tasks compared to Q4. We recommend sticking with Qwen3 32B Q4 rather than trying to squeeze a 70B model into VRAM at the cost of quality.

If you need 70B+ models at Q4, you need either 48GB of VRAM (a pair of 3090s with NVLink — an option if you can find the bridge) or a single 4090/5090 with CPU offloading (slower but workable).

Context length note: VRAM usage increases with context length. The numbers above assume 4K–8K context windows. If you need 32K+ context, subtract 2–4GB from the available VRAM for KV cache. With Qwen3 32B Q4 at 32K context, you will use approximately 22–23GB — still fits, but barely.
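You can estimate the KV-cache cost per context length with the standard formula: two tensors (K and V) per layer, one kv_heads × head_dim vector each per token. A sketch — the 64-layer count matches Qwen3 32B as noted later in this guide, but the 8 KV heads and 128-dim heads are illustrative GQA values (check the model card), and bytes_per_elem=1 assumes an 8-bit quantized cache; an FP16 cache doubles every number:

```python
def kv_cache_gib(context_len, n_layers=64, n_kv_heads=8,
                 head_dim=128, bytes_per_elem=1):
    """Estimate KV-cache size: 2 tensors (K and V) per layer, each
    storing n_kv_heads * head_dim elements per token. bytes_per_elem=1
    models an 8-bit quantized cache; use 2 for FP16."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token / 2**30

print(f"{kv_cache_gib(8192):.1f} GiB")   # 8K context  -> 1.0 GiB
print(f"{kv_cache_gib(32768):.1f} GiB")  # 32K context -> 4.0 GiB
```

Under these assumptions, 32K of context costs about 4 GiB — the top of the 2–4GB range above — which is why long-context work on a 3090 usually means a quantized KV cache.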


Setting Up for Inference

Install Ollama for the simplest path, or llama.cpp for maximum control.

Once your 3090 arrives, repaste it (see Thermal Management below), install it, and get running.

Driver setup

Install the latest NVIDIA driver. On Ubuntu:

sudo apt-get update
sudo apt-get install -y nvidia-driver-550
sudo reboot

On Windows, download the latest Game Ready or Studio driver from nvidia.com. After reboot, verify with:

nvidia-smi

You should see your RTX 3090 with 24GB VRAM listed.

Option 1: Ollama (recommended for most users)

Ollama wraps llama.cpp in a clean CLI and handles model downloads, quantization selection, and GPU offloading automatically.

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull Qwen3 32B (automatically selects Q4_K_M)
ollama pull qwen3:32b

# Run it
ollama run qwen3:32b

That is it. Ollama detects your 3090, loads the model onto the GPU, and you are generating at 64 tok/s.

Option 2: llama.cpp (maximum control)

For users who want to tune batch sizes, context lengths, and quantization formats:

# Clone and build
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j$(nproc)

# Download a GGUF model (e.g., from HuggingFace)
# Then run:
./build/bin/llama-server \
  -m ./models/qwen3-32b-q4_k_m.gguf \
  -ngl 99 \
  -c 8192 \
  --host 0.0.0.0 \
  --port 8080

Key flags for the 3090:

  • -ngl 99 — Offload all layers to GPU. The 3090 can hold all 64 layers of Qwen3 32B Q4 in VRAM.
  • -c 8192 — Context window. Start at 8K. You can push to 16K on Qwen3 32B Q4 with 24GB, but 32K is tight.
  • -b 512 — Batch size for prompt processing. Default is fine; increase to 1024 if you do a lot of large-context ingestion.
  • --flash-attn — Enable Flash Attention. Reduces VRAM usage for KV cache and improves performance at long contexts. Use this.
  • -t 4 — CPU threads for non-GPU operations. Match to your CPU core count, but 4–8 is usually optimal.

Verifying performance

Run a quick benchmark after setup:

# With Ollama
ollama run qwen3:32b "Write a 500 word essay about distributed systems" --verbose

# Check the eval rate in the output — should be ~64 tok/s on RTX 3090

If you are seeing significantly lower numbers (under 50 tok/s), check:

  1. All layers are on GPU (nvidia-smi should show ~19GB VRAM used)
  2. PCIe is running at x16, not x8 (check nvidia-smi -q | grep "Link Width")
  3. Power limit is not being throttled (check nvidia-smi -q | grep "Power")
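Those three checks can be scripted. A sketch using nvidia-smi's CSV query interface — the query fields are real (see `nvidia-smi --help-query-gpu`), but the sample line is illustrative, showing what a healthy 3090 mid-inference might report:

```python
import subprocess

FIELDS = "memory.used,pcie.link.width.current,power.limit"

def read_gpu_status():
    """Query the live GPU (requires an NVIDIA driver and nvidia-smi)."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={FIELDS}",
         "--format=csv,noheader,nounits"], text=True)
    return parse_gpu_status(out.strip())

def parse_gpu_status(csv_line):
    """Parse one CSV row: VRAM used (MiB), PCIe link width, power limit (W)."""
    mem_mib, pcie_width, power_w = (v.strip() for v in csv_line.split(","))
    return {"mem_gib": int(mem_mib) / 1024,
            "pcie_x16": int(pcie_width) == 16,
            "power_limit_w": float(power_w)}

# Illustrative sample: ~19GB used (all layers on GPU), x16 link, stock limit
print(parse_gpu_status("19456, 16, 350.00"))
```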

Thermal Management

Repaste on arrival, undervolt to 280W, and your 3090 will run cool and quiet for inference.

The RTX 3090 is a 350W card, but inference does not need 350W. Autoregressive decoding uses primarily the memory subsystem, not the CUDA cores. You can significantly reduce power and thermals without impacting tok/s.

Repasting

Budget 30 minutes. You need:

  • A Phillips #1 screwdriver
  • Thermal paste (Thermal Grizzly Kryonaut or Noctua NT-H2 — $8–12)
  • Isopropyl alcohol (90%+) and lint-free wipes
  • Optional: thermal pads for GDDR6X memory (1.5mm, 12 W/mK — the stock pads degrade too)

Steps:

  1. Remove the backplate screws (usually 4–8 Phillips screws around the perimeter)
  2. Carefully separate the cooler from the PCB. Go slowly — thermal pads may stick.
  3. Clean old paste from the GPU die and cooler contact surface with isopropyl alcohol
  4. Apply new paste (pea-sized dot on the GA102 die)
  5. If replacing memory thermal pads, cut them to match the old pads and place them on each GDDR6X module
  6. Reassemble and re-screw. Do not overtighten — snug plus a quarter turn.

Expected improvement: 5–15°C drop in GPU temperature, depending on how degraded the original paste was.

Undervolting for inference

The 3090's stock voltage/frequency curve targets gaming clocks of 1700–1900 MHz. For inference, you do not need those clocks — the bottleneck is memory bandwidth, not compute.

Using nvidia-smi on the Linux command line:

# Set power limit to 280W (from 350W stock)
sudo nvidia-smi -pl 280

# This persists until reboot. Add to a startup script for permanence.

On Windows, use MSI Afterburner:

  1. Open Afterburner → Ctrl+F to open the V/F curve
  2. Find the 800mV point and drag it up to 1700 MHz
  3. Flatten everything above 800mV to 1700 MHz
  4. Apply

Expected results:

  • Power draw drops from ~320W to ~240W during sustained inference
  • GPU temperature drops 8–12°C
  • Fan noise drops significantly — often silent at 30% fan speed
  • Token throughput stays within 2–3% of stock settings

We run all our 3090 test cards at 280W. The performance delta is negligible and the noise reduction is substantial.

Power supply requirements

The RTX 3090 has a 350W TDP, and NVIDIA recommends a 750W PSU. For an inference-focused build with undervolting:

  • 650W PSU is workable if you are running a modest CPU (Ryzen 5, i5) and no other power-hungry components
  • 750W PSU gives comfortable headroom
  • 850W+ PSU if you plan to run at stock power or add a second GPU later
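The sizing logic above can be sketched as simple arithmetic: sum worst-case component draws, add headroom for transient spikes, round up to the next common PSU size. The CPU/other defaults and the 30% headroom factor here are illustrative assumptions, not measurements:

```python
def recommend_psu(gpu_w, cpu_w=150, other_w=100, headroom=1.3):
    """Rough PSU sizing: worst-case draw plus ~30% headroom for
    transients, rounded up to the next common PSU wattage.
    cpu_w/other_w are illustrative defaults for a mid-range build."""
    needed = (gpu_w + cpu_w + other_w) * headroom
    for size in (550, 650, 750, 850, 1000, 1200):
        if size >= needed:
            return size
    return 1600

print(recommend_psu(280))  # undervolted 3090 -> 750
print(recommend_psu(350))  # stock power limit -> 850
```

This lines up with the guidance above: a 280W undervolt lands comfortably on a 750W unit, while stock power pushes you to 850W.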

Use a quality unit from Corsair, Seasonic, or EVGA (they still make PSUs). The 3090 uses two 8-pin PCIe connectors — do not daisy-chain a single cable. Use two separate cables from the PSU.

Case airflow

The 3090 is a triple-slot card at 313mm long (Founders Edition). Make sure your case can physically fit it and has adequate front-to-back airflow. For a dedicated inference box, a mid-tower like the Fractal Meshify C or Corsair 4000D Airflow is ideal — good mesh front panels and plenty of 120mm/140mm fan mounts.

Minimum fan setup: two front intake fans and one rear exhaust. The GPU cooler does the heavy lifting, but it needs fresh air to work with.


The Bottom Line

Four takeaways:

  1. The RTX 3090 at $749 is the best $/VRAM GPU you can buy in 2026. Nothing else gives you 24GB — and access to 30B+ parameter models — for under $800. It scores a 78 in our GPU rankings, but per-dollar, it is unmatched.

  2. Mining cards are fine. Repaste and check the fans. The silicon does not care what workload it ran. The thermal paste and fan bearings are the wear items, and both are cheap to replace.

  3. Target a triple-fan AIB card. EVGA FTW3, ASUS TUF, or MSI Suprim X. Avoid blower coolers for single-GPU builds. Budget $750 and buy from a seller with a return policy.

  4. Undervolt to 280W for inference. The 3090 does not need 350W for token generation. Drop power, drop temps, drop noise — keep 97% of the performance.

The RTX 3090 is not the fastest card. It is not the most efficient card. But it is the card that puts 24GB of VRAM and 64 tok/s on Qwen3 32B Q4 in your hands for the price of a mid-range gaming GPU. For anyone starting with local AI on a budget, it is the obvious choice.

Buy GeForce RTX 3090 on Amazon

Sources

  • NVIDIA RTX 3090 official specifications — nvidia.com
  • GPU Hunter benchmark data — tested with llama.cpp b4532, CUDA 12.4, driver 550.54
  • GDDR6X thermal pad replacement guide — igorslab.de
  • Qwen3 model cards and VRAM requirements — huggingface.co/Qwen
  • llama.cpp GPU offloading documentation — github.com/ggerganov/llama.cpp
  • r/LocalLLaMA community benchmarks — reddit.com/r/LocalLLaMA