Optional. If no CPU is specified, offload estimates assume ~50 GB/s system RAM bandwidth (dual-channel DDR4-3200).
How much text the model reads at once. Affects speed, VRAM usage, and which models fit.
A precision tool for running AI locally. Tell us your GPU — we rank 328 models across 1,720 benchmarked GPUs by real speed, VRAM fit, and quality.
~ 2 seconds to first recommendation · no signup · free forever
↑ SWITCH GPUs — RESULTS COMPUTED FROM LIVE BENCHMARK DATA
Four routes in. Pick one.
New models · GPU deals · benchmark updates. Once a week. Unsubscribe with one click.
NO SPAM · NO TRACKERS · POWERED BY BUTTONDOWN
100% private — no cloud, no subscriptions
Side-by-side comparison of models, GPUs, and performance
FitMyLLM helps you find and run AI models on your own hardware. Enter your GPU — whether it's an NVIDIA RTX 4090, RTX 3090, RTX 3060, AMD RX 7900 XTX, or Apple M4 — and get instant recommendations for the best models that fit your VRAM, with speed estimates and ready-to-run Ollama commands.
Our database covers 328 open-source LLMs including Llama 4, Qwen 3.5, DeepSeek R1, DeepSeek V3, Gemma 3, Phi-4, Mistral, and more. Each model includes benchmarks (MMLU-PRO, HumanEval, MATH, IFEval), VRAM requirements at every quantization level (Q4_K_M, Q5_K_M, Q6_K, Q8_0, FP16), and compatibility data for 1,720 GPUs.
Whether you need an AI for coding (Qwen 2.5 Coder, DeepSeek Coder), creative writing, chat, reasoning (DeepSeek R1), or document analysis with RAG — FitMyLLM finds the optimal model for your specific hardware in seconds.
Running LLMs locally requires GPU VRAM (video memory). The amount depends on model size and quantization: a 7B parameter model at Q4 quantization needs about 4GB VRAM, while a 70B model needs 40GB+. Modern GPUs like the RTX 4060 (8GB), RTX 4070 Ti (12GB), RTX 4080 (16GB), and RTX 4090 (24GB) can run increasingly powerful models.
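The rule of thumb above can be sketched in a few lines. This is a simplified estimate, not FitMyLLM's actual calculator: the 20% overhead multiplier for KV cache and runtime buffers is an assumption, and real usage varies with context length and inference backend.

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate for a quantized model.

    params_b:        model size in billions of parameters (7 for a 7B model)
    bits_per_weight: ~4 for Q4, ~8 for Q8_0, 16 for FP16
    overhead:        multiplier for KV cache and runtime buffers (assumed ~20%)
    """
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~= 1 GB
    return weight_gb * overhead

print(f"7B  @ Q4: {estimate_vram_gb(7, 4):.1f} GB")   # roughly 4 GB
print(f"70B @ Q4: {estimate_vram_gb(70, 4):.1f} GB")  # roughly 42 GB
```

Plugging in the examples from the text: a 7B model at Q4 lands near 4 GB, while a 70B model at Q4 needs 40 GB or more, matching the RTX 4090's 24 GB ceiling for mid-size models.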
Speed depends on memory bandwidth, not just compute power. That's why the RTX 3090 (936 GB/s) still competes with the RTX 4090 (1,008 GB/s) for LLM inference. The new RTX 5090 with 1,792 GB/s GDDR7 bandwidth is the fastest consumer GPU for local AI.
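A back-of-envelope sketch of why bandwidth dominates: during single-stream decoding, every generated token requires streaming all model weights from VRAM once, so throughput is roughly bandwidth divided by model size. The 70% efficiency factor below is an assumption; real numbers depend on the backend and kernel quality.

```python
def decode_tokens_per_sec(bandwidth_gbps: float, model_gb: float, efficiency: float = 0.7) -> float:
    """Memory-bound estimate of batch-1 decode speed.

    bandwidth_gbps: GPU memory bandwidth in GB/s
    model_gb:       size of the quantized weights in GB
    efficiency:     fraction of peak bandwidth achieved (assumed 0.7)
    """
    return bandwidth_gbps * efficiency / model_gb

# 7B Q4 model (~4 GB of weights) on three GPUs from the text
for name, bw in [("RTX 3090", 936), ("RTX 4090", 1008), ("RTX 5090", 1792)]:
    print(f"{name}: ~{decode_tokens_per_sec(bw, 4):.0f} tok/s")
```

Because 936 GB/s is within ~7% of 1,008 GB/s, the 3090 and 4090 land close together on this estimate, while the 5090's 1,792 GB/s nearly doubles it.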
Apple Silicon users benefit from unified memory — an M4 Max with 128GB can run 70B models that would require a $2,000+ GPU on PC. FitMyLLM supports all platforms: NVIDIA, AMD, Intel Arc, and Apple M1/M2/M3/M4 chips.
Side-by-side comparison of any models. Benchmark scores, VRAM usage, speed estimates, and radar charts. Compare Llama 4 vs Qwen 3.5, DeepSeek R1 vs Gemma 3, or any combination.
Every GPU ranked S-tier to F-tier for running local AI. Based on VRAM, bandwidth, and real model compatibility data — not opinions. Includes NVIDIA RTX, AMD Radeon, Intel Arc, and Apple Silicon.
Plan production LLM deployments with GPU sizing, P95 latency estimation, and cloud vs on-prem TCO analysis. Supports vLLM, TRT-LLM, and SGLang serving engines.