What can your GPU run?


# | Model | Size | Quant / VRAM | VRAM% | Context | Speed

FitMyLLM | 8GB VRAM | 272 GB/s • fitmyllm.com/tier

S | Runs great | 60 models

1 | DeepSeek Coder 6.7B | 6.7B | Q4 4.6GB | 57% | 16K | 30 tok/s
2 | OPT 6.7B | 6.7B | Q4 4.6GB | 57% | 2K | 30 tok/s
3 | Yi-1.5 6B | 6.06B | Q4 4.2GB | 52% | 4K | 33 tok/s
4 | Gemma 3n E2B | 6B | Q4 4.2GB | 52% | 32K | 33 tok/s
5 | Yi 6B | 6B | Q4 4.2GB | 52% | 4K | 33 tok/s
6 | Gemma 4 E2B | 5.1B | Q4 3.6GB | 45% | 128K | 39 tok/s
7 | Qwen3.5-4B | 4.7B | Q4 3.4GB | 42% | 256K | 43 tok/s
8 | Gemma 3 4B | 4.3B | Q4 3.1GB | 39% | 128K | 47 tok/s

A | Runs well | 70 models

1 | Qwen3.5-9B | 9.7B | Q4 6.4GB | 80% | 256K | 21 tok/s
2 | glm-4-9b | 9.4B | Q4 6.2GB | 78% | 128K | 21 tok/s
3 | gemma-2-9b | 9.2B | Q4 6.1GB | 76% | 4K | 22 tok/s
4 | RecurrentGemma 9B | 9B | Q4 6.0GB | 75% | 8K | 22 tok/s
5 | Qwen 3.5 9B | 9B | Q4 6.0GB | 75% | 256K | 22 tok/s
6 | Yi 1.5 9B | 9B | Q4 6.0GB | 75% | 4K | 22 tok/s
7 | Yi Coder 9B | 9B | Q4 6.0GB | 75% | 125K | 22 tok/s
8 | NVIDIA-Nemotron-Nano-9B-v2 | 8.9B | Q4 5.9GB | 74% | 128K | 22 tok/s

B | Decent | 4 models

1 | Falcon2 11B | 11B | Q4 7.2GB | 90% | 8K | 18 tok/s
2 | Llama-3.2-11B-Vision-Instruct | 11B | Q4 7.2GB | 90% | 128K | 18 tok/s
3 | SOLAR-10.7B | 10.7B | Q4 7.0GB | 88% | 4K | 19 tok/s
4 | Falcon3-10B | 10.3B | Q4 6.8GB | 85% | 32K | 19 tok/s

C | Tight fit | 6 models

1 | Mistral-Nemo 12.2B | 12.2B | Q4 (tight) 7.9GB | 99% | 128K | 16 tok/s
2 | Dolly v2 12B | 12B | Q4 (tight) 7.8GB | 98% | 2K | 17 tok/s
3 | gemma-3-12b | 12B | Q4 (tight) 7.8GB | 98% | 128K | 17 tok/s
4 | TranslateGemma 12B | 12B | Q4 (tight) 7.8GB | 98% | 128K | 17 tok/s
5 | Pixtral 12B | 12B | Q4 (tight) 7.8GB | 98% | 128K | 17 tok/s
6 | StableLM 2 12B | 12B | Q4 (tight) 7.8GB | 98% | 4K | 17 tok/s

D | Barely runs | 35 models

1 | Mistral-Small-24B | 24B | Q4 (offload) 15.2GB | 189% | 32K | 8 tok/s
2 | Mistral-Small-3.1-24B | 24B | Q4 (offload) 15.2GB | 189% | 128K | 8 tok/s
3 | Magistral Small 24B | 24B | Q4 (offload) 15.2GB | 189% | 128K | 8 tok/s
4 | Devstral Small 2 24B | 24B | Q4 (offload) 15.2GB | 189% | 384K | 8 tok/s
5 | Codestral 22B | 22.2B | Q4 (offload) 14.1GB | 176% | 32K | 9 tok/s
6 | Devstral Small 22B | 22.2B | Q4 (offload) 14.1GB | 176% | 128K | 9 tok/s
7 | Mistral Small 22B | 22.2B | Q4 (offload) 14.1GB | 176% | 32K | 9 tok/s
8 | SOLAR-Pro 22B | 22.1B | Q4 (offload) 14.0GB | 175% | 4K | 9 tok/s

Q4_K_M quantization | S Runs great | A Runs well | B Decent | C Tight fit | D Barely runs | F Too heavy

Create your own at fitmyllm.com/tier • Based on Q4_K_M quantization. Your actual results may vary.
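
For anyone wanting to sanity-check the columns by hand, here is a minimal Python sketch of the arithmetic they appear to encode. It is not FitMyLLM's actual method, which this page does not document. The function names are hypothetical. The 8 GB and 272 GB/s constants come from the header line. The speed function assumes the common rule of thumb that generating one token reads all model weights from memory once, so it gives a theoretical ceiling only; the speeds listed above come out at roughly half that ceiling.

```python
# Illustrative only: hypothetical helpers, not FitMyLLM's real formulas.

TOTAL_VRAM_GB = 8.0       # from the header line: "8GB VRAM"
BANDWIDTH_GB_S = 272.0    # from the header line: "272 GB/s"


def vram_percent(model_gb: float, total_gb: float = TOTAL_VRAM_GB) -> float:
    """Share of GPU memory the model needs, as a percentage."""
    return 100.0 * model_gb / total_gb


def fits_in_vram(model_gb: float, total_gb: float = TOTAL_VRAM_GB) -> bool:
    """True when the model fits without offloading to system RAM."""
    return model_gb <= total_gb


def bandwidth_bound_tok_s(model_gb: float,
                          bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Assumed upper bound on decode speed: one full weight read per token."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    for name, gb in [("Qwen3.5-9B", 6.4), ("Mistral-Small-24B", 15.2)]:
        print(name,
              f"{vram_percent(gb):.0f}% of VRAM,",
              "fits" if fits_in_vram(gb) else "needs offload,",
              f"ceiling {bandwidth_bound_tok_s(gb):.0f} tok/s")
```

For example, `vram_percent(6.4)` gives 80, matching the Qwen3.5-9B row. Percentages recomputed from the rounded GB figures can differ from the listed value by a point, since the page presumably works from unrounded sizes.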
