NVIDIA GeForce RTX 2080 SUPER Max-Q — 8 GB VRAM.
- Brand: NVIDIA
- VRAM: 8 GB GDDR6
- Memory bandwidth: 352 GB/s
- FP16 compute: 12 TFLOPS
- FP32 compute: 6 TFLOPS
- CUDA cores: 3,072
- Tensor cores: 384
- TDP: 80 W
- Architecture: Turing
- MSRP: $180
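With 8 GB of VRAM and 352 GB/s of memory bandwidth, this GPU runs models of up to about 10.7B parameters at Q4 quantization. The largest model listed below, SOLAR-10.7B, needs about 7.0 GB.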
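Single-stream token generation is mostly memory-bandwidth-bound, because every generated token reads all of the model's weights once. Speed is therefore roughly bandwidth / model_size × efficiency, where model_size is the size of the quantized weights. By this estimate, a 7B model at Q4 runs at about 40 tok/s.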
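The estimate above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the page's own calculator: Q4 weights are taken as 0.5 bytes per parameter and the efficiency factor as 0.4. Both values are our assumptions, chosen because together they reproduce the TOK/S column below and the ~40 tok/s figure for a 7B model. The VRAM rule (0.6 GB per billion parameters plus 0.6 GB) is likewise a hypothetical fit that matches the VRAM Q4 column; the source does not state it.

```python
# Rough Q4 estimates for a memory-bandwidth-bound GPU at batch size 1.
# ASSUMPTIONS (inferred from this page's table, not stated by the source):
#   - Q4 weights take ~0.5 bytes per parameter (4 bits).
#   - ~40% of peak bandwidth is achieved in practice (efficiency = 0.4).
#   - Total VRAM ~ 0.6 GB per billion parameters + 0.6 GB (hypothetical fit).

BANDWIDTH_GB_S = 352.0  # RTX 2080 SUPER Max-Q memory bandwidth
VRAM_GB = 8.0           # total VRAM on this GPU


def weights_gb_q4(params_b: float) -> float:
    """Approximate size in GB of 4-bit weights for params_b billion parameters."""
    return params_b * 0.5


def tok_per_s(params_b: float,
              bandwidth_gb_s: float = BANDWIDTH_GB_S,
              efficiency: float = 0.4) -> float:
    """Speed ~ bandwidth / model_size * efficiency (each token reads all weights once)."""
    return bandwidth_gb_s / weights_gb_q4(params_b) * efficiency


def vram_gb_q4(params_b: float) -> float:
    """Hypothetical linear rule that reproduces the table's VRAM Q4 column."""
    return 0.6 * params_b + 0.6


def fits(params_b: float, vram_gb: float = VRAM_GB) -> bool:
    """True if the estimated Q4 footprint fits in the given VRAM."""
    return vram_gb_q4(params_b) <= vram_gb


if __name__ == "__main__":
    for name, params_b in [("7B", 7.0), ("Qwen3-8B", 8.2), ("SOLAR-10.7B", 10.7)]:
        print(f"{name}: ~{tok_per_s(params_b):.0f} tok/s, "
              f"~{vram_gb_q4(params_b):.1f} GB, fits={fits(params_b)}")
```

For SOLAR-10.7B this gives about 26 tok/s and 7.0 GB, matching the first table row. Real throughput will also depend on the runtime, context length, and KV-cache size, which this sketch ignores.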
| MODEL | PARAMS | VRAM (Q4) | TOK/S (EST.) | AVG |
|---|---|---|---|---|
| SOLAR-10.7B | 10.7B | 7.0 GB | 26 | 28.2 |
| Falcon3-10B | 10.3B | 6.8 GB | 27 | 38.2 |
| Qwen3.5-9B | 9.7B | 6.4 GB | 29 | 38.3 |
| glm-4-9b | 9.4B | 6.2 GB | 30 | 20.5 |
| gemma-2-9b | 9.2B | 6.1 GB | 31 | 30.2 |
| RecurrentGemma 9B | 9B | 6.0 GB | 31 | 35.0 |
| Qwen 3.5 9B | 9B | 6.0 GB | 31 | 50.6 |
| Yi 1.5 9B | 9B | 6.0 GB | 31 | 30.3 |
| Yi Coder 9B | 9B | 6.0 GB | 31 | 35.8 |
| NVIDIA-Nemotron-Nano-9B-v2 | 8.9B | 5.9 GB | 32 | 44.2 |
| CodeGemma 7B | 8.54B | 5.7 GB | 33 | 40.2 |
| Qwen2.5-VL-7B | 8.3B | 5.6 GB | 34 | 39.1 |
| Qwen2-VL 7B | 8.29B | 5.6 GB | 34 | 46.6 |
| Qwen3-8B | 8.2B | 5.5 GB | 34 | 43.3 |
| Granite 3.0 8B | 8.17B | 5.5 GB | 34 | 36.4 |
| Granite 3.1 8B | 8.17B | 5.5 GB | 34 | 38.6 |
| Aya Expanse 8B | 8B | 5.4 GB | 35 | 27.8 |
| Cogito 8B | 8B | 5.4 GB | 35 | 17.2 |
| DeepSeek R1 Distill Llama 8B | 8B | 5.4 GB | 35 | 36.2 |
| Gemma 3n E4B | 8B | 5.4 GB | 35 | 28.8 |
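One way to use the table is to filter by a VRAM budget and a minimum speed, then pick the best remaining row. The sketch below works on an excerpt of the rows above. The `pick` helper, the default `min_tok_s=30`, and ranking by the AVG column (assuming higher is better) are illustrative choices, not part of the source.

```python
# Pick a model from an excerpt of the table under VRAM and speed constraints.
# The rows are copied from the table; pick() and its defaults are hypothetical.

TABLE = """\
| SOLAR-10.7B | 10.7B | 7.0 GB | 26 | 28.2 |
| Falcon3-10B | 10.3B | 6.8 GB | 27 | 38.2 |
| Qwen 3.5 9B | 9B | 6.0 GB | 31 | 50.6 |
| NVIDIA-Nemotron-Nano-9B-v2 | 8.9B | 5.9 GB | 32 | 44.2 |
| Qwen3-8B | 8.2B | 5.5 GB | 34 | 43.3 |
| Cogito 8B | 8B | 5.4 GB | 35 | 17.2 |
| DeepSeek R1 Distill Llama 8B | 8B | 5.4 GB | 35 | 36.2 |
"""


def parse_rows(text):
    """Parse markdown table rows into dicts (name, vram_gb, tok_s, avg)."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        name, _params, vram, tok_s, avg = cells
        rows.append({
            "name": name,
            "vram_gb": float(vram.split()[0]),  # "7.0 GB" -> 7.0
            "tok_s": int(tok_s),
            "avg": float(avg),
        })
    return rows


def pick(rows, vram_budget_gb=8.0, min_tok_s=30):
    """Return the highest-AVG row that fits the budget and speed floor, or None."""
    ok = [r for r in rows
          if r["vram_gb"] <= vram_budget_gb and r["tok_s"] >= min_tok_s]
    return max(ok, key=lambda r: r["avg"], default=None)


if __name__ == "__main__":
    rows = parse_rows(TABLE)
    print(pick(rows)["name"])               # speed floor of 30 tok/s
    print(pick(rows, min_tok_s=35)["name"])  # stricter speed floor
```

With a 30 tok/s floor the excerpt yields Qwen 3.5 9B; raising the floor to 35 tok/s leaves only the 8B rows, of which DeepSeek R1 Distill Llama 8B has the higher AVG.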