▸ DEVICE UNDER TEST
NVIDIA Quadro RTX 8000 — 48 GB VRAM.
▸ QUADRO RTX 8000 SPEC
- BRAND: NVIDIA
- VRAM: 48 GB GDDR6
- BANDWIDTH: 672 GB/s
- FP16 COMPUTE: 32.6 TFLOPS
- FP32 COMPUTE: 16.3 TFLOPS
- CUDA CORES: 4,608
- TENSOR CORES: 576
- TDP: 260 W
- ARCHITECTURE: Turing
▸ AI CAPABILITY
263 / 331 models @ Q4
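With 48 GB of VRAM and 672 GB/s of memory bandwidth, this GPU fits models of up to 66B parameters at Q4 (4-bit quantization).
Decode speed is estimated as tok/s ≈ (bandwidth / model_size) × efficiency, with model_size being the quantized weight size in GB. A 7B model at Q4 runs at ~77 tok/s.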
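The estimate above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual calculator: the function name `estimate_tok_per_s` is hypothetical, and the efficiency factor of 0.5 is an assumption that is not stated in the source. It is used here only because it roughly reproduces the TOK/S column for the dense models listed in the table.

```python
def estimate_tok_per_s(bandwidth_gb_s: float, model_size_gb: float, efficiency: float) -> float:
    """Apply tok/s ≈ (bandwidth / model_size) × efficiency."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb * efficiency


BANDWIDTH_GB_S = 672.0     # from the spec list
ASSUMED_EFFICIENCY = 0.5   # ASSUMPTION: not given in the source, picked to match the table

# VRAM Q4 sizes (GB) copied from the table for three dense models
dense_models = {
    "OPT 66B": 40.8,
    "Falcon 40B": 24.9,
    "CodeLlama 34B": 21.3,
}

for name, size_gb in dense_models.items():
    tok_s = estimate_tok_per_s(BANDWIDTH_GB_S, size_gb, ASSUMED_EFFICIENCY)
    print(f"{name}: ~{tok_s:.1f} tok/s")
```

Under this assumption the three estimates round to 8, 13, and 16 tok/s, which agrees with the listed values. A different efficiency factor, or a model size that includes KV cache and runtime overhead, would shift the numbers.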
§ 01 TOP MODELS FOR QUADRO RTX 8000
263 FIT · SHOWING 20

| MODEL | SIZE | VRAM Q4 | TOK/S | AVG |
|---|---|---|---|---|
| OPT 66B | 66B | 40.8 GB | 8 | — |
| LLaMA 1 65B | 65.2B | 40.3 GB | 8 | 42.6 |
| Jamba 1.5 Mini 52B | 51.6B | 32.0 GB | 45 | 24.2 |
| Kimi-Linear-48B-A3B | 48B | 29.8 GB | 179 | 26.6 |
| Nemotron-H 47B | 47B | 29.2 GB | 11 | 84.6 |
| Mixtral-8x7B | 46.7B | 29.0 GB | 41 | 18.8 |
| Nous-Hermes-2-Mixtral-8x7B-DPO | 46.7B | 29.0 GB | 41 | 27.4 |
| Dolphin 2.6 Mixtral 8x7B | 46.7B | 29.0 GB | 41 | 23.8 |
| Phi-3.5 MoE 42B | 41.9B | 26.1 GB | 81 | 56.7 |
| Falcon 40B | 40B | 24.9 GB | 13 | 20.9 |
| Qwen3.5-35B-A3B | 36B | 22.5 GB | 15 | 48.5 |
| c4ai-command-r-v01 35B | 35B | 21.9 GB | 15 | 27.5 |
| Qwen 3.5 35B A3B | 35B | 21.9 GB | 179 | 53.3 |
| Qwen 3.6 35B A3B | 35B | 21.9 GB | 179 | 62.7 |
| Nous Capybara 34B | 34.4B | 21.5 GB | 16 | 42.0 |
| Yi-1.5 34B | 34.4B | 21.5 GB | 16 | 45.3 |
| Falcon-H1 34B | 34B | 21.3 GB | 16 | 66.1 |
| CodeLlama 34B | 34B | 21.3 GB | 16 | 25.4 |
| Nous Hermes 2 34B | 34B | 21.3 GB | 16 | 47.0 |
| Phind CodeLlama 34B | 34B | 21.3 GB | 16 | 68.1 |
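Several rows in the table (for example Mixtral-8x7B and Kimi-Linear-48B-A3B) show much higher TOK/S than dense models of similar VRAM footprint. Mixture-of-experts models activate only a subset of their parameters per token, so fewer weight bytes are read per decode step. The sketch below inverts the speed formula to show how many GB per token each listed TOK/S value would imply. The helper name `implied_gb_per_token` is hypothetical, and the efficiency of 0.5 is an assumption not given in the source.

```python
def implied_gb_per_token(bandwidth_gb_s: float, tok_per_s: float, efficiency: float) -> float:
    """Invert tok/s ≈ (bandwidth / size) × efficiency to solve for size in GB."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return bandwidth_gb_s * efficiency / tok_per_s


BANDWIDTH_GB_S = 672.0     # from the spec list
ASSUMED_EFFICIENCY = 0.5   # ASSUMPTION: not stated in the source

# (model, VRAM Q4 in GB, TOK/S) copied from the table
rows = [
    ("OPT 66B", 40.8, 8),
    ("Mixtral-8x7B", 29.0, 41),
    ("Kimi-Linear-48B-A3B", 29.8, 179),
]

for name, vram_gb, tok_s in rows:
    read_gb = implied_gb_per_token(BANDWIDTH_GB_S, tok_s, ASSUMED_EFFICIENCY)
    share = read_gb / vram_gb
    print(f"{name}: ~{read_gb:.1f} GB read per token ({share:.0%} of Q4 footprint)")
```

For the dense OPT 66B row the implied read size is close to the full Q4 footprint, while for the two mixture-of-experts rows it is a small fraction of it. This is only a consistency check on the table, not a measurement.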