▸ DEVICE UNDER TEST
NVIDIA L4 with 24 GB of GDDR6 VRAM.
▸ L4 24GB SPEC
- BRAND: NVIDIA
- VRAM: 24 GB GDDR6
- BANDWIDTH: 300 GB/s
- FP16 COMPUTE: 121 TFLOPS
- FP32 COMPUTE: 30.3 TFLOPS
- CUDA CORES: 7,680
- TENSOR CORES: 240
- TDP: 72 W
- ARCHITECTURE: Ada Lovelace
- MSRP: $2,500
▸ AI CAPABILITY
Runs 249 of 331 tested models at Q4 quantization.
With 24 GB of VRAM and 300 GB/s of bandwidth, this GPU handles Q4 models up to about 34.4B parameters.
Decode speed ≈ bandwidth / model_size × efficiency, since each generated token streams the full set of weights from VRAM. A 7B model at Q4 (~4.4 GB) runs at roughly 34 tok/s.
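The sizing and speed estimates above can be sketched as a small calculator. The constants below are assumptions reverse-engineered from this page's own numbers (0.625 bytes per parameter at Q4, a 0.5 bandwidth-efficiency factor, and ~2.5 GB of headroom for KV cache and activations), not official NVIDIA or llama.cpp figures:

```python
# Rough LLM sizing heuristics for a 24 GB / 300 GB/s GPU.
# All constants are assumptions fitted to the figures on this page.
BYTES_PER_PARAM_Q4 = 0.625  # ~Q4 GGUF; implied by 34.4B -> 21.5 GB in the table
VRAM_GB = 24.0
BANDWIDTH_GBPS = 300.0
EFFICIENCY = 0.5            # fraction of peak bandwidth realized (assumption)
OVERHEAD_GB = 2.5           # KV cache + activation headroom (assumption)

def q4_size_gb(params_b: float) -> float:
    """Approximate Q4 weight size in GB for params_b billion parameters."""
    return params_b * BYTES_PER_PARAM_Q4

def max_params_b() -> float:
    """Largest Q4 model that fits after reserving overhead."""
    return (VRAM_GB - OVERHEAD_GB) / BYTES_PER_PARAM_Q4

def tokens_per_sec(params_b: float) -> float:
    """Decode speed estimate: each token reads the whole model once."""
    return BANDWIDTH_GBPS / q4_size_gb(params_b) * EFFICIENCY

print(f"max model: {max_params_b():.1f}B")            # 34.4B
print(f"7B @ Q4:   {tokens_per_sec(7):.0f} tok/s")    # 34 tok/s
print(f"34B @ Q4:  {tokens_per_sec(34.4):.0f} tok/s") # 7 tok/s
```

With these assumed constants the output reproduces the page's headline numbers: a 34.4B parameter ceiling, ~34 tok/s for 7B, and the ~7 tok/s shown for the 34B-class models in the table below.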
§ 01 TOP MODELS FOR L4 24GB
249 FIT · SHOWING 20

| MODEL | SIZE | VRAM @ Q4 | TOK/S | AVG |
|---|---|---|---|---|
| Nous Capybara 34B | 34.4B | 21.5 GB | 7 | 42.0 |
| Yi-1.5 34B | 34.4B | 21.5 GB | 7 | 45.3 |
| Falcon-H1 34B | 34B | 21.3 GB | 7 | 66.1 |
| CodeLlama 34B | 34B | 21.3 GB | 7 | 25.4 |
| Nous Hermes 2 34B | 34B | 21.3 GB | 7 | 47.0 |
| Phind CodeLlama 34B | 34B | 21.3 GB | 7 | 68.1 |
| LLaVA-1.6 Yi 34B | 34B | 21.3 GB | 7 | 47.4 |
| WizardCoder Python 34B | 34B | 21.3 GB | 7 | 73.2 |
| Yi 34B | 34B | 21.3 GB | 7 | 33.4 |
| DeepSeek Coder 33B | 33B | 20.7 GB | 7 | 26.0 |
| Vicuna 33B | 33B | 20.7 GB | 7 | 17.2 |
| LLaMA 1 30B | 33B | 20.7 GB | 7 | 17.8 |
| DeepSeek-R1-Distill-Qwen-32B | 32.8B | 20.5 GB | 7 | 46.9 |
| Qwen3 32B | 32.8B | 20.5 GB | 7 | 54.9 |
| Qwen2.5-32B | 32.5B | 20.4 GB | 7 | 54.3 |
| Qwen 2.5 Coder 32B | 32.5B | 20.4 GB | 7 | 48.0 |
| QwQ-32B | 32.5B | 20.4 GB | 7 | 45.1 |
| OLMo-2-0325-32B | 32.2B | 20.2 GB | 7 | 59.1 |
| Aya Expanse 32B | 32B | 20.0 GB | 8 | 35.9 |
| Cogito 32B | 32B | 20.0 GB | 8 | 39.4 |