Llama-3.1-8B — 8B Parameter Dense LLM
Model Specifications
- Parameters: 8B
- Architecture: Dense Transformer
- Context Length: 128K tokens
- Capabilities: chat
- Release Date: 2024-07-23
- Provider: Meta
- Family: llama
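The 128K context window carries a memory cost on top of the weights: the KV cache grows linearly with sequence length. A rough sketch, assuming Llama 3.1 8B's published configuration (32 transformer layers, 8 KV heads via grouped-query attention, head dimension 128) and FP16 cache entries; these architecture numbers come from the model's public config, not from this page:

```python
# KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_value
def kv_cache_bytes(tokens: int, layers: int = 32, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Total KV cache size in bytes for a given number of cached tokens."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token

# Per token: 2 * 32 * 8 * 128 * 2 = 131072 bytes = 128 KiB
print(kv_cache_bytes(1) // 1024, "KiB per token")
# A full 128K-token context at FP16 adds ~16 GiB on top of the weights
print(kv_cache_bytes(128 * 1024) / 2**30, "GiB at 128K tokens")
```

This is why inference runtimes cap the context well below 128K on the 6 GB cards listed further down, or quantize the KV cache itself.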
VRAM Requirements
| Quantization | BPW | VRAM | Quality |
|---|---|---|---|
| Q4_K_M | 4.89 | 5.4 GB | 94% |
| Q5_K_S | 5.57 | 6.1 GB | 96% |
| Q5_K_M | 5.7 | 6.2 GB | 96% |
| Q6_K | 6.56 | 7.0 GB | 97% |
| Q8_0 | 8.5 | 9.0 GB | 100% |
| FP16 | 16 | 16.5 GB | 100% |
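The VRAM column tracks a back-of-envelope estimate: quantized weights occupy roughly parameters × bits-per-weight / 8 bytes, plus runtime overhead for the KV cache, activations, and buffers. A minimal sketch (the ~0.5 GB overhead constant is an assumption that happens to fit this table, not a measured value):

```python
# Back-of-envelope VRAM estimate for a dense model at a given quantization.
def weights_gb(params_billions: float, bpw: float) -> float:
    """Size of the quantized weights alone, in decimal GB."""
    return params_billions * bpw / 8  # bits-per-weight -> bytes-per-weight

# (name, bits per weight) rows from the table above
QUANTS = [("Q4_K_M", 4.89), ("Q5_K_M", 5.70), ("Q6_K", 6.56),
          ("Q8_0", 8.50), ("FP16", 16.0)]

for name, bpw in QUANTS:
    w = weights_gb(8.0, bpw)  # 8B parameters
    # Assumed ~0.5 GB overhead (short-context KV cache, buffers)
    print(f"{name}: ~{w:.1f} GB weights, ~{w + 0.5:.1f} GB total")
```

For Q4_K_M this gives ~4.9 GB of weights and ~5.4 GB total, matching the table's figure.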
Benchmark Scores
| Benchmark | Score |
|---|---|
| HumanEval | 62.8 |
| MMLU-PRO | 31.1 |
| MATH | 15.6 |
| IFEval | 49.2 |
| BBH | 29.4 |
| GPQA | 8.7 |
| MUSR | 8.6 |
| MBPP | 55.6 |
| BigCodeBench | 32.8 |
| Arena Elo | 1191.0 |
| GPQA Diamond | 27.0 |
| LiveCodeBench | 8.5 |
| MATH-500 | 21.8 |
| HLE | 4.3 |
| AA Intelligence | 7.6 |
How to Run Llama-3.1-8B
Run Llama-3.1-8B locally with Ollama (needs 5.4 GB VRAM at Q4_K_M):
```
ollama run llama3.1:8b
```
Compatible GPUs
GPUs that can run Llama-3.1-8B at Q4_K_M quantization:
- NVIDIA RTX 3050 6GB (6 GB, 168 GB/s)
- Intel Arc A380 (6 GB, 186 GB/s)
- NVIDIA RTX 2060 6GB (6 GB, 336 GB/s)
- NVIDIA GTX 1660 SUPER (6 GB, 336 GB/s)
- NVIDIA GTX 1660 Ti (6 GB, 288 GB/s)
- NVIDIA GTX 1060 6GB (6 GB, 192 GB/s)
- NVIDIA Tesla C2070 (6 GB, 143 GB/s)
- NVIDIA Tesla C2075 (6 GB, 150 GB/s)
- NVIDIA Tesla C2090 (6 GB, 177 GB/s)
- NVIDIA Tesla M2070 (6 GB, 150 GB/s)
- NVIDIA Tesla M2070-Q (6 GB, 150 GB/s)
- NVIDIA Tesla M2075 (6 GB, 150 GB/s)
- NVIDIA Tesla M2090 (6 GB, 177 GB/s)
- NVIDIA Tesla X2070 (6 GB, 177 GB/s)
- NVIDIA Tesla X2090 (6 GB, 177 GB/s)
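For any of these cards, memory bandwidth sets a hard ceiling on generation speed: each decoded token must stream the full quantized weights from VRAM, so tokens/s cannot exceed bandwidth divided by model size. A rough sketch over a few entries from the list (real-world throughput lands well below this ceiling because of compute and overhead):

```python
# Upper bound on decode speed: every token reads all weights from VRAM once.
MODEL_GB = 5.4  # Q4_K_M footprint from the table above

# (name, VRAM in GB, memory bandwidth in GB/s) from the compatibility list
gpus = [
    ("NVIDIA RTX 3050 6GB", 6, 168),
    ("Intel Arc A380", 6, 186),
    ("NVIDIA RTX 2060 6GB", 6, 336),
    ("NVIDIA GTX 1060 6GB", 6, 192),
]

for name, vram_gb, bandwidth_gbs in gpus:
    ceiling = bandwidth_gbs / MODEL_GB
    print(f"{name}: <= {ceiling:.0f} tok/s theoretical ceiling")
```

This explains the spread within the list: the RTX 2060's 336 GB/s gives it roughly twice the decode ceiling of the RTX 3050 6GB despite identical VRAM.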