
HuggingFace SmolLM3-3B

Capabilities: chat
Parameters: 3.1B
Context length: 64K
Benchmarks: 7
Quantizations: 6
HF downloads: 1.1M
Architecture: Dense
Released: 2025-07-01
Layers: 36
KV Heads: 4
Head Dim: 128
Family: smollm

Quantization Options

Quant    Bits (BPW)   VRAM     Quality
Q4_K_M   4.89         2.4 GB   good (94%)
Q5_K_S   5.57         2.6 GB   good (96%)
Q5_K_M   5.7          2.7 GB   good (96%)
Q6_K     6.56         3.0 GB   excellent (97%)
Q8_0     8.5          3.8 GB   lossless (100%)
FP16     16           6.7 GB   lossless (100%)
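To see roughly where these VRAM figures come from, here is a back-of-the-envelope sketch: weight memory is about parameters × bits-per-weight / 8, and the FP16 KV cache grows with context using the layer count (36), KV heads (4), and head dimension (128) listed above. This is an approximation only; the table's numbers also include runtime overhead, so they run somewhat higher than the weights-only estimate.

```python
def weight_bytes(params: float, bpw: float) -> float:
    """Approximate weight memory: bpw bits per parameter, 8 bits per byte."""
    return params * bpw / 8

def kv_cache_bytes(tokens: int, layers: int = 36, kv_heads: int = 4,
                   head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """FP16 KV cache: a K and a V vector (kv_heads * head_dim values)
    per layer, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_val * tokens

params = 3.1e9
# Weights-only estimate at Q4_K_M (the table's 2.4 GB adds overhead on top)
print(f"Q4_K_M weights: {weight_bytes(params, 4.89) / 1e9:.2f} GB")  # ~1.89 GB
# KV cache at an 8K context window
print(f"KV cache @ 8K ctx: {kv_cache_bytes(8192) / 1e9:.2f} GB")     # ~0.60 GB
```

Note that at the full 64K context the FP16 KV cache alone is around 4.8 GB, which is why long-context runs need considerably more VRAM than the table's minimums.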


Benchmarks (7)

IFEval: 76.7
MATH: 46.1
GPQA: 35.7
HumanEval: 30.5
BBH: 10.9
MMLU-PRO: 10.7
MUSR: 2.8

Run this model

The easiest way to get started is Ollama (see its docs for details):

curl -fsSL https://ollama.com/install.sh | sh
ollama run smollm:3b-q4_k_m

This downloads the model and runs it automatically. Add --verbose for speed stats.
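If you would rather call the model from code than from the CLI, a minimal sketch against the local Ollama server's HTTP API (the /api/generate endpoint on Ollama's default port 11434; the model tag mirrors the command above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt: str, model: str = "smollm:3b-q4_k_m") -> dict:
    # stream=False asks the server for one complete JSON object
    # instead of a stream of partial chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "smollm:3b-q4_k_m") -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("Why is the sky blue?") returns the model's reply as a string.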



Compatible GPUs (30)

GPUs that can run SmolLM3-3B at Q4_K_M quantization: