OLMo 3 32B

Capabilities: chat, reasoning, coding, math

Parameters: 32B
Context length: 63K
Benchmarks: 3
Quantizations: 14
Architecture: Dense
Released: 2025-12-01
Layers: 64
KV Heads: 8
Head Dim: 128
Family: olmo
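The layer count, KV-head count, and head dimension above determine how much extra VRAM a long context costs in KV cache, on top of the weights. A rough sketch (assuming an FP16 cache and reading "63K" as 63,000 tokens; these figures are estimates, not from the model card):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: 2 tensors (K and V) per layer, each
    n_kv_heads * head_dim elements per token, bytes_per_elem wide."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

# OLMo 3 32B: 64 layers, 8 KV heads, head dim 128
per_token = kv_cache_bytes(64, 8, 128, n_tokens=1)
print(per_token)  # 262144 bytes = 256 KiB per token

full = kv_cache_bytes(64, 8, 128, n_tokens=63_000)
print(f"{full / 1e9:.1f} GB")  # ~16.5 GB at full context
```

In practice runtimes such as llama.cpp can quantize the KV cache to shrink this, but it is worth budgeting for when choosing a quantization below.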

Quantization Options

Quant     BPW   VRAM     Quality
IQ3_XXS   3.25  13.5 GB  low (82%)
IQ3_XS    3.5   14.5 GB  low (84%)
Q3_K_S    3.64  15.0 GB  low (85%)
IQ3_M     3.76  15.5 GB  low (86%)
Q3_K_M    4     16.5 GB  low (88%)
Q3_K_L    4.3   17.7 GB  moderate (90%)
IQ4_XS    4.46  18.3 GB  moderate (92%)
Q4_K_S    4.67  19.2 GB  moderate (93%)
Q4_K_M    4.89  20.0 GB  good (94%)
Q5_K_S    5.57  22.8 GB  good (96%)
Q5_K_M    5.7   23.3 GB  good (96%)
Q6_K      6.56  26.7 GB  excellent (97%)
Q8_0      8.5   34.5 GB  lossless (100%)
FP16      16    64.5 GB  lossless (100%)
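The VRAM column tracks a simple rule: weight size at the listed bits-per-weight, plus roughly half a gigabyte of overhead. A sketch (the 0.5 GB constant is inferred from the table itself, not documented anywhere):

```python
def est_vram_gb(params_b: float, bpw: float, overhead_gb: float = 0.5) -> float:
    """Approximate VRAM for the weights: parameters (in billions) times
    bits-per-weight, converted to gigabytes, plus a fixed overhead."""
    return params_b * bpw / 8 + overhead_gb

print(round(est_vram_gb(32, 4.89), 1))  # ~20.1 -> table lists 20.0 GB for Q4_K_M
print(round(est_vram_gb(32, 8.5), 1))   # 34.5  -> matches Q8_0
```

This covers the weights only; leave headroom for the KV cache and runtime buffers.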


Benchmarks (3)

MATH: 96.2
IFEval: 93.8
HumanEval: 91.5

Run this model

The easiest way to get started is Ollama. Install it, then pull and run the model:

curl -fsSL https://ollama.com/install.sh | sh
ollama run olmo-3:32b:q4_k_m

The model downloads on first run and starts automatically. Add --verbose to print speed stats.
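Once the model is pulled, Ollama also serves a local HTTP API on port 11434. A minimal Python client, assuming the default host and the /api/chat endpoint's request/response shape (the prompt is illustrative):

```python
import json
from urllib import request

def chat_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model: str, prompt: str, host: str = "http://localhost:11434") -> str:
    body = json.dumps(chat_payload(model, prompt)).encode()
    req = request.Request(f"{host}/api/chat", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Example (requires a running Ollama server):
#   print(chat("olmo-3:32b", "Why is the sky blue?"))
```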


Compatible GPUs (30)

GPUs that can run OLMo 3 32B at Q4_K_M quantization: