OLMo-2-0325-32B — 32.2B Parameter Dense LLM
Model Specifications
- Parameters: 32.2B
- Architecture: Dense Transformer
- Context Length: 4K tokens
- Capabilities: chat
- Release Date: 2025-06-15
- Provider: Allen Institute
- Family: olmo
VRAM Requirements
| Quantization | BPW | VRAM | Quality |
|---|---|---|---|
| IQ3_XXS | 3.25 | 13.6 GB | 82% |
| IQ3_XS | 3.5 | 14.6 GB | 84% |
| Q3_K_S | 3.64 | 15.1 GB | 85% |
| IQ3_M | 3.76 | 15.6 GB | 86% |
| Q3_K_M | 4 | 16.6 GB | 88% |
| Q3_K_L | 4.3 | 17.8 GB | 90% |
| IQ4_XS | 4.46 | 18.4 GB | 92% |
| Q4_K_S | 4.67 | 19.3 GB | 93% |
| Q4_K_M | 4.89 | 20.2 GB | 94% |
| Q5_K_S | 5.57 | 22.9 GB | 96% |
| Q5_K_M | 5.7 | 23.4 GB | 96% |
| Q6_K | 6.56 | 26.9 GB | 97% |
| Q8_0 | 8.5 | 34.7 GB | 100% |
| FP16 | 16 | 64.9 GB | 100% |
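The VRAM figures above follow a simple rule of thumb: weight memory ≈ parameters × bits-per-weight ÷ 8, plus a small fixed overhead. A minimal sketch (the ~0.5 GB overhead constant is an assumption inferred from the table, not an official figure):

```python
# Estimate VRAM for a quantized model (sketch; the ~0.5 GB fixed
# overhead is an assumption that roughly reproduces the table above).
def estimate_vram_gb(params_billion: float, bpw: float,
                     overhead_gb: float = 0.5) -> float:
    """Weights in GB = params (B) * bits-per-weight / 8, plus overhead."""
    return params_billion * bpw / 8 + overhead_gb

# OLMo-2-0325-32B has 32.2B parameters.
print(round(estimate_vram_gb(32.2, 4.89), 1))  # Q4_K_M -> 20.2
print(round(estimate_vram_gb(32.2, 16.0), 1))  # FP16   -> 64.9
```

Actual usage also depends on context length (KV cache) and runtime buffers, so treat these numbers as a lower bound.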
Benchmark Scores
| Benchmark | Score |
|---|---|
| HumanEval | 86.7 |
| MATH | 93.4 |
| IFEval | 88.8 |
| BBH | 84.0 |
| GPQA | 48.6 |
| MBPP | 65.1 |
| AlpacaEval | 59.8 |
| Arena Elo | 1222.0 |
| GPQA Diamond | 53.9 |
| HLE | 4.9 |
| AA Intelligence | 12.2 |
| AA Coding | 5.6 |
| LiveCodeBench | 69.5 |
| AIME | 77.3 |
| AA Math | 77.3 |
How to Run OLMo-2-0325-32B
Run OLMo-2-0325-32B locally with Ollama (needs 20.2 GB VRAM at Q4_K_M):
ollama run olmo:32b

Compatible GPUs (30)
GPUs that can run OLMo-2-0325-32B at Q4_K_M quantization:
- NVIDIA RTX 4090 (24GB, 1008 GB/s)
- NVIDIA RTX 3090 Ti (24GB, 1008 GB/s)
- NVIDIA RTX 3090 (24GB, 936 GB/s)
- AMD RX 7900 XTX (24GB, 960 GB/s)
- Apple M4 Pro (24GB, 273 GB/s)
- NVIDIA L4 24GB (24GB, 300 GB/s)
- NVIDIA A10 24GB (24GB, 600 GB/s)
- Apple M2 (24GB, 100 GB/s)
- Apple M3 (24GB, 100 GB/s)
- Apple M4 (24GB, 120 GB/s)
- NVIDIA Tesla M40 24 GB (24GB, 288 GB/s)
- NVIDIA Tesla P10 (24GB, 694 GB/s)
- NVIDIA Tesla P40 (24GB, 347 GB/s)
- NVIDIA Quadro RTX 6000 (24GB, 672 GB/s)
- NVIDIA Quadro RTX 6000 Passive (24GB, 624 GB/s)
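Given a card's VRAM budget, choosing the highest-quality quantization that fits can be sketched as follows (VRAM figures copied from the table above; the selection logic is illustrative, not part of any official tool):

```python
# Pick the best quantization of OLMo-2-0325-32B for a given VRAM budget.
# (Per-quant VRAM copied from the table above; logic is illustrative.)
QUANTS = [  # (name, required VRAM in GB), ordered smallest to largest
    ("IQ3_XXS", 13.6), ("IQ3_XS", 14.6), ("Q3_K_S", 15.1), ("IQ3_M", 15.6),
    ("Q3_K_M", 16.6), ("Q3_K_L", 17.8), ("IQ4_XS", 18.4), ("Q4_K_S", 19.3),
    ("Q4_K_M", 20.2), ("Q5_K_S", 22.9), ("Q5_K_M", 23.4), ("Q6_K", 26.9),
    ("Q8_0", 34.7),
]

def best_quant(vram_gb: float):
    """Return the largest (highest-quality) quant that fits, or None."""
    fitting = [name for name, need in QUANTS if need <= vram_gb]
    return fitting[-1] if fitting else None

print(best_quant(24))  # a 24 GB card such as the RTX 4090 -> 'Q5_K_M'
print(best_quant(12))  # below every listed quant -> None
```

Note that all the 24 GB cards listed above land on Q5_K_M by this rule, even though the page headlines Q4_K_M; the extra headroom then goes to the KV cache.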