
Nemotron Cascade 2 30B-A3B

Capabilities: chat, reasoning, coding, math, tool_use
Parameters: 32B (3B active)
Context length: 250K
Benchmarks: 5
Quantizations: 14
Architecture: MoE
Released: 2026-03-19
Layers: 52
KV Heads: 2
Head Dim: 128
Family: nemotron
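The layer count, KV head count, and head dimension listed above allow a rough estimate of how much memory the KV cache needs on top of the weights. The sketch below is illustrative only: it assumes every one of the 52 layers is a standard attention layer with an FP16 cache, which the page does not confirm, so treat the result as an upper bound. The function name `kv_cache_bytes` and the 2-byte cache precision are assumptions, not part of the listing.

```python
def kv_cache_bytes(tokens, layers=52, kv_heads=2, head_dim=128, bytes_per_value=2):
    """Upper-bound KV cache size in bytes.

    Assumes every layer is an attention layer and the cache is stored
    at bytes_per_value bytes per element (2 = FP16). Both are assumptions.
    """
    # Factor 2: one key vector and one value vector per KV head per layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * tokens


per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(250_000)  # 250K taken as 250,000 tokens
print(f"{per_token} bytes per token")
print(f"{full_context / 1e9:.1f} GB at 250K tokens")
```

Under these assumptions the cache costs 53,248 bytes per token, or about 13.3 GB at the full 250K context, which is memory needed in addition to the quantized weights.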

Quantization Options

Quant | Bits | VRAM | Quality
IQ3_XXS | 3.25 | 13.5 GB | low
IQ3_XS | 3.5 | 14.5 GB | low
Q3_K_S | 3.64 | 15.0 GB | low
IQ3_M | 3.76 | 15.5 GB | low
Q3_K_M | 4 | 16.5 GB | low
Q3_K_L | 4.3 | 17.7 GB | moderate
IQ4_XS | 4.46 | 18.3 GB | moderate
Q4_K_S | 4.67 | 19.2 GB | moderate
Q4_K_M | 4.89 | 20.0 GB | good
Q5_K_S | 5.57 | 22.8 GB | good
Q5_K_M | 5.7 | 23.3 GB | good
Q6_K | 6.56 | 26.7 GB | excellent
Q8_0 | 8.5 | 34.5 GB | lossless
FP16 | 16 | 64.5 GB | lossless
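The VRAM figures in the table are consistent with a simple rule of thumb: weight size in GB is roughly the parameter count in billions times bits per weight divided by 8, plus a small fixed overhead. The sketch below checks that rule against a few rows. The 0.5 GB overhead and the function name `estimate_vram_gb` are inferred from the numbers for illustration; the page does not document how its estimates are produced.

```python
def estimate_vram_gb(bits_per_weight, params_billion=32.0, overhead_gb=0.5):
    """Rough VRAM estimate: weight bytes plus a fixed overhead (assumed 0.5 GB)."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


# (quant, bits per weight, VRAM in GB) copied from the table above
rows = [
    ("IQ3_XXS", 3.25, 13.5),
    ("Q4_K_M", 4.89, 20.0),
    ("Q6_K", 6.56, 26.7),
    ("Q8_0", 8.5, 34.5),
    ("FP16", 16, 64.5),
]

for name, bits, listed in rows:
    est = estimate_vram_gb(bits)
    print(f"{name:8s} listed {listed:5.1f} GB, estimated {est:5.2f} GB")
```

The estimates land within about 0.1 GB of the listed values; the small differences are plausibly due to rounding in the displayed bits-per-weight figures.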


Benchmarks (5)

AIME: 92.4
LiveCodeBench: 87.2
MMLU-PRO: 79.8
GPQA Diamond: 76.1
SWE-bench: 50.2

Run this model

Easiest way to get started:
curl -fsSL https://ollama.com/install.sh | sh
ollama run nemotron-cascade-2:q4_k_m

The model is downloaded automatically on first run. Add --verbose to print speed statistics.
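Once the model is running, the same speed statistics can be read programmatically from Ollama's local REST API. The sketch below is a minimal example using only the Python standard library. It assumes the Ollama server is listening on its default port (11434) and that the model tag matches the one used in the command above; the helper names `build_request`, `tokens_per_second`, and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)


def build_request(prompt, model="nemotron-cascade-2:q4_k_m"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def tokens_per_second(result):
    """Generation speed from the response; eval_duration is in nanoseconds."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def generate(prompt):
    """Send the prompt to the local server and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())


# Usage, with the Ollama server running locally:
#   result = generate("Explain mixture-of-experts routing in two sentences.")
#   print(result["response"])
#   print(f"{tokens_per_second(result):.1f} tokens/s")
```

Setting `stream` to false returns a single JSON object, which keeps the example short; a streaming client would read one JSON object per line instead.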


GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.


Nemotron Cascade 2 30B-A3B: 32B Parameter Mixture of Experts LLM

Model Specifications

Parameters: 32B (3B active)
Architecture: Mixture of Experts
Context Length: 250K tokens
Capabilities: chat, reasoning, coding, math, tool_use
Release Date: 2026-03-19
Family: nemotron

VRAM Requirements

Quantization | BPW | VRAM | Quality
IQ3_XXS | 3.25 | 13.5 GB | 82%
IQ3_XS | 3.5 | 14.5 GB | 84%
Q3_K_S | 3.64 | 15.0 GB | 85%
IQ3_M | 3.76 | 15.5 GB | 86%
Q3_K_M | 4 | 16.5 GB | 88%
Q3_K_L | 4.3 | 17.7 GB | 90%
IQ4_XS | 4.46 | 18.3 GB | 92%
Q4_K_S | 4.67 | 19.2 GB | 93%
Q4_K_M | 4.89 | 20.0 GB | 94%
Q5_K_S | 5.57 | 22.8 GB | 96%
Q5_K_M | 5.7 | 23.3 GB | 96%
Q6_K | 6.56 | 26.7 GB | 97%
Q8_0 | 8.5 | 34.5 GB | 100%
FP16 | 16 | 64.5 GB | 100%
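One way to use this table is to pick the highest-quality quantization that fits a given VRAM budget. The sketch below hard-codes the rows above and selects the best fit. The tie-breaking rule (prefer the smaller file when quality scores are equal) and the optional `headroom_gb` reserve for context and runtime buffers are illustrative choices, not recommendations from the page.

```python
# (quantization, VRAM in GB, quality in %) copied from the table above
QUANTS = [
    ("IQ3_XXS", 13.5, 82), ("IQ3_XS", 14.5, 84), ("Q3_K_S", 15.0, 85),
    ("IQ3_M", 15.5, 86), ("Q3_K_M", 16.5, 88), ("Q3_K_L", 17.7, 90),
    ("IQ4_XS", 18.3, 92), ("Q4_K_S", 19.2, 93), ("Q4_K_M", 20.0, 94),
    ("Q5_K_S", 22.8, 96), ("Q5_K_M", 23.3, 96), ("Q6_K", 26.7, 97),
    ("Q8_0", 34.5, 100), ("FP16", 64.5, 100),
]


def best_quant(vram_gb, headroom_gb=0.0):
    """Return the highest-quality row that fits, or None if nothing fits.

    headroom_gb reserves memory for the KV cache and runtime buffers.
    Ties on quality are broken in favor of the smaller VRAM footprint.
    """
    fitting = [q for q in QUANTS if q[1] + headroom_gb <= vram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: (q[2], -q[1]))


for budget in (12, 16, 24, 80):
    print(budget, "GB ->", best_quant(budget))
```

For example, a 24 GB card with no headroom selects Q5_K_S, while reserving 2 GB of headroom on the same card drops the choice to Q4_K_M.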

Benchmark Scores

MMLU-PRO: 79.8
LiveCodeBench: 87.2
SWE-bench: 50.2
AIME: 92.4
GPQA Diamond: 76.1

How to Run Nemotron Cascade 2 30B-A3B

Run Nemotron Cascade 2 30B-A3B locally with Ollama (needs 20.0 GB VRAM at Q4_K_M):

ollama run nemotron-cascade-2

Compatible GPUs (30)

GPUs that can run Nemotron Cascade 2 30B-A3B at Q4_K_M quantization: