
Nemotron 3 Nano 4B

Tags: chat, coding, reasoning, math, tool_use
Capabilities: Thinking, Tool Use
Parameters: 3.97B
Context length: 256K
Benchmarks: 6
Quantizations: 6
HF downloads: 100K
Architecture: Dense
Released: 2026-03-16
Layers: 42
KV Heads: 8
Head Dim: 128
Family: nemotron

NVIDIA-Nemotron-3-Nano-4B-BF16

Model Developer: NVIDIA Corporation

Model Dates: Dec 2025 - Jan 2026

Data Freshness: September 2024

The pretraining data has a cutoff date of September 2024.

Model Overview

NVIDIA-Nemotron-3-Nano-4B-BF16 is a small language model (SLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.
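The reasoning toggle described above can be sketched as a chat payload for an OpenAI-compatible endpoint. The `/think` and `/no_think` control tokens below are an assumption based on the Nemotron Nano family's conventions, and the model id is hypothetical; check the model card for the exact system-prompt syntax.

```python
# Sketch: toggling reasoning via the system prompt.
# NOTE: the "/think" / "/no_think" control tokens and the model id are
# assumptions; verify the exact syntax against the official model card.
import json


def build_messages(user_prompt: str, reasoning: bool = True) -> list:
    """Build a chat message list with the reasoning toggle in the system prompt."""
    system = "/think" if reasoning else "/no_think"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]


payload = {
    "model": "nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16",  # hypothetical id
    "messages": build_messages("What is 17 * 23?", reasoning=False),
    "max_tokens": 256,
}
print(json.dumps(payload, indent=2))
```

With `reasoning=False` the model would answer directly; with the default `reasoning=True` it emits a reasoning trace before the final response, which the card notes generally yields higher-quality answers on hard prompts.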

The model has been compressed from NVIDIA-Nemotron-Nano-9B-v2 using the Nemotron Elastic framework. Details of the parent model NVIDIA-Nemotron-Nano-9B-v2 can be found in the Nemotron-H tech report. The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers, combined with only four Attention layers.

Supported languages: English. This model was improved using Qwen.

This model is ready for commercial use.

Quantizations & VRAM

Quant    bpw   VRAM required   Quality
Q3_K_M   3.5   1.7 GB          90%
Q4_K_M   4.5   2.2 GB          94%
Q5_K_M   5.5   2.7 GB          96%
Q6_K     6.5   3.2 GB          97%
Q8_0     8.0   4.0 GB          100%
FP16     16    7.9 GB          100%
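The VRAM figures above follow roughly from parameters times bits-per-weight. A minimal sketch of that arithmetic, ignoring KV cache, activations, and runtime overhead (which all add on top of the weight memory):

```python
# Rough weight-memory estimate: parameters * bits-per-weight / 8 bytes.
# Ignores KV cache, activations, and runtime overhead.

PARAMS = 3.97e9  # Nemotron 3 Nano 4B parameter count


def weight_gb(bpw: float, params: float = PARAMS) -> float:
    """Approximate weight memory in GB for a given bits-per-weight."""
    return params * bpw / 8 / 1e9


for name, bpw in [("Q4_K_M", 4.5), ("Q8_0", 8.0), ("FP16", 16.0)]:
    print(f"{name}: {weight_gb(bpw):.1f} GB")
# Q4_K_M comes out near 2.2 GB, Q8_0 near 4.0 GB, FP16 near 7.9 GB,
# matching the table above.
```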

Benchmarks (6)

MATH: 95.4
IFEval: 88.0
GPQA: 53.2
MMLU-PRO: 18.1
BBH: 14.2
MUSR: 4.6

Run with Ollama

$ ollama run nemotron-3-nano:4b
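Once the model is pulled, it can also be queried programmatically through Ollama's local REST API at `http://localhost:11434/api/generate`. A stdlib-only sketch that builds the request (the tag matches the `ollama run` command above; sending it requires a running Ollama server, so here we only construct the request object):

```python
# Sketch: building a request for Ollama's local REST API.
# Sending it requires a running Ollama server; this only builds the request.
import json
import urllib.request


def build_request(prompt: str, model: str = "nemotron-3-nano:4b") -> urllib.request.Request:
    """Construct a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )


req = build_request("Explain Mamba-2 layers in one sentence.")
print(req.full_url)  # http://localhost:11434/api/generate
```

To actually send it, pass `req` to `urllib.request.urlopen` with Ollama running; the response body is JSON with the generated text in its `response` field.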
