Nemotron 3 Nano 4B
Model Card
NVIDIA-Nemotron-3-Nano-4B-BF16
Model Developer: NVIDIA Corporation
Model Dates:
Dec 2025 - Jan 2026
Data Freshness:
September 2024 (pretraining data cutoff)
Model Overview
NVIDIA-Nemotron-3-Nano-4B-BF16 is a small language model (SLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.
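The reasoning toggle described above can be sketched as a chat-message builder. This is a minimal illustration, not the official API: the control strings `/think` and `/no_think` are an assumption borrowed from the parent Nemotron-Nano-9B-v2 family, so check the model's chat template for the authoritative values.

```python
# Sketch of controlling reasoning via the system prompt.
# Assumption: "/think" / "/no_think" are the control strings,
# as in the parent Nemotron-Nano-9B-v2 family.

def build_messages(user_prompt: str, reasoning: bool = True) -> list[dict]:
    """Build a chat message list that enables or disables reasoning traces."""
    system = "/think" if reasoning else "/no_think"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    # With reasoning disabled, the model answers directly (slightly lower
    # accuracy on hard prompts, per the description above).
    msgs = build_messages("What is 17 * 23?", reasoning=False)
    print(msgs[0]["content"])
```

The resulting list can be passed to any chat-template-aware runtime (e.g. `tokenizer.apply_chat_template` in Transformers).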
The model has been compressed from NVIDIA-Nemotron-Nano-9B-v2 using the Nemotron Elastic framework. Details of the parent model NVIDIA-Nemotron-Nano-9B-v2 can be found in the Nemotron-H tech report. The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers, combined with just four Attention layers.
Supported language: English. This model was improved using Qwen.
This model is ready for commercial use.
Run with Ollama
ollama run nemotron-3-nano:4b
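Once the model is pulled with the command above, it can also be queried programmatically through Ollama's local REST API (`POST /api/generate` on port 11434). The sketch below is a minimal, non-streaming example; the `nemotron-3-nano:4b` tag comes from the run command above, and the `think` request field is an assumption based on Ollama's support for reasoning models.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def make_payload(prompt: str, think: bool = True) -> dict:
    """Build a non-streaming generate request for the model pulled above."""
    return {
        "model": "nemotron-3-nano:4b",  # tag from `ollama run` above
        "prompt": prompt,
        "stream": False,   # return one complete JSON response
        "think": think,    # reasoning toggle (assumed supported for this model)
    }

def generate(prompt: str) -> str:
    """Send the request to a locally running Ollama server and return the text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(make_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(generate("Summarize the Nemotron 3 Nano 4B model in one sentence."))
```

This requires the Ollama server to be running locally (`ollama serve`); only the payload construction is exercised otherwise.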