NVIDIA/Dense

Nemotron-H 47B

chatcodingreasoningmathThinkingDistilled
47B
Parameters
125K
Context length
5
Benchmarks
4
Quantizations
30K
HF downloads
Architecture
Dense
Released
2025-06-06
Layers
98
KV Heads
8
Head Dim
128
Family
nemotron

Nemotron-H-47B-Reasoning-128K

Model Developer: NVIDIA

Model Dates:

October 2024 - March 2025

Data Freshness:

September 2024

The pretraining data has a cutoff date of September 2024.

Model Overview

NVIDIA Nemotron-H-47B-Reasoning-128K is a large language model (LLM) developed by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks.It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.

The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just five Attention layers. It is based on Nemotron-H-47B-Base-8K, which is a pruned and distilled from Nemotron-H-56B-Base-8K.

The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese.

We provide a BF16 checkpoint which can be used with HuggingFace-Transformers or TensorRT-LLM, and a FB8 checkpoint which can be used with TensorRT-LLM.

This model is for research and development only.

Quantizations & VRAM

Q4_K_M4.5 bpw
26.9 GB
VRAM required
94%
Quality
Q6_K6.5 bpw
38.7 GB
VRAM required
97%
Quality
Q8_08 bpw
47.5 GB
VRAM required
100%
Quality
FP1616 bpw
94.5 GB
VRAM required
100%
Quality

Benchmarks (5)

Arena Elo1312
MATH96.2
MBPP91.8
IFEval84.5
GPQA65.7

Run with Ollama

$ollama run nemotron-h:47b

GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.

AMD Radeon PRO V710
28 GB VRAM • 504 GB/s
AMD
NVIDIA RTX 5090
32 GB VRAM • 1792 GB/s
NVIDIA
$1999
Apple M1 Max (32GB)
32 GB VRAM • 400 GB/s
APPLE
$1499
Apple M2 Max (32GB)
32 GB VRAM • 400 GB/s
APPLE
$1799
NVIDIA V100 SXM2 32GB
32 GB VRAM • 900 GB/s
NVIDIA
$3500
Apple M2 Pro (32GB)
32 GB VRAM • 200 GB/s
APPLE
$1499
NVIDIA Tesla V100 DGXS 32 GB
32 GB VRAM • 897 GB/s
NVIDIA
NVIDIA Tesla V100 PCIe 32 GB
32 GB VRAM • 897 GB/s
NVIDIA
NVIDIA Tesla V100 SXM2 32 GB
32 GB VRAM • 898 GB/s
NVIDIA
NVIDIA Tesla V100 SXM3 32 GB
32 GB VRAM • 981 GB/s
NVIDIA
AMD Radeon Instinct MI60
32 GB VRAM • 1020 GB/s
AMD
NVIDIA Tesla V100S PCIe 32 GB
32 GB VRAM • 1130 GB/s
NVIDIA
AMD Radeon Instinct MI100
32 GB VRAM • 1230 GB/s
AMD
NVIDIA RTX 5000 Ada Generation
32 GB VRAM • 576 GB/s
NVIDIA
NVIDIA GeForce RTX 5090
32 GB VRAM • 1790 GB/s
NVIDIA
$1999
NVIDIA GeForce RTX 5090 D
32 GB VRAM • 1790 GB/s
NVIDIA
$1999
NVIDIA Jetson AGX Xavier 32 GB
32 GB VRAM • 136 GB/s
NVIDIA
NVIDIA Quadro GV100
32 GB VRAM • 868 GB/s
NVIDIA
NVIDIA TITAN V CEO Edition
32 GB VRAM • 868 GB/s
NVIDIA
NVIDIA Tesla PG500-216
32 GB VRAM • 1130 GB/s
NVIDIA
NVIDIA Tesla PG503-216
32 GB VRAM • 1130 GB/s
NVIDIA
AMD Radeon Pro Vega II
32 GB VRAM • 825 GB/s
AMD
AMD Radeon Pro Vega II Duo
32 GB VRAM • 1020 GB/s
AMD
AMD Radeon PRO V620
32 GB VRAM • 512 GB/s
AMD
AMD Radeon PRO W6800
32 GB VRAM • 512 GB/s
AMD
AMD Radeon Pro W6800X
32 GB VRAM • 512 GB/s
AMD
AMD Radeon Pro W6800X Duo
32 GB VRAM • 512 GB/s
AMD
AMD Radeon Pro W6900X
32 GB VRAM • 512 GB/s
AMD
NVIDIA Jetson AGX Orin 32 GB
32 GB VRAM • 205 GB/s
NVIDIA
AMD Radeon PRO W7800
32 GB VRAM • 576 GB/s
AMD

Find the best GPU for Nemotron-H 47B

Build Hardware for Nemotron-H 47B