Nemotron-H 8B
Model Card
View on Hugging Face: Nemotron-H-8B-Reasoning-128K
Model Developer: NVIDIA
Model Dates:
October 2024 - March 2025
Data Freshness:
September 2024
The pretraining data has a cutoff date of September 2024.
Model Overview
NVIDIA Nemotron-H-8B-Reasoning-128K is a large language model (LLM) developed by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning behavior can be controlled via the system prompt: if the user prefers a final answer without intermediate reasoning traces, the model can be configured to skip them, at the cost of slightly lower accuracy on harder prompts that require reasoning. Conversely, allowing the model to generate a reasoning trace first generally yields higher-quality final solutions.
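The reasoning toggle described above can be sketched as a small helper that builds the chat messages. Note this is a minimal sketch: the exact control phrases ("detailed thinking on"/"detailed thinking off") are an assumption borrowed from other NVIDIA reasoning models, so check the Hugging Face model page for the canonical system prompt.

```python
# Sketch of toggling the reasoning trace via the system prompt.
# ASSUMPTION: the control phrases below mirror other NVIDIA reasoning
# models; verify them against the official model card before use.

def build_messages(prompt: str, reasoning: bool) -> list[dict]:
    """Build a chat message list that enables or disables reasoning traces."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]

# The resulting list would then be passed to tokenizer.apply_chat_template(...)
messages = build_messages("Solve 23 * 17 step by step.", reasoning=True)
```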
The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers. It is based on Nemotron-H-8B-Base-8K.
The supported languages include: English, German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, and Chinese.
We provide a BF16 checkpoint that can be used with Hugging Face Transformers or TensorRT-LLM, and an FP8 checkpoint that can be used with TensorRT-LLM.
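For the BF16 checkpoint, loading via Transformers can be sketched as below. The repo id is assumed from the model name on this card, and `trust_remote_code` is assumed to be required for the hybrid Mamba-2/Attention architecture; check the Hugging Face page for the canonical usage.

```python
# Minimal sketch of loading the BF16 checkpoint with Hugging Face Transformers.
# ASSUMPTIONS: the repo id below is inferred from the model name on this card,
# and trust_remote_code is presumed necessary for the custom hybrid layers.

MODEL_ID = "nvidia/Nemotron-H-8B-Reasoning-128K"  # assumed repo id

def bf16_load_kwargs() -> dict:
    """Keyword arguments for AutoModelForCausalLM.from_pretrained (BF16)."""
    return {
        "torch_dtype": "bfloat16",   # Transformers accepts the dtype as a string
        "device_map": "auto",        # spread layers across available devices
        "trust_remote_code": True,   # assumed: custom hybrid architecture code
    }

# Usage (downloads the 8B checkpoint; needs a GPU with sufficient VRAM):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
# model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **bf16_load_kwargs())
```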
This model is for research and development only.
Run with Ollama
ollama run nemotron-h:8b