NVIDIA/Dense

Llama-3.1-Nemotron-70B

chatreasoning
70.6B
Parameters
128K
Context length
9
Benchmarks
4
Quantizations
200K
HF downloads
Architecture
Dense
Released
2024-10-03
Layers
80
KV Heads
8
Head Dim
128
Family
nemotron

Model Overview

Description:

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.

This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo

As of 1 Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

As of Oct 24th, 2024 the model has Elo Score of 1267(+-7), rank 9 and style controlled rank of 26 on ChatBot Arena leaderboard.

This model was trained using RLHF (specifically, REINFORCE), Llama-3.1-Nemotron-70B-Reward and HelpSteer2-Preference prompts on a Llama-3.1-70B-Instruct model as the initial policy.

Llama-3.1-Nemotron-70B-Instruct-HF has been converted from Llama-3.1-Nemotron-70B-Instruct to support it in the HuggingFace Transformers codebase. Please note that evaluation results might be slightly different from the Llama-3.1-Nemotron-70B-Instruct as evaluated in NeMo-Aligner, which the evaluation results below are based on.

Try hosted inference for free at build.nvidia.com - it comes with an OpenAI-compatible API interface.

See details on our paper at https://arxiv.org/abs/2410.01257 - as a preview, this model can correctly the question How many r in strawberry? without specialized prompting or additional reasoning tokens:

A sweet question!
Let’s count the “R”s in “strawberry”:
1. S
2. T
3. R
4. A
5. W
6. B
7. E
8. R
9. R
10. Y
There are **3 “R”s** in the word “strawberry”.

Note: This model is a demonstration of our techniques for improving helpfulness in general-domain instruction following. It has not been tuned for performance in specialized domains such as math.

Quantizations & VRAM

Q4_K_M4.5 bpw
41.2 GB
VRAM required
94%
Quality
Q6_K6.5 bpw
58.8 GB
VRAM required
97%
Quality
Q8_08 bpw
72.1 GB
VRAM required
100%
Quality
FP1616 bpw
142.7 GB
VRAM required
100%
Quality

Benchmarks (9)

Arena Elo1284
IFEval85.0
BBH80.0
HumanEval77.0
MATH72.0
MMLU-PRO60.0
GPQA51.9
BigCodeBench38.7
MUSR13.2

GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.

Apple M3 Max (48GB)
48 GB VRAM • 400 GB/s
APPLE
$2899
Apple M4 Pro (48GB)
48 GB VRAM • 273 GB/s
APPLE
$1799
Apple M4 Max (48GB)
48 GB VRAM • 546 GB/s
APPLE
$2499
NVIDIA L40S 48GB
48 GB VRAM • 864 GB/s
NVIDIA
$7500
NVIDIA L40 48GB
48 GB VRAM • 864 GB/s
NVIDIA
$5500
NVIDIA RTX 6000 Ada 48GB
48 GB VRAM • 960 GB/s
NVIDIA
$6800
NVIDIA A40 48GB
48 GB VRAM • 696 GB/s
NVIDIA
$4650
NVIDIA RTX A6000 48GB
48 GB VRAM • 768 GB/s
NVIDIA
$4650
NVIDIA Quadro RTX 8000
48 GB VRAM • 672 GB/s
NVIDIA
NVIDIA Quadro RTX 8000 Passive
48 GB VRAM • 624 GB/s
NVIDIA
NVIDIA A40 PCIe
48 GB VRAM • 696 GB/s
NVIDIA
NVIDIA RTX 6000 Ada Generation
48 GB VRAM • 960 GB/s
NVIDIA
NVIDIA L20
48 GB VRAM • 864 GB/s
NVIDIA
AMD Radeon PRO W7800 48 GB
48 GB VRAM • 864 GB/s
AMD
AMD Radeon PRO W7900
48 GB VRAM • 864 GB/s
AMD
Intel Data Center GPU Max 1100
48 GB VRAM • 1230 GB/s
INTEL
NVIDIA RTX 5880 Ada Generation
48 GB VRAM • 864 GB/s
NVIDIA
NVIDIA RTX PRO 5000 Blackwell
48 GB VRAM • 1340 GB/s
NVIDIA
AMD Radeon PRO W7900D
48 GB VRAM • 864 GB/s
AMD
Apple M1 Ultra (64GB)
64 GB VRAM • 800 GB/s
APPLE
$2499
Apple M2 Ultra (64GB)
64 GB VRAM • 800 GB/s
APPLE
$2999
Apple M4 Max (64GB)
64 GB VRAM • 546 GB/s
APPLE
$2899
Apple M2 Max (64GB)
64 GB VRAM • 400 GB/s
APPLE
$2299
Apple M3 Max (64GB)
64 GB VRAM • 300 GB/s
APPLE
$2799
Apple M4 Pro (64GB)
64 GB VRAM • 273 GB/s
APPLE
$2599
AMD Radeon Instinct MI200
64 GB VRAM • 1640 GB/s
AMD
AMD Radeon Instinct MI210
64 GB VRAM • 1640 GB/s
AMD
NVIDIA H100 SXM5 64 GB
64 GB VRAM • 2020 GB/s
NVIDIA
NVIDIA Jetson AGX Orin 64 GB
64 GB VRAM • 205 GB/s
NVIDIA
NVIDIA Jetson T4000
64 GB VRAM • 273 GB/s
NVIDIA

Find the best GPU for Llama-3.1-Nemotron-70B

Build Hardware for Llama-3.1-Nemotron-70B