Mistral AI / Dense

Ministral 8B

Chat • Tool Use

| Spec | Value |
|---|---|
| Parameters | 8B |
| Context length | 128K |
| Architecture | Dense |
| Family | mistral |
| Released | 2024-10-16 |
| Layers | 36 |
| KV Heads | 8 |
| Head Dim | 128 |
| Benchmarks | 8 |
| Quantizations | 4 |
| HF downloads | 100K |

Model Card for Ministral-8B-Instruct-2410

We introduce two new state-of-the-art models for local intelligence, on-device computing, and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B.

The Ministral-8B-Instruct-2410 Language Model is an instruct fine-tuned model significantly outperforming existing models of similar size, released under the Mistral Research License.

If you are interested in using Ministral-3B or Ministral-8B commercially (both outperform Mistral-7B), reach out to us.

For more details about les Ministraux please refer to our release blog post.

Ministral 8B Key features

  • Released under the Mistral Research License, reach out to us for a commercial license
  • Trained with a 128k context window with interleaved sliding-window attention
  • Trained on a large proportion of multilingual and code data
  • Supports function calling
  • Vocabulary size of 131k, using the V3-Tekken tokenizer

Basic Instruct Template (V3-Tekken)

<s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]

For more information about the tokenizer please refer to mistral-common
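As a string-level illustration only, the template above can be assembled in plain Python. This is a sketch: for real use, prompts should be built through mistral-common, which handles special tokens correctly rather than by string concatenation.

```python
def build_prompt(turns):
    """Assemble the V3-Tekken instruct template from (user, assistant) turns.

    `turns` is a list of (user_message, assistant_response) pairs; the final
    pair may use assistant_response=None for a pending request. A sketch of
    the template's shape, not a substitute for mistral-common tokenization.
    """
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST]{user}[/INST]"
        if assistant is not None:
            prompt += f"{assistant}</s>"
    return prompt

print(build_prompt([("user message", "assistant response"),
                    ("new user message", None)]))
# <s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
```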

Ministral 8B Architecture

| Feature | Value |
|---|---|
| Architecture | Dense Transformer |
| Parameters | 8,019,808,256 |
| Layers | 36 |
| Heads | 32 |
| Dim | 4,096 |
| KV Heads (GQA) | 8 |
| Hidden Dim | 12,288 |
| Head Dim | 128 |
| Vocab Size | 131,072 |
| Context Length | 128k |
| Attention Pattern | Ragged (128k, 32k, 32k, 32k) |
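The hyperparameters in the table reproduce the stated parameter count exactly. The sketch below assumes a Llama-style dense stack (GQA attention, a three-projection SwiGLU MLP, two RMSNorms per layer, untied input/output embeddings); those structural assumptions are ours, not stated in the card, but they land on the published total.

```python
# Sanity-check the 8,019,808,256 parameter count from the table above.
dim, layers, heads, kv_heads, head_dim = 4096, 36, 32, 8, 128
hidden, vocab = 12288, 131072

attn = dim * heads * head_dim            # q_proj
attn += 2 * dim * kv_heads * head_dim    # k_proj, v_proj (GQA: 8 KV heads)
attn += heads * head_dim * dim           # o_proj
mlp = 3 * dim * hidden                   # gate, up, down (SwiGLU, assumed)
norms = 2 * dim                          # two RMSNorms per layer (assumed)
per_layer = attn + mlp + norms

# Embedding + untied lm_head + final norm (assumed untied; the sum checks out)
total = layers * per_layer + 2 * vocab * dim + dim
print(total)  # 8019808256
```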

Benchmarks

Base Models

<u>Knowledge & Commonsense</u>

| Model | MMLU | AGIEval | Winogrande | Arc-c | TriviaQA |
|---|---|---|---|---|---|
| Mistral 7B Base | 62.5 | 42.5 | 74.2 | 67.9 | 62.5 |
| Llama 3.1 8B Base | 64.7 | 44.4 | 74.6 | 46.0 | 60.2 |
| Ministral 8B Base | <u>65.0</u> | <u>48.3</u> | <u>75.3</u> | <u>71.9</u> | <u>65.5</u> |
| Gemma 2 2B Base | 52.4 | 33.8 | 68.7 | 42.6 | 47.8 |
| Llama 3.2 3B Base | 56.2 | 37.4 | 59.6 | 43.1 | 50.7 |
| Ministral 3B Base | <u>60.9</u> | <u>42.1</u> | <u>72.7</u> | <u>64.2</u> | <u>56.7</u> |

<u>Code & Math</u>

| Model | HumanEval pass@1 | GSM8K maj@8 |
|---|---|---|
| Mistral 7B Base | 26.8 | 32.0 |
| Llama 3.1 8B Base | <u>37.8</u> | 42.2 |
| Ministral 8B Base | 34.8 | <u>64.5</u> |
| Gemma 2 2B | 20.1 | 35.5 |
| Llama 3.2 3B | 14.6 | 33.5 |
| Ministral 3B | <u>34.2</u> | <u>50.9</u> |

<u>Multilingual</u>

| Model | French MMLU | German MMLU | Spanish MMLU |
|---|---|---|---|
| Mistral 7B Base | 50.6 | 49.6 | 51.4 |
| Llama 3.1 8B Base | 50.8 | 52.8 | 54.6 |
| Ministral 8B Base | <u>57.5</u> | <u>57.4</u> | <u>59.6</u> |
| Gemma 2 2B Base | 41.0 | 40.1 | 41.7 |
| Llama 3.2 3B Base | 42.3 | 42.2 | 43.1 |
| Ministral 3B Base | <u>49.1</u> | <u>48.3</u> | <u>49.5</u> |

Instruct Models

<u>Chat/Arena (gpt-4o judge)</u>

| Model | MTBench | Arena Hard | Wild bench |
|---|---|---|---|
| Mistral 7B Instruct v0.3 | 6.7 | 44.3 | 33.1 |
| Llama 3.1 8B Instruct | 7.5 | 62.4 | 37.0 |
| Gemma 2 9B Instruct | 7.6 | 68.7 | <u>43.8</u> |
| Ministral 8B Instruct | <u>8.3</u> | <u>70.9</u> | 41.3 |
| Gemma 2 2B Instruct | 7.5 | 51.7 | 32.5 |
| Llama 3.2 3B Instruct | 7.2 | 46.0 | 27.2 |
| Ministral 3B Instruct | <u>8.1</u> | <u>64.3</u> | <u>36.3</u> |

<u>Code & Math</u>

| Model | MBPP pass@1 | HumanEval pass@1 | Math maj@1 |
|---|---|---|---|
| Mistral 7B Instruct v0.3 | 50.2 | 38.4 | 13.2 |
| Gemma 2 9B Instruct | 68.5 | 67.7 | 47.4 |
| Llama 3.1 8B Instruct | 69.7 | 67.1 | 49.3 |
| Ministral 8B Instruct | <u>70.0</u> | <u>76.8</u> | <u>54.5</u> |
| Gemma 2 2B Instruct | 54.5 | 42.7 | 22.8 |
| Llama 3.2 3B Instruct | 64.6 | 61.0 | 38.4 |
| Ministral 3B Instruct | <u>67.7</u> | <u>77.4</u> | <u>51.7</u> |

<u>Function calling</u>

| Model | Internal bench |
|---|---|
| Mistral 7B Instruct v0.3 | 6.9 |
| Llama 3.1 8B Instruct | N/A |
| Gemma 2 9B Instruct | N/A |
| Ministral 8B Instruct | <u>31.6</u> |
| Gemma 2 2B Instruct | N/A |
| Llama 3.2 3B Instruct | N/A |
| Ministral 3B Instruct | <u>28.4</u> |

Usage Examples

vLLM (recommended)

We recommend using this model with the vLLM library to implement production-ready inference pipelines.

[!IMPORTANT] Currently vLLM is capped at a 32k context size because interleaved sliding-window attention kernels for paged attention are not yet implemented in vLLM. Support is being worked on, and this model card will be updated as soon as it lands. To take advantage of the full 128k context size, we recommend Mistral Inference.

Installation

Make sure you install vLLM >= v0.6.4:

pip install --upgrade vllm

Also make sure you have mistral_common >= 1.4.4 installed:

pip install --upgrade mistral_common

You can also make use of a ready-to-go docker image.

Offline

from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "mistralai/Ministral-8B-Instruct-2410"

sampling_params = SamplingParams(max_tokens=8192)

# Note: running Ministral 8B on a single GPU requires 24 GB of GPU RAM.
# To divide the GPU requirement over multiple devices, add e.g. `tensor_parallel_size=2`.
llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")

prompt = "Do we need to think for 10 seconds to find the answer of 1 + 1?"

messages = [
    {
        "role": "user",
        "content": prompt
    },
]

outputs = llm.chat(messages, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
# You don't need to think for 10 seconds to find the answer to 1 + 1. The answer is 2,
# and you can easily add these two numbers in your mind very quickly without any delay.

Server

You can also use Ministral-8B in a server/client setting.

  1. Spin up a server:
vllm serve mistralai/Ministral-8B-Instruct-2410 --tokenizer_mode mistral --config_format mistral --load_format mistral
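Once the server is up, any OpenAI-compatible client can query it. The sketch below uses only the standard library; the `/v1/chat/completions` route and port 8000 are vLLM's defaults, and the request body is a standard OpenAI-style chat payload.

```python
import json
import urllib.request

# Standard OpenAI-style chat payload against vLLM's default endpoint.
payload = {
    "model": "mistralai/Ministral-8B-Instruct-2410",
    "messages": [{"role": "user", "content": "What is 1 + 1?"}],
    "max_tokens": 128,
}
request = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```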

...

Quantizations & VRAM

| Quantization | Bits per weight | VRAM required | Quality |
|---|---|---|---|
| Q4_K_M | 4.5 bpw | 5.0 GB | 94% |
| Q6_K | 6.5 bpw | 7.0 GB | 97% |
| Q8_0 | 8 bpw | 8.5 GB | 100% |
| FP16 | 16 bpw | 16.5 GB | 100% |
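The VRAM figures above roughly follow a back-of-the-envelope estimate: weight storage at the stated bits per weight, plus a fixed overhead. The 0.5 GB overhead constant below is our assumption, chosen to match the table rather than a published number, and real usage also grows with context length via the KV cache.

```python
PARAMS = 8_019_808_256  # from the architecture table

def estimate_vram_gb(bits_per_weight, overhead_gb=0.5):
    """Rough VRAM estimate: weight bytes plus an assumed fixed overhead.

    overhead_gb=0.5 is a constant chosen to reproduce the table above;
    it is not a published figure, and KV-cache memory is not included.
    """
    weight_gb = PARAMS * bits_per_weight / 8 / 1e9
    return round(weight_gb + overhead_gb, 1)

for name, bpw in [("Q4_K_M", 4.5), ("Q6_K", 6.5), ("Q8_0", 8), ("FP16", 16)]:
    print(name, estimate_vram_gb(bpw))
# Q4_K_M 5.0
# Q6_K 7.0
# Q8_0 8.5
# FP16 16.5
```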

Benchmarks (8)

| Benchmark | Score |
|---|---|
| IFEval | 68.0 |
| HumanEval | 62.0 |
| BBH | 60.0 |
| MMLU-PRO | 42.0 |
| BigCodeBench | 19.5 |
| GPQA | 10.4 |
| MATH | 6.9 |
| MUSR | 5.6 |

GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.

| GPU | VRAM | Bandwidth | Price |
|---|---|---|---|
| NVIDIA Tesla K20c | 5 GB | 208 GB/s | |
| NVIDIA Tesla K20m | 5 GB | 208 GB/s | |
| NVIDIA Tesla K20s | 5 GB | 208 GB/s | |
| NVIDIA GeForce GTX 1060 5 GB | 5 GB | 160 GB/s | |
| NVIDIA P102-100 | 5 GB | 440 GB/s | |
| NVIDIA RTX 3050 6GB | 6 GB | 168 GB/s | $169 |
| Intel Arc A380 | 6 GB | 186 GB/s | $129 |
| NVIDIA RTX 2060 6GB | 6 GB | 336 GB/s | $150 |
| NVIDIA GTX 1660 SUPER | 6 GB | 336 GB/s | $150 |
| NVIDIA GTX 1660 Ti | 6 GB | 288 GB/s | $140 |
| NVIDIA GTX 1060 6GB | 6 GB | 192 GB/s | $80 |
| NVIDIA Tesla C2070 | 6 GB | 143 GB/s | |
| NVIDIA Tesla C2075 | 6 GB | 150 GB/s | |
| NVIDIA Tesla C2090 | 6 GB | 177 GB/s | |
| NVIDIA Tesla M2070 | 6 GB | 150 GB/s | |
| NVIDIA Tesla M2070-Q | 6 GB | 150 GB/s | |
| NVIDIA Tesla M2075 | 6 GB | 150 GB/s | |
| NVIDIA Tesla M2090 | 6 GB | 177 GB/s | |
| NVIDIA Tesla X2070 | 6 GB | 177 GB/s | |
| NVIDIA Tesla X2090 | 6 GB | 177 GB/s | |
| NVIDIA Tesla K20X | 6 GB | 250 GB/s | |
| NVIDIA Tesla K20Xm | 6 GB | 250 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB 9Gbps | 6 GB | 217 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB GDDR5X | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB GP104 | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB Rev. 2 | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1660 | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1660 SUPER | 6 GB | 336 GB/s | |
| NVIDIA GeForce GTX 1660 Ti | 6 GB | 288 GB/s | |
