Mistral-Small-3.1-24B

Tags: chat · coding · reasoning · vision · tool use · thinking
Parameters: 24B
Context length: 128K
Benchmarks: 15
Quantizations: 4
Architecture: Dense
Released: 2025-03-17
Layers: 56
KV Heads: 8
Head Dim: 128
Family: mistral
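The architecture numbers above are enough for a rough KV-cache sizing estimate at the full 128K context. A minimal sketch, assuming a standard transformer KV cache (one K and one V tensor per layer, shaped by KV heads × context length × head dim) stored in FP16:

```python
# Rough KV-cache size estimate from the listed architecture values.
# Assumption: standard GQA KV cache, 2 tensors (K and V) per layer,
# each [kv_heads, context_len, head_dim], FP16 (2 bytes per element).
layers, kv_heads, head_dim = 56, 8, 128
context_len = 128 * 1024          # 128K tokens
bytes_per_elem = 2                # FP16

kv_bytes = 2 * layers * kv_heads * context_len * head_dim * bytes_per_elem
print(f"KV cache at full context: {kv_bytes / 1024**3:.1f} GiB")  # → 28.0 GiB
```

The 8 KV heads (grouped-query attention) are what keep this figure manageable; a full multi-head cache with as many KV heads as query heads would be several times larger.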

Model Card for Mistral-Small-3.1-24B-Instruct-2503

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
This model is an instruction-finetuned version of Mistral-Small-3.1-24B-Base-2503.

Mistral Small 3.1 can be deployed locally and is exceptionally "knowledge-dense," fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized.

It is ideal for:

  • Fast-response conversational agents.
  • Low-latency function calling.
  • Subject matter experts via fine-tuning.
  • Local inference for hobbyists and organizations handling sensitive data.
  • Programming and math reasoning.
  • Long document understanding.
  • Visual understanding.

For enterprises requiring specialized capabilities (increased context, specific modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Mistral Small 3.1 in our blog post.

Key Features

  • Vision: Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text.
  • Multilingual: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi.
  • Agent-Centric: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
  • Advanced Reasoning: State-of-the-art conversational and reasoning capabilities.
  • Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
  • Context Window: A 128k context window.
  • System Prompt: Maintains strong adherence and support for system prompts.
  • Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.

Benchmark Results

When available, we report numbers previously published by other model providers, otherwise we re-evaluate them using our own evaluation harness.

Pretrain Evals

| Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA | GPQA Main (5-shot CoT) | MMMU |
|---|---|---|---|---|---|
| Small 3.1 24B Base | 81.01% | 56.03% | 80.50% | 37.50% | 59.27% |
| Gemma 3 27B PT | 78.60% | 52.20% | 81.30% | 24.30% | 56.10% |

Instruction Evals

Text

| Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP | HumanEval | SimpleQA (TotalAcc) |
|---|---|---|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.71% | 88.41% | 10.43% |
| Gemma 3 27B IT | 76.90% | 67.50% | 89.00% | 36.83% | 42.40% | 74.40% | 87.80% | 10.00% |
| GPT4o Mini | 82.00% | 61.70% | 70.20% | 40.20% | 39.39% | 84.82% | 87.20% | 9.50% |
| Claude 3.5 Haiku | 77.60% | 65.00% | 69.20% | 37.05% | 41.60% | 85.60% | 88.10% | 8.02% |
| Cohere Aya-Vision 32B | 72.14% | 47.16% | 41.98% | 34.38% | 33.84% | 70.43% | 62.20% | 7.65% |

Vision

| Model | MMMU | MMMU PRO | MathVista | ChartQA | DocVQA | AI2D | MM-MT-Bench |
|---|---|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 64.00% | 49.25% | 68.91% | 86.24% | 94.08% | 93.72% | 7.3 |
| Gemma 3 27B IT | 64.90% | 48.38% | 67.60% | 76.00% | 86.60% | 84.50% | 7.0 |
| GPT4o Mini | 59.40% | 37.60% | 56.70% | 76.80% | 86.70% | 88.10% | 6.6 |
| Claude 3.5 Haiku | 60.50% | 45.03% | 61.60% | 87.20% | 90.00% | 92.10% | 6.5 |
| Cohere Aya-Vision 32B | 48.20% | 31.50% | 50.10% | 63.04% | 72.40% | 82.57% | 4.1 |

Multilingual Evals

| Model | Average | European | East Asian | Middle Eastern |
|---|---|---|---|---|
| Small 3.1 24B Instruct | 71.18% | 75.30% | 69.17% | 69.08% |
| Gemma 3 27B IT | 70.19% | 74.14% | 65.65% | 70.76% |
| GPT4o Mini | 70.36% | 74.21% | 65.96% | 70.90% |
| Claude 3.5 Haiku | 70.16% | 73.45% | 67.05% | 70.00% |
| Cohere Aya-Vision 32B | 62.15% | 64.70% | 57.61% | 64.12% |

Long Context Evals

| Model | LongBench v2 | RULER 32K | RULER 128K |
|---|---|---|---|
| Small 3.1 24B Instruct | 37.18% | 93.96% | 81.20% |
| Gemma 3 27B IT | 34.59% | 91.10% | 66.00% |
| GPT4o Mini | 29.30% | 90.20% | 65.80% |
| Claude 3.5 Haiku | 35.19% | 92.60% | 91.90% |

Basic Instruct Template (V7-Tekken)

<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]

<system prompt>, <user message>, and <assistant response> are placeholders.

Please make sure to use mistral-common as the source of truth.

Usage

The model can be used with the following frameworks:

Note 1: We recommend using a relatively low temperature, such as temperature=0.15.

Note 2: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend the following system prompt:

system_prompt = """You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
The current date is {today}.

...
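The `{today}` placeholder in the system prompt is meant to be filled in at request time. A minimal sketch, using a stand-in template line (the full prompt text is truncated above):

```python
from datetime import date

# Stand-in for the (truncated) system prompt above; the real text
# comes from the model card. Only the {today} substitution is shown.
template = "The current date is {today}."
system_prompt = template.format(today=date.today().strftime("%Y-%m-%d"))
print(system_prompt)  # e.g. "The current date is 2025-03-17."
```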

Quantizations & VRAM

| Quantization | Bits per weight | VRAM required | Quality |
|---|---|---|---|
| Q4_K_M | 4.5 bpw | 14.0 GB | 94% |
| Q6_K | 6.5 bpw | 20.0 GB | 97% |
| Q8_0 | 8 bpw | 24.5 GB | 100% |
| FP16 | 16 bpw | 48.5 GB | 100% |
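The VRAM figures roughly follow parameters × bits-per-weight ÷ 8, plus a small runtime overhead. A back-of-envelope check (weights only, ignoring KV cache and framework overhead, which is why each result lands slightly below the listed figure):

```python
# Weight-memory estimate: params * bits_per_weight / 8 bytes.
# Excludes KV cache and runtime overhead, so expect ~0.5 GB less
# than the table above.
params = 24e9
for name, bpw in [("Q4_K_M", 4.5), ("Q6_K", 6.5), ("Q8_0", 8.0), ("FP16", 16.0)]:
    gb = params * bpw / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB weights")
```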

Benchmarks (15)

  • Arena Elo: 1268
  • MATH-500: 71.5
  • IFEval: 62.8
  • GPQA Diamond: 46.2
  • BBH: 40.6
  • BigCodeBench: 36.1
  • MMLU-PRO: 34.4
  • LiveCodeBench: 25.2
  • MATH: 20.4
  • AA Intelligence: 12.7
  • GPQA: 11.1
  • MUSR: 10.2
  • AIME: 4.3
  • AA Math: 4.3
  • HLE: 4.1

Run with Ollama

$ ollama run mistral-small:24b
