Mistral-Small-3.1-24B
Model Card for Mistral-Small-3.1-24B-Instruct-2503
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.
With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
This model is an instruction-finetuned version of Mistral-Small-3.1-24B-Base-2503.
Mistral Small 3.1 can be deployed locally and is exceptionally "knowledge-dense," fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized.
It is ideal for:
- Fast-response conversational agents.
- Low-latency function calling.
- Subject matter experts via fine-tuning.
- Local inference for hobbyists and organizations handling sensitive data.
- Programming and math reasoning.
- Long document understanding.
- Visual understanding.
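As a rough sanity check on the "fits within a single RTX 4090 once quantized" claim above, here is a back-of-the-envelope estimate of the quantized weight footprint. It ignores activations, the KV cache, and quantization overhead, so real usage is somewhat higher:

```python
# Approximate weight memory for 24B parameters at ~4 bits per weight
# (e.g. a Q4-style quantization).
params = 24e9
bits_per_weight = 4
weight_bytes = params * bits_per_weight / 8
weight_gib = weight_bytes / 2**30

# Roughly 11-12 GiB of weights, leaving headroom on a 24 GB RTX 4090
# for the KV cache and activations.
```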
For enterprises requiring specialized capabilities (increased context, specific modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
Learn more about Mistral Small 3.1 in our blog post.
Key Features
- Vision: Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text.
- Multilingual: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi.
- Agent-Centric: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Advanced Reasoning: State-of-the-art conversational and reasoning capabilities.
- Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: A 128k context window.
- System Prompt: Maintains strong adherence and support for system prompts.
- Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.
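To illustrate the native function-calling feature, here is a hypothetical tool definition in the OpenAI-style JSON schema that common serving frameworks accept; the `get_weather` tool and its parameters are made up for this example and are not part of the model card:

```python
import json

# Hypothetical tool definition; the model can emit a JSON tool call against it.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Serialized form, as it would appear in a request's `tools` field.
tools_payload = json.dumps([get_weather_tool])
```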
Benchmark Results
When available, we report numbers previously published by other model providers; otherwise, we re-evaluate them using our own evaluation harness.
Pretrain Evals
| Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA | GPQA Main (5-shot CoT) | MMMU |
|---|---|---|---|---|---|
| Small 3.1 24B Base | 81.01% | 56.03% | 80.50% | 37.50% | 59.27% |
| Gemma 3 27B PT | 78.60% | 52.20% | 81.30% | 24.30% | 56.10% |
Instruction Evals
Text
| Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP | HumanEval | SimpleQA (TotalAcc) |
|---|---|---|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.71% | 88.41% | 10.43% |
| Gemma 3 27B IT | 76.90% | 67.50% | 89.00% | 36.83% | 42.40% | 74.40% | 87.80% | 10.00% |
| GPT4o Mini | 82.00% | 61.70% | 70.20% | 40.20% | 39.39% | 84.82% | 87.20% | 9.50% |
| Claude 3.5 Haiku | 77.60% | 65.00% | 69.20% | 37.05% | 41.60% | 85.60% | 88.10% | 8.02% |
| Cohere Aya-Vision 32B | 72.14% | 47.16% | 41.98% | 34.38% | 33.84% | 70.43% | 62.20% | 7.65% |
Vision
| Model | MMMU | MMMU PRO | Mathvista | ChartQA | DocVQA | AI2D | MM MT Bench |
|---|---|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 64.00% | 49.25% | 68.91% | 86.24% | 94.08% | 93.72% | 7.3 |
| Gemma 3 27B IT | 64.90% | 48.38% | 67.60% | 76.00% | 86.60% | 84.50% | 7 |
| GPT4o Mini | 59.40% | 37.60% | 56.70% | 76.80% | 86.70% | 88.10% | 6.6 |
| Claude 3.5 Haiku | 60.50% | 45.03% | 61.60% | 87.20% | 90.00% | 92.10% | 6.5 |
| Cohere Aya-Vision 32B | 48.20% | 31.50% | 50.10% | 63.04% | 72.40% | 82.57% | 4.1 |
Multilingual Evals
| Model | Average | European | East Asian | Middle Eastern |
|---|---|---|---|---|
| Small 3.1 24B Instruct | 71.18% | 75.30% | 69.17% | 69.08% |
| Gemma 3 27B IT | 70.19% | 74.14% | 65.65% | 70.76% |
| GPT4o Mini | 70.36% | 74.21% | 65.96% | 70.90% |
| Claude 3.5 Haiku | 70.16% | 73.45% | 67.05% | 70.00% |
| Cohere Aya-Vision 32B | 62.15% | 64.70% | 57.61% | 64.12% |
Long Context Evals
| Model | LongBench v2 | RULER 32K | RULER 128K |
|---|---|---|---|
| Small 3.1 24B Instruct | 37.18% | 93.96% | 81.20% |
| Gemma 3 27B IT | 34.59% | 91.10% | 66.00% |
| GPT4o Mini | 29.30% | 90.20% | 65.80% |
| Claude 3.5 Haiku | 35.19% | 92.60% | 91.90% |
Basic Instruct Template (V7-Tekken)
```
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
```
`<system prompt>`, `<user message>` and `<assistant response>` are placeholders.
Please make sure to use mistral-common as the source of truth for the exact prompt format.
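To make the template concrete, here is a minimal sketch that assembles a single-turn prompt string from these pieces using plain string formatting; in practice, mistral-common applies the template for you:

```python
def build_v7_tekken_prompt(system_prompt: str, user_message: str) -> str:
    """Assemble a single-turn V7-Tekken prompt string.

    The assistant's reply is generated after the final [/INST], and the
    model is expected to terminate it with </s>.
    """
    return (
        "<s>"
        f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"
        f"[INST]{user_message}[/INST]"
    )

prompt = build_v7_tekken_prompt("You are a helpful assistant.", "Hello!")
# prompt == "<s>[SYSTEM_PROMPT]You are a helpful assistant.[/SYSTEM_PROMPT][INST]Hello![/INST]"
```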
Usage
The model can be used with the following frameworks:
vllm (recommended): See here
Note 1: We recommend using a relatively low temperature, such as temperature=0.15.
Note 2: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend the following system prompt:
```python
system_prompt = """You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
The current date is {today}.
...
```
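Since the recommended system prompt contains a `{today}` placeholder, it has to be filled in at request time. A minimal sketch, where the shortened prompt text is just a stand-in for the full prompt above:

```python
from datetime import date

# Shortened stand-in for the full recommended system prompt shown above.
system_prompt = """You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI.
The current date is {today}."""

# Fill the {today} placeholder with the current date in ISO format.
filled = system_prompt.format(today=date.today().isoformat())
```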
Run with Ollama
```shell
ollama run mistral-small:24b
```