Mistral-Small-3.1-24B
Model Card for Mistral-Small-3.1-24B-Instruct-2503
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance.
With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
This model is an instruction-finetuned version of Mistral-Small-3.1-24B-Base-2503.
Mistral Small 3.1 can be deployed locally and is exceptionally "knowledge-dense," fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized.
It is ideal for:
- Fast-response conversational agents.
- Low-latency function calling.
- Subject matter experts via fine-tuning.
- Local inference for hobbyists and organizations handling sensitive data.
- Programming and math reasoning.
- Long document understanding.
- Visual understanding.
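As a rough sanity check on the "fits within a single RTX 4090 once quantized" claim above, here is a back-of-the-envelope estimate of the quantized weight footprint. It ignores activations, the KV cache, and quantization overhead, so real usage is somewhat higher:

```python
# Approximate weight memory for 24B parameters at ~4 bits per weight
# (e.g. a Q4-style quantization).
params = 24e9
bits_per_weight = 4
weight_bytes = params * bits_per_weight / 8
weight_gib = weight_bytes / 2**30

# Roughly 11-12 GiB of weights, leaving headroom on a 24 GB RTX 4090
# for the KV cache and activations.
```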
For enterprises requiring specialized capabilities (increased context, specific modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
Learn more about Mistral Small 3.1 in our blog post.
Key Features
- Vision: Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text.
- Multilingual: Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi.
- Agent-Centric: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
- Advanced Reasoning: State-of-the-art conversational and reasoning capabilities.
- Apache 2.0 License: Open license allowing usage and modification for both commercial and non-commercial purposes.
- Context Window: A 128k context window.
- System Prompt: Maintains strong adherence and support for system prompts.
- Tokenizer: Utilizes a Tekken tokenizer with a 131k vocabulary size.
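To illustrate the native function-calling feature, here is a hypothetical tool definition in the OpenAI-style JSON schema that common serving frameworks accept; the `get_weather` tool and its parameters are made up for this example and are not part of the model card:

```python
import json

# Hypothetical tool definition; the model can emit a JSON tool call against it.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Serialized form, as it would appear in a request's `tools` field.
tools_payload = json.dumps([get_weather_tool])
```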
Benchmark Results
When available, we report numbers previously published by other model providers; otherwise, we re-evaluate them using our own evaluation harness.
Pretrain Evals
| Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA | GPQA Main (5-shot CoT) | MMMU |
|---|---|---|---|---|---|
| Small 3.1 24B Base | 81.01% | 56.03% | 80.50% | 37.50% | 59.27% |
| Gemma 3 27B PT | 78.60% | 52.20% | 81.30% | 24.30% | 56.10% |
Instruction Evals
Text
| Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP | HumanEval | SimpleQA (TotalAcc) |
|---|---|---|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.71% | 88.41% | 10.43% |
| Gemma 3 27B IT | 76.90% | 67.50% | 89.00% | 36.83% | 42.40% | 74.40% | 87.80% | 10.00% |
| GPT4o Mini | 82.00% | 61.70% | 70.20% | 40.20% | 39.39% | 84.82% | 87.20% | 9.50% |
| Claude 3.5 Haiku | 77.60% | 65.00% | 69.20% | 37.05% | 41.60% | 85.60% | 88.10% | 8.02% |
| Cohere Aya-Vision 32B | 72.14% | 47.16% | 41.98% | 34.38% | 33.84% | 70.43% | 62.20% | 7.65% |
Vision
| Model | MMMU | MMMU PRO | Mathvista | ChartQA | DocVQA | AI2D | MM MT Bench |
|---|---|---|---|---|---|---|---|
| Small 3.1 24B Instruct | 64.00% | 49.25% | 68.91% | 86.24% | 94.08% | 93.72% | 7.3 |
| Gemma 3 27B IT | 64.90% | 48.38% | 67.60% | 76.00% | 86.60% | 84.50% | 7 |
| GPT4o Mini | 59.40% | 37.60% | 56.70% | 76.80% | 86.70% | 88.10% | 6.6 |
| Claude 3.5 Haiku | 60.50% | 45.03% | 61.60% | 87.20% | 90.00% | 92.10% | 6.5 |
| Cohere Aya-Vision 32B | 48.20% | 31.50% | 50.10% | 63.04% | 72.40% | 82.57% | 4.1 |
Multilingual Evals
| Model | Average | European | East Asian | Middle Eastern |
|---|---|---|---|---|
| Small 3.1 24B Instruct | 71.18% | 75.30% | 69.17% | 69.08% |
| Gemma 3 27B IT | 70.19% | 74.14% | 65.65% | 70.76% |
| GPT4o Mini | 70.36% | 74.21% | 65.96% | 70.90% |
| Claude 3.5 Haiku | 70.16% | 73.45% | 67.05% | 70.00% |
| Cohere Aya-Vision 32B | 62.15% | 64.70% | 57.61% | 64.12% |
Long Context Evals
| Model | LongBench v2 | RULER 32K | RULER 128K |
|---|---|---|---|
| Small 3.1 24B Instruct | 37.18% | 93.96% | 81.20% |
| Gemma 3 27B IT | 34.59% | 91.10% | 66.00% |
| GPT4o Mini | 29.30% | 90.20% | 65.80% |
| Claude 3.5 Haiku | 35.19% | 92.60% | 91.90% |
Basic Instruct Template (V7-Tekken)
```
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
```
`<system prompt>`, `<user message>` and `<assistant response>` are placeholders.
Please make sure to use mistral-common as the source of truth for the exact prompt format.
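To make the template concrete, here is a minimal sketch that assembles a single-turn prompt string from these pieces using plain string formatting; in practice, mistral-common applies the template for you:

```python
def build_v7_tekken_prompt(system_prompt: str, user_message: str) -> str:
    """Assemble a single-turn V7-Tekken prompt string.

    The assistant's reply is generated after the final [/INST], and the
    model is expected to terminate it with </s>.
    """
    return (
        "<s>"
        f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"
        f"[INST]{user_message}[/INST]"
    )

prompt = build_v7_tekken_prompt("You are a helpful assistant.", "Hello!")
# prompt == "<s>[SYSTEM_PROMPT]You are a helpful assistant.[/SYSTEM_PROMPT][INST]Hello![/INST]"
```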
Usage
The model can be used with the following frameworks:
vllm (recommended): See here
Note 1: We recommend using a relatively low temperature, such as temperature=0.15.
Note 2: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend the following system prompt:
```python
system_prompt = """You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
You power an AI assistant called Le Chat.
Your knowledge base was last updated on 2023-10-01.
The current date is {today}.
...
```
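Since the recommended system prompt contains a `{today}` placeholder, it has to be filled in at request time. A minimal sketch, where the shortened prompt text is just a stand-in for the full prompt above:

```python
from datetime import date

# Shortened stand-in for the full recommended system prompt shown above.
system_prompt = """You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI.
The current date is {today}."""

# Fill the {today} placeholder with the current date in ISO format.
filled = system_prompt.format(today=date.today().isoformat())
```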
Run with Ollama
```shell
ollama run mistral-small:24b
```