Mistral AI / Dense

Pixtral Large 124B

Capabilities: chat, vision, reasoning, thinking, tool use

  • Parameters: 124B
  • Context length: 128K
  • Benchmarks: 16
  • Quantizations: 4
  • HF downloads: 20K
  • Architecture: Dense
  • Released: 2024-11-18
  • Layers: 88
  • KV Heads: 8
  • Head Dim: 128
  • Family: mistral

Model Card for Pixtral-Large-Instruct-2411

Pixtral-Large-Instruct-2411 is a 124B-parameter multimodal model built on top of Mistral Large 2, i.e., Mistral-Large-Instruct-2407. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding. In particular, the model understands documents, charts, and natural images, while maintaining the leading text-only performance of Mistral Large 2.

For more details about this model please refer to the Pixtral Large blog post and the Pixtral 12B blog post.

[!IMPORTANT] ❗ The Transformers implementation is not yet working (see here); please use the vLLM implementation as shown below.

Key features

  • Frontier-class multimodal performance
  • State-of-the-art on MathVista, DocVQA, VQAv2
  • Extends Mistral Large 2 without compromising text performance
  • 123B multimodal decoder, 1B parameter vision encoder
  • 128K context window: fits a minimum of 30 high-resolution images
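The context-window claim above implies a per-image token budget; the following is illustrative arithmetic only, not the model's actual image tokenizer math:

```python
# Back-of-envelope token budget implied by the key features above:
# a 128K context that fits at least 30 high-resolution images leaves
# roughly this many tokens per image (illustrative arithmetic only).
context_tokens = 131072  # 128K
min_images = 30
per_image_budget = context_tokens // min_images
print(per_image_budget)  # 4369
```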

System Prompt Handling

We appreciate the feedback received from our community regarding our system prompt handling.
In response, we have implemented stronger support for system prompts.
To achieve optimal results, we recommend always including a system prompt that clearly outlines the bot's purpose, even if it is minimal.

Basic Instruct Template (V7)

<s>[SYSTEM_PROMPT] <system prompt>[/SYSTEM_PROMPT][INST] <user message>[/INST] <assistant response></s>[INST] <user message>[/INST]

Be careful with subtly missing or trailing whitespace!

Please make sure to use mistral-common as the source of truth for tokenization.
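For illustration only, the raw V7 template above can be assembled with a small helper like the following. This is a sketch of the string layout, not a substitute for mistral-common, which handles tokenization and edge cases correctly:

```python
def render_v7(system_prompt: str, turns: list) -> str:
    """Assemble the raw V7 instruct string shown in the template above.

    turns is a list of (user_message, assistant_response) pairs; the
    final turn uses None for the pending assistant response.
    """
    out = "<s>"
    if system_prompt:
        out += f"[SYSTEM_PROMPT] {system_prompt}[/SYSTEM_PROMPT]"
    for user, assistant in turns:
        out += f"[INST] {user}[/INST]"
        if assistant is not None:
            out += f" {assistant}</s>"
    return out

print(render_v7("Be concise.", [("Hi!", "Hello!"), ("What is 2+2?", None)]))
```

Note how the whitespace sits inside the brackets' neighborhood exactly as in the template; this is where subtle mismatches creep in.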

Metrics

| Model | MathVista (CoT) | MMMU (CoT) | ChartQA (CoT) | DocVQA (ANLS) | VQAv2 (VQA Match) | AI2D (BBox) | MM MT-Bench |
|---|---|---|---|---|---|---|---|
| Pixtral Large (124B) | <u>69.4</u> | 64.0 | 88.1 | <u>93.3</u> | <u>80.9</u> | 93.8 | <u>7.4</u> |
| Gemini-1.5 Pro (measured) | 67.8 | 66.3 | 83.8 | 92.3 | 70.6 | <u>94.6</u> | 6.8 |
| GPT-4o (measured) | 65.4 | <u>68.6</u> | 85.2 | 88.5 | 76.4 | 93.2 | 6.7 |
| Claude-3.5 Sonnet (measured) | 67.1 | 68.4 | <u>89.1</u> | 88.6 | 69.5 | 76.9 | 7.3 |
| Llama-3.2 90B (measured) | 49.1 | 53.7 | 70.8 | 85.7 | 67.0 | - | 5.5 |

Specific model versions evaluated: Claude-3.5 Sonnet (new) [Oct 24], Gemini-1.5 Pro (002) [Sep 24], GPT-4o (2024-08-06) [Aug 24].

See mistral-evals for open-source MM MT-Bench evaluation scripts.

Usage

The model can be used with the following frameworks:

vLLM

We recommend using Pixtral-Large-Instruct-2411 with the vLLM library to implement production-ready inference pipelines.

Installation

Make sure you install vLLM >= v0.6.4.post1:

pip install --upgrade vllm

Also make sure you have mistral_common >= 1.5.0 installed:

pip install --upgrade mistral_common

You can also make use of a ready-to-go Docker image available on Docker Hub.

Server (Image)

We recommend using Pixtral-Large-Instruct-2411 in a server/client setting.

  1. Spin up a server:
vllm serve mistralai/Pixtral-Large-Instruct-2411 --config-format mistral --load-format mistral --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8
  2. And ping the client:
import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Pixtral-Large-Instruct-2411"


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Which of the depicted countries has the best food? Which has the second, third, and fourth best? Name the country, its color on the map, and one of its cities that is visible on the map but is not the capital. Make absolutely sure to only name a city that can be seen on the map.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

data = {"model": model, "messages": messages}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
# Determining which country has the "best" food can be subjective and depends on personal preferences. However, based on popular culinary reputations, here are some countries known for their cuisine:

#1. **Italy** (Brown) - Known for its pasta, pizza, and diverse regional dishes.
#   - City: Milan

#2. **France** (Dark Brown) - Renowned for its fine dining, pastries, and wine.
#   - City: Lyon

#3. **Spain** (Yellow) - Famous for tapas, paella, and a variety of seafood dishes.
#   - City: Barcelona

#4. **Greece** (Yellow) - Known for its Mediterranean cuisine, including moussaka, souvlaki, and fresh seafood.
#   - City: Thessaloniki

#These rankings are based on general culinary reputations and can vary widely depending on individual tastes.
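To send a local image instead of a public URL, OpenAI-compatible servers such as vLLM also accept base64 data URLs in the `image_url` field. A minimal sketch (the helper name and MIME-type default here are our own, not part of the API):

```python
import base64
from pathlib import Path

def image_to_data_url(path: str, mime: str = "image/png") -> str:
    # Encode a local image file as a base64 data URL, which
    # OpenAI-compatible servers accept in place of an http(s) URL.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Usage: swap the remote URL in the messages above for a local file.
# {"type": "image_url", "image_url": {"url": image_to_data_url("chart.png")}}
```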

Server (Text-only)

You can also ping the client with a text-only example. The following example shows how the system prompt can be used to make sure the model always knows the current date.

import requests
import json
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

url = "http://<your-server-url>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Pixtral-Large-Instruct-2411"


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Without browsing the web, how many days ago was Mistral founded?"
    },
]

data = {"model": model, "messages": messages}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
# Mistral AI was founded in April 2023. Since the current date is November 18, 2024, we can calculate the number of days between April 2023 and November 18, 2024.

#First, calculate the days from April 2023 to the end of 2023:
#- April: 27 days (30 - 3)
#- May: 31 days
#- June: 30 days
#- July: 31 days
#- August: 31 days
#- September: 30 days
#- October: 31 days
#- November: 30 days
#- December: 31 days

#Total days from April 2023 to December 31, 2023: 27 + 31 + 30 + 31 + 31 + 30 + 31 + 30 + 31 = 272 days

#Next, calculate the days from January 1, 2024, to November 18, 2024:
#- January: 31 days
#- February: 29 days (2024 is a leap year)
#- March: 31 days
#- April: 30 days
#- May: 31 days
#- June: 30 days
#- July: 31 days
#- August: 31 days
#- September: 30 days
#- October: 31 days
#- November: 18 days

#Total days from January 1, 2024, to November 18, 2024: 31 + 29 + 31 + 30 + 31 + 30 + 31 + 31 + 30 + 31 + 18 = 323 days

#Adding the two periods together:
#272 days (from April 2023 to December 2023) + 323 days (from January 2024 to November 18, 2024) = 595 days

#Therefore, Mistral AI was founded 595 days ago from November 18, 2024.
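The model's day count can be checked directly with datetime. Note that the 2023-04-03 founding date below is the one implied by the model's own arithmetic (its "27 days in April" step), not an official date:

```python
from datetime import date

founded = date(2023, 4, 3)   # the date implied by the model's arithmetic above
today = date(2024, 11, 18)
days = (today - founded).days
print(days)  # 595, matching the model's total
```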

Offline Example

from vllm import LLM
from vllm.sampling_params import SamplingParams
from huggingface_hub import hf_hub_download
from datetime import datetime, timedelta

model_name = "mistralai/Pixtral-Large-Instruct-2411"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, 'r') as file:
        system_prompt = file.read()
    today = datetime.today().strftime('%Y-%m-%d')
    yesterday = (datetime.today() - timedelta(days=1)).strftime('%Y-%m-%d')
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)

SYSTEM_PROMPT = load_system_prompt(model_name, "SYSTEM_PROMPT.txt")

image_url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Which of the depicted countries has the best food? Which has the second, third, and fourth best? Name the country, its color on the map, and one of its cities that is visible on the map but is not the capital. Make absolutely sure to only name a city that can be seen on the map.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

sampling_params = SamplingParams(max_tokens=512)

# note that running this model on GPU requires over 300 GB of GPU RAM
llm = LLM(model=model_name, config_format="mistral", load_format="mistral", tokenizer_mode="mistral", tensor_parallel_size=8, limit_mm_per_prompt={"image": 4})

outputs = llm.chat(messages, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)

The Mistral AI Team

Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall

Quantizations & VRAM

| Quantization | Bits per weight | VRAM required | Quality |
|---|---|---|---|
| Q4_K_M | 4.5 bpw | 72.7 GB | 94% |
| Q6_K | 6.5 bpw | 103.7 GB | 97% |
| Q8_0 | 8 bpw | 126.9 GB | 100% |
| FP16 | 16 bpw | 250.9 GB | 100% |
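The VRAM figures above track a simple weights-size estimate, parameters × bits-per-weight / 8, plus runtime overhead. A rough sketch (the listed VRAM numbers exceed these raw-weight sizes because of KV cache, activations, and higher-precision tensors):

```python
def weight_gb(params_billion: float, bpw: float) -> float:
    # Raw weight bytes = parameters * bits-per-weight / 8, in decimal GB.
    return params_billion * 1e9 * bpw / 8 / 1e9

for name, bpw in [("Q4_K_M", 4.5), ("Q6_K", 6.5), ("Q8_0", 8.0), ("FP16", 16.0)]:
    print(name, round(weight_gb(124, bpw), 1), "GB of raw weights")
```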

Benchmarks (16)

  • IFEval: 87.0
  • HumanEval: 82.0
  • MMBench: 79.0
  • MATH-500: 71.4
  • MMMU: 62.7
  • BBH: 52.7
  • MMLU-PRO: 50.7
  • GPQA Diamond: 50.5
  • MATH: 49.5
  • LiveCodeBench: 26.1
  • GPQA: 24.9
  • MUSR: 17.2
  • AA Intelligence: 14.0
  • HLE: 3.6
  • AIME: 2.3
  • AA Math: 2.3

GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.

| GPU | VRAM | Bandwidth | Vendor | Price |
|---|---|---|---|---|
| NVIDIA H100 SXM5 80GB | 80 GB | 3350 GB/s | NVIDIA | $25,000 |
| NVIDIA H100 PCIe 80GB | 80 GB | 2000 GB/s | NVIDIA | $25,000 |
| NVIDIA A100 SXM 80GB | 80 GB | 2039 GB/s | NVIDIA | $10,000 |
| NVIDIA A100 PCIe 80GB | 80 GB | 1935 GB/s | NVIDIA | $10,000 |
| NVIDIA A100 SXM4 80 GB | 80 GB | 2040 GB/s | NVIDIA | - |
| NVIDIA A100 PCIe 80 GB | 80 GB | 1940 GB/s | NVIDIA | - |
| NVIDIA A100X | 80 GB | 2040 GB/s | NVIDIA | - |
| NVIDIA H100 PCIe 80 GB | 80 GB | 2040 GB/s | NVIDIA | - |
| NVIDIA H100 SXM5 80 GB | 80 GB | 3360 GB/s | NVIDIA | - |
| NVIDIA H100 CNX | 80 GB | 2040 GB/s | NVIDIA | - |
| NVIDIA A800 PCIe 80 GB | 80 GB | 1940 GB/s | NVIDIA | - |
| NVIDIA A800 SXM4 80 GB | 80 GB | 2040 GB/s | NVIDIA | - |
| NVIDIA H800 PCIe 80 GB | 80 GB | 2040 GB/s | NVIDIA | - |
| NVIDIA H800 SXM5 | 80 GB | 3360 GB/s | NVIDIA | - |
| NVIDIA RTX 6000D | 84 GB | 1570 GB/s | NVIDIA | - |
| NVIDIA B200 | 90 GB | 4100 GB/s | NVIDIA | - |
| NVIDIA H100 NVL 94 GB | 94 GB | 3940 GB/s | NVIDIA | - |
| NVIDIA H100 SXM5 94 GB | 94 GB | 3360 GB/s | NVIDIA | - |
| RTX Pro 6000 | 96 GB | 1792 GB/s | NVIDIA | $8,565 |
| NVIDIA H100 PCIe 96 GB | 96 GB | 3360 GB/s | NVIDIA | - |
| NVIDIA H100 SXM5 96 GB | 96 GB | 3360 GB/s | NVIDIA | - |
| Intel Data Center GPU Max 1350 | 96 GB | 2460 GB/s | Intel | - |
| NVIDIA RTX PRO 6000 Blackwell Server | 96 GB | 1790 GB/s | NVIDIA | - |
| AMD Instinct MI300A | 120 GB | 5300 GB/s | AMD | $12,000 |
| Apple M4 Max (128GB) | 128 GB | 546 GB/s | Apple | $3,999 |
| AMD Instinct MI250X | 128 GB | 3277 GB/s | AMD | $10,000 |
| Apple M1 Ultra (128GB) | 128 GB | 800 GB/s | Apple | $4,999 |
| Apple M2 Ultra (128GB) | 128 GB | 800 GB/s | Apple | $3,999 |
| AMD Radeon Instinct MI250 | 128 GB | 3280 GB/s | AMD | - |
| AMD Radeon Instinct MI250X | 128 GB | 3280 GB/s | AMD | - |
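The single GPUs listed above cover the Q4_K_M file. For FP16 serving with the --tensor-parallel-size 8 setting shown in the Usage section, weights are sharded roughly evenly across ranks; a rough estimate only, since activations and KV cache add further per-GPU memory:

```python
def per_gpu_weight_gb(total_weight_gb: float, tensor_parallel_size: int) -> float:
    # Weights shard roughly evenly across tensor-parallel ranks;
    # KV cache and activations add further per-GPU memory on top.
    return total_weight_gb / tensor_parallel_size

print(round(per_gpu_weight_gb(250.9, 8), 1))  # ~31.4 GB of FP16 weights per GPU
```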
