MiMo 7B

Xiaomi · Dense · chat · coding · reasoning · math

Parameters: 7B
Context length: 32K
Benchmarks: 4
Quantizations: 4
HF downloads: 80K
Architecture: Dense
Released: 2025-05-12
Layers: 36
KV Heads: 8
Head Dim: 128
Family: mimo
<div align="center"> <picture> <source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)"> <img src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo.png?raw=true" width="60%" alt="Xiaomi-MiMo" /> </picture> </div> <h3 align="center"> <b> <span>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span> <br/> Unlocking the Reasoning Potential of Language Model<br/>From Pretraining to Posttraining <br/> <span>━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span> <br/> </b> </h3> <br/> <div align="center" style="line-height: 1;"> | <a href="https://huggingface.co/XiaomiMiMo" target="_blank">🤗 HuggingFace</a> &nbsp;| <a href="https://www.modelscope.cn/organization/XiaomiMiMo" target="_blank">🤖️ ModelScope</a> &nbsp;| <a href="https://arxiv.org/abs/2505.07608" target="_blank">📔 Technical Report</a> &nbsp;| <br/> </div> <br/>

Updates

[2025.05.30] We scaled the SFT dataset from approximately 500K to 6M instances and continuously expanded the RL training context window from 32K to 48K. As a result, the performance of MiMo-7B-RL-0530 on AIME24 improved continuously, eventually surpassing that of DeepSeek R1 (79.8).

<table> <thead> <tr> <th>Benchmark</th> <th>MiMo-7B-RL</th> <th>MiMo-7B-RL-0530</th> </tr> </thead> <tbody> <tr> <td colspan="3"><strong>Mathematics</strong></td> <td rowspan="11"><img width="80%" src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/length.jpg?raw=true"></td> </tr> <tr><td>MATH500<br/>(Pass@1)</td><td>95.8</td><td>97.2</td></tr> <tr><td>AIME 2024<br/>(Pass@1)</td><td>68.2</td><td>80.1</td></tr> <tr><td>AIME 2025<br/>(Pass@1)</td><td>55.4</td><td>70.2</td></tr> <tr><td colspan="3"><strong>Code</strong></td></tr> <tr><td>LiveCodeBench v5<br/>(Pass@1)</td><td>57.8</td><td>60.9</td></tr> <tr><td>LiveCodeBench v6<br/>(Pass@1)</td><td>49.3</td><td>52.2</td></tr> <tr><td colspan="3"><strong>STEM</strong></td></tr> <tr><td>GPQA-Diamond<br/>(Pass@1)</td><td>54.4</td><td>60.6</td></tr> <tr><td colspan="3"><strong>General</strong></td></tr> <tr><td>Alignbench1.1<br/>(Evaluated by GPT4.1)</td><td>6.9</td><td>7.4</td></tr> </tbody> </table>

I. Introduction

Currently, most successful RL work, including open-source research, relies on relatively large base models, e.g., 32B models, particularly for enhancing code reasoning capabilities. Moreover, it was widely considered challenging to achieve uniform and simultaneous improvements in both mathematical and code capabilities within a small model. Nonetheless, we believe that the effectiveness of an RL-trained reasoning model relies on the inherent reasoning potential of the base model. To fully unlock the reasoning potential of language models, efforts must focus not only on post-training but also on pre-training strategies tailored to reasoning.

In this work, we present MiMo-7B, a series of models trained from scratch and born for reasoning tasks. Our RL experiments from MiMo-7B-Base show that our model possesses extraordinary reasoning potential, even surpassing much larger 32B models. Additionally, we perform RL training on a cold-started SFT model, resulting in MiMo-7B-RL, which demonstrates superior performance on both mathematics and code reasoning tasks, matching the performance of OpenAI o1-mini.

<p align="center"> <img width="80%" src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/curve.png?raw=true"> </p>

We open-source the MiMo-7B series, including checkpoints of the base model, the SFT model, the RL model trained from the base model, and the RL model trained from the SFT model. We believe this report, along with the models, will provide valuable insights into developing powerful reasoning LLMs that benefit the broader community.

🌟 Highlights

  • Pre-Training: Base Model Born for Reasoning

    • We optimize the data preprocessing pipeline, enhancing text extraction toolkits and applying multi-dimensional data filtering to increase reasoning pattern density in pre-training data. We also employ multiple strategies to generate massive diverse synthetic reasoning data.
    • We adopt a three-stage data mixture strategy for pre-training. Overall, MiMo-7B-Base is pre-trained on approximately 25 trillion tokens.
    • We incorporate Multiple-Token Prediction (MTP) as an additional training objective, which enhances model performance and accelerates inference.
  • Post-Training Recipe: Pioneering Reasoning Model

    • We curate 130K mathematics and code problems as RL training data, which can be verified by rule-based verifiers. Each problem undergoes careful cleaning and difficulty assessment to ensure quality. We employ only rule-based accuracy rewards to avoid potential reward hacking.
    • To mitigate the sparse reward issue for challenging code problems, we introduce a test difficulty driven code reward. By assigning fine-grained scores for test cases with varying difficulty levels, the policy can be more effectively optimized via dense reward signal.
    • We implement a data re-sampling strategy for easy problems to enhance rollout sampling efficiency and stabilize policy updates, particularly in the later phases of RL training.
  • RL Infrastructure

    • We develop a Seamless Rollout Engine to accelerate RL training and validation. Our design integrates continuous rollout, asynchronous reward computation, and early termination to minimize GPU idle time, achieving $2.29\times$ faster training and $1.96\times$ faster validation.
    • We support MTP in vLLM and enhance the robustness of the inference engine in the RL system.
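As an illustration of the test-difficulty-driven code reward above, the sketch below assigns partial credit weighted by per-test difficulty. The function name and weighting scheme are assumptions for illustration, not the paper's exact formulation.

```python
def difficulty_driven_reward(passed, weights):
    """Dense reward for one code rollout (illustrative sketch).

    passed:  list of booleans, one per test case (did it pass?).
    weights: per-test difficulty weights (harder tests weigh more).

    Unlike an all-or-nothing accuracy reward, partially correct
    solutions still earn signal, which densifies the reward on
    challenging problems.
    """
    total = sum(weights)
    earned = sum(w for ok, w in zip(passed, weights) if ok)
    return earned / total if total else 0.0

# Passing only the two easy tests still earns 40% of the reward.
partial = difficulty_driven_reward([True, True, False], [1.0, 1.0, 3.0])
```

The point of the fine-grained scores is that a policy which solves only the easy test cases of a hard problem still receives a gradient signal, instead of a flat zero.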

II. Model Details

The MTP layers of MiMo-7B are tuned during pre-training and SFT and frozen during RL. With one MTP layer for speculative decoding, the acceptance rate is approximately 90%.
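The ~90% acceptance rate refers to how often the target model agrees with the MTP head's draft during speculative decoding. A greedy one-draft-token step can be sketched as follows; the function names and callable interface are illustrative stand-ins, not the actual vLLM/SGLang implementation.

```python
def speculative_step(target_next, draft_next, tokens):
    """One greedy speculative-decoding step with a single draft token.

    target_next / draft_next: callables mapping a token list to the next
    token id (stand-ins for the main model and the MTP head).
    Returns (emitted_tokens, accepted).
    """
    draft = draft_next(tokens)        # cheap proposal from the MTP head
    verified = target_next(tokens)    # main model's own next token
    if draft == verified:
        # Accepted: the verification pass also yields the token after
        # the draft, so two tokens are emitted per target-model step.
        return [draft, target_next(tokens + [draft])], True
    return [verified], False          # rejected: keep the target's token
```

With a high acceptance rate, most steps emit two tokens instead of one, which is where the inference speedup comes from.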

<p align="center"> <img width="80%" src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/architecture.png?raw=true"> </p>

Models are available at https://huggingface.co/XiaomiMiMo and https://www.modelscope.cn/organization/XiaomiMiMo

| Model | Description | Download (HuggingFace) | Download (ModelScope) |
| --- | --- | --- | --- |
| MiMo-7B-Base | Base model with extraordinary reasoning potential | 🤗 XiaomiMiMo/MiMo-7B-Base | 🤖️ XiaomiMiMo/MiMo-7B-Base |
| MiMo-7B-RL-Zero | RL model trained from base model | 🤗 XiaomiMiMo/MiMo-7B-RL-Zero | 🤖️ XiaomiMiMo/MiMo-7B-RL-Zero |
| MiMo-7B-SFT | SFT model trained from base model | 🤗 XiaomiMiMo/MiMo-7B-SFT | 🤖️ XiaomiMiMo/MiMo-7B-SFT |
| MiMo-7B-RL | RL model trained from SFT model, superior performance matching OpenAI o1-mini | 🤗 XiaomiMiMo/MiMo-7B-RL | 🤖️ XiaomiMiMo/MiMo-7B-RL |

III. Evaluation Results

| Benchmark | GPT-4o-0513 | Claude-3.5-Sonnet-1022 | OpenAI o1-mini | QwQ-32B-Preview | R1-Distill-Qwen-14B | R1-Distill-Qwen-7B | MiMo-7B-RL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **General** | | | | | | | |
| GPQA Diamond (Pass@1) | 49.9 | 65.0 | 60.0 | 54.5 | 59.1 | 49.1 | 54.4 |
| SuperGPQA (Pass@1) | 42.4 | 48.2 | 45.2 | 43.6 | 40.6 | 28.9 | 40.5 |
| DROP (3-shot F1) | 83.7 | 88.3 | 83.9 | 71.2 | 85.5 | 77.0 | 78.7 |
| MMLU-Pro (EM) | 72.6 | 78.0 | 80.3 | 52.0 | 68.8 | 53.5 | 58.6 |
| IF-Eval (Prompt Strict) | 84.3 | 86.5 | 84.8 | 40.4 | 78.3 | 60.5 | 61.0 |
| **Mathematics** | | | | | | | |
| MATH-500 (Pass@1) | 74.6 | 78.3 | 90.0 | 90.6 | 93.9 | 92.8 | 95.8 |
| AIME 2024 (Pass@1) | 9.3 | 16.0 | 63.6 | 50.0 | 69.7 | 55.5 | 68.2 |
| AIME 2025 (Pass@1) | 11.6 | 7.4 | 50.7 | 32.4 | 48.2 | 38.8 | 55.4 |
| **Code** | | | | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 38.9 | 53.8 | 41.9 | 53.1 | 37.6 | 57.8 |
| LiveCodeBench v6 (Pass@1) | 30.9 | 37.2 | 46.8 | 39.1 | 31.9 | 23.9 | 49.3 |

MiMo-7B series

| Benchmark | MiMo-7B-Base | MiMo-7B-RL-Zero | MiMo-7B-SFT | MiMo-7B-RL |
| --- | --- | --- | --- | --- |
| **Mathematics** | | | | |
| MATH500 (Pass@1) | 37.4 | 93.6 | 93.0 | 95.8 |
| AIME 2024 (Pass@1) | 32.9 | 56.4 | 58.7 | 68.2 |
| AIME 2025 (Pass@1) | 24.3 | 46.3 | 44.3 | 55.4 |
| **Code** | | | | |
| LiveCodeBench v5 (Pass@1) | 32.9 | 49.1 | 52.3 | 57.8 |
| LiveCodeBench v6 (Pass@1) | 29.1 | 42.9 | 45.5 | 49.3 |

> [!IMPORTANT]
> The evaluations are conducted with `temperature=0.6`.
>
> AIME24 and AIME25 scores are averaged over 32 repetitions. LiveCodeBench v5 (20240801-20250201), LiveCodeBench v6 (20250201-20250501), GPQA-Diamond and IF-Eval scores are averaged over 8 repetitions. MATH500 and SuperGPQA use a single run.
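The averaging protocol above amounts to computing Pass@1 per run and averaging across repetitions; a minimal sketch:

```python
def mean_pass_at_1(runs):
    """Average Pass@1 (in %) over repeated evaluation runs.

    runs: one list per repetition, each containing a boolean per
    problem (did the single sampled answer solve it?).
    """
    per_run = [100.0 * sum(run) / len(run) for run in runs]
    return sum(per_run) / len(per_run)

# Two repetitions over two problems: 50% and 100% -> reported 75.0.
score = mean_pass_at_1([[True, False], [True, True]])
```

Averaging many repetitions matters for small benchmarks like AIME (30 problems), where a single run is noisy.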

IV. Deployment

SGLang Inference

Thanks to the SGLang team's support for the MiMo model and MTP, MiMo is supported in SGLang mainline.

Example Script

# Install the latest SGLang from the main branch
python3 -m uv pip install "sglang[all] @ git+https://github.com/sgl-project/sglang.git/@main#egg=sglang&subdirectory=python"

# Launch SGLang Server
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-RL --host 0.0.0.0 --trust-remote-code

# Launch MTP Server
python3 -m sglang.launch_server --model-path XiaomiMiMo/MiMo-7B-RL --trust-remote-code \
--speculative-algorithm EAGLE --speculative-num-steps 1 --speculative-eagle-topk 1 \
--speculative-num-draft-tokens 2  --mem-fraction 0.5
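Once launched, the server exposes an OpenAI-compatible chat endpoint. The sketch below builds a request for it using only the standard library; the port (30000, SGLang's default) is an assumption, and the empty system prompt follows the recommendation for MiMo.

```python
import json
import urllib.request

def build_request(prompt, url="http://localhost:30000/v1/chat/completions"):
    """Build a chat-completion request for the SGLang server.

    Assumes the default port 30000; adjust the URL if you pass --port.
    """
    payload = {
        "model": "XiaomiMiMo/MiMo-7B-RL",
        "messages": [
            {"role": "system", "content": ""},  # empty system prompt
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.6,  # matches the evaluation setting
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("Write an essay about the importance of higher education.")
# With a running server: json.load(urllib.request.urlopen(req))
```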

Detailed usage can be found in the SGLang documentation.

vLLM inference

  1. [Recommended] We officially support inference with MiMo-MTP using our fork of vLLM.

Example script

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)

conversation = [
    {
        "role": "system",
        "content": ""
    },
    {
        "role": "user",
        "content": "Write an essay about the importance of higher education.",
    },
]

outputs = llm.chat(conversation,
                   sampling_params=sampling_params,
                   use_tqdm=False)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

print("=" * 80)
  2. Alternatively, you can register a vLLM loader for MiMo without loading the MTP parameters.

You can copy registry/register_mimo_in_vllm.py to your working directory and import it with

import register_mimo_in_vllm

from vllm import LLM, SamplingParams

model_path = "/path/to/MiMo"
llm = LLM(
    model=model_path,
    trust_remote_code=True,
    # num_speculative_tokens=1,
    disable_log_stats=False
)
sampling_params = SamplingParams(temperature=0.6)

HuggingFace inference

Example script

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(["Today is"], return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output.tolist()[0]))

Recommended environment and prompts

  • We recommend using our fork of vLLM, which is based on vLLM 0.7.3.
  • We recommend using an empty system prompt.

We haven't verified MiMo with other inference engines and welcome contributions based on the model definition in the HuggingFace repo 💻.

V. Citation

@misc{coreteam2025mimounlockingreasoningpotential,
      title={MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining}, 
      author={LLM-Core-Team Xiaomi},
      year={2025},
      eprint={2505.07608},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.07608}, 
}

VI. Contact

Please contact us at mimo@xiaomi.com or open an issue if you have any questions.

Quantizations & VRAM

| Quantization | Bits per weight | VRAM required | Quality |
| --- | --- | --- | --- |
| Q4_K_M | 4.5 bpw | 4.4 GB | 94% |
| Q6_K | 6.5 bpw | 6.2 GB | 97% |
| Q8_0 | 8 bpw | 7.5 GB | 100% |
| FP16 | 16 bpw | 14.5 GB | 100% |
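The file sizes above follow roughly from parameter count times bits per weight. A back-of-the-envelope estimator (an illustration only; real GGUF files add metadata and keep some tensors at higher precision, so listed sizes differ):

```python
def quantized_size_gb(params_billion, bits_per_weight):
    """Approximate quantized weight size in GiB (2**30 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# 7B parameters at 4.5 bpw (Q4_K_M) is ~3.7 GiB of raw weights; the
# listed 4.4 GB also covers higher-precision tensors and overhead.
size = quantized_size_gb(7.0, 4.5)
```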

Benchmarks (4)

MATH: 95.8
IFEval: 61.0
MMLU-PRO: 58.6
GPQA: 54.4

GPUs that can run this model

At Q4_K_M quantization, sorted by minimum VRAM.

| GPU | Vendor | VRAM | Memory bandwidth | Price |
| --- | --- | --- | --- | --- |
| NVIDIA Tesla K20c | NVIDIA | 5 GB | 208 GB/s | |
| NVIDIA Tesla K20m | NVIDIA | 5 GB | 208 GB/s | |
| NVIDIA Tesla K20s | NVIDIA | 5 GB | 208 GB/s | |
| NVIDIA GeForce GTX 1060 5 GB | NVIDIA | 5 GB | 160 GB/s | |
| NVIDIA P102-100 | NVIDIA | 5 GB | 440 GB/s | |
| NVIDIA RTX 3050 6GB | NVIDIA | 6 GB | 168 GB/s | $169 |
| Intel Arc A380 | Intel | 6 GB | 186 GB/s | $129 |
| NVIDIA RTX 2060 6GB | NVIDIA | 6 GB | 336 GB/s | $150 |
| NVIDIA GTX 1660 SUPER | NVIDIA | 6 GB | 336 GB/s | $150 |
| NVIDIA GTX 1660 Ti | NVIDIA | 6 GB | 288 GB/s | $140 |
| NVIDIA GTX 1060 6GB | NVIDIA | 6 GB | 192 GB/s | $80 |
| NVIDIA Tesla C2070 | NVIDIA | 6 GB | 143 GB/s | |
| NVIDIA Tesla C2075 | NVIDIA | 6 GB | 150 GB/s | |
| NVIDIA Tesla C2090 | NVIDIA | 6 GB | 177 GB/s | |
| NVIDIA Tesla M2070 | NVIDIA | 6 GB | 150 GB/s | |
| NVIDIA Tesla M2070-Q | NVIDIA | 6 GB | 150 GB/s | |
| NVIDIA Tesla M2075 | NVIDIA | 6 GB | 150 GB/s | |
| NVIDIA Tesla M2090 | NVIDIA | 6 GB | 177 GB/s | |
| NVIDIA Tesla X2070 | NVIDIA | 6 GB | 177 GB/s | |
| NVIDIA Tesla X2090 | NVIDIA | 6 GB | 177 GB/s | |
| NVIDIA Tesla K20X | NVIDIA | 6 GB | 250 GB/s | |
| NVIDIA Tesla K20Xm | NVIDIA | 6 GB | 250 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB | NVIDIA | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB 9Gbps | NVIDIA | 6 GB | 217 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB GDDR5X | NVIDIA | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB GP104 | NVIDIA | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1060 6 GB Rev. 2 | NVIDIA | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1660 | NVIDIA | 6 GB | 192 GB/s | |
| NVIDIA GeForce GTX 1660 SUPER | NVIDIA | 6 GB | 336 GB/s | |
| NVIDIA GeForce GTX 1660 Ti | NVIDIA | 6 GB | 288 GB/s | |
