Tencent/Mixture of Experts

Hunyuan A13B

chat · coding · reasoning · multilingual · math · Thinking · Tool Use
80B
Parameters (13B active)
256K
Context length
5
Benchmarks
4
Quantizations
70K
HF downloads
Architecture
MoE
Released
2025-06-27
Layers
32
KV Heads
8
Head Dim
128
Family
hunyuan

Welcome to the official repository of Hunyuan-A13B, an innovative and open-source large language model (LLM) built on a fine-grained Mixture-of-Experts (MoE) architecture. Designed for efficiency and scalability, Hunyuan-A13B delivers cutting-edge performance with minimal computational overhead, making it an ideal choice for advanced reasoning and general-purpose applications, especially in resource-constrained environments.

Model Introduction

With the rapid advancement of artificial intelligence technology, large language models (LLMs) have achieved remarkable progress in natural language processing, computer vision, and scientific tasks. However, as model scales continue to expand, optimizing resource consumption while maintaining high performance has become a critical challenge. To address this, we have explored Mixture of Experts (MoE) architectures. The newly introduced Hunyuan-A13B model features a total of 80 billion parameters with 13 billion active parameters. It not only delivers high-performance results but also achieves optimal resource efficiency, successfully balancing computational power and resource utilization.
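The MoE idea described above can be sketched in a few lines: a router scores all experts for each token, but only the top-k experts actually run, so the active parameter count stays a small fraction of the total. The sketch below is illustrative only; the expert count, hidden size, and k are arbitrary toy values, not Hunyuan-A13B's actual configuration.

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Route a token vector x to its top-k experts and mix their outputs.

    Illustrative only: real MoE layers batch tokens, add load-balancing
    losses, and use learned gating; the sizes here are toy values.
    """
    logits = x @ router_w                      # one score per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                   # softmax over selected experts
    # Only the chosen experts run -> "active" params << total params
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
router_w = rng.standard_normal((d, n_experts))
# Each expert is a simple linear map for demonstration purposes.
experts = [(lambda W: (lambda x: x @ W))(rng.standard_normal((d, d)))
           for _ in range(n_experts)]
y = moe_forward(rng.standard_normal(d), router_w, experts, k=2)
print(y.shape)  # (16,)
```

With k=2 of 8 experts, only a quarter of the expert parameters participate in any single forward pass, which is the same efficiency principle behind the 13B-active / 80B-total split.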

Key Features and Advantages

  • Compact yet Powerful: With only 13 billion active parameters (out of a total of 80 billion), the model delivers competitive performance on a wide range of benchmark tasks, rivaling much larger models.
  • Hybrid Reasoning Support: Supports both fast and slow thinking modes, allowing users to flexibly choose according to their needs.
  • Ultra-Long Context Understanding: Natively supports a 256K context window, maintaining stable performance on long-text tasks.
  • Enhanced Agent Capabilities: Optimized for agent tasks, achieving leading results on benchmarks such as BFCL-v3, τ-Bench and C3-Bench.
  • Efficient Inference: Utilizes Grouped Query Attention (GQA) and supports multiple quantization formats, enabling highly efficient inference.
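GQA's memory benefit can be quantified from the architecture numbers listed above (32 layers, 8 KV heads, head dim 128): the KV cache stores only the shared KV heads per layer, not one KV pair per query head. A rough sizing sketch; the fp16 cache dtype and the 32-query-head comparison point are my assumptions, not published figures:

```python
def kv_cache_bytes_per_token(layers, kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache per token: K and V, per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

# Hunyuan-A13B figures from the spec block above; fp16 (2-byte) cache assumed.
gqa = kv_cache_bytes_per_token(layers=32, kv_heads=8, head_dim=128)
print(gqa)          # 131072 bytes = 128 KiB per token

# For comparison: plain multi-head attention with a hypothetical
# 32 query-sized KV heads would need 4x the cache.
mha = kv_cache_bytes_per_token(layers=32, kv_heads=32, head_dim=128)
print(mha // gqa)   # 4
```

At a 256K context, that 4x reduction is the difference between a cache that fits alongside the weights and one that does not.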

Why Choose Hunyuan-A13B?

As a powerful yet computationally efficient large model, Hunyuan-A13B is an ideal choice for researchers and developers seeking high performance under resource constraints. Whether for academic research, cost-effective AI solution development, or innovative application exploration, this model provides a robust foundation for advancement.

 

Related News

  • 2025.6.27 We have open-sourced Hunyuan-A13B-Pretrain, Hunyuan-A13B-Instruct, Hunyuan-A13B-Instruct-FP8, and Hunyuan-A13B-Instruct-GPTQ-Int4 on Hugging Face. In addition, we have released a <a href="report/Hunyuan_A13B_Technical_Report.pdf">technical report</a> and a training and inference operation manual, which provide detailed information about the model's capabilities as well as the procedures for training and inference.

Benchmark

Note: The following benchmarks are evaluated with the TRT-LLM backend on several base models.

| Model      | Hunyuan-Large | Qwen2.5-72B | Qwen3-A22B | Hunyuan-A13B |
|------------|---------------|-------------|------------|--------------|
| MMLU       | 88.40         | 86.10       | 87.81      | 88.17        |
| MMLU-Pro   | 60.20         | 58.10       | 68.18      | 67.23        |
| MMLU-Redux | 87.47         | 83.90       | 87.40      | 87.67        |
| BBH        | 86.30         | 85.80       | 88.87      | 87.56        |
| SuperGPQA  | 38.90         | 36.20       | 44.06      | 41.32        |
| EvalPlus   | 75.69         | 65.93       | 77.60      | 78.64        |
| MultiPL-E  | 59.13         | 60.50       | 65.94      | 69.33        |
| MBPP       | 72.60         | 76.00       | 81.40      | 83.86        |
| CRUX-I     | 57.00         | 57.63       | -          | 70.13        |
| CRUX-O     | 60.63         | 66.20       | 79.00      | 77.00        |
| MATH       | 69.80         | 62.12       | 71.84      | 72.35        |
| CMATH      | 91.30         | 84.80       | -          | 91.17        |
| GSM8k      | 92.80         | 91.50       | 94.39      | 91.83        |
| GPQA       | 25.18         | 45.90       | 47.47      | 49.12        |

Hunyuan-A13B-Instruct has achieved highly competitive performance across multiple benchmarks, particularly in mathematics, science, agent domains, and more. We compared it with several powerful models, and the results are shown below.

| Topic                 | Bench            | OpenAI-o1-1217 | DeepSeek R1 | Qwen3-A22B | Hunyuan-A13B-Instruct |
|-----------------------|------------------|----------------|-------------|------------|-----------------------|
| Mathematics           | AIME 2024        | 74.3           | 79.8        | 85.7       | 87.3                  |
|                       | AIME 2025        | 79.2           | 70          | 81.5       | 76.8                  |
|                       | MATH             | 96.4           | 94.9        | 94.0       | 94.3                  |
| Science               | GPQA-Diamond     | 78             | 71.5        | 71.1       | 71.2                  |
|                       | OlympiadBench    | 83.1           | 82.4        | 85.7       | 82.7                  |
| Coding                | Livecodebench    | 63.9           | 65.9        | 70.7       | 63.9                  |
|                       | Fullstackbench   | 64.6           | 71.6        | 65.6       | 67.8                  |
|                       | ArtifactsBench   | 38.6           | 44.6        | 44.6       | 43                    |
| Reasoning             | BBH              | 80.4           | 83.7        | 88.9       | 89.1                  |
|                       | DROP             | 90.2           | 92.2        | 90.3       | 91.1                  |
|                       | ZebraLogic       | 81             | 78.7        | 80.3       | 84.7                  |
| Instruction Following | IF-Eval          | 91.8           | 88.3        | 83.4       | 84.7                  |
|                       | SysBench         | 82.5           | 77.7        | 74.2       | 76.1                  |
| Text Creation         | LengthCtrl       | 60.1           | 55.9        | 53.3       | 55.4                  |
|                       | InsCtrl          | 74.8           | 69          | 73.7       | 71.9                  |
| NLU                   | ComplexNLU       | 64.7           | 64.5        | 59.8       | 61.2                  |
|                       | Word-Task        | 67.1           | 76.3        | 56.4       | 62.9                  |
| Agent                 | BFCL v3          | 67.8           | 56.9        | 70.8       | 78.3                  |
|                       | τ-Bench          | 60.4           | 43.8        | 44.6       | 54.7                  |
|                       | ComplexFuncBench | 47.6           | 41.1        | 40.6       | 61.2                  |
|                       | C3-Bench         | 58.8           | 55.3        | 51.7       | 63.5                  |

 

Use with transformers

Our model defaults to slow-thinking reasoning. There are two ways to disable CoT reasoning:

  1. Pass "enable_thinking=False" when calling apply_chat_template.
  2. Add "/no_think" before the prompt to force the model not to perform CoT reasoning. Similarly, add "/think" before the prompt to force the model to perform CoT reasoning.

The following code snippet shows how to use the transformers library to load and run the model. It also demonstrates how to enable and disable the reasoning mode, and how to parse the reasoning process along with the final output.

from transformers import AutoModelForCausalLM, AutoTokenizer
import os
import re

model_name_or_path = os.environ['MODEL_PATH']
# model_name_or_path = "tencent/Hunyuan-A13B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto", trust_remote_code=True)  # You may want to use bfloat16 and/or move to GPU here
messages = [
    {"role": "user", "content": "Write a short summary of the benefits of regular exercise"},
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    enable_thinking=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
model_inputs.pop("token_type_ids", None)
outputs = model.generate(**model_inputs, max_new_tokens=4096)


output_text = tokenizer.decode(outputs[0])

think_pattern = r'<think>(.*?)</think>'
think_matches = re.findall(think_pattern, output_text, re.DOTALL)

answer_pattern = r'<answer>(.*?)</answer>'
answer_matches = re.findall(answer_pattern, output_text, re.DOTALL)

think_content = [match.strip() for match in think_matches][0]
answer_content = [match.strip() for match in answer_matches][0]
print(f"thinking_content:{think_content}\n\n")
print(f"answer_content:{answer_content}\n\n")
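Note that the indexing above assumes both a `<think>` and an `<answer>` block are always present; with `enable_thinking=False` or a `/no_think` prefix the thinking block may be empty or absent, and `[0]` would raise an IndexError. A small parsing helper that tolerates either case (a sketch, not part of the official example):

```python
import re

def parse_output(output_text):
    """Split model output into (thinking, answer); either part may be missing."""
    def first_match(tag):
        m = re.search(rf'<{tag}>(.*?)</{tag}>', output_text, re.DOTALL)
        return m.group(1).strip() if m else ""
    return first_match("think"), first_match("answer")

think, answer = parse_output(
    "<think>2 + 2 is basic arithmetic.</think><answer>4</answer>"
)
print(think)   # 2 + 2 is basic arithmetic.
print(answer)  # 4

# With thinking disabled, parsing still succeeds instead of raising:
think, answer = parse_output("<answer>4</answer>")
print((think, answer))  # ('', '4')
```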

Fast and slow thinking switch

This model supports two modes of operation:

...

Quantizations & VRAM

| Quantization | Bits per weight | VRAM required | Quality |
|--------------|-----------------|---------------|---------|
| Q4_K_M       | 4.5 bpw         | 45.5 GB       | 94%     |
| Q6_K         | 6.5 bpw         | 65.5 GB       | 97%     |
| Q8_0         | 8 bpw           | 80.5 GB       | 100%    |
| FP16         | 16 bpw          | 160.5 GB      | 100%    |
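The VRAM figures above follow directly from bits-per-weight arithmetic on the 80B total parameters: although MoE inference activates only 13B parameters per token, all experts must still reside in memory. A quick back-of-envelope check; treating the extra ~0.5 GB in each listed figure as overhead for KV cache and activations is my assumption:

```python
def weights_gb(total_params_b, bpw):
    """Approximate weight storage in GB: params (billions) * bits / 8."""
    return total_params_b * bpw / 8

for name, bpw in [("Q4_K_M", 4.5), ("Q6_K", 6.5), ("Q8_0", 8.0), ("FP16", 16.0)]:
    print(f"{name}: ~{weights_gb(80, bpw):.0f} GB weights")
# Q4_K_M comes out at ~45 GB, matching the 45.5 GB listed once a small
# allowance for KV cache and activations is added.
```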

Benchmarks (5)

MATH: 94.3
BBH: 89.1
MBPP: 83.9
GPQA: 71.2
MMLU-PRO: 67.2

GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.

Apple M3 Max (48GB)
48 GB VRAM • 400 GB/s
APPLE
$2899
Apple M4 Pro (48GB)
48 GB VRAM • 273 GB/s
APPLE
$1799
Apple M4 Max (48GB)
48 GB VRAM • 546 GB/s
APPLE
$2499
NVIDIA L40S 48GB
48 GB VRAM • 864 GB/s
NVIDIA
$7500
NVIDIA L40 48GB
48 GB VRAM • 864 GB/s
NVIDIA
$5500
NVIDIA RTX 6000 Ada 48GB
48 GB VRAM • 960 GB/s
NVIDIA
$6800
NVIDIA A40 48GB
48 GB VRAM • 696 GB/s
NVIDIA
$4650
NVIDIA RTX A6000 48GB
48 GB VRAM • 768 GB/s
NVIDIA
$4650
NVIDIA Quadro RTX 8000
48 GB VRAM • 672 GB/s
NVIDIA
NVIDIA Quadro RTX 8000 Passive
48 GB VRAM • 624 GB/s
NVIDIA
NVIDIA A40 PCIe
48 GB VRAM • 696 GB/s
NVIDIA
NVIDIA RTX 6000 Ada Generation
48 GB VRAM • 960 GB/s
NVIDIA
NVIDIA L20
48 GB VRAM • 864 GB/s
NVIDIA
AMD Radeon PRO W7800 48 GB
48 GB VRAM • 864 GB/s
AMD
AMD Radeon PRO W7900
48 GB VRAM • 864 GB/s
AMD
Intel Data Center GPU Max 1100
48 GB VRAM • 1230 GB/s
INTEL
NVIDIA RTX 5880 Ada Generation
48 GB VRAM • 864 GB/s
NVIDIA
NVIDIA RTX PRO 5000 Blackwell
48 GB VRAM • 1340 GB/s
NVIDIA
AMD Radeon PRO W7900D
48 GB VRAM • 864 GB/s
AMD
Apple M1 Ultra (64GB)
64 GB VRAM • 800 GB/s
APPLE
$2499
Apple M2 Ultra (64GB)
64 GB VRAM • 800 GB/s
APPLE
$2999
Apple M4 Max (64GB)
64 GB VRAM • 546 GB/s
APPLE
$2899
Apple M2 Max (64GB)
64 GB VRAM • 400 GB/s
APPLE
$2299
Apple M3 Max (64GB)
64 GB VRAM • 300 GB/s
APPLE
$2799
Apple M4 Pro (64GB)
64 GB VRAM • 273 GB/s
APPLE
$2599
AMD Radeon Instinct MI200
64 GB VRAM • 1640 GB/s
AMD
AMD Radeon Instinct MI210
64 GB VRAM • 1640 GB/s
AMD
NVIDIA H100 SXM5 64 GB
64 GB VRAM • 2020 GB/s
NVIDIA
NVIDIA Jetson AGX Orin 64 GB
64 GB VRAM • 205 GB/s
NVIDIA
NVIDIA Jetson T4000
64 GB VRAM • 273 GB/s
NVIDIA
