Zhipu AI/Mixture of Experts

GLM 4.5 Air

chatcodingreasoningtool_useThinking

110B

Parameters (12B active)

125K

Context length

8

Benchmarks

4

Quantizations

378K

HF downloads

Architecture

MoE

Released

2025-08-08

Layers

46

KV Heads

8

Head Dim

128

Family

glm

Model Card

View on HuggingFace

GLM-4.5-Air

Model Introduction

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses.

We have open-sourced the base models, hybrid reasoning models, and FP8 versions of the hybrid reasoning models for both GLM-4.5 and GLM-4.5-Air. They are released under the MIT open-source license and can be used commercially and for secondary development.

As demonstrated in our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of 63.2, in the 3rd place among all the proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at 59.8 while maintaining superior efficiency.

For more eval results, show cases, and technical details, please visit our technical blog or technical report.

The model code, tool parser and reasoning parser can be found in the implementation of transformers, vLLM and SGLang.

Quick Start

Please refer our github page for more detail.

Quantizations & VRAM

Q4_K_M4.5 bpw

62.4 GB

VRAM required

94%

Quality

Q6_K6.5 bpw

89.9 GB

VRAM required

97%

Quality

Q8_08 bpw

110.5 GB

VRAM required

100%

Quality

FP1616 bpw

220.5 GB

VRAM required

100%

Quality

Benchmarks (8)

MATH-50096.5

AIME80.7

AA Math80.7

GPQA Diamond73.3

LiveCodeBench68.4

AA Coding23.8

AA Intelligence23.2

HLE6.8

HuggingFace GGUF Downloads Build Hardware

GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.

Apple M1 Ultra (64GB)

64 GB VRAM • 800 GB/s

Apple M2 Ultra (64GB)

64 GB VRAM • 800 GB/s

Apple M4 Max (64GB)

64 GB VRAM • 546 GB/s

Apple M2 Max (64GB)

64 GB VRAM • 400 GB/s

Apple M3 Max (64GB)

64 GB VRAM • 300 GB/s

Apple M4 Pro (64GB)

64 GB VRAM • 273 GB/s

AMD Radeon Instinct MI200

64 GB VRAM • 1640 GB/s

AMD Radeon Instinct MI210

64 GB VRAM • 1640 GB/s

NVIDIA H100 SXM5 64 GB

64 GB VRAM • 2020 GB/s

NVIDIA Jetson AGX Orin 64 GB

64 GB VRAM • 205 GB/s

NVIDIA Jetson T4000

64 GB VRAM • 273 GB/s

NVIDIA RTX PRO 5000 72 GB Blackwell

72 GB VRAM • 1340 GB/s

NVIDIA H100 SXM5 80GB

80 GB VRAM • 3350 GB/s

NVIDIA H100 PCIe 80GB

80 GB VRAM • 2000 GB/s

NVIDIA A100 SXM 80GB

80 GB VRAM • 2039 GB/s

NVIDIA A100 PCIe 80GB

80 GB VRAM • 1935 GB/s

NVIDIA A100 SXM4 80 GB

80 GB VRAM • 2040 GB/s

NVIDIA A100 PCIe 80 GB

80 GB VRAM • 1940 GB/s

80 GB VRAM • 2040 GB/s

NVIDIA H100 PCIe 80 GB

80 GB VRAM • 2040 GB/s

NVIDIA H100 SXM5 80 GB

80 GB VRAM • 3360 GB/s

NVIDIA H100 CNX

80 GB VRAM • 2040 GB/s

NVIDIA A800 PCIe 80 GB

80 GB VRAM • 1940 GB/s

NVIDIA A800 SXM4 80 GB

80 GB VRAM • 2040 GB/s

NVIDIA H800 PCIe 80 GB

80 GB VRAM • 2040 GB/s

NVIDIA H800 SXM5

80 GB VRAM • 3360 GB/s

NVIDIA RTX 6000D

84 GB VRAM • 1570 GB/s

90 GB VRAM • 4100 GB/s

NVIDIA H100 NVL 94 GB

94 GB VRAM • 3940 GB/s

NVIDIA H100 SXM5 94 GB

94 GB VRAM • 3360 GB/s

Find the best GPU for GLM 4.5 Air

Build Hardware for GLM 4.5 Air