DeepSeek V4 Flash

Name: DeepSeek V4 Flash
Author: DeepSeek

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a...

Tags: chat, coding, reasoning, multilingual, math, agentic, tool_use

Parameters: 158.07B (13B active)
Context length: 1024K
Benchmarks:
Quantizations:
HF downloads: 97K
Architecture: MoE
Released: 2026-04-24
Layers:
KV Heads:
Head Dim: 512
Family: deepseek

Quantization Options

Quant	Bits per weight	VRAM @ 16K (total)	Weights	KV cache	Quality
IQ2_XXS	2.38	48.5 GB	47.5 GB	1.0 GB	low
IQ2_M	2.93	59.4 GB	58.4 GB	1.0 GB	low
Q2_K	3.16	63.9 GB	62.9 GB	1.0 GB	low
IQ3_XXS	3.25	65.7 GB	64.7 GB	1.0 GB	low
IQ3_XS	3.5	70.7 GB	69.6 GB	1.0 GB	low
Q3_K_S	3.64	73.4 GB	72.4 GB	1.0 GB	low
IQ3_M	3.76	75.8 GB	74.8 GB	1.0 GB	low
Q3_K_M	4	80.5 GB	79.5 GB	1.0 GB	low
Q3_K_L	4.3	86.5 GB	85.5 GB	1.0 GB	moderate
IQ4_XS	4.46	89.6 GB	88.6 GB	1.0 GB	moderate
Q4_K_S	4.67	93.8 GB	92.8 GB	1.0 GB	moderate
Q4_K_M	4.89	98.1 GB	97.1 GB	1.0 GB	good
Q5_K_S	5.57	111.6 GB	110.5 GB	1.0 GB	good
Q5_K_M	5.7	114.1 GB	113.1 GB	1.0 GB	good
Q6_K	6.56	131.1 GB	130.1 GB	1.0 GB	excellent
Q8_0	8.5	169.4 GB	168.4 GB	1.0 GB	lossless
FP16	16	317.6 GB	316.6 GB	1.0 GB	lossless
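
The VRAM figures in this table appear to follow a simple relation: weight memory is roughly the parameter count times bits per weight divided by 8, plus the KV cache. The sketch below reproduces the listed numbers to within about 0.2 GB. The fixed 0.5 GB overhead is an assumption chosen only because it fits the table, not a documented constant, and the function names are illustrative.

```python
# Figures copied from the listing above.
TOTAL_PARAMS_B = 158.07   # total parameters, in billions, as listed
KV_CACHE_GB_16K = 1.0     # KV cache at 16K context, as listed

# ASSUMPTION: a fixed overhead picked only because it matches the table.
ASSUMED_OVERHEAD_GB = 0.5


def weights_gb(bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a given bits-per-weight value."""
    # billions of params * bits / 8 bits-per-byte = billions of bytes = GB
    return TOTAL_PARAMS_B * bits_per_weight / 8 + ASSUMED_OVERHEAD_GB


def total_vram_gb(bits_per_weight: float) -> float:
    """Approximate total VRAM in GB at 16K context (weights + KV cache)."""
    return weights_gb(bits_per_weight) + KV_CACHE_GB_16K


if __name__ == "__main__":
    for name, bpw in [("IQ2_XXS", 2.38), ("Q4_K_M", 4.89), ("Q8_0", 8.5), ("FP16", 16)]:
        print(f"{name}: ~{total_vram_gb(bpw):.1f} GB at 16K context")
```

The KV cache figure applies to 16K context only; longer contexts need more, and this sketch does not attempt to model that.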

Even the lightest quant (IQ2_XXS) needs about 49 GB at 16K context, which exceeds the VRAM of a typical single consumer GPU. Larger quants call for a high-memory accelerator or a multi-GPU deployment.

Benchmarks (14)

Benchmark	Score
τ²-Bench	94.4
LiveCodeBench	91.6
GPQA Diamond	88.1
MMLU-PRO	86.2
SWE-bench	79.0
HumanEval	69.5
Terminal-Bench	56.9
IFBench	47.2
BigCodeBench	40.4
SciCode	37.3
AA Intelligence	36.5
AA Coding	35.2
HLE	34.8
AA Long Context	33.3

Run this model

Easiest way to get started (beginners):

$ curl -fsSL https://ollama.com/install.sh | sh
$ ollama run deepseek:158b-q4_K_M

Tag may need adjustment — check ollama.com/library/deepseek for available tags.
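
Once the model is running under Ollama, it can also be queried from a script through Ollama's local HTTP API. The sketch below is a minimal example that assumes Ollama is listening on its default port (11434) and that the tag shown above is valid. As noted, the tag may need adjustment. The helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "deepseek:158b-q4_K_M"  # tag from the listing; may need adjustment


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Explain mixture-of-experts in one sentence.")` returns the model's reply as a string, provided the server is up and the model has been pulled.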

Auto-setup with fitmyllm CLI

Detects your GPU, recommends the best model, downloads it, and starts chatting — zero config. Benchmarks your speed and contributes anonymous data to improve predictions.

Install with pip install fitmyllm, then run fitmyllm.

Features: auto-detect GPU, live tok/s in chat, speed benchmarks, 9 inference engines.

GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.

GPU	VRAM	Bandwidth	Vendor	Price
AMD Instinct MI300A	120 GB	5300 GB/s	AMD	$12000
Apple M4 Max (128GB)	128 GB	546 GB/s	Apple	$3999
AMD Instinct MI250X	128 GB	3277 GB/s	AMD	$10000
Apple M1 Ultra (128GB)	128 GB	800 GB/s	Apple	$4999
Apple M2 Ultra (128GB)	128 GB	800 GB/s	Apple	$3999
AMD Radeon Instinct MI250	128 GB	3280 GB/s	AMD	$12000
AMD Radeon Instinct MI250X	128 GB	3280 GB/s	AMD	$15000
AMD Radeon Instinct MI300	128 GB	6550 GB/s	AMD	$12000
Intel Data Center GPU Max 1550	128 GB	3280 GB/s	Intel	not listed
Intel Data Center GPU Max Subsystem	128 GB	3210 GB/s	Intel	not listed
NVIDIA GB10	128 GB	273 GB/s	NVIDIA	not listed
NVIDIA Jetson T5000	128 GB	273 GB/s	NVIDIA	not listed
Apple M5 Max (128GB)	128 GB	614 GB/s	Apple	not listed
NVIDIA H200 SXM 141GB	140 GB	4800 GB/s	NVIDIA	$30000
NVIDIA H200 NVL	141 GB	4890 GB/s	NVIDIA	$35000
NVIDIA H200 SXM 141 GB	141 GB	4890 GB/s	NVIDIA	$30000
NVIDIA B300	144 GB	4100 GB/s	NVIDIA	$35000
AMD Instinct MI300X	192 GB	5300 GB/s	AMD	$15000
Apple M2 Ultra (192GB)	192 GB	800 GB/s	Apple	$5499
Apple M3 Ultra (192GB)	192 GB	800 GB/s	Apple	$6999
Apple M4 Ultra (192GB)	192 GB	1092 GB/s	Apple	$7499
AMD Radeon Instinct MI300A	192 GB	10300 GB/s	AMD	$12000
AMD Radeon Instinct MI300X	192 GB	10300 GB/s	AMD	$15000
AMD Radeon Instinct MI308X	192 GB	10300 GB/s	AMD	$12000
Apple M5 Ultra (192GB)	192 GB	1228 GB/s	Apple	not listed
AMD Radeon Instinct MI325X	288 GB	10300 GB/s	AMD	$20000
AMD Radeon Instinct MI350X	288 GB	8190 GB/s	AMD	$25000
AMD Radeon Instinct MI355X	288 GB	8190 GB/s	AMD	$30000
Apple M4 Ultra (384GB)	384 GB	1092 GB/s	Apple	$9999
Apple M5 Ultra (384GB)	384 GB	1228 GB/s	Apple	not listed
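
Memory bandwidth matters here because single-stream token generation is usually memory-bound: each generated token has to read the active weights once. As a rough sketch, an upper bound on decode speed is bandwidth divided by the bytes of active weights per token. The estimate below assumes only the 13B active parameters are read per token and ignores KV cache reads, compute, and routing overhead, so real throughput will be lower. The function name is illustrative and this is not the site's own speed model.

```python
ACTIVE_PARAMS_B = 13.0  # active parameters per token, in billions, as listed


def decode_tok_s_upper_bound(bandwidth_gb_s: float, bits_per_weight: float) -> float:
    """Bandwidth-bound ceiling on tokens/s for single-stream decoding.

    ASSUMPTION: each token reads only the active weights exactly once.
    """
    gb_per_token = ACTIVE_PARAMS_B * bits_per_weight / 8  # GB read per token
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    # Bandwidth values taken from the GPU list; 4.89 is the Q4_K_M bits per weight.
    for gpu, bw in [("Apple M4 Max (128GB)", 546), ("NVIDIA H200 NVL", 4890)]:
        print(f"{gpu}: at most ~{decode_tok_s_upper_bound(bw, 4.89):.0f} tok/s")
```

Under these assumptions, a 546 GB/s device tops out near 69 tokens per second at Q4_K_M, which shows why two devices with the same VRAM can differ widely in speed.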

Build Hardware for DeepSeek V4 Flash

QUANT	BPW	VRAM (weights only)	QUALITY
IQ2_XXS	2.38	47.5 GB	65%
IQ2_M	2.93	58.4 GB	75%
Q2_K	3.16	62.9 GB	78%
IQ3_XXS	3.25	64.7 GB	82%
IQ3_XS	3.5	69.6 GB	84%
Q3_K_S	3.64	72.4 GB	85%
IQ3_M	3.76	74.8 GB	86%
Q3_K_M	4	79.5 GB	88%
Q3_K_L	4.3	85.5 GB	90%
IQ4_XS	4.46	88.6 GB	92%
Q4_K_S	4.67	92.8 GB	93%
Q4_K_M	4.89	97.1 GB	94%
Q5_K_S	5.57	110.5 GB	96%
Q5_K_M	5.7	113.1 GB	96%
Q6_K	6.56	130.1 GB	97%
Q8_0	8.5	168.4 GB	100%
FP16	16	316.6 GB	100%
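
One way to use this table when sizing hardware is to pick the highest-quality quant that fits a given VRAM budget. The sketch below treats the VRAM column as weight memory and adds the 1.0 GB KV cache figure from the 16K-context table. The `best_quant` helper and its tie-breaking rule (prefer the smaller quant at equal quality) are illustrative choices, not part of the listing.

```python
# (quant, weight VRAM in GB, quality %) copied from the table above.
QUANTS = [
    ("IQ2_XXS", 47.5, 65), ("IQ2_M", 58.4, 75), ("Q2_K", 62.9, 78),
    ("IQ3_XXS", 64.7, 82), ("IQ3_XS", 69.6, 84), ("Q3_K_S", 72.4, 85),
    ("IQ3_M", 74.8, 86), ("Q3_K_M", 79.5, 88), ("Q3_K_L", 85.5, 90),
    ("IQ4_XS", 88.6, 92), ("Q4_K_S", 92.8, 93), ("Q4_K_M", 97.1, 94),
    ("Q5_K_S", 110.5, 96), ("Q5_K_M", 113.1, 96), ("Q6_K", 130.1, 97),
    ("Q8_0", 168.4, 100), ("FP16", 316.6, 100),
]
KV_CACHE_GB = 1.0  # KV cache at 16K context, from the quantization table


def best_quant(vram_gb: float):
    """Return the highest-quality quant that fits, or None if nothing fits."""
    fitting = [q for q in QUANTS if q[1] + KV_CACHE_GB <= vram_gb]
    if not fitting:
        return None
    # Highest quality first; at equal quality, prefer the smaller footprint.
    return max(fitting, key=lambda q: (q[2], -q[1]))[0]


if __name__ == "__main__":
    for budget in (48, 100, 128, 192):
        print(f"{budget} GB -> {best_quant(budget)}")
```

A real deployment would also leave headroom for activations and longer contexts, which this sketch does not model.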

DeepSeek V4 Flash — 158.07B MoE.