DeepSeek V4 Pro

Name: DeepSeek V4 Pro
Author: DeepSeek

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a...

Tags: chat, coding, reasoning, multilingual, math, agentic, tool_use

Parameters: 1600B (49B active)
Context length: 1024K
HF downloads: 174K

Architecture: MoE
Released: 2026-04-24
Layers:
KV Heads:
Head Dim: 512
Family: deepseek

Quantization Options

Quant	Bits/weight	Weights (GB)	KV cache (GB)	Total VRAM @ 16K (GB)	Quality
IQ2_XXS	2.38	476.5	1.4	477.9	low
IQ2_M	2.93	586.5	1.4	587.9	low
Q2_K	3.16	632.5	1.4	633.9	low
IQ3_XXS	3.25	650.5	1.4	651.9	low
IQ3_XS	3.5	700.5	1.4	701.9	low
Q3_K_S	3.64	728.5	1.4	729.9	low
IQ3_M	3.76	752.5	1.4	753.9	low
Q3_K_M	4	800.5	1.4	801.9	low
Q3_K_L	4.3	860.5	1.4	861.9	moderate
IQ4_XS	4.46	892.5	1.4	893.9	moderate
Q4_K_S	4.67	934.5	1.4	935.9	moderate
Q4_K_M	4.89	978.5	1.4	979.9	good
Q5_K_S	5.57	1114.5	1.4	1115.9	good
Q5_K_M	5.7	1140.5	1.4	1141.9	good
Q6_K	6.56	1312.5	1.4	1313.9	excellent
Q8_0	8.5	1700.5	1.4	1701.9	lossless
FP16	16	3200.5	1.4	3201.9	lossless
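
The weight figures in the table are consistent with a simple rule: parameters times bits per weight, divided by 8 bits per byte, plus a constant 0.5 GB. The page does not state this formula; it is inferred from the rows, and the function names and the `overhead_gb` default below are assumptions made for illustration. A minimal sketch:

```python
def weight_vram_gb(params_billion, bits_per_weight, overhead_gb=0.5):
    """Estimate weight memory in GB (1 GB = 1e9 bytes).

    params_billion * bits_per_weight gives gigabits; dividing by 8 gives GB.
    overhead_gb=0.5 is inferred from the table rows, not documented by the page.
    """
    return params_billion * bits_per_weight / 8 + overhead_gb


def total_vram_gb(params_billion, bits_per_weight, kv_cache_gb=1.4):
    """Add the KV cache to the weights; 1.4 GB is the table's value at 16K context."""
    return weight_vram_gb(params_billion, bits_per_weight) + kv_cache_gb


if __name__ == "__main__":
    # 1600 is the parameter count in billions listed for DeepSeek V4 Pro.
    for name, bpw in [("IQ2_XXS", 2.38), ("Q4_K_M", 4.89), ("FP16", 16)]:
        weights = round(weight_vram_gb(1600, bpw), 1)
        total = round(total_vram_gb(1600, bpw), 1)
        print(f"{name}: weights {weights} GB, total {total} GB")
```

The KV cache grows with context length, so the 1.4 GB figure only applies at 16K; the page gives no values for longer contexts.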

Too big for a single GPU: plan a multi-GPU deployment

Even the lightest quant (IQ2_XXS) needs about 478 GB at 16K context, so the model has to be split across several GPUs.
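
As a rough way to size such a deployment, the required VRAM can be divided by per-GPU capacity and rounded up. This is only a lower bound: it ignores activation memory, framework overhead, and uneven layer splits. The function name and the `usable_fraction` parameter are hypothetical, and the 80 GB and 24 GB capacities are example values rather than figures from this page.

```python
import math


def min_gpu_count(required_gb, gpu_vram_gb, usable_fraction=1.0):
    """Lower bound on the number of GPUs needed to hold required_gb of VRAM.

    usable_fraction (assumption) reserves headroom, e.g. 0.9 leaves 10% free
    on each card for activations and runtime buffers.
    """
    if required_gb <= 0 or gpu_vram_gb <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("sizes must be positive and usable_fraction in (0, 1]")
    return math.ceil(required_gb / (gpu_vram_gb * usable_fraction))


if __name__ == "__main__":
    lightest_quant_gb = 477.9  # IQ2_XXS total at 16K context, from the table
    for vram in (80, 24):  # example per-GPU capacities in GB (assumed)
        print(f"{vram} GB cards: at least {min_gpu_count(lightest_quant_gb, vram)}")
```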

Benchmarks (14)

Benchmark	Score
τ²-Bench	94.2
LiveCodeBench	93.5
GPQA Diamond	90.1
MMLU-PRO	87.5
BBH	87.5
SWE-bench	80.6
HumanEval	76.8
IFBench	71.3
Terminal-Bench	67.9
AA Long Context	65.0
AA Intelligence	49.8
SciCode	46.4
AA Coding	43.2
HLE	37.7

Run this model

Easiest way to get started (beginners)

curl -fsSL https://ollama.com/install.sh | sh

ollama run deepseek:1600b-q4_K_M

Tag may need adjustment — check ollama.com/library/deepseek for available tags.

Setup guide

Auto-setup with fitmyllm CLI

Detects your GPU, recommends the best model, downloads it, and starts chatting with zero config. Benchmarks your speed and contributes anonymous data to improve predictions.

pip install fitmyllm, then run fitmyllm.

Features: auto-detect GPU, live tok/s in chat, speed benchmarks, 9 inference engines.

Build Hardware for DeepSeek V4 Pro

QUANT	BPW	VRAM (WEIGHTS ONLY)	QUALITY
IQ2_XXS	2.38	476.5 GB	65%
IQ2_M	2.93	586.5 GB	75%
Q2_K	3.16	632.5 GB	78%
IQ3_XXS	3.25	650.5 GB	82%
IQ3_XS	3.5	700.5 GB	84%
Q3_K_S	3.64	728.5 GB	85%
IQ3_M	3.76	752.5 GB	86%
Q3_K_M	4	800.5 GB	88%
Q3_K_L	4.3	860.5 GB	90%
IQ4_XS	4.46	892.5 GB	92%
Q4_K_S	4.67	934.5 GB	93%
Q4_K_M	4.89	978.5 GB	94%
Q5_K_S	5.57	1114.5 GB	96%
Q5_K_M	5.7	1140.5 GB	96%
Q6_K	6.56	1312.5 GB	97%
Q8_0	8.5	1700.5 GB	100%
FP16	16	3200.5 GB	100%
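
One way to use this table is to pick the highest-quality quantization that fits a given VRAM budget. The sketch below copies the weight sizes and quality percentages from the table and adds the 1.4 GB KV cache listed for 16K context. The `pick_quant` helper and the tie-breaking rule (prefer the smaller quant when quality is equal) are assumptions for illustration, not something the page defines.

```python
# (name, weights in GB, quality in %) copied from the table above.
QUANTS = [
    ("IQ2_XXS", 476.5, 65), ("IQ2_M", 586.5, 75), ("Q2_K", 632.5, 78),
    ("IQ3_XXS", 650.5, 82), ("IQ3_XS", 700.5, 84), ("Q3_K_S", 728.5, 85),
    ("IQ3_M", 752.5, 86), ("Q3_K_M", 800.5, 88), ("Q3_K_L", 860.5, 90),
    ("IQ4_XS", 892.5, 92), ("Q4_K_S", 934.5, 93), ("Q4_K_M", 978.5, 94),
    ("Q5_K_S", 1114.5, 96), ("Q5_K_M", 1140.5, 96), ("Q6_K", 1312.5, 97),
    ("Q8_0", 1700.5, 100), ("FP16", 3200.5, 100),
]


def pick_quant(budget_gb, kv_cache_gb=1.4):
    """Return the name of the best-quality quant that fits, or None.

    A quant fits when weights + KV cache <= budget_gb. On equal quality the
    smaller quant wins (assumed tie-break).
    """
    fitting = [q for q in QUANTS if q[1] + kv_cache_gb <= budget_gb]
    if not fitting:
        return None
    # Sort key: higher quality first, then lower memory use.
    return max(fitting, key=lambda q: (q[2], -q[1]))[0]


if __name__ == "__main__":
    for budget in (400, 640, 1000, 4000):
        print(f"{budget} GB budget -> {pick_quant(budget)}")
```

The budget here is total VRAM summed across all GPUs; in practice each card also needs headroom, so a real deployment would want a margin on top of these numbers.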