Alibaba / Qwen 3.5 4B

Qwen 3.5 4B — multimodal agent model. Beats models 2-3x its size on reasoning and coding.

chat · coding · reasoning · multilingual · vision · math
Parameters: 4B
Context length: 256K
Benchmarks: 6
Quantizations: 6
HF downloads: 2.0M
Architecture: Dense
Released: 2026-03-01
Layers: 32
KV heads: 4
Head dim: 256
Family: qwen
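The layer, KV-head, and head-dim figures above determine how much extra VRAM the KV cache needs at long contexts. A rough sketch of that estimate (standard transformer KV-cache arithmetic, not a figure published on this page):

```python
# Rough KV-cache size estimate for Qwen 3.5 4B, using the specs above:
# 32 layers, 4 KV heads, head dim 256, fp16 cache (2 bytes per element).
def kv_cache_bytes(seq_len, layers=32, kv_heads=4, head_dim=256, bytes_per_elem=2):
    # K and V are each cached per layer, hence the leading factor of 2.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Full 256K context:
print(f"{kv_cache_bytes(256 * 1024) / 1024**3:.1f} GiB")  # 32.0 GiB
```

So while the weights fit in ~3 GB at Q4_K_M, filling the entire 256K context would add tens of gigabytes of cache unless the runtime quantizes the KV cache as well.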

Quantization Options

Quant    Bits   VRAM     Quality
Q4_K_M   4.89   2.9 GB   good
Q5_K_S   5.57   3.3 GB   good
Q5_K_M   5.7    3.3 GB   good
Q6_K     6.56   3.8 GB   excellent
Q8_0     8.5    4.7 GB   lossless
FP16     16     8.5 GB   lossless
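The VRAM column follows directly from parameter count times bits per weight. A back-of-envelope check (the small gap versus the table is runtime overhead such as buffers and cache):

```python
# Weight-memory estimate: params * bits-per-weight / 8 bytes.
# A sketch; real VRAM use adds runtime overhead (KV cache, activations, buffers),
# which is why the table lists 2.9 GB for Q4_K_M rather than the bare 2.4 GB.
def weight_gb(params_billion, bpw):
    return params_billion * 1e9 * bpw / 8 / 1e9  # GB of raw weights

for quant, bpw in [("Q4_K_M", 4.89), ("Q6_K", 6.56), ("Q8_0", 8.5), ("FP16", 16)]:
    print(f"{quant}: ~{weight_gb(4, bpw):.2f} GB of weights")
```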


Benchmarks (6)

IFEval: 89.8
MMBench: 89.4
MMLU-PRO: 79.1
MMMU: 77.6
GPQA Diamond: 76.2
LiveCodeBench: 55.8

Run this model

Easiest way to get started:

curl -fsSL https://ollama.com/install.sh | sh
ollama run qwen3.5:4b-q4_k_m

Downloads and runs automatically. Add --verbose for speed stats.
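Once Ollama is serving the model, you can also query it programmatically over its local HTTP API (default port 11434). A minimal stdlib-only sketch; the model tag assumes the `qwen3.5:4b` name used on this page:

```python
# Query a locally running Ollama server via its /api/generate endpoint.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="qwen3.5:4b"):
    # "stream": False returns one JSON object instead of a line-delimited stream.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt):
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama serve` running with the model pulled):
# print(generate("Write a haiku about GPUs."))
```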


GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.


Build Hardware for Qwen 3.5 4B

Qwen 3.5 4B: a 4B-parameter dense LLM

Model Specifications

Parameters: 4B
Architecture: Dense Transformer
Context length: 256K tokens
Capabilities: chat, coding, reasoning, multilingual, vision, math
Release date: 2026-03-01
Provider: Alibaba
Family: qwen

VRAM Requirements

Quantization   BPW    VRAM     Quality
Q4_K_M         4.89   2.9 GB   94%
Q5_K_S         5.57   3.3 GB   96%
Q5_K_M         5.7    3.3 GB   96%
Q6_K           6.56   3.8 GB   97%
Q8_0           8.5    4.7 GB   100%
FP16           16     8.5 GB   100%

Benchmark Scores

MMLU-PRO: 79.1
IFEval: 89.8
MMMU: 77.6
MMBench: 89.4
GPQA Diamond: 76.2
LiveCodeBench: 55.8

How to Run Qwen 3.5 4B

Run Qwen 3.5 4B locally with Ollama (needs 2.9 GB VRAM at Q4_K_M):

ollama run qwen3.5:4b

Compatible GPUs (30)

GPUs that can run Qwen 3.5 4B at Q4_K_M quantization: