Alibaba/Mixture of Experts

Qwen 3.6 35B A3B

Name: Qwen 3.6 35B A3B
Author: Alibaba

Qwen 3.6 35B A3B — hybrid linear/full attention MoE (DeltaNet + full attention), multimodal (text+image+video), 256 experts (8+1 active). Prioritizes agentic coding and thinking preservation over Qwen 3.5.

chatcodingreasoningmultilingualvisionmathtool_use

35B

Parameters (3B active)

256K

Context length

Benchmarks

Quantizations

100K

HF downloads

Architecture

MoE

Released

2026-04-15

Layers

KV Heads

Head Dim

256

Family

qwen

Quantization Options

Quant	Bits	VRAM	Quality
IQ3_XXS	3.25	14.7 GB	low
IQ3_XS	3.5	15.8 GB	low
Q3_K_S	3.64	16.4 GB	low
IQ3_M	3.72	16.8 GB	low
Q3_K_M	4	18.0 GB	low
Q3_K_L	4.25	19.1 GB	low
IQ4_XS	4.37	19.6 GB	moderate
Q4_K_S	4.5	20.2 GB	moderate
Q4_K_M	4.89	21.9 GB	good
Q5_K_S	5.57	24.9 GB	good
Q5_K_M	5.7	25.4 GB	good
Q6_K	6.56	29.2 GB	excellent
Q8_0	8.5	37.7 GB	lossless
FP16	16	70.5 GB	lossless

Select your GPU above to see speed estimates and compatibility for each quantization.

▸ READY TO RUN THIS?RENT BY THE HOUR

RENT A GPU AND RUN QWEN 3.6 35B A3B NOW

Rent on RunPod →Or Vast.ai →

Spin up an A100 / H100 / 4090 in ~60s. Pay by the second. Cancel anytime.

Community Ratings

Loading ratings...

Benchmarks (21)

MMBench92.8

AIME92.7

GPQA Diamond86.0

MMLU-PRO85.2

τ²-Bench85.1

MMMU81.7

LiveCodeBench80.4

IFEval78.9

SWE-bench73.4

MATH59.7

BBH58.3

AA Long Context56.7

IFBench36.2

BigCodeBench32.3

AA Intelligence31.5

Terminal-Bench25.8

HLE21.4

MUSR19.1

AA Coding17.6

GPQA15.2

SciCode1.3

Run this model

▸Easiest way to get started·Beginners

DOCS ↗

curl -fsSL https://ollama.com/install.sh | sh

$ollama run qwen3.6:35b-a3b:q4_k_m

Downloads and runs automatically. Add --verbose for speed stats.

▸ SETUP GUIDE

Auto-setup with fitmyllm CLI

Detects your GPU, recommends the best model, downloads it, and starts chatting — zero config. Benchmarks your speed and contributes anonymous data to improve predictions.

pip install fitmyllmthen run fitmyllmLearn more

Auto-detect GPULive tok/s in chatSpeed benchmarks9 inference engines

HuggingFace Ollama Library GGUF Downloads Build Hardware

GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.

NVIDIA RTX 4090

24 GB VRAM • 1008 GB/s

NVIDIA

$1599