
Google Gemma 4 26B A4B

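Gemma 4 26B A4B is a Mixture-of-Experts (MoE) model with 128 experts and about 4B parameters active per token. It matches a dense 31B model at a fraction of the compute and supports a 256K-token context window.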
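To put a rough number on "a fraction of the compute": a common rule of thumb puts the forward-pass cost at about 2 FLOPs per parameter used per token, ignoring attention over the context. The sketch below applies that approximation to the 4B active parameters versus a dense 31B model. It is an estimate under that assumption, not a measured figure, and the function name is illustrative.

```python
def forward_flops_per_token(params_used: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per parameter used per token.

    Ignores attention-over-context cost, so it understates long-context work.
    """
    return 2 * params_used

moe_active = 4e9   # "4B active" from the model card
dense = 31e9       # the dense 31B comparison point from the model card

moe_flops = forward_flops_per_token(moe_active)
dense_flops = forward_flops_per_token(dense)

print(f"MoE:   {moe_flops / 1e9:.0f} GFLOPs/token")
print(f"Dense: {dense_flops / 1e9:.0f} GFLOPs/token")
print(f"Ratio: {moe_flops / dense_flops:.1%} of the dense cost")
```

The compute saving does not shrink memory: all 26B parameters (every expert) still have to be resident, which is why the VRAM figures below scale with the full parameter count.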

Capabilities: chat, coding, reasoning, multilingual, vision
Parameters: 26B (4B active)
Context length: 256K
Benchmarks: 19
Quantizations: 10
HF downloads: 300K
Architecture: MoE
Released: 2026-04-02
Layers: 30
KV heads: 8
Head dim: 256
Family: gemma
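The layer count, KV-head count, and head dimension determine how much memory the KV cache needs as context grows. A minimal sketch, assuming every one of the 30 layers caches keys and values for the full context with 8 KV heads of dimension 256, stored in FP16 (2 bytes per element). Earlier Gemma generations (Gemma 2 and 3) interleaved sliding-window attention layers; if this model does the same, the real cache at long context is much smaller, so read these numbers as an upper bound.

```python
LAYERS = 30        # "Layers" from the spec
KV_HEADS = 8       # "KV heads"
HEAD_DIM = 256     # "Head dim"

def kv_cache_bytes(tokens: int, bytes_per_elem: int = 2) -> int:
    """Upper-bound KV-cache size for `tokens` of context.

    Assumes every layer caches K and V for the full context with the listed
    KV-head count and head dim. bytes_per_elem=2 means FP16; 1 would model
    an 8-bit cache.
    """
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem  # 2 = K and V
    return tokens * per_token

# 256K interpreted here as 262,144 tokens.
for ctx in (8_192, 32_768, 262_144):
    print(f"{ctx:>7} tokens: {kv_cache_bytes(ctx) / 1e9:6.1f} GB")
```

Under these assumptions each token costs 240 KiB of cache, so a full 256K context would need roughly 64 GB on top of the weights.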

Quantization Options

Quant    Bits/weight  VRAM     Quality
Q3_K_M   4            13.5 GB  low
Q3_K_L   4.3          14.5 GB  moderate
IQ4_XS   4.46         15.0 GB  moderate
Q4_K_S   4.67         15.7 GB  moderate
Q4_K_M   4.89         16.4 GB  good
Q5_K_S   5.57         18.6 GB  good
Q5_K_M   5.7          19.0 GB  good
Q6_K     6.56         21.8 GB  excellent
Q8_0     8.5          28.1 GB  near-lossless
FP16     16           52.5 GB  lossless
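The VRAM column follows closely from bits per weight: weight memory is roughly total parameters × bits per weight ÷ 8. With 26B total parameters, adding a flat 0.5 GB to that weights-only estimate reproduces every row of the table to the nearest 0.1 GB. That offset is fitted to this table, not a documented rule, and the page does not say what it covers; longer contexts add KV-cache memory on top. A minimal sketch:

```python
PARAMS = 26e9      # total parameters ("26B"); all experts must be resident
OFFSET_GB = 0.5    # empirical: flat gap between the weights-only estimate and the table

# Bits per weight for each quant, copied from the table.
QUANT_BPW = {
    "Q3_K_M": 4.0, "Q3_K_L": 4.3, "IQ4_XS": 4.46, "Q4_K_S": 4.67,
    "Q4_K_M": 4.89, "Q5_K_S": 5.57, "Q5_K_M": 5.7, "Q6_K": 6.56,
    "Q8_0": 8.5, "FP16": 16.0,
}

def weights_gb(bpw: float, params: float = PARAMS) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bpw / 8 / 1e9

def table_vram_gb(bpw: float) -> float:
    """Weights plus the flat offset that matches this page's VRAM column."""
    return weights_gb(bpw) + OFFSET_GB

for name, bpw in QUANT_BPW.items():
    print(f"{name:7s} {bpw:5.2f} bpw  weights {weights_gb(bpw):5.1f} GB  "
          f"table ~{table_vram_gb(bpw):5.1f} GB")
```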


Benchmarks (19)

Arena Elo: 1441
AIME: 88.3
MMLU-PRO: 82.6
GPQA Diamond: 82.3
LiveCodeBench: 77.1
IFEval: 75.5
IFBench: 72.4
BBH: 64.8
AA Long Context: 55.7
τ²-Bench: 43.6
BigCodeBench: 42.8
SciCode: 40.0
AA Intelligence: 31.2
MATH: 27.9
AA Coding: 22.4
MUSR: 16.9
GPQA: 16.0
Terminal-Bench: 13.6
HLE: 8.7

Run this model

Easiest way to get started · Beginners

$ curl -fsSL https://ollama.com/install.sh | sh
$ ollama run gemma4:26b-q4_K_M

Downloads and runs automatically. Add --verbose to print speed stats.


Auto-setup with fitmyllm CLI

Detects your GPU, recommends the best model, downloads it, and starts chatting with zero configuration. It also benchmarks your speed and contributes anonymous data to improve predictions.

$ pip install fitmyllm
$ fitmyllm

Auto-detect GPU · Live tok/s in chat · Speed benchmarks · 9 inference engines

GPUs that can run this model

At Q4_K_M quantization (16.4 GB), sorted by VRAM, smallest first.

Apple M3 Pro (18GB): 18 GB VRAM, 150 GB/s, $1599
AMD RX 7900 XT: 20 GB VRAM, 800 GB/s, $849
NVIDIA A10M: 20 GB VRAM, 500 GB/s
NVIDIA RTX A4500: 20 GB VRAM, 640 GB/s, $2000
NVIDIA RTX 4090: 24 GB VRAM, 1008 GB/s, $1599
NVIDIA RTX 3090 Ti: 24 GB VRAM, 1008 GB/s, $999
NVIDIA RTX 3090: 24 GB VRAM, 936 GB/s, $850
AMD RX 7900 XTX: 24 GB VRAM, 960 GB/s, $999
Apple M4 Pro (24GB): 24 GB VRAM, 273 GB/s, $1399
NVIDIA L4 24GB: 24 GB VRAM, 300 GB/s, $2500
NVIDIA A10 24GB: 24 GB VRAM, 600 GB/s, $3500
Apple M2 (24GB): 24 GB VRAM, 100 GB/s, $999
Apple M3 (24GB): 24 GB VRAM, 100 GB/s, $999
Apple M4 (24GB): 24 GB VRAM, 120 GB/s, $699
NVIDIA Tesla M40 24 GB: 24 GB VRAM, 288 GB/s
NVIDIA Tesla P10: 24 GB VRAM, 694 GB/s
NVIDIA Tesla P40: 24 GB VRAM, 347 GB/s
NVIDIA Quadro RTX 6000: 24 GB VRAM, 672 GB/s, $4000
NVIDIA GeForce RTX 3090: 24 GB VRAM, 936 GB/s, $1499
NVIDIA A10 PCIe: 24 GB VRAM, 600 GB/s
NVIDIA A10G: 24 GB VRAM, 600 GB/s
NVIDIA RTX A5000: 24 GB VRAM, 768 GB/s, $2500
NVIDIA GeForce RTX 4090: 24 GB VRAM, 1010 GB/s, $1599
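For single-stream generation, the memory-bandwidth figures in the list above usually matter more than raw compute. A common back-of-envelope bound: each generated token must stream the weights it uses from memory, so tokens/s is at most bandwidth ÷ bytes read per token. For an MoE model those bytes scale with the ~4B active parameters rather than all 26B. The sketch assumes Q4_K_M (4.89 bits per weight) and ignores KV-cache reads, routing overhead, and kernel efficiency, so its outputs are ceilings, not predictions; the function name is illustrative.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         params_read: float,
                         bits_per_weight: float = 4.89) -> float:
    """Memory-bandwidth upper bound on single-stream decode speed.

    Assumes every generated token reads `params_read` weights once from
    memory at `bits_per_weight`, and nothing else (no KV cache, no overhead).
    """
    bytes_per_token = params_read * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

ACTIVE = 4e9    # ~4B active parameters per token (from the spec)
TOTAL = 26e9    # what a dense model of the same total size would read per token

for name, bw in [("Apple M3 Pro (18GB)", 150), ("NVIDIA RTX 3090", 936),
                 ("NVIDIA RTX 4090", 1008)]:
    moe = decode_ceiling_tok_s(bw, ACTIVE)
    dense = decode_ceiling_tok_s(bw, TOTAL)
    print(f"{name:20s} MoE <= {moe:4.0f} tok/s   dense-26B <= {dense:3.0f} tok/s")
```

Because only the active experts are read per token, the ceiling is 6.5× (26B ÷ 4B) higher than for a dense model of the same total size on the same card.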

▸ SPEC SHEET

Gemma 4 26B A4B: 26B MoE.

▸ SPECIFICATIONS
PARAMETERS
26B (4B active)
ARCHITECTURE
Mixture of Experts
CONTEXT LENGTH
256K tokens
CAPABILITIES
chat, coding, reasoning, multilingual, vision
RELEASE DATE
2026-04-02
PROVIDER
Google
FAMILY
gemma
▸ VRAM REQUIREMENTS
QUANT    BPW    VRAM     QUALITY
Q3_K_M   4      13.5 GB  88%
Q3_K_L   4.3    14.5 GB  90%
IQ4_XS   4.46   15.0 GB  92%
Q4_K_S   4.67   15.7 GB  93%
Q4_K_M   4.89   16.4 GB  94%
Q5_K_S   5.57   18.6 GB  96%
Q5_K_M   5.7    19.0 GB  96%
Q6_K     6.56   21.8 GB  97%
Q8_0     8.5    28.1 GB  100%
FP16     16     52.5 GB  100%
§ 01 BENCHMARK SCORES
MMLU-PRO: 82.6
MATH: 27.9
IFEval: 75.5
BBH: 64.8
GPQA: 16.0
MUSR: 16.9
BigCodeBench: 42.8
Arena Elo: 1441
GPQA Diamond: 82.3
HLE: 8.7
AA Intelligence: 31.2
AA Coding: 22.4
LiveCodeBench: 77.1
AIME: 88.3
IFBench: 72.4
Terminal-Bench: 13.6
τ²-Bench: 43.6
SciCode: 40.0
AA Long Context: 55.7
§ 02 RUN COMMAND

Run Gemma 4 26B A4B locally with Ollama (needs 16.4 GB of VRAM at Q4_K_M):

$ ollama run gemma4:26b
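The 16.4 GB figure applies to Q4_K_M; on a card with more or less memory a different quant may fit better. A minimal sketch, assuming you simply want the highest-bit quant whose listed VRAM fits your budget after reserving some headroom for context. The 2 GB default headroom and the `best_quant` helper are hypothetical illustrations, not recommendations from this page.

```python
from typing import Optional

# VRAM figures (GB) copied from the quantization table on this page,
# ordered from fewest to most bits per weight.
QUANT_VRAM_GB = [
    ("Q3_K_M", 13.5), ("Q3_K_L", 14.5), ("IQ4_XS", 15.0), ("Q4_K_S", 15.7),
    ("Q4_K_M", 16.4), ("Q5_K_S", 18.6), ("Q5_K_M", 19.0), ("Q6_K", 21.8),
    ("Q8_0", 28.1), ("FP16", 52.5),
]

def best_quant(vram_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the highest-bit quant that fits in vram_gb minus headroom, or None."""
    budget = vram_gb - headroom_gb
    fitting = [name for name, need in QUANT_VRAM_GB if need <= budget]
    return fitting[-1] if fitting else None

for card_gb in (16, 18, 20, 24, 48):
    print(card_gb, "GB ->", best_quant(card_gb))
```

Which quant tags are actually published for `ollama run` is not listed here beyond the commands shown, so check the Ollama model library before pulling a specific one.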