▸ SPEC SHEET
GPT-OSS 20B — 21B MoE.
▸ SPECIFICATIONS
- PARAMETERS: 21B (3.6B active)
- ARCHITECTURE: Mixture of Experts
- CONTEXT LENGTH: 128K tokens
- CAPABILITIES: chat, coding, reasoning, tool_use
- RELEASE DATE: 2026-02-14
- PROVIDER: OpenAI
- FAMILY: gpt-oss
▸ VRAM REQUIREMENTS
| QUANT | BPW | VRAM | QUALITY |
|---|---|---|---|
| Q3_K_M | 4 | 11.0 GB | 88% |
| Q3_K_L | 4.3 | 11.8 GB | 90% |
| IQ4_XS | 4.46 | 12.2 GB | 92% |
| Q4_K_S | 4.67 | 12.7 GB | 93% |
| Q4_K_M | 4.89 | 13.3 GB | 94% |
| Q5_K_S | 5.57 | 15.1 GB | 96% |
| Q5_K_M | 5.7 | 15.5 GB | 96% |
| Q6_K | 6.56 | 17.7 GB | 97% |
| Q8_0 | 8.5 | 22.8 GB | 100% |
| FP16 | 16 | 42.5 GB | 100% |
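The VRAM column tracks bits per weight closely. The sketch below is a back-of-the-envelope estimate, not the formula this page uses (which is not stated): it takes the weight size as parameters × BPW / 8 and adds a fixed overhead. The 0.5 GB overhead and the helper name `estimate_vram_gb` are assumptions chosen because they land within about 0.1 GB of every row above.

```python
def estimate_vram_gb(params_billion: float, bpw: float, overhead_gb: float = 0.5) -> float:
    """Rough VRAM estimate in GB: quantized weight size plus a fixed overhead.

    overhead_gb is an assumed constant fitted to the table, not a documented value.
    """
    # billions of params * bits per weight / 8 bits per byte = gigabytes of weights
    weights_gb = params_billion * bpw / 8
    return weights_gb + overhead_gb


# (BPW, listed VRAM in GB) copied from the table above
TABLE = {
    "Q3_K_M": (4.0, 11.0),
    "Q3_K_L": (4.3, 11.8),
    "IQ4_XS": (4.46, 12.2),
    "Q4_K_S": (4.67, 12.7),
    "Q4_K_M": (4.89, 13.3),
    "Q5_K_S": (5.57, 15.1),
    "Q5_K_M": (5.7, 15.5),
    "Q6_K": (6.56, 17.7),
    "Q8_0": (8.5, 22.8),
    "FP16": (16.0, 42.5),
}

for quant, (bpw, listed_gb) in TABLE.items():
    estimate = estimate_vram_gb(21, bpw)
    print(f"{quant:8s} estimate={estimate:5.1f} GB  listed={listed_gb:5.1f} GB")
```

The estimate ignores context length, so a long prompt with a large KV cache will need more memory than either column suggests.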
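§ 01 BENCHMARK SCORES
| BENCHMARK | SCORE |
|---|---|
| HumanEval | 81.7 |
| MMLU-PRO | 85.3 |
| IFEval | 69.5 |
| LiveCodeBench | 65.2 |
| AIME | 98.7 |
| MATH-500 | 89.3 |
| GPQA Diamond | 71.5 |
| HLE | 10.9 |
| AA Intelligence | 20.8 |
| AA Coding | 14.4 |
| AA Math | 62.3 |
| aa_ifbench | 57.8 |
| aa_terminal_bench | 4.5 |
| aa_tau2 | 50.3 |
| aa_scicode | 34.0 |
| aa_lcr | 31.0 |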
§ 02 RUN COMMAND
Run GPT-OSS 20B locally with Ollama (needs 13.3 GB VRAM at Q4_K_M):
$ ollama run gpt-oss:20b
§ 03 COMPATIBLE GPUs
30 @ Q4_K_M
| GPU | VRAM | BANDWIDTH |
|---|---|---|
| NVIDIA RTX 5080 | 16 GB | 960 GB/s |
| NVIDIA RTX 5070 Ti | 16 GB | 896 GB/s |
| NVIDIA RTX 4080 SUPER | 16 GB | 736 GB/s |
| NVIDIA RTX 4080 | 16 GB | 717 GB/s |
| NVIDIA RTX 4070 Ti SUPER | 16 GB | 672 GB/s |
| NVIDIA RTX 4060 Ti 16GB | 16 GB | 288 GB/s |
| AMD RX 7900 GRE | 16 GB | 576 GB/s |
| AMD RX 7800 XT | 16 GB | 624 GB/s |
| AMD RX 7600 XT | 16 GB | 288 GB/s |
| AMD RX 6950 XT | 16 GB | 576 GB/s |
| AMD RX 6900 XT | 16 GB | 512 GB/s |
| AMD RX 6800 XT | 16 GB | 512 GB/s |
| AMD RX 6800 | 16 GB | 512 GB/s |
| Intel Arc A770 16GB | 16 GB | 560 GB/s |
| Apple M1 Pro (16GB) | 16 GB | 200 GB/s |
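Every card listed here has the same 16 GB, so memory bandwidth is what separates them. As a rough illustration, the sketch below computes the VRAM headroom left after loading Q4_K_M and a bandwidth-only ceiling on decode speed. The ceiling assumes each generated token reads only the 3.6B active parameters once at 4.89 bits per weight; that is a simplifying assumption of this sketch, not a figure from this page, and measured throughput will be lower. The `Gpu` class and both helper functions are hypothetical names.

```python
from dataclasses import dataclass

MODEL_VRAM_GB = 13.3      # Q4_K_M requirement from the VRAM table
ACTIVE_PARAMS_B = 3.6     # active parameters per token, in billions
BPW_Q4_K_M = 4.89         # bits per weight for Q4_K_M


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: float
    bandwidth_gb_s: float


def headroom_gb(gpu: Gpu) -> float:
    """VRAM left over for KV cache and runtime buffers after loading the model."""
    return gpu.vram_gb - MODEL_VRAM_GB


def decode_ceiling_tps(gpu: Gpu) -> float:
    """Bandwidth-only upper bound on tokens/s (assumption: one pass over active weights per token)."""
    active_gb = ACTIVE_PARAMS_B * BPW_Q4_K_M / 8
    return gpu.bandwidth_gb_s / active_gb


# Three entries copied from the list above
GPUS = [
    Gpu("NVIDIA RTX 5080", 16, 960),
    Gpu("NVIDIA RTX 4060 Ti 16GB", 16, 288),
    Gpu("Apple M1 Pro (16GB)", 16, 200),
]

for gpu in sorted(GPUS, key=decode_ceiling_tps, reverse=True):
    print(f"{gpu.name:26s} headroom={headroom_gb(gpu):.1f} GB  "
          f"ceiling={decode_ceiling_tps(gpu):.0f} tok/s")
```

The ceiling is useful only for ranking cards against each other; it ignores KV-cache reads, kernel efficiency, and any layers offloaded to the CPU.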