IBM/Mixture of Experts

granite-4.0-h-tiny 6.9B

Name: granite-4.0-h-tiny 6.9B
Author: IBM

chat

6.9B

Parameters (1.5B active)

128K

Context length

Benchmarks

Quantizations

28K

HF downloads

Architecture

MoE

Released

2025-10-02

Layers

KV Heads

Head Dim

128

Family

granite

Quantization Options

Quant	Bits	VRAM	Quality
Q4_K_M	4.89	4.7 GB	good
Q5_K_S	5.57	5.3 GB	good
Q5_K_M	5.7	5.4 GB	good
Q6_K	6.56	6.1 GB	excellent
Q8_0	8.5	7.8 GB	lossless
FP16	16	14.3 GB	lossless

Select your GPU above to see speed estimates and compatibility for each quantization.

Benchmarks (9)

HumanEval83.0

IFEval81.4

MBPP80.0

BBH66.3

GPQA32.6

alpacaeval30.6

MMLU-PRO27.9

MATH23.8

MUSR16.8

Run this model

Easiest way to get starteddocs →

curl -fsSL https://ollama.com/install.sh | sh

$ollama run granite:6b-q4_k_m

Downloads and runs automatically. Add --verbose for speed stats.

Setup guide

HuggingFace Ollama Library GGUF Downloads Build Hardware

GPUs that can run this model

At Q4_K_M quantization. Sorted by minimum VRAM.

NVIDIA GeForce GTX 1060 5 GB

5 GB VRAM • 140.2 GB/s

NVIDIA

NVIDIA Quadro P2200

5 GB VRAM • 200.2 GB/s

NVIDIA GTX 1660 SUPER

NVIDIA GeForce GTX 1060 6 GB

6 GB VRAM • 192 GB/s

NVIDIA

NVIDIA GeForce GTX 1060 6 GB 9Gbps

6 GB VRAM • 217 GB/s

NVIDIA

NVIDIA GeForce GTX 1060 6 GB GDDR5X

6 GB VRAM • 192 GB/s

NVIDIA

NVIDIA GeForce GTX 1060 6 GB GP104

6 GB VRAM • 192 GB/s

NVIDIA

NVIDIA GeForce GTX 1060 6 GB Rev. 2

6 GB VRAM • 192 GB/s

NVIDIA

NVIDIA GeForce GTX 1660

6 GB VRAM • 192 GB/s

NVIDIA

Find the best GPU for granite-4.0-h-tiny 6.9B

Build Hardware for granite-4.0-h-tiny 6.9B

granite-4.0-h-tiny 6.9B

Quantization Options

Benchmarks (9)

Run this model

GPUs that can run this model

granite-4.0-h-tiny 6.9B — 6.9B Parameter Mixture of Experts LLM

Model Specifications

VRAM Requirements

Benchmark Scores

How to Run granite-4.0-h-tiny 6.9B

Compatible GPUs (30)