granite-4.0-h-tiny 6.9B — 6.9B Parameter Mixture of Experts LLM
Model Specifications
- Parameters: 6.9B (1.5B active)
- Architecture: Mixture of Experts
- Context Length: 128K tokens
- Capabilities: chat
- Release Date: 2025-10-02
- Provider: IBM
- Family: granite
VRAM Requirements
| Quantization | BPW | VRAM | Quality |
|---|---|---|---|
| Q4_K_M | 4.89 | 4.7 GB | 94% |
| Q5_K_S | 5.57 | 5.3 GB | 96% |
| Q5_K_M | 5.7 | 5.4 GB | 96% |
| Q6_K | 6.56 | 6.1 GB | 97% |
| Q8_0 | 8.5 | 7.8 GB | 100% |
| FP16 | 16 | 14.3 GB | 100% |
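The VRAM column appears to follow from the BPW column: weight size is roughly parameters × bits per weight ÷ 8, and the listed figures sit about 0.5 GB above that. The sketch below checks this reading against the table. The 0.5 GB overhead is an inferred assumption, not a figure stated on this page, and `estimate_vram_gb` is a hypothetical helper name.

```python
# Rough VRAM estimate from bits per weight (BPW).
# Assumption: listed VRAM = weight size + a fixed ~0.5 GB overhead.
# The overhead value is inferred from the table, not stated by the source.

PARAMS_BILLIONS = 6.9   # total parameters, from the specifications
OVERHEAD_GB = 0.5       # assumed runtime overhead (hypothetical)

# quantization -> (BPW, VRAM in GB as listed in the table)
TABLE = {
    "Q4_K_M": (4.89, 4.7),
    "Q5_K_S": (5.57, 5.3),
    "Q5_K_M": (5.7, 5.4),
    "Q6_K": (6.56, 6.1),
    "Q8_0": (8.5, 7.8),
    "FP16": (16, 14.3),
}


def estimate_vram_gb(bpw, params_billions=PARAMS_BILLIONS, overhead_gb=OVERHEAD_GB):
    """Estimate VRAM in GB: weights (params * bpw / 8 bits per byte) plus overhead."""
    weights_gb = params_billions * bpw / 8
    return weights_gb + overhead_gb


for name, (bpw, listed_gb) in TABLE.items():
    estimate = estimate_vram_gb(bpw)
    print(f"{name:8s} estimate {estimate:5.2f} GB, listed {listed_gb} GB")
```

Under this assumption every row lands within about 0.1 GB of the listed value. Longer contexts may need more memory than this fixed overhead allows for.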
Benchmark Scores
| Benchmark | Score |
|---|---|
| HumanEval | 83.0 |
| MMLU-PRO | 27.9 |
| MATH | 23.8 |
| IFEval | 81.4 |
| BBH | 66.3 |
| GPQA | 32.6 |
| MuSR | 16.8 |
| MBPP | 80.0 |
| AlpacaEval | 30.6 |
How to Run granite-4.0-h-tiny 6.9B
Run granite-4.0-h-tiny 6.9B locally with Ollama (needs 4.7 GB VRAM at Q4_K_M):
ollama run granite:6b
Compatible GPUs (30)
GPUs that can run granite-4.0-h-tiny 6.9B at Q4_K_M quantization:
| GPU | VRAM | Memory Bandwidth |
|---|---|---|
| NVIDIA Tesla K20c | 5 GB | 208 GB/s |
| NVIDIA Tesla K20m | 5 GB | 208 GB/s |
| NVIDIA Tesla K20s | 5 GB | 208 GB/s |
| NVIDIA GeForce GTX 1060 5 GB | 5 GB | 160 GB/s |
| NVIDIA P102-100 | 5 GB | 440 GB/s |
| NVIDIA Quadro P2000 | 5 GB | 140.2 GB/s |
| NVIDIA Quadro P2200 | 5 GB | 200.2 GB/s |
| NVIDIA RTX 3050 6GB | 6 GB | 168 GB/s |
| Intel Arc A380 | 6 GB | 186 GB/s |
| NVIDIA RTX 2060 6GB | 6 GB | 336 GB/s |
| NVIDIA GTX 1660 SUPER | 6 GB | 336 GB/s |
| NVIDIA GTX 1660 Ti | 6 GB | 288 GB/s |
| NVIDIA GTX 1060 6GB | 6 GB | 192 GB/s |
| NVIDIA Tesla C2070 | 6 GB | 143 GB/s |
| NVIDIA Tesla C2075 | 6 GB | 150 GB/s |
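Compatibility here comes down to comparing a card's VRAM with the requirement for a given quantization. As a minimal sketch, the snippet below lists which quantizations from the VRAM table would fit on a card of a given size. The helper name `quantizations_that_fit` is hypothetical, and the comparison assumes the full nominal VRAM is usable, which may not hold once the driver or display reserves some memory.

```python
# VRAM requirements in GB per quantization, copied from the VRAM table.
QUANT_VRAM_GB = {
    "Q4_K_M": 4.7,
    "Q5_K_S": 5.3,
    "Q5_K_M": 5.4,
    "Q6_K": 6.1,
    "Q8_0": 7.8,
    "FP16": 14.3,
}


def quantizations_that_fit(gpu_vram_gb):
    """Return the quantizations whose listed VRAM requirement fits on the GPU.

    Assumption: all nominal VRAM is available to the model.
    """
    return [name for name, need in QUANT_VRAM_GB.items() if need <= gpu_vram_gb]


# Cards from the compatibility list come in 5 GB and 6 GB sizes.
for vram in (5, 6):
    print(f"{vram} GB GPU: {', '.join(quantizations_that_fit(vram))}")
```

By this comparison a 5 GB card fits only Q4_K_M, while a 6 GB card also fits Q5_K_S and Q5_K_M but falls just short of Q6_K at 6.1 GB.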