Cogito 70B — 70B Parameter Dense LLM
Model Specifications
- Parameters: 70B
- Architecture: Dense Transformer
- Context Length: 125K tokens
- Capabilities: chat, reasoning, coding, tool_use
- Release Date: 2025-04-01
- Family: cogito
VRAM Requirements
| Quantization | Bits per Weight (BPW) | VRAM | Quality |
|---|---|---|---|
| IQ2_M | 2.93 | 26.1 GB | 75% |
| Q2_K | 3.16 | 28.1 GB | 78% |
| IQ3_XXS | 3.25 | 28.9 GB | 82% |
| IQ3_XS | 3.5 | 31.1 GB | 84% |
| Q3_K_S | 3.64 | 32.3 GB | 85% |
| IQ3_M | 3.76 | 33.4 GB | 86% |
| Q3_K_M | 4 | 35.5 GB | 88% |
| Q3_K_L | 4.3 | 38.1 GB | 90% |
| IQ4_XS | 4.46 | 39.5 GB | 92% |
| Q4_K_S | 4.67 | 41.4 GB | 93% |
| Q4_K_M | 4.89 | 43.3 GB | 94% |
| Q5_K_S | 5.57 | 49.2 GB | 96% |
| Q5_K_M | 5.7 | 50.4 GB | 96% |
| Q6_K | 6.56 | 57.9 GB | 97% |
| Q8_0 | 8.5 | 74.9 GB | 100% |
| FP16 | 16 | 140.5 GB | 100% |
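
The VRAM column appears consistent with a simple rule: weight size (parameters × BPW / 8) plus a small fixed overhead. The sketch below reproduces the table figures under that assumption. The 0.5 GB overhead and the function name `estimate_vram_gb` are inferred for illustration and are not stated by the source; real usage also grows with context length and runtime buffers.

```python
# Sketch: estimate VRAM from parameter count and bits per weight (BPW).
# ASSUMPTION: the fixed 0.5 GB overhead is inferred from the table above,
# not documented by the source. KV cache growth with context is ignored.

PARAMS_BILLIONS = 70.0   # Cogito 70B
OVERHEAD_GB = 0.5        # assumed constant overhead


def estimate_vram_gb(bpw, params_billions=PARAMS_BILLIONS, overhead_gb=OVERHEAD_GB):
    """Return an approximate VRAM requirement in GB for a given BPW."""
    # 1e9 parameters * (bpw / 8) bytes each = params_billions * bpw / 8 gigabytes
    weights_gb = params_billions * bpw / 8.0
    return weights_gb + overhead_gb


if __name__ == "__main__":
    for name, bpw in [("IQ2_M", 2.93), ("Q4_K_M", 4.89), ("Q8_0", 8.5), ("FP16", 16.0)]:
        print(f"{name}: {estimate_vram_gb(bpw):.1f} GB")
```

Under these assumptions the estimates agree with the table rows to within rounding, for example about 43.3 GB at Q4_K_M.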
How to Run Cogito 70B
Run Cogito 70B locally with Ollama (needs 43.3 GB VRAM at Q4_K_M):
ollama run cogito:70b

Compatible GPUs (30)
GPUs that can run Cogito 70B at Q4_K_M quantization:
| GPU | VRAM | Memory Bandwidth |
|---|---|---|
| Apple M3 Max (48GB) | 48 GB | 400 GB/s |
| Apple M4 Pro (48GB) | 48 GB | 273 GB/s |
| Apple M4 Max (48GB) | 48 GB | 546 GB/s |
| NVIDIA L40S 48GB | 48 GB | 864 GB/s |
| NVIDIA L40 48GB | 48 GB | 864 GB/s |
| NVIDIA RTX 6000 Ada 48GB | 48 GB | 960 GB/s |
| NVIDIA A40 48GB | 48 GB | 696 GB/s |
| NVIDIA RTX A6000 48GB | 48 GB | 768 GB/s |
| NVIDIA Quadro RTX 8000 | 48 GB | 672 GB/s |
| NVIDIA Quadro RTX 8000 Passive | 48 GB | 624 GB/s |
| NVIDIA A40 PCIe | 48 GB | 696 GB/s |
| NVIDIA RTX 6000 Ada Generation | 48 GB | 960 GB/s |
| NVIDIA L20 | 48 GB | 864 GB/s |
| AMD Radeon PRO W7800 48 GB | 48 GB | 864 GB/s |
| AMD Radeon PRO W7900 | 48 GB | 864 GB/s |
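
To check other cards against the VRAM table, a fit test can be sketched as follows. The figures are copied from the VRAM Requirements table. The helper name `largest_quant_that_fits` is hypothetical, and the check assumes the table's VRAM figure is the whole requirement, with no extra headroom for long contexts or other processes.

```python
# Sketch: pick the largest quantization from the VRAM table that fits a GPU.
# ASSUMPTION: the table's VRAM figure is treated as the full requirement;
# no extra headroom is reserved for long contexts or the OS.

# (quantization, VRAM in GB), ordered from smallest to largest as in the table.
QUANT_VRAM_GB = [
    ("IQ2_M", 26.1), ("Q2_K", 28.1), ("IQ3_XXS", 28.9), ("IQ3_XS", 31.1),
    ("Q3_K_S", 32.3), ("IQ3_M", 33.4), ("Q3_K_M", 35.5), ("Q3_K_L", 38.1),
    ("IQ4_XS", 39.5), ("Q4_K_S", 41.4), ("Q4_K_M", 43.3), ("Q5_K_S", 49.2),
    ("Q5_K_M", 50.4), ("Q6_K", 57.9), ("Q8_0", 74.9), ("FP16", 140.5),
]


def largest_quant_that_fits(gpu_vram_gb):
    """Return the largest listed quantization that fits, or None if none do."""
    best = None
    for name, required_gb in QUANT_VRAM_GB:
        if required_gb <= gpu_vram_gb:
            best = name  # list is ascending, so the last match is the largest
    return best


if __name__ == "__main__":
    for vram in (24, 48, 80):
        print(f"{vram} GB -> {largest_quant_that_fits(vram)}")
```

With the table values, a 48 GB card tops out at Q4_K_M because Q5_K_S needs 49.2 GB, while a 24 GB card fits none of the listed quantizations.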