vikhyatk/Dense
Moondream2 1.9B
vision, chat, thinking
Parameters: 1.9B
Context length: 2K
Benchmarks: 8
Quantizations: 4
HF downloads: 890K
Architecture: Dense
Released: 2024-03-06
Layers: 24
KV Heads: 4
Head Dim: 64
Family: other
Model Card
⚠️ This repository contains the latest version of Moondream 2, our previous generation model. The latest version of Moondream is Moondream 3 (Preview).
Moondream is a small vision language model designed to run efficiently everywhere.
This repository contains the latest (2025-06-21) release of Moondream 2, as well as historical releases. The model is updated frequently, so we recommend specifying a revision as shown below if you're using it in a production application.
Usage
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-06-21",
    trust_remote_code=True,
    device_map={"": "cuda"}  # ...or 'mps', on Apple Silicon
)

# Load the input image (the path is a placeholder; point it at your own file)
image = Image.open("path/to/image.jpg")

# Captioning
print("Short caption:")
print(model.caption(image, length="short")["caption"])

print("\nNormal caption:")
for t in model.caption(image, length="normal", stream=True)["caption"]:
    # Streaming generation example, supported for caption() and detect()
    print(t, end="", flush=True)
print()  # finish the streamed line
# Non-streaming call: prints the full result dict
print(model.caption(image, length="normal"))

# Visual Querying
print("\nVisual query: 'How many people are in the image?'")
print(model.query(image, "How many people are in the image?")["answer"])

# Object Detection
print("\nObject detection: 'face'")
objects = model.detect(image, "face")["objects"]
print(f"Found {len(objects)} face(s)")

# Pointing
print("\nPointing: 'person'")
points = model.point(image, "person")["points"]
print(f"Found {len(points)} person(s)")
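The detect() and point() calls above return plain Python structures that usually need converting before drawing on the image. The sketch below assumes (this card does not confirm it) that each detected object is a dict with x_min, y_min, x_max, y_max keys and each point a dict with x, y keys, all normalized to the 0 to 1 range of the image size. The helpers to_pixel_boxes and to_pixel_points are hypothetical, not part of Moondream.

```python
from typing import Dict, List, Tuple

# Assumption: detect()/point() coordinates are normalized to [0, 1].
Box = Dict[str, float]
Point = Dict[str, float]


def to_pixel_boxes(objects: List[Box], width: int, height: int) -> List[Tuple[int, int, int, int]]:
    """Convert normalized boxes (assumed x_min/y_min/x_max/y_max keys) to pixel coordinates."""
    return [
        (
            round(obj["x_min"] * width),
            round(obj["y_min"] * height),
            round(obj["x_max"] * width),
            round(obj["y_max"] * height),
        )
        for obj in objects
    ]


def to_pixel_points(points: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """Convert normalized points (assumed x/y keys) to pixel coordinates."""
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]


if __name__ == "__main__":
    # Stand-in data shaped like the assumed detect()/point() output.
    fake_objects = [{"x_min": 0.1, "y_min": 0.2, "x_max": 0.5, "y_max": 0.6}]
    fake_points = [{"x": 0.25, "y": 0.75}]
    print(to_pixel_boxes(fake_objects, 640, 480))
    print(to_pixel_points(fake_points, 640, 480))
```

With a real PIL image, image.size returns (width, height), so to_pixel_boxes(objects, *image.size) would give boxes in the form PIL's ImageDraw.rectangle accepts.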
Changelog
2025-06-21 (full release notes)
- Grounded Reasoning: Introduces a new step-by-step reasoning mode that explicitly grounds reasoning in spatial positions within the image before answering, leading to more precise visual interpretation (e.g., chart median calculations, accurate counting). Enable it with reasoning=True in the query skill to trade speed for accuracy.
- Sharper Object Detection: Uses reinforcement learning on higher-quality bounding-box annotations to reduce object clumping and improve fine-grained detections (e.g., distinguishing “blue bottle” vs. “bottle”).
- Faster Text Generation: Yields 20–40% faster response generation via a new “superword” tokenizer and a lightweight tokenizer-transfer hypernetwork, which reduce the number of tokens emitted without loss in accuracy and ease future multilingual extensions.
- Improved UI Understanding: Boosts ScreenSpot (UI element localization) performance from an F1@0.5 of 60.3 to 80.4, making Moondream more effective for UI-focused applications.
- Reinforcement Learning Enhancements: RL fine-tuning applied across 55 vision-language tasks to reinforce grounded reasoning and detection capabilities, with a roadmap to expand to ~120 tasks in the next update.
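The ScreenSpot results are reported as F1@0.5. A sketch of how such a score is commonly computed: a predicted box is a true positive if it overlaps a not-yet-matched ground-truth box with intersection-over-union (IoU) of at least 0.5, and F1 combines the resulting precision and recall. The greedy one-to-one matching and the helper names iou and f1_at_iou are illustrative assumptions; the exact ScreenSpot evaluation protocol may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds: List[Box], truths: List[Box], threshold: float = 0.5) -> float:
    """F1 where each prediction greedily claims the best unmatched truth with IoU >= threshold."""
    matched = set()
    tp = 0
    for p in preds:
        best_idx, best_score = None, threshold
        for i, t in enumerate(truths):
            if i in matched:
                continue
            score = iou(p, t)
            if score >= best_score:
                best_idx, best_score = i, score
        if best_idx is not None:
            matched.add(best_idx)
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(preds)  # tp / (tp + false positives)
    recall = tp / len(truths)    # tp / (tp + false negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
    print(f1_at_iou(preds, truths))  # one hit, one false positive, one miss -> 0.5
```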
2025-04-15 (full release notes)
- Improved chart understanding (ChartQA up from 74.8 to 77.5, 82.2 with PoT)
- Added temperature and nucleus sampling to reduce repetitive outputs
- Better OCR for documents and tables (prompt with “Transcribe the text” or “Transcribe the text in natural reading order”)
- Object detection supports document layout detection (figure, formula, text, etc.)
- UI understanding (ScreenSpot F1@0.5 up from 53.3 to 60.3)
- Improved text understanding (DocVQA up from 76.5 to 79.3, TextVQA up from 74.6 to 76.3)
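Temperature and nucleus (top-p) sampling are standard decoding techniques. The generic sketch below works on a plain list of logits and is for illustration only; it is not Moondream's implementation, and the function name sample_top_p is hypothetical. Temperature rescales the logits before the softmax, then sampling is restricted to the smallest set of highest-probability tokens whose cumulative probability reaches top_p.

```python
import math
import random
from typing import List, Optional


def sample_top_p(logits: List[float], temperature: float = 1.0, top_p: float = 0.9,
                 rng: Optional[random.Random] = None) -> int:
    """Return a token index drawn with temperature scaling and nucleus (top-p) filtering."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest prefix of tokens (by descending probability) whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # random.choices normalizes the weights, so the nucleus is implicitly renormalized.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_top_p([2.0, 1.5, 0.1, -1.0], temperature=0.7, top_p=0.9, rng=rng) for _ in range(10)])
```

Compared with greedy decoding, drawing from a trimmed distribution like this is a common way to break the repetitive loops that always picking the single most likely token can fall into.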
2025-03-27 (full release notes)
- Added support for long-form captioning
- Open vocabulary image tagging
- Improved counting accuracy (e.g. CountBenchQA increased from 80 to 86.4)
- Improved text understanding (e.g. OCRBench increased from 58.3 to 61.2)
- Improved object detection, especially for small objects (e.g. COCO up from 30.5 to 51.2)
- Fixed token streaming bug affecting multi-byte unicode characters
- gpt-fast style compile() now supported in HF Transformers implementation
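The streaming fix listed above concerns characters whose UTF-8 encoding spans several bytes: if those bytes arrive in different tokens, decoding each token's bytes on its own produces replacement characters. This card does not say how the fix was implemented. A generic remedy, sketched here with Python's standard incremental decoder, is to hold back incomplete byte sequences until the character is complete; the byte chunks stand in for per-token output and the helper stream_text is hypothetical.

```python
import codecs
from typing import Iterable, Iterator


def stream_text(byte_chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield decoded text from byte chunks, holding back incomplete multi-byte characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush at end of stream; raises UnicodeDecodeError if the last character is incomplete.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    data = "naïve 🌙".encode("utf-8")
    # Split so the moon emoji's 4 bytes fall into two different chunks.
    chunks = [data[:8], data[8:]]
    # Decoding each chunk independently mangles the split character...
    print([c.decode("utf-8", errors="replace") for c in chunks])
    # ...while the incremental decoder emits it intact.
    print("".join(stream_text(chunks)))
```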
Quantizations & VRAM
Q4_K_M (4.5 bpw): 1.4 GB VRAM required, 94% quality
Q6_K (6.5 bpw): 1.8 GB VRAM required, 97% quality
Q8_0 (8 bpw): 2.1 GB VRAM required, 100% quality
FP16 (16 bpw): 3.9 GB VRAM required, 100% quality
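As a rough cross-check on these figures, the weights alone occupy about parameters × bits-per-weight / 8 bytes. The sketch assumes the 1.9B parameter count from this page and decimal gigabytes; it ignores runtime overhead such as the KV cache and activations, which presumably accounts for the gap to the listed VRAM. The helper weight_gb is hypothetical.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: params * bpw / 8 bytes."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 1.9e9  # parameter count listed on this page
    for name, bpw in [("Q4_K_M", 4.5), ("Q6_K", 6.5), ("Q8_0", 8.0), ("FP16", 16.0)]:
        print(f"{name}: ~{weight_gb(params, bpw):.2f} GB of weights")
```

This gives roughly 1.1 GB of weights for Q4_K_M versus 3.8 GB for FP16, each somewhat below the corresponding VRAM figure above.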
Benchmarks (8)
MMBench: 58.2
IFEval: 35.0
MMMU: 32.0
BBH: 28.0
MMLU-PRO: 25.0
GPQA: 20.0
MUSR: 8.0
MATH: 3.0
Run with Ollama
$ ollama run moondream:1.8b
GPUs that can run this model
At Q4_K_M quantization. Sorted by minimum VRAM.
AMD FireGL V8650: 2 GB VRAM • 111 GB/s
NVIDIA GeForce GTX 285 X2: 2 GB VRAM • 148 GB/s
NVIDIA GeForce GTX 480M: 2 GB VRAM • 77 GB/s
NVIDIA Quadro 5000M: 2 GB VRAM • 77 GB/s
AMD Radeon HD 6950: 2 GB VRAM • 160 GB/s
AMD Radeon HD 6970: 2 GB VRAM • 176 GB/s
Intel Aubrey Isle: 2 GB VRAM • 154 GB/s
AMD Radeon HD 5870 Eyefinity 6: 2 GB VRAM • 154 GB/s
NVIDIA GeForce GTX 485M: 2 GB VRAM • 96 GB/s
NVIDIA GeForce GTX 580M: 2 GB VRAM • 96 GB/s
NVIDIA Quadro 1000M: 2 GB VRAM • 29 GB/s
NVIDIA Quadro 2000M: 2 GB VRAM • 29 GB/s
NVIDIA Quadro 3000M: 2 GB VRAM • 80 GB/s
NVIDIA Quadro 3000M X2: 2 GB VRAM • 80 GB/s
NVIDIA Quadro 4000M: 2 GB VRAM • 80 GB/s
AMD Radeon HD 6550A: 2 GB VRAM • 26 GB/s
AMD Radeon HD 6570: 2 GB VRAM • 64 GB/s
AMD Radeon HD 6850 X2: 2 GB VRAM • 134 GB/s
AMD Radeon HD 6970M Mac Edition: 2 GB VRAM • 115 GB/s
AMD Radeon HD 6970M X2: 2 GB VRAM • 115 GB/s
AMD Radeon HD 6990: 2 GB VRAM • 160 GB/s
AMD Radeon HD 6990M: 2 GB VRAM • 115 GB/s
NVIDIA GeForce GTX 660: 2 GB VRAM • 144 GB/s
NVIDIA GeForce GTX 660 OEM: 2 GB VRAM • 179 GB/s
NVIDIA GeForce GTX 660 Ti: 2 GB VRAM • 144 GB/s
NVIDIA GeForce GTX 660M: 2 GB VRAM • 80 GB/s
NVIDIA GeForce GTX 670: 2 GB VRAM • 192 GB/s
NVIDIA GeForce GTX 675M: 2 GB VRAM • 96 GB/s
NVIDIA GeForce GTX 675MX: 2 GB VRAM • 115 GB/s
NVIDIA GeForce GTX 680: 2 GB VRAM • 192 GB/s
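Memory bandwidth matters here because single-stream token generation is typically memory-bound: producing each new token requires reading roughly all of the weights from VRAM once. The back-of-the-envelope sketch below uses the 1.4 GB Q4_K_M footprint as the per-token read size (an approximation, since part of that figure is overhead rather than weights) and ignores compute limits, caching effects, and whether a given runtime supports the card at all. The Gpu type and helper names are hypothetical.

```python
from typing import List, NamedTuple


class Gpu(NamedTuple):
    name: str
    vram_gb: float
    bandwidth_gbs: float


def fits(gpu: Gpu, required_gb: float) -> bool:
    """True if the card's VRAM covers the required footprint."""
    return gpu.vram_gb >= required_gb


def max_tokens_per_s(gpu: Gpu, model_gb: float) -> float:
    """Upper bound on decode speed if every token reads model_gb from VRAM once."""
    return gpu.bandwidth_gbs / model_gb


if __name__ == "__main__":
    model_gb = 1.4  # Q4_K_M VRAM figure from the table above
    gpus: List[Gpu] = [
        Gpu("NVIDIA Quadro 1000M", 2, 29),
        Gpu("AMD FireGL V8650", 2, 111),
        Gpu("NVIDIA GeForce GTX 680", 2, 192),
    ]
    for g in sorted(gpus, key=lambda g: g.bandwidth_gbs):
        if fits(g, model_gb):
            print(f"{g.name}: at most ~{max_tokens_per_s(g, model_gb):.0f} tokens/s")
```

Real throughput will be lower than this ceiling, but the ratio explains why cards with the same 2 GB of VRAM can differ several-fold in generation speed.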