MedGemma 1.5 4B

Parameters       4B
Context length   128K
Benchmarks       6
Quantizations    4
HF downloads     142K
Architecture     Dense
Released         2026-01-13
Layers           34
KV Heads         4
Head Dim         256
Family           gemma

MedGemma 1.5 model card

Note: This card describes MedGemma 1.5, which is only available as a 4B multimodal instruction-tuned variant. For information on MedGemma 1 variants, refer to the MedGemma 1 model card.

Model documentation: MedGemma


MedGemma's training may make it more sensitive than Gemma 3 to the specific prompt used. Developers adapting MedGemma should account for this prompt sensitivity when designing and evaluating their prompts.

Author: Google

Model information

This section describes the specifications and recommended use of the MedGemma model.

Description

MedGemma is a collection of Gemma 3 variants that are trained for performance on medical text and image comprehension. Developers can use MedGemma to accelerate building healthcare-based AI applications.

MedGemma 1.5 4B is an updated version of the MedGemma 1 4B model.

MedGemma 1.5 4B expands support for several new medical imaging and data processing applications, including:

  • High-dimensional medical imaging: Interpretation of three-dimensional volume representations of Computed Tomography (CT) and Magnetic Resonance Imaging (MRI).
  • Whole-slide histopathology imaging (WSI): Simultaneous interpretation of multiple patches from a whole slide histopathology image as input.
  • Longitudinal medical imaging: Interpretation of chest X-rays in the context of prior images (e.g., comparing current versus historical scans).
  • Anatomical localization: Bounding box–based localization of anatomical features and findings in chest X-rays.
  • Medical document understanding: Extraction of structured data, such as values and units, from unstructured medical lab reports.
  • Electronic Health Record (EHR) understanding: Interpretation of text-based EHR data.
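To make the document-understanding item concrete, the snippet below sketches the kind of structured output one might ask the model to produce from a lab-report line. It is a plain-Python illustration of the task, not MedGemma itself; the regex and field names are hypothetical.

```python
import re

# Hypothetical structured-extraction target: split a free-text lab line
# such as "Hemoglobin: 13.5 g/dL" into analyte, numeric value, and unit.
LAB_LINE = re.compile(r"(?P<analyte>[A-Za-z ]+):\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\S+)")

def parse_lab_line(line: str) -> dict:
    """Parse one lab-report line into structured fields."""
    m = LAB_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized lab line: {line!r}")
    return {"analyte": m["analyte"].strip(), "value": float(m["value"]), "unit": m["unit"]}

print(parse_lab_line("Hemoglobin: 13.5 g/dL"))
```

In practice the model would be prompted to emit such fields directly; the sketch only fixes the shape of the expected output.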

In addition to these new features, MedGemma 1.5 4B delivers improved accuracy on medical text reasoning and modest improvement on standard 2D image interpretation compared to MedGemma 1 4B.

MedGemma utilizes a SigLIP image encoder that has been specifically pre-trained on a variety of de-identified medical data, including chest X-rays, dermatology images, ophthalmology images, and histopathology slides. The LLM component is trained on a diverse set of medical data, including medical text, medical question-answer pairs, FHIR-based electronic health record data, 2D and 3D radiology images, histopathology images, ophthalmology images, dermatology images, and lab reports for document understanding.

MedGemma 1.5 4B has been evaluated on a range of clinically relevant benchmarks to illustrate its baseline performance. These evaluations are based on both open benchmark datasets and internally curated datasets. Developers are expected to fine-tune MedGemma for improved performance on their use case. Consult the Intended use section for more details.

MedGemma is optimized for medical applications that involve a text generation component. For medical image-based applications that do not involve text generation, such as data-efficient classification, zero-shot classification, or content-based or semantic image retrieval, the MedSigLIP image encoder is recommended. MedSigLIP is based on the same image encoder that powers MedGemma 1 and MedGemma 1.5.
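To make the zero-shot classification idea concrete (illustrative only: toy 3-d vectors stand in for real MedSigLIP embeddings), an image is assigned the label whose text-prompt embedding is most similar, for example by cosine similarity:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Toy embeddings standing in for MedSigLIP image/text encoder outputs.
image_emb = [0.9, 0.1, 0.2]
prompt_embs = {"pneumonia": [0.8, 0.2, 0.3], "no finding": [0.1, 0.9, 0.4]}

# Zero-shot prediction: the label whose prompt embedding is closest.
label = max(prompt_embs, key=lambda k: cosine(image_emb, prompt_embs[k]))
print(label)  # pneumonia
```

No generation step is involved, which is why the encoder alone suffices for this class of application.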

How to use

The following are some example code snippets to help you quickly get started running the model locally on GPU.

Note: If you need to use the model at scale, we recommend creating a production version using Model Garden. Model Garden provides various deployment options and tutorial notebooks, including specialized server-side image processing options for efficiently handling large medical images: Whole Slide Digital Pathology (WSI) or volumetric scans (CT/MRI) stored in Cloud DICOM Store or Google Cloud Storage (GCS).

First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0.

$ pip install -U transformers
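To confirm the installed version meets that minimum, a quick sanity check can compare version strings numerically (a minimal sketch; production code should prefer `packaging.version`, which also handles pre-release suffixes):

```python
def meets_minimum(installed: str, minimum: str = "4.50.0") -> bool:
    """Numerically compare dotted version strings such as "4.50.0"."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

# e.g. pass transformers.__version__ as `installed`
print(meets_minimum("4.50.0"))  # True
```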

Next, use either the pipeline wrapper or the Transformers API directly to send a chest X-ray image and a question to the model.

Note that CT, MRI and whole-slide histopathology images require some pre-processing; see the CT and WSI notebook for examples.
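The actual pre-processing lives in that notebook; as a hedged, stdlib-only sketch of one common step — choosing a fixed number of evenly spaced axial slices from a CT or MRI volume to pass as multiple images — consider:

```python
def pick_slice_indices(depth: int, n: int) -> list[int]:
    """Return n evenly spaced slice indices covering a volume of `depth` slices.

    Illustrative helper only; the CT notebook defines the real pipeline
    (windowing, resizing, etc.) on top of slice selection like this.
    """
    if n <= 1:
        return [depth // 2]
    if n >= depth:
        return list(range(depth))
    step = (depth - 1) / (n - 1)
    return [round(i * step) for i in range(n)]

print(pick_slice_indices(100, 5))  # [0, 25, 50, 74, 99]
```

Each selected slice would then be converted to a 2D image and included in the message's content list alongside the text prompt.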

Run model with the pipeline API

from transformers import pipeline
from PIL import Image
import requests
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=2000)
print(output[0]["generated_text"][-1]["content"])

Run the model directly

# Make sure to install the accelerate library first via `pip install accelerate`
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch

model_id = "google/medgemma-1.5-4b-it"

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=2000, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)

Quantizations & VRAM

Quantization   Bits per weight   VRAM required   Quality
Q4_K_M         4.5 bpw           2.9 GB          94%
Q6_K           6.5 bpw           4.0 GB          97%
Q8_0           8 bpw             4.8 GB          100%
FP16           16 bpw            9.1 GB          100%
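The VRAM figures track the raw weight footprint plus runtime overhead. A rough back-of-envelope check (an approximation, not the source of the table's numbers):

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: parameters x bits-per-weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8

# 4B parameters at Q4_K_M's 4.5 bpw -> 2.25 GB of raw weights; the listed
# 2.9 GB additionally covers KV cache and activation overhead.
print(weight_gb(4, 4.5))  # 2.25
```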

Benchmarks (6)

IFEval     19.7
MMLU-PRO    3.5
BBH         3.5
MATH        2.3
MUSR        2.1
GPQA        1.7
