MedGemma 1.5 4B

Parameters       4B
Context length   128K
Benchmarks       6
Quantizations    4
HF downloads     142K
Architecture     Dense
Released         2026-01-13
Layers           34
KV Heads         4
Head Dim         256
Family           gemma

MedGemma 1.5 model card

Note: This card describes MedGemma 1.5, which is only available as a 4B multimodal instruction-tuned variant. For information on MedGemma 1 variants, refer to the MedGemma 1 model card.

Model documentation: MedGemma


MedGemma's training may make it more sensitive than Gemma 3 to the specific prompt used. Developers adapting MedGemma should account for this prompt sensitivity when designing and evaluating their prompts.

Author: Google

Model information

This section describes the specifications and recommended use of the MedGemma model.

Description

MedGemma is a collection of Gemma 3 variants that are trained for performance on medical text and image comprehension. Developers can use MedGemma to accelerate building healthcare-based AI applications.

MedGemma 1.5 4B is an updated version of the MedGemma 1 4B model.

MedGemma 1.5 4B expands support for several new medical imaging and data processing applications, including:

  • High-dimensional medical imaging: Interpretation of three-dimensional volume representations of Computed Tomography (CT) and Magnetic Resonance Imaging (MRI).
  • Whole-slide histopathology imaging (WSI): Simultaneous interpretation of multiple patches from a whole slide histopathology image as input.
  • Longitudinal medical imaging: Interpretation of chest X-rays in the context of prior images (e.g., comparing current versus historical scans).
  • Anatomical localization: Bounding box–based localization of anatomical features and findings in chest X-rays.
  • Medical document understanding: Extraction of structured data, such as values and units, from unstructured medical lab reports.
  • Electronic Health Record (EHR) understanding: Interpretation of text-based EHR data.
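To make the document-understanding item concrete, the snippet below sketches the kind of structured output one might ask the model to produce from a lab-report line. It is a plain-Python illustration of the task, not MedGemma itself; the regex and field names are hypothetical.

```python
import re

# Hypothetical structured-extraction target: split a free-text lab line
# such as "Hemoglobin: 13.5 g/dL" into analyte, numeric value, and unit.
LAB_LINE = re.compile(r"(?P<analyte>[A-Za-z ]+):\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\S+)")

def parse_lab_line(line: str) -> dict:
    """Parse one lab-report line into structured fields."""
    m = LAB_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized lab line: {line!r}")
    return {"analyte": m["analyte"].strip(), "value": float(m["value"]), "unit": m["unit"]}

print(parse_lab_line("Hemoglobin: 13.5 g/dL"))
```

In practice the model would be prompted to emit such fields directly; the sketch only fixes the shape of the expected output.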

In addition to these new features, MedGemma 1.5 4B delivers improved accuracy on medical text reasoning and modest improvement on standard 2D image interpretation compared to MedGemma 1 4B.

MedGemma utilizes a SigLIP image encoder that has been specifically pre-trained on a variety of de-identified medical data, including chest X-rays, dermatology images, ophthalmology images, and histopathology slides. The LLM component is trained on a diverse set of medical data, including medical text, medical question-answer pairs, FHIR-based electronic health record data, 2D and 3D radiology images, histopathology images, ophthalmology images, dermatology images, and lab reports for document understanding.

MedGemma 1.5 4B has been evaluated on a range of clinically relevant benchmarks to illustrate its baseline performance. These evaluations are based on both open benchmark datasets and internally curated datasets. Developers are expected to fine-tune MedGemma for improved performance on their use case. Consult the Intended use section for more details.

MedGemma is optimized for medical applications that involve a text generation component. For medical image-based applications that do not involve text generation, such as data-efficient classification, zero-shot classification, or content-based or semantic image retrieval, the MedSigLIP image encoder is recommended. MedSigLIP is based on the same image encoder that powers MedGemma 1 and MedGemma 1.5.
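To make the zero-shot classification idea concrete (illustrative only: toy 3-d vectors stand in for real MedSigLIP embeddings), an image is assigned the label whose text-prompt embedding is most similar, for example by cosine similarity:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Toy embeddings standing in for MedSigLIP image/text encoder outputs.
image_emb = [0.9, 0.1, 0.2]
prompt_embs = {"pneumonia": [0.8, 0.2, 0.3], "no finding": [0.1, 0.9, 0.4]}

# Zero-shot prediction: the label whose prompt embedding is closest.
label = max(prompt_embs, key=lambda k: cosine(image_emb, prompt_embs[k]))
print(label)  # pneumonia
```

No generation step is involved, which is why the encoder alone suffices for this class of application.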

How to use

The following are some example code snippets to help you quickly get started running the model locally on GPU.

Note: If you need to use the model at scale, we recommend creating a production version using Model Garden. Model Garden provides various deployment options and tutorial notebooks, including specialized server-side image processing options for efficiently handling large medical images: Whole Slide Digital Pathology (WSI) or volumetric scans (CT/MRI) stored in Cloud DICOM Store or Google Cloud Storage (GCS).

First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0.

$ pip install -U transformers
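To confirm the installed version meets that minimum, a quick sanity check can compare version strings numerically (a minimal sketch; production code should prefer `packaging.version`, which also handles pre-release suffixes):

```python
def meets_minimum(installed: str, minimum: str = "4.50.0") -> bool:
    """Numerically compare dotted version strings such as "4.50.0"."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

# e.g. pass transformers.__version__ as `installed`
print(meets_minimum("4.50.0"))  # True
```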

Next, use either the pipeline wrapper or the Transformers API directly to send a chest X-ray image and a question to the model.

Note that CT, MRI and whole-slide histopathology images require some pre-processing; see the CT and WSI notebook for examples.
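The actual pre-processing lives in that notebook; as a hedged, stdlib-only sketch of one common step — choosing a fixed number of evenly spaced axial slices from a CT or MRI volume to pass as multiple images — consider:

```python
def pick_slice_indices(depth: int, n: int) -> list[int]:
    """Return n evenly spaced slice indices covering a volume of `depth` slices.

    Illustrative helper only; the CT notebook defines the real pipeline
    (windowing, resizing, etc.) on top of slice selection like this.
    """
    if n <= 1:
        return [depth // 2]
    if n >= depth:
        return list(range(depth))
    step = (depth - 1) / (n - 1)
    return [round(i * step) for i in range(n)]

print(pick_slice_indices(100, 5))  # [0, 25, 50, 74, 99]
```

Each selected slice would then be converted to a 2D image and included in the message's content list alongside the text prompt.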

Run model with the pipeline API

from transformers import pipeline
from PIL import Image
import requests
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"}
        ]
    }
]

output = pipe(text=messages, max_new_tokens=2000)
print(output[0]["generated_text"][-1]["content"])

Run the model directly

# Make sure to install the accelerate library first via `pip install accelerate`
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch

model_id = "google/medgemma-1.5-4b-it"

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"}
        ]
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=2000, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)

Quantizations & VRAM

Quantization   Bits per weight   VRAM required   Quality
Q4_K_M         4.5 bpw           2.9 GB          94%
Q6_K           6.5 bpw           4.0 GB          97%
Q8_0           8 bpw             4.8 GB          100%
FP16           16 bpw            9.1 GB          100%
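The VRAM figures track the raw weight footprint plus runtime overhead. A rough back-of-envelope check (an approximation, not the source of the table's numbers):

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: parameters x bits-per-weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8

# 4B parameters at Q4_K_M's 4.5 bpw -> 2.25 GB of raw weights; the listed
# 2.9 GB additionally covers KV cache and activation overhead.
print(weight_gb(4, 4.5))  # 2.25
```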

Benchmarks (6)

IFEval     19.7
MMLU-PRO    3.5
BBH         3.5
MATH        2.3
MUSR        2.1
GPQA        1.7
