MedGemma 1.5 4B
Model Card
Note: This card describes MedGemma 1.5, which is only available as a 4B multimodal instruction-tuned variant. For information on MedGemma 1 variants, refer to the MedGemma 1 model card.
Model documentation: MedGemma
Resources:

- Model on Google Cloud Model Garden: MedGemma
- Models on Hugging Face: Collection
- Concept applications built using MedGemma: Collection
- Support channels

License: The use of MedGemma is governed by the Health AI Developer Foundations terms of use.

When adapting MedGemma, developers should consider the following:

- MedGemma has not been evaluated or optimized for multi-turn applications.
- MedGemma's training may make it more sensitive to the specific prompt used than Gemma 3.
Author: Google
Model information
This section describes the specifications and recommended use of the MedGemma model.
Description
MedGemma is a collection of Gemma 3 variants that are trained for performance on medical text and image comprehension. Developers can use MedGemma to accelerate building healthcare-based AI applications.
MedGemma 1.5 4B is an updated version of the MedGemma 1 4B model.
MedGemma 1.5 4B expands support for several new medical imaging and data processing applications, including:
- High-dimensional medical imaging: Interpretation of three-dimensional volume representations of Computed Tomography (CT) and Magnetic Resonance Imaging (MRI).
- Whole-slide histopathology imaging (WSI): Simultaneous interpretation of multiple patches from a whole slide histopathology image as input.
- Longitudinal medical imaging: Interpretation of chest X-rays in the context of prior images (e.g., comparing current versus historical scans).
- Anatomical localization: Bounding box–based localization of anatomical features and findings in chest X-rays.
- Medical document understanding: Extraction of structured data, such as values and units, from unstructured medical lab reports.
- Electronic Health Record (EHR) understanding: Interpretation of text-based EHR data.
In addition to these new features, MedGemma 1.5 4B delivers improved accuracy on medical text reasoning and modest improvement on standard 2D image interpretation compared to MedGemma 1 4B.
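The medical document understanding capability amounts to prompting the model for structured output. Below is a minimal sketch of such a prompt using the chat-message format from the How to use section; the JSON schema and prompt wording are illustrative assumptions, not a documented MedGemma format:

```python
# Sketch: prompting MedGemma for structured lab-report extraction.
# The JSON schema and prompt wording are illustrative assumptions,
# not a documented MedGemma format.
LAB_EXTRACTION_PROMPT = (
    "Extract every analyte from this lab report as a JSON list of objects "
    'with keys "analyte", "value", "unit", and "reference_range". '
    "Copy values and units exactly as they appear in the report."
)

def build_lab_report_messages(report_text):
    # Same chat-message structure as the image examples, with text-only content.
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": LAB_EXTRACTION_PROMPT},
                {"type": "text", "text": report_text},
            ],
        }
    ]
```

The resulting messages list can be passed to the same inference code shown in the How to use section.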
MedGemma utilizes a SigLIP image encoder that has been specifically pre-trained on a variety of de-identified medical data, including chest X-rays, dermatology images, ophthalmology images, and histopathology slides. The LLM component is trained on a diverse set of medical data, including medical text, medical question-answer pairs, FHIR-based electronic health record data, 2D and 3D radiology images, histopathology images, ophthalmology images, dermatology images, and lab reports for document understanding.
MedGemma 1.5 4B has been evaluated on a range of clinically relevant benchmarks to illustrate its baseline performance. These evaluations are based on both open benchmark datasets and internally curated datasets. Developers are expected to fine-tune MedGemma for improved performance on their use case. Consult the Intended use section for more details.
MedGemma is optimized for medical applications that involve a text generation component. For medical image-based applications that do not involve text generation, such as data-efficient classification, zero-shot classification, or content-based or semantic image retrieval, the MedSigLIP image encoder is recommended. MedSigLIP is based on the same image encoder that powers MedGemma 1 and MedGemma 1.5.
How to use
The following are some example code snippets to help you quickly get started running the model locally on GPU.
Note: If you need to use the model at scale, we recommend creating a production version using Model Garden. Model Garden provides various deployment options and tutorial notebooks, including specialized server-side image processing options for efficiently handling large medical images: Whole Slide Digital Pathology (WSI) or volumetric scans (CT/MRI) stored in Cloud DICOM Store or Google Cloud Storage (GCS).
First, install the Transformers library. Gemma 3 is supported starting from transformers 4.50.0.
```shell
pip install -U transformers
```
Next, use either the `pipeline` wrapper or the Transformers API directly to send a chest X-ray image and a question to the model.
Note that CT, MRI and whole-slide histopathology images require some pre-processing; see the CT and WSI notebook for examples.
Run the model with the pipeline API
```python
from transformers import pipeline
from PIL import Image
import requests
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"},
        ],
    }
]

output = pipe(text=messages, max_new_tokens=2000)
print(output[0]["generated_text"][-1]["content"])
```
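Longitudinal interpretation, one of the capabilities listed in the Description, simply means passing more than one image in a single turn. Below is a hedged sketch of such a message list; the interleaving and prompt wording are assumptions for illustration, not a documented format:

```python
def build_longitudinal_messages(prior_image, current_image):
    # Interleave text labels and images so the model can tell which study is
    # which. The ordering and wording here are illustrative assumptions.
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Prior chest X-ray:"},
                {"type": "image", "image": prior_image},
                {"type": "text", "text": "Current chest X-ray:"},
                {"type": "image", "image": current_image},
                {
                    "type": "text",
                    "text": "Compare the current study to the prior and "
                            "describe any interval change.",
                },
            ],
        }
    ]
```

The result can be passed to the same pipeline call as a single-image prompt, e.g. `pipe(text=build_longitudinal_messages(prior, current), max_new_tokens=2000)`.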
Run the model directly
```python
# Make sure to install the accelerate library first via `pip install accelerate`
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch

model_id = "google/medgemma-1.5-4b-it"

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/Chest_Xray_PA_3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"},
        ],
    }
]

...
```
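The elided generation step can be sketched as follows, using the standard Transformers chat workflow (`apply_chat_template`, `generate`, then decoding only the new tokens). The specific settings (greedy decoding, 2000 new tokens) mirror the pipeline example above rather than any documented default:

```python
import torch

# Sketch of the generation step: tokenize the chat, run greedy generation,
# and decode only the newly generated tokens (not the prompt). Greedy
# decoding and the 2000-token budget mirror the pipeline example above.
def run_inference(model, processor, messages, max_new_tokens=2000):
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device, dtype=torch.bfloat16)
    input_len = inputs["input_ids"].shape[-1]
    with torch.inference_mode():
        generation = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    return processor.decode(generation[0][input_len:], skip_special_tokens=True)
```

With the `model`, `processor`, and `messages` objects defined above, the call is `print(run_inference(model, processor, messages))`.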