Customize images

With the Imagen API, you can create high-quality images by using text prompts and reference images to guide subject or style generation.

For model details, see the Imagen for Editing and Customization model card.

This guide shows you how to customize images with the Imagen API and covers the following topics:

HTTP method and URL: Learn about the API endpoint for image customization.
Example syntax: See the structure of a REST API request for customizing an image.
Choose a reference image type: Understand the different ways you can use reference images.
Parameter list: Review the available parameters for your customization request.
Examples: View a complete example of how to customize an image.
Class IDs: Find the class IDs for creating image masks based on specific objects.

Supported Models

Model	Code
Customization using reference images (few-shot)	`imagen-3.0-capability-001`

For more information about the features that each model supports, see Imagen models.

HTTP method and URL

POST https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/imagen-3.0-capability-001:predict
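
As a minimal sketch, the endpoint above can be assembled from a project ID and a region. The function name `predict_url` and the example values `my-project` and `us-central1` are illustrative assumptions, not part of the API.

```python
def predict_url(project_id, location, model="imagen-3.0-capability-001"):
    """Build the :predict endpoint URL for a project and region."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model}:predict"
    )


# Hypothetical project and region, for illustration only.
print(predict_url("my-project", "us-central1"))
```

Note that the region appears twice: once in the host name and once in the resource path.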

Example syntax

The following example shows the syntax for customizing an image from a text prompt and reference images.

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/imagen-3.0-capability-001:predict \
-d '{
    "instances": [
      {
        // Use [1] to refer to the reference images with referenceId=1
        // [2] to refer to the reference images with referenceId=2,
        // following the same format for all reference IDs that you provide.
        "prompt": "${TEXT_PROMPT}",
        "referenceImages": [
          // A list of at most 4 reference image objects.
          [...]
        ]
      }
    ],
    "parameters": {
        [...]
    }
}'

Sample request body:

This sample request body shows a person customization request that uses one face mesh control image (referenceId 2) and three subject reference images that share referenceId 1.

{
  "instances": [
    {
      "prompt": "Create an image about a man with short hair [1] in the pose of control image [2] to match the description: A pencil style sketch of a full-body portrait of a man with short hair [1] with hatch-cross drawing, hatch drawing of portrait with 6B and graphite pencils, white background, pencil drawing, high quality, pencil stroke, looking at camera, natural human eyes",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_CONTROL",
          "referenceId": 2,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_1}"
          },
          "controlImageConfig": {
            "controlType": "CONTROL_TYPE_FACE_MESH",
            "enableControlImageComputation": true
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_2}"
          },
          "subjectImageConfig": {
            "subjectDescription": "a man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_3}"
          },
          "subjectImageConfig": {
            "subjectDescription": "a man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_4}"
          },
          "subjectImageConfig": {
            "subjectDescription": "a man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        }
      ]
    }
  ],
  "parameters": {
    "negativePrompt": "wrinkles, noise, Low quality, dirty, low res, multi face, rough texture, messy, messy background, color background, photo realistic, photo, super realistic, signature, autograph, sign, text, characters, alphabet, letter",
    "seed": 1,
    "language": "en",
    "sampleCount": 4
  }
}
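
A request body like the sample above could also be assembled in code. The following is a sketch only: the helper names (`subject_ref`, `face_mesh_ref`, `build_request`) are hypothetical, the image strings are placeholders rather than real Base64 data, and the limit of four reference images is taken from the comment in the syntax example.

```python
import json

MAX_REFERENCE_IMAGES = 4  # "A list of at most 4 reference image objects."


def subject_ref(ref_id, image_b64, description, subject_type="SUBJECT_TYPE_PERSON"):
    """One REFERENCE_TYPE_SUBJECT entry."""
    return {
        "referenceType": "REFERENCE_TYPE_SUBJECT",
        "referenceId": ref_id,
        "referenceImage": {"bytesBase64Encoded": image_b64},
        "subjectImageConfig": {
            "subjectDescription": description,
            "subjectType": subject_type,
        },
    }


def face_mesh_ref(ref_id, image_b64, compute=True):
    """One REFERENCE_TYPE_CONTROL entry that uses a face mesh control image."""
    return {
        "referenceType": "REFERENCE_TYPE_CONTROL",
        "referenceId": ref_id,
        "referenceImage": {"bytesBase64Encoded": image_b64},
        "controlImageConfig": {
            "controlType": "CONTROL_TYPE_FACE_MESH",
            "enableControlImageComputation": compute,
        },
    }


def build_request(prompt, reference_images, **parameters):
    """Wrap the prompt, reference images, and parameters in a request body."""
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError("at most 4 reference images are allowed")
    return {
        "instances": [{"prompt": prompt, "referenceImages": list(reference_images)}],
        "parameters": parameters,
    }


# Placeholder strings stand in for real Base64-encoded image bytes.
body = build_request(
    "A pencil sketch of a man with short hair [1] in the pose of control image [2]",
    [face_mesh_ref(2, "IMAGE_BYTES_1")]
    + [subject_ref(1, f"IMAGE_BYTES_{i}", "a man with short hair") for i in (2, 3, 4)],
    seed=1,
    language="en",
    sampleCount=4,
)
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` also avoids hand-editing long prompt strings, which must not contain raw line breaks in valid JSON.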

Choose a reference image type

To customize an image, you provide one or more reference images. Each reference image must have a referenceType that specifies how the model should use it. The following table describes the available reference types.

Reference Type	Description	Use Case
`REFERENCE_TYPE_SUBJECT`	Provides an image of a subject (like a person, animal, or product) to be incorporated into the generated image. You can provide multiple images for the same subject to improve quality.	Placing a specific person or object into a new scene or style.
`REFERENCE_TYPE_STYLE`	Provides an image that defines the artistic style (e.g., watercolor, sketch, pop art) for the generated image.	Applying a consistent artistic style to a generated image based on a source style image.
`REFERENCE_TYPE_CONTROL`	Uses a control image (like a canny edge, scribble, or face mesh) to guide the structure, pose, or composition of the generated image.	Controlling the exact pose of a character or the outline of an object.
`REFERENCE_TYPE_RAW`	Provides the base image for editing tasks. The output image has the same dimensions as this raw image.	Editing an existing image, such as inpainting or outpainting.
`REFERENCE_TYPE_MASK`	Provides a mask to specify which parts of a raw image should be edited (inpainting) or preserved. The mask can be user-provided or automatically generated.	Modifying a specific region of an image while leaving the rest unchanged.
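
Each reference type pairs with at most one configuration object in the request. The sketch below records that pairing as a lookup table, using the field names from the parameter list in this guide. The function `check_reference` and its messages are hypothetical, and treating a mismatched configuration as a problem is an assumption about how the API behaves, not documented behavior.

```python
# Config field that accompanies each reference type (None: no config object).
CONFIG_FIELD_BY_TYPE = {
    "REFERENCE_TYPE_SUBJECT": "subjectImageConfig",
    "REFERENCE_TYPE_STYLE": "styleImageConfig",
    "REFERENCE_TYPE_CONTROL": "controlImageConfig",
    "REFERENCE_TYPE_MASK": "maskImageConfig",
    "REFERENCE_TYPE_RAW": None,
}


def check_reference(ref):
    """Return a list of problems found in one reference image object."""
    ref_type = ref.get("referenceType")
    if ref_type not in CONFIG_FIELD_BY_TYPE:
        return [f"unknown referenceType: {ref_type!r}"]
    expected = CONFIG_FIELD_BY_TYPE[ref_type]
    problems = []
    for field in CONFIG_FIELD_BY_TYPE.values():
        # Flag any config object that belongs to a different reference type.
        if field is not None and field != expected and field in ref:
            problems.append(f"{field} does not apply to {ref_type}")
    return problems


print(check_reference({"referenceType": "REFERENCE_TYPE_RAW", "maskImageConfig": {}}))
```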

Parameter list

The following sections describe the request parameters and response fields. For implementation details, see the examples.

Request parameters

REST

Parameter	Description
`referenceType`	Required enumeration. One of the following values. `REFERENCE_TYPE_RAW`: Required for editing use cases. At most one raw reference image is allowed per request. The output image has the same dimensions as the raw reference image. `REFERENCE_TYPE_MASK`: Required for masked editing. Must have the same dimensions as the raw reference image, if provided. You can provide your own mask or have one generated from the reference image. If the mask image is empty and `maskMode` isn't `MASK_MODE_USER_PROVIDED`, the mask is computed from the raw reference image. `REFERENCE_TYPE_CONTROL`: Must have the same dimensions as the raw reference image, if provided. If the control image is empty and `enableControlImageComputation` is `true`, the control image is computed from the raw reference image. `REFERENCE_TYPE_SUBJECT`: You can provide multiple reference images with the same `referenceId` to potentially improve output quality. `REFERENCE_TYPE_STYLE`: A style reference image.
`referenceId`	Required `integer` The ID for the reference image. Use this ID in your prompt to refer to the corresponding image. For example, use `[1]` to refer to images with `referenceId=1` and `[2]` for images with `referenceId=2`.
`referenceImage.bytesBase64Encoded`	Required `string` A Base64-encoded string of the reference image.
`maskImageConfig.maskMode`	Optional enumeration. Use this parameter when `referenceType` is `REFERENCE_TYPE_MASK`. `MASK_MODE_USER_PROVIDED`: If the reference image is a mask image. `MASK_MODE_BACKGROUND`: To automatically generate a mask using background segmentation. `MASK_MODE_FOREGROUND`: To automatically generate a mask using foreground segmentation. `MASK_MODE_SEMANTIC`: To automatically generate a mask using semantic segmentation, and the given mask class.
`maskImageConfig.dilation`	Optional `float`. Range: [0, 1]. The fraction of the image width by which to dilate the mask. Use this parameter when `referenceType` is `REFERENCE_TYPE_MASK`.
`maskImageConfig.maskClasses`	Optional `list[Integer]`. Mask classes for `MASK_MODE_SEMANTIC` mode. Use this parameter when `referenceType` is `REFERENCE_TYPE_MASK`.
`controlImageConfig.controlType`	Required enumeration. Use this parameter when `referenceType` is `REFERENCE_TYPE_CONTROL`. `CONTROL_TYPE_FACE_MESH` for face mesh (person customization). `CONTROL_TYPE_CANNY` for canny edge. `CONTROL_TYPE_SCRIBBLE` for scribble.
`controlImageConfig.enableControlImageComputation`	Optional `bool`. Default: `false`. If `referenceType` is `REFERENCE_TYPE_CONTROL`, set this to `true` to have Imagen compute the control image from the reference image. Otherwise, set to `false` and provide your own control image.
`language`	Optional `string` (`imagen-3.0-capability-001`, `imagen-3.0-generate-001`, and `imagegeneration@006` only). The language code that corresponds to your text prompt language. The following values are supported. `auto`: Automatic detection. If Imagen detects a supported language, the prompt and an optional negative prompt are translated to English. If the detected language isn't supported, Imagen uses the input text verbatim, which might result in unexpected output. No error code is returned. `en`: English (the default if omitted). `es`: Spanish. `hi`: Hindi. `ja`: Japanese. `ko`: Korean. `pt`: Portuguese. `zh-TW`: Chinese (traditional). `zh` or `zh-CN`: Chinese (simplified).
`subjectImageConfig.subjectDescription`	Required `string`. A short description of the subject in the image. For example, a woman with short brown hair. Use this parameter when `referenceType` is `REFERENCE_TYPE_SUBJECT`.
`subjectImageConfig.subjectType`	Required enumeration. Use this parameter when `referenceType` is `REFERENCE_TYPE_SUBJECT`. `SUBJECT_TYPE_PERSON`: Person subject type. `SUBJECT_TYPE_ANIMAL`: Animal subject type. `SUBJECT_TYPE_PRODUCT`: Product subject type. `SUBJECT_TYPE_DEFAULT`: Default subject type.
`styleImageConfig.styleDescription`	Optional `string`. A short description for the style. Use this parameter when `referenceType` is `REFERENCE_TYPE_STYLE`.
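
Because the prompt refers to reference images by ID in the form `[1]`, `[2]`, and so on, a quick consistency check before sending can catch mismatches. This is a sketch under assumptions: the function name is hypothetical, and the table above does not say whether the API rejects a prompt that cites an undefined ID or ignores a reference that the prompt never mentions.

```python
import re


def check_prompt_references(prompt, reference_images):
    """Compare [n] markers in the prompt with the referenceId values provided.

    Returns (missing, unused): IDs cited in the prompt without a matching
    reference image, and IDs provided but never cited in the prompt.
    """
    used = {int(n) for n in re.findall(r"\[(\d+)\]", prompt)}
    defined = {ref["referenceId"] for ref in reference_images}
    return sorted(used - defined), sorted(defined - used)


missing, unused = check_prompt_references(
    "a man with short hair [1] in the pose of control image [2]",
    [{"referenceId": 1}, {"referenceId": 1}],
)
print(missing, unused)  # [2] []
```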

Response body

The following table describes the fields in the response body.

Parameter	Description
`predictions`	An array of `VisionGenerativeModelResult` objects, one for each requested `sampleCount`. If any images are filtered by responsible AI, they are not included.

Vision generative model result object

The following table describes the fields in the VisionGenerativeModelResult object.

Parameter	Description
`bytesBase64Encoded`	The base64 encoded generated image. This field is not present if the output image did not pass responsible AI filters.
`mimeType`	The MIME type of the generated image. This field is not present if the output image did not pass responsible AI filters.

Examples

The following example shows how to use the Imagen model to customize an image.

REST

Before using any of the request data, make the following replacements:

PROJECT_ID: Your Google Cloud project ID.
LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
TEXT_PROMPT: The text prompt guides what images the model generates. To use Imagen 3 Customization, include the referenceId of the reference image or images you provide in the format [$referenceId]. For example:
- The following text prompt is for a request that has two reference images with "referenceId": 1. Both images have an optional description of "subjectDescription": "man with short hair": Create an image about a man with short hair to match the description: A pencil style sketch of a full-body portrait of a man with short hair [1] with hatch-cross drawing, hatch drawing of portrait with 6B and graphite pencils, white background, pencil drawing, high quality, pencil stroke, looking at camera, natural human eyes
"referenceId": The ID of the reference image, or the ID for a series of reference images that correspond to the same subject or style. In this example the two reference images are of the same person, so they share the same referenceId (1).
BASE64_REFERENCE_IMAGE: A reference image to guide image generation. The image must be specified as a base64-encoded byte string.

SUBJECT_DESCRIPTION: Optional. A text description of the reference image you can then use in the prompt field. For example:

      "prompt": "a full-body portrait of a man with short hair [1] with hatch-cross
      drawing",
      [...],
      "subjectDescription": "man with short hair"

IMAGE_COUNT: The number of generated images. Accepted integer values: 1-4. Default value: 4.
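
The replacements above can be filled in programmatically. The sketch below Base64-encodes local image files and writes a `request.json` in the shape shown in this example. The function names, file paths, and the choice of `SUBJECT_TYPE_PERSON` are illustrative assumptions.

```python
import base64
import json
from pathlib import Path


def encode_image(path):
    """Read an image file and return its bytes as a Base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def write_request(image_paths, prompt, subject_description,
                  image_count=4, out_path="request.json"):
    """Write a subject customization request; all images share referenceId 1."""
    if not 1 <= image_count <= 4:
        raise ValueError("sampleCount accepts integer values from 1 to 4")
    references = [
        {
            "referenceType": "REFERENCE_TYPE_SUBJECT",
            "referenceId": 1,
            "referenceImage": {"bytesBase64Encoded": encode_image(path)},
            "subjectImageConfig": {
                "subjectDescription": subject_description,
                "subjectType": "SUBJECT_TYPE_PERSON",
            },
        }
        for path in image_paths
    ]
    body = {
        "instances": [{"prompt": prompt, "referenceImages": references}],
        "parameters": {"sampleCount": image_count},
    }
    Path(out_path).write_text(json.dumps(body, indent=2), encoding="utf-8")
    return body
```

For example, `write_request(["man_1.png", "man_2.png"], "a portrait of a man with short hair [1]", "man with short hair")` would produce a body with two subject references, assuming those two files exist.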

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "BASE64_REFERENCE_IMAGE"
          },
          "subjectImageConfig": {
            "subjectDescription": "SUBJECT_DESCRIPTION",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "BASE64_REFERENCE_IMAGE"
          },
          "subjectImageConfig": {
            "subjectDescription": "SUBJECT_DESCRIPTION",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        }
      ]
    }
  ],
  "parameters": {
    "sampleCount": IMAGE_COUNT
  }
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict" | Select-Object -Expand Content
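
For scripted use, the same POST request could be built with Python's standard library. This is a sketch, not an official client: the function names are hypothetical, and the access token is assumed to come from `gcloud auth print-access-token`, as in the commands above.

```python
import json
import urllib.request


def build_predict_request(url, access_token, body):
    """Build (but do not send) the POST request with the required headers."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def send(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping construction separate from sending makes it possible to inspect the headers and body before any network call is made.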

The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.

{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}
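
A response like the one above can be turned back into image bytes by Base64-decoding each prediction. In this sketch the function name is hypothetical. Entries without `bytesBase64Encoded` are skipped, since that field is absent when an image does not pass the responsible AI filters.

```python
import base64


def decode_predictions(response):
    """Return a list of (mimeType, image_bytes) for each returned image."""
    images = []
    for prediction in response.get("predictions", []):
        encoded = prediction.get("bytesBase64Encoded")
        if encoded is None:
            # No image data is present for filtered outputs.
            continue
        images.append((prediction.get("mimeType"), base64.b64decode(encoded)))
    return images


# "YWJj" is the Base64 encoding of b"abc", used here as stand-in image data.
sample = {"predictions": [{"bytesBase64Encoded": "YWJj", "mimeType": "image/png"}]}
print(decode_predictions(sample))  # [('image/png', b'abc')]
```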

Class IDs

Use the following object class IDs to automatically create an image mask based on specific objects.

Class ID (`class_id`)	Object
0	backpack
1	umbrella
2	bag
3	tie
4	suitcase
5	case
6	bird
7	cat
8	dog
9	horse
10	sheep
11	cow
12	elephant
13	bear
14	zebra
15	giraffe
16	animal (other)
17	microwave
18	radiator
19	oven
20	toaster
21	storage tank
22	conveyor belt
23	sink
24	refrigerator
25	washer dryer
26	fan
27	dishwasher
28	toilet
29	bathtub
30	shower
31	tunnel
32	bridge
33	pier wharf
34	tent
35	building
36	ceiling
37	laptop
38	keyboard
39	mouse
40	remote
41	cell phone
42	television
43	floor
44	stage
45	banana
46	apple
47	sandwich
48	orange
49	broccoli
50	carrot
51	hot dog
52	pizza
53	donut
54	cake
55	fruit (other)
56	food (other)
57	chair (other)
58	armchair
59	swivel chair
60	stool
61	seat
62	couch
63	trash can
64	potted plant
65	nightstand
66	bed
67	table
68	pool table
69	barrel
70	desk
71	ottoman
72	wardrobe
73	crib
74	basket
75	chest of drawers
76	bookshelf
77	counter (other)
78	bathroom counter
79	kitchen island
80	door
81	light (other)
82	lamp
83	sconce
84	chandelier
85	mirror
86	whiteboard
87	shelf
88	stairs
89	escalator
90	cabinet
91	fireplace
92	stove
93	arcade machine
94	gravel
95	platform
96	playingfield
97	railroad
98	road
99	snow
100	sidewalk pavement
101	runway
102	terrain
103	book
104	box
105	clock
106	vase
107	scissors
108	plaything (other)
109	teddy bear
110	hair dryer
111	toothbrush
112	painting
113	poster
114	bulletin board
115	bottle
116	cup
117	wine glass
118	knife
119	fork
120	spoon
121	bowl
122	tray
123	range hood
124	plate
125	person
126	rider (other)
127	bicyclist
128	motorcyclist
129	paper
130	streetlight
131	road barrier
132	mailbox
133	cctv camera
134	junction box
135	traffic sign
136	traffic light
137	fire hydrant
138	parking meter
139	bench
140	bike rack
141	billboard
142	sky
143	pole
144	fence
145	railing banister
146	guard rail
147	mountain hill
148	rock
149	frisbee
150	skis
151	snowboard
152	sports ball
153	kite
154	baseball bat
155	baseball glove
156	skateboard
157	surfboard
158	tennis racket
159	net
160	base
161	sculpture
162	column
163	fountain
164	awning
165	apparel
166	banner
167	flag
168	blanket
169	curtain (other)
170	shower curtain
171	pillow
172	towel
173	rug floormat
174	vegetation
175	bicycle
176	car
177	autorickshaw
178	motorcycle
179	airplane
180	bus
181	train
182	truck
183	trailer
184	boat ship
185	slow wheeled object
186	river lake
187	sea
188	water (other)
189	swimming pool
190	waterfall
191	wall
192	window
193	window blind
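
As an illustration of how these class IDs might be used, the sketch below builds a mask reference that requests semantic segmentation for selected classes through `maskImageConfig.maskClasses`. The function name is hypothetical. Omitting `referenceImage` so that the mask is computed from the raw reference image is an assumption based on the parameter descriptions in this guide, not confirmed behavior.

```python
MAX_CLASS_ID = 193  # highest class ID in the table above


def semantic_mask_ref(ref_id, class_ids, dilation=None):
    """Build a REFERENCE_TYPE_MASK entry that uses MASK_MODE_SEMANTIC."""
    for class_id in class_ids:
        if not 0 <= class_id <= MAX_CLASS_ID:
            raise ValueError(f"unknown class ID: {class_id}")
    config = {"maskMode": "MASK_MODE_SEMANTIC", "maskClasses": list(class_ids)}
    if dilation is not None:
        if not 0 <= dilation <= 1:
            raise ValueError("dilation must be in the range [0, 1]")
        config["dilation"] = dilation
    # Assumption: no referenceImage is sent, so the mask is computed
    # from the raw reference image in the same request.
    return {
        "referenceType": "REFERENCE_TYPE_MASK",
        "referenceId": ref_id,
        "maskImageConfig": config,
    }


# Class IDs 7 (cat) and 8 (dog) from the table above.
print(semantic_mask_ref(2, [7, 8], dilation=0.01))
```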

What's next

For more information, see Imagen on Vertex AI.
