With the Imagen API, you can create high-quality images by using text prompts and reference images to guide subject or style generation.
View Imagen for Editing and Customization model card
This guide shows you how to customize images with the Imagen API model and covers the following topics:
- HTTP method and URL: Learn about the API endpoint for image customization.
- Example syntax: See the structure of a REST API request for customizing an image.
- Choose a reference image type: Understand the different ways you can use reference images.
- Parameter list: Review the available parameters for your customization request.
- Examples: View a complete example of how to customize an image.
- Class IDs: Find the class IDs for creating image masks based on specific objects.
Supported Models
Model | Code |
---|---|
Customization using reference images (few-shot) | imagen-3.0-capability-001 |
For more information about the features that each model supports, see Imagen models.
HTTP method and URL
POST https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/imagen-3.0-capability-001:predict
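The endpoint embeds the region twice (in the host name and in the path) and the project ID once. As a minimal sketch, the following Python helper assembles the URL from those two values; the function name `predict_url` is hypothetical and not part of any Google library.

```python
def predict_url(project_id: str, location: str,
                model: str = "imagen-3.0-capability-001") -> str:
    """Build the :predict endpoint URL for a project and region.

    Hypothetical helper for illustration only; it just fills in the
    URL template shown above.
    """
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model}:predict"
    )


# "my-project" is a placeholder project ID.
print(predict_url("my-project", "us-central1"))
```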
Example syntax
The following example shows the syntax for customizing an image from a text prompt and reference images.
REST
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/imagen-3.0-capability-001:predict \
  -d '{
    "instances": [
      {
        // Use [1] to refer to the reference images with referenceId=1,
        // [2] to refer to the reference images with referenceId=2,
        // following the same format for all reference IDs that you provide.
        "prompt": "${TEXT_PROMPT}",
        "referenceImages": [
          // A list of at most 4 reference image objects.
          [...]
        ]
      }
    ],
    "parameters": {
      [...]
    }
  }'
Sample request body:
This sample request body shows a person customization request that uses a face mesh control image (referenceId 2) and three subject reference images that share referenceId 1.
{
  "instances": [
    {
      "prompt": "Create an image about a man with short hair [1] in the pose of control image [2] to match the description: A pencil style sketch of a full-body portrait of a man with short hair [1] with hatch-cross drawing, hatch drawing of portrait with 6B and graphite pencils, white background, pencil drawing, high quality, pencil stroke, looking at camera, natural human eyes",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_CONTROL",
          "referenceId": 2,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_1}"
          },
          "controlImageConfig": {
            "controlType": "CONTROL_TYPE_FACE_MESH",
            "enableControlImageComputation": true
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_2}"
          },
          "subjectImageConfig": {
            "subjectDescription": "a man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_3}"
          },
          "subjectImageConfig": {
            "subjectDescription": "a man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_4}"
          },
          "subjectImageConfig": {
            "subjectDescription": "a man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        }
      ]
    }
  ],
  "parameters": {
    "negativePrompt": "wrinkles, noise, Low quality, dirty, low res, multi face, rough texture, messy, messy background, color background, photo realistic, photo, super realistic, signature, autograph, sign, text, characters, alphabet, letter",
    "seed": 1,
    "language": "en",
    "sampleCount": 4
  }
}
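Because the prompt refers to reference images by bracketed ID, it is easy for the prompt and the referenceImages list to drift apart. The following Python sketch is a local sanity check, not validation performed by the API: it compares the [n] markers in the prompt with the referenceId values and applies the limit of four reference images mentioned in the example syntax. The function name `check_prompt_references` is hypothetical.

```python
import re


def check_prompt_references(instance: dict) -> list[str]:
    """Return a list of problems; an empty list means prompt and IDs agree.

    Hypothetical local check: it only looks at the "prompt" text and the
    "referenceId" of each entry in "referenceImages".
    """
    prompt_ids = {int(n) for n in re.findall(r"\[(\d+)\]", instance["prompt"])}
    image_ids = {ref["referenceId"] for ref in instance["referenceImages"]}

    problems = []
    for missing in sorted(image_ids - prompt_ids):
        problems.append(f"referenceId {missing} is never mentioned in the prompt")
    for unknown in sorted(prompt_ids - image_ids):
        problems.append(f"prompt mentions [{unknown}] but no reference image has that ID")
    if len(instance["referenceImages"]) > 4:
        problems.append("more than 4 reference images")
    return problems


# Same ID layout as the sample body: one control image (2), three subject images (1).
instance = {
    "prompt": "A sketch of a man with short hair [1] in the pose of control image [2]",
    "referenceImages": [
        {"referenceId": 2},
        {"referenceId": 1},
        {"referenceId": 1},
        {"referenceId": 1},
    ],
}
print(check_prompt_references(instance))
```

An empty list means every referenceId appears in the prompt and every [n] marker has a matching image.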
Choose a reference image type
To customize an image, you provide one or more reference images. Each reference image must have a referenceType that specifies how the model should use it. The following table describes the available reference types.
Reference Type | Description | Use Case |
---|---|---|
REFERENCE_TYPE_SUBJECT | Provides an image of a subject (like a person, animal, or product) to be incorporated into the generated image. You can provide multiple images for the same subject to improve quality. | Placing a specific person or object into a new scene or style. |
REFERENCE_TYPE_STYLE | Provides an image that defines the artistic style (e.g., watercolor, sketch, pop art) for the generated image. | Applying a consistent artistic style to a generated image based on a source style image. |
REFERENCE_TYPE_CONTROL | Uses a control image (like a canny edge, scribble, or face mesh) to guide the structure, pose, or composition of the generated image. | Controlling the exact pose of a character or the outline of an object. |
REFERENCE_TYPE_RAW | Provides the base image for editing tasks. The output image has the same dimensions as this raw image. | Editing an existing image, such as inpainting or outpainting. |
REFERENCE_TYPE_MASK | Provides a mask to specify which parts of a raw image should be edited (inpainting) or preserved. The mask can be user-provided or automatically generated. | Modifying a specific region of an image while leaving the rest unchanged. |
Parameter list
The following sections describe the request parameters and response fields. For implementation details, see the examples.
Request parameters
REST
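Parameters | |
---|---|
referenceType | Required enumeration. One of REFERENCE_TYPE_RAW, REFERENCE_TYPE_MASK, REFERENCE_TYPE_CONTROL, REFERENCE_TYPE_STYLE, or REFERENCE_TYPE_SUBJECT, as described in Choose a reference image type. |
referenceId | Required. The ID for the reference image. Use this ID in your prompt to refer to the corresponding image. For example, use [1] in the prompt to refer to the reference images with referenceId=1. |
referenceImage.bytesBase64Encoded | Required. A Base64-encoded string of the reference image. |
maskImageConfig.maskMode | Optional enumeration. Use this parameter when referenceType is REFERENCE_TYPE_MASK. |
maskImageConfig.dilation | Optional. The percentage of image width to dilate this mask by. Use this parameter when referenceType is REFERENCE_TYPE_MASK. |
maskImageConfig.maskClasses | Optional. Mask classes, given as class IDs (see Class IDs), for creating a mask from specific objects. Use this parameter when referenceType is REFERENCE_TYPE_MASK. |
controlImageConfig.controlType | Required enumeration. Use this parameter when referenceType is REFERENCE_TYPE_CONTROL. The sample request body uses CONTROL_TYPE_FACE_MESH. |
controlImageConfig.enableControlImageComputation | Optional Boolean. The sample request body sets this parameter to true. |
language | Optional. The language code that corresponds to your text prompt language, for example en. |
subjectImageConfig.subjectDescription | Required. A short description of the subject in the image. For example, a woman with short brown hair. Use this parameter when referenceType is REFERENCE_TYPE_SUBJECT. |
subjectImageConfig.subjectType | Required enumeration. Use this parameter when referenceType is REFERENCE_TYPE_SUBJECT. The sample request body uses SUBJECT_TYPE_PERSON. |
styleImageConfig.styleDescription | Optional. A short description for the style. Use this parameter when referenceType is REFERENCE_TYPE_STYLE. |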
Response body
The following table describes the fields in the response body.
Parameter | |
---|---|
predictions | An array of VisionGenerativeModelResult objects, one for each generated image. |
Vision generative model result object
The following table describes the fields in the VisionGenerativeModelResult object.
Parameter | |
---|---|
bytesBase64Encoded | The base64 encoded generated image. This field is not present if the output image did not pass responsible AI filters. |
mimeType | The MIME type of the generated image. This field is not present if the output image did not pass responsible AI filters. |
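Since both fields can be absent when an image is filtered, client code should not index them unconditionally. The following Python sketch decodes the images from an already-parsed response and skips predictions that carry no image bytes. The function name `decode_predictions` is hypothetical, and the example payload is made-up data rather than real model output.

```python
import base64


def decode_predictions(response: dict) -> list[tuple[str, bytes]]:
    """Return (mimeType, image bytes) for each prediction that has an image.

    Predictions without "bytesBase64Encoded" (for example, filtered
    outputs) are skipped instead of raising KeyError.
    """
    images = []
    for prediction in response.get("predictions", []):
        encoded = prediction.get("bytesBase64Encoded")
        if encoded is None:
            continue  # no image was returned for this prediction
        mime_type = prediction.get("mimeType", "")
        images.append((mime_type, base64.b64decode(encoded)))
    return images


# Made-up response: one prediction with (fake) image bytes, one without.
example_response = {
    "predictions": [
        {"bytesBase64Encoded": "aGVsbG8=", "mimeType": "image/png"},
        {},
    ]
}
print(len(decode_predictions(example_response)))
```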
Examples
The following example shows how to use the Imagen model to customize an image.
REST
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your Google Cloud project ID.
- LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
- TEXT_PROMPT: The text prompt guides what images the model generates. To use Imagen 3 Customization, include the referenceId of the reference image or images you provide in the format [$referenceId]. For example:
  - The following text prompt is for a request that has two reference images with "referenceId": 1. Both images have an optional description of "subjectDescription": "man with short hair": Create an image about a man with short hair to match the description: A pencil style sketch of a full-body portrait of a man with short hair [1] with hatch-cross drawing, hatch drawing of portrait with 6B and graphite pencils, white background, pencil drawing, high quality, pencil stroke, looking at camera, natural human eyes
- "referenceId": The ID of the reference image, or the ID for a series of reference images that correspond to the same subject or style. In this example the two reference images are of the same person, so they share the same referenceId (1).
- BASE64_REFERENCE_IMAGE: A reference image to guide image generation. The image must be specified as a base64-encoded byte string.
- SUBJECT_DESCRIPTION: Optional. A text description of the reference image you can then use in the prompt field. For example: "prompt": "a full-body portrait of a man with short hair [1] with hatch-cross drawing", [...], "subjectDescription": "man with short hair"
- IMAGE_COUNT: The number of generated images. Accepted integer values: 1-4. Default value: 4.
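The replacements above can also be filled in programmatically. The following Python sketch builds a request body for several images of the same subject: it base64-encodes each image, gives every image the same referenceId, and rejects an image count outside the accepted 1-4 range. The function name `build_request` and the byte strings in the usage lines are hypothetical stand-ins; in practice the bytes would come from reading image files.

```python
import base64
import json


def build_request(prompt: str, subject_images: list[bytes],
                  subject_description: str, image_count: int = 4) -> dict:
    """Build a subject-customization request body (hypothetical helper).

    All images are assumed to show the same person, so they share
    referenceId 1, as in the example described above.
    """
    if not 1 <= image_count <= 4:
        raise ValueError("image_count must be an integer from 1 to 4")

    reference_images = [
        {
            "referenceType": "REFERENCE_TYPE_SUBJECT",
            "referenceId": 1,
            "referenceImage": {
                "bytesBase64Encoded": base64.b64encode(image).decode("ascii")
            },
            "subjectImageConfig": {
                "subjectDescription": subject_description,
                "subjectType": "SUBJECT_TYPE_PERSON",
            },
        }
        for image in subject_images
    ]
    return {
        "instances": [{"prompt": prompt, "referenceImages": reference_images}],
        "parameters": {"sampleCount": image_count},
    }


# Placeholder bytes stand in for the contents of two image files.
body = build_request(
    "A pencil style sketch of a man with short hair [1]",
    [b"abc", b"abc"],
    "man with short hair",
    image_count=2,
)
print(json.dumps(body, indent=2))
```

Writing the printed JSON to a file named request.json gives a body that can be sent with the curl or PowerShell commands in this section.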
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict
Request JSON body:
{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "BASE64_REFERENCE_IMAGE"
          },
          "subjectImageConfig": {
            "subjectDescription": "SUBJECT_DESCRIPTION",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "BASE64_REFERENCE_IMAGE"
          },
          "subjectImageConfig": {
            "subjectDescription": "SUBJECT_DESCRIPTION",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        }
      ]
    }
  ],
  "parameters": {
    "sampleCount": IMAGE_COUNT
  }
}
To send your request, choose one of these options:
curl
Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict"
PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict" | Select-Object -Expand Content
The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.
{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}
Class IDs
Use the following object class IDs to automatically create an image mask based on specific objects.
Class ID | Object |
---|---|
0 | backpack |
1 | umbrella |
2 | bag |
3 | tie |
4 | suitcase |
5 | case |
6 | bird |
7 | cat |
8 | dog |
9 | horse |
10 | sheep |
11 | cow |
12 | elephant |
13 | bear |
14 | zebra |
15 | giraffe |
16 | animal (other) |
17 | microwave |
18 | radiator |
19 | oven |
20 | toaster |
21 | storage tank |
22 | conveyor belt |
23 | sink |
24 | refrigerator |
25 | washer dryer |
26 | fan |
27 | dishwasher |
28 | toilet |
29 | bathtub |
30 | shower |
31 | tunnel |
32 | bridge |
33 | pier wharf |
34 | tent |
35 | building |
36 | ceiling |
37 | laptop |
38 | keyboard |
39 | mouse |
40 | remote |
41 | cell phone |
42 | television |
43 | floor |
44 | stage |
45 | banana |
46 | apple |
47 | sandwich |
48 | orange |
49 | broccoli |
50 | carrot |
51 | hot dog |
52 | pizza |
53 | donut |
54 | cake |
55 | fruit (other) |
56 | food (other) |
57 | chair (other) |
58 | armchair |
59 | swivel chair |
60 | stool |
61 | seat |
62 | couch |
63 | trash can |
64 | potted plant |
65 | nightstand |
66 | bed |
67 | table |
68 | pool table |
69 | barrel |
70 | desk |
71 | ottoman |
72 | wardrobe |
73 | crib |
74 | basket |
75 | chest of drawers |
76 | bookshelf |
77 | counter (other) |
78 | bathroom counter |
79 | kitchen island |
80 | door |
81 | light (other) |
82 | lamp |
83 | sconce |
84 | chandelier |
85 | mirror |
86 | whiteboard |
87 | shelf |
88 | stairs |
89 | escalator |
90 | cabinet |
91 | fireplace |
92 | stove |
93 | arcade machine |
94 | gravel |
95 | platform |
96 | playingfield |
97 | railroad |
98 | road |
99 | snow |
100 | sidewalk pavement |
101 | runway |
102 | terrain |
103 | book |
104 | box |
105 | clock |
106 | vase |
107 | scissors |
108 | plaything (other) |
109 | teddy bear |
110 | hair dryer |
111 | toothbrush |
112 | painting |
113 | poster |
114 | bulletin board |
115 | bottle |
116 | cup |
117 | wine glass |
118 | knife |
119 | fork |
120 | spoon |
121 | bowl |
122 | tray |
123 | range hood |
124 | plate |
125 | person |
126 | rider (other) |
127 | bicyclist |
128 | motorcyclist |
129 | paper |
130 | streetlight |
131 | road barrier |
132 | mailbox |
133 | cctv camera |
134 | junction box |
135 | traffic sign |
136 | traffic light |
137 | fire hydrant |
138 | parking meter |
139 | bench |
140 | bike rack |
141 | billboard |
142 | sky |
143 | pole |
144 | fence |
145 | railing banister |
146 | guard rail |
147 | mountain hill |
148 | rock |
149 | frisbee |
150 | skis |
151 | snowboard |
152 | sports ball |
153 | kite |
154 | baseball bat |
155 | baseball glove |
156 | skateboard |
157 | surfboard |
158 | tennis racket |
159 | net |
160 | base |
161 | sculpture |
162 | column |
163 | fountain |
164 | awning |
165 | apparel |
166 | banner |
167 | flag |
168 | blanket |
169 | curtain (other) |
170 | shower curtain |
171 | pillow |
172 | towel |
173 | rug floormat |
174 | vegetation |
175 | bicycle |
176 | car |
177 | autorickshaw |
178 | motorcycle |
179 | airplane |
180 | bus |
181 | train |
182 | truck |
183 | trailer |
184 | boat ship |
185 | slow wheeled object |
186 | river lake |
187 | sea |
188 | water (other) |
189 | swimming pool |
190 | waterfall |
191 | wall |
192 | window |
193 | window blind |
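When a mask is built from object classes, the request carries numeric class IDs rather than object names. The following Python sketch maps a few names from the table above to their IDs and produces a maskClasses list. The `CLASS_IDS` dictionary is a hand-copied subset of the table and `mask_classes` is a hypothetical helper. The sketch leaves out maskImageConfig.maskMode, which a real mask reference may also need; see the parameter list.

```python
# Subset of the class ID table above, keyed by object name.
CLASS_IDS = {
    "backpack": 0,
    "cat": 7,
    "dog": 8,
    "person": 125,
    "sky": 142,
    "car": 176,
    "window": 192,
}


def mask_classes(*object_names: str) -> list[int]:
    """Translate object names into a sorted, de-duplicated list of class IDs."""
    unknown = [name for name in object_names if name not in CLASS_IDS]
    if unknown:
        raise KeyError(f"no class ID known for: {', '.join(unknown)}")
    return sorted({CLASS_IDS[name] for name in object_names})


# Fragment of a mask reference image; only maskClasses is filled in here.
mask_image_config = {"maskClasses": mask_classes("dog", "cat")}
print(mask_image_config)
```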
What's next
- For more information, see Imagen on Vertex AI.