Customize images

With the Imagen API, you can generate high-quality images from text prompts, using reference images to guide the subject or style of the output.

View Imagen for Editing and Customization model card

This guide shows you how to customize images with the Imagen API and covers the following topics:

  • HTTP method and URL: Learn about the API endpoint for image customization.
  • Example syntax: See the structure of a REST API request for customizing an image.
  • Choose a reference image type: Understand the different ways you can use reference images.
  • Parameter list: Review the available parameters for your customization request.
  • Examples: View a complete example of how to customize an image.
  • Class IDs: Find the class IDs for creating image masks based on specific objects.

Supported Models

Model | Code
Customization using reference images (few-shot) | imagen-3.0-capability-001

For more information about the features that each model supports, see Imagen models.

HTTP method and URL

POST https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/imagen-3.0-capability-001:predict
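
As a minimal sketch, the endpoint above can be assembled in Python from a project ID and a region. The function name build_predict_url and the example project ID my-project are hypothetical; only the URL pattern and the model code come from this page.

```python
MODEL_ID = "imagen-3.0-capability-001"


def build_predict_url(project_id, location, model_id=MODEL_ID):
    """Return the :predict endpoint for the given project and region."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"publishers/google/models/{model_id}:predict"
    )


# "my-project" is a placeholder project ID used only for illustration.
print(build_predict_url("my-project", "us-central1"))
```

The region appears twice in the URL (in the host name and in the resource path), so deriving both from one argument avoids mismatches.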

Example syntax

The following example shows the syntax for customizing an image from a text prompt and reference images.

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/imagen-3.0-capability-001:predict \
-d '{
    "instances": [
      {
        // Use [1] to refer to the reference images with referenceId=1
        // [2] to refer to the reference images with referenceId=2,
        // following the same format for all reference IDs that you provide.
        "prompt": "${TEXT_PROMPT}",
        "referenceImages": [
          // A list of at most 4 reference image objects.
          [...]
        ]
      }
    ],
    "parameters": {
        [...]
    }
}'

Sample request body:

This sample request body shows a person customization request that uses one face mesh control image (referenceId 2) and three subject reference images of the same person (all with referenceId 1).

{
  "instances": [
    {
      "prompt": "Create an image about a man with short hair [1] in the pose of control image [2] to match the description: A pencil style sketch of a full-body portrait of a man with short hair [1] with hatch-cross drawing, hatch drawing of portrait with 6B and graphite pencils, white background, pencil drawing, high quality, pencil stroke, looking at camera, natural human eyes",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_CONTROL",
          "referenceId": 2,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_1}"
          },
          "controlImageConfig": {
            "controlType": "CONTROL_TYPE_FACE_MESH",
            "enableControlImageComputation": true
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_2}"
          },
          "subjectImageConfig": {
            "subjectDescription": "a man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_3}"
          },
          "subjectImageConfig": {
            "subjectDescription": "a man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "${IMAGE_BYTES_4}"
          },
          "subjectImageConfig": {
            "subjectDescription": "a man with short hair",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        }
      ]
    }
  ],
  "parameters": {
    "negativePrompt": "wrinkles, noise, Low quality, dirty, low res, multi face, rough texture, messy, messy background, color background, photo realistic, photo, super realistic, signature, autograph, sign, text, characters, alphabet, letter",
    "seed": 1,
    "language": "en",
    "sampleCount": 4
  }
}
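
A request body with this shape could also be assembled programmatically. The following Python sketch is illustrative only: the helper names (subject_reference, face_mesh_control_reference, build_request_body) are hypothetical, the image strings are placeholders rather than real base64 data, and the prompt is shortened. The field names are taken from the sample above.

```python
import json


def subject_reference(image_b64, reference_id, description,
                      subject_type="SUBJECT_TYPE_PERSON"):
    """Build one REFERENCE_TYPE_SUBJECT entry for the referenceImages list."""
    return {
        "referenceType": "REFERENCE_TYPE_SUBJECT",
        "referenceId": reference_id,
        "referenceImage": {"bytesBase64Encoded": image_b64},
        "subjectImageConfig": {
            "subjectDescription": description,
            "subjectType": subject_type,
        },
    }


def face_mesh_control_reference(image_b64, reference_id):
    """Build a control entry whose face mesh is computed from the image."""
    return {
        "referenceType": "REFERENCE_TYPE_CONTROL",
        "referenceId": reference_id,
        "referenceImage": {"bytesBase64Encoded": image_b64},
        "controlImageConfig": {
            "controlType": "CONTROL_TYPE_FACE_MESH",
            "enableControlImageComputation": True,
        },
    }


def build_request_body(prompt, reference_images, parameters):
    """Wrap the prompt, references, and parameters in the request envelope."""
    return {
        "instances": [{"prompt": prompt, "referenceImages": reference_images}],
        "parameters": parameters,
    }


# Placeholder strings stand in for real base64-encoded image data.
references = [face_mesh_control_reference("IMAGE_BYTES_1", 2)]
references += [
    subject_reference(b64, 1, "a man with short hair")
    for b64 in ("IMAGE_BYTES_2", "IMAGE_BYTES_3", "IMAGE_BYTES_4")
]
body = build_request_body(
    "A pencil sketch of a man with short hair [1] in the pose of control image [2]",
    references,
    {"seed": 1, "language": "en", "sampleCount": 4},
)
print(json.dumps(body, indent=2))
```

Serializing with json.dumps keeps long prompt strings on a single line, which avoids the invalid JSON that results from raw line breaks inside a string.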

Choose a reference image type

To customize an image, you provide one or more reference images. Each reference image must have a referenceType that specifies how the model should use it. The following table describes the available reference types.

Reference Type | Description | Use Case
REFERENCE_TYPE_SUBJECT | Provides an image of a subject (like a person, animal, or product) to be incorporated into the generated image. You can provide multiple images for the same subject to improve quality. | Placing a specific person or object into a new scene or style.
REFERENCE_TYPE_STYLE | Provides an image that defines the artistic style (for example, watercolor, sketch, or pop art) for the generated image. | Applying a consistent artistic style to a generated image based on a source style image.
REFERENCE_TYPE_CONTROL | Uses a control image (like a canny edge, scribble, or face mesh) to guide the structure, pose, or composition of the generated image. | Controlling the exact pose of a character or the outline of an object.
REFERENCE_TYPE_RAW | Provides the base image for editing tasks. The output image has the same dimensions as this raw image. | Editing an existing image, such as inpainting or outpainting.
REFERENCE_TYPE_MASK | Provides a mask to specify which parts of a raw image should be edited (inpainting) or preserved. The mask can be user-provided or automatically generated. | Modifying a specific region of an image while leaving the rest unchanged.
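
Before sending a request, a client could run a quick sanity check on the reference list. The sketch below is a hypothetical client-side helper, not part of the API: it checks only the limits stated on this page (at most 4 reference image objects, at most one raw reference image, and the five reference types listed above). How the service itself reports violations is not covered here.

```python
REFERENCE_TYPES = {
    "REFERENCE_TYPE_SUBJECT",
    "REFERENCE_TYPE_STYLE",
    "REFERENCE_TYPE_CONTROL",
    "REFERENCE_TYPE_RAW",
    "REFERENCE_TYPE_MASK",
}
MAX_REFERENCE_IMAGES = 4


def validate_reference_images(reference_images):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(
            f"too many reference images: {len(reference_images)} > {MAX_REFERENCE_IMAGES}"
        )
    types = [ref.get("referenceType") for ref in reference_images]
    for ref_type in types:
        if ref_type not in REFERENCE_TYPES:
            problems.append(f"unknown referenceType: {ref_type!r}")
    if types.count("REFERENCE_TYPE_RAW") > 1:
        problems.append("more than one REFERENCE_TYPE_RAW image")
    return problems


print(validate_reference_images([{"referenceType": "REFERENCE_TYPE_RAW"}] * 2))
```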

Parameter list

The following sections describe the request parameters and response fields. For implementation details, see the examples.

Request parameters

REST

Parameters
referenceType

Required enumeration:

  • REFERENCE_TYPE_RAW
    • Required for editing use cases.
    • At most one raw reference image is allowed per request.
    • The output image has the same dimensions as the raw reference image.
  • REFERENCE_TYPE_MASK
    • Required for masked editing.
    • Must have the same dimensions as the raw reference image, if provided.
    • You can provide your own mask or have one generated from the reference image.
    • If the mask image is empty and maskMode isn't MASK_MODE_USER_PROVIDED, the mask is computed from the raw reference image.
  • REFERENCE_TYPE_CONTROL
    • Must have the same dimensions as the raw reference image, if provided.
    • If the control image is empty and enableControlImageComputation is true, the control image is computed from the raw reference image.
  • REFERENCE_TYPE_SUBJECT
    • You can provide multiple reference images with the same referenceId to potentially improve output quality.
  • REFERENCE_TYPE_STYLE

referenceId

Required integer

The ID for the reference image. Use this ID in your prompt to refer to the corresponding image. For example, use [1] to refer to images with referenceId=1 and [2] for images with referenceId=2.
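
Because the prompt and the reference list are linked only through these bracketed IDs, it can help to check that they agree. The following is a minimal sketch under the assumption that every [n] token in the prompt is meant as a reference ID; the function names are hypothetical, and the page does not say how the service treats an ID that the prompt never mentions.

```python
import re


def referenced_ids(prompt):
    """Return the set of integer IDs written as [n] in the prompt."""
    return {int(match) for match in re.findall(r"\[(\d+)\]", prompt)}


def unreferenced_ids(prompt, reference_images):
    """Return IDs that appear in referenceImages but not in the prompt."""
    provided = {ref["referenceId"] for ref in reference_images}
    return provided - referenced_ids(prompt)


prompt = "a man with short hair [1] in the pose of control image [2]"
images = [{"referenceId": 1}, {"referenceId": 2}, {"referenceId": 3}]
print(unreferenced_ids(prompt, images))
```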

referenceImage.bytesBase64Encoded

Required string

A Base64-encoded string of the reference image.
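
One way to produce this string in Python is with the standard base64 module. The helper names below are hypothetical, and the file path passed to encode_image_file is whatever image you want to use as a reference.

```python
import base64
from pathlib import Path


def encode_image_bytes(data):
    """Return the standard Base64 encoding of raw image bytes as text."""
    return base64.b64encode(data).decode("ascii")


def encode_image_file(path):
    """Read an image file and return its contents Base64-encoded."""
    return encode_image_bytes(Path(path).read_bytes())


# The first four bytes of any PNG file, used here as a tiny stand-in.
print(encode_image_bytes(b"\x89PNG"))
```

The result is decoded to an ASCII string because the JSON request body needs text, not bytes.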

maskImageConfig.maskMode

Optional enumeration.

Use this parameter when referenceType is REFERENCE_TYPE_MASK.

  • MASK_MODE_USER_PROVIDED: If the reference image is a mask image.
  • MASK_MODE_BACKGROUND: To automatically generate a mask using background segmentation.
  • MASK_MODE_FOREGROUND: To automatically generate a mask using foreground segmentation.
  • MASK_MODE_SEMANTIC: To automatically generate a mask using semantic segmentation, and the given mask class.

maskImageConfig.dilation

Optional float. Range: [0, 1]

The amount to dilate this mask by, expressed as a fraction of the image width. For example, 0.01 dilates the mask by 1% of the image width.

Use this parameter when referenceType is REFERENCE_TYPE_MASK.

maskImageConfig.maskClasses

Optional list[Integer].

Mask classes for MASK_MODE_SEMANTIC mode.

Use this parameter when referenceType is REFERENCE_TYPE_MASK.
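
Putting the three maskImageConfig fields together, a semantic mask entry might be built as in the sketch below. The helper name is hypothetical. Omitting referenceImage for a computed mask is an assumption based on the note that an empty mask image is computed from the raw reference image; confirm it against the API before relying on it.

```python
def semantic_mask_reference(reference_id, mask_classes, dilation=None):
    """Build a REFERENCE_TYPE_MASK entry that uses semantic segmentation."""
    if dilation is not None and not 0.0 <= dilation <= 1.0:
        raise ValueError("dilation must be in the range [0, 1]")
    config = {
        "maskMode": "MASK_MODE_SEMANTIC",
        "maskClasses": list(mask_classes),
    }
    if dilation is not None:
        config["dilation"] = dilation
    # Assumption: no referenceImage is sent, so the mask is computed
    # from the raw reference image in the same request.
    return {
        "referenceType": "REFERENCE_TYPE_MASK",
        "referenceId": reference_id,
        "maskImageConfig": config,
    }


# Class ID 7 is "cat" and 8 is "dog" in the Class IDs table on this page.
print(semantic_mask_reference(2, [7, 8], dilation=0.01))
```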

controlImageConfig.controlType

Required enumeration.

Use this parameter when referenceType is REFERENCE_TYPE_CONTROL.

  • CONTROL_TYPE_FACE_MESH for face mesh (person customization).
  • CONTROL_TYPE_CANNY for canny edge.
  • CONTROL_TYPE_SCRIBBLE for scribble.

controlImageConfig.enableControlImageComputation

Optional bool. Default: false.

If referenceType is REFERENCE_TYPE_CONTROL, set this to true to have Imagen compute the control image from the reference image. Otherwise, set to false and provide your own control image.

language

Optional: string (imagen-3.0-capability-001, imagen-3.0-generate-001, and imagegeneration@006 only)

The language code that corresponds to your text prompt language. The following values are supported:

  • auto: Automatic detection. If Imagen detects a supported language, the prompt and an optional negative prompt are translated to English. If the language detected isn't supported, Imagen uses the input text verbatim, which might result in an unexpected output. No error code is returned.
  • en: English (the default if language is omitted)
  • es: Spanish
  • hi: Hindi
  • ja: Japanese
  • ko: Korean
  • pt: Portuguese
  • zh-TW: Chinese (traditional)
  • zh or zh-CN: Chinese (simplified)

subjectImageConfig.subjectDescription

Required string.

A short description of the subject in the image. For example, a woman with short brown hair.

Use this parameter when referenceType is REFERENCE_TYPE_SUBJECT.

subjectImageConfig.subjectType

Required enumeration.

Use this parameter when referenceType is REFERENCE_TYPE_SUBJECT.

  • SUBJECT_TYPE_PERSON: Person subject type.
  • SUBJECT_TYPE_ANIMAL: Animal subject type.
  • SUBJECT_TYPE_PRODUCT: Product subject type.
  • SUBJECT_TYPE_DEFAULT: Default subject type.

styleImageConfig.styleDescription

Optional string.

A short description for the style.

Use this parameter when referenceType is REFERENCE_TYPE_STYLE.

Response body

The following table describes the fields in the response body.

Parameter
predictions

An array of VisionGenerativeModelResult objects, one for each generated image, up to the requested sampleCount. Images that are filtered by responsible AI are not included.

Vision generative model result object

The following table describes the fields in the VisionGenerativeModelResult object.

Parameter
bytesBase64Encoded

The base64 encoded generated image. This field is not present if the output image did not pass responsible AI filters.

mimeType

The MIME type of the generated image. This field is not present if the output image did not pass responsible AI filters.
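
Since bytesBase64Encoded can be absent from a result, client code should not assume every entry holds an image. The following Python sketch shows one hedged way to handle that; the function name decode_predictions is hypothetical, and the sample response uses placeholder data rather than real image bytes.

```python
import base64


def decode_predictions(response):
    """Return (mimeType, image_bytes) pairs, skipping results without image data."""
    images = []
    for prediction in response.get("predictions", []):
        encoded = prediction.get("bytesBase64Encoded")
        if encoded is None:
            continue  # no image data in this result
        images.append((prediction.get("mimeType", ""), base64.b64decode(encoded)))
    return images


# "aGVsbG8=" is the Base64 form of b"hello", standing in for image bytes.
sample = {
    "predictions": [
        {"bytesBase64Encoded": "aGVsbG8=", "mimeType": "image/png"},
        {},
    ]
}
for mime_type, data in decode_predictions(sample):
    print(mime_type, len(data))
```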

Examples

The following example shows how to use the Imagen model to customize an image.

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Your Google Cloud project ID.
  • LOCATION: Your project's region. For example, us-central1, europe-west2, or asia-northeast3. For a list of available regions, see Generative AI on Vertex AI locations.
  • TEXT_PROMPT: The text prompt that guides which images the model generates. To use Imagen 3 Customization, refer to each reference image you provide by including its referenceId in the prompt in the format [$referenceId]. For example:
    • The following text prompt is for a request that has two reference images with "referenceId": 1. Both images have the optional description "subjectDescription": "man with short hair". The prompt is: Create an image about a man with short hair to match the description: A pencil style sketch of a full-body portrait of a man with short hair [1] with hatch-cross drawing, hatch drawing of portrait with 6B and graphite pencils, white background, pencil drawing, high quality, pencil stroke, looking at camera, natural human eyes
  • "referenceId": The ID of the reference image, or the ID for a series of reference images that correspond to the same subject or style. In this example the two reference images are of the same person, so they share the same referenceId (1).
  • BASE64_REFERENCE_IMAGE: A reference image to guide image generation. The image must be specified as a base64-encoded byte string.
  • SUBJECT_DESCRIPTION: Optional. A text description of the reference image you can then use in the prompt field. For example:
          "prompt": "a full-body portrait of a man with short hair [1] with hatch-cross
          drawing",
          [...],
          "subjectDescription": "man with short hair"
        
  • IMAGE_COUNT: The number of generated images. Accepted integer values: 1-4. Default value: 4.
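
The limits on the parameters object can be checked on the client before the request is sent. This sketch assumes a hypothetical helper named build_parameters; the accepted sampleCount range and the language codes are the ones listed on this page, and the optional fields mirror the sample request body shown earlier.

```python
SUPPORTED_LANGUAGES = {
    "auto", "en", "es", "hi", "ja", "ko", "pt", "zh-TW", "zh", "zh-CN",
}


def build_parameters(sample_count=4, language=None, negative_prompt=None, seed=None):
    """Build the "parameters" object, validating values documented on this page."""
    if not isinstance(sample_count, int) or not 1 <= sample_count <= 4:
        raise ValueError("sampleCount must be an integer from 1 to 4")
    parameters = {"sampleCount": sample_count}
    if language is not None:
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language code: {language!r}")
        parameters["language"] = language
    if negative_prompt is not None:
        parameters["negativePrompt"] = negative_prompt
    if seed is not None:
        parameters["seed"] = seed
    return parameters


print(build_parameters(2, language="en", seed=1))
```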

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict

Request JSON body:

{
  "instances": [
    {
      "prompt": "TEXT_PROMPT",
      "referenceImages": [
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "BASE64_REFERENCE_IMAGE"
          },
          "subjectImageConfig": {
            "subjectDescription": "SUBJECT_DESCRIPTION",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        },
        {
          "referenceType": "REFERENCE_TYPE_SUBJECT",
          "referenceId": 1,
          "referenceImage": {
            "bytesBase64Encoded": "BASE64_REFERENCE_IMAGE"
          },
          "subjectImageConfig": {
            "subjectDescription": "SUBJECT_DESCRIPTION",
            "subjectType": "SUBJECT_TYPE_PERSON"
          }
        }
      ]
    }
  ],
  "parameters": {
    "sampleCount": IMAGE_COUNT
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/imagen-3.0-capability-001:predict" | Select-Object -Expand Content

The following sample response is for a request with "sampleCount": 2. The response returns two prediction objects, with the generated image bytes base64-encoded.

{
  "predictions": [
    {
      "bytesBase64Encoded": "BASE64_IMG_BYTES",
      "mimeType": "image/png"
    },
    {
      "mimeType": "image/png",
      "bytesBase64Encoded": "BASE64_IMG_BYTES"
    }
  ]
}
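
The curl and PowerShell commands above could be approximated in Python with only the standard library. This is a sketch, not an official client: the function names are hypothetical, it assumes the gcloud CLI is installed and authenticated (as the commands above also do), and it omits retries and detailed error handling.

```python
import json
import subprocess
import urllib.request


def get_access_token():
    """Obtain a bearer token by running `gcloud auth print-access-token`."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_predict_request(url, body, token):
    """Build (but do not send) the POST request for the :predict endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example usage (requires network access and gcloud credentials):
#   with open("request.json", encoding="utf-8") as f:
#       body = json.load(f)
#   response = send(build_predict_request(URL, body, get_access_token()))
```

Keeping request construction separate from sending makes the headers and payload easy to inspect without making a network call.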

Class IDs

Use the following object class IDs to automatically create an image mask based on specific objects.

Class ID (class_id) Object
0 backpack
1 umbrella
2 bag
3 tie
4 suitcase
5 case
6 bird
7 cat
8 dog
9 horse
10 sheep
11 cow
12 elephant
13 bear
14 zebra
15 giraffe
16 animal (other)
17 microwave
18 radiator
19 oven
20 toaster
21 storage tank
22 conveyor belt
23 sink
24 refrigerator
25 washer dryer
26 fan
27 dishwasher
28 toilet
29 bathtub
30 shower
31 tunnel
32 bridge
33 pier wharf
34 tent
35 building
36 ceiling
37 laptop
38 keyboard
39 mouse
40 remote
41 cell phone
42 television
43 floor
44 stage
45 banana
46 apple
47 sandwich
48 orange
49 broccoli
50 carrot
51 hot dog
52 pizza
53 donut
54 cake
55 fruit (other)
56 food (other)
57 chair (other)
58 armchair
59 swivel chair
60 stool
61 seat
62 couch
63 trash can
64 potted plant
65 nightstand
66 bed
67 table
68 pool table
69 barrel
70 desk
71 ottoman
72 wardrobe
73 crib
74 basket
75 chest of drawers
76 bookshelf
77 counter (other)
78 bathroom counter
79 kitchen island
80 door
81 light (other)
82 lamp
83 sconce
84 chandelier
85 mirror
86 whiteboard
87 shelf
88 stairs
89 escalator
90 cabinet
91 fireplace
92 stove
93 arcade machine
94 gravel
95 platform
96 playingfield
97 railroad
98 road
99 snow
100 sidewalk pavement
101 runway
102 terrain
103 book
104 box
105 clock
106 vase
107 scissors
108 plaything (other)
109 teddy bear
110 hair dryer
111 toothbrush
112 painting
113 poster
114 bulletin board
115 bottle
116 cup
117 wine glass
118 knife
119 fork
120 spoon
121 bowl
122 tray
123 range hood
124 plate
125 person
126 rider (other)
127 bicyclist
128 motorcyclist
129 paper
130 streetlight
131 road barrier
132 mailbox
133 cctv camera
134 junction box
135 traffic sign
136 traffic light
137 fire hydrant
138 parking meter
139 bench
140 bike rack
141 billboard
142 sky
143 pole
144 fence
145 railing banister
146 guard rail
147 mountain hill
148 rock
149 frisbee
150 skis
151 snowboard
152 sports ball
153 kite
154 baseball bat
155 baseball glove
156 skateboard
157 surfboard
158 tennis racket
159 net
160 base
161 sculpture
162 column
163 fountain
164 awning
165 apparel
166 banner
167 flag
168 blanket
169 curtain (other)
170 shower curtain
171 pillow
172 towel
173 rug floormat
174 vegetation
175 bicycle
176 car
177 autorickshaw
178 motorcycle
179 airplane
180 bus
181 train
182 truck
183 trailer
184 boat ship
185 slow wheeled object
186 river lake
187 sea
188 water (other)
189 swimming pool
190 waterfall
191 wall
192 window
193 window blind
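
To avoid hard-coding numbers in requests, a client could keep a small name-to-ID lookup built from this table. The sketch below copies only five rows for brevity; the dictionary and function names are hypothetical, and the resulting list is intended for maskImageConfig.maskClasses.

```python
# A small excerpt of the Class IDs table; extend it with the rows you need.
CLASS_IDS = {
    "cat": 7,
    "dog": 8,
    "person": 125,
    "sky": 142,
    "car": 176,
}


def mask_classes_for(names):
    """Translate object names into class IDs, failing on unknown names."""
    missing = [name for name in names if name not in CLASS_IDS]
    if missing:
        raise KeyError(f"no class ID recorded for: {missing}")
    return [CLASS_IDS[name] for name in names]


print(mask_classes_for(["cat", "dog"]))  # prints [7, 8]
```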

What's next