The ML.GENERATE_TEXT function

This document describes the ML.GENERATE_TEXT function, which lets you perform generative natural language tasks by using any combination of text and unstructured data from BigQuery standard tables, or unstructured data from BigQuery object tables.

The function works by sending requests to a BigQuery ML remote model that represents a Vertex AI model, and then returning that model's response. Supported remote models include remote models over Gemini models, Claude models, Llama models, Mistral AI models, and supported open models, as described in the sections that follow.

Several of the ML.GENERATE_TEXT function's arguments provide the parameters that shape the Vertex AI model's response.

You can use the ML.GENERATE_TEXT function to perform tasks such as classification, sentiment analysis, image captioning, and transcription.

Prompt design can strongly affect the responses returned by the Vertex AI model. For more information, see Introduction to prompting or Design multimodal prompts.

Input

The input you can provide to ML.GENERATE_TEXT varies depending on the Vertex AI model that you reference from your remote model.

Input for Gemini models

When you use Gemini 1.5 or newer models, you can use the following types of input: text data from columns in standard tables, unstructured data referenced by ObjectRef values in standard tables, and unstructured data from object tables.

When you analyze unstructured data, that data must meet the following requirements:

  • Content must be in one of the supported formats that are described in the Gemini API model mimeType parameter.
  • If you are analyzing a video, the maximum supported length is two minutes. If the video is longer than two minutes, ML.GENERATE_TEXT only returns results for the first two minutes.

Input for other models

For all other types of models, you can analyze text data from a standard table.

Syntax for standard tables

ML.GENERATE_TEXT syntax differs depending on the Vertex AI model that your remote model references. Choose the option appropriate for your use case.

gemini-2.0-flash

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences]
  [, ground_with_google_search AS ground_with_google_search]
  [, safety_settings AS safety_settings])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: your project ID.

  • dataset: the BigQuery dataset that contains the model.

  • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

    You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the query_statement argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt. Within the query, you can provide the prompt value in the following ways:

    • Specify a STRING value. For example, ('Write a poem about birds').
    • Specify a STRUCT value that contains one or more fields. You can use the following types of fields within the STRUCT value:

      • STRING: a string literal, or the name of a STRING column.

        String literal: 'Is Seattle a US city?'
        String column name: my_string_column

      • ARRAY<STRING>: an array of string literals. You can only use string literals in the array.

        Array of string literals: ['Is ', 'Seattle', ' a US city']

      • ObjectRefRuntime: an ObjectRefRuntime value returned by the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated.

        Function call with an ObjectRef column: OBJ.GET_ACCESS_URL(my_objectref_column, 'r')
        Function call with a constructed ObjectRef value: OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image.jpg', 'myconnection'), 'r')

      • ARRAY<ObjectRefRuntime>: ObjectRefRuntime values returned from multiple calls to the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated.

        Function calls with ObjectRef columns: [OBJ.GET_ACCESS_URL(my_objectref_column1, 'r'), OBJ.GET_ACCESS_URL(my_objectref_column2, 'r')]
        Function calls with constructed ObjectRef values: [OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image1.jpg', 'myconnection'), 'r'), OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image2.jpg', 'myconnection'), 'r')]

      The function combines STRUCT fields similarly to a CONCAT operation and concatenates the fields in their specified order. The same is true for the elements of any arrays used within the struct. The following examples show STRUCT prompt values and how they are interpreted:

      • STRUCT<STRING>: ('Describe the city of Seattle') is interpreted as 'Describe the city of Seattle'.
      • STRUCT<STRING, STRING, STRING>: ('Describe the city ', my_city_column, ' in 15 words') is interpreted as 'Describe the city my_city_column_value in 15 words'.
      • STRUCT<STRING, ARRAY<STRING>>: ('Describe ', ['the city of', 'Seattle']) is interpreted as 'Describe the city of Seattle'.
      • STRUCT<STRING, ObjectRefRuntime>: ('Describe this city', OBJ.GET_ACCESS_URL(image_objectref_column, 'r')) is interpreted as 'Describe this city' followed by the referenced image.
      • STRUCT<STRING, ObjectRefRuntime, ObjectRefRuntime>: ('If the city in the first image is within the country of the second image, provide a ten word description of the city', OBJ.GET_ACCESS_URL(city_image_objectref_column, 'r'), OBJ.GET_ACCESS_URL(country_image_objectref_column, 'r')) is interpreted as 'If the city in the first image is within the country of the second image, provide a ten word description of the city' followed by the city image and the country image.

    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

    • ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
    • safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.

      Supported categories are as follows:

      • HARM_CATEGORY_HATE_SPEECH
      • HARM_CATEGORY_DANGEROUS_CONTENT
      • HARM_CATEGORY_HARASSMENT
      • HARM_CATEGORY_SEXUALLY_EXPLICIT

      Supported thresholds are as follows:

      • BLOCK_NONE (Restricted)
      • BLOCK_LOW_AND_ABOVE
      • BLOCK_MEDIUM_AND_ABOVE (Default)
      • BLOCK_ONLY_HIGH
      • HARM_BLOCK_THRESHOLD_UNSPECIFIED

      For more information, refer to the definition of safety category and blocking threshold.

    Details

    The model and input table must be in the same region.
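
    For example, a minimal sketch of a gemini-2.0-flash request that combines a STRING field and an ObjectRefRuntime field in a STRUCT prompt might look like the following. The remote model mydataset.flash2_model, the table mydataset.report_images, and its image_ref ObjectRef column are hypothetical names used for illustration only:

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.flash2_model`,
        (
          SELECT
            ('Describe the defect shown in this image in 20 words: ',
              OBJ.GET_ACCESS_URL(image_ref, 'r')) AS prompt
          FROM `mydataset.report_images`
        ),
        STRUCT(
          128 AS max_output_tokens,
          TRUE AS flatten_json_output));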

gemini-1.5-flash

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences]
  [, ground_with_google_search AS ground_with_google_search]
  [, safety_settings AS safety_settings])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: your project ID.

  • dataset: the BigQuery dataset that contains the model.

  • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

    You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the query_statement argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt. Within the query, you can provide the prompt value in the following ways:

    • Specify a STRING value. For example, ('Write a poem about birds').
    • Specify a STRUCT value that contains one or more fields. You can use the following types of fields within the STRUCT value:

      • STRING: a string literal, or the name of a STRING column.

        String literal: 'Is Seattle a US city?'
        String column name: my_string_column

      • ARRAY<STRING>: an array of string literals. You can only use string literals in the array.

        Array of string literals: ['Is ', 'Seattle', ' a US city']

      • ObjectRefRuntime: an ObjectRefRuntime value returned by the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated.

        Function call with an ObjectRef column: OBJ.GET_ACCESS_URL(my_objectref_column, 'r')
        Function call with a constructed ObjectRef value: OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image.jpg', 'myconnection'), 'r')

      • ARRAY<ObjectRefRuntime>: ObjectRefRuntime values returned from multiple calls to the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated.

        Function calls with ObjectRef columns: [OBJ.GET_ACCESS_URL(my_objectref_column1, 'r'), OBJ.GET_ACCESS_URL(my_objectref_column2, 'r')]
        Function calls with constructed ObjectRef values: [OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image1.jpg', 'myconnection'), 'r'), OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image2.jpg', 'myconnection'), 'r')]

      The function combines STRUCT fields similarly to a CONCAT operation and concatenates the fields in their specified order. The same is true for the elements of any arrays used within the struct. The following examples show STRUCT prompt values and how they are interpreted:

      • STRUCT<STRING>: ('Describe the city of Seattle') is interpreted as 'Describe the city of Seattle'.
      • STRUCT<STRING, STRING, STRING>: ('Describe the city ', my_city_column, ' in 15 words') is interpreted as 'Describe the city my_city_column_value in 15 words'.
      • STRUCT<STRING, ARRAY<STRING>>: ('Describe ', ['the city of', 'Seattle']) is interpreted as 'Describe the city of Seattle'.
      • STRUCT<STRING, ObjectRefRuntime>: ('Describe this city', OBJ.GET_ACCESS_URL(image_objectref_column, 'r')) is interpreted as 'Describe this city' followed by the referenced image.
      • STRUCT<STRING, ObjectRefRuntime, ObjectRefRuntime>: ('If the city in the first image is within the country of the second image, provide a ten word description of the city', OBJ.GET_ACCESS_URL(city_image_objectref_column, 'r'), OBJ.GET_ACCESS_URL(country_image_objectref_column, 'r')) is interpreted as 'If the city in the first image is within the country of the second image, provide a ten word description of the city' followed by the city image and the country image.

    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

    • ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
    • safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.

      Supported categories are as follows:

      • HARM_CATEGORY_HATE_SPEECH
      • HARM_CATEGORY_DANGEROUS_CONTENT
      • HARM_CATEGORY_HARASSMENT
      • HARM_CATEGORY_SEXUALLY_EXPLICIT

      Supported thresholds are as follows:

      • BLOCK_NONE (Restricted)
      • BLOCK_LOW_AND_ABOVE
      • BLOCK_MEDIUM_AND_ABOVE (Default)
      • BLOCK_ONLY_HIGH
      • HARM_BLOCK_THRESHOLD_UNSPECIFIED

      For more information, refer to the definition of safety category and blocking threshold.

    Details

    The model and input table must be in the same region.
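
    For example, a minimal sketch of a gemini-1.5-flash request that builds the prompt from a query and grounds the responses with Google Search might look like the following. The remote model mydataset.flash15_model, the table mydataset.questions, and its question column are hypothetical names used for illustration only:

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.flash15_model`,
        (SELECT question AS prompt FROM `mydataset.questions`),
        STRUCT(
          TRUE AS flatten_json_output,
          TRUE AS ground_with_google_search));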

gemini-1.5-pro

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_p AS top_p]
  [, temperature AS temperature]
  [, flatten_json_output AS flatten_json_output]
  [, stop_sequences AS stop_sequences]
  [, ground_with_google_search AS ground_with_google_search]
  [, safety_settings AS safety_settings])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: your project ID.

  • dataset: the BigQuery dataset that contains the model.

  • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

    You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the query_statement argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt. Within the query, you can provide the prompt value in the following ways:

    • Specify a STRING value. For example, ('Write a poem about birds').
    • Specify a STRUCT value that contains one or more fields. You can use the following types of fields within the STRUCT value:

      • STRING: a string literal, or the name of a STRING column.

        String literal: 'Is Seattle a US city?'
        String column name: my_string_column

      • ARRAY<STRING>: an array of string literals. You can only use string literals in the array.

        Array of string literals: ['Is ', 'Seattle', ' a US city']

      • ObjectRefRuntime: an ObjectRefRuntime value returned by the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated.

        Function call with an ObjectRef column: OBJ.GET_ACCESS_URL(my_objectref_column, 'r')
        Function call with a constructed ObjectRef value: OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image.jpg', 'myconnection'), 'r')

      • ARRAY<ObjectRefRuntime>: ObjectRefRuntime values returned from multiple calls to the OBJ.GET_ACCESS_URL function. The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value. ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated.

        Function calls with ObjectRef columns: [OBJ.GET_ACCESS_URL(my_objectref_column1, 'r'), OBJ.GET_ACCESS_URL(my_objectref_column2, 'r')]
        Function calls with constructed ObjectRef values: [OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image1.jpg', 'myconnection'), 'r'), OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image2.jpg', 'myconnection'), 'r')]

      The function combines STRUCT fields similarly to a CONCAT operation and concatenates the fields in their specified order. The same is true for the elements of any arrays used within the struct. The following examples show STRUCT prompt values and how they are interpreted:

      • STRUCT<STRING>: ('Describe the city of Seattle') is interpreted as 'Describe the city of Seattle'.
      • STRUCT<STRING, STRING, STRING>: ('Describe the city ', my_city_column, ' in 15 words') is interpreted as 'Describe the city my_city_column_value in 15 words'.
      • STRUCT<STRING, ARRAY<STRING>>: ('Describe ', ['the city of', 'Seattle']) is interpreted as 'Describe the city of Seattle'.
      • STRUCT<STRING, ObjectRefRuntime>: ('Describe this city', OBJ.GET_ACCESS_URL(image_objectref_column, 'r')) is interpreted as 'Describe this city' followed by the referenced image.
      • STRUCT<STRING, ObjectRefRuntime, ObjectRefRuntime>: ('If the city in the first image is within the country of the second image, provide a ten word description of the city', OBJ.GET_ACCESS_URL(city_image_objectref_column, 'r'), OBJ.GET_ACCESS_URL(country_image_objectref_column, 'r')) is interpreted as 'If the city in the first image is within the country of the second image, provide a ten word description of the city' followed by the city image and the country image.

    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

    • ground_with_google_search: a BOOL value that determines whether the Vertex AI model uses Grounding with Google Search when generating responses. Grounding lets the model use additional information from the internet when generating a response, in order to make model responses more specific and factual. When both flatten_json_output and this field are set to TRUE, an additional ml_generate_text_grounding_result column is included in the results, providing the sources that the model used to gather additional information. The default is FALSE.
    • safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.

      Supported categories are as follows:

      • HARM_CATEGORY_HATE_SPEECH
      • HARM_CATEGORY_DANGEROUS_CONTENT
      • HARM_CATEGORY_HARASSMENT
      • HARM_CATEGORY_SEXUALLY_EXPLICIT

      Supported thresholds are as follows:

      • BLOCK_NONE (Restricted)
      • BLOCK_LOW_AND_ABOVE
      • BLOCK_MEDIUM_AND_ABOVE (Default)
      • BLOCK_ONLY_HIGH
      • HARM_BLOCK_THRESHOLD_UNSPECIFIED

      For more information, refer to the definition of safety category and blocking threshold.

    Details

    The model and input table must be in the same region.

Claude

You must enable Claude models in Vertex AI before you can use them. For more information, see Enable a partner model.

ML.GENERATE_TEXT(
MODEL project_id.dataset.model,
{ TABLE project_id.dataset.table | (query_statement) },
STRUCT(
  [max_output_tokens AS max_output_tokens]
  [, top_k AS top_k]
  [, top_p AS top_p]
  [, flatten_json_output AS flatten_json_output])
)

Arguments

ML.GENERATE_TEXT takes the following arguments:

  • project_id: your project ID.

  • dataset: the BigQuery dataset that contains the model.

  • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

    You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the query_statement argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt.

    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
    • top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.

      A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.

      For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.

    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    Details

    The model and input table must be in the same region.
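
    For example, a minimal sketch of a Claude request that sets top_k and top_p might look like the following. The remote model mydataset.claude_model is a hypothetical name used for illustration only:

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.claude_model`,
        (SELECT 'Draft a short, polite reply to this review: great product, slow shipping.' AS prompt),
        STRUCT(
          512 AS max_output_tokens,
          10 AS top_k,
          0.9 AS top_p,
          TRUE AS flatten_json_output));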

    Llama

    You must enable Llama models in Vertex AI before you can use them. For more information, see Enable a partner model.

    ML.GENERATE_TEXT(
    MODEL project_id.dataset.model,
    { TABLE project_id.dataset.table | (query_statement) },
    STRUCT(
      [max_output_tokens AS max_output_tokens]
      [, top_p AS top_p]
      [, temperature AS temperature]
      [, flatten_json_output AS flatten_json_output]
      [, stop_sequences AS stop_sequences])
    )
    

    Arguments

    ML.GENERATE_TEXT takes the following arguments:

    • project_id: your project ID.

    • dataset: the BigQuery dataset that contains the model.

    • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

      You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the query_statement argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt.

    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

    Details

    The model and input table must be in the same region.

    Mistral AI

    You must enable Mistral AI models in Vertex AI before you can use them. For more information, see Enable a partner model.

    ML.GENERATE_TEXT(
    MODEL project_id.dataset.model,
    { TABLE project_id.dataset.table | (query_statement) },
    STRUCT(
      [max_output_tokens AS max_output_tokens]
      [, top_p AS top_p]
      [, temperature AS temperature]
      [, flatten_json_output AS flatten_json_output]
      [, stop_sequences AS stop_sequences])
    )
    

    Arguments

    ML.GENERATE_TEXT takes the following arguments:

    • project_id: your project ID.

    • dataset: the BigQuery dataset that contains the model.

    • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

      You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the query_statement argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt.

    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

    Details

    The model and input table must be in the same region.

    Open models

    ML.GENERATE_TEXT(
    MODEL project_id.dataset.model,
    { TABLE project_id.dataset.table | (query_statement) },
    STRUCT(
      [max_output_tokens AS max_output_tokens]
      [, top_k AS top_k]
      [, top_p AS top_p]
      [, temperature AS temperature]
      [, flatten_json_output AS flatten_json_output])
    )
    

    Arguments

    ML.GENERATE_TEXT takes the following arguments:

    • project_id: your project ID.

    • dataset: the BigQuery dataset that contains the model.

    • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

      You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

  • table: the name of the BigQuery table that contains the prompt data. The text in the column that's named prompt is sent to the model. If your table does not have a prompt column, use the query_statement argument instead and provide a SELECT statement that includes an alias for an existing table column. An error occurs if no prompt column is available.

  • query_statement: the GoogleSQL query that generates the prompt data. The query must produce a column named prompt.

    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,4096]. Specify a lower value for shorter responses and a higher value for longer responses. If you don't specify a value, the model determines an appropriate value.
    • top_k: an INT64 value in the range [1,40] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.

      A top_k value of 1 means the next selected token is the most probable among all tokens in the model's vocabulary, while a top_k value of 3 means that the next token is selected from among the three most probable tokens by using the temperature value.

      For each token selection step, the top_k tokens with the highest probabilities are sampled. Then tokens are further filtered based on the top_p value, with the final token selected using temperature sampling.

    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. If you don't specify a value, the model determines an appropriate value.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. If you don't specify a value, the model determines an appropriate value.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    Details

    The model and input table must be in the same region.
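
    For example, a minimal sketch of a request to a remote model over an open model might look like the following. The remote model mydataset.open_llm_model, the table mydataset.product_reviews, and its review column are hypothetical names used for illustration only:

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.open_llm_model`,
        (SELECT review AS prompt FROM `mydataset.product_reviews`),
        STRUCT(
          256 AS max_output_tokens,
          40 AS top_k,
          0.95 AS top_p,
          0.3 AS temperature,
          TRUE AS flatten_json_output));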

    Syntax for object tables

    ML.GENERATE_TEXT syntax differs depending on the Vertex AI model that your remote model references. Choose the option appropriate for your use case.

    gemini-2.0-flash

    ML.GENERATE_TEXT(
    MODEL project_id.dataset.model,
    TABLE project_id.dataset.table,
    STRUCT(
      prompt AS prompt
      [, max_output_tokens AS max_output_tokens]
      [, top_p AS top_p]
      [, temperature AS temperature]
      [, flatten_json_output AS flatten_json_output]
      [, stop_sequences AS stop_sequences]
      [, safety_settings AS safety_settings])
    )

    Arguments

    ML.GENERATE_TEXT takes the following arguments:

    • project_id: your project ID.

    • dataset: the BigQuery dataset that contains the model.

    • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

      You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

    • table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input.

      The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.

    • query_statement: the GoogleSQL query that generates the image data. You can only specify WHERE, ORDER BY, and LIMIT clauses in the query.

    • prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

    • safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.

      Supported categories are as follows:

      • HARM_CATEGORY_HATE_SPEECH
      • HARM_CATEGORY_DANGEROUS_CONTENT
      • HARM_CATEGORY_HARASSMENT
      • HARM_CATEGORY_SEXUALLY_EXPLICIT

      Supported thresholds are as follows:

      • BLOCK_NONE (Restricted)
      • BLOCK_LOW_AND_ABOVE
      • BLOCK_MEDIUM_AND_ABOVE (Default)
      • BLOCK_ONLY_HIGH
      • HARM_BLOCK_THRESHOLD_UNSPECIFIED

      For more information, refer to the definition of safety category and blocking threshold.

    Details

    The model and input table must be in the same region.
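
    For example, a minimal sketch of a gemini-2.0-flash request over an object table might look like the following. The remote model mydataset.flash2_model and the object table mydataset.images_object_table are hypothetical names used for illustration only:

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.flash2_model`,
        TABLE `mydataset.images_object_table`,
        STRUCT(
          'Describe the contents of this image in 20 words' AS prompt,
          TRUE AS flatten_json_output));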

    gemini-1.5-flash

    ML.GENERATE_TEXT(
    MODEL project_id.dataset.model,
    TABLE project_id.dataset.table,
    STRUCT(
      prompt AS prompt
      [, max_output_tokens AS max_output_tokens]
      [, top_p AS top_p]
      [, temperature AS temperature]
      [, flatten_json_output AS flatten_json_output]
      [, stop_sequences AS stop_sequences]
      [, safety_settings AS safety_settings])
    )

    Arguments

    ML.GENERATE_TEXT takes the following arguments:

    • project_id: your project ID.

    • dataset: the BigQuery dataset that contains the model.

    • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

      You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

    • table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input.

      The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.

    • query_statement: the GoogleSQL query that generates the image data. You can only specify WHERE, ORDER BY, and LIMIT clauses in the query.

    • prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

    • safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.

      Supported categories are as follows:

      • HARM_CATEGORY_HATE_SPEECH
      • HARM_CATEGORY_DANGEROUS_CONTENT
      • HARM_CATEGORY_HARASSMENT
      • HARM_CATEGORY_SEXUALLY_EXPLICIT

      Supported thresholds are as follows:

      • BLOCK_NONE (Restricted)
      • BLOCK_LOW_AND_ABOVE
      • BLOCK_MEDIUM_AND_ABOVE (Default)
      • BLOCK_ONLY_HIGH
      • HARM_BLOCK_THRESHOLD_UNSPECIFIED

      For more information, refer to the definition of safety category and blocking threshold.

    Details

    The model and input table must be in the same region.

    gemini-1.5-pro

    ML.GENERATE_TEXT(
    MODEL project_id.dataset.model,
    TABLE project_id.dataset.table,
    STRUCT(
      prompt AS prompt
      [, max_output_tokens AS max_output_tokens]
      [, top_p AS top_p]
      [, temperature AS temperature]
      [, flatten_json_output AS flatten_json_output]
      [, stop_sequences AS stop_sequences]
      [, safety_settings AS safety_settings])
    )

    Arguments

    ML.GENERATE_TEXT takes the following arguments:

    • project_id: your project ID.

    • dataset: the BigQuery dataset that contains the model.

    • model: the name of the remote model over the Vertex AI model. For more information about how to create this type of remote model, see The CREATE MODEL statement for remote models over LLMs.

      You can confirm what model is used by the remote model by opening the Google Cloud console and looking at the Remote endpoint field in the model details page.

    • table: the name of the object table that contains the content to analyze. For more information on what types of content you can analyze, see Input.

      The Cloud Storage bucket used by the input object table must be in the same project where you have created the model and where you are calling the ML.GENERATE_TEXT function.

    • query_statement: the GoogleSQL query that generates the image data. You can only specify WHERE, ORDER BY, and LIMIT clauses in the query.

    • prompt: a STRING value that contains the prompt to use to analyze the visual content. The prompt value must contain less than 16,000 tokens. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words.
    • max_output_tokens: an INT64 value that sets the maximum number of tokens that can be generated in the response. A token might be smaller than a word and is approximately four characters. One hundred tokens correspond to approximately 60-80 words. This value must be in the range [1,8192]. Specify a lower value for shorter responses and a higher value for longer responses. The default is 128.
    • top_p: a FLOAT64 value in the range [0.0,1.0] that changes how the model selects tokens for output. Specify a lower value for less random responses and a higher value for more random responses. The default is 0.95.

      Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top_p value is 0.5, then the model selects either A or B as the next token by using the temperature value and doesn't consider C.

    • temperature: a FLOAT64 value in the range [0.0,1.0] that controls the degree of randomness in token selection. Lower temperature values are good for prompts that require a more deterministic and less open-ended or creative response, while higher temperature values can lead to more diverse or creative results. A temperature value of 0 is deterministic, meaning that the highest probability response is always selected. The default is 0.

    • flatten_json_output: a BOOL value that determines whether the JSON content returned by the function is parsed into separate columns. The default is FALSE.

    • stop_sequences: an ARRAY<STRING> value that removes the specified strings if they are included in responses from the model. Strings are matched exactly, including capitalization. The default is an empty array.

    • safety_settings: an ARRAY<STRUCT<STRING AS category, STRING AS threshold>> value that configures content safety thresholds to filter responses. The first element in the struct specifies a harm category, and the second element in the struct specifies a corresponding blocking threshold. The model filters out content that violates these settings. You can only specify each category once. For example, you can't specify both STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold) and STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_ONLY_HIGH' AS threshold). If there is no safety setting for a given category, the BLOCK_MEDIUM_AND_ABOVE safety setting is used.

      Supported categories are as follows:

      • HARM_CATEGORY_HATE_SPEECH
      • HARM_CATEGORY_DANGEROUS_CONTENT
      • HARM_CATEGORY_HARASSMENT
      • HARM_CATEGORY_SEXUALLY_EXPLICIT

      Supported thresholds are as follows:

      • BLOCK_NONE (Restricted)
      • BLOCK_LOW_AND_ABOVE
      • BLOCK_MEDIUM_AND_ABOVE (Default)
      • BLOCK_ONLY_HIGH
      • HARM_BLOCK_THRESHOLD_UNSPECIFIED

      For more information, refer to the definition of safety category and blocking threshold.

    Details

    The model and input table must be in the same region.

    Output

    ML.GENERATE_TEXT returns the input table plus the following columns:

    Gemini API models

    • ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.generateContent call to the model. The generated text is in the text element. The safety attributes are in the safety_ratings element. This column is returned when flatten_json_output is FALSE.
    • ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
    • ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.
    • ml_generate_text_grounding_result: a STRING value that contains a list of the grounding sources that the model used to gather additional information. This column is returned when both flatten_json_output and ground_with_google_search are TRUE.
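
    When flatten_json_output is FALSE, you can parse the raw JSON response yourself. The following minimal sketch assumes the standard generateContent response shape, with the generated text at candidates[0].content.parts[0].text; the remote model name is hypothetical:

    SELECT
      prompt,
      JSON_VALUE(ml_generate_text_result, '$.candidates[0].content.parts[0].text') AS generated_text,
      ml_generate_text_status
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.flash2_model`,
        (SELECT 'What is the capital of France?' AS prompt),
        STRUCT(FALSE AS flatten_json_output));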

    Claude models

    • ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.rawPredict call to the model. The generated text is in the content element. This column is returned when flatten_json_output is FALSE.
    • ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
    • ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.

    Llama models

    • ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.rawPredict call to the model. The generated text is in the content element. This column is returned when flatten_json_output is FALSE.
    • ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
    • ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.

    Mistral AI models

    • ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.rawPredict call to the model. The generated text is in the content element. This column is returned when flatten_json_output is FALSE.
    • ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
    • ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.

    Open models

    • ml_generate_text_result: This is the JSON response from the projects.locations.endpoints.predict call to the model. The generated text is in the predictions element. This column is returned when flatten_json_output is FALSE.
    • ml_generate_text_llm_result: a STRING value that contains the generated text. This column is returned when flatten_json_output is TRUE.
    • ml_generate_text_status: a STRING value that contains the API response status for the corresponding row. This value is empty if the operation was successful.

    Examples

    Text analysis

    Example 1

    This example shows a request to a Claude model that provides a single prompt.

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.claude_model`,
        (SELECT 'What is the purpose of dreams?' AS prompt));

    Example 2

    This example shows a request to a gemini-1.5-pro model that provides prompt data from a table column named question that is aliased as prompt.

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.pro15_model`,
        (SELECT question AS prompt FROM `mydataset.prompt_table`));

    Example 3

    This example shows a request to a gemini-1.5-flash model that concatenates strings and a table column to provide the prompt data.

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.flash15_model`,
        (
          SELECT
            CONCAT(
              'Classify the sentiment of the following text as positive or negative. Text: ',
              input_column,
              ' Sentiment: ') AS prompt
          FROM `mydataset.input_table`));

    Example 4

    This example shows a request to a gemini-2.0-flash-001 model that excludes model responses that contain the strings Golf or football.

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL
          `mydataset.flash2_model`,
        TABLE `mydataset.prompt_table`,
        STRUCT(
          .15 AS temperature,
          TRUE AS flatten_json_output,
          ['Golf', 'football'] AS stop_sequences));

    Example 5

    This example shows a request to a gemini-1.5-flash model with the following characteristics:

    • Provides prompt data from a table column that's named prompt.
    • Flattens the JSON response into separate columns.
    • Retrieves and returns public web data for response grounding.

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL
          `mydataset.flash15_model`,
        TABLE `mydataset.prompt_table`,
        STRUCT(
          TRUE AS flatten_json_output,
          TRUE AS ground_with_google_search));

    Example 6

    This example shows a request to a gemini-1.5-flash model with the following characteristics:

    • Provides prompt data from a table column that's named prompt.
    • Returns a shorter generated text response.
    • Filters out unsafe responses by using safety settings.

    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL
          `mydataset.flash15_model`,
        TABLE `mydataset.prompt_table`,
        STRUCT(
          75 AS max_output_tokens,
          [STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category,
            'BLOCK_LOW_AND_ABOVE' AS threshold),
          STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category,
            'BLOCK_MEDIUM_AND_ABOVE' AS threshold)] AS safety_settings));

    Visual content analysis

    Example 1

    This example adds product description information to a table by analyzing the object data in an ObjectRef column named image:

    UPDATE mydataset.products
    SET
      image_description = (
        SELECT
          ml_generate_text_llm_result
        FROM
          ML.GENERATE_TEXT(
            MODEL `mydataset.geminiflash2`,
            (
              SELECT
                ('Can you describe the following image?', OBJ.GET_ACCESS_URL(image, 'r')) AS prompt
            ),
            STRUCT(
              TRUE AS FLATTEN_JSON_OUTPUT))
      )
    WHERE image IS NOT NULL;

    Example 2

    This example analyzes visual content from an object table that's named dogs and identifies the breed of dog contained in the content. The content returned is filtered by the specified safety settings:

    SELECT
      uri,
      ml_generate_text_llm_result
    FROM
      ML.GENERATE_TEXT(
        MODEL
          `mydataset.dog_identifier_model`,
        TABLE `mydataset.dogs`,
        STRUCT(
            'What is the breed of the dog?' AS PROMPT,
            .01 AS TEMPERATURE,
            TRUE AS FLATTEN_JSON_OUTPUT,
            [STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category,
              'BLOCK_LOW_AND_ABOVE' AS threshold),
            STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category,
              'BLOCK_MEDIUM_AND_ABOVE' AS threshold)] AS safety_settings));

    Audio content analysis

    This example translates and transcribes audio content from an object table that's named feedback:

    SELECT
      uri,
      ml_generate_text_llm_result
    FROM
      ML.GENERATE_TEXT(
        MODEL
          `mydataset.audio_model`,
        TABLE `mydataset.feedback`,
        STRUCT(
          'What is the content of this audio clip, translated into Spanish?' AS PROMPT,
          .01 AS TEMPERATURE,
          TRUE AS FLATTEN_JSON_OUTPUT));

    PDF content analysis

    This example classifies PDF content from an object table that's named documents:

    SELECT
      uri,
      ml_generate_text_llm_result
    FROM
      ML.GENERATE_TEXT(
        MODEL
          `mydataset.classify_model`,
        TABLE `mydataset.documents`,
        STRUCT(
          'Classify this document using the following categories: legal, tax-related, real estate' AS PROMPT,
          .2 AS TEMPERATURE,
          TRUE AS FLATTEN_JSON_OUTPUT));

    Locations

    ML.GENERATE_TEXT must run in the same region or multi-region as the remote model that the function references.

    With the exception of Gemini 2.0 models, you can create remote models over built-in Vertex AI models in all of the regions that support Generative AI APIs, and also in the US and EU multi-regions. For Gemini 2.0 models, you can create remote models in the us-central1 region and the US multi-region.

    You can create remote models over Claude models in all of the supported regions for Claude models.

    You can create remote models over Mistral AI models in all of the supported regions for Mistral AI models.

    Quotas

    See Vertex AI and Cloud AI service functions quotas and limits.

    Known issues

    This section contains information about known issues.

    Resource exhausted errors

    Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

    A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>
    

    This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.

    To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package. To try the BigQuery ML remote inference SQL script, see Handle quota errors by calling ML.GENERATE_TEXT iteratively.
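
    If you prefer a manual approach, you can also filter on the ml_generate_text_status column and re-run the function for only the rows that failed. The following is a minimal sketch of one retry pass; the table names mydataset.first_pass_results and mydataset.retried_rows, as well as the remote model name, are hypothetical:

    CREATE OR REPLACE TABLE `mydataset.retried_rows` AS
    SELECT *
    FROM
      ML.GENERATE_TEXT(
        MODEL `mydataset.gemini_model`,
        (
          -- Retry only the rows whose previous call did not succeed.
          SELECT prompt
          FROM `mydataset.first_pass_results`
          WHERE ml_generate_text_status != ''
        ),
        STRUCT(TRUE AS flatten_json_output));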

    What's next