Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Batch predictions are a good option for large volumes of non-latency-sensitive embeddings requests. Key features of batch predictions include:
Large volume: Process a large number of requests in a single batch job instead of one at a time.
Asynchronous processing: Similar to batch prediction for tabular data in Vertex AI, you specify an output location for your results, and the job populates it asynchronously.
Text embeddings models that support batch predictions
All stable versions of text embedding models support batch predictions. Stable versions are versions that are no longer in preview and are fully supported for production environments. For the full list of supported embedding models, see Embedding model and versions.
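For example, you can submit a batch job with the Vertex AI SDK for Python. The following is a minimal sketch, assuming the SDK's TextEmbeddingModel.batch_predict method is available in your SDK version (some versions expose it under vertexai.preview.language_models); the project ID, model version, and Cloud Storage URIs are placeholders.

import vertexai
from vertexai.language_models import TextEmbeddingModel

# Placeholder project and location; replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")

# Placeholder model version; use a stable embedding model available in your project.
model = TextEmbeddingModel.from_pretrained("text-embedding-005")

# dataset points at the prepared input (a JSONL file in Cloud Storage here);
# destination_uri_prefix is the output location the job populates asynchronously.
batch_job = model.batch_predict(
    dataset=["gs://your-bucket/embeddings_input.jsonl"],
    destination_uri_prefix="gs://your-bucket/embeddings_output/",
)

print(batch_job.display_name)
print(batch_job.state)  # the job runs asynchronously; check state until it completes

Because the job is asynchronous, the call returns before the results exist; the output location you pass is populated when the job finishes.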
Choose an input source
Before you prepare your inputs, decide whether to use JSONL files in Cloud Storage or a BigQuery table. The following table provides a comparison to help you choose the best option for your use case.
Input source: JSONL file in Cloud Storage
Description: A text file where each line is a separate JSON object that contains a prompt.
Use case: Use this option when your source data is in files or if you prefer a file-based data pipeline.

Input source: BigQuery table
Description: A structured table in BigQuery with a column that contains the prompts.
Use case: Use this option when your prompts are stored in BigQuery or are part of a larger structured dataset.
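The input source you choose also determines how you reference the data when you create the batch job. As a rough sketch (the bucket, project, dataset, and table names are placeholders, and the exact parameter that accepts these URIs depends on the interface you use):

# JSONL file in Cloud Storage
jsonl_input_uri = "gs://your-bucket/embeddings_input.jsonl"

# BigQuery table, typically referenced as bq://PROJECT_ID.DATASET.TABLE
bigquery_input_uri = "bq://your-project.your_dataset.your_prompts_table"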
Prepare your inputs
The input for batch requests is a list of prompts stored in either a BigQuery table or a JSON Lines (JSONL) file in Cloud Storage. Each batch request can include up to 30,000 prompts.
JSONL format
Input example
Each line in the input file must be a valid JSON object with a content field that contains the prompt.
{"content":"Give a short description of a machine learning model:"}{"content":"Best recipe for banana bread:"}
Output example
The output is written to a JSONL file where each line contains the instance, the corresponding prediction, and a status.
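For illustration, a single output line might look like the following. The embedding values are truncated here, and the exact fields inside predictions depend on the model version, so treat this as a sketch of the structure rather than an exact response:

{"instance":{"content":"Give a short description of a machine learning model:"},"predictions":[{"embeddings":{"statistics":{"token_count":8,"truncated":false},"values":[0.028,-0.011,...]}}],"status":""}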
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-15 UTC."],[],[]]