vertex-ai
The vertex-ai provider is used to interact with the Google Vertex AI services.
vertex-ai now supports Anthropic models!
Example using a Vertex API Key (Express Mode):
To get started quickly, we recommend using Express Mode with a Vertex API Key. This avoids service account setup and works well for prototyping.
See Google’s guide: Use Vertex API keys (Express Mode).
See also Express mode overview.
When using a Vertex API Key, set the key query parameter and specify your project_id and location:
When in doubt, check the ‘cURL’ tab in the playground to see the exact request being sent!
Notes:
- project_id cannot be inferred when using an API key; set it explicitly.
- Leave credentials unset when using an API key, so BAML does not prefer service account auth.
You should see the VERTEX_API_KEY environment variable in the playground API Keys dialog. Set it there and you’re all set!
If no Vertex API key is set, BAML will by default try to authenticate using application default credentials.
This is what the MY_APPLICATION_CREDENTIALS_CONTENT environment variable looks like:
BAML accepts this blob as a string, a path to a file, or a JSON object.
Here is the order of authentication:
1. If the GOOGLE_APPLICATION_CREDENTIALS environment variable is set, it will use the specified service account.
2. If you have run gcloud auth application-default login, it will find the credentials generated by gcloud by the path convention. Note that you will still need to set either options.project_id or the GOOGLE_CLOUD_PROJECT environment variable.
3. If gcloud is available on the PATH, it will use gcloud auth print-access-token.
You need to use an account with a ProjectID that has been authorized to use Vertex. When administering your Google Cloud account, be sure to enable Vertex, and set up ADC:
If you’re using Google Cloud application default credentials, you can expect authentication to work out of the box.
Setting options.credentials will take precedence and force vertex-ai to load
service account credentials from that file path.
To use a vertex-ai client in the playground, you need to run gcloud auth application-default login in the terminal and set the
GOOGLE_CLOUD_PROJECT environment variable in the “API Keys” dialog. The
playground will then use these credentials to auth all Vertex API calls.
If you’re having issues with vertex-ai authentication, you can try setting
BAML_INTERNAL_LOG=debug to see more detailed logs.
To understand these logs, it’ll help to understand the auth implementation of the vertex-ai provider.
The vertex-ai provider uses one of 3 strategies to authenticate with Google Cloud:
- AuthStrategy::JsonString(value: String) - parse value as a JSON object, and use that to resolve a service account
- AuthStrategy::JsonFile(path: String) - read the file at path (relative to the process’ current working directory), parse it as a JSON object, and use that to resolve a service account
- AuthStrategy::SystemDefault - try 3 strategies in order:
  - use the credentials in .config/gcloud/application_default_credentials.json; else
  - check if gcloud is available on the PATH and if so, use gcloud auth print-access-token
We choose one of the three strategies based on the following rules, in order:
1. Is credentials provided?
   - If it is a string containing a JSON object, we use AuthStrategy::JsonString with credentials.
   - If it is a JSON object, we use AuthStrategy::JsonObject with credentials (this is probably only relevant if you’re using the ClientRegistry API in baml_client).
   - If it is any other string, we treat it as a file path and use AuthStrategy::JsonFile with credentials.
2. Is GOOGLE_APPLICATION_CREDENTIALS set?
   - If it contains a JSON object, we use AuthStrategy::JsonString with GOOGLE_APPLICATION_CREDENTIALS.
   - Otherwise, we use AuthStrategy::JsonFile with GOOGLE_APPLICATION_CREDENTIALS.
3. Otherwise, we use AuthStrategy::SystemDefault.
We use the REST API to send requests to Vertex AI. You can debug these requests in the BAML playground by switching from “Prompt Preview” to “Raw cURL”, which shows the exact request the BAML runtime will construct and send.
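The strategy-selection rules described earlier in this section can be sketched in a few lines of Python. This is only an illustration of the decision order, not BAML's implementation: the function name choose_auth_strategy, the tuple return shape, and the "parses as a JSON object" test are all assumptions made for the example.

```python
import json
import os


def looks_like_json_object(value: str) -> bool:
    """Return True if the string parses as a JSON object (assumed heuristic)."""
    try:
        return isinstance(json.loads(value), dict)
    except json.JSONDecodeError:
        return False


def choose_auth_strategy(credentials=None, env=None):
    """Pick a strategy name following the documented order (hypothetical helper)."""
    env = os.environ if env is None else env

    # Rule 1: an explicit `credentials` option takes precedence.
    if credentials is not None:
        if isinstance(credentials, dict):
            return ("JsonObject", credentials)
        if looks_like_json_object(credentials):
            return ("JsonString", credentials)
        return ("JsonFile", credentials)

    # Rule 2: fall back to the GOOGLE_APPLICATION_CREDENTIALS variable.
    from_env = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if from_env:
        if looks_like_json_object(from_env):
            return ("JsonString", from_env)
        return ("JsonFile", from_env)

    # Rule 3: nothing was provided, so defer to the system default lookup.
    return ("SystemDefault", None)
```

Comparing the strategy name this sketch returns against the one reported in the BAML_INTERNAL_LOG=debug output can help narrow down which input the runtime actually picked up.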
Non-streaming requests will use {base_url}:generateContent:
Streaming requests will use {base_url}:streamGenerateContent?alt=sse:
If you encounter an error like:
Try setting location: global in your client options:
Some models may only be available in specific regions or through the global endpoint.
options
These unique parameters (aka options) modify the API request sent to the provider.
You can use this to modify the headers and base_url for example.
base_url
The base URL for the API.
Default: inferred from the project_id and location using the following format:
If the location is global, the base URL will be:
Can be used in lieu of the project_id and location fields, to manually set the request URL.
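As a rough sketch of how a base URL can be derived from project_id and location, the helper below follows the public Vertex AI REST endpoint layout for Google-published models. The exact string BAML builds is not reproduced in this text, so treat the v1 path segment, the publishers/google/models suffix, and the function name default_base_url as assumptions.

```python
def default_base_url(project_id: str, location: str) -> str:
    """Build a Vertex AI models URL from project and location (illustrative only)."""
    # Assumption: the global location uses the un-prefixed host,
    # while regional locations use a "{location}-" host prefix.
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models"
    )
```

If the URL shown in the playground's “Raw cURL” view differs from this shape, trust the playground and set base_url explicitly.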
project_id
The Google Cloud project ID hosting the Vertex AI service you want to call.
Default: inferred from the provided credentials (see Authentication).
location
Vertex requires you to specify the location you want to serve your models from. Some models may only be available in certain locations.
Common locations include:
- us-central1
- us-west1
- us-east1
- us-south1
See the Vertex AI docs for all locations and supported models.
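credentials
This field supports any of 3 formats: a string containing service account credentials in JSON format, a path to a credentials file, or a JSON object.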
See Authentication and Debugging for more information.
Default: env.GOOGLE_APPLICATION_CREDENTIALS
In this case, the path is resolved relative to the CWD of your process.
credentials_content
Since the BAML playground now allows using gcloud auth application-default login to authenticate with GCP, we will soon be deprecating credentials_content.
A string containing service account credentials in JSON format.
See Authentication and Debugging for more information.
Default: env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT
headers
Additional headers to send with the request.
Example:
query_params
Query string parameters appended to the request URL.
Example (use a Vertex API Key with Express Mode):
When using an API key, omit credentials and set project_id explicitly.
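To illustrate what appending query string parameters to the request URL means in practice, here is a small standard-library sketch. The helper name with_query_params and the example.invalid host are hypothetical, and BAML's own URL handling may differ in details such as parameter ordering.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_params(url: str, params: dict) -> str:
    """Append params to url, keeping any query string already present."""
    parts = urlsplit(url)
    # Preserve existing pairs (for example alt=sse on streaming requests).
    merged = parse_qsl(parts.query, keep_blank_values=True) + list(params.items())
    return urlunsplit(parts._replace(query=urlencode(merged)))


# Hypothetical streaming URL with a Vertex API key added as the `key` parameter.
example = with_query_params(
    "https://example.invalid/models/m:streamGenerateContent?alt=sse",
    {"key": "MY_VERTEX_API_KEY"},
)
```

The key travels in the URL, so avoid pasting the resulting request into logs or bug reports without redacting it.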
default_role
The role to use if the role is not in the allowed_roles. Default: "user" usually, but some models like OpenAI’s gpt-5 will use "system".
If not set, BAML picks “user” when it is in allowed_roles, and otherwise the first role in allowed_roles.
allowed_roles
Which roles should we forward to the API? Default: ["system", "user", "assistant"] usually, but some models like OpenAI’s o1-mini will use ["user", "assistant"]
When building prompts, any role not in this list will be set to the default_role.
remap_roles
A mapping to transform role names before sending to the API. Default: {} (no remapping)
For google-ai provider, the default is: { "assistant": "model" }
This allows you to use standard role names in your prompts (like “user”, “assistant”, “system”) but send different role names to the API. The remapping happens after role validation and default role assignment.
Example:
With this configuration, {{ _.role("user") }} in your prompt will result in a message with role “human” being sent to the API.
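The interaction between the allowed roles, the default role, and role remapping can be sketched as follows. This is a reading of the prose above rather than BAML's code: the function name resolve_role and its keyword arguments are hypothetical, and the fallback choice for the default role follows the description in this section.

```python
def resolve_role(role, allowed_roles=("system", "user", "assistant"),
                 default_role=None, remap_roles=None):
    """Return the role name that would be sent to the API (illustrative only)."""
    # Default role: "user" if it is allowed, otherwise the first allowed role.
    if default_role is None:
        default_role = "user" if "user" in allowed_roles else allowed_roles[0]

    # Step 1: validation. Any role outside the allow list becomes the default.
    if role not in allowed_roles:
        role = default_role

    # Step 2: remapping happens last, after validation and defaulting.
    return (remap_roles or {}).get(role, role)
```

For instance, resolve_role("user", remap_roles={"user": "human"}) yields "human", matching the remapping example above, while an unknown role such as "tool" falls back to "user" before any remapping is applied.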
allowed_role_metadata
Which role metadata should we forward to the API? Default: []
For example you can set this to ["foo", "bar"] to forward the cache policy to the API.
If you do not set allowed_role_metadata, we will not forward any role metadata to the API even if it is set in the prompt.
Then in your prompt you can use something like:
You can use the playground to view the raw cURL request and see exactly what is being sent to the API.
supports_streaming
Whether the internal LLM client should use the streaming API. Default: true
Then in your prompt you can use something like:
finish_reason_allow_list
Which finish reasons are allowed? Default: null
Will raise a BamlClientFinishReasonError if the finish reason is not in the allow list. See Exceptions for more details.
Note, only one of finish_reason_allow_list or finish_reason_deny_list can be set.
For example, you can set this to ["stop"] to only allow the stop finish reason; all other finish reasons (e.g. length) will be treated as failures that PREVENT fallbacks and retries (similar to parsing errors).
Then in your code you can use something like:
finish_reason_deny_list
Which finish reasons are denied? Default: null
Will raise a BamlClientFinishReasonError if the finish reason is in the deny list. See Exceptions for more details.
Note, only one of finish_reason_allow_list or finish_reason_deny_list can be set.
For example, you can set this to ["length"] to stop the function from continuing if the finish reason is length (e.g. the LLM output was cut off because it was too long).
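The allow-list and deny-list semantics described above can be expressed as a small check. This is a sketch only: FinishReasonError is a hypothetical stand-in for BamlClientFinishReasonError, and exact string matching of finish reasons is an assumption that the text does not confirm.

```python
class FinishReasonError(Exception):
    """Hypothetical stand-in for BamlClientFinishReasonError."""


def check_finish_reason(finish_reason, allow_list=None, deny_list=None):
    """Raise if finish_reason violates the configured list (illustrative only)."""
    # Only one of the two lists may be configured at a time.
    if allow_list is not None and deny_list is not None:
        raise ValueError("set only one of allow_list or deny_list")

    # Allow list: anything not explicitly listed is rejected.
    if allow_list is not None and finish_reason not in allow_list:
        raise FinishReasonError(f"finish reason {finish_reason!r} is not allowed")

    # Deny list: only explicitly listed reasons are rejected.
    if deny_list is not None and finish_reason in deny_list:
        raise FinishReasonError(f"finish reason {finish_reason!r} is denied")
```

With allow_list=["stop"], a "length" finish reason raises; with deny_list=["length"], every reason except "length" passes.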
Then in your code you can use something like:
media_url_handler
Controls how media URLs are processed before sending to the provider. This allows you to override the default behavior for handling images, audio, PDFs, and videos.
Each media type can be configured with one of these modes:
- send_base64 - Always download URLs and convert to base64 data URIs
- send_url - Pass URLs through unchanged to the provider
- send_url_add_mime_type - Ensure MIME type is present (may require downloading to detect)
- send_base64_unless_google_url - Only process non-gs:// URLs (keep Google Cloud Storage URLs as-is)
If not specified, each provider uses these defaults:
- Use send_base64 when your provider doesn’t support external URLs and you need to embed media content
- Use send_url when your provider handles URL fetching and you want to avoid the overhead of base64 conversion
- Use send_url_add_mime_type when your provider requires MIME type information (e.g., Vertex AI)
- Use send_base64_unless_google_url when working with Google Cloud Storage and you want to preserve gs:// URLs
URL fetching happens at request time and may add latency. Consider caching or pre-converting frequently used media when using send_base64 mode.
Vertex AI uses send_url_add_mime_type by default for images and audio, which ensures MIME type information is included. This may require downloading the content to detect the MIME type if not provided.
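The four modes can be summarized as a decision function that says what would happen to a given media URL. This is a sketch for intuition only: the function name plan_media and the action labels it returns are hypothetical, and the rule that a missing MIME type always triggers a download is an assumption (the text only says a download may be required).

```python
from typing import Optional


def plan_media(mode: str, url: str, mime_type: Optional[str] = None) -> str:
    """Describe how a media URL would be handled under each mode (illustrative)."""
    if mode == "send_base64":
        # Always fetch the content and inline it as a base64 data URI.
        return "download_and_encode"
    if mode == "send_url":
        # Hand the URL to the provider untouched.
        return "pass_through"
    if mode == "send_url_add_mime_type":
        # Keep the URL, but make sure a MIME type accompanies it.
        return "pass_through" if mime_type else "download_to_detect_mime_type"
    if mode == "send_base64_unless_google_url":
        # Google Cloud Storage URLs are kept as-is; everything else is inlined.
        return "pass_through" if url.startswith("gs://") else "download_and_encode"
    raise ValueError(f"unknown media_url_handler mode: {mode}")
```

Supplying the MIME type up front avoids the extra fetch in this sketch, which is one way to reduce request-time latency under the Vertex AI default.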
These are other parameters that are passed through to the provider, without modification by BAML. For example if the request has a temperature field, you can define it in the client here so every call has that set.
Consult the specific provider’s documentation for more information.
safetySettings
Safety settings to apply to the request. You can stack different safety settings with a new safetySettings header for each one. See the Google Vertex API Request Docs for more information on what safety settings can be set.
generationConfig
Generation configurations to apply to the request. See the Google Vertex API Request Docs for more information on what properties can be set.
For Gemini models (2.5-pro and later), you can enable thinking mode using thinkingConfig:
thinkingConfig is only for Gemini models. For Claude models on Vertex AI, use the thinking parameter instead. See Extended Thinking for Claude on Vertex AI.
For all other options, see the official Vertex AI documentation.
If you are using models from publishers other than Google, such as Llama from
Meta, use your project endpoint as the base_url in BAML:
For Anthropic Claude models, you can use the simplified configuration with location and project_id:
Claude Model Names on Vertex AI: Use the format model-name@version (e.g., claude-sonnet-4-5@20250929).
Available models include claude-sonnet-4, claude-opus-4, claude-sonnet-4-5, claude-opus-4-5, claude-sonnet-4-6, and claude-opus-4-6.
Check Google Cloud’s Claude documentation for the latest model names and regional availability.
Alternatively, you can use a custom base_url with environment variables:
Where VERTEX_CLAUDE_BASE_URL would be set to something like:
When using a custom base_url, do NOT include the model name at the end. BAML will automatically append /{model}:rawPredict (or :streamRawPredict for streaming) to the base URL.
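The suffix rule for a custom Claude base_url can be illustrated with a short helper. The function name claude_request_url and the example.invalid host are hypothetical, and stripping a trailing slash is a defensive choice made for this sketch, not documented BAML behavior.

```python
def claude_request_url(base_url: str, model: str, stream: bool) -> str:
    """Append /{model}:rawPredict or /{model}:streamRawPredict to base_url."""
    method = "streamRawPredict" if stream else "rawPredict"
    # Assumption: tolerate a trailing slash so the result has no "//".
    return f"{base_url.rstrip('/')}/{model}:{method}"


# The base URL ends at the models collection; the model name is NOT part of it.
url = claude_request_url(
    "https://example.invalid/publishers/anthropic/models",
    "claude-sonnet-4-5@20250929",
    stream=False,
)
```

If the model name were already included in base_url, the appended suffix would repeat it and produce an invalid path, which is why the warning above asks you to leave it out.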
To enable extended thinking for Claude models on Vertex AI, use the thinking parameter:
Extended thinking is available on Claude Sonnet 4.5+, Claude Opus 4.5+, and later models.
For Claude Opus 4.6, Anthropic recommends using adaptive thinking (type: "adaptive") with an effort parameter instead of manual mode.