vertex-ai
The `vertex-ai` provider is used to interact with the Google Vertex AI services.
`vertex-ai` support for Anthropic models is coming soon.
Example:
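A minimal client definition might look like this (the model name and location are illustrative):

```baml
client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
  }
}
```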
Authentication
The `vertex-ai` provider by default will try to authenticate using the following strategies:
- if `GOOGLE_APPLICATION_CREDENTIALS` is set, it will use the specified service account
- if you have run `gcloud auth application-default login`, it will use those credentials
- if running in GCP, it will query the metadata server to use the attached service account
- if `gcloud` is available on the `PATH`, it will use `gcloud auth print-access-token`
If you’re using Google Cloud application default credentials, you can expect authentication to work out of the box.
Setting `options.credentials` will take precedence and force `vertex-ai` to load service account credentials from that file path.
Setting `options.credentials_content` will also take precedence and force `vertex-ai` to load service account credentials from that string.
Using a `vertex-ai` client in the playground
To use a `vertex-ai` client in the playground, you’ll need to create service account credentials.
- Go to IAM & Admin > Service Accounts in the Google Cloud Console.
- Choose the project you want to use.
- Select an existing service account or create a new one.
- Click on the service account, navigate to the `Keys` tab, select `Add Key`, and select `Create new key`.
- Confirm that the key type will be `JSON` and click `Create`.
- Copy the contents of the downloaded JSON key.
- Open the BAML playground, and click `API Keys` in the top right.
- Paste the JSON key into the `GOOGLE_APPLICATION_CREDENTIALS` field.
You should now be able to use a `vertex-ai` client in the playground!
Debugging
Authentication
If you’re having issues with `vertex-ai` authentication, you can try setting `BAML_LOG=debug` to see more detailed logs.
To understand these logs, it’ll help to understand the auth implementation of the `vertex-ai` provider.
The `vertex-ai` provider uses one of 3 strategies to authenticate with Google Cloud:
- `AuthStrategy::JsonString(value: String)` - parse `value` as a JSON object, and use that to resolve a service account
- `AuthStrategy::JsonFile(path: String)` - read the file at `path` (relative to the process’ current working directory), parse it as a JSON object, and use that to resolve a service account
- `AuthStrategy::SystemDefault` - try 3 strategies in order:
  - resolve credentials from `.config/gcloud/application_default_credentials.json`; else
  - use the service account from the GCP compute environment by querying the metadata server; else
  - check if `gcloud` is available on the `PATH` and if so, use `gcloud auth print-access-token`
We choose one of the three strategies based on the following rules, in order:
- Is `credentials` provided?
  - If so, and it’s a string containing a JSON object, we use `AuthStrategy::JsonString` with `credentials`.
  - If so, and it’s a JSON object, we use `AuthStrategy::JsonObject` with `credentials` (this is probably only relevant if you’re using the `ClientRegistry` API in `baml_client`).
  - If so, but it’s just a regular string, use `AuthStrategy::JsonFile` with `credentials`.
- Is `credentials_content` provided?
  - If so, we use `AuthStrategy::JsonString` with `credentials_content`.
- Is `GOOGLE_APPLICATION_CREDENTIALS` set?
  - If so, and it looks like a JSON object, we use `AuthStrategy::JsonString` with `GOOGLE_APPLICATION_CREDENTIALS`.
  - If so, but it’s just a regular string, use `AuthStrategy::JsonFile` with `GOOGLE_APPLICATION_CREDENTIALS`.
- Is `GOOGLE_APPLICATION_CREDENTIALS_CONTENT` set?
  - If so, we use `AuthStrategy::JsonString` with `GOOGLE_APPLICATION_CREDENTIALS_CONTENT`.
- Else, we use `AuthStrategy::SystemDefault`.
Request protocol
We use the REST API to send requests to Vertex AI. You can debug these requests in the BAML playground by switching from “Prompt Preview” to “Raw cURL”, which shows the exact request the BAML runtime will construct and send.
Non-streaming requests will use `{base_url}:generateContent`, and streaming requests will use `{base_url}:streamGenerateContent?alt=sse`:
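For example, with the default `base_url`, requests go to URLs of roughly this shape (placeholders in braces; a sketch based on the Vertex AI REST API):

```
POST https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent
POST https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:streamGenerateContent?alt=sse
```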
BAML-specific request options
These unique parameters (aka `options`) modify the API request sent to the provider.
You can use this to modify the `headers` and `base_url`, for example.
`base_url`
The base URL for the API.
Default: inferred from the `project_id` and `location` using the following format:
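(A sketch of the standard Vertex AI endpoint shape; placeholders in braces.)

```
https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/
```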
Can be used in lieu of the `project_id` and `location` fields to manually set the request URL.
`project_id`
The Google Cloud project ID hosting the Vertex AI service you want to call.
Default: inferred from the provided credentials (see Authentication).
`location`
Vertex requires you to specify the location you want to serve your models from; some models may only be available in certain locations.
Common locations include:
- `us-central1`
- `us-west1`
- `us-east1`
- `us-south1`
See the Vertex AI docs for all locations and supported models.
`credentials`
This field supports any of 3 formats:
- A string containing service account credentials in JSON format.
- Path to a file containing service account credentials in JSON format.
- A JSON object containing service account credentials.
See Authentication and Debugging for more information.
Default: `env.GOOGLE_APPLICATION_CREDENTIALS`
Example: string
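A sketch, reading the JSON key (as a string) from an environment variable:

```baml
client<llm> VertexWithStringCreds {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // env var holding the full service account JSON as a string
    credentials env.GOOGLE_APPLICATION_CREDENTIALS
  }
}
```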
Example: file path
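A sketch with a file path (the path is a placeholder):

```baml
client<llm> VertexWithFileCreds {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    credentials "path/to/service-account.json"
  }
}
```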
In this case, the path is resolved relative to the CWD of your process.
Example: JSON object
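A sketch with an inline JSON object (fields abridged; a real key has more fields, and the exact map syntax should be checked against your BAML version):

```baml
client<llm> VertexWithObjectCreds {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    credentials {
      type "service_account"
      project_id "my-project-id"
      client_email "sa@my-project-id.iam.gserviceaccount.com"
    }
  }
}
```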
`credentials_content`
A string containing service account credentials in JSON format.
See Authentication and Debugging for more information.
Default: `env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT`
Example:
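A sketch, reading the JSON key from an environment variable (playground-style usage):

```baml
client<llm> VertexPlayground {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    credentials_content env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT
  }
}
```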
We do not recommend using `credentials_content` in production; it is only intended for use in the BAML playground.
`headers`
Additional headers to send with the request.
Example:
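For example (the header name and value are illustrative):

```baml
client<llm> VertexWithHeaders {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    headers {
      "X-My-Header" "my-value"
    }
  }
}
```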
`default_role`
The role to use if the role is not in the `allowed_roles`. Default: "user" usually, but some models like OpenAI’s `gpt-4o` will use "system".
The default is picked as the first role in `allowed_roles` if "user" is not in the list; otherwise "user".
`allowed_roles`
Which roles should we forward to the API? Default: ["system", "user", "assistant"] usually, but some models like OpenAI’s `o1-mini` will use ["user", "assistant"].
When building prompts, any role not in this list will be set to the `default_role`.
`allowed_role_metadata`
Which role metadata should we forward to the API? Default: []
For example, you can set this to ["foo", "bar"] to forward the cache policy to the API.
If you do not set `allowed_role_metadata`, we will not forward any role metadata to the API even if it is set in the prompt.
Then in your prompt you can use something like:
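A sketch (the metadata keys `foo` and `baz` are illustrative; only allowed keys are forwarded):

```baml
client<llm> VertexWithMetadata {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    allowed_role_metadata ["foo", "bar"]
  }
}

function Describe(input: string) -> string {
  client VertexWithMetadata
  prompt #"
    {{ _.role("user", foo={"type": "ephemeral"}, baz="this will be dropped") }}
    {{ input }}
  "#
}
```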
You can use the playground to inspect the raw curl request and see what is being sent to the API.
`supports_streaming`
Whether the internal LLM client should use the streaming API. Default: true
Then in your code you can use something like:
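A sketch disabling streaming for a client:

```baml
client<llm> VertexWithoutStreaming {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    supports_streaming false
  }
}
```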
`finish_reason_allow_list`
Which finish reasons are allowed? Default: null
Will raise a `BamlClientFinishReasonError` if the finish reason is not in the allow list. See Exceptions for more details.
Note: only one of `finish_reason_allow_list` or `finish_reason_deny_list` can be set.
For example, you can set this to ["stop"] to only allow the stop finish reason; all other finish reasons (e.g. `length`) will be treated as failures that PREVENT fallbacks and retries (similar to parsing errors).
Then in your code you can use something like:
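A sketch allowing only the `stop` finish reason:

```baml
client<llm> VertexStrictFinish {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // any other finish reason raises BamlClientFinishReasonError
    finish_reason_allow_list ["stop"]
  }
}
```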
`finish_reason_deny_list`
Which finish reasons are denied? Default: null
Will raise a `BamlClientFinishReasonError` if the finish reason is in the deny list. See Exceptions for more details.
Note: only one of `finish_reason_allow_list` or `finish_reason_deny_list` can be set.
For example, you can set this to ["length"] to stop the function from continuing if the finish reason is `length` (e.g. the LLM was cut off because it was too long).
Then in your code you can use something like:
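A sketch denying the `length` finish reason:

```baml
client<llm> VertexDenyLength {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // a length finish reason raises BamlClientFinishReasonError
    finish_reason_deny_list ["length"]
  }
}
```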
Provider request parameters
These are other parameters that are passed through to the provider without modification by BAML. For example, if the request has a `temperature` field, you can define it in the client here so every call has that set.
Consult the specific provider’s documentation for more information.
`safetySettings`
Safety settings to apply to the request. You can stack different safety settings with a new `safetySettings` header for each one. See the Google Vertex API Request Docs for more information on what safety settings can be set.
`generationConfig`
Generation configurations to apply to the request. See the Google Vertex API Request Docs for more information on what properties can be set.
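A sketch combining both (the category, threshold, and generation values are illustrative; consult the Vertex docs for valid values):

```baml
client<llm> VertexTuned {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    safetySettings {
      category HARM_CATEGORY_HATE_SPEECH
      threshold BLOCK_LOW_AND_ABOVE
    }
    generationConfig {
      maxOutputTokens 256
      temperature 1
    }
  }
}
```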
For all other options, see the official Vertex AI documentation.
Publishers Other Than Google
If you are using models from publishers other than Google, such as Llama from Meta, use your project endpoint as the `base_url` in BAML:
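A sketch for a Meta Llama model served through Vertex AI (the endpoint shape and model name are assumptions; substitute your own project and location):

```baml
client<llm> VertexLlama {
  provider vertex-ai
  options {
    base_url "https://us-central1-aiplatform.googleapis.com/v1/projects/my-project-id/locations/us-central1/endpoints/openapi"
    model "meta/llama-3.1-405b-instruct-maas"
  }
}
```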