The vertex-ai provider is used to interact with Google Vertex AI services.

vertex-ai support for Anthropic models is coming soon.

Example:

BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
  }
}

Authentication

The vertex-ai provider by default will try to authenticate using the following strategies:

  • if GOOGLE_APPLICATION_CREDENTIALS is set, it will use the specified service account
  • if you have run gcloud auth application-default login, it will use those credentials
  • if running in GCP, it will query the metadata server to use the attached service account
  • if gcloud is available on the PATH, it will use gcloud auth print-access-token

If you’re using Google Cloud application default credentials, you can expect authentication to work out of the box.

Setting options.credentials will take precedence and force vertex-ai to load service account credentials from that file path.

Setting options.credentials_content will also take precedence and force vertex-ai to load service account credentials from that string.

To use a vertex-ai client in the playground, you’ll need to create service account credentials.

  1. Go to IAM & Admin > Service Accounts in the Google Cloud Console.
  2. Choose the project you want to use.
  3. Select an existing service account or create a new one.
  4. Click on the service account, navigate to the Keys tab, select Add Key, and select Create new key.
  5. Confirm that the key type is JSON and click Create.
  6. Copy the contents of the downloaded JSON key.
  7. Open the BAML playground, and click API Keys in the top right.
  8. Paste the JSON key into the GOOGLE_APPLICATION_CREDENTIALS field.

You should now be able to use a vertex-ai client in the playground!

Debugging

If you’re having issues with vertex-ai authentication, you can try setting BAML_LOG=debug to see more detailed logs.

To interpret these logs, it helps to understand how the vertex-ai provider implements authentication.

The vertex-ai provider uses one of the following strategies to authenticate with Google Cloud:

  • AuthStrategy::JsonString(value: String) - parse value as a JSON object, and use that to resolve a service account
  • AuthStrategy::JsonFile(path: String) - read the file at path (relative to the process’ current working directory), parse it as a JSON object, and use that to resolve a service account
  • AuthStrategy::JsonObject - use a JSON object provided directly as credentials to resolve a service account
  • AuthStrategy::SystemDefault - try 3 strategies in order:
    • resolve credentials from ~/.config/gcloud/application_default_credentials.json; else
    • use the service account from the GCP compute environment by querying the metadata server; else
    • check if gcloud is available on the PATH and, if so, use gcloud auth print-access-token

We choose a strategy based on the following rules, in order:

  1. Is credentials provided?
    • If so, and it’s a string containing a JSON object, we use AuthStrategy::JsonString with credentials.
    • If so, and it’s a JSON object, we use AuthStrategy::JsonObject with credentials (this is probably only relevant if you’re using the ClientRegistry API in baml_client).
    • If so, but it’s just a regular string, we use AuthStrategy::JsonFile with credentials.
  2. Is credentials_content provided?
    • If so, we use AuthStrategy::JsonString with credentials_content.
  3. Is GOOGLE_APPLICATION_CREDENTIALS set?
    • If so, and it looks like a JSON object, we use AuthStrategy::JsonString with GOOGLE_APPLICATION_CREDENTIALS.
    • If so, but it’s just a regular string, we use AuthStrategy::JsonFile with GOOGLE_APPLICATION_CREDENTIALS.
  4. Is GOOGLE_APPLICATION_CREDENTIALS_CONTENT set?
    • If so, we use AuthStrategy::JsonString with GOOGLE_APPLICATION_CREDENTIALS_CONTENT.
  5. Else, we use AuthStrategy::SystemDefault.
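
For example, each of these clients resolves to a different strategy (a sketch; the file path is a placeholder):

BAML
// credentials is a plain string that is not a JSON object,
// so it is treated as a file path -> AuthStrategy::JsonFile
client<llm> VertexFromFile {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    credentials "path/to/credentials.json"
  }
}

// No credentials options and no GOOGLE_APPLICATION_CREDENTIALS*
// environment variables set -> AuthStrategy::SystemDefault
client<llm> VertexFromSystemDefault {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
  }
}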

We use the REST API to send requests to Vertex AI. You can debug these requests in the BAML playground by switching from “Prompt Preview” to “Raw cURL”, which shows the exact request the BAML runtime will construct and send.

Non-streaming requests will use {base_url}:generateContent:

https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent

Streaming requests will use {base_url}:streamGenerateContent?alt=sse:

https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent?alt=sse

BAML-specific request options

These unique parameters (aka options) modify the API request sent to the provider.

You can use these to modify the headers and base_url, for example.

base_url
string

The base URL for the API.

Default: inferred from the project_id and location using the following format:

https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/

Can be used in lieu of the project_id and location fields to set the request URL manually.
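
For example, a sketch that sets the request URL manually (my-project-id is a placeholder for your own project):

BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // Overrides the URL that would otherwise be inferred
    // from project_id and location
    base_url "https://us-central1-aiplatform.googleapis.com/v1/projects/my-project-id/locations/us-central1/publishers/google/models/"
  }
}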

project_id
string

The Google Cloud project ID hosting the Vertex AI service you want to call.

Default: inferred from the provided credentials (see Authentication).
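
For example, a sketch that sets the project explicitly instead of inferring it from credentials (my-project-id is a placeholder):

BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    // "my-project-id" is a placeholder for your own project ID
    project_id my-project-id
    location us-central1
  }
}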

location
string (required)

Vertex requires you to specify the location you want to serve your models from. Some models may only be available in certain locations.

Common locations include:

  • us-central1
  • us-west1
  • us-east1
  • us-south1

See the Vertex AI docs for all locations and supported models.

credentials
string | object

This field supports any of 3 formats:

  • A string containing service account credentials in JSON format.
  • Path to a file containing service account credentials in JSON format.
  • A JSON object containing service account credentials.

See Authentication and Debugging for more information.

Default: env.GOOGLE_APPLICATION_CREDENTIALS

BAML
client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // credentials can be a block string containing service account credentials in JSON format
    credentials #"
      {
        "type": "service_account",
        "project_id": "my-project-id",
        "private_key_id": "string",
        "private_key": "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
        "client_email": "john_doe@gmail.com",
        "client_id": "123456",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/...",
        "universe_domain": "googleapis.com"
      }
    "#
  }
}

When credentials is a file path, it is resolved relative to the CWD of your process:

BAML
client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    credentials "path/to/credentials.json"
  }
}

BAML
client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // credentials can also be an object containing service account credentials
    credentials {
      type "service_account",
      project_id "my-project-id",
      private_key_id "string",
      private_key "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
      client_email "john_doe@gmail.com",
      client_id "123456",
      auth_uri "https://accounts.google.com/o/oauth2/auth",
      token_uri "https://oauth2.googleapis.com/token",
      auth_provider_x509_cert_url "https://www.googleapis.com/oauth2/v1/certs",
      client_x509_cert_url "https://www.googleapis.com/robot/v1/metadata/...",
      universe_domain "googleapis.com"
    }
  }
}

credentials_content
string

A string containing service account credentials in JSON format.

See Authentication and Debugging for more information.

Default: env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT

BAML
client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // credentials_content is a block string containing service account credentials in JSON format
    credentials_content #"
      {
        "type": "service_account",
        "project_id": "my-project-id",
        "private_key_id": "string",
        "private_key": "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
        "client_email": "john_doe@gmail.com",
        "client_id": "123456",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/...",
        "universe_domain": "googleapis.com"
      }
    "#
  }
}

We do not recommend using credentials_content in production; it is only intended for use in the BAML playground.

model
string (required)

The Google model to use for the request.

Model | Input(s) | Optimized for
gemini-1.5-pro | Audio, images, videos, and text | Complex reasoning tasks such as code and text generation, text editing, problem solving, data extraction and generation
gemini-1.5-flash | Audio, images, videos, and text | Fast and versatile performance across a diverse variety of tasks
gemini-1.0-pro | Text | Natural language tasks, multi-turn text and code chat, and code generation

See the Google Model Docs for the latest models.

headers
object

Additional headers to send with the request.

Example:

BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1
    // Additional headers
    headers {
      "X-My-Header" "my-value"
    }
  }
}

default_role
string

The role to use if the role is not in the allowed_roles. Default: "user" usually, but some models like OpenAI’s gpt-4o will use "system".

If unset, BAML picks "user" when it is in allowed_roles; otherwise it falls back to the first role in allowed_roles.
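
For example, a sketch that sets the fallback role explicitly:

BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // Any role outside allowed_roles is rewritten to this role
    default_role "user"
  }
}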

allowed_roles
string[]

Which roles should we forward to the API? Default: ["system", "user", "assistant"] usually, but some models like OpenAI’s o1-mini will use ["user", "assistant"].

When building prompts, any role not in this list will be set to the default_role.
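
For example, a sketch that restricts which roles are forwarded:

BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // Roles not in this list are replaced with default_role
    allowed_roles ["system", "user", "assistant"]
  }
}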

allowed_role_metadata
string[]

Which role metadata should we forward to the API? Default: []

For example, you can set this to ["foo", "bar"] to forward those metadata keys (such as a cache policy) to the API.

If you do not set allowed_role_metadata, we will not forward any role metadata to the API even if it is set in the prompt.

Then in your prompt you can use something like:

client<llm> Foo {
  provider openai
  options {
    allowed_role_metadata: ["foo", "bar"]
  }
}

client<llm> FooWithout {
  provider openai
  options {
  }
}

template_string Foo() #"
  {{ _.role('user', foo={"type": "ephemeral"}, bar="1", cat=True) }}
  This will have foo and bar, but not cat metadata. But only for Foo, not FooWithout.
  {{ _.role('user') }}
  This will have none of the role metadata for Foo or FooWithout.
"#

You can use the playground’s raw cURL view to see exactly what is being sent to the API.

supports_streaming
boolean

Whether the internal LLM client should use the streaming API. Default: true

Then you can use it like this:

client<llm> MyClientWithoutStreaming {
  provider anthropic
  options {
    model claude-3-haiku-20240307
    api_key env.ANTHROPIC_API_KEY
    max_tokens 1000
    supports_streaming false
  }
}

function MyFunction() -> string {
  client MyClientWithoutStreaming
  prompt #"Write a short story"#
}

Python
# This will be streamed from your python code perspective,
# but under the hood it will call the non-streaming HTTP API
# and then return a streamable response with a single event
b.stream.MyFunction()

# This will work exactly the same as before
b.MyFunction()

finish_reason_allow_list
string[]

Which finish reasons are allowed? Default: null

From version 0.73.0 onwards, this is case-insensitive.

Will raise a BamlClientFinishReasonError if the finish reason is not in the allow list. See Exceptions for more details.

Note: only one of finish_reason_allow_list or finish_reason_deny_list can be set.

For example, you can set this to ["stop"] to allow only the stop finish reason; all other finish reasons (e.g. length) will be treated as failures that PREVENT fallbacks and retries (similar to parsing errors).

Then in your code you can use something like:

client<llm> MyClient {
  provider "openai"
  options {
    model "gpt-4o-mini"
    api_key env.OPENAI_API_KEY
    // Finish reason allow list will only allow the stop finish reason
    finish_reason_allow_list ["stop"]
  }
}

finish_reason_deny_list
string[]

Which finish reasons are denied? Default: null

From version 0.73.0 onwards, this is case-insensitive.

Will raise a BamlClientFinishReasonError if the finish reason is in the deny list. See Exceptions for more details.

Note: only one of finish_reason_allow_list or finish_reason_deny_list can be set.

For example, you can set this to ["length"] to stop the function from continuing if the finish reason is length (e.g. the LLM output was cut off because it was too long).

Then in your code you can use something like:

client<llm> MyClient {
  provider "openai"
  options {
    model "gpt-4o-mini"
    api_key env.OPENAI_API_KEY
    // Finish reason deny list will allow all finish reasons except length
    finish_reason_deny_list ["length"]
  }
}

Provider request parameters

These are other parameters that are passed through to the provider without modification by BAML. For example, if the provider supports a temperature field, you can define it in the client here so every call has it set.

Consult the specific provider’s documentation for more information.

safetySettings
object

Safety settings to apply to the request. You can stack multiple safety settings by adding a new safetySettings entry for each one. See the Google Vertex API Request Docs for more information on what safety settings can be set.

BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1

    safetySettings {
      category HARM_CATEGORY_HATE_SPEECH
      threshold BLOCK_LOW_AND_ABOVE
      method SEVERITY
    }
  }
}

generationConfig
object

Generation configurations to apply to the request. See the Google Vertex API Request Docs for more information on what properties can be set.

BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1

    generationConfig {
      maxOutputTokens 100
      temperature 1
    }
  }
}

For all other options, see the official Vertex AI documentation.

Publishers Other Than Google

If you are using models from publishers other than Google, such as Llama from Meta, use your project endpoint as the base_url in BAML:

client<llm> VertexLlama {
  provider vertex-ai
  options {
    base_url "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/"
    location us-central1
  }
}