vertex-ai | Boundary Documentation

The vertex-ai provider is used to interact with the Google Vertex AI services.

As of BAML 0.85.0, vertex-ai now supports Anthropic models!

Example:

BAML

client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
  }
}

Authentication

The vertex-ai provider by default will try to authenticate using the following strategies:

If GOOGLE_APPLICATION_CREDENTIALS is set, it will use the specified service account
If you have run gcloud auth application-default login --project MY_PROJECT_ID, it will find the credentials generated by gcloud by the path convention
If running in GCP, it will query the metadata server to use the attached service account
If gcloud is available on the PATH, it will use gcloud auth print-access-token

Requirements

You need to use an account with a ProjectID that has been authorized to use Vertex. When administering your Google Cloud account, be sure to enable Vertex, and specify your project’s ID when authenticating with gcloud auth:

$ gcloud auth application-default login --project MY_PROJECT_ID

If you’re using Google Cloud application default credentials, you can expect authentication to work out of the box.

Setting options.credentials will take precedence and force vertex-ai to load service account credentials from that file path.

Playground

To use a vertex-ai client in the playground, all you need to do is run gcloud auth application-default login in the terminal. The playground will then use these credentials to auth all Vertex API calls.

Debugging

Authentication

If you’re having issues with vertex-ai authentication, you can try setting BAML_INTERNAL_LOG=debug to see more detailed logs.

To understand these logs, it’ll help to understand the auth implementation of the vertex-ai provider.

The vertex-ai provider uses one of 3 strategies to authenticate with Google Cloud:

AuthStrategy::JsonString(value: String) - parse value as a JSON object, and use that to resolve a service account
AuthStrategy::JsonFile(path: String) - read the file at path (relative to the process’ current working directory), parse it as a JSON object, and use that to resolve a service account
AuthStrategy::SystemDefault - try 3 strategies in order:
- resolve credentials from .config/gcloud/application_default_credentials.json; else
- use the service account from the GCP compute environment by querying the metadata server; else
- check if gcloud is available on the PATH and if so, use gcloud auth print-access-token
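
When reading the debug logs, it can help to know which of the SystemDefault sources exist on your machine. The following is a minimal diagnostic sketch, not the provider's implementation. The function name `local_default_sources` is hypothetical, and it assumes the application default credentials path above is relative to your home directory. It skips the metadata server check, since that requires a network call from inside GCP.

```python
import shutil
from pathlib import Path


def local_default_sources(home, which=shutil.which):
    """List the locally checkable SystemDefault sources that appear to be present."""
    found = []
    # Assumption: the credentials path is resolved relative to the home directory.
    adc = Path(home) / ".config" / "gcloud" / "application_default_credentials.json"
    if adc.is_file():
        found.append("application_default_credentials")
    # The metadata server is not probed here; it is only reachable inside GCP.
    if which("gcloud") is not None:
        found.append("gcloud_cli")
    return found


# Example: inspect the current user's environment.
print(local_default_sources(Path.home()))
```

An empty list suggests that SystemDefault can only succeed via the metadata server, that is, when running inside GCP.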

We choose one of the three strategies based on the following rules, in order:

Is credentials provided?
- If so, and it’s a string containing a JSON object, we use AuthStrategy::JsonString with credentials.
- If so, and it’s a JSON object, we use AuthStrategy::JsonObject with credentials (this is probably only relevant if you’re using the ClientRegistry API in baml_client).
- If so, but it’s just a regular string, use AuthStrategy::JsonFile with credentials.
Is GOOGLE_APPLICATION_CREDENTIALS set?
- If so, and it looks like a JSON object, we use AuthStrategy::JsonString with GOOGLE_APPLICATION_CREDENTIALS
- If so, but it’s just a regular string, use AuthStrategy::JsonFile with GOOGLE_APPLICATION_CREDENTIALS
Else, we use AuthStrategy::SystemDefault
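
The selection rules above can be sketched in Python as follows. This is an illustration of the decision order only, not the runtime's code. The names `choose_auth_strategy` and `looks_like_json_object` are hypothetical, and the sketch assumes that "contains a JSON object" means the string parses as a JSON object.

```python
import json


def looks_like_json_object(value):
    # Assumption: a string "contains a JSON object" if it parses to a JSON object.
    try:
        return isinstance(json.loads(value), dict)
    except json.JSONDecodeError:
        return False


def choose_auth_strategy(credentials=None, env=None):
    """Return (strategy_name, payload), applying the rules in order."""
    env = env or {}
    # Rule 1: an explicit credentials option takes precedence.
    if credentials is not None:
        if isinstance(credentials, dict):
            return ("JsonObject", credentials)
        if looks_like_json_object(credentials):
            return ("JsonString", credentials)
        return ("JsonFile", credentials)
    # Rule 2: fall back to the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    from_env = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if from_env:
        if looks_like_json_object(from_env):
            return ("JsonString", from_env)
        return ("JsonFile", from_env)
    # Rule 3: nothing was provided, so use the system defaults.
    return ("SystemDefault", None)
```

For instance, `choose_auth_strategy("path/to/credentials.json")` yields the JsonFile strategy, because the string does not parse as a JSON object.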

Request protocol

We use the REST API to send requests to Vertex AI. To debug these requests, open the BAML playground and switch from “Prompt Preview” to “Raw cURL”, which shows the exact request the BAML runtime will construct and send.

Non-streaming requests will use {base_url}:generateContent:

https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent

Streaming requests will use {base_url}:streamGenerateContent?alt=sse:

https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent?alt=sse
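
To compare against the “Raw cURL” output, the two endpoint shapes can be reproduced with a small helper. This is a sketch under assumptions: `request_url` is a hypothetical name, and it assumes the model ID is appended to a base URL ending in `/models/` before the method suffix.

```python
def request_url(base_url, model, stream):
    """Build the request URL from a base URL that ends in '/models/'."""
    # Streaming requests use server-sent events via the alt=sse query parameter.
    method = "streamGenerateContent?alt=sse" if stream else "generateContent"
    return f"{base_url.rstrip('/')}/{model}:{method}"


base = (
    "https://us-central1-aiplatform.googleapis.com/v1/projects/my-project-id"
    "/locations/us-central1/publishers/google/models/"
)
print(request_url(base, "gemini-1.5-pro", stream=False))
print(request_url(base, "gemini-1.5-pro", stream=True))
```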

BAML-specific request `options`

These unique parameters (aka options) modify the API request sent to the provider.

You can use this to modify the headers and base_url for example.

base_url

string

The base URL for the API.

Default: inferred from the project_id and location using the following format:

https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/

If the location is global, the base URL will be:

https://aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/global/publishers/google/models/

Can be used in lieu of the project_id and location fields, to manually set the request URL.
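
As an illustration of the default described above, the following sketch derives the base URL from a project ID and location, including the global case. The function name `default_base_url` is hypothetical; the runtime's actual logic may differ.

```python
def default_base_url(project_id, location):
    """Derive the default base URL for Google-published models."""
    # The global location uses the host without a region prefix.
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/"
    )


print(default_base_url("my-project-id", "us-central1"))
print(default_base_url("my-project-id", "global"))
```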

project_id

string

The Google Cloud project ID hosting the Vertex AI service you want to call.

Default: inferred from the provided credentials (see Authentication).

location

string, required

Vertex requires you to specify the location you want to serve your models from. Some models may only be available in certain locations.

Common locations include:

us-central1
us-west1
us-east1
us-south1

See the Vertex AI docs for all locations and supported models.

credentials

string | object

This field supports any of 3 formats:

A string containing service account credentials in JSON format.
Path to a file containing service account credentials in JSON format.
A JSON object containing service account credentials.

See Authentication and Debugging for more information.

Default: env.GOOGLE_APPLICATION_CREDENTIALS

Example: string

BAML

client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // credentials can be a block string containing service account credentials in JSON format
    credentials #"
      {
        "type": "service_account",
        "project_id": "my-project-id",
        "private_key_id": "string",
        "private_key": "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
        "client_email": "john_doe@gmail.com",
        "client_id": "123456",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/...",
        "universe_domain": "googleapis.com"
      }
    "#
  }
}

Example: file path

In this case, the path is resolved relative to the CWD of your process.

BAML

client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    credentials "path/to/credentials.json"
  }
}

Example: JSON object

BAML

client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // credentials can be an object containing service account credentials
    credentials {
      type "service_account",
      project_id "my-project-id",
      private_key_id "string",
      private_key "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
      client_email "john_doe@gmail.com",
      client_id "123456",
      auth_uri "https://accounts.google.com/o/oauth2/auth",
      token_uri "https://oauth2.googleapis.com/token",
      auth_provider_x509_cert_url "https://www.googleapis.com/oauth2/v1/certs",
      client_x509_cert_url "https://www.googleapis.com/robot/v1/metadata/...",
      universe_domain "googleapis.com"
    }
  }
}

credentials_content

string

Since the BAML playground now allows using gcloud auth application-default login to authenticate with GCP, we will soon be deprecating credentials_content.

A string containing service account credentials in JSON format.

See Authentication and Debugging for more information.

Default: env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT

Example

BAML

client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    location us-central1
    // credentials_content is a block string containing service account credentials in JSON format
    credentials_content #"
      {
        "type": "service_account",
        "project_id": "my-project-id",
        "private_key_id": "string",
        "private_key": "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
        "client_email": "john_doe@gmail.com",
        "client_id": "123456",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/...",
        "universe_domain": "googleapis.com"
      }
    "#
  }
}

model

string, required

The Google model to use for the request.

| Model | Input(s) | Optimized for |
| --- | --- | --- |
| `gemini-1.5-pro` | Audio, images, videos, and text | Complex reasoning tasks such as code and text generation, text editing, problem solving, data extraction and generation |
| `gemini-1.5-flash` | Audio, images, videos, and text | Fast and versatile performance across a diverse variety of tasks |
| `gemini-1.0-pro` | Text | Natural language tasks, multi-turn text and code chat, and code generation |

See the Google Model Docs for the latest models.

headers

object

Additional headers to send with the request.

Example:

BAML

client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1
    // Additional headers
    headers {
      "X-My-Header" "my-value"
    }
  }
}

default_role

string

The role to use when a message's role is not in allowed_roles. Default: "user" usually, but some models like OpenAI’s gpt-4o will use "system".

The default is picked as follows: "user" if allowed_roles contains it; otherwise, the first role in allowed_roles.

allowed_roles

string[]

Which roles should we forward to the API? Default: ["system", "user", "assistant"] usually, but some models like OpenAI’s o1-mini will use ["user", "assistant"]

When building prompts, any role not in this list will be set to the default_role.

remap_roles

map<string, string>

A mapping to transform role names before sending to the API. Default: {} (no remapping)

For google-ai provider, the default is: { "assistant": "model" }

This allows you to use standard role names in your prompts (like “user”, “assistant”, “system”) but send different role names to the API. The remapping happens after role validation and default role assignment.

Example:

{
  "user": "human",
  "assistant": "ai"
}

With this configuration, {{ _.role("user") }} in your prompt will result in a message with role “human” being sent to the API.
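
The interaction between allowed_roles, default_role, and remap_roles can be sketched as a two-step function. This is a simplified illustration of the order described above, with a hypothetical function name; it is not the runtime's code.

```python
def resolve_role(role, allowed_roles, default_role, remap_roles):
    """Return the role name that would be sent to the API for a prompt role."""
    # Step 1: validation. Roles outside allowed_roles fall back to default_role.
    if role not in allowed_roles:
        role = default_role
    # Step 2: remapping runs last, on the already validated role.
    return remap_roles.get(role, role)


allowed = ["system", "user", "assistant"]
remap = {"user": "human", "assistant": "ai"}
print(resolve_role("user", allowed, "user", remap))
print(resolve_role("tool", allowed, "user", remap))
```

Because remapping runs last, a role that falls back to default_role is remapped too: in the second call, the hypothetical "tool" role first becomes "user" and is then sent as "human".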

allowed_role_metadata

string[]

Which role metadata should we forward to the API? Default: []

For example you can set this to ["foo", "bar"] to forward the cache policy to the API.

If you do not set allowed_role_metadata, we will not forward any role metadata to the API even if it is set in the prompt.

Then in your prompt you can use something like:

client<llm> Foo {
  provider openai
  options {
    allowed_role_metadata ["foo", "bar"]
  }
}

client<llm> FooWithout {
  provider openai
  options {
  }
}
template_string Foo() #"
  {{ _.role('user', foo={"type": "ephemeral"}, bar="1", cat=True) }}
  This will have foo and bar, but not cat metadata. But only for Foo, not FooWithout.
  {{ _.role('user') }}
  This will have none of the role metadata for Foo or FooWithout.
"#

You can use the playground to see the raw curl request to see what is being sent to the API.
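
Conceptually, allowed_role_metadata acts as a key filter on the metadata passed to `_.role(...)`. The sketch below mirrors the Foo and FooWithout clients from the example; the function name `forwarded_metadata` is hypothetical and the code only illustrates the filtering rule.

```python
def forwarded_metadata(metadata, allowed_role_metadata):
    """Keep only the role metadata keys that the client allows."""
    return {k: v for k, v in metadata.items() if k in allowed_role_metadata}


role_metadata = {"foo": {"type": "ephemeral"}, "bar": "1", "cat": True}

# Client Foo allows foo and bar, so cat is dropped.
print(forwarded_metadata(role_metadata, ["foo", "bar"]))

# Client FooWithout uses the default (an empty list), so nothing is forwarded.
print(forwarded_metadata(role_metadata, []))
```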

supports_streaming

boolean

Whether the internal LLM client should use the streaming API. Default: true

Then in your prompt you can use something like:

client<llm> MyClientWithoutStreaming {
  provider anthropic
  options {
    model claude-3-haiku-20240307
    api_key env.ANTHROPIC_API_KEY
    max_tokens 1000
    supports_streaming false
  }
}

function MyFunction() -> string {
  client MyClientWithoutStreaming
  prompt #"Write a short story"#
}

# Assumes the generated baml_client package is importable
from baml_client import b

# This will be streamed from your Python code's perspective,
# but under the hood it will call the non-streaming HTTP API
# and then return a streamable response with a single event
b.stream.MyFunction()

# This will work exactly the same as before
b.MyFunction()

finish_reason_allow_list

string[]

Which finish reasons are allowed? Default: null

version 0.73.0 onwards: This is case insensitive.

Will raise a BamlClientFinishReasonError if the finish reason is not in the allow list. See Exceptions for more details.

Note, only one of finish_reason_allow_list or finish_reason_deny_list can be set.

For example, you can set this to ["stop"] to allow only the stop finish reason; all other finish reasons (e.g. length) will be treated as failures that PREVENT fallbacks and retries (similar to parsing errors).

Then in your code you can use something like:

client<llm> MyClient {
  provider "openai"
  options {
    model "gpt-4o-mini"
    api_key env.OPENAI_API_KEY
    // Finish reason allow list will only allow the stop finish reason
    finish_reason_allow_list ["stop"]
  }
}

finish_reason_deny_list

string[]

Which finish reasons are denied? Default: null

version 0.73.0 onwards: This is case insensitive.

Will raise a BamlClientFinishReasonError if the finish reason is in the deny list. See Exceptions for more details.

Note, only one of finish_reason_allow_list or finish_reason_deny_list can be set.

For example, you can set this to ["length"] to stop the function from continuing if the finish reason is length (e.g. the LLM was cut off because its output was too long).

Then in your code you can use something like:

client<llm> MyClient {
  provider "openai"
  options {
    model "gpt-4o-mini"
    api_key env.OPENAI_API_KEY
    // Finish reason deny list will allow all finish reasons except length
    finish_reason_deny_list ["length"]
  }
}
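
The allow list and deny list semantics described above can be summarized in one check. This is a sketch only: `FinishReasonError` is a hypothetical stand-in for BamlClientFinishReasonError, `check_finish_reason` is a hypothetical name, and the comparison is case insensitive as in version 0.73.0 onwards.

```python
class FinishReasonError(Exception):
    """Hypothetical stand-in for BamlClientFinishReasonError."""


def check_finish_reason(finish_reason, allow_list=None, deny_list=None):
    """Raise FinishReasonError if the finish reason is rejected by the lists."""
    # Only one of the two lists may be set.
    if allow_list is not None and deny_list is not None:
        raise ValueError("set only one of the allow list or the deny list")
    reason = finish_reason.lower()
    if allow_list is not None and reason not in {r.lower() for r in allow_list}:
        raise FinishReasonError(f"finish reason {finish_reason!r} is not allowed")
    if deny_list is not None and reason in {r.lower() for r in deny_list}:
        raise FinishReasonError(f"finish reason {finish_reason!r} is denied")
    # With both lists left at their default (null), every finish reason passes.


check_finish_reason("STOP", allow_list=["stop"])   # passes, case insensitive
check_finish_reason("stop", deny_list=["length"])  # passes, not in the deny list
```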

Provider request parameters

These are other parameters that are passed through to the provider, without modification by BAML. For example if the request has a temperature field, you can define it in the client here so every call has that set.

Consult the specific provider’s documentation for more information.

safetySettings

object

Safety settings to apply to the request. You can stack different safety settings with a new safetySettings header for each one. See the Google Vertex API Request Docs for more information on what safety settings can be set.

BAML

client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1

    safetySettings {
      category HARM_CATEGORY_HATE_SPEECH
      threshold BLOCK_LOW_AND_ABOVE
      method SEVERITY
    }
  }
}

generationConfig

object

Generation configurations to apply to the request. See the Google Vertex API Request Docs for more information on what properties can be set.

BAML

client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1

    generationConfig {
      maxOutputTokens 100
      temperature 1
    }
  }
}

For all other options, see the official Vertex AI documentation.

Publishers Other Than Google

If you are using models from publishers other than Google, such as Llama from Meta, use your project endpoint as the base_url in BAML:

client<llm> VertexLlama {
  provider vertex-ai
  options {
    base_url "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/"
    location us-central1
  }
}

For Anthropic models:

client<llm> VertexClaudeSonnet {
  provider vertex-ai
  options {
    model "claude-sonnet-4"
    anthropic_version "${ANTHROPIC_VERSION}"
    base_url "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/anthropic/models"
  }
}
