The vertex-ai provider is used to interact with the Google Vertex AI services, specifically the following endpoints:

https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent

Example:

```baml
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1
  }
}
```

Authorization

The vertex-ai provider uses the Google Cloud SDK to authenticate with a temporary access token. We generate these Google Cloud Authentication Tokens using Google Cloud service account credentials. We do not store this token, and it is only used for the duration of the request.

Instructions for downloading Google Cloud credentials

  1. Go to the Google Cloud Console.
  2. Click on the project you want to use.
  3. Select the IAM & Admin section, and click on Service Accounts.
  4. Select an existing service account or create a new one.
  5. Click on the service account and select Add Key.
  6. Choose the JSON key type and click Create.
  7. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the downloaded JSON key file.

See the Google Cloud Application Default Credentials Docs for more information.

The project_id of your client object must match the project_id of your credentials file.
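A minimal sketch of a client whose project_id matches its credentials. It assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account JSON key whose "project_id" field is my-project-id (a placeholder), and it sets credentials explicitly only to make the documented default visible:

```baml
// Assumes the key file referenced by GOOGLE_APPLICATION_CREDENTIALS
// contains "project_id": "my-project-id" (placeholder value).
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    // Must equal the project_id field inside the JSON key file
    project_id my-project-id
    location us-central1
    // Same as the default; shown explicitly for clarity
    credentials env.GOOGLE_APPLICATION_CREDENTIALS
  }
}
```

If the two project IDs differ, requests are made against a project the credentials were not issued for, which is why they must match.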

Most options are passed through directly to the API; the exceptions are listed under Non-forwarded options.

Non-forwarded options

base_url
string

The base URL for the API.

Default: https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/

Can be used in lieu of the project_id and location fields, to manually set the request URL.
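A sketch of pinning the request URL with base_url, assuming the us-central1 region and a placeholder project my-project-id. Following the default shown above, the URL ends at /models/, on the assumption that the model name is appended to it. Like the Publishers Other Than Google example, this sketch still sets project_id and location:

```baml
// Hypothetical client that pins the Google publisher endpoint for us-central1.
// Replace my-project-id with your own project ID in both places.
client<llm> MyPinnedClient {
  provider vertex-ai
  options {
    base_url "https://us-central1-aiplatform.googleapis.com/v1/projects/my-project-id/locations/us-central1/publishers/google/models/"
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1
  }
}
```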

project_id
string (required)

Vertex requires a Google Cloud project ID for each request. See the Google Cloud Project ID Docs for more information.

location
string (required)

Vertex requires a location for each request. Model availability varies by location.

Common locations include:

  • us-central1
  • us-west1
  • us-east1
  • us-south1

See the Vertex Location Docs for all locations and supported models.

credentials
string | object

Path to a JSON credentials file or a JSON object containing the credentials.

Default: env.GOOGLE_APPLICATION_CREDENTIALS

This field cannot be used in the BAML Playground. For the playground, use credentials_content instead.
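A minimal sketch that points credentials at a key file by path instead of relying on the environment variable. The path is a hypothetical placeholder, and because credentials is unavailable in the playground, a client like this is meant for runtime code:

```baml
client<llm> MyClientWithKeyFile {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1
    // Hypothetical path to a downloaded service-account JSON key
    credentials "/path/to/service-account-key.json"
  }
}
```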

credentials_content
string

The raw JSON contents of the Google Cloud application credentials, supplied directly instead of a path to a credentials file.

Default: env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT

Use this only in the BAML Playground; use credentials in your runtime code.
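A playground-oriented sketch that spells out the documented default. It assumes the environment variable holds the full text of the JSON key file, not a path to it:

```baml
client<llm> MyPlaygroundClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1
    // Expects the raw JSON text of the key file, not a file path
    credentials_content env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT
  }
}
```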

authorization
string

Sets a Google Cloud authentication token directly, instead of generating one from env.GOOGLE_APPLICATION_CREDENTIALS or env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT.
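A sketch that supplies a pre-generated token through a hypothetical environment variable, VERTEX_ACCESS_TOKEN, which you would populate yourself, for example from the output of gcloud auth print-access-token. Access tokens obtained this way are short-lived, so the variable must be refreshed when the token expires:

```baml
client<llm> MyClientWithToken {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1
    // Hypothetical env var holding an access token;
    // bypasses token generation from credentials files
    authorization env.VERTEX_ACCESS_TOKEN
  }
}
```

The raw curl request shown in the playground is one way to check how the token is sent.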

model
string (required)

The Google model to use for the request.

| Model | Input(s) | Optimized for |
|---|---|---|
| gemini-1.5-pro | Audio, images, videos, and text | Complex reasoning tasks such as code and text generation, text editing, problem solving, data extraction and generation |
| gemini-1.5-flash | Audio, images, videos, and text | Fast and versatile performance across a diverse variety of tasks |
| gemini-1.0-pro | Text | Natural language tasks, multi-turn text and code chat, and code generation |

See the Google Model Docs for the latest models.

headers
object

Additional headers to send with the request.

Example:

```baml
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1
    // Additional headers
    headers {
      "X-My-Header" "my-value"
    }
  }
}
```

default_role
string

The role to use when a prompt role is not in allowed_roles. Default: "user" usually, but some models like OpenAI's gpt-4o will use "system".

If "user" is in allowed_roles, it is the default; otherwise, the first role in allowed_roles is used.

allowed_roles
string[]

Which roles should we forward to the API? Default: ["system", "user", "assistant"] usually, but some models like OpenAI’s o1-mini will use ["user", "assistant"]

When building prompts, any role not in this list will be set to the default_role.
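A sketch combining allowed_roles and default_role, assuming you want to forward only user and assistant messages. The function Summarize and its prompt are hypothetical; with this configuration the system role is not in allowed_roles, so under the rule above that section would be sent with the default_role, user:

```baml
client<llm> MyRoleLimitedClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1
    // Only these roles are forwarded; any other role becomes default_role
    allowed_roles ["user", "assistant"]
    default_role "user"
  }
}

// Hypothetical function used only to illustrate the role remapping
function Summarize(text: string) -> string {
  client MyRoleLimitedClient
  prompt #"
    {{ _.role('system') }}
    You are a concise summarizer.
    {{ _.role('user') }}
    Summarize: {{ text }}
  "#
}
```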

allowed_role_metadata
string[]

Which role metadata should we forward to the API? Default: []

For example, set this to ["foo", "bar"] to forward the foo and bar metadata keys (such as a cache policy) to the API.

If you do not set allowed_role_metadata, we will not forward any role metadata to the API even if it is set in the prompt.

Then in your prompt you can use something like:

```baml
client<llm> Foo {
  provider openai
  options {
    allowed_role_metadata ["foo", "bar"]
  }
}

client<llm> FooWithout {
  provider openai
  options {
  }
}

template_string FooPrompt() #"
  {{ _.role('user', foo={"type": "ephemeral"}, bar="1", cat=True) }}
  This will have foo and bar, but not cat metadata. But only for Foo, not FooWithout.
  {{ _.role('user') }}
  This will have none of the role metadata for Foo or FooWithout.
"#
```

Use the playground to view the raw curl request and see exactly what is sent to the API.

supports_streaming
boolean

Whether the internal LLM client should use the streaming API. Default: true

For example:

```baml
client<llm> MyClientWithoutStreaming {
  provider anthropic
  options {
    model claude-3-haiku-20240307
    api_key env.ANTHROPIC_API_KEY
    max_tokens 1000
    supports_streaming false
  }
}

function MyFunction() -> string {
  client MyClientWithoutStreaming
  prompt #"Write a short story"#
}
```

```python
from baml_client import b

# This will be streamed from your Python code's perspective,
# but under the hood it calls the non-streaming HTTP API
# and then returns a streamable response with a single event.
b.stream.MyFunction()

# This works exactly the same as before.
b.MyFunction()
```

Forwarded options

safetySettings
object

Safety settings to apply to the request. You can stack multiple safety settings by adding a separate safetySettings block for each one. See the Google Vertex API Request Docs for more information on which safety settings can be set.

```baml
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1

    safetySettings {
      category HARM_CATEGORY_HATE_SPEECH
      threshold BLOCK_LOW_AND_ABOVE
      method SEVERITY
    }
  }
}
```
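Following the stacking rule described above, a sketch that applies two safety settings by repeating the safetySettings block. The second category and threshold are assumed example values; confirm the accepted values in the Google Vertex API Request Docs:

```baml
client<llm> MyClientWithTwoSafetySettings {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1

    // One safetySettings block per setting, per the stacking rule above
    safetySettings {
      category HARM_CATEGORY_HATE_SPEECH
      threshold BLOCK_LOW_AND_ABOVE
      method SEVERITY
    }
    // Assumed second setting, for illustration only
    safetySettings {
      category HARM_CATEGORY_DANGEROUS_CONTENT
      threshold BLOCK_MEDIUM_AND_ABOVE
      method SEVERITY
    }
  }
}
```

Checking the raw curl request in the playground is a quick way to confirm that both settings reach the API.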
generationConfig
object

Generation configurations to apply to the request. See the Google Vertex API Request Docs for more information on what properties can be set.

```baml
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-1.5-pro
    project_id my-project-id
    location us-central1

    generationConfig {
      maxOutputTokens 100
      temperature 1
    }
  }
}
```

For all other options, see the official Vertex AI documentation.

Publishers Other Than Google

If you are using models from publishers other than Google, such as Llama from Meta, use your project endpoint as the base_url in BAML, replacing ${LOCATION} and ${PROJECT_ID} with your own values:

```baml
client<llm> VertexLlama {
  provider vertex-ai
  options {
    base_url "https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/"
    project_id my-project-id
    location us-central1
  }
}
```