# vertex-ai

The `vertex-ai` provider is used to interact with the Google Vertex AI services.

As of BAML 0.85.0, `vertex-ai` now supports Anthropic models!

Example using a Vertex API Key (Express Mode):

```baml BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    location us-central1 // or "global"
    project_id my-project-id
    query_params {
      key env.VERTEX_API_KEY
    }
  }
}
```

## Authentication

### Using a Vertex API Key (Express Mode)

To get started quickly, we recommend using Express Mode with a Vertex API Key.
This avoids service account setup and works well for prototyping.

See Google's guide: [Use Vertex API keys (Express Mode)](https://cloud.google.com/vertex-ai/generative-ai/docs/start/api-keys?usertype=expressmode).

See also [Express mode overview](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview).

When using a Vertex API Key, set the `key` query parameter and specify your `project_id` and `location`:

```baml BAML
client<llm> VertexApiKeyClient {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    location us-central1 // you can also use "global"
    project_id my-project-id
    query_params {
      key env.VERTEX_API_KEY
    }
  }
}
```

**When in doubt, check the 'cURL' tab in the playground to see the exact request being sent!**

Notes:

* `project_id` cannot be inferred when using an API key; set it explicitly.
* Keep `credentials` unset when using an API key, so BAML does not prefer service account auth.

#### Using a Vertex API Key in the playground

You should see the `VERTEX_API_KEY` environment variable in the playground API Keys dialog. You can set it there and you're all set!

### Using Google Application Credentials

If no Vertex API key is set, BAML will by default try to authenticate using [application default
credentials](https://cloud.google.com/docs/authentication/application-default-credentials).

```baml BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    location us-central1
    project_id my-project-id
    // we will by default try to use this form of authentication.
    credentials env.MY_APPLICATION_CREDENTIALS_CONTENT
  }
}
```

This is what the `MY_APPLICATION_CREDENTIALS_CONTENT` environment variable looks like:

```json
MY_APPLICATION_CREDENTIALS_CONTENT={
  "type": "service_account",
  "project_id": "my-project-id",
  "private_key_id": "string",
  "private_key": "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
  ...other fields...
}
```

BAML accepts this blob as a string, a path to a file, or a JSON object.

#### More details on Google Application Credentials

Here is the order of authentication:

* If `GOOGLE_APPLICATION_CREDENTIALS` environment variable is set, it will use the specified service account
* If you have run `gcloud auth application-default login`, it will find the
  credentials generated by `gcloud` by the path convention. Note that you will still
  need to set either `options.project_id` or the `GOOGLE_CLOUD_PROJECT` environment variable.
* If running in GCP, it will query the metadata server to use the attached service account
* If `gcloud` is available on the `PATH`, it will use `gcloud auth print-access-token`

### Requirements

You need to use an account with a ProjectID that has been authorized to use Vertex.
When administering your Google Cloud account, be sure to enable Vertex, and set up ADC:

```bash
gcloud auth application-default login
```

If you're using Google Cloud [application default
credentials](https://cloud.google.com/docs/authentication/application-default-credentials), you
can expect authentication to work out of the box.

Setting [`options.credentials`](#credentials) will take precedence and force `vertex-ai` to load
service account credentials from the value you provide (a JSON string, a file path, or a JSON object).

### Playground

To use a `vertex-ai` client in the playground, you need to run `gcloud
auth application-default login` in the terminal and set the
`GOOGLE_CLOUD_PROJECT` environment variable in the "API Keys" dialog. The
playground will then use these credentials to auth all Vertex API calls.

## Debugging

If you're having issues with `vertex-ai` authentication, you can try setting
`BAML_INTERNAL_LOG=debug` to see more detailed logs.

To understand these logs, it'll help to understand the auth implementation of the `vertex-ai` provider.

The `vertex-ai` provider uses one of the following strategies to authenticate with Google Cloud:

* `AuthStrategy::JsonString(value: String)` - parse `value` as a JSON
  object, and use that to resolve a service account
* `AuthStrategy::JsonFile(path: String)` - read the file at `path` (relative to
  the process' current working directory), parse it as a JSON object, and use that
  to resolve a service account
* `AuthStrategy::SystemDefault` - try 3 strategies in order:
  * resolve credentials from `.config/gcloud/application_default_credentials.json`; else
  * use the service account from the GCP compute environment by querying the metadata server; else
  * check if `gcloud` is available on the `PATH` and if so, use `gcloud auth print-access-token`

We choose a strategy based on the following rules, in order:

1. Is `credentials` provided?
   * If so, and it's a string containing a JSON object, we use `AuthStrategy::JsonString` with `credentials`.
   * If so, and it's a JSON object, we use `AuthStrategy::JsonObject` with `credentials` (this is probably only
     relevant if you're using the [`ClientRegistry`](/ref/baml_client/client-registry) API in `baml_client`).
   * If so, but it's just a regular string, use `AuthStrategy::JsonFile` with `credentials`.
2. Is `GOOGLE_APPLICATION_CREDENTIALS` set?
   * If so, and it looks like a JSON object, we use `AuthStrategy::JsonString` with `GOOGLE_APPLICATION_CREDENTIALS`
   * If so, but it's just a regular string, use `AuthStrategy::JsonFile` with `GOOGLE_APPLICATION_CREDENTIALS`
3. Else, we use `AuthStrategy::SystemDefault`
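
The rule order above can be sketched in a few lines of Python. This is only an illustration of the documented decision order, not the runtime's actual code: `choose_auth_strategy` and `looks_like_json_object` are hypothetical names, and "looks like a JSON object" is assumed here to mean "parses as a JSON object".

```python
import json
import os
from typing import Any, Mapping, Optional, Tuple


def looks_like_json_object(value: str) -> bool:
    """Assumed heuristic: the string parses as a JSON object."""
    try:
        return isinstance(json.loads(value), dict)
    except (TypeError, ValueError):
        return False


def choose_auth_strategy(
    credentials: Any = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Any]:
    """Return (strategy_name, payload) following the documented rule order."""
    env = os.environ if env is None else env

    # Rule 1: an explicit `credentials` option wins.
    if credentials is not None:
        if isinstance(credentials, dict):
            return ("JsonObject", credentials)
        if looks_like_json_object(credentials):
            return ("JsonString", credentials)
        return ("JsonFile", credentials)

    # Rule 2: fall back to GOOGLE_APPLICATION_CREDENTIALS.
    gac = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if gac:
        if looks_like_json_object(gac):
            return ("JsonString", gac)
        return ("JsonFile", gac)

    # Rule 3: nothing configured, so use the system default chain.
    return ("SystemDefault", None)
```

When reading `BAML_INTERNAL_LOG=debug` output, comparing the strategy in the logs against what a walk-through like this predicts can help spot a `credentials` value that is being treated as a file path when you meant it as JSON.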

We use the REST API to send requests to Vertex AI. To debug these requests, open the
BAML playground and switch from "Prompt Preview" to "Raw cURL", which
shows the exact request the BAML runtime will construct and send.

Non-streaming requests will use `{base_url}:generateContent`:

```
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:generateContent
```

Streaming requests will use `{base_url}:streamGenerateContent?alt=sse`:

```
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent?alt=sse
```

If you encounter an error like:

```
[modelname] was not found or your project does not have access to it. Please ensure you are using a valid model version.
```

Try setting `location` to `global` in your client options:

```baml BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    location global
    project_id my-project-id
  }
}
```

Some models may only be available in specific regions or through the global endpoint.

## BAML-specific request `options`

These unique parameters (aka `options`) modify the API request sent to the provider.

You can use this to modify the `headers` and `base_url` for example.

### `base_url`

The base URL for the API.

**Default**: inferred from the `project_id` and `location` using the following format:

```
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/
```

If the location is `global`, the base URL will be:

```
https://aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/global/publishers/google/models/
```

Can be used in lieu of the **`project_id`** and **`location`** fields, to manually set the request URL.
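
As a rough illustration of how the default URL is derived, the following Python sketch builds the base URL from `project_id` and `location`, then appends the model and action. The function names are hypothetical, and the sketch only mirrors the URL formats shown in this document.

```python
def default_base_url(project_id: str, location: str) -> str:
    """Build the documented default base URL for Google-published models."""
    # The global endpoint has no region prefix on the host name.
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/"
    )


def request_url(project_id: str, location: str, model: str, stream: bool = False) -> str:
    """Append the model and action to the base URL."""
    action = "streamGenerateContent?alt=sse" if stream else "generateContent"
    return f"{default_base_url(project_id, location)}{model}:{action}"


print(request_url("my-project-id", "us-central1", "gemini-2.5-pro"))
print(request_url("my-project-id", "global", "gemini-2.5-pro", stream=True))
```

If the URL in the playground's raw cURL view differs from what this produces for your `project_id` and `location`, a custom `base_url` is probably in effect.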

### `project_id`

The Google Cloud project ID hosting the Vertex AI service you want to call.

**Default**: inferred from the provided credentials (see [`Authentication`](#authentication)).

<a name="credentials" />

### `location`

Vertex requires you to specify the location you want to serve your models
from. Some models may only be available in certain locations.

Common locations include:

* `us-central1`
* `us-west1`
* `us-east1`
* `us-south1`

See the [Vertex AI docs](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#united-states)
for all locations and supported models.

<a name="credentials_content" />

### `credentials`

Service account credentials used to authenticate with Google Cloud. This field supports any of three formats:

* A string containing service account credentials in JSON format.
* Path to a file containing service account credentials in JSON format.
* A JSON object containing service account credentials.

See [Authentication](#authentication) and [Debugging](#debugging) for more information.

**Default: `env.GOOGLE_APPLICATION_CREDENTIALS`**

```baml BAML
client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    location us-central1
    // credentials can be a block string containing service account credentials in JSON format
    credentials #"
      {
        "type": "service_account",
        "project_id": "my-project-id",
        "private_key_id": "string",
        "private_key": "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
        "client_email": "john_doe@gmail.com",
        "client_id": "123456",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/...",
        "universe_domain": "googleapis.com"
      }
    "#
  }
}

```

When `credentials` is a file path, the path is resolved relative to the current working directory of your process.

```baml BAML
client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    location us-central1
    credentials "path/to/credentials.json"
  }
}
```

```baml BAML
client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    location us-central1
    // credentials can also be an object containing service account credentials
    credentials {
      type "service_account",
      project_id "my-project-id",
      private_key_id "string",
      private_key "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
      client_email "john_doe@gmail.com",
      client_id "123456",
      auth_uri "https://accounts.google.com/o/oauth2/auth",
      token_uri "https://oauth2.googleapis.com/token",
      auth_provider_x509_cert_url "https://www.googleapis.com/oauth2/v1/certs",
      client_x509_cert_url "https://www.googleapis.com/robot/v1/metadata/...",
      universe_domain "googleapis.com"
    }
  }
}
```

### `credentials_content`

Since the BAML playground now supports authenticating with GCP via `gcloud auth application-default login`,
we will soon be deprecating `credentials_content`.

A string containing service account credentials in JSON format.

See [Authentication](#authentication) and [Debugging](#debugging) for more information.

**Default: `env.GOOGLE_APPLICATION_CREDENTIALS_CONTENT`**

```baml BAML
client<llm> Vertex {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    location us-central1
    // credentials_content is a block string containing service account credentials in JSON format
    credentials_content #"
      {
        "type": "service_account",
        "project_id": "my-project-id",
        "private_key_id": "string",
        "private_key": "-----BEGIN PRIVATE KEY-----string\n-----END PRIVATE KEY-----\n",
        "client_email": "john_doe@gmail.com",
        "client_id": "123456",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://oauth2.googleapis.com/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/...",
        "universe_domain": "googleapis.com"
      }
    "#
  }
}

```

### `model`

The Google model to use for the request.

| Model              | Input(s)                        | Optimized for                                                                                                           |
| ------------------ | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `gemini-2.5-pro`   | Audio, images, videos, and text | Complex reasoning tasks such as code and text generation, text editing, problem solving, data extraction and generation |
| `gemini-2.5-flash` | Audio, images, videos, and text | Fast and versatile performance across a diverse variety of tasks                                                        |
| `gemini-1.0-pro`   | Text                            | Natural language tasks, multi-turn text and code chat, and code generation                                              |

See the [Google Model Docs](https://ai.google.dev/gemini-api/docs/models/gemini) for the latest models.

### `headers`

Additional headers to send with the request.

Example:

```baml BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    project_id my-project-id
    location us-central1
    // Additional headers
    headers {
      "X-My-Header" "my-value"
    }
  }
}
```

### `query_params`

Query string parameters appended to the request URL.

Example (use a Vertex API Key with Express Mode):

```baml BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    project_id my-project-id
    location us-central1
    query_params {
      key env.VERTEX_API_KEY
    }
  }
}
```

When using an API key, omit `credentials` and set `project_id` explicitly.

### `default_role`

The role to use when a message's role is not in `allowed_roles`. **Default: `"user"` usually, but some models like OpenAI's `gpt-5` will use `"system"`**

The default is `"user"` if `"user"` is in `allowed_roles`; otherwise it is the first role in `allowed_roles`.

### `allowed_roles`

Which roles should we forward to the API? **Default: `["system", "user", "assistant"]` usually, but some models like OpenAI's `o1-mini` will use `["user", "assistant"]`**

When building prompts, any role not in this list will be set to the `default_role`.

### `remap_roles`

A mapping to transform role names before sending to the API. **Default: `{}`** (no remapping)

For google-ai provider, the default is: `{ "assistant": "model" }`

This allows you to use standard role names in your prompts (like "user", "assistant", "system") but send different role names to the API. The remapping happens after role validation and default role assignment.

**Example:**

```json
{
  "user": "human",
  "assistant": "ai"
}
```

With this configuration, `{{ _.role("user") }}` in your prompt will result in a message with role "human" being sent to the API.
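
The interaction between role validation, the default role, and remapping can be sketched as follows. This is a minimal illustration of the order described above, not BAML's implementation; `resolve_role` and its parameter names are hypothetical, with `remap` standing in for the role mapping.

```python
from typing import Dict, List, Optional


def resolve_role(
    role: str,
    allowed_roles: List[str],
    default_role: Optional[str] = None,
    remap: Optional[Dict[str, str]] = None,
) -> str:
    """Return the role name that would be sent to the API for `role`."""
    # Default role: "user" when it is allowed, else the first allowed role.
    if default_role is None:
        default_role = "user" if "user" in allowed_roles else allowed_roles[0]

    # Validation: any role outside the allow list collapses to the default.
    if role not in allowed_roles:
        role = default_role

    # Remapping happens last, after validation and default assignment.
    return (remap or {}).get(role, role)


allowed = ["system", "user", "assistant"]
print(resolve_role("user", allowed, remap={"user": "human", "assistant": "ai"}))
print(resolve_role("tool", allowed))
```

Because remapping runs last, the keys of the mapping should be the standard names used in your prompt, not the provider-specific names.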

### `allowed_role_metadata`

Which role metadata should we forward to the API? **Default: `[]`**

For example you can set this to `["foo", "bar"]` to forward the cache policy to the API.

If you do not set `allowed_role_metadata`, we will not forward any role metadata to the API even if it is set in the prompt.

Then in your prompt you can use something like:

```baml
client<llm> Foo {
  provider openai
  options {
    allowed_role_metadata ["foo", "bar"]
  }
}

client<llm> FooWithout {
  provider openai
  options {
  }
}
template_string Foo() #"
  {{ _.role('user', foo={"type": "ephemeral"}, bar="1", cat=True) }}
  This will have foo and bar, but not cat metadata. But only for Foo, not FooWithout.
  {{ _.role('user') }}
  This will have none of the role metadata for Foo or FooWithout.
"#
```

You can use the playground to see the raw curl request to see what is being sent to the API.
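
Conceptually, the allow list acts as a key filter on the metadata passed to `_.role(...)`. A small Python sketch, assuming the metadata is a plain dictionary and using the hypothetical helper name `filter_role_metadata`:

```python
from typing import Any, Dict, List


def filter_role_metadata(metadata: Dict[str, Any], allowed: List[str]) -> Dict[str, Any]:
    """Keep only the metadata keys named in the allow list."""
    return {key: value for key, value in metadata.items() if key in allowed}


metadata = {"foo": {"type": "ephemeral"}, "bar": "1", "cat": True}

# Client `Foo` allows "foo" and "bar", so "cat" is dropped.
print(filter_role_metadata(metadata, ["foo", "bar"]))

# Client `FooWithout` uses the default empty list, so nothing is forwarded.
print(filter_role_metadata(metadata, []))
```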

### `supports_streaming`

Whether the internal LLM client should use the streaming API. **Default: `true`**

Then in your prompt you can use something like:

```baml
client<llm> MyClientWithoutStreaming {
  provider anthropic
  options {
    model claude-3-5-haiku-20241022
    api_key env.ANTHROPIC_API_KEY
    max_tokens 1000
    supports_streaming false
  }
}

function MyFunction() -> string {
  client MyClientWithoutStreaming
  prompt #"Write a short story"#
}
```

```python
# This will be streamed from your python code perspective, 
# but under the hood it will call the non-streaming HTTP API
# and then return a streamable response with a single event
b.stream.MyFunction()

# This will work exactly the same as before
b.MyFunction()
```

### `finish_reason_allow_list`

Which finish reasons are allowed? **Default: `null`**

version 0.73.0 onwards: This is case insensitive.

Will raise a `BamlClientFinishReasonError` if the finish reason is not in the allow list. See [Exceptions](/guide/baml-basics/error-handling#bamlclientfinishreasonerror) for more details.

Note, only one of `finish_reason_allow_list` or `finish_reason_deny_list` can be set.

For example, you can set this to `["stop"]` to only allow the stop finish reason. All other finish reasons (e.g. `length`) will be treated as failures that PREVENT fallbacks and retries (similar to parsing errors).

Then in your code you can use something like:

```baml
client<llm> MyClient {
  provider "openai"
  options {
    model "gpt-5-mini"
    api_key env.OPENAI_API_KEY
    // Finish reason allow list will only allow the stop finish reason
    finish_reason_allow_list ["stop"]
  }
}
```

### `finish_reason_deny_list`

Which finish reasons are denied? **Default: `null`**

version 0.73.0 onwards: This is case insensitive.

Will raise a `BamlClientFinishReasonError` if the finish reason is in the deny list. See [Exceptions](/guide/baml-basics/error-handling#bamlclientfinishreasonerror) for more details.

Note, only one of `finish_reason_allow_list` or `finish_reason_deny_list` can be set.

For example, you can set this to `["length"]` to stop the function from continuing if the finish reason is `length` (e.g. the LLM output was cut off because it was too long).

Then in your code you can use something like:

```baml
client<llm> MyClient {
  provider "openai"
  options {
    model "gpt-5-mini"
    api_key env.OPENAI_API_KEY
    // Finish reason deny list will allow all finish reasons except length
    finish_reason_deny_list ["length"]
  }
}
```
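
The allow-list and deny-list checks can be pictured with the Python sketch below. It is an assumption-laden illustration rather than the runtime's code: `FinishReasonError` is a local stand-in for `BamlClientFinishReasonError`, and `check_finish_reason` is a hypothetical helper.

```python
from typing import Iterable, Optional


class FinishReasonError(Exception):
    """Local stand-in for BamlClientFinishReasonError."""


def check_finish_reason(
    reason: str,
    allow_list: Optional[Iterable[str]] = None,
    deny_list: Optional[Iterable[str]] = None,
) -> None:
    """Raise FinishReasonError if `reason` is rejected by the configured list."""
    # Only one of the two lists may be set.
    if allow_list is not None and deny_list is not None:
        raise ValueError("set only one of allow_list or deny_list")

    # Comparison is case insensitive (version 0.73.0 onwards).
    normalized = reason.lower()

    if allow_list is not None and normalized not in {r.lower() for r in allow_list}:
        raise FinishReasonError(f"finish reason {reason!r} is not in the allow list")
    if deny_list is not None and normalized in {r.lower() for r in deny_list}:
        raise FinishReasonError(f"finish reason {reason!r} is in the deny list")


check_finish_reason("STOP", allow_list=["stop"])      # passes
check_finish_reason("stop", deny_list=["length"])     # passes
```

With neither list set (the default `null`), every finish reason passes.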

### `media_url_handler`

Controls how media URLs are processed before sending to the provider. This allows you to override the default behavior for handling images, audio, PDFs, and videos.

```baml
client<llm> MyClient {
  provider openai
  options {
    media_url_handler {
      image "send_base64"                    // Options: send_base64 | send_url | send_url_add_mime_type | send_base64_unless_google_url
      audio "send_url"
      pdf "send_url_add_mime_type"
      video "send_url"
    }
  }
}
```

#### Options

Each media type can be configured with one of these modes:

* **`send_base64`** - Always download URLs and convert to base64 data URIs
* **`send_url`** - Pass URLs through unchanged to the provider
* **`send_url_add_mime_type`** - Ensure MIME type is present (may require downloading to detect)
* **`send_base64_unless_google_url`** - Only process non-gs\:// URLs (keep Google Cloud Storage URLs as-is)

#### Provider Defaults

If not specified, each provider uses these defaults:

| Provider     | Image                           | Audio                    | PDF           | Video      |
| ------------ | ------------------------------- | ------------------------ | ------------- | ---------- |
| OpenAI       | `send_url`                      | `send_base64`            | `send_url`    | `send_url` |
| Anthropic    | `send_url`                      | `send_url`               | `send_base64` | `send_url` |
| Google AI    | `send_base64_unless_google_url` | `send_url`               | `send_url`    | `send_url` |
| Vertex AI    | `send_url_add_mime_type`        | `send_url_add_mime_type` | `send_url`    | `send_url` |
| AWS Bedrock  | `send_base64`                   | `send_base64`            | `send_base64` | `send_url` |
| Azure OpenAI | `send_url`                      | `send_base64`            | `send_url`    | `send_url` |

#### When to Use

* **Use `send_base64`** when your provider doesn't support external URLs and you need to embed media content
* **Use `send_url`** when your provider handles URL fetching and you want to avoid the overhead of base64 conversion
* **Use `send_url_add_mime_type`** when your provider requires MIME type information (e.g., Vertex AI)
* **Use `send_base64_unless_google_url`** when working with Google Cloud Storage and want to preserve gs\:// URLs

URL fetching happens at request time and may add latency. Consider caching or pre-converting frequently used media when using `send_base64` mode.

Vertex AI uses `send_url_add_mime_type` by default for images and audio, which ensures MIME type information is included. This may require downloading the content to detect the MIME type if not provided.
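
To make the four modes concrete, here is a rough Python sketch of how a URL might be transformed under each one. Everything in it is an assumption for illustration: the `handle_media_url` helper, the returned dictionary shape, and guessing the MIME type from the file extension (the real runtime may download the content to detect it).

```python
import base64
import mimetypes
from typing import Callable, Dict


def handle_media_url(url: str, mode: str, fetch: Callable[[str], bytes]) -> Dict[str, str]:
    """Return a hypothetical payload for `url` under the given handler mode."""
    # Assumption: guess the MIME type from the extension instead of downloading.
    mime_type = mimetypes.guess_type(url)[0] or "application/octet-stream"

    if mode == "send_url":
        return {"url": url}
    if mode == "send_url_add_mime_type":
        return {"url": url, "mime_type": mime_type}
    if mode == "send_base64_unless_google_url" and url.startswith("gs://"):
        # Google Cloud Storage URLs are passed through unchanged.
        return {"url": url}
    if mode in ("send_base64", "send_base64_unless_google_url"):
        encoded = base64.b64encode(fetch(url)).decode("ascii")
        return {"base64": encoded, "mime_type": mime_type}
    raise ValueError(f"unknown media_url_handler mode: {mode}")


def fake_fetch(url: str) -> bytes:
    """Stand-in for an HTTP download so the sketch runs offline."""
    return b"abc"


print(handle_media_url("https://example.com/cat.png", "send_url_add_mime_type", fake_fetch))
print(handle_media_url("gs://bucket/cat.png", "send_base64_unless_google_url", fake_fetch))
```

Only the base64 modes call `fetch`, which is where the request-time latency mentioned above comes from.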

## Provider request parameters

These are other parameters that are passed through to the provider, without modification by BAML. For example if the request has a `temperature` field, you can define it in the client here so every call has that set.

Consult the specific provider's documentation for more information.

### `safetySettings`

Safety settings to apply to the request. You can stack different safety settings with a new `safetySettings` header for each one. See the [Google Vertex API Request Docs](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference) for more information on what safety settings can be set.

```baml BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    project_id my-project-id
    location us-central1

    safetySettings {
      category HARM_CATEGORY_HATE_SPEECH
      threshold BLOCK_LOW_AND_ABOVE
      method SEVERITY
    }
  }
}
```

### `generationConfig`

Generation configurations to apply to the request. See the [Google Vertex API Request Docs](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference) for more information on what properties can be set.

```baml BAML
client<llm> MyClient {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    project_id my-project-id
    location us-central1

    generationConfig {
      maxOutputTokens 100
      temperature 1
    }
  }
}
```

For Gemini models (2.5-pro and later), you can enable thinking mode using `thinkingConfig`:

```baml BAML
client<llm> GeminiThinking {
  provider vertex-ai
  options {
    model gemini-2.5-pro
    project_id my-project-id
    location us-central1
    generationConfig {
      thinkingConfig {
        thinkingBudget 1024
        includeThoughts true
      }
    }
  }
}
```

**`thinkingConfig` is only for Gemini models.** For Claude models on Vertex AI, use the `thinking` parameter instead. See [Extended Thinking for Claude on Vertex AI](#extended-thinking-for-claude-on-vertex-ai).

For all other options, see the [official Vertex AI documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal).

## Publishers Other Than Google

If you are using models from publishers other than Google, such as Llama from
Meta, use your project endpoint as the `base_url` in BAML:

```baml
client<llm> VertexLlama {
  provider vertex-ai
  options {
    base_url "https://us-central1-aiplatform.googleapis.com/v1/projects/my-project-id/locations/us-central1/endpoints/"
    location us-central1
  }
}
```

### Anthropic Claude Models on Vertex AI

For Anthropic Claude models, you can use the simplified configuration with `location` and `project_id`:

```baml
client<llm> VertexClaudeSonnet {
  provider vertex-ai
  options {
    model "claude-sonnet-4-5@20250929"
    location us-east5
    project_id my-project-id
    anthropic_version "vertex-2023-10-16"
    credentials env.GOOGLE_APPLICATION_CREDENTIALS
  }
}
```

**Claude Model Names on Vertex AI**: Use the format `model-name@version` (e.g., `claude-sonnet-4-5@20250929`).
Available models include `claude-sonnet-4`, `claude-opus-4`, `claude-sonnet-4-5`, `claude-opus-4-5`, `claude-sonnet-4-6`, and `claude-opus-4-6`.
Check [Google Cloud's Claude documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude) for the latest model names and regional availability.

Alternatively, you can use a custom `base_url` with environment variables:

```baml
client<llm> VertexClaudeWithBaseUrl {
  provider vertex-ai
  options {
    model "claude-sonnet-4-5@20250929"
    anthropic_version "vertex-2023-10-16"
    base_url env.VERTEX_CLAUDE_BASE_URL
  }
}
```

Where `VERTEX_CLAUDE_BASE_URL` would be set to something like:

```bash
VERTEX_CLAUDE_BASE_URL="https://us-east5-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-east5/publishers/anthropic/models"
```

When using a custom `base_url`, do NOT include the model name at the end. BAML will automatically append `/{model}:rawPredict` (or `:streamRawPredict` for streaming) to the base URL.
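
The suffix rule can be illustrated with a short Python sketch. The helper name `claude_request_url` is hypothetical, and stripping a trailing slash from the base URL is an assumption made here to keep the output tidy, not documented behavior.

```python
def claude_request_url(base_url: str, model: str, stream: bool = False) -> str:
    """Append `/{model}:rawPredict` (or `:streamRawPredict`) to a base URL."""
    action = "streamRawPredict" if stream else "rawPredict"
    # Assumption: tolerate a trailing slash so the result has no double slash.
    return f"{base_url.rstrip('/')}/{model}:{action}"


base = (
    "https://us-east5-aiplatform.googleapis.com/v1/projects/your-project-id"
    "/locations/us-east5/publishers/anthropic/models"
)
print(claude_request_url(base, "claude-sonnet-4-5@20250929"))
print(claude_request_url(base, "claude-sonnet-4-5@20250929", stream=True))
```

If the base URL already ended with the model name, the model would appear twice in the final path, which is why it must be left off.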

### Extended Thinking for Claude on Vertex AI

To enable extended thinking for Claude models on Vertex AI, use the `thinking` parameter:

```baml
client<llm> VertexClaudeThinking {
  provider vertex-ai
  options {
    model "claude-sonnet-4-5@20250929"
    location us-east5
    project_id my-project-id
    anthropic_version "vertex-2023-10-16"
    credentials env.GOOGLE_APPLICATION_CREDENTIALS
    max_tokens 4096
    thinking {
      type "enabled"
      budget_tokens 2048
    }
  }
}
```

Extended thinking is available on Claude Sonnet 4.5+, Claude Opus 4.5+, and later models.
For Claude Opus 4.6, Anthropic recommends using adaptive thinking (`type: "adaptive"`) with an `effort` parameter instead of manual mode.