ollama | Boundary Documentation

Ollama supports the OpenAI client, allowing you to use the openai-generic provider with an overridden base_url.

Note that to call Ollama, you must use its OpenAI-compatible /v1 endpoint. See Ollama’s OpenAI compatibility documentation.

You can try out BAML with Ollama at promptfiddle.com, by running OLLAMA_ORIGINS='*' ollama serve. Learn more in here

BAML

1 client<llm> MyClient {
2   provider "openai-generic"
3   options {
4     base_url "http://localhost:11434/v1"
5     model llama3
6   }
7 }

BAML-specific request `options`

These unique parameters (aka options) modify the API request sent to the provider.

You can use this to modify the headers and base_url for example.

base_url

string

The base URL for the API. Default: http://localhost:11434/v1

Note the /v1 at the end of the URL. See Ollama’s OpenAI compatability

headers

object

Additional headers to send with the request.

Example:

BAML

1 client<llm> MyClient {
2   provider ollama
3   options {
4     model "llama3"
5     headers {
6       "X-My-Header" "my-value"
7     }
8   }
9 }

default_role

string

The role to use if the role is not in the allowed_roles. Default: "user" usually, but some models like OpenAI’s gpt-4o will use "system"

Picked the first role in allowed_roles if not “user”, otherwise “user”.

allowed_roles

string[]

Which roles should we forward to the API? Default: ["system", "user", "assistant"] usually, but some models like OpenAI’s o1-mini will use ["user", "assistant"]

When building prompts, any role not in this list will be set to the default_role.

allowed_role_metadata

string[]

Which role metadata should we forward to the API? Default: []

For example you can set this to ["foo", "bar"] to forward the cache policy to the API.

If you do not set allowed_role_metadata, we will not forward any role metadata to the API even if it is set in the prompt.

Then in your prompt you can use something like:

1 client<llm> Foo {
2   provider openai
3   options {
4     allowed_role_metadata: ["foo", "bar"]
5   }
6 }
7 
8 client<llm> FooWithout {
9   provider openai
10   options {
11   }
12 }
13 template_string Foo() #"
14   {{ _.role('user', foo={"type": "ephemeral"}, bar="1", cat=True) }}
15   This will be have foo and bar, but not cat metadata. But only for Foo, not FooWithout.
16   {{ _.role('user') }}
17   This will have none of the role metadata for Foo or FooWithout.
18 "#

You can use the playground to see the raw curl request to see what is being sent to the API.

supports_streaming

boolean

Whether the internal LLM client should use the streaming API. Default: true

Then in your prompt you can use something like:

1 client<llm> MyClientWithoutStreaming {
2   provider anthropic
3   options {
4     model claude-3-haiku-20240307
5     api_key env.ANTHROPIC_API_KEY
6     max_tokens 1000
7     supports_streaming false
8   }
9 }
10 
11 function MyFunction() -> string {
12   client MyClientWithoutStreaming
13   prompt #"Write a short story"#
14 }

1 # This will be streamed from your python code perspective, 
2 # but under the hood it will call the non-streaming HTTP API
3 # and then return a streamable response with a single event
4 b.stream.MyFunction()
5 
6 # This will work exactly the same as before
7 b.MyFunction()

Provider request parameters

These are other parameters that are passed through to the provider, without modification by BAML. For example if the request has a temperature field, you can define it in the client here so every call has that set.

Consult the specific provider’s documentation for more information.

messages

DO NOT USE

BAML will auto construct this field for you from the prompt

stream

DO NOT USE

BAML will auto construct this field for you based on how you call the client in your code

model

string

The model to use.

Model	Description
`llama3`	Meta Llama 3: The most capable openly available LLM to date
`qwen2`	Qwen2 is a new series of large language models from Alibaba group
`phi3`	Phi-3 is a family of lightweight 3B (Mini) and 14B (Medium) state-of-the-art open models by Microsoft
`aya`	Aya 23, released by Cohere, is a new family of state-of-the-art, multilingual models that support 23 languages.
`mistral`	The 7B model released by Mistral AI, updated to version 0.3.
`gemma`	Gemma is a family of lightweight, state-of-the-art open models built by Google DeepMind. Updated to version 1.1
`mixtral`	A set of Mixture of Experts (MoE) model with open weights by Mistral AI in 8x7b and 8x22b parameter sizes.

For the most up-to-date list of models supported by Ollama, see their Model Library.

To use a specific version you would do: "mixtral:8x22b"