AsyncClient / SyncClient

BAML generates both a sync client and an async client. They offer the exact same public API; the only difference is whether the methods are synchronous or asynchronous.

BAML Functions

The generated client exposes every function you’ve defined in your BAML files as a method. Suppose we have this file named baml_src/literature.baml:

baml_src/literature.baml
function TellMeAStory() -> string {
  client "openai/gpt-4o"
  prompt #"
    Tell me a story
  "#
}

function WriteAPoemAbout(input: string) -> string {
  client "openai/gpt-4o"
  prompt #"
    Write a poem about {{ input }}
  "#
}

After running baml-cli generate, you can call these functions directly from your code. Here’s an example using the async client:

from baml_client.async_client import b

async def example():
    # Call your BAML functions.
    story = await b.TellMeAStory()
    poem = await b.WriteAPoemAbout("Roses")

The sync client works exactly the same way, but it doesn’t need an async runtime; each call simply blocks until the response is ready.

from baml_client.sync_client import b

def example():
    # Call your BAML functions.
    story = b.TellMeAStory()
    poem = b.WriteAPoemAbout("Roses")

Call Patterns

The client object also exposes references to helper objects that call your functions in different ways.

.stream

The .stream object is used to stream the response from a function.

from baml_client.async_client import b

async def example():
    stream = b.stream.TellMeAStory()

    async for partial in stream:
        print(partial)

    print(await stream.get_final_response())
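
Since the sync client offers the same API, the equivalent streaming code just uses a regular for loop and blocks between chunks. A minimal sketch, assuming the sync stream object is iterable and exposes the same get_final_response() method:

from baml_client.sync_client import b

def example():
    stream = b.stream.TellMeAStory()

    for partial in stream:
        print(partial)

    print(stream.get_final_response())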

.request

This feature was added in: v0.79.0

The .request object returns the raw HTTP request but does not send it. Note that the async client still returns an awaitable, because media inputs such as images may need to be resolved and converted to base64 (or whatever format the LLM requires) before the request can be built.

from baml_client.async_client import b

async def example():
    request = await b.request.TellMeAStory()
    print(request.url)
    print(request.headers)
    print(request.body.json())
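
Because the request object exposes the URL, headers, and body, it can be handed to any HTTP client. A minimal sketch of sending it from async code, assuming the httpx library (not part of baml_client) is installed:

import httpx

from baml_client.async_client import b

async def example():
    # Build the request without sending it.
    request = await b.request.TellMeAStory()

    # Send it with the HTTP client of your choice.
    async with httpx.AsyncClient() as client:
        response = await client.post(request.url, headers=request.headers, json=request.body.json())
        print(response.json())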

.stream_request

This feature was added in: v0.79.0

Same as .request, but with streaming enabled in the request options.

from baml_client.async_client import b

async def example():
    request = await b.stream_request.TellMeAStory()
    print(request.url)
    print(request.headers)
    print(request.body.json())
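
If you want to see the difference, you can inspect the generated bodies side by side. A quick sketch, assuming an OpenAI-compatible provider whose chat-completions body carries a "stream" flag:

from baml_client.async_client import b

async def example():
    request = await b.request.TellMeAStory()
    stream_request = await b.stream_request.TellMeAStory()

    # For an OpenAI-style body, only the streaming variant should set "stream": true.
    print(request.body.json().get("stream"))
    print(stream_request.body.json().get("stream"))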

.parse

This feature was added in: v0.79.0

The .parse object parses the raw response returned by the LLM into the function’s declared return type. It can be used in combination with .request.

import requests

# requests is not async, so for simplicity we'll use the sync client.
from baml_client.sync_client import b

def example():
    # Get the HTTP request.
    request = b.request.TellMeAStory()

    # Send the HTTP request.
    response = requests.post(request.url, headers=request.headers, json=request.body.json())

    # Parse the LLM response.
    parsed = b.parse.TellMeAStory(response.json()["choices"][0]["message"]["content"])

    # Fully parsed response.
    print(parsed)

.parse_stream

This feature was added in: v0.79.0

Same as .parse, but for partial streaming responses. It can be used in combination with .stream_request.

from openai import AsyncOpenAI
from baml_client.async_client import b

async def example():
    client = AsyncOpenAI()

    request = await b.stream_request.TellMeAStory()
    stream = await client.chat.completions.create(**request.body.json())

    llm_response: list[str] = []
    async for chunk in stream:
        if len(chunk.choices) > 0 and chunk.choices[0].delta.content is not None:
            llm_response.append(chunk.choices[0].delta.content)
            print(b.parse_stream.TellMeAStory("".join(llm_response)))