Modular API

Requires BAML version >=0.79.0

First and foremost, BAML provides a high-level API where functions are first-class citizens and their execution is fully transparent to the developer. This means you can simply call a BAML function and everything from prompt rendering and HTTP request building to the LLM API network call and response parsing is handled for you. Basic example:

BAML
class Resume {
  name string
  experience string[]
  education string[]
}

function ExtractResume(resume: string) -> Resume {
  client "openai/gpt-4o"
  prompt #"
    Extract the following information from the resume:

    ---
    {{ resume }}
    ---

    {{ ctx.output_format }}
  "#
}

Now we can use this function in our server code after running baml-cli generate:

from baml_client import b

async def run():
    # HTTP request + LLM response parsing.
    resume = await b.ExtractResume("John Doe | Software Engineer | BSc in CS")
    print(resume)

However, sometimes we may want to execute a function without so much abstraction, or to have access to the HTTP request before sending it. For this, BAML provides a lower-level API that exposes the HTTP request and the LLM response parser to the caller. Here's an example that uses the requests library in Python (the equivalents in Node.js and Ruby are the fetch API and the Net::HTTP library) to manually send an HTTP request to OpenAI's API and parse the LLM response.

import requests
# requests is not async, so for simplicity we'll use the sync client.
from baml_client.sync_client import b

def run():
    # Get the HTTP request object.
    req = b.request.ExtractResume("John Doe | Software Engineer | BSc in CS")

    # Send the HTTP request.
    res = requests.post(url=req.url, headers=req.headers, json=req.body.json())

    # Parse the LLM response.
    parsed = b.parse.ExtractResume(res.json()["choices"][0]["message"]["content"])

    # Fully parsed Resume type.
    print(parsed)

Note that request.body.json() returns an object (a dict in Python, a hash in Ruby) which we then serialize to JSON, but request.body also exposes the raw binary buffer, so we can skip the serialization:

res = requests.post(url=req.url, headers=req.headers, data=req.body.raw())

Using Provider SDKs

We can use the same modular API with the official SDKs. Here are some examples:

OpenAI

from openai import AsyncOpenAI
from baml_client import b

async def run():
    # Initialize the OpenAI client.
    client = AsyncOpenAI()

    # Get the HTTP request object.
    req = await b.request.ExtractResume("John Doe | Software Engineer | BSc in CS")

    # Use the openai library to send the request.
    res = await client.chat.completions.create(**req.body.json())

    # Parse the LLM response.
    parsed = b.parse.ExtractResume(res.choices[0].message.content)

    # Fully parsed Resume type.
    print(parsed)

Anthropic

Remember that the client is defined in the BAML function (or you can use the client registry, as sketched below):

BAML
function ExtractResume(resume: string) -> Resume {
  client "anthropic/claude-3-haiku"
  // Prompt here...
}
import anthropic
from baml_client import b

async def run():
    # Initialize the Anthropic client.
    client = anthropic.AsyncAnthropic()

    # Get the HTTP request object.
    req = await b.request.ExtractResume("John Doe | Software Engineer | BSc in CS")

    # Use the anthropic library to send the request.
    res = await client.messages.create(**req.body.json())

    # Parse the LLM response.
    parsed = b.parse.ExtractResume(res.content[0].text)

    # Fully parsed Resume type.
    print(parsed)
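
If you'd rather select the client at runtime instead of hard-coding it in the BAML file, you can pass a client registry. Here's a minimal sketch; the client name and model string are illustrative, and we're assuming b.request accepts the same baml_options as regular function calls:

from baml_py import ClientRegistry
from baml_client import b

async def run():
    # Register a client and make it the primary one (name and options are illustrative).
    cr = ClientRegistry()
    cr.add_llm_client(name="MyClaudeClient", provider="anthropic", options={"model": "claude-3-haiku-20240307"})
    cr.set_primary("MyClaudeClient")

    # Build the HTTP request with the client from the registry.
    req = await b.request.ExtractResume(
        "John Doe | Software Engineer | BSc in CS",
        baml_options={"client_registry": cr},
    )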

Google Gemini

Remember that the client is defined in the BAML function (or you can use the client registry):

BAML
function ExtractResume(resume: string) -> Resume {
  client "google-ai/gemini-1.5-pro-001"
  // Prompt here...
}
from google import genai
from baml_client import b

async def run():
    # Initialize the Gemini client.
    client = genai.Client()

    # Get the HTTP request object.
    req = await b.request.ExtractResume("John Doe | Software Engineer | BSc in CS")

    # Get the request body.
    body = req.body.json()

    # Use the gemini library to send the request.
    res = await client.aio.models.generate_content(
        model="gemini-1.5-pro-001",
        contents=body["contents"],
        config={
            "safety_settings": [body["safetySettings"]]  # REST API uses camelCase
        }
    )

    # Parse the LLM response.
    parsed = b.parse.ExtractResume(res.text)

    # Fully parsed Resume type.
    print(parsed)

Type Checking

Python

The return type of request.body.json() is Any, so you won't get full type checking in Python when using the SDKs. Here are some workarounds:

1. Using typing.cast

OpenAI
import typing
from openai.types.chat import ChatCompletion

res = typing.cast(ChatCompletion, await client.chat.completions.create(**req.body.json()))

2. Manually setting the arguments

OpenAI
body = req.body.json()
res = await client.chat.completions.create(model=body["model"], messages=body["messages"])

This preserves the type hints for the OpenAI SDK, but it doesn't work for Anthropic. The Gemini SDK / REST API, on the other hand, is built in such a way that it basically forces us to use this pattern, as seen in the example above.
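
For Anthropic you can fall back to typing.cast, mirroring the OpenAI workaround above (a short sketch, assuming the anthropic SDK's Message response type):

import typing
from anthropic.types import Message

res = typing.cast(Message, await client.messages.create(**req.body.json()))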

TypeScript

The TypeScript SDKs don't take keyword arguments like the Python ones do; they take a single params object instead, so you can just cast it to the expected type:

OpenAI
import { ChatCompletionCreateParamsNonStreaming } from 'openai/resources';

const res = await client.chat.completions.create(req.body.json() as ChatCompletionCreateParamsNonStreaming)

Streaming

Streaming requests and parsing are also supported. Here's an example using the OpenAI SDK:

import typing
from openai import AsyncOpenAI, AsyncStream
from openai.types.chat import ChatCompletionChunk
from baml_client import b

async def run():
    client = AsyncOpenAI()

    req = await b.stream_request.ExtractResume("John Doe | Software Engineer | BSc in CS")

    stream = typing.cast(
        AsyncStream[ChatCompletionChunk],
        await client.chat.completions.create(**req.body.json())
    )

    llm_response: list[str] = []

    async for chunk in stream:
        if len(chunk.choices) > 0 and chunk.choices[0].delta.content is not None:
            llm_response.append(chunk.choices[0].delta.content)
            # You can parse the partial responses as they come in.
            print(b.parse_stream.ExtractResume("".join(llm_response)))
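
Once the stream is exhausted, you can run the accumulated text through b.parse to get the final, fully parsed Resume. A short sketch continuing inside run() after the loop above:

    # After the stream ends, parse the complete response into the final Resume type.
    final = b.parse.ExtractResume("".join(llm_response))
    print(final)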

OpenAI Batch API Example

Currently, BAML doesn’t support OpenAI’s Batch API out of the box, but you can use the modular API to build the prompts and parse the responses of batch jobs. Here’s an example:

import asyncio
import json
from openai import AsyncOpenAI
from baml_py import HTTPRequest as BamlHttpRequest
from baml_client import b
from baml_client import types

async def run():
    client = AsyncOpenAI()

    # Build the batch requests with BAML.
    john_req, jane_req = await asyncio.gather(
        b.request.ExtractResume("John Doe | Software Engineer | BSc in CS"),
        b.request.ExtractResume("Jane Smith | Data Scientist | PhD in Statistics"),
    )

    # Build the JSONL content.
    jsonl = to_openai_jsonl(john_req) + to_openai_jsonl(jane_req)

    # Create the batch input file.
    batch_input_file = await client.files.create(
        file=jsonl.encode("utf-8"),
        purpose="batch",
    )

    # Create the batch.
    batch = await client.batches.create(
        input_file_id=batch_input_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
        metadata={
            "description": "BAML Modular API Python Batch Example"
        },
    )

    # Wait for the batch to complete (exponential backoff).
    backoff = 2
    attempts = 0
    max_attempts = 5

    while True:
        batch = await client.batches.retrieve(batch.id)
        attempts += 1

        if batch.status == "completed":
            break

        if attempts >= max_attempts:
            try:
                await client.batches.cancel(batch.id)
            finally:
                raise Exception("Batch failed to complete in time")

        await asyncio.sleep(backoff)
        backoff *= 2

    # Retrieve the batch output file.
    output = await client.files.content(batch.output_file_id)

    # You can match the batch results using the BAML request IDs.
    expected = {
        john_req.id: types.Resume(
            name="John Doe",
            experience=["Software Engineer"],
            education=["BSc in CS"]
        ),
        jane_req.id: types.Resume(
            name="Jane Smith",
            experience=["Data Scientist"],
            education=["PhD in Statistics"]
        ),
    }

    resumes = {}

    for line in output.text.splitlines():
        result = json.loads(line)
        llm_response = result["response"]["body"]["choices"][0]["message"]["content"]

        parsed = b.parse.ExtractResume(llm_response)
        resumes[result["custom_id"]] = parsed

    print(resumes)

    # Should be equal.
    assert resumes == expected


def to_openai_jsonl(req: BamlHttpRequest) -> str:
    """Helper that converts a BAML HTTP request to OpenAI JSONL format."""
    line = json.dumps({
        "custom_id": req.id,  # Important for matching the batch results.
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": req.body.json(),
    })

    return f"{line}\n"