Multi-Modal (Images / Audio)

Multi-modal input

You can use audio or image input types in BAML prompts. Just create an input argument of that type and render it in the prompt.

Check the “raw curl” checkbox in the playground to see how BAML translates multi-modal input into the LLM Request body.
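As a rough illustration of what you'll see there, an image URL typically lands in the request body as an image_url content part. This sketch uses OpenAI's chat completions shape as an assumption; the exact JSON BAML emits may differ, so use the "raw curl" view to confirm:

```python
import json

# Sketch of an OpenAI-style chat completions body with an image part.
# Field names follow OpenAI's format, not BAML's verified output.
body = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image:"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png"
                    },
                },
            ],
        }
    ],
}
print(json.dumps(body, indent=2))
```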

1// "image" is a reserved keyword so we name the arg "img"
2function DescribeMedia(img: image) -> string {
3 client openai/gpt-4o
4 // Most LLM providers require images or audio to be sent as "user" messages.
5 prompt #"
6 {{_.role("user")}}
7 Describe this image: {{ img }}
8 "#
9}
10
11// See the "testing functions" Guide for more on testing Multimodal functions
12test Test {
13 args {
14 img {
15 url "https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png"
16 }
17 }
18}

See how to test images in the playground.

Calling Multimodal BAML Functions

Images

Here's how to call a BAML function that takes an image input argument (see image types).

The from_url and from_base64 methods construct an Image object from a URL or from base64-encoded data, respectively.

```python
from baml_py import Image
from baml_client import b

async def test_image_input():
    # from a URL
    res = await b.TestImageInput(
        img=Image.from_url(
            "https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png"
        )
    )

    # from base64-encoded data
    image_b64 = "iVBORw0K...."
    res = await b.TestImageInput(
        img=Image.from_base64("image/png", image_b64)
    )
```
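If the image lives on disk rather than at a URL, you can base64-encode the raw bytes yourself before passing them to from_base64. A minimal standard-library sketch (the byte string below is a stand-in for real image data read from a file):

```python
import base64

# Stand-in for bytes from a real file, e.g. open("shrek.png", "rb").read()
raw_bytes = b"\x89PNG fake image data"

# from_base64 expects a base64 string plus the media type
image_b64 = base64.b64encode(raw_bytes).decode("ascii")

# You would then call: b.TestImageInput(img=Image.from_base64("image/png", image_b64))
print(image_b64)
```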

Audio

You can call functions that take audio input arguments the same way (see audio types). Audio.from_url and Audio.from_base64 mirror their Image counterparts.

```python
from baml_py import Audio
from baml_client import b

async def run():
    # from a URL
    res = await b.TestAudioInput(
        audio=Audio.from_url(
            "https://actions.google.com/sounds/v1/emergency/beeper_emergency_call.ogg"
        )
    )

    # from base64-encoded data
    b64 = "iVBORw0K...."
    res = await b.TestAudioInput(
        audio=Audio.from_base64("audio/ogg", b64)
    )
```
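The first argument to from_base64 is the media type, which you can derive from a file name instead of hard-coding it. A small sketch using Python's standard mimetypes module (the file name here is illustrative):

```python
import mimetypes

# Guess the media type from an (illustrative) audio file name
media_type, _ = mimetypes.guess_type("emergency_call.mp3")

# You would then call: Audio.from_base64(media_type, b64)
print(media_type)
```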