Multi-Modal (Images / Audio)

Multi-modal input

You can use audio, image, pdf, or video input types in BAML prompts. Just create an input argument of that type and render it in the prompt.

Switch from “Prompt Review” to “Raw cURL” in the playground to see how BAML translates multi-modal input into the LLM Request body.

1// "image" is a reserved keyword so we name the arg "img"
2function DescribeMedia(img: image) -> string {
3 client openai/gpt-4o
4 // Most LLM providers require images or audio to be sent as "user" messages.
5 prompt #"
6 {{_.role("user")}}
7 Describe this image: {{ img }}
8 "#
9}
10
11// See the "testing functions" Guide for more on testing Multimodal functions
12test Test {
13 functions [DescribeMedia]
14 args {
15 img {
16 url "https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png"
17 }
18 }
19}

See how to test images in the playground.

Try it! Press ‘Run Test’ below!

Calling Multimodal BAML Functions

Images

Calling a BAML function with an image input argument type (see image types)

The from_url and from_base64 methods create an Image object based on input type.

1from baml_py import Image
2from baml_client import b
3
4async def test_image_input():
5 # from URL
6 res = await b.TestImageInput(
7 img=Image.from_url(
8 "https://upload.wikimedia.org/wikipedia/en/4/4d/Shrek_%28character%29.png"
9 )
10 )
11
12 # Base64 image
13 image_b64 = "iVBORw0K...."
14 res = await b.TestImageInput(
15 img=Image.from_base64("image/png", image_b64)
16 )

Audio

Calling functions that have audio types. See audio types

1from baml_py import Audio
2from baml_client import b
3
4async def run():
5 # from URL
6 res = await b.TestAudioInput(
7 img=Audio.from_url(
8 "https://actions.google.com/sounds/v1/emergency/beeper_emergency_call.ogg"
9 )
10 )
11
12 # Base64
13 b64 = "iVBORw0K...."
14 res = await b.TestAudioInput(
15 audio=Audio.from_base64("audio/ogg", b64)
16 )

Pdf

Calling functions that have pdf types. See pdf types

⚠️ Warning Pdf inputs must be provided as Base64 data (e.g. Pdf.from_base64). URL-based Pdf inputs are not currently supported. Additionally, Pdf inputs are only supported by models that explicitly allow document (Pdf) modalities, such as Gemini 2.x Flash/Pro or VertexAI Gemini. Make sure the client you select advertises Pdf support, otherwise your request will fail.

1from baml_py import Pdf
2from baml_client import b
3
4async def run():
5 # Base64 data
6 b64 = "JVBERi0K...."
7 res = await b.TestPdfInput(
8 pdf=Pdf.from_base64("application/pdf", b64)
9 )

Video

Calling functions that have video types. See video types

⚠️ Warning Video inputs require a model that supports video understanding (for example Gemini 2.x Flash/Pro). If your chosen model does not list video support your function call will return an error. When you supply a Video as a URL the URL is forwarded unchanged to the model; if the model cannot fetch remote content you must instead pass the bytes via Video.from_base64.

1from baml_py import Video
2from baml_client import b
3
4async def run():
5 # from URL
6 res = await b.TestVideoInput(
7 video=Video.from_url(
8 "https://example.com/sample.mp4"
9 )
10 )
11
12 # Base64
13 b64 = "AAAAGGZ0eXBpc29t...."
14 res = await b.TestVideoInput(
15 video=Video.from_base64("video/mp4", b64)
16 )