Multi-Modal (Images / Audio)
Multi-modal input
You can use audio, image, pdf, or video input types in BAML prompts. Just create an input argument of that type and render it in the prompt.
Switch from “Prompt Review” to “Raw cURL” in the playground to see how BAML translates multi-modal input into the LLM request body.
See how to test images in the playground.
Calling Multimodal BAML Functions
Images
Calling a BAML function with an image input argument type (see image types).
The from_url and from_base64 methods create an Image object from a URL or from base64-encoded data, respectively.
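If the image is a local file rather than a URL, from_base64 needs a media type and a base64 string. The standard-library sketch below builds that pair. The helper name load_media_as_base64 is hypothetical. It guesses the media type from the file extension with mimetypes, which is an assumption and may not match what every provider expects.

```python
import base64
import mimetypes
from pathlib import Path


def load_media_as_base64(path: str) -> tuple[str, str]:
    """Return (media_type, base64_data) for a local file.

    Hypothetical helper: the pair is shaped for a from_base64-style
    constructor. The media type is guessed from the file extension only.
    """
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None:
        raise ValueError(f"cannot infer media type for {path!r}")
    data = Path(path).read_bytes()
    # b64encode returns bytes; base64 output is pure ASCII, so decode to str.
    return media_type, base64.b64encode(data).decode("ascii")
```

In the Python client, the result would then be passed roughly as Image.from_base64(media_type, data). Check the image types reference for the exact argument order before relying on it.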
Audio
Calling functions that have audio types. See audio types.
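Audio often arrives as a data URL, for example from a browser recorder, in the form data:audio/mp3;base64,.... The minimal sketch below assumes the payload is base64-encoded. It splits such a string into the media type and the raw base64 payload that a constructor such as Audio.from_base64 takes (see audio types for its exact signature). split_data_url is a hypothetical helper and is not part of BAML.

```python
def split_data_url(data_url: str) -> tuple[str, str]:
    """Split 'data:<media-type>;base64,<payload>' into (media_type, payload).

    Hypothetical helper; only the base64 form of RFC 2397 data URLs is handled.
    """
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, payload = data_url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URL has no ',' separator")
    params = header.split(";")
    if params[-1] != "base64":
        raise ValueError("only base64-encoded data URLs are supported")
    # RFC 2397 falls back to text/plain when the media type is omitted.
    media_type = params[0] or "text/plain"
    return media_type, payload
```

For example, split_data_url("data:audio/mp3;base64,SUQz") returns ("audio/mp3", "SUQz").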
Pdf
Calling functions that have pdf types. See pdf types.
⚠️ Warning: Pdf inputs must be provided as base64 data (e.g. Pdf.from_base64). URL-based Pdf inputs are not currently supported. Additionally, Pdf inputs are only supported by models that explicitly allow document (Pdf) modalities, such as Gemini 2.x Flash/Pro or VertexAI Gemini. Make sure the client you select advertises Pdf support; otherwise, your request will fail.
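Because Pdf inputs must be base64, a small guard that checks the bytes really are a PDF before encoding can catch mistakes early. This sketch makes one assumption: it treats a %PDF- header at the very start of the data as the signature, so files with leading junk bytes (which some PDF readers tolerate) are rejected. pdf_to_base64 is a hypothetical name.

```python
import base64

PDF_MAGIC = b"%PDF-"


def pdf_to_base64(data: bytes) -> tuple[str, str]:
    """Return ("application/pdf", base64_payload) for raw PDF bytes.

    Hypothetical helper; rejects input that does not begin with a %PDF- header.
    """
    if not data.startswith(PDF_MAGIC):
        raise ValueError("input does not start with a %PDF- header")
    return "application/pdf", base64.b64encode(data).decode("ascii")
```

The returned pair is what you would hand to Pdf.from_base64 in the Python client.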
Video
Calling functions that have video types. See video types.
⚠️ Warning: Video inputs require a model that supports video understanding (for example, Gemini 2.x Flash/Pro). If your chosen model does not list video support, your function call will return an error. When you supply a Video as a URL, the URL is forwarded unchanged to the model; if the model cannot fetch remote content, you must instead pass the bytes via Video.from_base64.
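Falling back to Video.from_base64 has a cost. Base64 encodes every 3 input bytes as 4 output characters, so the request body grows by roughly a third, which matters for large video files. The arithmetic below is checked against the standard-library encoder.

```python
import base64


def base64_encoded_size(num_bytes: int) -> int:
    """Length of the padded base64 encoding of num_bytes bytes."""
    # Every 3-byte group (the last one padded if short) becomes 4 characters.
    return 4 * ((num_bytes + 2) // 3)


if __name__ == "__main__":
    for n in (0, 1, 2, 3, 10, 1_000_000):
        assert base64_encoded_size(n) == len(base64.b64encode(bytes(n)))
    # A 30,000,000-byte video becomes a 40,000,000-character base64 string.
    print(base64_encoded_size(30_000_000))  # 40000000
```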
Controlling URL Resolution
By default, BAML automatically handles URL-to-base64 conversion based on what each provider supports. However, you can customize this behavior with the media_url_handler configuration.
Example: Optimizing for Performance
If you’re using Anthropic and want to avoid the latency of BAML fetching each URL before sending the request, use send_url so that the provider fetches the media itself.
Example: Working with Google Cloud Storage
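When using Google AI with images stored in GCS, use send_base64_unless_google_url so that Google Cloud Storage URLs are passed through unchanged while other URLs are embedded as base64.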
Example: Ensuring Compatibility
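For maximum compatibility across providers, use send_base64 so that all media is embedded in the request and no provider has to fetch external URLs.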
Random Thoughts
send_url - Allows providers to fetch URLs themselves, reducing payload size.
send_base64 - Embeds the content in the request, avoiding external dependencies.
send_url_add_mime_type - Required for proper media handling by some providers. If the MIME type is not provided, the media is downloaded to determine it.
send_base64_unless_google_url - Preserves Google Cloud Storage URLs for Google providers.
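To make the four modes concrete, here is a toy model of what each one does with a URL input. It is a sketch of the behavior described in the list, not BAML's implementation. MediaUrlMode and resolve_media_url are hypothetical names, and treating only gs:// URLs as Google Cloud Storage URLs is an assumption about how send_base64_unless_google_url matches.

```python
from enum import Enum


class MediaUrlMode(Enum):
    SEND_URL = "send_url"
    SEND_BASE64 = "send_base64"
    SEND_URL_ADD_MIME_TYPE = "send_url_add_mime_type"
    SEND_BASE64_UNLESS_GOOGLE_URL = "send_base64_unless_google_url"


def resolve_media_url(mode: MediaUrlMode, url: str, is_google_provider: bool) -> str:
    """Return "url", "url+mime", or "base64" for a URL input under `mode`.

    Toy model only; the gs:// check for Google URLs is an assumption.
    """
    if mode is MediaUrlMode.SEND_URL:
        return "url"  # provider fetches the URL itself
    if mode is MediaUrlMode.SEND_URL_ADD_MIME_TYPE:
        return "url+mime"  # URL plus a MIME type (downloaded if unknown)
    if mode is MediaUrlMode.SEND_BASE64:
        return "base64"  # content embedded in the request
    # SEND_BASE64_UNLESS_GOOGLE_URL: keep GCS URLs only for Google providers.
    if is_google_provider and url.startswith("gs://"):
        return "url"
    return "base64"
```

Because the enum values match the option strings, MediaUrlMode("send_base64_unless_google_url") recovers a mode from its configured name. With that mode, a gs:// URL sent to a Google provider stays a URL, while an https:// URL is embedded as base64.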
See the provider documentation for provider-specific defaults and requirements.