Pdf

Pdf values passed to BAML functions can be created in the client libraries. This document explains how to use these functions, both at compile time and at runtime, to handle PDF data. For more details, refer to pdf types.

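Pdf instances can be created from URLs, Base64 data, or local files. URLs are automatically converted to Base64 for models that don't accept them directly (such as OpenAI). Native URL support varies by provider: Anthropic accepts https URLs directly, whereas AWS Bedrock, Gemini, and Vertex AI accept Base64 data or provider-specific storage URIs (s3://, uploaded file URIs, or gs://) but not ordinary https links (see Model Compatibility). Note that many websites block direct programmatic requests for PDFs.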

Some providers, such as Vertex AI, require the media type to be specified explicitly. Provide the mediaType parameter (application/pdf for PDFs) whenever possible for better compatibility.
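Because most providers expect inline Base64 with an explicit media type, a common pattern is to read a local PDF and encode it yourself before handing it to the client library. The following is a minimal sketch using only the Python standard library; `encode_pdf_bytes` and `read_pdf_as_base64` are hypothetical helper names, not part of baml_py.

```python
import base64
from pathlib import Path
from typing import Tuple

PDF_MEDIA_TYPE = "application/pdf"  # MIME type to pass explicitly for PDF input
PDF_HEADER = b"%PDF-"  # well-formed PDF files normally begin with this header


def encode_pdf_bytes(raw: bytes) -> Tuple[str, str]:
    """Hypothetical helper: return (base64_data, media_type) for raw PDF bytes."""
    if not raw.startswith(PDF_HEADER):
        raise ValueError("data does not start with a PDF header")
    return base64.b64encode(raw).decode("ascii"), PDF_MEDIA_TYPE


def read_pdf_as_base64(path: str) -> Tuple[str, str]:
    """Hypothetical helper: read a local PDF file and Base64-encode it."""
    return encode_pdf_bytes(Path(path).read_bytes())
```

The returned string is what you would hand to the Pdf Base64 constructor. Check whether your client's version of that constructor also takes the media type, since the reference lists `fromBase64(mediaType, base64)`.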

Usage Examples

```python
from baml_py import Pdf
from baml_client import b

async def test_pdf_input():
    # Create a Pdf object from URL
    pdf_url = Pdf.from_url("https://example.com/document.pdf")
    res1 = await b.TestPdfInput(pdf=pdf_url)

    # Create a Pdf object from Base64 data
    pdf_b64 = "JVBERi0K..."
    pdf = Pdf.from_base64(pdf_b64)
    res2 = await b.TestPdfInput(pdf=pdf)
```

Static Methods

fromUrl
(url: string, mediaType?: string) => Pdf

Creates a Pdf object from a URL. The mediaType parameter is optional but recommended for better model compatibility. If not provided, the media type will be inferred when the content is fetched.
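fromUrl defers media-type detection to fetch time, and URLs are converted to Base64 for providers that cannot fetch them. The sketch below illustrates both ideas in plain Python. It is not BAML's implementation. The names `url_to_base64` and `urllib_fetch`, and the fallback to application/pdf when no Content-Type header is present, are assumptions. The fetcher is injectable so the logic can run without network access.

```python
import base64
import urllib.request
from typing import Callable, Optional, Tuple

# A fetcher returns the response body and its raw Content-Type header (or None).
Fetcher = Callable[[str], Tuple[bytes, Optional[str]]]

FALLBACK_MEDIA_TYPE = "application/pdf"  # assumption: used when nothing else is known


def urllib_fetch(url: str) -> Tuple[bytes, Optional[str]]:
    """Download a URL with the standard library; many sites block requests like this."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read(), resp.headers.get("Content-Type")


def url_to_base64(
    url: str,
    media_type: Optional[str] = None,
    fetch: Fetcher = urllib_fetch,
) -> Tuple[str, str]:
    """Hypothetical sketch: fetch a URL and return (base64_data, media_type)."""
    body, content_type = fetch(url)
    if media_type is None:
        # No explicit media type: infer it from the response, dropping parameters
        # such as "; charset=...". Fall back to application/pdf if the header is absent.
        media_type = (
            content_type.split(";", 1)[0].strip() if content_type else FALLBACK_MEDIA_TYPE
        )
    return base64.b64encode(body).decode("ascii"), media_type
```

An explicitly supplied media type always wins over the response header, which matches the advice to pass mediaType whenever possible.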

fromBase64
(mediaType: string, base64: string) => Pdf

Creates a Pdf object from Base64-encoded data and the given MIME type (application/pdf for PDFs). The mediaType parameter is required.
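Since this constructor trusts whatever string it is given, it can help to check the data before building a Pdf. The sample string `"JVBERi0K..."` in the usage example begins with the Base64 encoding of `%PDF-\n`, which is how PDF files normally start. Below is a minimal standard-library sketch. `is_pdf_base64` is a hypothetical helper, not a baml_py function.

```python
import base64
import binascii

PDF_MAGIC = b"%PDF-"  # header bytes at the start of a well-formed PDF


def is_pdf_base64(data: str) -> bool:
    """Return True if data is valid Base64 whose decoded bytes start with the PDF header."""
    try:
        # validate=True rejects characters outside the Base64 alphabet instead of
        # silently skipping them; bad padding also raises binascii.Error.
        raw = base64.b64decode(data, validate=True)
    except binascii.Error:
        return False
    return raw.startswith(PDF_MAGIC)
```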

fromFile
(file: File) => Promise<Pdf>

Creates a Pdf object from a File object. Available only in browser environments (@boundaryml/baml/browser).

fromBlob
(blob: Blob, mediaType?: string) => Promise<Pdf>

Creates a Pdf object from a Blob object. Available only in browser environments (@boundaryml/baml/browser).

Instance Methods

isUrl
() => boolean

Check if the Pdf is stored as a URL.

asUrl
() => string

Get the URL if the Pdf is stored as a URL. Throws an Error if the Pdf is not stored as a URL.

asBase64
() => [string, string]

Get the base64 data and media type if the Pdf is stored as base64. Returns [base64Data, mediaType]. Throws an Error if the Pdf is not stored as base64.

toJSON
() => { url: string } | { base64: string; media_type: string }

Convert the Pdf to a JSON representation. Returns either a URL object or a base64 object with media type, depending on how the Pdf was created.
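The instance methods above describe a value stored in exactly one of two forms: a URL or Base64 data with a media type. The following hypothetical Python class models that contract so the behavior of each method is easy to see. It is not the baml_py `Pdf` class. The name `PdfSketch` and its snake_case methods are illustrative, and the sketch raises `ValueError` where the reference says "Throws an Error".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PdfSketch:
    """Hypothetical stand-in that mirrors the documented Pdf instance methods."""

    url: Optional[str] = None
    base64_data: Optional[str] = None
    media_type: Optional[str] = None

    def __post_init__(self) -> None:
        # A Pdf is stored either as a URL or as Base64 data, never both or neither.
        if (self.url is None) == (self.base64_data is None):
            raise ValueError("set exactly one of url or base64_data")

    def is_url(self) -> bool:
        return self.url is not None

    def as_url(self) -> str:
        if self.url is None:
            raise ValueError("Pdf is not stored as a URL")
        return self.url

    def as_base64(self) -> Tuple[str, str]:
        # Mirrors asBase64(): returns (base64Data, mediaType).
        if self.base64_data is None or self.media_type is None:
            raise ValueError("Pdf is not stored as base64")
        return self.base64_data, self.media_type

    def to_json(self) -> Dict[str, str]:
        # Mirrors toJSON(): {url} or {base64, media_type}.
        if self.is_url():
            return {"url": self.as_url()}
        data, media_type = self.as_base64()
        return {"base64": data, "media_type": media_type}
```

Because the two storage forms are mutually exclusive, checking `is_url()` first is enough to know which accessor is safe to call.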

Model Compatibility

Different providers support different PDF input methods (as of July 2025):

| Provider / API | PDF input support |
|---|---|
| Anthropic | Accepts PDFs as a direct https URL or a Base64 string in a document block. |
| AWS Bedrock | PDF must be supplied as raw bytes (Base64 in the request) or as an Amazon S3 URI (s3:// style). Ordinary https links are not supported. |
| Google Gemini | Provide as inline Base64, or upload first with media.upload and use the returned file_uri. The model does not fetch http/https URLs for you. |
| OpenAI | PDF support (added March 2025) via Base64 in the request. Supplying a plain URL is not accepted. |
| Google Vertex AI | Accepts either Base64 data or a Cloud Storage gs:// URI in a file_data part; you must set mime_type (for PDFs, application/pdf). Generic https URLs are not allowed. |

Of the providers above, only Anthropic accepts direct https URLs. Prefer Base64 data or your provider's own file-upload or cloud-storage mechanism (S3 for Bedrock, uploaded files for Gemini, Cloud Storage for Vertex AI). Always specify the correct MIME type (application/pdf) when required.
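The compatibility table boils down to one decision: send an https URL only when the provider is known to fetch it, otherwise send Base64. Below is a small sketch of that decision. The provider keys and the `choose_transport` helper are hypothetical labels for the rows of the table, not BAML configuration values, and the mapping reflects the table's July 2025 snapshot.

```python
from enum import Enum


class PdfTransport(Enum):
    HTTPS_URL = "https_url"
    BASE64 = "base64"


# Hypothetical summary of the table above (as of July 2025):
# only Anthropic accepts plain https URLs.
ACCEPTS_HTTPS_URL = {
    "anthropic": True,
    "aws-bedrock": False,
    "google-ai": False,
    "openai": False,
    "vertex-ai": False,
}


def choose_transport(provider: str, have_url: bool) -> PdfTransport:
    """Pick how to send a PDF to the given provider."""
    if have_url and ACCEPTS_HTTPS_URL.get(provider, False):
        return PdfTransport.HTTPS_URL
    # Unknown providers and inputs without a URL fall back to inline Base64.
    return PdfTransport.BASE64
```

Defaulting unknown providers to Base64 is the conservative choice, since every provider in the table accepts some form of inline Base64.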