Why BAML?

Let’s say you want to extract structured data from resumes. It starts simple enough…

But first, let’s see where we’re going with this story:

BAML: What it is and how it helps - see the full developer experience

It starts simple

You begin with a basic LLM call to extract a name and skills:

import openai

def extract_resume(text):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract name and skills from: {text}"}]
    )
    return response.choices[0].message.content

This works… sometimes. But you need structured data, not free text.

You need structure

So you try JSON mode and add Pydantic for validation:

from pydantic import BaseModel
import json

class Resume(BaseModel):
    name: str
    skills: list[str]

def extract_resume(text):
    prompt = f"""Extract resume data as JSON:
{text}

Return JSON with fields: name (string), skills (array of strings)"""

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )

    data = json.loads(response.choices[0].message.content)
    return Resume(**data)

Better! But now you need more fields. You add education, experience, and location:

class Education(BaseModel):
    school: str
    degree: str
    year: int

class Resume(BaseModel):
    name: str
    skills: list[str]
    education: list[Education]
    location: str
    years_experience: int

The prompt gets longer and more complex. But wait - how do you test this without burning tokens?

Testing becomes expensive

Every test costs money and takes time:

# This burns tokens every time you run tests!
def test_resume_extraction():
    test_resume = "John Doe, Python expert, MIT 2020..."
    result = extract_resume(test_resume)  # API call = $$$
    assert result.name == "John Doe"

You try mocking, but then you’re not testing your actual extraction logic. Your prompt could be completely broken and tests would still pass.
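For instance, a mocked test might look something like this sketch (the exact patch target depends on your OpenAI SDK version and how you call it; here it matches the module-level openai.chat.completions.create call used above). The canned response never looks at your prompt, so even a broken prompt template sails through:

from unittest.mock import MagicMock, patch

def test_resume_extraction_with_mock():
    # Canned API response: the mock ignores the prompt entirely,
    # so a completely broken prompt template still "passes".
    fake_response = MagicMock()
    fake_response.choices[0].message.content = '{"name": "John Doe", "skills": ["Python"]}'

    with patch("openai.chat.completions.create", return_value=fake_response):
        result = extract_resume("John Doe, Python expert, MIT 2020...")

    assert result.name == "John Doe"  # passes regardless of prompt quality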

Error handling nightmare

Real resumes break your extraction. The LLM returns malformed JSON:

Resume extraction error in traditional approach
{
  "name": "John Doe",
  "skills": ["Python", "JavaScript"
  // Missing closing bracket!

You add retry logic, JSON fixing, error handling:

import re
import time
from pydantic import ValidationError

def extract_resume(text, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = openai.chat.completions.create(...)
            content = response.choices[0].message.content

            # Try to fix common JSON issues
            content = fix_json(content)

            data = json.loads(content)
            return Resume(**data)
        except (json.JSONDecodeError, ValidationError) as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # Exponential backoff

def fix_json(content):
    # Remove text before/after JSON
    json_match = re.search(r'\{.*\}', content, re.DOTALL)
    if json_match:
        content = json_match.group(0)

    # Fix common issues
    content = content.replace(',}', '}')
    content = content.replace(',]', ']')
    # ... more fixes

    return content

Your simple extraction function is now 50+ lines of infrastructure code.

Multi-model chaos

Your company wants to use Claude for some tasks (better reasoning) and GPT-4o mini for others (cost savings):

def extract_resume(text, provider="openai", model="gpt-4o"):
    if provider == "openai":
        import openai
        client = openai.OpenAI()
        response = client.chat.completions.create(model=model, ...)
    elif provider == "anthropic":
        import anthropic
        client = anthropic.Anthropic()
        # Different API! Need to rewrite everything
        response = client.messages.create(model=model, ...)
        # ... handle different response formats

Each provider has different APIs, different response formats, different capabilities. Your code becomes a mess of if/else statements.
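To make that concrete, here is a short sketch of how differently the two SDKs return the same answer (response shapes follow each provider's documented Python SDK; the prompt variable is just a placeholder):

import openai
import anthropic

prompt = "Extract name and skills from: ..."

# OpenAI: the text lives at choices[0].message.content
oai_client = openai.OpenAI()
oai_response = oai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
oai_text = oai_response.choices[0].message.content

# Anthropic: the text lives at content[0].text, and max_tokens is required
ant_client = anthropic.Anthropic()
ant_response = ant_client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
ant_text = ant_response.content[0].text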

The prompt mystery

Your extraction fails on certain resumes. You need to debug, but what was actually sent to the LLM?

# What prompt was generated? How many tokens did it use?
# Why did this specific resume fail?
# How do I optimize for cost?

# You can't easily see:
# - The exact prompt that was sent
# - How the schema was formatted
# - Token usage breakdown
# - Why specific fields were missed

You start adding logging, token counting, prompt inspection tools…
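Something like this sketch, for example (the usage fields follow the OpenAI response object; the logging scheme itself is just illustrative):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("resume_extraction")

def extract_resume_logged(text):
    prompt = f"Extract name and skills from: {text}"
    logger.info("Prompt sent to LLM:\n%s", prompt)

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )

    # Token usage comes back on the response object
    usage = response.usage
    logger.info(
        "Tokens - prompt: %d, completion: %d, total: %d",
        usage.prompt_tokens, usage.completion_tokens, usage.total_tokens,
    )
    return response.choices[0].message.content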

Classification gets complex

Now you need to classify seniority levels:

from enum import Enum

class SeniorityLevel(str, Enum):
    JUNIOR = "junior"
    MID = "mid"
    SENIOR = "senior"
    STAFF = "staff"

class Resume(BaseModel):
    name: str
    skills: list[str]
    education: list[Education]
    seniority: SeniorityLevel

But the LLM doesn’t know what these levels mean! You update the prompt:

prompt = f"""Extract resume data as JSON:

Seniority levels:
- junior: 0-2 years experience
- mid: 2-5 years experience
- senior: 5-10 years experience
- staff: 10+ years experience

{text}

Return JSON with fields: name, skills, education, seniority..."""

Your prompt is getting huge and your business logic is scattered between code and strings.

Production deployment headaches

In production, you need:

  • Retry policies for rate limits
  • Fallback models when primary is down
  • Cost tracking and optimization
  • Error monitoring and alerting
  • A/B testing different prompts

Your simple extraction function becomes a complex service:

from openai import RateLimitError

class ResumeExtractor:
    def __init__(self):
        self.primary_client = openai.OpenAI()
        self.fallback_client = anthropic.Anthropic()
        self.token_tracker = TokenTracker()    # your in-house helpers
        self.error_monitor = ErrorMonitor()

    async def extract_with_fallback(self, text):
        try:
            return await self._extract_openai(text)
        except RateLimitError:
            return await self._extract_anthropic(text)
        except Exception as e:
            self.error_monitor.log(e)
            raise

    async def _extract_openai(self, text):
        # 50+ lines of OpenAI-specific logic
        pass

    async def _extract_anthropic(self, text):
        # 50+ lines of Anthropic-specific logic
        pass

Enter BAML

What if you could go back to something simple, but keep all the power?

class Education {
  school string
  degree string
  year int
}

enum SeniorityLevel {
  JUNIOR @description("0-2 years of experience")
  MID @description("2-5 years of experience")
  SENIOR @description("5-10 years of experience")
  STAFF @description("10+ years of experience, technical leadership")
}

class Resume {
  name string
  skills string[]
  education Education[]
  seniority SeniorityLevel
}

function ExtractResume(resume_text: string) -> Resume {
  client GPT4
  prompt #"
    Extract information from this resume.

    For seniority level, consider:
    {{ ctx.output_format.seniority }}

    Resume:
    ---
    {{ resume_text }}
    ---

    {{ ctx.output_format }}
  "#
}

Look what you get immediately:

BAML playground showing successful resume extraction with clear prompts and structured output

1. Instant Testing

Test in VSCode playground without API calls or token costs:

VSCode playground showing resume extraction with prompt preview
  • See the exact prompt that will be sent to the LLM
  • Test with real data instantly - no API calls needed
  • Save test cases for regression testing
  • Visual prompt preview shows token usage and formatting
VSCode test cases interface

Build up a library of test cases that run instantly

2. Multi-Model Made Simple

client<llm> GPT4 {
  provider openai
  options { model "gpt-4o" }
}

client<llm> Claude {
  provider anthropic
  options { model "claude-3-opus-20240229" }
}

client<llm> GPT4Mini {
  provider openai
  options { model "gpt-4o-mini" }
}

// Same function, any model - just change the client
function ExtractResume(resume_text: string) -> Resume {
  client GPT4 // Switch to Claude or GPT4Mini with one line
  prompt #"..."#
}

3. Schema-Aligned Parsing (SAP)

BAML’s breakthrough innovation follows Postel’s Law: “Be conservative in what you do, be liberal in what you accept from others.”

Instead of rejecting imperfect outputs, SAP actively transforms them to match your schema using custom edit distance algorithms.
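As a rough illustration (not BAML's actual parser), here is the kind of "almost JSON" that strict json.loads rejects outright but that schema-aware parsing can still recover into a Resume:

import json

# Typical LLM output: a markdown fence, a trailing comma, and a stray comment.
llm_output = """```json
{
  "name": "John Doe",
  "skills": ["Python", "JavaScript",],  // pulled from work history
}
```"""

try:
    json.loads(llm_output)
except json.JSONDecodeError as e:
    print("strict parsing fails:", e)

# A schema-aligned parser instead asks: what is the closest value that fits
# Resume { name string, skills string[] }? Here that is
# {"name": "John Doe", "skills": ["Python", "JavaScript"]} - no retry needed.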

SAP vs Other Approaches:

Model            Function Calling    Python AST Parser    SAP
gpt-3.5-turbo    87.5%               75.8%                92%
gpt-4o           87.4%               82.1%                93%
claude-3-haiku   57.3%               82.6%                91.7%

Key insight: SAP with gpt-3.5-turbo beats GPT-4o with structured outputs, saving you money while improving accuracy.

4. Production Features Built-In

retry_policy Exponential {
  max_retries 3
  strategy {
    type exponential_backoff
  }
}

client<llm> RobustGPT4 {
  provider openai
  retry_policy Exponential
  options { model "gpt-4o" }
}

client<llm> SmartFallback {
  provider fallback
  options {
    strategy [
      GPT4
      Claude
      GPT4Mini
    ]
  }
}

5. Token Optimization

  • See exact token usage for every call
  • BAML’s schema format uses 80% fewer tokens than JSON Schema (see the rough comparison after this list)
  • Optimize prompts with instant feedback
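To get an intuition for why a compact schema saves tokens, compare a JSON Schema for a small Resume type with a type-hint-style rendering (the compact form below is illustrative only, not BAML's exact output format):

import json

json_schema = json.dumps({
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "years_experience": {"type": "integer"},
    },
    "required": ["name", "skills", "years_experience"],
})

compact_schema = """{
  name: string,
  skills: string[],
  years_experience: int
}"""

# Character count as a crude proxy for tokens; use a tokenizer for real numbers.
print(f"JSON Schema: {len(json_schema)} characters")
print(f"Compact schema: {len(compact_schema)} characters")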

6. Type Safety Everywhere

Generated BAML client with type safety
from baml_client import baml as b

# Fully typed, works in Python, TypeScript, Java, Go
resume = await b.ExtractResume(resume_text)
print(resume.seniority)  # Type: SeniorityLevel

BAML generates fully typed clients for all languages automatically

See how changes instantly update the prompt:

BAML prompt view updating in real-time as types change

Change your types → Prompt automatically updates → See the difference immediately

7. Advanced Streaming with UI Integration

BAML’s semantic streaming lets you build real UIs with loading bars and type-safe implementations:

class BlogPost {
  title string @stream.done @stream.not_null
  content string @stream.with_state
}

What this enables:

  • Loading bars - Show progress as structured data streams in
  • Semantic guarantees - Title only appears when complete, content streams token by token
  • Type-safe streaming - Full TypeScript/Python types for partial data
  • UI state management - Know exactly what’s loading vs complete

See semantic streaming in action - structured data streaming with loading states
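For a sense of what the generated client gives you, here is a hedged Python sketch of consuming a streamed call. The b.stream interface, import path, and partial-type shape are assumptions based on BAML's generated async client; check your generated code for the exact names:

from baml_client.async_client import b  # assumed import path for the async client

async def show_streaming(resume_text: str):
    stream = b.stream.ExtractResume(resume_text)

    # Each partial is a typed, possibly-incomplete Resume; fields marked
    # @stream.done only appear once they are final.
    async for partial in stream:
        print("so far:", partial)

    final = await stream.get_final_response()
    print("done:", final.seniority)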

The Bottom Line

You started with: a simple LLM call
You ended up with: hundreds of lines of infrastructure code

With BAML, you get:

  • The simplicity of your first attempt
  • All the production features you built manually
  • Better reliability than you could build yourself
  • 10x faster development iteration
  • Full control and transparency

BAML is what LLM development should have been from the start. Ready to see the difference? Get started with BAML.