Comparing Langchain

Langchain is one of the most popular frameworks for building LLM applications. It provides abstractions for chains, agents, memory, and more.

Let’s dive into how Langchain handles structured extraction and where it falls short.

Why working with LLMs requires more than just Langchain

Langchain makes structured extraction look simple at first:

from typing import List

from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class Resume(BaseModel):
    name: str
    skills: List[str]

llm = ChatOpenAI(model="gpt-4o")
structured_llm = llm.with_structured_output(Resume)
result = structured_llm.invoke("John Doe, Python, Rust")

That’s pretty neat! But now let’s add an Education model to make it more realistic:

+class Education(BaseModel):
+    school: str
+    degree: str
+    year: int

 class Resume(BaseModel):
     name: str
     skills: List[str]
+    education: List[Education]

 structured_llm = llm.with_structured_output(Resume)
 result = structured_llm.invoke("""John Doe
 Python, Rust
 University of California, Berkeley, B.S. in Computer Science, 2020""")

Still works… but what’s actually happening under the hood? What prompt is being sent? How many tokens are we using?

Let’s dig deeper. Say you want to see what’s actually being sent to the model:

# How do you debug this?
structured_llm = llm.with_structured_output(Resume)

# You need to enable verbose mode or dig into callbacks
from langchain.globals import set_debug
set_debug(True)

# Now you get TONS of debug output...

But even with debug mode, you still can’t easily:

  • Modify the extraction prompt
  • See the exact token count
  • Understand why extraction failed for certain inputs
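In practice, the "dig into callbacks" route means writing a custom callback handler. Here's a minimal sketch, assuming you just want to print the chat messages right before they go to the provider (the PromptSpy class is ours, not part of Langchain). Note that the schema with_structured_output attaches travels as a separate tool definition, so it still doesn't show up here:

from langchain_core.callbacks import BaseCallbackHandler

class PromptSpy(BaseCallbackHandler):
    def on_chat_model_start(self, serialized, messages, **kwargs):
        # Print every chat message Langchain is about to send to the provider
        for batch in messages:
            for msg in batch:
                print(f"[{type(msg).__name__}] {msg.content}")

result = structured_llm.invoke(
    "John Doe, Python, Rust",
    config={"callbacks": [PromptSpy()]},
)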

When things go wrong

Here’s where it gets tricky. Your PM asks: “Can we classify these resumes by seniority level?”

from enum import Enum

class SeniorityLevel(str, Enum):
    JUNIOR = "junior"
    MID = "mid"
    SENIOR = "senior"
    STAFF = "staff"

class Resume(BaseModel):
    name: str
    skills: List[str]
    education: List[Education]
    seniority: SeniorityLevel

But now you realize you need to give the LLM context about what each level means:

# Wait... how do I tell the LLM that "junior" means 0-2 years experience?
# How do I customize the prompt?

# You end up doing this:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

CLASSIFICATION_PROMPT = """
Given the resume below, classify the seniority level:
- junior: 0-2 years experience
- mid: 2-5 years experience
- senior: 5-10 years experience
- staff: 10+ years experience

Resume: {resume_text}
"""

# Now you need separate chains...
classification_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(CLASSIFICATION_PROMPT))
extraction_chain = llm.with_structured_output(Resume)

# And combine them somehow...
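
One way to combine them is to glue the two calls together by hand. A rough sketch (the process_resume helper is ours, and it assumes the classification chain replies with exactly one of the enum values, which it often won't):

def process_resume(resume_text: str) -> Resume:
    # Run the classification chain; LLMChain returns a dict with a "text" key
    label = classification_chain.invoke({"resume_text": resume_text})["text"]

    # Run the structured extraction separately
    resume = extraction_chain.invoke(resume_text)

    # Overwrite whatever seniority the extractor guessed, and hope the
    # classifier answered with exactly "junior", "mid", "senior", or "staff"
    resume.seniority = SeniorityLevel(label.strip().lower())
    return resume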

Your clean code is starting to look messy. But wait, there’s more!

Multi-model madness

Your company wants to use Claude for some tasks (better reasoning) and GPT-4o-mini for others (cost savings). With Langchain:

from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# Different providers, different imports
claude = ChatAnthropic(model="claude-3-opus-20240229")
gpt4 = ChatOpenAI(model="gpt-4o")
gpt4_mini = ChatOpenAI(model="gpt-4o-mini")

# But wait... does Claude support structured outputs the same way?
claude_structured = claude.with_structured_output(Resume)  # May not work!

# You need provider-specific handling
if provider == "anthropic":
    # Use function calling? XML? JSON mode?
    # Different providers have different capabilities
    pass

Testing nightmare

Now you want to test your extraction logic without burning through API credits:

# How do you test this?
structured_llm = llm.with_structured_output(Resume)

# Mock the entire LLM?
from unittest.mock import Mock
mock_llm = Mock()
mock_llm.with_structured_output.return_value.invoke.return_value = Resume(...)

# But you're not really testing your extraction logic...
# Just that your mocks work

With BAML, testing is visual and instant:

[Screenshot: VSCode test case buttons. Test your prompts instantly without API calls or mocking.]

The token mystery

Your CFO asks: “Why is our OpenAI bill so high?” You investigate:

# How many tokens does this use?
structured_llm = llm.with_structured_output(Resume)
result = structured_llm.invoke(long_resume_text)

# You need callbacks or token counting utilities
from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:
    result = structured_llm.invoke(long_resume_text)
    print(f"Tokens: {cb.total_tokens}")  # Finally!

But you still don’t know WHY it’s using so many tokens. Is it the schema format? The prompt template? The retry logic?
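
You can at least get a feel for the schema overhead by dumping the JSON schema that with_structured_output derives from your Pydantic model. A rough back-of-the-envelope sketch, assuming Pydantic v2 (the 4-characters-per-token figure is only a heuristic):

import json

# The nested JSON schema that gets attached as a tool/function definition
schema_json = json.dumps(Resume.model_json_schema(), indent=2)
print(schema_json)

# Very rough estimate: ~4 characters per token
print(f"~{len(schema_json) // 4} tokens just for the schema")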

Enter BAML

BAML was built specifically for these LLM challenges. Here’s the same resume extraction:

class Education {
  school string
  degree string
  year int
}

class Resume {
  name string
  skills string[]
  education Education[]
  seniority SeniorityLevel
}

enum SeniorityLevel {
  JUNIOR @description("0-2 years of experience")
  MID @description("2-5 years of experience")
  SENIOR @description("5-10 years of experience")
  STAFF @description("10+ years of experience, technical leadership")
}

function ExtractResume(resume_text: string) -> Resume {
  client GPT4
  prompt #"
    Extract information from this resume.

    For seniority level, consider:
    {{ ctx.output_format.seniority }}

    Resume:
    ---
    {{ resume_text }}
    ---

    {{ ctx.output_format }}
  "#
}

Now look what you get:

  1. See exactly what’s sent to the LLM - The prompt is right there!
  2. Test without API calls - Use the VSCode playground
  3. Switch models instantly - Just change client GPT4 to client Claude
  4. Token count visibility - BAML shows exact token usage
  5. Modify prompts easily - It’s just a template string

Multi-model support done right

// Define all your clients in one place
client<llm> GPT4 {
  provider openai
  options {
    model "gpt-4o"
    temperature 0.1
  }
}

client<llm> GPT4Mini {
  provider openai
  options {
    model "gpt-4o-mini"
    temperature 0.1
  }
}

client<llm> Claude {
  provider anthropic
  options {
    model "claude-3-opus-20240229"
    max_tokens 4096
  }
}

// Same function works with ANY model
function ExtractResume(resume_text: string) -> Resume {
  client GPT4  // Just change this line
  prompt #"..."#
}

Use it in Python:

from baml_client import baml as b

# Use default model
resume = await b.ExtractResume(resume_text)

# Override at runtime based on your needs
resume_complex = await b.ExtractResume(complex_text, {"client": "Claude"})
resume_simple = await b.ExtractResume(simple_text, {"client": "GPT4Mini"})

The bottom line

Langchain is great for building complex LLM applications with chains, agents, and memory. But for structured extraction, you’re fighting against abstractions that hide important details.

BAML gives you what Langchain can’t:

  • Full prompt transparency - See and control exactly what’s sent to the LLM
  • Native testing - Test in VSCode without API calls or burning tokens
  • Multi-model by design - Switch providers with one line, works with any model
  • Token visibility - Know exactly what you’re paying for and optimize costs
  • Type safety - Generated clients with autocomplete that always match your schema
  • Schema-Aligned Parsing - Get structured outputs from any model, even without function calling
  • Streaming + Structure - Stream structured data with loading bars and type-safe parsing

Why this matters for production:

  • Faster iteration - See changes instantly without running Python code
  • Better debugging - Know exactly why extraction failed
  • Cost optimization - Understand and reduce token usage
  • Model flexibility - Never get locked into one provider
  • Team collaboration - Prompts are code, not hidden strings

We built BAML because we were tired of wrestling with framework abstractions when all we wanted was reliable structured extraction with full developer control.

Limitations of BAML

BAML does have some limitations we are continuously working on:

  1. It is a new language. However, it is fully open source and getting started takes less than 10 minutes
  2. Developing requires VSCode. You could use vim but we don’t recommend it
  3. It’s focused on structured extraction - not a full LLM framework like Langchain

If you need complex chains and agents, use Langchain. If you want the best structured extraction experience with full control, try BAML.