Streaming

BAML lets you stream structured JSON output from LLMs as it is generated.

If you tried streaming raw JSON output from an LLM yourself, you'd see something like:

{"items": [{"name": "Appl
{"items": [{"name": "Apple", "quantity": 2, "price": 1.
{"items": [{"name": "Apple", "quantity": 2, "price": 1.50}], "total_cost":
{"items": [{"name": "Apple", "quantity": 2, "price": 1.50}], "total_cost": 3.00} # Completed

During the stream, BAML automatically repairs this partial JSON and transforms all of your types into Partial types whose fields are all Optional; once the stream completes, you get back the original, fully validated type.
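Conceptually, the partials you receive mirror your output type, but every scalar field becomes Optional (and lists grow as elements finish parsing). Here is a rough Pydantic sketch of what that means for the JSON above (the class names here are illustrative; the real classes are generated for you in baml_client.partial_types):

from typing import List, Optional
from pydantic import BaseModel

# Illustrative approximation only: the actual partial types are generated
# by BAML in baml_client.partial_types and may differ in detail.
class PartialItem(BaseModel):
    name: Optional[str] = None          # "Appl" -> "Apple" as tokens arrive
    quantity: Optional[int] = None      # None until the number is complete
    price: Optional[float] = None

class PartialOutput(BaseModel):
    items: List[PartialItem] = []       # grows as items are parsed
    total_cost: Optional[float] = None  # None until the number is complete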

You can check out more examples (including streaming in FastAPI and NextJS) in the BAML Examples repo.
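For instance, here is a minimal FastAPI sketch that forwards partials as newline-delimited JSON (it assumes the async client and the ExtractReceiptInfo function defined below; the examples repo shows the full setup):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from baml_client.async_client import b

app = FastAPI()

@app.post("/extract-receipt")
async def extract_receipt(email: str):
    async def partial_json():
        stream = b.stream.ExtractReceiptInfo(email)
        async for partial in stream:
            # Partials are Pydantic models in the Python client, so they
            # serialize directly (model_dump_json assumes Pydantic v2).
            yield partial.model_dump_json() + "\n"

    return StreamingResponse(partial_json(), media_type="application/x-ndjson")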

Let's stream the output of the function ExtractReceiptInfo(email: string) -> ReceiptInfo for our example:

class ReceiptItem {
  name string
  description string?
  quantity int
  price float
}

class ReceiptInfo {
  items ReceiptItem[]
  total_cost float?
}

function ExtractReceiptInfo(email: string) -> ReceiptInfo {
  client GPT4o
  prompt #"
    Given the receipt below:

    {{ email }}

    {{ ctx.output_format }}
  "#
}

BAML will generate b.stream.ExtractReceiptInfo() for you, which you can use like so:

main.py
import asyncio
from baml_client import b, partial_types, types

# Using a stream:
def example1(receipt: str):
    stream = b.stream.ExtractReceiptInfo(receipt)

    # partial is a Partial type with all Optional fields
    for partial in stream:
        print(f"partial: parsed {len(partial.items)} items (object: {partial})")

    # final is the full, original, validated ReceiptInfo type
    final = stream.get_final_response()
    print(f"final: {len(final.items)} items (object: {final})")

# Using only get_final_response() of a stream
#
# In this case, you should just use b.ExtractReceiptInfo(receipt) instead,
# which is slightly faster and more efficient.
def example2(receipt: str):
    final = b.stream.ExtractReceiptInfo(receipt).get_final_response()
    print(f"final: {len(final.items)} items (object: {final})")

# Using the async client:
async def example3(receipt: str):
    # Note the import of the async client
    from baml_client.async_client import b
    stream = b.stream.ExtractReceiptInfo(receipt)
    async for partial in stream:
        print(f"partial: parsed {len(partial.items)} items (object: {partial})")

    final = await stream.get_final_response()
    print(f"final: {len(final.items)} items (object: {final})")

receipt = """
04/14/2024 1:05 pm

Ticket: 220000082489
Register: Shop Counter
Employee: Connor
Customer: Sam
Item # Price
Guide leash (1 Pair) uni UNI
1 $34.95
The Index Town Walls
1 $35.00
Boot Punch
3 $60.00
Subtotal $129.95
Tax ($129.95 @ 9%) $11.70
Total Tax $11.70
Total $141.65
"""

if __name__ == '__main__':
    # example1 and example2 use the sync client, so call them directly;
    # only the async example needs asyncio.run.
    example1(receipt)
    example2(receipt)
    asyncio.run(example3(receipt))

Number fields are only streamed in once the LLM has finished emitting them. For example, if the final number is 129.95, you'll only ever see null or 129.95, never partial values like 1, 12, or 129.9.
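A quick way to observe this with the receipt above (illustrative; the exact sequence of partials depends on the model):

from baml_client import b  # sync client, as in main.py above

stream = b.stream.ExtractReceiptInfo(receipt)  # `receipt` from the example above
for partial in stream:
    print(partial.total_cost)  # None until the number is complete, then the full value
final = stream.get_final_response()
print(final.total_cost)        # e.g. 141.65 for this receipt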
