Streaming

BAML lets you stream structured JSON output from an LLM as it is generated.

If you stream raw JSON output from an LLM yourself, you receive a sequence of incomplete fragments like this:

{"items": [{"name": "Appl
{"items": [{"name": "Apple", "quantity": 2, "price": 1.
{"items": [{"name": "Apple", "quantity": 2, "price": 1.50}], "total_cost":
{"items": [{"name": "Apple", "quantity": 2, "price": 1.50}], "total_cost": 3.00} # Completed

BAML automatically repairs this partial JSON. While the stream is in progress, it also converts your types into Partial types in which every field is Optional.
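To make the repair step concrete, here is a minimal Python sketch of one way a truncated JSON fragment could be closed and parsed. It is not BAML's actual parser: the function name parse_partial_json and its rules (close an open string, fill a missing or unfinished value with null, then close any open brackets) are assumptions for illustration only. It also ignores some cases, such as a string cut off in the middle of a \u escape.

```python
import json
import re
from typing import Any, List

_CLOSERS = {"{": "}", "[": "]"}
# A bare token (number, true, false, null) sitting at the very end of the buffer.
_BARE_TOKEN = re.compile(r'[^\s\[\]{},:"]+$')


def parse_partial_json(text: str) -> Any:
    """Best-effort parse of a truncated JSON document (illustrative sketch)."""
    stack: List[str] = []            # currently open containers
    in_string = escaped = False
    expecting_key = string_is_key = dangling_key = False

    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
                dangling_key = string_is_key   # a finished key still needs ": value"
            continue
        if ch == '"':
            in_string, string_is_key = True, expecting_key
        elif ch in "{[":
            stack.append(ch)
            expecting_key = ch == "{"
        elif ch in "}]":
            if stack:
                stack.pop()
            expecting_key = False
        elif ch == ":":
            expecting_key = dangling_key = False
        elif ch == ",":
            expecting_key = bool(stack) and stack[-1] == "{"

    if in_string:
        if escaped:                   # drop a lone trailing backslash
            text = text[:-1]
        text += '"'
        if string_is_key:             # cut off inside a key: give it a null value
            text += ": null"
    elif dangling_key:
        text += ": null"
    elif stack:
        # A bare token at the end of the buffer may still grow, so discard it.
        text = _BARE_TOKEN.sub("", text.rstrip()).rstrip()
        if text.endswith(":"):
            text += " null"
        elif text.endswith(","):
            text = text[:-1]

    text += "".join(_CLOSERS[c] for c in reversed(stack))
    return json.loads(text)


if __name__ == "__main__":
    print(parse_partial_json('{"items": [{"name": "Appl'))
```

Applied to the second fragment above, this sketch returns the item with price set to None, because the trailing 1. might still grow into a longer number.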

You can check out more examples (including streaming in FastAPI and NextJS) in the BAML Examples repo.

For our example, let's stream the output of the function ExtractReceiptInfo(email: string) -> ReceiptInfo.
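The ReceiptInfo definition is not shown on this page. As a rough mental model only, the final type and its Partial counterpart might look like the Python dataclasses below. The field names are inferred from the JSON fragments earlier on this page, and the class names PartialReceiptItem and PartialReceiptInfo are hypothetical, not the names BAML generates.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical final types: every field is required and validated.
@dataclass
class ReceiptItem:
    name: str
    quantity: int
    price: float


@dataclass
class ReceiptInfo:
    items: List[ReceiptItem]
    total_cost: float


# Hypothetical partial types: the same shape, but any field may still be missing.
@dataclass
class PartialReceiptItem:
    name: Optional[str] = None
    quantity: Optional[int] = None
    price: Optional[float] = None


@dataclass
class PartialReceiptInfo:
    items: List[PartialReceiptItem] = field(default_factory=list)
    total_cost: Optional[float] = None


# A snapshot taken mid-stream: the price and the total have not arrived yet.
mid_stream = PartialReceiptInfo(items=[PartialReceiptItem(name="Apple", quantity=2)])
```

Code that consumes partial values therefore has to tolerate None in any field until the final response is available.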

BAML will generate b.stream.ExtractReceiptInfo() for you, which you can use like so:

main.py
import asyncio
from baml_client import b, partial_types, types

# Using a stream:
def example1(receipt: str):
    stream = b.stream.ExtractReceiptInfo(receipt)

    # partial is a Partial type with all Optional fields
    for partial in stream:
        print(f"partial: parsed {len(partial.items)} items (object: {partial})")

    # final is the full, original, validated ReceiptInfo type
    final = stream.get_final_response()
    print(f"final: {len(final.items)} items (object: {final})")

# Using only get_final_response() of a stream
#
# In this case, you should just use b.ExtractReceiptInfo(receipt) instead,
# which is slightly faster and more efficient.
def example2(receipt: str):
    final = b.stream.ExtractReceiptInfo(receipt).get_final_response()
    print(f"final: {len(final.items)} items (object: {final})")

# Using the async client:
async def example3(receipt: str):
    # Note the import of the async client
    from baml_client.async_client import b
    stream = b.stream.ExtractReceiptInfo(receipt)
    async for partial in stream:
        print(f"partial: parsed {len(partial.items)} items (object: {partial})")

    final = await stream.get_final_response()
    print(f"final: {len(final.items)} items (object: {final})")

receipt = """
04/14/2024 1:05 pm

Ticket: 220000082489
Register: Shop Counter
Employee: Connor
Customer: Sam
Item # Price
Guide leash (1 Pair) uni UNI
1 $34.95
The Index Town Walls
1 $35.00
Boot Punch
3 $60.00
Subtotal $129.95
Tax ($129.95 @ 9%) $11.70
Total Tax $11.70
Total $141.65
"""

if __name__ == '__main__':
    # example1 and example2 are synchronous, so call them directly;
    # only the async example3 needs an event loop.
    example1(receipt)
    example2(receipt)
    asyncio.run(example3(receipt))

Number fields are streamed only once the LLM has finished emitting them. For example, if the final number is 129.95, you will see only null or 129.95, never partial values such as 1, 12, or 129.9.
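The rule above can be pictured with a small Python sketch. It assumes, purely for illustration, that a number counts as complete once a delimiter (comma, closing bracket or brace, or whitespace) follows it. The helper name number_so_far is hypothetical, and this is not how BAML is implemented internally.

```python
import re
from typing import List, Optional

# A JSON number that is immediately followed by a delimiter.
_COMPLETE_NUMBER = re.compile(
    r"\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)(?=[\s,}\]])"
)


def number_so_far(raw_value: str) -> Optional[float]:
    """Return the number only when the buffer shows that it has ended."""
    match = _COMPLETE_NUMBER.match(raw_value)
    return float(match.group(1)) if match else None


# Simulate the value 129.95 arriving in several chunks, followed by "}".
chunks = ["1", "2", "9.9", "5", "}"]
buffer = ""
seen: List[Optional[float]] = []
for chunk in chunks:
    buffer += chunk
    seen.append(number_so_far(buffer))

print(seen)  # [None, None, None, None, 129.95]
```

Until the closing brace arrives, the buffer could still grow into a different number, so the sketch reports None rather than a prefix such as 12.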