Streaming

BAML lets you stream in structured JSON output from LLMs as it comes in.

If you streamed raw JSON output from an LLM, you’d see a series of incomplete prefixes like:

{"items": [{"name": "Appl
{"items": [{"name": "Apple", "quantity": 2, "price": 1.
{"items": [{"name": "Apple", "quantity": 2, "price": 1.50}], "total_cost":
{"items": [{"name": "Apple", "quantity": 2, "price": 1.50}], "total_cost": 3.00} # Completed

BAML gives you fine-grained control of how it fixes this partial JSON and transforms it into a series of semantically valid partial objects.
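To make the problem concrete, here is a minimal sketch of one naive way to turn a truncated JSON prefix into a parseable object. It is not BAML's algorithm. The helper names `close_open_delimiters` and `parse_partial_json` are hypothetical, and the approach (close whatever is open, then trim characters until the result parses) is only an illustration.

```python
import json
from typing import Any, Optional


def close_open_delimiters(prefix: str) -> str:
    """Append the quote and brackets needed to balance a JSON prefix."""
    stack = []          # closing brackets we still owe, innermost last
    in_string = False
    escaped = False
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    return prefix + ('"' if in_string else "") + "".join(reversed(stack))


def parse_partial_json(text: str) -> Optional[Any]:
    """Trim trailing characters until the balanced prefix parses."""
    for end in range(len(text), 0, -1):
        try:
            return json.loads(close_open_delimiters(text[:end]))
        except json.JSONDecodeError:
            continue
    return None


if __name__ == "__main__":
    for chunk in [
        '{"items": [{"name": "Appl',
        '{"items": [{"name": "Apple", "quantity": 2, "price": 1.',
        '{"items": [{"name": "Apple", "quantity": 2, "price": 1.50}], "total_cost":',
    ]:
        print(parse_partial_json(chunk))
```

This naive version produces objects that parse, but not necessarily objects that make sense: on the second prefix it trims the dangling `1.` back to `1` and reports a price that the model never intended. Avoiding that kind of result is what semantically valid partial objects are about.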

You can check out more examples (including streaming in FastAPI and NextJS) in the BAML Examples repo.

Let’s stream the output of this function, ExtractReceiptInfo(email: string) -> ReceiptInfo, for our example:

class ReceiptItem {
  name string
  description string?
  quantity int
  price float
}

class ReceiptInfo {
  items ReceiptItem[]
  total_cost float?
}

function ExtractReceiptInfo(email: string) -> ReceiptInfo {
  client GPT4o
  prompt #"
    Given the receipt below:

    {{ email }}

    {{ ctx.output_format }}
  "#
}

The BAML code generator creates a set of types in the partial types module of the baml_client library (partial_types in the Python client). These types are modified from your original types to support streaming.

By default, BAML will convert all Class fields into nullable fields, and fill those fields with non-null values as much as possible given the tokens received so far.
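As a rough illustration of that default, the sketch below hand-writes a full type and a partial counterpart for ReceiptItem. It uses standard-library dataclasses rather than the real generated code, and the names `PartialReceiptItem` and `is_complete` are hypothetical; the actual generated classes may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceiptItem:
    """Stand-in for the full type: required fields must be present."""
    name: str
    quantity: int
    price: float
    description: Optional[str] = None


@dataclass
class PartialReceiptItem:
    """Stand-in for the streaming type: every field may still be missing."""
    name: Optional[str] = None
    description: Optional[str] = None
    quantity: Optional[int] = None
    price: Optional[float] = None


def is_complete(item: PartialReceiptItem) -> bool:
    """True once every field that is required on ReceiptItem has a value."""
    return all(v is not None for v in (item.name, item.quantity, item.price))


# Early in the stream only part of the name has arrived.
early = PartialReceiptItem(name="Appl")
# Later, the required fields have all been filled in.
later = PartialReceiptItem(name="Apple", quantity=2, price=1.5)
```

Client code that reads a partial value therefore has to tolerate None on any field, which is why checks like `is_complete` are common in streaming UIs.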

BAML will generate b.stream.ExtractReceiptInfo() for you, which you can use like so:

main.py
import asyncio
from baml_client import b, partial_types, types

# Using a stream:
def example1(receipt: str):
    stream = b.stream.ExtractReceiptInfo(receipt)

    # partial is a Partial type with all Optional fields
    for partial in stream:
        print(f"partial: parsed {len(partial.items)} items (object: {partial})")

    # final is the full, original, validated ReceiptInfo type
    final = stream.get_final_response()
    print(f"final: {len(final.items)} items (object: {final})")

# Using only get_final_response() of a stream
#
# In this case, you should just use b.ExtractReceiptInfo(receipt) instead,
# which is slightly faster and more efficient.
def example2(receipt: str):
    final = b.stream.ExtractReceiptInfo(receipt).get_final_response()
    print(f"final: {len(final.items)} items (object: {final})")

# Using the async client:
async def example3(receipt: str):
    # Note the import of the async client
    from baml_client.async_client import b
    stream = b.stream.ExtractReceiptInfo(receipt)
    async for partial in stream:
        print(f"partial: parsed {len(partial.items)} items (object: {partial})")

    final = await stream.get_final_response()
    print(f"final: {len(final.items)} items (object: {final})")

receipt = """
04/14/2024 1:05 pm

Ticket: 220000082489
Register: Shop Counter
Employee: Connor
Customer: Sam
Item # Price
Guide leash (1 Pair) uni UNI
1 $34.95
The Index Town Walls
1 $35.00
Boot Punch
3 $60.00
Subtotal $129.95
Tax ($129.95 @ 9%) $11.70
Total Tax $11.70
Total $141.65
"""

if __name__ == '__main__':
    # example1 and example2 use the sync client, so call them directly
    example1(receipt)
    example2(receipt)
    # example3 is a coroutine, so it needs an event loop
    asyncio.run(example3(receipt))

Number fields are always streamed in only when the LLM completes them. E.g. if the final number is 129.95, you’ll only see null or 129.95 instead of partial numbers like 1, 12, 129.9, etc.
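The number rule can be pictured with a small sketch. It is an illustration only, not BAML's parser: the hypothetical `stream_number` generator buffers digits and reports None for every chunk until it sees a character that cannot belong to the number, at which point the value is known to be finished.

```python
from typing import Iterable, Iterator, Optional

# Characters that may appear inside a JSON number token.
NUMBER_CHARS = set("+-0123456789.eE")


def stream_number(chunks: Iterable[str]) -> Iterator[Optional[float]]:
    """Yield None while the number may still grow, then the finished value.

    Assumes the chunks start at the first character of a number.
    """
    buffer = ""
    for chunk in chunks:
        for ch in chunk:
            if ch in NUMBER_CHARS:
                buffer += ch
            else:
                # A delimiter such as ',' or '}' ends the number.
                yield float(buffer)
                return
        # The chunk ended mid-number: nothing safe to report yet.
        yield None


if __name__ == "__main__":
    print(list(stream_number(["1", "29", ".9", "5}"])))
```

The sketch deliberately ignores the case where the stream ends without a delimiter; a real parser would treat end of input as completing the number too.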

Semantic Streaming

The BAML language provides several attributes that can be attached to types to control streaming behavior, ensuring that the partial values streamed to you are always valid within your own semantics.

  • @stream.done: Marks a type that should only be streamed when it is done being read from the LLM response.
  • @stream.not_null: Marks a field to indicate that the class containing that field should only be streamed if that field is present (the field need not be completed).
  • @stream.with_state: Adds metadata to a type indicating whether its value is finished or could still be updated by later messages.

@stream.done

To demonstrate the use of @stream.done, imagine that a ReceiptItem must only be considered valid, and may only reach the client, once its name, description, quantity and price fields are completely streamed in. To achieve this, we can annotate the ReceiptItem class with the @stream.done attribute (written @@stream.done at the class level):

class ReceiptItem {
  name string
  description string?
  quantity int
  price float
  @@stream.done
}

When the client code is generated, none of the fields of ReceiptItem will be converted to optional. When an LLM response is parsed, no ReceiptItem will be streamed out until all of its fields are done being streamed in.
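The effect on a list of items can be sketched as a simple filter. This is a hypothetical model of the behavior, not BAML's implementation: `InProgressItem` and its `done` flag are invented here to stand for "the parser has finished reading this object".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InProgressItem:
    """Hypothetical parser-side record for one ReceiptItem."""
    fields: Dict[str, object]  # values parsed so far
    done: bool                 # True once the whole object has been read


def releasable(items: List[InProgressItem]) -> List[Dict[str, object]]:
    """Return only the items that are allowed to reach the client."""
    return [item.fields for item in items if item.done]


items = [
    InProgressItem({"name": "Apple", "quantity": 2, "price": 1.5}, done=True),
    InProgressItem({"name": "Ban"}, done=False),  # still being streamed
]
visible = releasable(items)
```

Here the client sees one item rather than two: the half-read second item is held back until it is complete, so the client never has to handle a ReceiptItem with missing fields.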

@stream.not_null

Sometimes the presence of a value is important to the correct interpretation of a containing value. This commonly occurs with tags used to determine which part of a union is being used.

For example, in this code block, @stream.not_null on each of the message_type fields will ensure that an Event is never streamed until enough tokens have been received to precisely know what the message type is, allowing you to build UI appropriate to the message type before the other fields have been completed.

class Message {
  message_type "greeting" | "within-convo" | "farewell" @stream.not_null
  gesture ("gesticulate" | "wave" | "shake-hands" | "hug")?
  message string
}

class Event {
  event_message Message
  speaker string
}

function Chat(history: Event[]) -> Event { ... }

You might wonder if it’s sufficient to use @stream.done on the message_type field. @stream.done applies to types, preventing them from streaming out until they are completed. On the other hand, @stream.not_null applies to fields and prevents a containing object from streaming out until that field is present.

A field whose type is marked @stream.done is still converted to a nullable field in the generated partial types, so this change would not produce the desired result of withholding a Message until its type is known. Messages would still be streamed with message_type: null.

This is a subtle distinction between @stream.done and @stream.not_null. As a rule of thumb, remember that @stream.done is about the type itself, and @stream.not_null is about the type’s containing context.
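A small sketch may help separate the two ideas. It models @stream.not_null as a gate on the containing object: snapshots are plain dicts, and the hypothetical `hold_until_present` helper drops every snapshot in which the required field is still missing. This is an illustration of the rule, not BAML's implementation.

```python
from typing import Dict, Iterable, Iterator, Optional

Snapshot = Dict[str, Optional[str]]


def hold_until_present(
    snapshots: Iterable[Snapshot], required: str
) -> Iterator[Snapshot]:
    """Yield only the snapshots in which the required field has a value."""
    for snap in snapshots:
        if snap.get(required) is not None:
            yield snap


stream = [
    {"message_type": None, "message": None},        # withheld
    {"message_type": "greeting", "message": None},  # released
    {"message_type": "greeting", "message": "Hel"}, # released
]
released = list(hold_until_present(stream, "message_type"))
```

Note that the gate looks at the container, not at the field's own completeness: the message text is still allowed to arrive piece by piece once message_type is known.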

@stream.with_state

It is often useful to know in client code whether a value is finished, or could be updated in future messages. The @stream.with_state attribute lets you attach metadata to types to indicate this state in client code.

class Message {
  message_type "greeting" | "within-convo" | "farewell" @stream.not_null
  gesture ("gesticulate" | "wave" | "shake-hands" | "hug")?
  message string @stream.with_state
}
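On the client side, that state can drive the UI. The sketch below uses a hand-written stand-in for the wrapper type rather than the generated one, and the state labels "incomplete" and "complete" are assumptions made for illustration; check the generated client for the exact values.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

# Assumed label for a finished value; the generated client may use another.
COMPLETE = "complete"


@dataclass
class StreamState(Generic[T]):
    """Stand-in for the wrapper added by @stream.with_state."""
    value: T
    state: str


def render(message: StreamState[str]) -> str:
    """Show the text so far, with a trailing marker while it is streaming."""
    suffix = "" if message.state == COMPLETE else " ..."
    return message.value + suffix


if __name__ == "__main__":
    print(render(StreamState("Hel", "incomplete")))
    print(render(StreamState("Hello!", COMPLETE)))
```

The same check could toggle a spinner or disable a submit button until the value stops changing.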

Putting it all together

Let’s put all of these concepts together to design an application that streams a conversation containing stock recommendations, using semantic streaming to ensure that the streamed data obeys our domain’s invariants.

enum Stock {
  APPL,
  MSFT,
  GOOG,
  BAML,
}

// Make recommendations atomic - we do not want a recommendation to be
// modified by streaming additional messages.
class Recommendation {
  stock Stock
  amount float
  action "buy" | "sell"
  @@stream.done
}

class Message {
  message_type "greeting" | "conversation" | "farewell" @stream.not_null
  message string @stream.with_state @stream.not_null
}

function Respond(
  history: (Message | Recommendation | UserMessage)[]
) -> Message | Recommendation { ... }

The above BAML code will generate the following Python definitions in the partial module. The use of streaming attributes has several effects on the generated code:

  • Recommendation does not have any partial field because it was marked @stream.done.
  • The Message.message string is wrapped in StreamState, allowing runtime checking of its completion status. This status could be used to render a spinner as the message streams in.
  • The Message.message_type field can never be null, because it was marked as @stream.not_null.

from enum import Enum
from typing import Generic, Literal, TypeVar, Union

from pydantic import BaseModel

T = TypeVar("T")

class StreamState(BaseModel, Generic[T]):
    value: T
    # "complete" is assumed; the source omits the second literal
    state: Union[Literal["incomplete"], Literal["complete"]]

class Stock(str, Enum):
    APPL = "APPL"
    MSFT = "MSFT"
    GOOG = "GOOG"
    BAML = "BAML"

class Recommendation(BaseModel):
    stock: Stock
    amount: float
    action: Union[Literal["buy"], Literal["sell"]]

class Message(BaseModel):
    message_type: Union[Literal["greeting"], Literal["conversation"], Literal["farewell"]]
    message: StreamState[str]
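To show how client code might consume such a union, here is a minimal sketch with hand-written stand-ins for the two result types. The `handle` function, the `message_done` flag and the order format are all hypothetical; the point is only that a Recommendation can be acted on as soon as it appears, while a Message may still be changing.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Recommendation:
    """Stand-in for the atomic (@@stream.done) type."""
    stock: str
    amount: float
    action: str


@dataclass
class Message:
    """Stand-in for the streamed type; message_done mirrors the stream state."""
    message_type: str
    message: str
    message_done: bool


def handle(update: Union[Message, Recommendation], orders: List[str]) -> str:
    """Record recommendations immediately; report messages by progress."""
    if isinstance(update, Recommendation):
        # Safe to act on: an atomic value never arrives half-filled.
        orders.append(f"{update.action} {update.amount} {update.stock}")
        return "recommendation"
    return "message" if update.message_done else "message (streaming)"


orders: List[str] = []
first = handle(Message("greeting", "Hel", message_done=False), orders)
second = handle(Recommendation("MSFT", 10.0, "buy"), orders)
```

Because message_type is never null in this design, a real UI could also choose a layout per message type before any message text has arrived.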