BAML lets you stream in structured JSON output from LLMs as it comes in.
If you tried streaming raw JSON output from an LLM, you would see a sequence of incomplete fragments, none of which parses as valid JSON until the final token arrives.
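To make the problem concrete, the sketch below feeds growing prefixes of a JSON string to Python's standard json parser. The payload and the helper name parse_prefixes are hypothetical and chosen only for illustration; this is not BAML code.

```python
import json

# Hypothetical receipt payload, used only to illustrate the problem.
FULL_RESPONSE = '{"venue": "Corner Cafe", "total": 129.95, "items": ["latte", "bagel"]}'


def parse_prefixes(text: str, step: int = 16) -> list[tuple[str, bool]]:
    """Try to parse each growing prefix of `text` and record whether it is valid JSON."""
    results = []
    # The last slice end is >= len(text), so the final prefix is the full text.
    for end in range(step, len(text) + step, step):
        prefix = text[:end]
        try:
            json.loads(prefix)
            results.append((prefix, True))
        except json.JSONDecodeError:
            results.append((prefix, False))
    return results


for prefix, ok in parse_prefixes(FULL_RESPONSE):
    print("valid  " if ok else "invalid", prefix)
```

Only the last prefix, which is the complete response, parses. Every earlier fragment raises json.JSONDecodeError, which is the gap that partial-object streaming is meant to fill.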
BAML gives you fine-grained control of how it fixes this partial JSON and transforms it into a series of semantically valid partial objects.
Let’s stream the output of the function ExtractReceiptInfo(email: string) -> ReceiptInfo for our example:
The BAML code generator creates a set of types in a module called
partial_types in the baml_client library. These types are modified
from your original types to support streaming.
By default, BAML will convert all Class fields into nullable fields, and fill those fields with non-null values as much as possible given the tokens received so far.
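As a rough picture of what "all fields become nullable" means, the sketch below pairs a complete type with a streaming counterpart whose fields are all optional. The field names venue and total are hypothetical (the ReceiptInfo definition is not shown here), and the dataclasses are hand-written stand-ins, not the code BAML generates.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical fields: the real ReceiptInfo definition is not shown in this guide.
@dataclass
class ReceiptInfo:
    venue: str
    total: float


@dataclass
class PartialReceiptInfo:
    """Streaming counterpart: every field is nullable and defaults to None."""
    venue: Optional[str] = None
    total: Optional[float] = None


def fill_partial(known: dict) -> PartialReceiptInfo:
    """Populate whichever fields have arrived so far; the rest stay None."""
    return PartialReceiptInfo(venue=known.get("venue"), total=known.get("total"))


# Three snapshots, as they might look while tokens arrive.
snapshots = [
    fill_partial({}),
    fill_partial({"venue": "Corner"}),
    fill_partial({"venue": "Corner Cafe", "total": 129.95}),
]
```

Each snapshot is a valid object on its own, so consumer code never has to handle malformed JSON, only fields that may still be None.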
BAML will generate b.stream.ExtractReceiptInfo() for you, which you can use like so:
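A minimal sketch of the consumption pattern follows. Because the generated client cannot run without a BAML project, FakeReceiptStream is a hypothetical stand-in for the object returned by b.stream.ExtractReceiptInfo(). The get_final_response method name is modeled on the generated stream object and should be treated as an assumption to verify against your own baml_client.

```python
import asyncio
from typing import AsyncIterator


class FakeReceiptStream:
    """Hypothetical stand-in for the stream returned by b.stream.ExtractReceiptInfo()."""

    def __init__(self, snapshots: list[dict]) -> None:
        self._snapshots = snapshots

    async def _iterate(self) -> AsyncIterator[dict]:
        for snapshot in self._snapshots:
            await asyncio.sleep(0)  # yield control, as a network read would
            yield snapshot

    def __aiter__(self) -> AsyncIterator[dict]:
        return self._iterate()

    async def get_final_response(self) -> dict:
        # Assumed method name; returns the fully parsed result.
        return self._snapshots[-1]


async def consume(stream: FakeReceiptStream) -> tuple[list[dict], dict]:
    """Collect every partial object, then fetch the final, complete one."""
    partials = []
    async for partial in stream:
        partials.append(partial)
    final = await stream.get_final_response()
    return partials, final


# Hypothetical snapshots; field names are illustrative only.
SNAPSHOTS = [
    {"venue": None, "total": None},
    {"venue": "Corner Cafe", "total": None},
    {"venue": "Corner Cafe", "total": 129.95},
]

if __name__ == "__main__":
    print(asyncio.run(consume(FakeReceiptStream(SNAPSHOTS))))
```

In a real project, FakeReceiptStream(SNAPSHOTS) would be replaced by the call to b.stream.ExtractReceiptInfo(), and each partial would be an instance of the corresponding partial_types class rather than a dict.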
Number fields are streamed only once the LLM has completed them. For example, if the final number is 129.95, you will see only null or 129.95, never partial numbers like 1, 12, or 129.9.
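The rule for numbers can be approximated in a few lines: a number counts as complete only once a delimiter follows it. The sketch below is a simplified illustration using a regular expression on a hypothetical total field; it is not how BAML's parser is implemented.

```python
import re
from typing import Optional

# A number is treated as complete only when a JSON delimiter follows it.
_TOTAL = re.compile(r'"total"\s*:\s*(-?\d+(?:\.\d+)?)\s*[,}\]]')


def visible_total(buffer: str) -> Optional[float]:
    """Return the number only if it has been fully received, else None."""
    match = _TOTAL.search(buffer)
    return float(match.group(1)) if match else None


def stream_totals(chunks: list[str]) -> list[Optional[float]]:
    """Accumulate chunks and record what a consumer would see after each one."""
    buffer = ""
    seen = []
    for chunk in chunks:
        buffer += chunk
        seen.append(visible_total(buffer))
    return seen


# The number 129.95 arrives split across several chunks.
print(stream_totals(['{"total": 1', '29', '.9', '5', '}']))
# → [None, None, None, None, 129.95]
```

The intermediate values 1, 129, and 129.9 never surface, which avoids showing a user a price that is off by orders of magnitude while the stream is in progress.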
You can cancel ongoing streams using abort controllers, which is essential for responsive applications that allow users to stop generation or implement timeouts.
Allow users to stop streaming generation with a “Stop” button:
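The general shape of a "Stop" button can be sketched with plain asyncio. This example deliberately does not use BAML's abort controller API: asyncio.Event stands in for the controller, fake_token_stream stands in for an LLM stream, and click_stop simulates the user. All of these names are hypothetical.

```python
import asyncio


async def fake_token_stream(tokens: list[str]):
    """Hypothetical stand-in for an LLM stream; yields one token at a time."""
    for token in tokens:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield token


async def stream_until_stopped(tokens: list[str], stop: asyncio.Event, received: list[str]) -> None:
    """Consume tokens until the stream ends or the stop signal is set."""
    stream = fake_token_stream(tokens)
    try:
        async for token in stream:
            if stop.is_set():
                break
            received.append(token)
    finally:
        await stream.aclose()  # release the generator promptly on early exit


async def demo(total_tokens: int = 100, stop_after: int = 3) -> list[str]:
    stop = asyncio.Event()
    received: list[str] = []
    tokens = [f"tok{i}" for i in range(total_tokens)]

    async def click_stop() -> None:
        # Simulates a user pressing "Stop" once a few tokens are on screen.
        while len(received) < stop_after:
            await asyncio.sleep(0)
        stop.set()

    await asyncio.gather(stream_until_stopped(tokens, stop, received), click_stop())
    return received


if __name__ == "__main__":
    print(len(asyncio.run(demo())))
```

The same structure applies to timeouts: replace click_stop with a task that sleeps for the allowed duration and then sets the stop signal.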
For more examples and patterns, see the Abort Controllers guide.
BAML provides powerful attributes to control how your data streams, ensuring that partial values always maintain semantic validity. Here are the three key streaming attributes:
@stream.done
This attribute ensures a type or field is only streamed when it’s completely finished. It’s useful when you need atomic, fully-formed values.
For example:
A common pattern is streaming a list of items where each item can be one of
several types (e.g. tool calls and messages). You can use @stream.done on
the list element type to ensure each item only appears once it’s fully complete:
When @stream.done is applied to a union type, it propagates to all variants.
This means you don’t need to add @@stream.done to each class individually —
annotating the union is sufficient.
You can also achieve the same behavior by adding @@stream.done to each
class in the union. The (T @stream.done)[] syntax is more concise when
the classes are used in other contexts where you don’t want @@stream.done.
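The effect of marking list elements as done can be pictured as a filter over the parser's view of the list. In the sketch below, ParsedItem and its complete flag are hypothetical; they model the idea that the parser knows whether each element's object has closed, and they are not part of BAML's API.

```python
from dataclasses import dataclass


@dataclass
class ParsedItem:
    """Hypothetical parser output: an item plus whether its object has closed."""
    value: dict
    complete: bool


def visible_items(items: list[ParsedItem]) -> list[dict]:
    """Mimic a list of done elements: expose only fully parsed items."""
    return [item.value for item in items if item.complete]


# Mid-stream: the tool call has finished, the message is still arriving.
mid_stream = [
    ParsedItem({"type": "tool_call", "name": "lookup"}, complete=True),
    ParsedItem({"type": "message", "text": "Here is wh"}, complete=False),
]

print(visible_items(mid_stream))
# → [{'type': 'tool_call', 'name': 'lookup'}]
```

A consumer therefore sees the list grow one finished item at a time and never has to guard against a half-written tool call.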
@stream.not_null
This attribute ensures a containing object is only streamed when this field has a value. It’s particularly useful for discriminator fields or required metadata.
For example:
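One way to picture the rule is as a gate on the containing object: until every field marked not-null has a value, the whole object is withheld. The helper gate_on_not_null below is a hypothetical illustration operating on plain dicts, not BAML's implementation.

```python
from typing import Optional


def gate_on_not_null(partial: Optional[dict], required: set[str]) -> Optional[dict]:
    """Withhold the containing object until every required field has a value."""
    if partial is None:
        return None
    if any(partial.get(name) is None for name in required):
        return None
    return partial


# message_type acts as the discriminator that consumers switch on.
required = {"message_type"}

early = {"message_type": None, "message": None}
later = {"message_type": "greeting", "message": "Hel"}

print(gate_on_not_null(early, required))  # → None
print(gate_on_not_null(later, required))  # → {'message_type': 'greeting', 'message': 'Hel'}
```

Other fields, such as message here, may still be partial when the object first appears; only the gated field is guaranteed to be present.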
@stream.with_state
This attribute adds metadata to track if a field has finished streaming. It’s perfect for showing loading states in UIs.
For example:
This will generate the following code in the partial_types module:
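A simplified picture of a state-carrying wrapper is sketched below. The StreamState dataclass is a hand-written approximation, and the three state names ("Pending", "Incomplete", "Complete") are an assumption; check the generated partial_types module for the exact definition. The render helper is hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, Literal, Optional, TypeVar

T = TypeVar("T")


@dataclass
class StreamState(Generic[T]):
    """Approximation of a wrapper pairing a value with its streaming status."""
    value: T
    # Assumed state names; verify against your generated code.
    state: Literal["Pending", "Incomplete", "Complete"]


def render(message: StreamState[Optional[str]]) -> str:
    """Show the text so far, with a trailing marker until streaming completes."""
    text = message.value or ""
    return text if message.state == "Complete" else text + " ..."


print(render(StreamState(None, "Pending")))            # → " ..."
print(render(StreamState("Buy", "Incomplete")))        # → "Buy ..."
print(render(StreamState("Buy AAPL", "Complete")))     # → "Buy AAPL"
```

In a UI, the trailing marker would typically be a spinner or cursor that disappears once the state reports completion.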
Here’s how these attributes affect your types in generated code:
The return type of a function is not affected by streaming attributes!
Let’s put all of these concepts together to design an application that streams a conversation containing stock recommendations, using semantic streaming to ensure that the streamed data obeys our domain’s invariants.
The above BAML code will generate the following Python definitions in the
partial_types module. The use of streaming attributes has several effects on
the generated code:
- Recommendation does not have any partial fields because it was marked
  @stream.done.
- Message.message string is wrapped in StreamState, allowing
  runtime checking of its completion status. This status could be used
  to render a spinner as the message streams in.
- Message.message_type field may not be null, because it was marked
  as @stream.not_null.