Streaming
BAML lets you stream in structured JSON output from LLMs as it comes in.
If you tried streaming in a JSON output from an LLM you’d see something like:
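As a rough illustration, the Python sketch below accumulates a made-up receipt payload chunk by chunk and tries to parse it after every chunk. The payload and the chunk boundaries are assumptions chosen for the example, not real model output. Every prefix except the last one fails to parse as JSON.

```python
import json

# Hypothetical LLM output, split at arbitrary points the way tokens might arrive.
CHUNKS = ['{"items": [{"na', 'me": "Coffee", "pri', 'ce": 4.5', '0}], "total": 4.', '50}']


def parse_attempts(chunks):
    """Try to parse the accumulated text after each chunk; None means invalid JSON."""
    buffer = ""
    results = []
    for chunk in chunks:
        buffer += chunk
        try:
            results.append(json.loads(buffer))
        except json.JSONDecodeError:
            results.append(None)  # the prefix received so far is not valid JSON
    return results


attempts = parse_attempts(CHUNKS)
for step, parsed in enumerate(attempts, start=1):
    print(step, "unparseable so far" if parsed is None else parsed)
```

Only the final step yields a usable object, which is why a naive parse-on-every-chunk approach gives the caller nothing to display until the response is finished.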
BAML gives you fine-grained control of how it fixes this partial JSON and transforms it into a series of semantically valid partial objects.
For our example, let’s stream the output of the function ExtractReceiptInfo(email: string) -> ReceiptInfo, defined in the following file:
extract-receipt-info.baml
The BAML code generator creates a set of types in a module called partial_types inside the baml_client library. These types are modified versions of your original types that support streaming.
By default, BAML will convert all Class fields into nullable fields, and fill those fields with non-null values as much as possible given the tokens received so far.
BAML will generate b.stream.ExtractReceiptInfo() for you, which you can use like so:
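The sketch below does not call the generated client, so it runs on its own. It assumes hypothetical fields (items, name, price, total_cost) for ReceiptInfo and hand-writes stand-ins for the partial types in which every field is nullable. The generator fake_receipt_stream is also a stand-in; in a real project the loop would presumably consume whatever b.stream.ExtractReceiptInfo() returns instead.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class PartialItem:
    # Hypothetical fields; every field is nullable while streaming.
    name: Optional[str] = None
    price: Optional[float] = None


@dataclass
class PartialReceiptInfo:
    items: Optional[List[PartialItem]] = None
    total_cost: Optional[float] = None


def fake_receipt_stream() -> Iterator[PartialReceiptInfo]:
    """Stand-in for the real stream: yields snapshots that fill in over time."""
    yield PartialReceiptInfo()
    yield PartialReceiptInfo(items=[PartialItem(name="Cof")])
    yield PartialReceiptInfo(items=[PartialItem(name="Coffee", price=4.5)])
    yield PartialReceiptInfo(items=[PartialItem(name="Coffee", price=4.5)], total_cost=4.5)


def summarize(partial: PartialReceiptInfo) -> str:
    """Render a snapshot, treating None as 'not received yet'."""
    names = [item.name or "?" for item in (partial.items or [])]
    total = "pending" if partial.total_cost is None else f"{partial.total_cost:.2f}"
    return f"items={names} total={total}"


lines = [summarize(p) for p in fake_receipt_stream()]
for line in lines:
    print(line)
```

Because any field may still be None in a given snapshot, consumer code has to handle the missing case explicitly, as summarize does here.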
Number fields are streamed only once the LLM has completed them. For example, if the final number is 129.95, you’ll only ever see null or 129.95, never partial numbers like 1, 12, or 129.9.
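The rule can be pictured with a small Python sketch. It is not BAML's parser; it assumes a hypothetical "total" field and treats a number as complete only once a delimiter follows it in the text received so far.

```python
import re
from typing import Optional

# A number counts as complete only once a delimiter (",", "}" or "]") follows it.
_COMPLETE_TOTAL = re.compile(r'"total"\s*:\s*(-?\d+(?:\.\d+)?)\s*[,}\]]')


def completed_total(buffer: str) -> Optional[float]:
    """Return the value of "total" if it is fully received, otherwise None."""
    match = _COMPLETE_TOTAL.search(buffer)
    return float(match.group(1)) if match else None


buffer = ""
seen = []
for chunk in ['{"total": 1', '2', '9.9', '5', '}']:
    buffer += chunk
    seen.append(completed_total(buffer))

print(seen)
```

The intermediate prefixes 1, 12, 129.9 and 129.95 without a closing delimiter all map to None, so the consumer never observes a misleading partial amount.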
Semantic Streaming
BAML provides powerful attributes to control how your data streams, ensuring that partial values always maintain semantic validity. Here are the three key streaming attributes:
@stream.done
This attribute ensures a type or field is only streamed when it’s completely finished. It’s useful when you need atomic, fully-formed values.
For example:
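Outside of BAML itself, the effect can be modelled in a few lines of Python. The sketch assumes each update carries a completion flag and represents a held-back value as None; the ticker values and the helper name only_when_done are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def only_when_done(updates: Iterable[Tuple[str, bool]]) -> Iterator[Optional[str]]:
    """Hide in-progress text: yield None until an update is flagged complete."""
    for text, is_complete in updates:
        yield text if is_complete else None


# Hypothetical updates for a ticker symbol as it streams in.
ticker_updates = [("A", False), ("AAP", False), ("AAPL", True)]
visible = list(only_when_done(ticker_updates))
print(visible)
```

A consumer of this stream never sees "A" or "AAP", which matters when a half-received value would be wrong rather than merely incomplete.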
@stream.not_null
This attribute ensures a containing object is only streamed when this field has a value. It’s particularly useful for discriminator fields or required metadata.
For example:
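A minimal Python model of this behavior, using plain dictionaries as stand-ins for partial objects, is to drop every snapshot in which the required field is still missing. The field names and the helper hold_until_present are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional

Snapshot = Dict[str, Optional[str]]


def hold_until_present(snapshots: Iterable[Snapshot], required: str) -> Iterator[Snapshot]:
    """Emit a snapshot only once its required field has a value."""
    for snapshot in snapshots:
        if snapshot.get(required) is not None:
            yield snapshot


# Hypothetical snapshots of one object as it streams in.
snapshots = [
    {"message_type": None, "message": None},
    {"message_type": "greeting", "message": None},
    {"message_type": "greeting", "message": "Hel"},
]
emitted = list(hold_until_present(snapshots, "message_type"))
print(emitted)
```

Code that dispatches on message_type can then rely on the field being set in every object it receives.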
@stream.with_state
This attribute adds metadata to track if a field has finished streaming. It’s perfect for showing loading states in UIs.
For example:
This will generate the following code in the partial_types module:
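The sketch below is a hand-written approximation, not the generator's actual output. It assumes a generic StreamState wrapper with a value and a state, and the three state names used here are assumptions as well. The render helper is hypothetical and shows how a UI might use the state to display a loading indicator.

```python
from dataclasses import dataclass
from typing import Generic, Literal, Optional, TypeVar

T = TypeVar("T")


@dataclass
class StreamState(Generic[T]):
    # Assumed shape: the wrapped value plus a completion marker.
    value: T
    state: Literal["Pending", "Incomplete", "Complete"]


def render(field: StreamState[Optional[str]]) -> str:
    """Append a loading marker until the field has finished streaming."""
    text = field.value or ""
    return text if field.state == "Complete" else text + " ..."


frames = [
    render(StreamState(None, "Pending")),
    render(StreamState("Hel", "Incomplete")),
    render(StreamState("Hello", "Complete")),
]
print(frames)
```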
Type Transformation Summary
Here’s how these attributes affect your types in generated code:
The return type of a function is not affected by streaming attributes!
Putting it all together
Let’s put all of these concepts together to design an application that streams a conversation containing stock recommendations, using semantic streaming to ensure that the streamed data obeys our domain’s invariants.
The above BAML code will generate the following Python definitions in the partial_types module. The use of streaming attributes has several effects on the generated code:
- Recommendation does not have any partial fields because it was marked @stream.done.
- The Message.message string is wrapped in StreamState, allowing runtime checking of its completion status. This status could be used to render a spinner as the message streams in.
- The Message.message_type field may not be null, because it was marked as @stream.not_null.
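These three effects can be approximated with plain dataclasses. This is a sketch under stated assumptions, not the generated code: the Recommendation fields (ticker, amount, action), the message_type values, and the StreamState shape are all hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, Literal, Optional, TypeVar

T = TypeVar("T")


@dataclass
class StreamState(Generic[T]):
    # Assumed shape of the wrapper: a value plus a completion marker.
    value: T
    state: Literal["Pending", "Incomplete", "Complete"]


@dataclass
class Recommendation:
    # Marked @stream.done, so no field is Optional (field names are hypothetical).
    ticker: str
    amount: int
    action: Literal["buy", "sell"]


@dataclass
class Message:
    # Marked @stream.not_null, so never None (values are hypothetical).
    message_type: Literal["greeting", "within-convo", "farewell"]
    # Marked @stream.with_state, so wrapped in StreamState.
    message: StreamState[Optional[str]]


msg = Message(message_type="greeting", message=StreamState("Hi th", "Incomplete"))
show_spinner = msg.message.state != "Complete"
print(msg.message_type, show_spinner)
```

With types shaped like this, a UI can branch on message_type immediately and keep a spinner visible until message.state reports completion.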