Streaming
BAML lets you stream in structured JSON output from LLMs as it comes in.
If you tried streaming in a JSON output from an LLM you’d see something like:
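As a rough illustration (the payload and the chunk boundaries below are invented for this sketch, not real LLM output), accumulating raw chunks produces prefixes that are not valid JSON until the very last chunk arrives:

```python
import json

# Hypothetical LLM output, split at arbitrary chunk boundaries.
chunks = ['{"items": [{"name": "Cof', 'fee", "price": 4.', '5}], "total": 4.5', '}']

def parse_attempts(chunks):
    """Accumulate chunks and record whether each prefix parses as JSON."""
    buffer = ""
    results = []
    for chunk in chunks:
        buffer += chunk
        try:
            json.loads(buffer)
            results.append(True)
        except json.JSONDecodeError:
            results.append(False)
    return results

print(parse_attempts(chunks))  # prints [False, False, False, True]
```

Only the complete document parses, so a naive client has nothing structured to show while the response is still arriving.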
BAML gives you fine-grained control of how it fixes this partial JSON and transforms it into a series of semantically valid partial objects.
Let’s stream the output of the function ExtractReceiptInfo(email: string) -> ReceiptInfo for our example:
extract-receipt-info.baml
The BAML code generator creates a set of types in the baml_client library, in a module called partial. These types are modified from your original types to support streaming.
By default, BAML will convert all Class fields into nullable fields, and fill those fields with non-null values as much as possible given the tokens received so far.
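A minimal sketch of what this default looks like from Python, using plain dataclasses rather than the actual generated classes. The field names are those of the ReceiptItem class used in this guide; the field types, the PartialReceiptItem name, and the snapshot values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReceiptItem:
    """Stand-in for a final (non-streaming) type; field types are assumed."""
    name: str
    description: str
    quantity: int
    price: float

@dataclass
class PartialReceiptItem:
    """Stand-in for the streaming counterpart: every field is nullable."""
    name: Optional[str] = None
    description: Optional[str] = None
    quantity: Optional[int] = None
    price: Optional[float] = None

# Hypothetical snapshots, in arrival order, as more tokens are received.
snapshots = [
    PartialReceiptItem(),
    PartialReceiptItem(name="Cof"),
    PartialReceiptItem(name="Coffee", description="Large la"),
    PartialReceiptItem(name="Coffee", description="Large latte", quantity=1, price=4.5),
]

def filled_fields(item: PartialReceiptItem) -> list:
    """Return the names of the fields that are non-null in this snapshot."""
    return [key for key, value in vars(item).items() if value is not None]

for snap in snapshots:
    print(filled_fields(snap))
```

Client code that consumes partial values therefore has to handle None for any field unless a streaming attribute says otherwise.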
Python
TypeScript
Ruby (beta)
OpenAPI
BAML will generate b.stream.ExtractReceiptInfo()
for you, which you can use like so:
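The sketch below shows the general consumption pattern: iterate over partial values, then ask for the final one. To stay runnable without a generated client it uses a hypothetical FakeStream stand-in. The get_final_response() method name, the sync_client import path in the comment, and the "total" field are assumptions here and should be checked against your generated baml_client.

```python
from typing import Iterator, List

class FakeStream:
    """Hypothetical stand-in for the object returned by b.stream.ExtractReceiptInfo()."""

    def __init__(self, partials: List[dict], final: dict) -> None:
        self._partials = partials
        self._final = final

    def __iter__(self) -> Iterator[dict]:
        return iter(self._partials)

    def get_final_response(self) -> dict:
        return self._final

def consume(stream) -> dict:
    """Handle each partial as it arrives, then return the final value."""
    for partial in stream:
        print("partial:", partial)
    return stream.get_final_response()

# With a generated client this would look roughly like (assumed, not verified here):
#   from baml_client.sync_client import b
#   stream = b.stream.ExtractReceiptInfo(email)
stream = FakeStream(
    partials=[{"total": None}, {"total": 4.5}],  # "total" is an invented field
    final={"total": 4.5},
)
final = consume(stream)
print("final:", final)
```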
Number fields are always streamed in only when the LLM completes them. E.g. if the final number is 129.95, you’ll only see null or 129.95 instead of partial numbers like 1, 12, 129.9, etc.
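To make the rule concrete, here is a small simulation. It is not BAML's parser: it simply treats a number as complete once a delimiter has arrived after it, which is an assumption chosen to reproduce the behavior described above.

```python
from typing import Optional

def visible_number(received: str) -> Optional[float]:
    """Return the number only once a delimiter after it has arrived."""
    if len(received) > 1 and received[-1] in ",}] \n":
        return float(received[:-1])
    return None

# Hypothetical arrival of the value 129.95, followed by a closing brace.
prefixes = ["1", "12", "129.", "129.9", "129.95", "129.95}"]
print([visible_number(p) for p in prefixes])
# prints [None, None, None, None, None, 129.95]
```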
Semantic Streaming
The BAML language provides several attributes that can be attached to types to control streaming behavior, ensuring that the partial values streamed to you are always valid within your own semantics.
- @stream.done: Marks a type that should only be streamed when it is done being read from the LLM response.
- @stream.not_null: Marks a field to indicate that the class containing that field should only be streamed if that field is present (the field need not be completed).
- @stream.with_state: Adds metadata to a type indicating whether values of that type are finished or could still be updated.
@stream.done
To demonstrate the use of @stream.done, imagine that a ReceiptItem must only be considered valid, and can only reach the client, when its name, description, quantity and price fields are completely streamed in.
To achieve this we can annotate the ReceiptItem class with the @stream.done attribute:
When generating the client code for ReceiptType, none of the fields of ReceiptItem will be converted to optional. And when parsing an LLM response, no ReceiptItem will be streamed out until all of its fields are done being streamed in.
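A sketch of the client-side consequence, again with plain dataclasses standing in for generated code. The field types and the PartialReceipt container are assumptions; the point is only that an item type with required fields cannot represent a half-finished item, so the containing list grows by whole items.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReceiptItem:
    """With @stream.done the streamed type keeps required fields (types assumed)."""
    name: str
    description: str
    quantity: int
    price: float

@dataclass
class PartialReceipt:
    """Hypothetical containing type: it may grow, but only by whole items."""
    items: List[ReceiptItem] = field(default_factory=list)

# A half-finished item cannot even be constructed.
try:
    ReceiptItem(name="Coffee")  # missing description, quantity and price
    constructed = True
except TypeError:
    constructed = False

# Hypothetical snapshots: the list stays empty until an item is complete.
snapshots = [
    PartialReceipt(),
    PartialReceipt(items=[ReceiptItem("Coffee", "Large latte", 1, 4.5)]),
]
print(constructed, [len(s.items) for s in snapshots])  # prints False [0, 1]
```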
@stream.not_null
Sometimes the presence of a value is important to the correct interpretation of a containing value. This commonly occurs with tags used to determine which part of a union is being used.
For example, in this code block, @stream.not_null on each of the message_type fields will ensure that an Event is never streamed until enough tokens have been received to precisely know what the message type is, allowing you to build UI appropriate to the message type before the other fields have been completed.
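On the client side, this guarantee lets rendering code branch on message_type without a null check. The sketch below uses plain dicts as partial values; the "chat" and "alert" type names and the "body" field are invented for illustration.

```python
def render(event: dict) -> str:
    """Pick a UI template from message_type; other fields may still be None."""
    kind = event["message_type"]  # assumed non-null because of @stream.not_null
    body = event.get("body") or "..."
    if kind == "chat":
        return f"[chat bubble] {body}"
    if kind == "alert":
        return f"[alert banner] {body}"
    raise ValueError(f"unknown message_type: {kind}")

# Hypothetical partial events, in arrival order.
partials = [
    {"message_type": "chat", "body": None},
    {"message_type": "chat", "body": "Hel"},
    {"message_type": "chat", "body": "Hello"},
]
for partial in partials:
    print(render(partial))
```

The first partial already selects the right template even though its body has not started streaming.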
You might wonder if it’s sufficient to use @stream.done on the message_type field. @stream.done applies to types, preventing them from streaming out until they are completed. On the other hand, @stream.not_null applies to fields and prevents a containing object from streaming out until that field is present.
A type with @stream.done on it will still be converted to a nullable field in the generated partial types, so this change would not produce the desired result of withholding a Message until its type is known. Messages would be streamed with message_type: null.
This is a subtle distinction between @stream.done and @stream.not_null. As a rule of thumb, remember that @stream.done is about the type itself, and @stream.not_null is about the type’s containing context.
@stream.with_state
It is often useful to know in client code whether a value is finished, or could be updated in future messages. The @stream.with_state attribute lets you attach metadata to types to indicate this state in client code.
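A minimal sketch of how client code might use such metadata. The StreamState wrapper below is a hand-written stand-in, and its shape (a value plus a state of "Pending", "Incomplete" or "Complete") is an assumption to be checked against the generated client.

```python
from dataclasses import dataclass
from typing import Generic, Literal, Optional, TypeVar

T = TypeVar("T")

@dataclass
class StreamState(Generic[T]):
    """Assumed shape of the wrapper: a value plus its completion state."""
    value: T
    state: Literal["Pending", "Incomplete", "Complete"]

def render_text(field: StreamState[Optional[str]]) -> str:
    """Append a loading marker while the value may still change."""
    text = field.value or ""
    return text if field.state == "Complete" else text + " (loading)"

print(render_text(StreamState(value=None, state="Pending")))
print(render_text(StreamState(value="Buy AA", state="Incomplete")))
print(render_text(StreamState(value="Buy AAPL", state="Complete")))
```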
BAML
Python
TypeScript
Putting it all together
Let’s put all of these concepts together to design an application that streams a conversation containing stock recommendations, using semantic streaming to ensure that the streamed data obeys our domain’s invariants.
Python
TypeScript
The above BAML code will generate the following Python definitions in the partial module. The use of streaming attributes has several effects on the generated code:
- Recommendation does not have any partial field because it was marked @stream.done.
- The Message.message string is wrapped in StreamState, allowing runtime checking of its completion status. This status could be used to render a spinner as the message streams in.
- The Message.message_type field may not be null, because it was marked as @stream.not_null.
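A rough Python sketch of definitions with these three properties, and of a consumer that relies on them. These are hand-written dataclasses, not the generated code: the Recommendation field names (ticker, action), the StreamState state values, and the describe helper are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar, Union

T = TypeVar("T")

@dataclass
class StreamState(Generic[T]):
    value: T
    state: str  # assumed values: "Pending", "Incomplete", "Complete"

@dataclass
class Recommendation:
    # Marked @stream.done: fields stay required. Field names are invented.
    ticker: str
    action: str

@dataclass
class Message:
    message_type: str                    # @stream.not_null: never None
    message: StreamState[Optional[str]]  # @stream.with_state: wrapped

def describe(item: Union[Message, Recommendation]) -> str:
    """Render one streamed item, marking messages that are still arriving."""
    if isinstance(item, Recommendation):
        return f"recommend {item.action} {item.ticker}"
    suffix = "" if item.message.state == "Complete" else " ..."
    return f"{item.message_type}: {item.message.value or ''}{suffix}"

print(describe(Message("greeting", StreamState("Hi th", "Incomplete"))))
print(describe(Recommendation(ticker="AAPL", action="buy")))
```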