Timeouts help you build resilient applications by preventing requests from hanging indefinitely. BAML provides granular timeout controls at multiple stages of the request lifecycle.
Without timeouts, your application can stall when:
Timeouts let you fail fast and either retry or fall back to alternative clients.
Add timeouts to any client by specifying timeout values in the http block within options:
BAML supports four types of timeouts for individual requests, plus a fifth timeout type for composite clients (fallback, round-robin):
connect_timeout_ms
Maximum time to establish a connection to the LLM provider.
When to use: Detect unreachable endpoints quickly.
time_to_first_token_timeout_ms
Maximum time to receive the first token after sending the request.
When to use: Detect when the provider accepts your request but takes too long to start generating.
This timeout is especially useful for streaming responses where you want to ensure the LLM starts responding quickly, even if the full response takes longer.
idle_timeout_ms
Maximum time between receiving data chunks during streaming.
When to use: Detect stalled connections where the provider stops sending data mid-response.
request_timeout_ms
Maximum total time for the entire request-response cycle.
When to use: Ensure requests complete within your application’s latency requirements.
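To see how the four per-request timeouts relate to each other, here is a small, library-independent Python model. It is not BAML's implementation: the `Timeouts` dataclass, the `first_timeout_hit` function, and the example values are all hypothetical, and the sketch assumes the first-token clock starts once the connection is established.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Timeouts:
    """Illustrative container mirroring the four option names (None = disabled)."""
    connect_timeout_ms: Optional[int] = None
    time_to_first_token_timeout_ms: Optional[int] = None
    idle_timeout_ms: Optional[int] = None
    request_timeout_ms: Optional[int] = None


def first_timeout_hit(
    t: Timeouts, connect_ms: int, chunk_times_ms: Sequence[int]
) -> Optional[str]:
    """Return the name of the timeout that would fire first, or None.

    connect_ms:      when the connection was established, from request start.
    chunk_times_ms:  non-empty, ascending arrival times of each chunk, from
                     request start; the last entry marks the end of the response.
    """
    fired = []  # (time at which the timeout would fire, option name)

    if t.connect_timeout_ms is not None and connect_ms > t.connect_timeout_ms:
        fired.append((t.connect_timeout_ms, "connect_timeout_ms"))

    if t.time_to_first_token_timeout_ms is not None:
        # Assumption: the request is sent as soon as the connection is up.
        deadline = connect_ms + t.time_to_first_token_timeout_ms
        if chunk_times_ms[0] > deadline:
            fired.append((deadline, "time_to_first_token_timeout_ms"))

    if t.idle_timeout_ms is not None:
        # Look for the first gap between consecutive chunks that is too long.
        for prev, cur in zip(chunk_times_ms, chunk_times_ms[1:]):
            if cur - prev > t.idle_timeout_ms:
                fired.append((prev + t.idle_timeout_ms, "idle_timeout_ms"))
                break

    if t.request_timeout_ms is not None and chunk_times_ms[-1] > t.request_timeout_ms:
        fired.append((t.request_timeout_ms, "request_timeout_ms"))

    return min(fired)[1] if fired else None


limits = Timeouts(5_000, 10_000, 15_000, 60_000)  # hypothetical values
print(first_timeout_hit(limits, 200, [1_500, 3_000, 4_500]))
print(first_timeout_hit(limits, 200, [1_000, 2_000, 20_000]))
```

The first call stays within every limit. In the second call the 18-second gap between the last two chunks exceeds the 15-second idle limit, even though the total time is well under the request limit.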
Each retry attempt gets the full timeout duration:
If the first attempt times out at 30 seconds, the retry mechanism kicks in and the next attempt gets a fresh 30-second timeout.
Total time: up to 4 attempts × 30 s = 120 s, plus retry delays, so slightly more than 2 minutes in the worst case.
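The worst-case estimate can be checked with a few lines of arithmetic. This is a minimal sketch: `worst_case_total_ms` is a hypothetical helper, and the retry delays (200, 400, and 800 ms) are invented for illustration, since the actual delays depend on your retry policy.

```python
from typing import Sequence


def worst_case_total_ms(
    request_timeout_ms: int, max_retries: int, retry_delays_ms: Sequence[int]
) -> int:
    """Upper bound on wall-clock time when every attempt times out.

    Each attempt gets the full request timeout, and one delay is
    waited before each retry.
    """
    if len(retry_delays_ms) != max_retries:
        raise ValueError("expected exactly one delay per retry")
    attempts = max_retries + 1  # the initial attempt plus the retries
    return attempts * request_timeout_ms + sum(retry_delays_ms)


# 4 attempts of 30 s each, with hypothetical backoff delays between them.
total = worst_case_total_ms(30_000, max_retries=3, retry_delays_ms=[200, 400, 800])
print(total)  # 121400 ms, i.e. just over 2 minutes
```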
Override timeouts at runtime using the Client Registry:
Timeout errors are raised as BamlTimeoutError, a subclass of BamlClientError. You can catch them specifically:
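Because the timeout error is a subclass, the order of `except` clauses matters: the more specific handler has to come first. The sketch below demonstrates this with local stand-in classes so that it runs without BAML installed. The class definitions, `classify_call`, and the two fake calls are assumptions for illustration only. In a real project you would import the error types from your BAML client rather than define them.

```python
class BamlClientError(Exception):
    """Stand-in for BAML's client error base class (defined here for the sketch)."""


class BamlTimeoutError(BamlClientError):
    """Stand-in for the timeout subclass (defined here for the sketch)."""


def classify_call(call):
    """Run a zero-argument callable and report how it ended."""
    try:
        return ("ok", call())
    except BamlTimeoutError as exc:
        # Must precede the base-class handler, otherwise it is never reached.
        return ("timeout", str(exc))
    except BamlClientError as exc:
        return ("client_error", str(exc))


def fake_timed_out_call():
    # Hypothetical failure used only to exercise the handlers.
    raise BamlTimeoutError("request exceeded 30000 ms")


def fake_rejected_call():
    raise BamlClientError("provider returned an error")


print(classify_call(fake_timed_out_call))
print(classify_call(fake_rejected_call))
```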
For more on error handling, see Error Handling.
For most production applications, we recommend starting with:
For fallback clients with stricter requirements:
Begin with generous timeouts and monitor your application’s performance. Tighten timeouts gradually based on real-world data.
Faster models can use stricter timeouts:
Track how often timeouts occur using BAML Studio or your own observability tools. High timeout rates indicate you should either:
Timeouts and abort controllers serve different purposes:
Use timeouts for resilience and SLAs. Use abort controllers when users explicitly cancel operations.
You can use both together:
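As a generic illustration of combining the two mechanisms, the following sketch uses only Python's `asyncio` and does not call BAML's API. An `asyncio.Event` stands in for a user-driven abort signal, and `run_with_timeout_and_cancel` is a hypothetical helper that reports whether the work finished, was cancelled by the user, or hit the time limit.

```python
import asyncio


async def run_with_timeout_and_cancel(work, timeout_s, cancel_event):
    """Await `work`, stopping early on a timeout or when cancel_event is set."""
    task = asyncio.ensure_future(work)
    cancel_wait = asyncio.ensure_future(cancel_event.wait())

    done, pending = await asyncio.wait(
        {task, cancel_wait},
        timeout=timeout_s,
        return_when=asyncio.FIRST_COMPLETED,
    )

    # Stop whatever is still running and let the cancellations settle.
    for leftover in pending:
        leftover.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    if task in done:
        return ("done", task.result())
    if cancel_wait in done:
        return ("cancelled", None)  # the user asked to stop
    return ("timeout", None)        # the time limit was reached first


async def demo():
    user_cancel = asyncio.Event()
    finished = await run_with_timeout_and_cancel(
        asyncio.sleep(0.01, result="hi"), 0.5, user_cancel
    )
    timed_out = await run_with_timeout_and_cancel(
        asyncio.sleep(1), 0.05, user_cancel
    )
    user_cancel.set()  # simulate the user pressing "stop"
    cancelled = await run_with_timeout_and_cancel(
        asyncio.sleep(1), 0.5, user_cancel
    )
    return finished, timed_out, cancelled


print(asyncio.run(demo()))
```

Keeping the two outcomes distinct lets the caller retry or fall back after a timeout while treating a user cancellation as final.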