Configuring Timeouts

Timeouts help you build resilient applications by preventing requests from hanging indefinitely. BAML provides granular timeout controls at multiple stages of the request lifecycle.

Why Use Timeouts?

Without timeouts, your application can stall when:

  • LLM provider endpoints are unreachable
  • Providers accept requests but take too long to respond
  • Network connections stall mid-stream
  • Long-running requests exceed your application’s latency requirements

Timeouts let you fail fast and either retry or fallback to alternative clients.

Quick Start

Add timeouts to any client by specifying timeout values in the http block within options:

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY

    // Set timeouts (all values in milliseconds)
    http {
      connect_timeout_ms 5000    // 5 seconds to connect
      request_timeout_ms 30000   // 30 seconds total
    }
  }
}

Available Timeout Types

BAML supports four types of timeouts for individual requests, plus a fifth timeout type for composite clients (fallback, round-robin):

connect_timeout_ms

Maximum time to establish a connection to the LLM provider.

When to use: Detect unreachable endpoints quickly.

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      connect_timeout_ms 3000   // Fail if can't connect within 3s
    }
  }
}

time_to_first_token_timeout_ms

Maximum time to receive the first token after sending the request.

When to use: Detect when the provider accepts your request but takes too long to start generating.

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      time_to_first_token_timeout_ms 10000   // First token within 10s
    }
  }
}

This timeout is especially useful for streaming responses where you want to ensure the LLM starts responding quickly, even if the full response takes longer.
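BAML enforces this timeout for you; to make the semantics concrete, here is a plain asyncio sketch (an illustration, not BAML's implementation) in which only the wait for the first chunk is bounded, while the rest of the stream may take as long as it needs:

```python
import asyncio

async def slow_stream():
    # Simulated provider: a 0.2s "thinking" pause, then fast chunks.
    await asyncio.sleep(0.2)
    for chunk in ["Hello", " ", "world"]:
        yield chunk

async def consume_with_first_token_timeout(first_token_timeout_s: float) -> list[str]:
    it = slow_stream().__aiter__()
    # Bound only the wait for the FIRST chunk; later chunks are not limited here.
    first = await asyncio.wait_for(it.__anext__(), timeout=first_token_timeout_s)
    chunks = [first]
    async for chunk in it:
        chunks.append(chunk)
    return chunks

print("".join(asyncio.run(consume_with_first_token_timeout(1.0))))  # budget not exceeded
try:
    asyncio.run(consume_with_first_token_timeout(0.01))             # trips before first token
except asyncio.TimeoutError:
    print("first token timed out")
```

A generous 1-second budget survives the 0.2s pause; a 10ms budget fails fast, which is exactly the behavior you want when a provider accepts the request but never starts generating.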

idle_timeout_ms

Maximum time between receiving data chunks during streaming.

When to use: Detect stalled connections where the provider stops sending data mid-response.

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      idle_timeout_ms 15000   // No more than 15s between chunks
    }
  }
}
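The key difference from the other timeouts is that the idle clock resets on every chunk. As an illustration of that semantics (again a plain asyncio sketch, not BAML's implementation):

```python
import asyncio

async def stalling_stream():
    # Simulated provider: one chunk arrives quickly, then the stream stalls for 0.3s.
    yield "partial "
    await asyncio.sleep(0.3)
    yield "late chunk"

async def consume_with_idle_timeout(idle_timeout_s: float) -> list[str]:
    it = stalling_stream().__aiter__()
    chunks: list[str] = []
    while True:
        try:
            # The clock resets on every chunk: only the GAP between chunks is bounded.
            chunk = await asyncio.wait_for(it.__anext__(), timeout=idle_timeout_s)
        except StopAsyncIteration:
            return chunks
        chunks.append(chunk)

print(asyncio.run(consume_with_idle_timeout(1.0)))   # both chunks arrive
try:
    asyncio.run(consume_with_idle_timeout(0.05))     # trips during the 0.3s stall
except asyncio.TimeoutError:
    print("stream went idle past the budget")
```

A total-request timeout would let a stalled stream hang until the overall budget expires; the idle timeout catches the stall as soon as one inter-chunk gap exceeds the limit.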

request_timeout_ms

Maximum total time for the entire request-response cycle.

When to use: Ensure requests complete within your application’s latency requirements.

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000   // Complete within 60s total
    }
  }
}

Timeouts with Retry Policies

Each retry attempt gets the full timeout duration:

retry_policy Aggressive {
  max_retries 3
  strategy {
    type exponential_backoff
  }
}

client<llm> MyClient {
  provider openai
  retry_policy Aggressive
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 30000   // 30s per attempt, not cumulative across retries
    }
  }
}

If the first attempt times out at 30 seconds, the retry mechanism kicks in and the next attempt gets a fresh 30-second timeout.

Total time: Up to 4 attempts × 30s + retry delays = ~2+ minutes
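That worst-case arithmetic can be sketched in a few lines. The 200ms base delay and 2x multiplier below are illustrative assumptions, not BAML's documented backoff defaults:

```python
def worst_case_total_ms(max_retries: int,
                        request_timeout_ms: int,
                        base_delay_ms: int = 200,   # assumed backoff base
                        multiplier: float = 2.0) -> float:  # assumed multiplier
    # max_retries retries means max_retries + 1 attempts in total.
    attempts = max_retries + 1
    # Each attempt may burn its full per-attempt timeout...
    timeout_budget = attempts * request_timeout_ms
    # ...plus an exponential backoff delay before each retry.
    delays = sum(base_delay_ms * multiplier ** i for i in range(max_retries))
    return timeout_budget + delays

# 4 attempts x 30s + (200 + 400 + 800)ms of backoff = 121,400ms worst case
print(worst_case_total_ms(3, 30_000))
```

Running this kind of calculation against your own retry policy tells you the true upper bound your callers can observe, which is often much larger than the per-attempt timeout suggests.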

Runtime Timeout Overrides

Timeouts can be overridden at runtime using the Client Registry: register a client with different http timeout values and set it as the primary client before calling your function.
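As a sketch using the Python Client Registry (the client name QuickTimeout and the exact option values are illustrative; the nested "http" shape mirrors the BAML config above):

```python
import os
from baml_py import ClientRegistry
from baml_client import b

cr = ClientRegistry()
# Register a client with tighter timeouts for this call path only.
cr.add_llm_client(name="QuickTimeout", provider="openai", options={
    "model": "gpt-4",
    "api_key": os.environ["OPENAI_API_KEY"],
    "http": {
        "connect_timeout_ms": 3000,
        "request_timeout_ms": 10000,
    },
})
cr.set_primary("QuickTimeout")

result = await b.ExtractData(input, baml_options={"client_registry": cr})
```

This leaves the statically configured clients untouched; only calls that pass this registry pick up the stricter timeouts.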

Handling Timeout Errors

Timeout errors are raised as BamlTimeoutError, a subclass of BamlClientError, so you can catch them specifically:

from baml_client import b
from baml_py.errors import BamlTimeoutError, BamlClientError

try:
    result = await b.ExtractData(input)
except BamlTimeoutError as e:
    # Handle timeout specifically
    print(f"Request timed out: {e.message}")
    print(f"Timeout type: {e.timeout_type}")
    print(f"Configured: {e.configured_value_ms}ms, Elapsed: {e.elapsed_ms}ms")
except BamlClientError as e:
    # Handle other client errors
    print(f"Client error: {e.message}")

For more on error handling, see Error Handling.

Recommended Starting Points

For most production applications, we recommend starting with:

client<llm> ProductionClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY

    http {
      connect_timeout_ms 10000               // 10s to connect
      time_to_first_token_timeout_ms 30000   // 30s to first token
      idle_timeout_ms 2000                   // 2s between chunks
      request_timeout_ms 300000              // 5 minutes total
    }
  }
}

For fallback clients with stricter requirements:

client<llm> FallbackClient {
  provider fallback
  options {
    strategy [Primary, Secondary, Tertiary]

    http {
      connect_timeout_ms 5000                // Faster failover
      time_to_first_token_timeout_ms 15000
      idle_timeout_ms 2000
      request_timeout_ms 120000              // 2 min per attempt
    }
  }
}

Tips and Best Practices

Start Conservative, Then Optimize

Begin with generous timeouts and monitor your application’s performance. Tighten timeouts gradually based on real-world data.
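One way to turn that real-world data into numbers is to base each timeout on an observed latency percentile plus headroom. A minimal sketch, assuming nearest-rank percentiles and a 1.5x headroom factor (both illustrative choices, not BAML recommendations):

```python
import math

def suggest_timeout_ms(latencies_ms: list[float],
                       percentile: float = 0.99,
                       headroom: float = 1.5) -> int:
    # Pick the given percentile of observed latencies (nearest-rank method)...
    ranked = sorted(latencies_ms)
    idx = max(0, math.ceil(percentile * len(ranked)) - 1)
    # ...and add headroom so normal tail latency doesn't trip the timeout.
    return int(ranked[idx] * headroom)

# 100 observations: most around 1s, with a slow tail approaching 4s
observed = [1000 + 30 * i for i in range(100)]
print(suggest_timeout_ms(observed))  # p99 = 3940ms, *1.5 headroom = 5910
```

Re-running this periodically against fresh latency data keeps the timeout tight without tripping on ordinary tail latency.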

Different Timeouts for Different Models

Faster models can use stricter timeouts:

client<llm> FastTurbo {
  provider openai
  options {
    model "gpt-3.5-turbo"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 15000   // Turbo is fast
    }
  }
}

client<llm> SlowButSmart {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000   // GPT-4 needs more time
    }
  }
}

Monitor Timeout Rates

Track how often timeouts occur using BAML Studio or your own observability tools. High timeout rates indicate you should either:

  • Increase timeout values
  • Use faster models
  • Optimize your prompts
  • Add more fallback clients

Timeouts vs Abort Controllers

Timeouts and abort controllers serve different purposes:

  • Timeouts: Automatic, configuration-based time limits
  • Abort controllers: Manual, user-initiated cancellation

Use timeouts for resilience and SLAs. Use abort controllers when users explicitly cancel operations.

You can use both together:

const controller = new AbortController()

// User clicks "cancel" button
button.onclick = () => controller.abort()

try {
  const result = await b.ExtractData(input, {
    abortController: controller
    // Client still has its configured timeouts
  })
} catch (e) {
  if (e instanceof BamlAbortError) {
    console.log('User cancelled')
  } else if (e instanceof BamlTimeoutError) {
    console.log('Request timed out')
  }
}