Configuring Timeouts | Boundary Documentation

Timeouts help you build resilient applications by preventing requests from hanging indefinitely. BAML provides granular timeout controls at multiple stages of the request lifecycle.

Why Use Timeouts?

Without timeouts, your application can stall when:

LLM provider endpoints are unreachable
Providers accept requests but take too long to respond
Network connections stall mid-stream
Long-running requests exceed your application’s latency requirements

Timeouts let you fail fast and then either retry or fall back to an alternative client.

Quick Start

Add timeouts to any client by specifying timeout values in the http block within options:

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY

    // Set timeouts (all values in milliseconds)
    http {
      connect_timeout_ms 5000      // 5 seconds to connect
      request_timeout_ms 30000     // 30 seconds total
    }
  }
}

Available Timeout Types

BAML supports four types of timeouts for individual requests, plus a fifth timeout type for composite clients (fallback, round-robin):

`connect_timeout_ms`

Maximum time to establish a connection to the LLM provider.

When to use: Detect unreachable endpoints quickly.

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      connect_timeout_ms 3000  // Fail if can't connect within 3s
    }
  }
}

`time_to_first_token_timeout_ms`

Maximum time to receive the first token after sending the request.

When to use: Detect when the provider accepts your request but takes too long to start generating.

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      time_to_first_token_timeout_ms 10000  // First token within 10s
    }
  }
}

This timeout is especially useful for streaming responses where you want to ensure the LLM starts responding quickly, even if the full response takes longer.

`idle_timeout_ms`

Maximum time between receiving data chunks during streaming.

When to use: Detect stalled connections where the provider stops sending data mid-response.

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      idle_timeout_ms 15000  // No more than 15s between chunks
    }
  }
}

`request_timeout_ms`

Maximum total time for the entire request-response cycle.

When to use: Ensure requests complete within your application’s latency requirements.

client<llm> MyClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000  // Complete within 60s total
    }
  }
}
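
To make the relationship between the four per-request timeouts concrete, here is a minimal Python sketch that models which limit would be hit first for a given request timeline. It is only an illustration of the descriptions above, not BAML's implementation. The function name `first_timeout` and its parameters are hypothetical, and it assumes the time-to-first-token clock starts once the connection is established.

```python
from typing import Optional, Sequence


def first_timeout(
    connected_at_ms: int,
    chunk_times_ms: Sequence[int],
    finished_at_ms: int,
    *,
    connect_timeout_ms: Optional[int] = None,
    time_to_first_token_timeout_ms: Optional[int] = None,
    idle_timeout_ms: Optional[int] = None,
    request_timeout_ms: Optional[int] = None,
) -> Optional[str]:
    """Return the name of the timeout that would be hit first, or None.

    All timestamps are milliseconds since the request started (t = 0).
    """
    fired = []  # (time at which the limit is hit, timeout name)

    # Connection phase: measured from the start of the request.
    if connect_timeout_ms is not None and connected_at_ms > connect_timeout_ms:
        fired.append((connect_timeout_ms, "connect_timeout_ms"))

    # First token: assumed to be measured from the moment we are connected.
    if time_to_first_token_timeout_ms is not None:
        first_data_ms = chunk_times_ms[0] if chunk_times_ms else finished_at_ms
        deadline = connected_at_ms + time_to_first_token_timeout_ms
        if first_data_ms > deadline:
            fired.append((deadline, "time_to_first_token_timeout_ms"))

    # Idle: the gap between two consecutive chunks.
    if idle_timeout_ms is not None:
        for prev, cur in zip(chunk_times_ms, chunk_times_ms[1:]):
            if cur - prev > idle_timeout_ms:
                fired.append((prev + idle_timeout_ms, "idle_timeout_ms"))
                break

    # Total: the whole request-response cycle.
    if request_timeout_ms is not None and finished_at_ms > request_timeout_ms:
        fired.append((request_timeout_ms, "request_timeout_ms"))

    return min(fired)[1] if fired else None


# A stream that stalls for ~19s after the second chunk.
print(first_timeout(100, [500, 900, 20000], 20500,
                    idle_timeout_ms=15000, request_timeout_ms=60000))
# idle_timeout_ms
```

In this timeline the total request time (20.5s) stays well under `request_timeout_ms`, so only `idle_timeout_ms` catches the stall.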

Timeouts with Retry Policies

Each retry attempt gets the full timeout duration:

retry_policy Aggressive {
  max_retries 3
  strategy {
    type exponential_backoff
  }
}

client<llm> MyClient {
  provider openai
  retry_policy Aggressive
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 30000  // 30s for each attempt (initial call and every retry)
    }
  }
}

If the first attempt times out at 30 seconds, the retry mechanism kicks in and the next attempt gets a fresh 30-second timeout.

Total time: up to 4 attempts (1 initial + 3 retries) × 30s = 120s, plus retry delays, so slightly over 2 minutes in the worst case.
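
The worst-case arithmetic can be sketched in Python as follows. The helper `worst_case_total_ms` is hypothetical, and the backoff parameters (`delay_ms`, `multiplier`, `max_delay_ms`) are illustrative assumptions, since the policy above does not set them; substitute the values your retry policy actually uses.

```python
def worst_case_total_ms(
    request_timeout_ms: int,
    max_retries: int,
    delay_ms: int = 200,         # assumed initial backoff delay
    multiplier: float = 1.5,     # assumed backoff growth factor
    max_delay_ms: int = 10_000,  # assumed cap on a single delay
) -> int:
    """Upper bound on wall-clock time if every attempt times out."""
    attempts = max_retries + 1  # the initial call plus each retry
    total = float(attempts * request_timeout_ms)

    # One backoff delay precedes each retry.
    delay = float(delay_ms)
    for _ in range(max_retries):
        total += min(delay, max_delay_ms)
        delay *= multiplier

    return int(total)


print(worst_case_total_ms(request_timeout_ms=30_000, max_retries=3))
# 120950
```

With these assumed delays (200ms, 300ms, 450ms), the backoff adds under a second to the 120 seconds spent waiting on the four attempts.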

Runtime Timeout Overrides

Override timeouts at runtime using the Client Registry.
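
As a rough sketch, one way to prepare such an override in Python is to build the client options with the desired `http` timeouts and hand them to a registry. The helper `with_timeouts` is hypothetical. The commented-out registry calls assume that `baml_py`'s `ClientRegistry` accepts the same `http` block as the BAML `options` shown earlier, and `StrictClient` is a made-up client name.

```python
from copy import deepcopy
from typing import Any, Dict


def with_timeouts(options: Dict[str, Any], **timeouts_ms: int) -> Dict[str, Any]:
    """Return a copy of `options` whose `http` block includes the given timeouts."""
    merged = deepcopy(options)  # leave the caller's options untouched
    merged.setdefault("http", {}).update(timeouts_ms)
    return merged


base = {"model": "gpt-4", "http": {"connect_timeout_ms": 5000}}
strict = with_timeouts(base, request_timeout_ms=10_000)
print(strict["http"])

# Assumed usage with the Client Registry (not executed here):
#   from baml_py import ClientRegistry
#   cr = ClientRegistry()
#   cr.add_llm_client(name="StrictClient", provider="openai", options=strict)
#   cr.set_primary("StrictClient")
#   result = await b.ExtractData(text, baml_options={"client_registry": cr})
```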

Handling Timeout Errors

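A timeout raises BamlTimeoutError, a subclass of BamlClientError, so you can catch timeouts specifically. Catch BamlTimeoutError before BamlClientError, since the broader handler would otherwise match first: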

from baml_client import b
from baml_py.errors import BamlTimeoutError, BamlClientError


# `await` must run inside a coroutine, so the call is wrapped in an async function
async def extract(text):
    try:
        return await b.ExtractData(text)
    except BamlTimeoutError as e:
        # Handle timeout specifically
        print(f"Request timed out: {e.message}")
        print(f"Timeout type: {e.timeout_type}")
        print(f"Configured: {e.configured_value_ms}ms, Elapsed: {e.elapsed_ms}ms")
    except BamlClientError as e:
        # Handle other client errors
        print(f"Client error: {e.message}")

For more on error handling, see Error Handling.

Recommended Production Timeouts

For most production applications, we recommend starting with:

client<llm> ProductionClient {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY

    http {
      connect_timeout_ms 10000                // 10s to connect
      time_to_first_token_timeout_ms 30000    // 30s to first token
      idle_timeout_ms 2000                    // 2s between chunks
      request_timeout_ms 300000               // 5 minutes total
    }
  }
}

For fallback clients with stricter requirements:

client<llm> FallbackClient {
  provider fallback
  options {
    strategy [Primary, Secondary, Tertiary]

    http {
      connect_timeout_ms 5000                 // Faster failover
      time_to_first_token_timeout_ms 15000
      idle_timeout_ms 2000
      request_timeout_ms 120000               // 2 min per attempt
    }
  }
}

Tips and Best Practices

Start Conservative, Then Optimize

Begin with generous timeouts and monitor your application’s performance. Tighten timeouts gradually based on real-world data.
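
One way to turn observed latencies into a tighter timeout is to take a high percentile and add headroom. The Python sketch below assumes you already collect per-request latencies in milliseconds. The helper `suggest_timeout_ms`, the 99th percentile, and the 1.5× headroom factor are illustrative choices, not BAML recommendations.

```python
from typing import Sequence


def suggest_timeout_ms(
    latencies_ms: Sequence[int],
    percentile: int = 99,
    headroom: float = 1.5,
) -> int:
    """Suggest a timeout: the nearest-rank percentile latency times a headroom factor."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")

    ordered = sorted(latencies_ms)
    # Nearest-rank method: rank = ceil(percentile * n / 100), using integer math.
    rank = (percentile * len(ordered) + 99) // 100
    return round(ordered[rank - 1] * headroom)


# 100 samples spread evenly from 100ms to 10,000ms.
samples = [100 * i for i in range(1, 101)]
print(suggest_timeout_ms(samples))
# 14850
```

Here the 99th-percentile latency is 9,900ms, so the suggested `request_timeout_ms` is 14,850ms. Re-run the calculation as traffic patterns change rather than treating the result as fixed.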

Different Timeouts for Different Models

Faster models can use stricter timeouts:

client<llm> FastTurbo {
  provider openai
  options {
    model "gpt-3.5-turbo"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 15000  // Turbo is fast
    }
  }
}

client<llm> SlowButSmart {
  provider openai
  options {
    model "gpt-4"
    api_key env.OPENAI_API_KEY
    http {
      request_timeout_ms 60000  // GPT-4 needs more time
    }
  }
}

Monitor Timeout Rates

Track how often timeouts occur using BAML Studio or your own observability tools. A high timeout rate suggests you should do one or more of the following:

Increase timeout values
Use faster models
Optimize your prompts
Add more fallback clients
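
If you track timeout rates yourself, a small in-process counter is enough to get started. The Python sketch below is a hypothetical helper, not part of BAML. It assumes you call `record_timeout` with the `timeout_type` reported by a BamlTimeoutError and `record_success` otherwise.

```python
from collections import Counter


class TimeoutRateTracker:
    """Counts requests and timeouts, broken down by timeout type."""

    def __init__(self) -> None:
        self.total = 0
        self.timeouts = Counter()

    def record_success(self) -> None:
        self.total += 1

    def record_timeout(self, timeout_type: str) -> None:
        self.total += 1
        self.timeouts[timeout_type] += 1

    def rate(self) -> float:
        """Fraction of recorded requests that timed out (0.0 when nothing is recorded)."""
        if self.total == 0:
            return 0.0
        return sum(self.timeouts.values()) / self.total


tracker = TimeoutRateTracker()
for _ in range(8):
    tracker.record_success()
tracker.record_timeout("idle_timeout_ms")
tracker.record_timeout("request_timeout_ms")
print(tracker.rate())
# 0.2
```

The per-type breakdown in `tracker.timeouts` shows which limit is being hit, which helps decide between raising a specific timeout and switching models or adding fallbacks.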

Timeouts vs Abort Controllers

Timeouts and abort controllers serve different purposes:

Timeouts: Automatic, configuration-based time limits
Abort controllers: Manual, user-initiated cancellation

Use timeouts for resilience and SLAs. Use abort controllers when users explicitly cancel operations.

You can use both together:

const controller = new AbortController()

// User clicks "cancel" button
button.onclick = () => controller.abort()

try {
  const result = await b.ExtractData(input, {
    abortController: controller
    // Client still has its configured timeouts
  })
} catch (e) {
  if (e instanceof BamlAbortError) {
    console.log('User cancelled')
  } else if (e instanceof BamlTimeoutError) {
    console.log('Request timed out')
  }
}