Many LLM requests fail due to transient errors. A retry policy lets you configure how many times, and with what strategy, the client should retry a failed operation before giving up.

Syntax

retry_policy PolicyName {
    max_retries int
    strategy {
      type constant_delay
      delay_ms int? // defaults to 200
    } | {
      type exponential_backoff
      delay_ms int? // defaults to 200
      max_delay_ms int? // defaults to 10000
      multiplier float? // defaults to 1.5
    }
}

Properties

| Name | Description | Required |
| --- | --- | --- |
| `max_retries` | The maximum number of times the client should retry a failed operation. | YES |
| `strategy` | The strategy to use for retrying failed operations. | NO, defaults to `constant_delay` (200ms) |

The parameters for each retry strategy are listed in the syntax block above.
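For instance, a policy using the `constant_delay` strategy might look like this (the policy name and delay value here are illustrative, not prescribed):

```baml
// Retry up to 3 times, waiting a fixed 500ms between attempts.
retry_policy QuickRetry {
  max_retries 3
  strategy {
    type constant_delay
    delay_ms 500 // omit to use the default of 200ms
  }
}
```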

Conditions for retrying

If the client encounters a transient error, it will retry the operation. The following errors are considered transient:

| Name | Error Code | Retry |
| --- | --- | --- |
| BAD_REQUEST | 400 | NO |
| UNAUTHORIZED | 401 | NO |
| FORBIDDEN | 403 | NO |
| NOT_FOUND | 404 | NO |
| RATE_LIMITED | 429 | YES |
| INTERNAL_ERROR | 500 | YES |
| SERVICE_UNAVAILABLE | 503 | YES |
| UNKNOWN | 1 | YES |

The UNKNOWN error code is used when the client encounters an error not listed above. Such errors are usually temporary, but that is not guaranteed.
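To make the `exponential_backoff` parameters concrete, here is a sketch of a policy with explicit values. The schedule in the comments assumes the common formula `delay = delay_ms * multiplier^attempt`, capped at `max_delay_ms`; the exact formula is not specified in this section, so treat it as an approximation:

```baml
retry_policy SlowBackoff {
  max_retries 3
  strategy {
    type exponential_backoff
    delay_ms 300      // initial delay
    multiplier 2.0    // each retry waits twice as long
    max_delay_ms 5000 // delays are capped at 5s
  }
  // Assuming delay = delay_ms * multiplier^attempt, the three
  // retries would wait roughly 300ms, 600ms, and 1200ms.
}
```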

Example

Each client may have its own retry policy, or none at all. You can also reuse the same retry policy across multiple clients.

// in a .baml file

retry_policy MyRetryPolicy {
  max_retries 5
  strategy {
    type exponential_backoff
  }
}

// A client that uses the OpenAI chat API.
client<llm> MyGPT35Client {
  provider baml-openai-chat
  // Set the retry policy to the MyRetryPolicy defined above.
  // Any impl that uses this client will retry failed operations.
  retry_policy MyRetryPolicy
  options {
    model gpt-3.5-turbo
    api_key env.OPENAI_API_KEY
  }
}