Prompt Caching / Message Role Metadata

Recall that an LLM request usually looks something like the example below, where each message can carry provider-specific metadata. In Anthropic's case, that metadata is the cache_control key.

Anthropic Request
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "<the entire contents of Pride and Prejudice>",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      },
      {
        "role": "user",
        "content": "Analyze the major themes in Pride and Prejudice."
      }
    ]
  }'

This is nearly the same as the BAML code below (the pre-caching version of the function we'll finish in a moment), minus the cache_control metadata and the beta header:
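main.baml
function AnalyzeBook(book: string) -> string {
  client<llm> AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user") }}
    Analyze the major themes in Pride and Prejudice.
  "#
}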

Let's add the cache_control metadata to our messages in BAML now. There are just two steps:

1. Allow role metadata and headers in the client definition

main.baml
client<llm> AnthropicClient {
  provider "anthropic"
  options {
    model "claude-3-5-sonnet-20241022"
    api_key env.ANTHROPIC_API_KEY
    allowed_role_metadata ["cache_control"]
    headers {
      "anthropic-beta" "prompt-caching-2024-07-31"
    }
  }
}

2. Add the metadata to the messages

main.baml
function AnalyzeBook(book: string) -> string {
  client<llm> AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user", cache_control={"type": "ephemeral"}) }}
    Analyze the major themes in Pride and Prejudice.
  "#
}
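
To exercise the function from the VSCode Playground, you can add a BAML test block. This is a minimal sketch: the test name and the shortened book text are placeholders, and Anthropic only caches prompts above a minimum token count, so a snippet this short won't actually be cached.

main.baml
test AnalyzeBookTest {
  functions [AnalyzeBook]
  args {
    book "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife. ..."
  }
}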

We have the allowed_role_metadata whitelist so that if you swap to another LLM client, we don't accidentally forward provider-specific metadata to the new provider's API.
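
For example, if you later pointed the same function at a different client like the hypothetical OpenAI one below (the client name and model are illustrative), the cache_control metadata would simply be dropped rather than forwarded, because it isn't listed in that client's allowed_role_metadata:

main.baml
client<llm> OpenAIClient {
  provider "openai"
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
    // no allowed_role_metadata here, so cache_control from the prompt
    // is not forwarded to the OpenAI API
  }
}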

Remember to check the “raw curl” checkbox in the VSCode Playground to see the exact request being sent!