Prompt Caching / Message Role Metadata

Recall that an LLM request usually looks something like the example below, where each message can carry provider-specific metadata. In Anthropic's case, that metadata is the cache_control key.

Anthropic Request
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "<the entire contents of Pride and Prejudice>",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      },
      {
        "role": "user",
        "content": "Analyze the major themes in Pride and Prejudice."
      }
    ]
  }'

This is nearly the same as the BAML code below (the pre-caching version of the function we'll finish in a moment), minus the cache_control metadata and the beta header:
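main.baml
function AnalyzeBook(book: string) -> string {
  client<llm> AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user") }}
    Analyze the major themes in Pride and Prejudice.
  "#
}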

Let's add the cache_control metadata to our messages in BAML now. There are just two steps:

1. Allow role metadata and headers in the client definition

main.baml
client<llm> AnthropicClient {
  provider "anthropic"
  options {
    model "claude-3-5-sonnet-20241022"
    api_key env.ANTHROPIC_API_KEY
    allowed_role_metadata ["cache_control"]
    headers {
      "anthropic-beta" "prompt-caching-2024-07-31"
    }
  }
}

2. Add the metadata to the messages

main.baml
function AnalyzeBook(book: string) -> string {
  client<llm> AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user", cache_control={"type": "ephemeral"}) }}
    Analyze the major themes in Pride and Prejudice.
  "#
}
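
To exercise the function from the VSCode Playground, you can add a BAML test block. This is a minimal sketch: the test name and the shortened book text are placeholders, and Anthropic only caches prompts above a minimum token count, so a snippet this short won't actually be cached.

main.baml
test AnalyzeBookTest {
  functions [AnalyzeBook]
  args {
    book "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife. ..."
  }
}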

We have the allowed_role_metadata whitelist so that if you swap to another LLM client, we don't accidentally forward provider-specific metadata to the new provider's API.
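
For example, if you later pointed the same function at a different client like the hypothetical OpenAI one below (the client name and model are illustrative), the cache_control metadata would simply be dropped rather than forwarded, because it isn't listed in that client's allowed_role_metadata:

main.baml
client<llm> OpenAIClient {
  provider "openai"
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
    // no allowed_role_metadata here, so cache_control from the prompt
    // is not forwarded to the OpenAI API
  }
}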

Remember to check the “raw curl” checkbox in the VSCode Playground to see the exact request being sent!