Prompt Caching / Message Role Metadata

Recall that an LLM request usually looks like the example below, where each message can carry provider-specific metadata. In this case, Anthropic uses a cache_control key to mark content for prompt caching.

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: prompt-caching-2024-07-31" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "<the entire contents of Pride and Prejudice>",
            "cache_control": {"type": "ephemeral"}
          }
        ]
      },
      {
        "role": "user",
        "content": "Analyze the major themes in Pride and Prejudice."
      }
    ]
  }'

The request above is nearly the same as the following BAML code, minus the cache_control metadata:
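
function AnalyzeBook(book: string) -> string {
  client<llm> AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user") }}
    Analyze the major themes in Pride and Prejudice.
  "#
}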

Let’s add the cache_control metadata to our messages in BAML now. There are just two steps:

1. Allow role metadata and headers in the client definition

client<llm> AnthropicClient {
  provider "anthropic"
  options {
    model "claude-3-5-sonnet-20241022"
    allowed_role_metadata ["cache_control"]
    headers {
      "anthropic-beta" "prompt-caching-2024-07-31"
    }
  }
}
2. Add the metadata to the messages

function AnalyzeBook(book: string) -> string {
  client<llm> AnthropicClient
  prompt #"
    {{ _.role("user") }}
    {{ book }}
    {{ _.role("user", cache_control={"type": "ephemeral"}) }}
    Analyze the major themes in Pride and Prejudice.
  "#
}

We require the “allowed_role_metadata” list so that if you swap to a different LLM client, we don’t accidentally forward metadata that the new provider’s API doesn’t understand.
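
For example, here is a minimal sketch (the OpenAIClient name and gpt-4o model are placeholders, not part of this guide) of a client that omits allowed_role_metadata; running the same prompt against it leaves the cache_control metadata out of the request instead of forwarding it to the provider:

client<llm> OpenAIClient {
  provider "openai"
  options {
    model "gpt-4o"
    // No allowed_role_metadata here, so the cache_control metadata
    // set in the prompt is filtered out rather than sent to OpenAI.
  }
}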

Remember to check the “raw curl” checkbox in the VSCode Playground to see the exact request being sent!