Token Optimization

Experimenting with Data Formats for Token Efficiency

When working with LLMs, token usage directly impacts both cost and latency. Different serialization formats can affect how many tokens are used to represent your data—but the optimal format depends on your specific use case and LLM.

Important: Test Before Adopting

Every optimization has trade-offs. Reducing token count doesn’t automatically improve accuracy, and different LLMs may respond differently to different formats. You should:

  • Test with your actual data and prompts
  • Measure accuracy alongside token savings
  • Compare multiple formats (JSON, YAML, TOON, or custom)
  • Validate with your specific LLM

What works for one use case may not work for another.

Available Format Options

BAML’s format filter lets you experiment with different serializations:

1{{ data|format(type="json") }} {# Standard JSON #}
2{{ data|format(type="yaml") }} {# YAML format #}
3{{ data|format(type="toon") }} {# TOON format #}

TOON Format

TOON (Token-Oriented Object Notation) is a compact format that uses:

  • Indentation-based structure (like YAML)
  • Tabular format for arrays of objects (declare keys once, stream rows)
  • Minimal punctuation
  • Explicit array lengths and field headers (LLM-friendly guardrails)

What TOON is good for:

  • Uniform arrays of objects with many similar items
  • Data that’s already highly structured and tabular
  • When LLM validation of structure matters (explicit lengths help)

When TOON may NOT help:

  • Deeply nested or non-uniform structures (JSON-compact often uses fewer tokens)
  • Semi-uniform arrays (~40-60% tabular eligibility) where savings diminish
  • Pure flat tables (CSV is more compact)
  • Latency-critical applications (benchmark on your setup - some models process compact JSON faster despite higher token count)

Learn more: TOON specification and benchmarks

Understanding Data Structure

Tabular Eligibility

TOON’s efficiency comes from its tabular format for arrays. Your data’s “tabular eligibility” affects how much TOON can help:

  • High eligibility (80-100%): Mostly uniform arrays of objects with the same fields → TOON excels
  • Medium eligibility (40-60%): Mix of uniform and non-uniform data → savings diminish, may not be worth it
  • Low eligibility (0-20%): Deeply nested, varied structures → JSON-compact may use fewer tokens

Example - High eligibility:

1{ "users": [
2 {"id": 1, "name": "Alice", "role": "admin"},
3 {"id": 2, "name": "Bob", "role": "user"}
4]}

All users have identical fields → 100% tabular → TOON helps
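
For comparison, the same data in TOON declares the keys once and streams rows, following the tabular layout TOON uses for keyed arrays (the exact output of the format filter may differ slightly):

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Two object entries collapse to two data rows plus one header line, which is where the savings come from on larger arrays.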

Example - Low eligibility:

1{ "config": {
2 "server": { "host": "localhost", "port": 8080 },
3 "database": { "name": "prod", "pool": { "min": 5, "max": 20 }}
4}}

Deeply nested with no arrays of uniform objects → 0% tabular → JSON-compact likely better
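
For reference, "JSON-compact" here means minified JSON with no extra whitespace; for this structure it is a single dense line:

{"config":{"server":{"host":"localhost","port":8080},"database":{"name":"prod","pool":{"min":5,"max":20}}}}

There is no repeated key structure for TOON's tabular header to factor out, so the minified form is already close to minimal.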

Considerations for Format Selection

When to Experiment with Compact Formats

  • Passing large datasets with high tabular eligibility
  • Token costs are significant
  • After you’ve validated accuracy with standard formats

When to Stick with Standard Formats

  • Starting a new project (establish baseline first)
  • Your LLM performs poorly with alternative formats
  • Data structure is very small or deeply nested
  • Team familiarity matters more than token cost

Experimentation Example

Here’s how you might test different formats for a product analysis task:

Baseline: Using JSON

class Product {
  id int
  name string
  price float
  in_stock bool
}

function AnalyzeProducts(products: Product[]) -> string {
  client GPT4
  prompt #"
    Analyze these products and provide insights:

    {{ products }}

    Focus on pricing trends and inventory status.
  "#
}

When you pass products to this function, they’re serialized as JSON:

[
  { "id": 1, "name": "Widget", "price": 9.99, "in_stock": true },
  { "id": 2, "name": "Gadget", "price": 19.99, "in_stock": false }
]

Experiment: Trying TOON

To test whether TOON works for your use case, apply the format filter with type="toon":

function AnalyzeProducts(products: Product[]) -> string {
  client GPT4
  prompt #"
    Analyze these products and provide insights:

    {{ products|format(type="toon") }}

    Focus on pricing trends and inventory status.
  "#
}

The same data serialized as TOON:

[2]{id,name,price,in_stock}:
1,Widget,9.99,true
2,Gadget,19.99,false

Next steps: Test with your actual prompts and measure both token usage and accuracy.

TOON Options

Custom Indentation

Control spacing for better readability:

1{{ products|format(type="toon", indent=4) }}

Alternative Delimiters

Choose the delimiter that works best for your data:

{# Comma-separated (default) #}
{{ products|format(type="toon", delimiter="comma") }}

{# Tab-separated #}
{{ products|format(type="toon", delimiter="tab") }}

{# Pipe-separated #}
{{ products|format(type="toon", delimiter="pipe") }}

Delimiter trade-offs:

  • Tab (\t): Often tokenizes more efficiently than commas; tabs rarely appear in data (less quote-escaping needed); but some editors/terminals may display tabs inconsistently
  • Pipe (|): Middle ground between comma and tab; explicit visual separator
  • Comma (default): Most familiar, but may require more quoting if your data contains commas

Test different delimiters with your actual data - the best choice depends on your content.
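
For example, with delimiter="pipe" the product data from earlier would be serialized roughly as follows (the chosen delimiter is used in both the field header and the rows):

[2]{id|name|price|in_stock}:
1|Widget|9.99|true
2|Gadget|19.99|false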

Length Markers

Add length indicators for clarity:

1{{ products|format(type="toon", length_marker="#") }}

Output:

[#2]{id,name,price,in_stock}:
1,Widget,9.99,true
2,Gadget,19.99,false

Real-World Use Case: Transaction Analysis

Here’s a complete example analyzing financial transactions:

class Transaction {
  id string
  date string
  amount float
  category string
  merchant string
  status string
}

function AnalyzeTransactions(
  transactions: Transaction[],
  question: string
) -> string {
  client GPT4
  prompt #"
    {{ _.role("system") }}
    You are a financial analyst. Answer questions about transaction data.

    {{ _.role("user") }}
    Transaction data:
    {{ transactions|format(type="toon", delimiter="pipe") }}

    Question: {{ question }}
  "#
}

test AnalyzeSpending {
  functions [AnalyzeTransactions]
  args {
    transactions [
      {
        id: "tx_001",
        date: "2025-01-15",
        amount: 45.99,
        category: "Dining",
        merchant: "Coffee Shop",
        status: "completed"
      },
      {
        id: "tx_002",
        date: "2025-01-16",
        amount: 120.00,
        category: "Shopping",
        merchant: "Electronics Store",
        status: "completed"
      }
    ]
    question "What's my largest expense category this week?"
  }
}
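
With this setup, the two test transactions reach the model serialized roughly as follows (exact number formatting and quoting may differ):

[2]{id|date|amount|category|merchant|status}:
tx_001|2025-01-15|45.99|Dining|Coffee Shop|completed
tx_002|2025-01-16|120|Shopping|Electronics Store|completed

The pipe delimiter is a clear visual separator for these string-heavy rows, and the explicit [2] length gives the model a simple consistency check against truncated data.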

Understanding Trade-offs

Token reduction is only valuable if accuracy and reliability are maintained. Consider:

What You Might Gain

  • Lower token costs per API call
  • Ability to fit more data in context windows
  • Validation benefits (TOON’s explicit array lengths can help LLMs detect truncated data)

What You Might Lose

  • LLM comprehension (benchmark results show format performance varies by model and dataset type)
  • Latency (some models may process compact JSON faster despite higher token count - measure TTFT and total time)
  • Debugging ease (non-standard formats are harder to inspect)
  • Team velocity (custom formats require explanation and documentation)
  • Accuracy (format changes can affect model output quality)

Real Benchmark Insights

According to TOON’s benchmarks:

  • TOON excels with uniform employee records, e-commerce orders, GitHub repo lists
  • JSON-compact wins on semi-uniform event logs, some deeply nested configs
  • Model-dependent: GPT-5-nano showed 90.9% accuracy with both TOON and JSON-compact, while Claude Haiku showed 59.8% (TOON) vs 57.4% (JSON)
  • Structure matters: Tabular eligibility strongly predicts which format will be more efficient

Critical: Lost accuracy, increased debugging time, or degraded user experience typically cost far more than token savings. Always measure end-to-end impact on your specific workload, not just token counts.

Experimentation Guidelines

1. Start with a Baseline

Always establish a baseline with a standard format first:

function AnalyzeData(data: Dataset[]) -> Analysis {
  client GPT4
  prompt #"
    {{ _.role("user") }}
    Data:
    {{ data|format(type="json") }}  {# Start with JSON baseline #}

    Provide analysis.
  "#
}

Measure: Accuracy, token usage, latency, cost

2. Test Alternative Formats

Try different formats and compare results:

{# Experiment 1: YAML #}
{{ data|format(type="yaml") }}

{# Experiment 2: TOON #}
{{ data|format(type="toon") }}

{# Experiment 3: TOON with options #}
{{ data|format(type="toon", delimiter="pipe") }}

Measure: Do you maintain accuracy? How much do tokens reduce?

3. Consider Your Data Structure

Different formats work better for different structures. From TOON benchmarks:

{# High tabular eligibility: uniform arrays #}
{{ products|format(type="toon") }}   {# Try TOON #}
{{ products|format(type="json") }}   {# Compare vs JSON #}

{# Low tabular eligibility: deeply nested config #}
{{ config|format(type="json") }}     {# JSON-compact often better #}
{{ config|format(type="yaml") }}     {# Or try YAML #}

{# Medium eligibility: mixed structures #}
{{ events|format(type="json") }}     {# Test multiple formats #}
{{ events|format(type="toon") }}     {# Results vary #}

Key insight: For pure flat tables, CSV is more compact than TOON. For deeply nested data, compact JSON may win. TOON’s sweet spot is uniform arrays of objects with multiple fields.
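
To see the flat-table case concretely, here are the same product rows in CSV and in TOON. CSV needs only a header row, while TOON adds the array length and brace-delimited field list, a small fixed amount of overhead that CSV avoids.

CSV:

id,name,price,in_stock
1,Widget,9.99,true
2,Gadget,19.99,false

TOON:

[2]{id,name,price,in_stock}:
1,Widget,9.99,true
2,Gadget,19.99,false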

4. Test with Your LLM

Different models may respond differently to format changes. Test with the specific LLM you’re using.

Tip from TOON documentation: When using TOON, show the format instead of describing it. Models parse the structure naturally from examples - the indentation and headers are usually self-documenting.

How to Measure Impact

Using BAML Playground

  1. Write your function with your baseline format (usually JSON)
  2. Run it in the playground and record:
    • Actual token count (shown in playground)
    • LLM response quality and accuracy
    • Time to first token (TTFT) and total latency
    • Any parsing errors
    • Response consistency across multiple runs
  3. Change to an alternative format
  4. Compare ALL metrics: tokens, accuracy, latency, error rates
  5. Run multiple test cases with diverse inputs
  6. Verify edge cases and error scenarios

Important: Lower token count doesn’t guarantee lower latency. Some models may process familiar formats (like JSON) faster even if they use more tokens. Measure end-to-end response time.

In Production

  • Use BAML Studio to monitor token usage AND accuracy
  • Track accuracy metrics alongside token/cost metrics
  • A/B test formats if possible (measure both cost and quality)
  • Monitor latency - cheaper formats that are slower may not be worth it
  • Be ready to roll back quickly if quality or performance degrades

Next Steps

  • Don’t assume, test: Try different formats with your actual data
  • Measure what matters: Track accuracy, not just token counts
  • Start small: Test on non-critical workloads first
  • Document results: Note which formats work best for which use cases
  • Consider alternatives: Custom serialization, selective fields, or prompt redesign might also help

See Also