Token Optimization
Experimenting with Data Format for Token Efficiency
When working with LLMs, token usage directly impacts both cost and latency. Different serialization formats can affect how many tokens are used to represent your data—but the optimal format depends on your specific use case and LLM.
Important: Test Before Adopting
Every optimization has trade-offs. Reducing token count doesn’t automatically improve accuracy, and different LLMs may respond differently to different formats. You should:
- Test with your actual data and prompts
- Measure accuracy alongside token savings
- Compare multiple formats (JSON, YAML, TOON, or custom)
- Validate with your specific LLM
What works for one use case may not work for another.
Available Format Options
BAML’s format filter lets you experiment with different serializations:
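For example, inside a prompt template you can render the same variable with or without the filter (a minimal sketch; `data` is a placeholder variable, and `type="toon"` is the option shown later on this page):

```baml
{# default serialization (JSON) #}
{{ data }}

{# TOON serialization via the format filter #}
{{ data | format(type="toon") }}
```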
TOON Format
TOON (Token-Oriented Object Notation) is a compact format that uses:
- Indentation-based structure (like YAML)
- Tabular format for arrays of objects (declare keys once, stream rows)
- Minimal punctuation
- Explicit array lengths and field headers (LLM-friendly guardrails)
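As a rough illustration of those conventions (based on the public TOON spec; the filter's exact output may differ slightly), a small uniform array looks like this:

```
users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Carol,user
```

The keys are declared once in the header, each row streams only values, and the `[3]` length gives the model an explicit count to check against.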
What TOON is good for:
- Uniform arrays of objects with many similar items
- Data that’s already highly structured and tabular
- When LLM validation of structure matters (explicit lengths help)
When TOON may NOT help:
- Deeply nested or non-uniform structures (JSON-compact often uses fewer tokens)
- Semi-uniform arrays (~40-60% tabular eligibility) where savings diminish
- Pure flat tables (CSV is more compact)
- Latency-critical applications (benchmark on your setup - some models process compact JSON faster despite higher token count)
Learn more: TOON specification and benchmarks
Understanding Data Structure
Tabular Eligibility
TOON’s efficiency comes from its tabular format for arrays. Your data’s “tabular eligibility” affects how much TOON can help:
- High eligibility (80-100%): Mostly uniform arrays of objects with the same fields → TOON excels
- Medium eligibility (40-60%): Mix of uniform and non-uniform data → savings diminish, may not be worth it
- Low eligibility (0-20%): Deeply nested, varied structures → JSON-compact may use fewer tokens
Example - High eligibility:
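A hypothetical payload (field names invented for illustration):

```json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" },
    { "id": 3, "name": "Carol", "role": "user" }
  ]
}
```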
All users have identical fields → 100% tabular → TOON helps
Example - Low eligibility:
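A hypothetical payload (again, the shape is what matters, not the specific keys):

```json
{
  "service": {
    "host": "api.example.com",
    "tls": { "enabled": true, "cert_path": "/etc/certs/api.pem" },
    "limits": { "requests_per_minute": 600, "burst": 50 }
  }
}
```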
Deeply nested with no arrays of uniform objects → 0% tabular → JSON-compact likely better
Considerations for Format Selection
When to Experiment with Compact Formats
- Passing large datasets with high tabular eligibility
- Token costs are significant
- After you’ve validated accuracy with standard formats
When to Stick with Standard Formats
- Starting a new project (establish baseline first)
- Your LLM performs poorly with alternative formats
- Data structure is very small or deeply nested
- Team familiarity matters more than token cost
Experimentation Example
Here’s how you might test different formats for a product analysis task:
Baseline: Using JSON
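A sketch of what the baseline could look like; the class fields, client, and prompt wording here are illustrative assumptions:

```baml
class Product {
  name string
  price float
  category string
  in_stock bool
}

function AnalyzeProducts(products: Product[]) -> string {
  client "openai/gpt-4o-mini"
  prompt #"
    Analyze these products and summarize pricing trends by category.

    {{ products }}

    {{ ctx.output_format }}
  "#
}
```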
When you pass products to this function, they’re serialized as JSON:
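With the default serialization, the rendered prompt contains something like:

```json
[
  { "name": "Laptop", "price": 999.99, "category": "electronics", "in_stock": true },
  { "name": "Desk Chair", "price": 189.5, "category": "furniture", "in_stock": true },
  { "name": "Monitor", "price": 249.0, "category": "electronics", "in_stock": false }
]
```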
Experiment: Trying TOON
To test if TOON works for your use case, try the format(type="toon") filter:
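Only the template expression changes; the rest of the function stays the same:

```baml
prompt #"
  Analyze these products and summarize pricing trends by category.

  {{ products | format(type="toon") }}

  {{ ctx.output_format }}
"#
```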
The same data serialized as TOON:
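Roughly the following, per the TOON spec (the exact key and defaults depend on the filter):

```
products[3]{name,price,category,in_stock}:
  Laptop,999.99,electronics,true
  Desk Chair,189.5,furniture,true
  Monitor,249.0,electronics,false
```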
Next steps: Test with your actual prompts and measure both token usage and accuracy.
TOON Options
Custom Indentation
Control spacing for better readability:
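The parameter name below is an assumption (the reference TOON tooling calls it `indent`); check the Jinja Filters Reference for the option the filter actually exposes:

```baml
{# hypothetical: widen indentation to 4 spaces #}
{{ products | format(type="toon", indent=4) }}
```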
Alternative Delimiters
Choose the delimiter that works best for your data:
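Again, treat the parameter name as an assumption to verify against the filter reference; the idea is that values in each row are separated with the chosen character instead of commas:

```baml
{# hypothetical: tab-delimited rows #}
{{ products | format(type="toon", delimiter="\t") }}

{# hypothetical: pipe-delimited rows #}
{{ products | format(type="toon", delimiter="|") }}
```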
Delimiter trade-offs:
- Tab (\t): Often tokenizes more efficiently than commas; tabs rarely appear in data (less quote-escaping needed); but some editors/terminals may display tabs inconsistently
- Pipe (|): Middle ground between comma and tab; explicit visual separator
- Comma (default): Most familiar, but may require more quoting if your data contains commas
Test different delimiters with your actual data - the best choice depends on your content.
Length Markers
Add length indicators for clarity:
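How this is enabled on the filter is an assumption here (parameter name invented for illustration; confirm against the Jinja Filters Reference):

```baml
{# hypothetical option name #}
{{ order_items | format(type="toon", length_marker=true) }}
```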
Output:
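Something along these lines, with explicit counts on both tabular and primitive arrays (sketch based on the TOON spec):

```
tags[4]: urgent,billing,refund,q3
order_items[3]{sku,qty}:
  A-101,2
  B-207,1
  C-330,5
```

The `[4]` and `[3]` counts act as guardrails: the model (or your validation code) can check that no rows were dropped or hallucinated.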
Real-World Use Case: Transaction Analysis
Here’s a complete example analyzing financial transactions:
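A sketch of how such a function might be wired up; the class shapes, client, and prompt wording are illustrative assumptions:

```baml
class Transaction {
  id string
  date string
  merchant string
  amount float
  category string
}

class SpendingReport {
  total_spend float
  top_category string
  unusual_transactions string[]
}

function AnalyzeTransactions(transactions: Transaction[]) -> SpendingReport {
  client "openai/gpt-4o-mini"
  prompt #"
    Review the following transactions. Report total spend, the top spending
    category, and any transactions that look unusual.

    {{ transactions | format(type="toon") }}

    {{ ctx.output_format }}
  "#
}
```

Because transaction records are uniform (the same fields on every row), this is the high-tabular-eligibility case where TOON tends to save the most tokens - but the accuracy and latency caveats discussed below still apply.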
Understanding Trade-offs
Token reduction is only valuable if accuracy and reliability are maintained. Consider:
What You Might Gain
- Lower token costs per API call
- Ability to fit more data in context windows
- Validation benefits (TOON’s explicit array lengths can help LLMs detect truncated data)
What You Might Lose
- LLM comprehension (benchmark results show format performance varies by model and dataset type)
- Latency (some models may process compact JSON faster despite higher token count - measure TTFT and total time)
- Debugging ease (non-standard formats are harder to inspect)
- Team velocity (custom formats require explanation and documentation)
- Accuracy (format changes can affect model output quality)
Real Benchmark Insights
According to TOON’s benchmarks:
- TOON excels with uniform employee records, e-commerce orders, GitHub repo lists
- JSON-compact wins on semi-uniform event logs, some deeply nested configs
- Model-dependent: GPT-5-nano showed 90.9% accuracy with both TOON and JSON-compact, while Claude Haiku showed 59.8% (TOON) vs 57.4% (JSON)
- Structure matters: Tabular eligibility strongly predicts which format will be more efficient
Critical: Lost accuracy, increased debugging time, or degraded user experience typically cost far more than token savings. Always measure end-to-end impact on your specific workload, not just token counts.
Experimentation Guidelines
1. Start with a Baseline
Always establish a baseline with a standard format first:
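Continuing the product example from earlier on this page, the baseline simply interpolates the data with the default (JSON) serialization:

```baml
{# baseline: default JSON serialization #}
{{ products }}
```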
Measure: Accuracy, token usage, latency, cost
2. Test Alternative Formats
Try different formats and compare results:
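Then change only the serialization and re-run the same test cases (only `type="toon"` is shown because it appears earlier on this page; see the Jinja Filters Reference for the full set of supported values):

```baml
{# experiment: TOON serialization #}
{{ products | format(type="toon") }}
```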
Measure: Do you maintain accuracy? How much do tokens reduce?
3. Consider Your Data Structure
Different formats work better for different structures, and TOON's published benchmarks bear this out.
Key insight: For pure flat tables, CSV is more compact than TOON. For deeply nested data, compact JSON may win. TOON’s sweet spot is uniform arrays of objects with multiple fields.
4. Test with Your LLM
Different models may respond differently to format changes. Test with the specific LLM you’re using.
Tip from TOON documentation: When using TOON, show the format instead of describing it. Models parse the structure naturally from examples - the indentation and headers are usually self-documenting.
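In practice that can be as simple as including one tiny, hand-written TOON sample ahead of the real data (illustrative sketch; the example rows are made up):

```baml
prompt #"
  The data below is in TOON format. For example:

  users[2]{id,name}:
    1,Alice
    2,Bob

  Data to analyze:

  {{ transactions | format(type="toon") }}
"#
```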
How to Measure Impact
Using BAML Playground
1. Write your function with your baseline format (usually JSON)
2. Run it in the playground and record:
   - Actual token count (shown in playground)
   - LLM response quality and accuracy
   - Time to first token (TTFT) and total latency
   - Any parsing errors
   - Response consistency across multiple runs
3. Change to an alternative format
4. Compare ALL metrics: tokens, accuracy, latency, error rates
5. Run multiple test cases with diverse inputs
6. Verify edge cases and error scenarios
Important: Lower token count doesn’t guarantee lower latency. Some models may process familiar formats (like JSON) faster even if they use more tokens. Measure end-to-end response time.
In Production
- Use BAML Studio to monitor token usage AND accuracy
- Track accuracy metrics alongside token/cost metrics
- A/B test formats if possible (measure both cost and quality)
- Monitor latency - cheaper formats that are slower may not be worth it
- Be ready to roll back quickly if quality or performance degrades
Next Steps
- Don’t assume, test: Try different formats with your actual data
- Measure what matters: Track accuracy, not just token counts
- Start small: Test on non-critical workloads first
- Document results: Note which formats work best for which use cases
- Consider alternatives: Custom serialization, selective fields, or prompt redesign might also help
See Also
- Jinja Filters Reference - Complete filter documentation
- BAML Studio - Monitor token usage in production
- Prompt Caching - Additional cost optimization