# LLM Context Engineering
TeaLeaf’s primary use case is context engineering for Large Language Model applications. This guide explains why and how.
## The Problem
LLM context windows are limited and expensive. Typical structured data (tool definitions, conversation history, user profiles) consumes tokens proportional to format verbosity:
```json
{
  "messages": [
    {"role": "user", "content": "Hello", "tokens": 2},
    {"role": "assistant", "content": "Hi there!", "tokens": 3},
    {"role": "user", "content": "What's the weather?", "tokens": 5},
    {"role": "assistant", "content": "Let me check...", "tokens": 4}
  ]
}
```
Every message repeats `"role"`, `"content"`, and `"tokens"`. With 50+ messages, this overhead adds up.
## The TeaLeaf Approach
```
@struct Message (role: string, content: string, tokens: int?)

messages: @table Message [
  (user, Hello, 2),
  (assistant, "Hi there!", 3),
  (user, "What's the weather?", 5),
  (assistant, "Let me check...", 4),
]
```
Field names defined once. Data is positional. For 50 messages, this saves ~40% in text size and ~80% in binary.
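The mechanics behind that saving can be seen in a self-contained sketch (this is an illustration, not the tealeaf crate's API): JSON repeats the field names in every row, while a positional row carries only the values.

```rust
// Illustrative only; the real tealeaf crate handles serialization internally.
struct Message {
    role: &'static str,
    content: &'static str,
    tokens: u32,
}

// JSON-style row: every row repeats the field names.
fn as_json_row(m: &Message) -> String {
    format!(
        r#"{{"role": "{}", "content": "{}", "tokens": {}}}"#,
        m.role, m.content, m.tokens
    )
}

// TeaLeaf-style row: field names live in the @struct schema; rows are positional.
fn as_table_row(m: &Message) -> String {
    format!(r#"({}, "{}", {})"#, m.role, m.content, m.tokens)
}

fn main() {
    let m = Message { role: "user", content: "Hello", tokens: 2 };
    let json = as_json_row(&m);
    let row = as_table_row(&m);
    println!("{} bytes vs {} bytes per row", json.len(), row.len());
    assert!(row.len() < json.len());
}
```

The per-row difference is fixed overhead, so the relative saving grows with row count, which is why long conversation histories benefit most.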
## Context Assembly Pattern

### Define Schemas for Your Context
```
@struct Tool (name: string, description: string, params: []string)
@struct Message (role: string, content: string, tokens: int?)
@struct UserProfile (id: int, name: string, preferences: []string)

system_prompt: """
You are a helpful assistant with access to the user's profile
and conversation history. Use the tools when appropriate.
"""

user: @table UserProfile [
  (42, "Alice", ["concise_responses", "code_examples"]),
]

tools: @table Tool [
  (search, "Search the web for information", ["query"]),
  (calculate, "Evaluate a mathematical expression", ["expression"]),
  (weather, "Get current weather for a location", ["city", "country"]),
]

history: @table Message [
  (user, Hello, 2),
  (assistant, "Hi there! How can I help?", 7),
]
```
## Binary Caching

Compiled `.tlbx` files make excellent context caches:
```rust
use tealeaf::{TeaLeafBuilder, ToTeaLeaf, Value};

// Build context document
let doc = TeaLeafBuilder::new()
    .add_value("system_prompt", Value::String(system_prompt))
    .add_vec("tools", &tools)
    .add_vec("history", &messages)
    .add("user", &user_profile)
    .build();

// Cache as binary (fast to read back)
doc.compile("context_cache.tlbx", true)?;

// Later: load instantly from binary
let cached = tealeaf::Reader::open("context_cache.tlbx")?;
```
## Sending to LLM
Convert to compact text for maximum token efficiency:
```rust
use tealeaf::{FormatOptions, TeaLeaf};

let doc = TeaLeaf::load("context.tl")?;

// Maximum token savings: compact whitespace + compact floats
let opts = FormatOptions::compact().with_compact_floats();
let context_text = doc.to_tl_with_options(&opts);

// Send context_text as part of the prompt
```
Two levels of compaction are available:
| Option | What it does | Savings |
|---|---|---|
| `compact` | Removes insignificant whitespace (spaces after `:` and `,`, indentation, blank lines) | ~10-12% over pretty |
| `compact_floats` | Strips `.0` from whole-number floats (`35934000000.0` → `35934000000`) | Additional savings on numeric data |
The compact_floats option is especially effective for financial and scientific datasets with many whole-number float values. The trade-off is that re-parsing produces Int instead of Float for those values – see Round-Trip Fidelity.
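The transformation itself is simple. A minimal sketch, assuming the rule is just a trailing-`.0` strip (the crate's actual formatting logic may differ):

```rust
// Hypothetical illustration of the compact_floats transformation.
fn compact_float(text: &str) -> &str {
    text.strip_suffix(".0").unwrap_or(text)
}

fn main() {
    assert_eq!(compact_float("35934000000.0"), "35934000000");
    assert_eq!(compact_float("3.14"), "3.14"); // non-whole floats are untouched
    // Round-trip caveat: "35934000000" now re-parses as an integer, not a float.
    assert!("35934000000".parse::<i64>().is_ok());
}
```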
For readable debugging, use the pretty-printed variant:
```rust
let pretty_text = doc.to_tl_with_schemas();
```
Or convert to JSON for APIs that expect it:
```rust
let json = doc.to_json()?;
```
## Size Comparison: Real-World Context
For a typical LLM context with 50 messages, 10 tools, and a user profile:
| Format | Approximate Size |
|---|---|
| JSON | ~15 KB |
| TeaLeaf Text (pretty) | ~8 KB |
| TeaLeaf Text (compact) | ~7 KB |
| TeaLeaf Binary | ~4 KB |
| TeaLeaf Binary (compressed) | ~3 KB |
Token savings are significant but less than byte savings. BPE tokenizers partially compress repeated JSON field names, so byte savings overstate token savings by 5-18 percentage points depending on data repetitiveness. On real-world data (14 tasks, 7 domains), expect ~51% fewer data tokens. Savings range from 27% (small datasets) to 77% (tabular data with high schema repetition).
### Token Comparison (verified via OpenAI tokenizer)
| Dataset | JSON tokens | TeaLeaf tokens | Savings |
|---|---|---|---|
| Healthcare records | 903 | 572 | 37% |
| Retail orders | 9,829 | 5,632 | 43% |
At the API level, prompt instructions are identical for both formats, diluting data-only savings (~36%) to ~30% of total input tokens.
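The savings column above is just `1 - tealeaf_tokens / json_tokens`, rounded to the nearest percent. Recomputing it from the table's token counts:

```rust
// Recompute the savings column from the measured token counts above.
fn savings_pct(json_tokens: u32, tealeaf_tokens: u32) -> u32 {
    (100.0 * (1.0 - tealeaf_tokens as f64 / json_tokens as f64)).round() as u32
}

fn main() {
    assert_eq!(savings_pct(903, 572), 37);   // healthcare records
    assert_eq!(savings_pct(9829, 5632), 43); // retail orders
}
```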
## Structured Outputs
LLMs can also produce TeaLeaf-formatted responses:
```
@struct Insight (category: string, finding: string, confidence: float)

analysis: @table Insight [
  (revenue, "Q4 revenue grew 15% YoY", 0.92),
  (churn, "Customer churn decreased by 3%", 0.87),
  (forecast, "Projected 20% growth in Q1", 0.73),
]
```
This can then be parsed and processed programmatically:
```rust
use tealeaf::{TeaLeaf, Value};

let response = TeaLeaf::parse(&llm_output)?;
if let Some(Value::Array(insights)) = response.get("analysis") {
    for insight in insights {
        // Process each structured insight
    }
}
```
## Best Practices

- Define schemas for all structured context – tool definitions, messages, profiles
- Use `@table` for arrays of uniform objects – conversation history, search results
- Use compact text for LLM input – `FormatOptions::compact().with_compact_floats()` for maximum token savings
- Cache compiled binary for frequently-used context segments
- String deduplication helps when context has repetitive strings (roles, tool names)
- Separate static and dynamic context – compile static context once, merge at runtime
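The string-deduplication point can be illustrated with a toy intern pool (not the crate's actual implementation): each distinct string is stored once, and rows reference it by index, so a long history alternating between two roles pays for those role strings only once.

```rust
use std::collections::HashMap;

// Toy string pool: each distinct string is stored once; rows hold indices.
struct Pool {
    strings: Vec<String>,
    index: HashMap<String, usize>,
}

impl Pool {
    fn new() -> Self {
        Pool { strings: Vec::new(), index: HashMap::new() }
    }

    // Return the index of `s`, adding it to the pool on first sight.
    fn intern(&mut self, s: &str) -> usize {
        if let Some(&i) = self.index.get(s) {
            return i;
        }
        let i = self.strings.len();
        self.strings.push(s.to_string());
        self.index.insert(s.to_string(), i);
        i
    }
}

fn main() {
    let mut pool = Pool::new();
    // A 50-message history alternating two roles needs only 2 pooled strings.
    let roles: Vec<usize> = (0..50)
        .map(|i| pool.intern(if i % 2 == 0 { "user" } else { "assistant" }))
        .collect();
    assert_eq!(pool.strings.len(), 2);
    assert_eq!(roles.len(), 50);
}
```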
## Benchmark Results

The `accuracy-benchmark` suite compares TeaLeaf vs JSON vs TOON on Claude Sonnet 4.5 and GPT-5.2:
Real-world results (14 tasks, 7 domains, Claude Sonnet 4.5 + GPT-5.2):
| Metric | TeaLeaf | JSON | TOON |
|---|---|---|---|
| Anthropic accuracy | 0.942 | 0.945 | 0.939 |
| OpenAI accuracy | 0.925 | 0.924 | 0.928 |
| Input token savings | -51% | baseline | -20% |
- No accuracy loss – scores within noise across all three formats
- Savings range from 27% (small datasets) to 77% (tabular data)
- Evidence package with prompts, responses, and analysis available in `accuracy-benchmark/evidence/`
- See the benchmark README for full methodology and results.