LLM Context Engineering

TeaLeaf’s primary use case is context engineering for Large Language Model applications. This guide explains why and how.

The Problem

LLM context windows are limited and expensive. Typical structured data (tool definitions, conversation history, user profiles) consumes tokens proportional to format verbosity:

{
  "messages": [
    {"role": "user", "content": "Hello", "tokens": 2},
    {"role": "assistant", "content": "Hi there!", "tokens": 3},
    {"role": "user", "content": "What's the weather?", "tokens": 5},
    {"role": "assistant", "content": "Let me check...", "tokens": 4}
  ]
}

Every message repeats "role", "content", and "tokens". With 50+ messages, this overhead adds up.

The TeaLeaf Approach

@struct Message (role: string, content: string, tokens: int?)

messages: @table Message [
  (user, Hello, 2),
  (assistant, "Hi there!", 3),
  (user, "What's the weather?", 5),
  (assistant, "Let me check...", 4),
]

Field names defined once. Data is positional. For 50 messages, this saves ~40% in text size and ~80% in binary.

Context Assembly Pattern

Define Schemas for Your Context

@struct Tool (name: string, description: string, params: []string)
@struct Message (role: string, content: string, tokens: int?)
@struct UserProfile (id: int, name: string, preferences: []string)

system_prompt: """
  You are a helpful assistant with access to the user's profile
  and conversation history. Use the tools when appropriate.
"""

user: @table UserProfile [
  (42, "Alice", ["concise_responses", "code_examples"]),
]

tools: @table Tool [
  (search, "Search the web for information", ["query"]),
  (calculate, "Evaluate a mathematical expression", ["expression"]),
  (weather, "Get current weather for a location", ["city", "country"]),
]

history: @table Message [
  (user, Hello, 2),
  (assistant, "Hi there! How can I help?", 7),
]

Binary Caching

Compiled .tlbx files make excellent context caches:

use tealeaf::{TeaLeafBuilder, ToTeaLeaf, Value};

// Build context document
let doc = TeaLeafBuilder::new()
    .add_value("system_prompt", Value::String(system_prompt))
    .add_vec("tools", &tools)
    .add_vec("history", &messages)
    .add("user", &user_profile)
    .build();

// Cache as binary (fast to read back)
doc.compile("context_cache.tlbx", true)?;

// Later: load instantly from binary
let cached = tealeaf::Reader::open("context_cache.tlbx")?;

Sending to LLM

Convert to compact text for maximum token efficiency:

use tealeaf::{FormatOptions, TeaLeaf};

let doc = TeaLeaf::load("context.tl")?;

// Maximum token savings: compact whitespace + compact floats
let opts = FormatOptions::compact().with_compact_floats();
let context_text = doc.to_tl_with_options(&opts);
// Send context_text as part of the prompt

Two levels of compaction are available:

| Option | What it does | Savings |
|---|---|---|
| compact | Removes insignificant whitespace (spaces after `:` and `,`, indentation, blank lines) | ~10-12% over pretty |
| compact_floats | Strips `.0` from whole-number floats (`35934000000.0` → `35934000000`) | Additional savings on numeric data |

The compact_floats option is especially effective for financial and scientific datasets with many whole-number float values. The trade-off is that re-parsing produces Int instead of Float for those values – see Round-Trip Fidelity.

For readable debugging, use the pretty-printed variant:

let pretty_text = doc.to_tl_with_schemas();

Or convert to JSON for APIs that expect it:

let json = doc.to_json()?;

Size Comparison: Real-World Context

For a typical LLM context with 50 messages, 10 tools, and a user profile:

| Format | Approximate Size |
|---|---|
| JSON | ~15 KB |
| TeaLeaf Text (pretty) | ~8 KB |
| TeaLeaf Text (compact) | ~7 KB |
| TeaLeaf Binary | ~4 KB |
| TeaLeaf Binary (compressed) | ~3 KB |

Token savings are significant but less than byte savings. BPE tokenizers partially compress repeated JSON field names, so byte savings overstate token savings by 5-18 percentage points depending on data repetitiveness. On real-world data (14 tasks, 7 domains), expect ~51% fewer data tokens. Savings range from 27% (small datasets) to 77% (tabular data with high schema repetition).

Token Comparison (verified via OpenAI tokenizer)

| Dataset | JSON tokens | TeaLeaf tokens | Savings |
|---|---|---|---|
| Healthcare records | 903 | 572 | 37% |
| Retail orders | 9,829 | 5,632 | 43% |

At the API level, prompt instructions are identical for both formats, so data-only savings (~36%) are diluted to ~30% of total input tokens.

Structured Outputs

LLMs can also produce TeaLeaf-formatted responses:

@struct Insight (category: string, finding: string, confidence: float)

analysis: @table Insight [
  (revenue, "Q4 revenue grew 15% YoY", 0.92),
  (churn, "Customer churn decreased by 3%", 0.87),
  (forecast, "Projected 20% growth in Q1", 0.73),
]

This can then be parsed and processed programmatically:

use tealeaf::{TeaLeaf, Value};

let response = TeaLeaf::parse(&llm_output)?;
if let Some(Value::Array(insights)) = response.get("analysis") {
    for insight in insights {
        // Process each structured insight
    }
}

Best Practices

  1. Define schemas for all structured context – tool definitions, messages, profiles
  2. Use @table for arrays of uniform objects – conversation history, search results
  3. Use compact text for LLM input – FormatOptions::compact().with_compact_floats() for maximum token savings
  4. Cache compiled binary for frequently-used context segments
  5. String deduplication helps when context has repetitive strings (roles, tool names)
  6. Separate static and dynamic context – compile static context once, merge at runtime

Benchmark Results

The accuracy-benchmark suite compares TeaLeaf vs JSON vs TOON on Claude Sonnet 4.5 and GPT-5.2:

Real-world results (14 tasks, 7 domains, Claude Sonnet 4.5 + GPT-5.2):

| Metric | TeaLeaf | JSON | TOON |
|---|---|---|---|
| Anthropic accuracy | 0.942 | 0.945 | 0.939 |
| OpenAI accuracy | 0.925 | 0.924 | 0.928 |
| Input token savings | -51% | baseline | -20% |
  • No accuracy loss – scores within noise across all three formats
  • Savings range from 27% (small datasets) to 77% (tabular data)
  • Evidence package with prompts, responses, and analysis available in accuracy-benchmark/evidence/
  • See the benchmark README for full methodology and results.