LLM Context Engineering
TeaLeaf’s primary use case is context engineering for Large Language Model applications. This guide explains why and how.
The Problem
LLM context windows are limited and expensive. Typical structured data (tool definitions, conversation history, user profiles) consumes tokens proportional to format verbosity:
{
"messages": [
{"role": "user", "content": "Hello", "tokens": 2},
{"role": "assistant", "content": "Hi there!", "tokens": 3},
{"role": "user", "content": "What's the weather?", "tokens": 5},
{"role": "assistant", "content": "Let me check...", "tokens": 4}
]
}
Every message repeats the keys "role", "content", and "tokens". With 50+ messages, this overhead adds up.
The TeaLeaf Approach
@struct Message (role: string, content: string, tokens: int?)
messages: @table Message [
(user, Hello, 2),
(assistant, "Hi there!", 3),
(user, "What's the weather?", 5),
(assistant, "Let me check...", 4),
]
Field names are defined once; data is positional. For 50 messages, this saves ~40% in text size and ~80% in binary.
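To see where the saving comes from, compare the first row in each format: the JSON line is roughly 50 bytes, about 30 of which are the repeated field names, quotes, and colons, while the TeaLeaf row (user, Hello, 2), is 17 bytes. Over 50 messages, the repeated keys alone account for well over a kilobyte (rough counts; exact figures depend on the content).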
Context Assembly Pattern
Define Schemas for Your Context
@struct Tool (name: string, description: string, params: []string)
@struct Message (role: string, content: string, tokens: int?)
@struct UserProfile (id: int, name: string, preferences: []string)
system_prompt: """
You are a helpful assistant with access to the user's profile
and conversation history. Use the tools when appropriate.
"""
user: @table UserProfile [
(42, "Alice", ["concise_responses", "code_examples"]),
]
tools: @table Tool [
(search, "Search the web for information", ["query"]),
(calculate, "Evaluate a mathematical expression", ["expression"]),
(weather, "Get current weather for a location", ["city", "country"]),
]
history: @table Message [
(user, Hello, 2),
(assistant, "Hi there! How can I help?", 7),
]
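The assembled document can also be loaded and inspected programmatically before it goes into a prompt. A minimal sketch, assuming the context above is saved as context.tl and that Value is exported from the crate root as in the builder example below:
use tealeaf::{TeaLeaf, Value};

// Quick sanity check on the assembled context
let ctx = TeaLeaf::load("context.tl")?;
if let Some(Value::Array(tools)) = ctx.get("tools") {
    println!("{} tools in context", tools.len());
}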
Binary Caching
Compiled .tlbx files make excellent context caches:
use tealeaf::{TeaLeafBuilder, ToTeaLeaf, Value};

// Build the context document (system_prompt, tools, messages, and
// user_profile are assumed to be defined elsewhere)
let doc = TeaLeafBuilder::new()
    .add_value("system_prompt", Value::String(system_prompt))
    .add_vec("tools", &tools)
    .add_vec("history", &messages)
    .add("user", &user_profile)
    .build();

// Cache as binary (fast to read back)
doc.compile("context_cache.tlbx", true)?;

// Later: load instantly from binary
let cached = tealeaf::Reader::open("context_cache.tlbx")?;
Sending to LLM
Convert to text for LLM consumption:
use tealeaf::TeaLeaf;

let doc = TeaLeaf::load("context.tl")?;
let context_text = doc.to_tl_with_schemas();
// Send context_text as part of the prompt
Or convert specific sections:
let doc = TeaLeaf::load("context.tl")?;
let json = doc.to_json()?;
// Use JSON for APIs that expect it
Size Comparison: Real-World Context
For a typical LLM context with 50 messages, 10 tools, and a user profile:
| Format | Approximate Size |
|---|---|
| JSON | ~15 KB |
| TeaLeaf Text | ~8 KB |
| TeaLeaf Binary | ~4 KB |
| TeaLeaf Binary (compressed) | ~3 KB |
Token savings are significant but smaller than the byte savings. BPE tokenizers partially compress repeated JSON field names, so byte savings overstate token savings by 5-18 percentage points, depending on how repetitive the data is. For typical structured data, expect ~36% fewer data tokens (median), with savings increasing for larger and more structured datasets.
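You can reproduce this kind of comparison yourself by tokenizing both renderings of the same context. A rough sketch using the third-party tiktoken-rs crate; json_text and tealeaf_text are placeholders for the two renderings:
use tiktoken_rs::cl100k_base;

// Count tokens for the same context rendered as JSON and as TeaLeaf text
let bpe = cl100k_base().unwrap();
let json_tokens = bpe.encode_with_special_tokens(&json_text).len();
let tl_tokens = bpe.encode_with_special_tokens(&tealeaf_text).len();
println!("JSON: {} tokens, TeaLeaf: {} tokens", json_tokens, tl_tokens);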
Token Comparison (verified via OpenAI tokenizer)
| Dataset | JSON tokens | TeaLeaf tokens | Savings |
|---|---|---|---|
| Healthcare records | 903 | 572 | 37% |
| Retail orders | 9,829 | 5,632 | 43% |
At the API level, the prompt instructions are identical for both formats, which dilutes the data-only saving (~36%) to roughly 30% of total input tokens.
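As an illustrative calculation: if the fixed prompt instructions take 2,000 tokens and the data portion takes 10,000 tokens as JSON, a 36% data saving brings the data down to about 6,400 tokens, so the total input drops from 12,000 to roughly 8,400 tokens, about 30% overall.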
Structured Outputs
LLMs can also produce TeaLeaf-formatted responses:
@struct Insight (category: string, finding: string, confidence: float)
analysis: @table Insight [
(revenue, "Q4 revenue grew 15% YoY", 0.92),
(churn, "Customer churn decreased by 3%", 0.87),
(forecast, "Projected 20% growth in Q1", 0.73),
]
This can then be parsed and processed programmatically:
use tealeaf::{TeaLeaf, Value};

let response = TeaLeaf::parse(&llm_output)?;
if let Some(Value::Array(insights)) = response.get("analysis") {
    for insight in insights {
        // Process each structured insight
    }
}
Best Practices
- Define schemas for all structured context – tool definitions, messages, profiles
- Use @table for arrays of uniform objects – conversation history, search results
- Cache compiled binary for frequently-used context segments
- Use text format for LLM input – models understand the schema notation
- String deduplication helps when context has repetitive strings (roles, tool names)
- Separate static and dynamic context – compile static context once, merge at runtime (see the sketch below)
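A minimal sketch of that last point, assuming the static portion (system prompt, tool table) lives in its own static_context.tl and the per-request history is rendered separately; the file name and helper function are hypothetical:
use tealeaf::TeaLeaf;

// Hypothetical helper: combine a cached static context with per-request history.
// `dynamic_history_tl` is the recent-message table already rendered as TeaLeaf text.
fn assemble_prompt(dynamic_history_tl: &str) -> Result<String, Box<dyn std::error::Error>> {
    // In a real application the static part would be loaded once and reused;
    // it is reloaded here for brevity.
    let static_ctx = TeaLeaf::load("static_context.tl")?;
    let mut prompt = static_ctx.to_tl_with_schemas();
    prompt.push('\n');
    prompt.push_str(dynamic_history_tl);
    Ok(prompt)
}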
Benchmark Results
The accuracy-benchmark suite tests 12 tasks across 10 business domains on Claude Sonnet 4.5 and GPT-5.2:
- ~36% fewer data tokens compared to JSON (savings increase with larger datasets)
- No accuracy loss – scores within noise across all providers
- See the benchmark README for full methodology and results.