TeaLeaf Data Format

A schema-aware data format with human-readable text and compact binary representation.

~36% fewer data tokens than JSON for LLM applications, with zero accuracy loss.

v2.0.0-beta.8


What is TeaLeaf?

TeaLeaf is a data format that bridges the gap between human-readable configuration and machine-efficient binary storage. A single .tl source file can be read and edited by humans, compiled to a compact .tlbx binary, and converted to/from JSON – all with schemas inline.

TeaLeaf – schemas with nested structures, compact positional data:

# Schema: define structure once
@struct Location (city, country)
@struct Department (name, location: Location)
@struct Employee (
  id: int,
  name,
  role,
  department: Department,
  skills: []string,
)

# Data: field names not repeated
employees: @table Employee [
  (1, "Alice", "Engineer",
    ("Platform", ("Seattle", "USA")),
    ["rust", "python"])
  (2, "Bob", "Designer",
    ("Product", ("Austin", "USA")),
    ["figma", "css"])
  (3, "Carol", "Manager",
    ("Platform", ("Seattle", "USA")),
    ["leadership", "agile"])
]

JSON – no schema, names repeated:

{
  "employees": [
    {
      "id": 1,
      "name": "Alice",
      "role": "Engineer",
      "department": {
        "name": "Platform",
        "location": { "city": "Seattle", "country": "USA" }
      },
      "skills": ["rust", "python"]
    },
    {
      "id": 2,
      "name": "Bob",
      "role": "Designer",
      "department": {
        "name": "Product",
        "location": { "city": "Austin", "country": "USA" }
      },
      "skills": ["figma", "css"]
    },
    {
      "id": 3,
      "name": "Carol",
      "role": "Manager",
      "department": {
        "name": "Platform",
        "location": { "city": "Seattle", "country": "USA" }
      },
      "skills": ["leadership", "agile"]
    }
  ]
}
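To make the "field names not repeated" point concrete, here is a small sketch. It serializes the same flat records in two ways: as keyed, JSON-style objects that repeat every field name, and as positional tuples that rely on a schema declared once. It is a hypothetical illustration, not TeaLeaf's serializer. The `Employee` type and the `to_keyed` and `to_positional` helpers are invented for this example, and nesting and string escaping are ignored.

```rust
// Hypothetical sketch (not TeaLeaf's serializer): compare a keyed,
// JSON-style encoding with a positional, schema-once encoding.
// Nesting and string escaping are deliberately ignored.

struct Employee {
    id: u32,
    name: &'static str,
    role: &'static str,
}

// Keyed encoding: every record repeats every field name.
fn to_keyed(rows: &[Employee]) -> String {
    let objs: Vec<String> = rows
        .iter()
        .map(|e| format!("{{\"id\":{},\"name\":\"{}\",\"role\":\"{}\"}}", e.id, e.name, e.role))
        .collect();
    format!("[{}]", objs.join(","))
}

// Positional encoding: field names appear once, in the schema line.
fn to_positional(rows: &[Employee]) -> String {
    let mut out = String::from("@struct Employee (id: int, name, role)\n");
    for e in rows {
        out.push_str(&format!("({}, \"{}\", \"{}\")\n", e.id, e.name, e.role));
    }
    out
}

fn main() {
    let rows = [
        Employee { id: 1, name: "Alice", role: "Engineer" },
        Employee { id: 2, name: "Bob", role: "Designer" },
        Employee { id: 3, name: "Carol", role: "Manager" },
    ];
    let keyed = to_keyed(&rows);
    let positional = to_positional(&rows);
    println!("keyed: {} chars, positional: {} chars", keyed.len(), positional.len());
}
```

With the three records shown, the positional form is already shorter. With fewer records, the one-off schema line can outweigh the savings. This fixed-cost effect grows less significant as the record count rises.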

Key Features

| Feature | Description |
|---|---|
| Dual format | Human-readable text (.tl) and compact binary (.tlbx) |
| Inline schemas | @struct definitions live alongside data – no external .proto files |
| JSON interop | Bidirectional conversion with automatic schema inference |
| String deduplication | Binary format stores each unique string once |
| Compression | Per-section ZLIB compression with null bitmaps |
| Comments | # line comments in the text format |
| Language bindings | Native Rust, .NET (via FFI + source generator) |
| CLI tooling | tealeaf compile, decompile, validate, info, JSON conversion |
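String deduplication, as described in the table, amounts to an interning table. Each distinct string is stored once, and records refer to it by index. The sketch below shows that general technique using values taken from the employee example. It is an illustration under that assumption, not TeaLeaf's actual binary layout. `StringTable` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical sketch of string deduplication via interning: each distinct
// string is stored once and values refer to it by index.
// Illustrates the idea only; this is not TeaLeaf's binary layout.
#[derive(Default)]
struct StringTable {
    strings: Vec<String>,        // unique strings, in first-seen order
    index: HashMap<String, u32>, // string -> position in `strings`
}

impl StringTable {
    // Return the index of `s`, adding it to the table the first time it is seen.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&i) = self.index.get(s) {
            return i;
        }
        let i = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.index.insert(s.to_string(), i);
        i
    }

    // Resolve an index back to its string.
    fn get(&self, i: u32) -> &str {
        &self.strings[i as usize]
    }
}

fn main() {
    let mut table = StringTable::default();
    // Department and location values from the employee example.
    let values = ["Platform", "Seattle", "USA", "Product", "Austin", "USA", "Platform", "Seattle", "USA"];
    let refs: Vec<u32> = values.iter().map(|v| table.intern(v)).collect();
    println!("{} values, {} unique strings, refs = {:?}", values.len(), table.strings.len(), refs);
}
```

Here, nine string values collapse to five stored strings, and each repeat costs only a small integer reference.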

Why TeaLeaf?

Existing data formats each make trade-offs that TeaLeaf attempts to bridge. TeaLeaf is not meant to replace any of the formats listed below. Instead, it offers a different set of trade-offs that users can compare objectively to decide whether it fits their use case.

| Format | Observation |
|---|---|
| JSON | Verbose, no comments, no schema |
| YAML | Indentation-sensitive, error-prone at scale |
| Protobuf | Schema external, binary-only, requires codegen |
| Avro | Schema embedded but not human-readable |
| CSV/TSV | Too simple for nested or typed data |
| MessagePack/CBOR | Compact but schemaless |

TeaLeaf unifies these concerns:

  • Human-readable text format with explicit types and comments
  • Compact binary with embedded schemas – no external schema files needed
  • Schema-first design – field names defined once, not repeated per record
  • No codegen required – schemas discovered at runtime
  • Built-in JSON conversion for easy integration with existing tools
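Converting JSON into TeaLeaf requires a schema, and the Key Features table lists automatic schema inference for that direction. The sketch below shows one plausible, simplified approach. Fields are collected in first-seen order, and a field is typed `int` only if every value parses as an integer. This approach is an assumption made for illustration, not the TeaLeaf implementation. Values are passed as raw text, nested objects, arrays, and nulls are not handled, and `Field`, `infer_schema`, and `render` are hypothetical names.

```rust
// Hypothetical sketch of schema inference from JSON-like objects.
// Each object is a list of (key, raw value) pairs. A field is typed `int`
// only if every value seen for it parses as an integer; otherwise it stays
// an untyped (string) field. Nesting, arrays, and nulls are not handled.

#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    is_int: bool,
}

fn infer_schema(objects: &[Vec<(&str, &str)>]) -> Vec<Field> {
    let mut fields: Vec<Field> = Vec::new();
    for obj in objects {
        for (key, value) in obj {
            let value_is_int = value.parse::<i64>().is_ok();
            match fields.iter_mut().find(|f| f.name == *key) {
                // Known field: it stays `int` only if this value is an integer too.
                Some(f) => f.is_int &= value_is_int,
                // New field: append in first-seen order.
                None => fields.push(Field { name: key.to_string(), is_int: value_is_int }),
            }
        }
    }
    fields
}

// Render the inferred fields as a TeaLeaf-style @struct line.
fn render(name: &str, fields: &[Field]) -> String {
    let parts: Vec<String> = fields
        .iter()
        .map(|f| if f.is_int { format!("{}: int", f.name) } else { f.name.clone() })
        .collect();
    format!("@struct {} ({})", name, parts.join(", "))
}

fn main() {
    let objects = vec![
        vec![("id", "1"), ("name", "Alice"), ("role", "Engineer")],
        vec![("id", "2"), ("name", "Bob"), ("role", "Designer")],
    ];
    let schema = infer_schema(&objects);
    println!("{}", render("Employee", &schema));
}
```

A real converter also has to decide how to treat missing keys, mixed types, and nested objects. This sketch sidesteps those questions by widening any non-integer field to an untyped one.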

Primary Use Case: LLM API Data Payloads

TeaLeaf is well-suited for assembling and managing context for large language models – sending business data, analytics, and structured payloads to LLM APIs where token efficiency directly impacts API costs.

Why TeaLeaf for LLM context:

  • ~36% fewer data tokens: verified across Claude Sonnet 4.5 and GPT-5.2 (12 tasks, 10 domains; savings increase with larger datasets)
  • Zero accuracy loss: benchmark scores within noise (0.988 vs 0.978 Anthropic, 0.901 vs 0.899 OpenAI)
  • Binary format for fast cached context retrieval
  • String deduplication (roles, field names, common values stored once)
  • Human-readable text for prompt authoring

Token savings example (retail orders dataset):

| Format | Characters | Tokens (GPT-5.x) | Savings |
|---|---|---|---|
| JSON | 36,791 | 9,829 | (baseline) |
| TeaLeaf | 14,542 | 5,632 | 43% fewer tokens |
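The Savings column follows directly from the counts in the table. As a quick check, the sketch below computes the percentage reduction relative to JSON. Tokens drop by about 42.7%, which the table rounds to 43%, and characters drop by about 60.5%. `percent_fewer` is a helper name chosen for this example.

```rust
// Percentage reduction from a baseline count to a new count.
fn percent_fewer(baseline: u32, new: u32) -> f64 {
    (baseline as f64 - new as f64) / baseline as f64 * 100.0
}

fn main() {
    // Figures from the retail orders table (JSON vs TeaLeaf).
    let tokens = percent_fewer(9_829, 5_632);
    let chars = percent_fewer(36_791, 14_542);
    println!("tokens: {:.1}% fewer, characters: {:.1}% fewer", tokens, chars);
}
```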

Size Comparison

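Size relative to JSON (1.00x = JSON size; lower is smaller):

| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Text | 1.38x | 0.87x | 0.63x |
| TeaLeaf Compressed | 3.56x | 0.15x | 0.47x |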

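TeaLeaf has a fixed 64-byte header, so it is not ideal for tiny objects: the compressed small object is 3.56x the size of its JSON equivalent. For large arrays with compression, TeaLeaf output is roughly 6-7x smaller than JSON (0.15x on the 10K-points dataset).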
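The relative sizes in the table can be converted into "times smaller" factors by taking the reciprocal. The sketch below applies this to the TeaLeaf Compressed row. For example, 1 / 0.15 ≈ 6.7, which matches the 6-7x figure, while 1 / 3.56 ≈ 0.28 means JSON is the smaller encoding for the tiny object. `times_smaller` is a hypothetical helper.

```rust
// Convert a size ratio relative to JSON (as in the Size Comparison table)
// into a factor: how many times larger JSON is than the other format.
// A result below 1.0 means the other format is larger than JSON.
fn times_smaller(ratio_to_json: f64) -> f64 {
    1.0 / ratio_to_json
}

fn main() {
    // TeaLeaf Compressed row: small object, 10K points, 1K users.
    let row = [("Small Object", 3.56), ("10K Points", 0.15), ("1K Users", 0.47)];
    for &(dataset, ratio) in row.iter() {
        println!("{}: JSON is {:.2}x the size of compressed TeaLeaf", dataset, times_smaller(ratio));
    }
}
```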

Trade-off: TeaLeaf decode is ~2-5x slower than Protobuf due to dynamic key-based access. Choose TeaLeaf when size matters more than decode speed.
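The decode trade-off comes from resolving fields by name at run time, since schemas are discovered at runtime rather than compiled in. Codegen-based formats such as Protobuf typically read fields through generated struct members instead. The sketch below shows the general shape of name-based access: a hash lookup from field name to position, followed by an index into the row. It is a hypothetical illustration, not TeaLeaf's decoder. `Value`, `Schema`, and `get_field` are invented names.

```rust
use std::collections::HashMap;

// Hypothetical sketch of schema-driven ("dynamic") field access: the field
// name is resolved to a column position at run time before the value is read.
// Not TeaLeaf's decoder.

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

struct Schema {
    positions: HashMap<String, usize>, // field name -> column position
}

impl Schema {
    fn new(fields: &[&str]) -> Self {
        let positions = fields.iter().enumerate().map(|(i, f)| (f.to_string(), i)).collect();
        Schema { positions }
    }
}

// Dynamic access: one hash lookup per call, then a bounds-checked index.
fn get_field<'a>(schema: &Schema, row: &'a [Value], field: &str) -> Option<&'a Value> {
    schema.positions.get(field).and_then(|&i| row.get(i))
}

fn main() {
    let schema = Schema::new(&["id", "name", "role"]);
    let row = vec![Value::Int(1), Value::Str("Alice".into()), Value::Str("Engineer".into())];
    // Generated code would typically read a struct member directly;
    // here the name "name" is looked up at run time on every access.
    println!("{:?}", get_field(&schema, &row, "name"));
}
```

The per-access lookup is cheap in isolation but adds up across large datasets. Generated code avoids this cost by fixing field positions at compile time.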

Project Structure

tealeaf/
├── tealeaf-core/       # Rust core: parser, compiler, reader, CLI
├── tealeaf-derive/     # Rust proc-macro: #[derive(ToTeaLeaf, FromTeaLeaf)]
├── tealeaf-ffi/        # C-compatible FFI layer
├── bindings/
│   └── dotnet/         # .NET bindings + source generator
├── canonical/          # Canonical test fixtures
├── spec/               # Format specification
└── examples/           # Example files and workflows

License

TeaLeaf is licensed under the MIT License.

Source code: github.com/krishjag/tealeaf