TeaLeaf Data Format
A schema-aware data format with human-readable text and compact binary representation.
~36% fewer data tokens than JSON for LLM applications, with zero accuracy loss.
v2.0.0-beta.8
What is TeaLeaf?
TeaLeaf is a data format that bridges the gap between human-readable configuration and machine-efficient binary storage. A single .tl source file can be read and edited by humans, compiled to a compact .tlbx binary, and converted to/from JSON – all with schemas inline.
TeaLeaf – schemas with nested structures, compact positional data:
# Schema: define structure once
@struct Location (city, country)
@struct Department (name, location: Location)
@struct Employee (
id: int,
name,
role,
department: Department,
skills: []string,
)
# Data: field names not repeated
employees: @table Employee [
(1, "Alice", "Engineer",
("Platform", ("Seattle", "USA")),
["rust", "python"])
(2, "Bob", "Designer",
("Product", ("Austin", "USA")),
["figma", "css"])
(3, "Carol", "Manager",
("Platform", ("Seattle", "USA")),
["leadership", "agile"])
]
JSON – no schema, names repeated:
{
"employees": [
{
"id": 1,
"name": "Alice",
"role": "Engineer",
"department": {
"name": "Platform",
"location": { "city": "Seattle", "country": "USA" }
},
"skills": ["rust", "python"]
},
{
"id": 2,
"name": "Bob",
"role": "Designer",
"department": {
"name": "Product",
"location": { "city": "Austin", "country": "USA" }
},
"skills": ["figma", "css"]
},
{
"id": 3,
"name": "Carol",
"role": "Manager",
"department": {
"name": "Platform",
"location": { "city": "Seattle", "country": "USA" }
},
"skills": ["leadership", "agile"]
}
]
}
Key Features
| Feature | Description |
|---|---|
| Dual format | Human-readable text (.tl) and compact binary (.tlbx) |
| Inline schemas | @struct definitions live alongside data – no external .proto files |
| JSON interop | Bidirectional conversion with automatic schema inference |
| String deduplication | Binary format stores each unique string once |
| Compression | Per-section ZLIB compression with null bitmaps |
| Comments | # line comments in the text format |
| Language bindings | Native Rust, .NET (via FFI + source generator) |
| CLI tooling | tealeaf compile, decompile, validate, info, JSON conversion |
Why TeaLeaf?
The existing data format landscape presents trade-offs that TeaLeaf attempts to bridge. TeaLeaf does not attempt to replace any of the formats listed below, but rather presents a different perspective that users can objectively compare to identify if it fits their specific use cases.
| Format | Observation |
|---|---|
| JSON | Verbose, no comments, no schema |
| YAML | Indentation-sensitive, error-prone at scale |
| Protobuf | Schema external, binary-only, requires codegen |
| Avro | Schema embedded but not human-readable |
| CSV/TSV | Too simple for nested or typed data |
| MessagePack/CBOR | Compact but schemaless |
TeaLeaf unifies these concerns:
- Human-readable text format with explicit types and comments
- Compact binary with embedded schemas – no external schema files needed
- Schema-first design – field names defined once, not repeated per record
- No codegen required – schemas discovered at runtime
- Built-in JSON conversion for easy integration with existing tools
Primary Use Case: LLM API Data Payloads
TeaLeaf is well-suited for assembling and managing context for large language models – sending business data, analytics, and structured payloads to LLM APIs where token efficiency directly impacts API costs.
Why TeaLeaf for LLM context:
- ~36% fewer data tokens — verified across Claude Sonnet 4.5 and GPT-5.2 (12 tasks, 10 domains; savings increase with larger datasets)
- Zero accuracy loss — benchmark scores within noise (0.988 vs 0.978 Anthropic, 0.901 vs 0.899 OpenAI)
- Binary format for fast cached context retrieval
- String deduplication (roles, field names, common values stored once)
- Human-readable text for prompt authoring
Token savings example (retail orders dataset):
| Format | Characters | Tokens (GPT-5.x) | Savings |
|---|---|---|---|
| JSON | 36,791 | 9,829 | — |
| TeaLeaf | 14,542 | 5,632 | 43% fewer tokens |
Size Comparison
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Text | 1.38x | 0.87x | 0.63x |
| TeaLeaf Compressed | 3.56x | 0.15x | 0.47x |
TeaLeaf has 64-byte header overhead (not ideal for tiny objects). For large arrays with compression, TeaLeaf achieves 6-7x better compression than JSON.
Trade-off: TeaLeaf decode is ~2-5x slower than Protobuf due to dynamic key-based access. Choose TeaLeaf when size matters more than decode speed.
Project Structure
tealeaf/
├── tealeaf-core/ # Rust core: parser, compiler, reader, CLI
├── tealeaf-derive/ # Rust proc-macro: #[derive(ToTeaLeaf, FromTeaLeaf)]
├── tealeaf-ffi/ # C-compatible FFI layer
├── bindings/
│ └── dotnet/ # .NET bindings + source generator
├── canonical/ # Canonical test fixtures
├── spec/ # Format specification
└── examples/ # Example files and workflows
Quick Links
- Getting Started: Installation | Quick Start | Concepts
- Format: Text Format | Type System | Binary Format
- CLI: Command Reference
- Rust: Overview | Derive Macros
- .NET: Overview | Source Generator
- FFI: API Reference
- Guides: LLM Context | Performance
License
TeaLeaf is licensed under the MIT License.
Source code: github.com/krishjag/tealeaf