TeaLeaf Data Format
A schema-aware data format with human-readable text and compact binary representation.
~36% fewer data tokens than JSON for LLM applications, with zero accuracy loss.
v2.0.0-beta.8
What is TeaLeaf?
TeaLeaf is a data format that bridges the gap between human-readable configuration and machine-efficient binary storage. A single .tl source file can be read and edited by humans, compiled to a compact .tlbx binary, and converted to/from JSON – all with schemas inline.
TeaLeaf – schemas with nested structures, compact positional data:
# Schema: define structure once
@struct Location (city, country)
@struct Department (name, location: Location)
@struct Employee (
id: int,
name,
role,
department: Department,
skills: []string,
)
# Data: field names not repeated
employees: @table Employee [
(1, "Alice", "Engineer",
("Platform", ("Seattle", "USA")),
["rust", "python"])
(2, "Bob", "Designer",
("Product", ("Austin", "USA")),
["figma", "css"])
(3, "Carol", "Manager",
("Platform", ("Seattle", "USA")),
["leadership", "agile"])
]
JSON – no schema, names repeated:
{
"employees": [
{
"id": 1,
"name": "Alice",
"role": "Engineer",
"department": {
"name": "Platform",
"location": { "city": "Seattle", "country": "USA" }
},
"skills": ["rust", "python"]
},
{
"id": 2,
"name": "Bob",
"role": "Designer",
"department": {
"name": "Product",
"location": { "city": "Austin", "country": "USA" }
},
"skills": ["figma", "css"]
},
{
"id": 3,
"name": "Carol",
"role": "Manager",
"department": {
"name": "Platform",
"location": { "city": "Seattle", "country": "USA" }
},
"skills": ["leadership", "agile"]
}
]
}
Key Features
| Feature | Description |
|---|---|
| Dual format | Human-readable text (.tl) and compact binary (.tlbx) |
| Inline schemas | @struct definitions live alongside data – no external .proto files |
| JSON interop | Bidirectional conversion with automatic schema inference |
| String deduplication | Binary format stores each unique string once |
| Compression | Per-section ZLIB compression with null bitmaps |
| Comments | # line comments in the text format |
| Language bindings | Native Rust, .NET (via FFI + source generator) |
| CLI tooling | tealeaf compile, decompile, validate, info, JSON conversion |
Why TeaLeaf?
The existing data format landscape presents trade-offs that TeaLeaf attempts to bridge. TeaLeaf does not try to replace the formats listed below; it occupies a different point in the trade-off space that users can compare against their own use cases.
| Format | Observation |
|---|---|
| JSON | Verbose, no comments, no schema |
| YAML | Indentation-sensitive, error-prone at scale |
| Protobuf | Schema external, binary-only, requires codegen |
| Avro | Schema embedded but not human-readable |
| CSV/TSV | Too simple for nested or typed data |
| MessagePack/CBOR | Compact but schemaless |
TeaLeaf unifies these concerns:
- Human-readable text format with explicit types and comments
- Compact binary with embedded schemas – no external schema files needed
- Schema-first design – field names defined once, not repeated per record
- No codegen required – schemas discovered at runtime
- Built-in JSON conversion for easy integration with existing tools
Primary Use Case: LLM API Data Payloads
TeaLeaf is well-suited for assembling and managing context for large language models – sending business data, analytics, and structured payloads to LLM APIs where token efficiency directly impacts API costs.
Why TeaLeaf for LLM context:
- ~36% fewer data tokens — verified across Claude Sonnet 4.5 and GPT-5.2 (12 tasks, 10 domains; savings increase with larger datasets)
- Zero accuracy loss — benchmark scores within noise (0.988 vs 0.978 Anthropic, 0.901 vs 0.899 OpenAI)
- Binary format for fast cached context retrieval
- String deduplication (roles, field names, common values stored once)
- Human-readable text for prompt authoring
Token savings example (retail orders dataset):
| Format | Characters | Tokens (GPT-5.x) | Savings |
|---|---|---|---|
| JSON | 36,791 | 9,829 | — |
| TeaLeaf | 14,542 | 5,632 | 43% fewer tokens |
Size Comparison
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Text | 1.38x | 0.87x | 0.63x |
| TeaLeaf Compressed | 3.56x | 0.15x | 0.47x |
TeaLeaf has 64-byte header overhead (not ideal for tiny objects). For large arrays with compression, TeaLeaf achieves 6-7x better compression than JSON.
Trade-off: TeaLeaf decode is ~2-5x slower than Protobuf due to dynamic key-based access. Choose TeaLeaf when size matters more than decode speed.
Project Structure
tealeaf/
├── tealeaf-core/ # Rust core: parser, compiler, reader, CLI
├── tealeaf-derive/ # Rust proc-macro: #[derive(ToTeaLeaf, FromTeaLeaf)]
├── tealeaf-ffi/ # C-compatible FFI layer
├── bindings/
│ └── dotnet/ # .NET bindings + source generator
├── canonical/ # Canonical test fixtures
├── spec/ # Format specification
└── examples/ # Example files and workflows
Quick Links
- Getting Started: Installation | Quick Start | Concepts
- Format: Text Format | Type System | Binary Format
- CLI: Command Reference
- Rust: Overview | Derive Macros
- .NET: Overview | Source Generator
- FFI: API Reference
- Guides: LLM Context | Performance
License
TeaLeaf is licensed under the MIT License.
Source code: github.com/krishjag/tealeaf
Installation
Pre-built Binaries
Download the latest release from GitHub Releases.
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | tealeaf-windows-x64.zip |
| Windows | ARM64 | tealeaf-windows-arm64.zip |
| Linux | x64 (glibc) | tealeaf-linux-x64.tar.gz |
| Linux | ARM64 (glibc) | tealeaf-linux-arm64.tar.gz |
| Linux | x64 (musl) | tealeaf-linux-musl-x64.tar.gz |
| macOS | x64 (Intel) | tealeaf-macos-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | tealeaf-macos-arm64.tar.gz |
Quick Install
Windows (PowerShell)
# Download and extract to current directory
Invoke-WebRequest -Uri "https://github.com/krishjag/tealeaf/releases/latest/download/tealeaf-windows-x64.zip" -OutFile tealeaf.zip
Expand-Archive tealeaf.zip -DestinationPath .
# Optional: add to PATH
$env:PATH += ";$PWD"
Linux / macOS
# Download and extract (replace with your platform)
curl -LO https://github.com/krishjag/tealeaf/releases/latest/download/tealeaf-linux-x64.tar.gz
tar -xzf tealeaf-linux-x64.tar.gz
# Optional: move to PATH
sudo mv tealeaf /usr/local/bin/
Build from Source
Requires the Rust toolchain (1.70+).
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
cargo build --release --package tealeaf-core
The binary will be at target/release/tealeaf (or tealeaf.exe on Windows).
Verify Installation
tealeaf --version
# tealeaf 2.0.0-beta.8
tealeaf help
Rust Crate
Add tealeaf-core to your Cargo.toml:
[dependencies]
tealeaf-core = { version = "2.0.0-beta.8", features = ["derive"] }
The derive feature enables #[derive(ToTeaLeaf, FromTeaLeaf)] macros.
.NET NuGet Package
dotnet add package TeaLeaf
The NuGet package includes everything needed:
- TeaLeaf.Annotations – [TeaLeaf], [TLSkip], and other attributes
- TeaLeaf.Generators – C# incremental source generator (bundled as an analyzer)
- Native libraries for all supported platforms (Windows, Linux, macOS – x64 and ARM64)
No additional packages required. [TeaLeaf] classes get compile-time serialization methods automatically.
Note: The .NET package requires .NET 8.0 or later. The source generator requires a C# compiler with incremental generator support.
Quick Start
This guide walks through the core TeaLeaf workflow: write text, compile to binary, and convert to/from JSON.
1. Write a TeaLeaf File
Create example.tl:
# Define schemas
@struct address (street: string, city: string, zip: string)
@struct user (
id: int,
name: string,
email: string?,
address: address,
active: bool,
)
# Data uses schemas -- field names defined once, not repeated
users: @table user [
(1, "Alice", "alice@example.com", ("123 Main St", "Seattle", "98101"), true),
(2, "Bob", ~, ("456 Oak Ave", "Austin", "78701"), false),
]
# Plain key-value pairs
app_version: "2.0.0-beta.2"
debug: false
2. Validate
Check that the file is syntactically correct:
tealeaf validate example.tl
3. Compile to Binary
Compile to the compact binary format:
tealeaf compile example.tl -o example.tlbx
4. Inspect
View information about either format:
tealeaf info example.tl
tealeaf info example.tlbx
5. Convert to JSON
# Text to JSON
tealeaf to-json example.tl -o example.json
# Binary to JSON
tealeaf tlbx-to-json example.tlbx -o example_from_binary.json
6. Convert from JSON
# JSON to TeaLeaf text (with automatic schema inference)
tealeaf from-json example.json -o reconstructed.tl
# JSON to TeaLeaf binary
tealeaf json-to-tlbx example.json -o direct.tlbx
7. Decompile
Convert binary back to text:
tealeaf decompile example.tlbx -o decompiled.tl
Complete Workflow
example.tl ──compile──> example.tlbx ──decompile──> decompiled.tl
│ │
├──to-json──> example.json <──tlbx-to-json──┘
│ │
└──from-json─────┘
Using the Rust API
use tealeaf::TeaLeaf;
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Parse text format
let doc = TeaLeaf::load("example.tl")?;
// Access values
if let Some(users) = doc.get("users") {
println!("Users: {:?}", users);
}
// Compile to binary
doc.compile("example.tlbx", true)?;
// Convert to JSON
let json = doc.to_json()?;
println!("{}", json);
Ok(())
}
With Derive Macros
use tealeaf::{TeaLeaf, ToTeaLeaf, FromTeaLeaf, ToTeaLeafExt};
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
id: i32,
name: String,
#[tealeaf(optional)]
email: Option<String>,
active: bool,
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let user = User {
id: 1,
name: "Alice".into(),
email: Some("alice@example.com".into()),
active: true,
};
// Serialize to TeaLeaf text
let text = user.to_tl_string("user");
println!("{}", text);
// Compile directly to binary
user.to_tlbx("user", "user.tlbx", false)?;
// Deserialize from binary
let reader = tealeaf::Reader::open("user.tlbx")?;
let loaded = User::from_tealeaf_value(&reader.get("user")?)?;
Ok(())
}
Using the .NET API
Source Generator (Compile-Time)
using TeaLeaf;
using TeaLeaf.Annotations;
[TeaLeaf]
public partial class User
{
public int Id { get; set; }
public string Name { get; set; } = "";
[TLOptional]
public string? Email { get; set; }
public bool Active { get; set; }
}
// Serialize
var user = new User { Id = 1, Name = "Alice", Active = true };
string text = user.ToTeaLeafText();
string json = user.ToTeaLeafJson();
user.CompileToTeaLeaf("user.tlbx");
// Deserialize
using var doc = TLDocument.ParseFile("user.tlbx");
var loaded = User.FromTeaLeaf(doc);
Reflection Serializer (Runtime)
using TeaLeaf;
var user = new User { Id = 1, Name = "Alice", Active = true };
// Serialize
string docText = TeaLeafSerializer.ToDocument(user);
TeaLeafSerializer.Compile(user, "user.tlbx");
// Deserialize
var loaded = TeaLeafSerializer.FromText<User>(docText);
Next Steps
- Core Concepts – understand schemas, types, and the text format
- CLI Reference – all available commands
- Rust Guide – Rust API in depth
- .NET Guide – .NET bindings in depth
Core Concepts
This page introduces the fundamental ideas behind TeaLeaf.
Dual Format
TeaLeaf has two representations of the same data:
| | Text (.tl) | Binary (.tlbx) |
|---|---|---|
| Purpose | Authoring, version control, review | Storage, transmission, deployment |
| Human-readable | Yes | No |
| Comments | Yes (#) | Stripped during compilation |
| Schemas | Inline @struct definitions | Embedded in schema table |
| Size | Larger (field names in data) | Compact (positional, deduplicated) |
| Speed | Slower to parse | Fast random-access via memory mapping |
The .tl file is the source of truth. Binary files are compiled artifacts – regenerate them when the source changes.
Schemas
Schemas define the structure of your data using @struct:
@struct point (x: int, y: int)
@struct line (start: point, end: point, color: string?)
Key properties:
- Inline – schemas live in the same file as data
- Positional – binary encoding uses field order, not names
- Nestable – structs can reference other structs
- Nullable – fields marked with ? accept null (~)
Schemas enable @table for compact tabular data:
points: @table point [
(0, 0),
(100, 200),
(-50, 75),
]
Without schemas, the same data would require repeating field names:
# Without schemas -- verbose
points: [
{x: 0, y: 0},
{x: 100, y: 200},
{x: -50, y: 75},
]
Type System
TeaLeaf has a rich type system with primitives, containers, and modifiers.
Primitives
| Type | Description | Example |
|---|---|---|
| bool | Boolean | true, false |
| int / int32 | 32-bit signed integer | 42, -17 |
| int64 | 64-bit signed integer | 9999999999 |
| uint / uint32 | 32-bit unsigned integer | 255 |
| float / float64 | 64-bit float | 3.14, 6.022e23 |
| string | UTF-8 text | "hello", alice |
| bytes | Raw binary data | b"cafef00d" |
| timestamp | ISO 8601 date/time | 2024-01-15T10:30:00Z |
Containers
| Syntax | Description |
|---|---|
| []T | Array of type T |
| T? | Nullable type T |
| @map { ... } | Ordered key-value map |
| { key: value } | Untyped object |
Null
The tilde ~ represents null:
optional_field: ~
Key-Value Documents
A TeaLeaf document is a collection of named key-value sections:
# Each top-level entry is a "section" in the binary format
config: {host: localhost, port: 8080}
users: @table user [(1, alice), (2, bob)]
version: "2.0.0-beta.2"
Keys become section names in the binary file. You access values by key at runtime.
References
References allow data reuse and graph structures:
# Define a reference
!seattle: {city: "Seattle", state: "WA"}
# Use it in multiple places
office: !seattle
warehouse: !seattle
Tagged Values
Tags add a discriminator label to values, enabling sum types:
events: [
:click {x: 100, y: 200},
:scroll {delta: -50},
:keypress {key: "Enter"},
]
Unions
Named discriminated unions:
@union shape {
circle (radius: float),
rectangle (width: float, height: float),
point (),
}
shapes: [:circle (5.0), :rectangle (10.0, 20.0), :point ()]
Union definitions are preserved through binary compilation and decompilation, including variant names, field names, and field types.
Compilation Pipeline
.tl (text)
│
├── parse ──> in-memory document (TeaLeaf / TLDocument)
│ │
│ ├── compile ──> .tlbx (binary)
│ ├── to_json ──> .json
│ └── to_tl_text ──> .tl (round-trip)
│
.tlbx (binary)
│
├── reader ──> random-access values (zero-copy with mmap)
│ │
│ ├── decompile ──> .tl
│ └── to_json ──> .json
│
.json
│
└── from_json ──> in-memory document
│
└── (with schema inference for arrays)
File Includes
Split large files into modules:
@include "schemas/common.tl"
@include "./data/users.tl"
Paths are resolved relative to the including file.
Next Steps
- Text Format – complete syntax reference
- Type System – all types and modifiers in detail
- Schemas – schema definitions, tables, and nesting
Text Format
The TeaLeaf text format (.tl) is the human-readable representation. This page is the complete syntax reference.
Comments
Comments begin with # and extend to end of line:
# This is a line comment
name: alice # inline comment
Comments are stripped during compilation to binary.
Strings
Simple (Unquoted)
Bare identifiers that contain no whitespace or special characters:
name: alice
host: localhost
status: active
Valid characters: letters, digits, _, -, .
Quoted
Double-quoted strings with escape sequences:
greeting: "hello world"
path: "C:\\Users\\name"
message: "line1\nline2"
tab_separated: "col1\tcol2"
Escape sequences: \\, \", \n, \t, \r, \b (backspace), \f (form feed), \uXXXX (Unicode code point, 4 hex digits)
Multiline (Triple-Quoted)
Triple-quoted strings with automatic leading whitespace removal:
description: """
This is a multiline string.
Leading whitespace is trimmed based on
the indentation of the first content line.
Useful for documentation blocks.
"""
Numbers
Integers
count: 42
negative: -17
zero: 0
Floats
price: 3.14
scientific: 6.022e23
negative_exp: 1.5e-10
Numbers with exponent notation but no decimal point (e.g., 1e3) are parsed as floats.
Hexadecimal
color: 0xFF5500
mask: 0x00A1
Binary Literals
flags: 0b1010
byte_val: 0b11110000
Both lowercase (0x, 0b) and uppercase (0X, 0B) prefixes are accepted.
Negative hex and binary literals are supported: -0xFF, -0b1010.
Bytes Literals
payload: b"cafef00d"
empty: b""
checksum: b"CAFE"
Hex digits only (uppercase or lowercase), even length, no spaces.
Special Float Values
not_a_number: NaN
positive_infinity: inf
negative_infinity: -inf
These keywords represent IEEE 754 special values. In JSON export, NaN and infinity values are converted to null.
Boolean and Null
enabled: true
disabled: false
missing: ~
The tilde (~) is the null literal.
Timestamps
ISO 8601 formatted date/time values:
# Date only
created: 2024-01-15
# Date and time (UTC)
updated: 2024-01-15T10:30:00Z
# With milliseconds
precise: 2024-01-15T10:30:00.123Z
# With timezone offset
local: 2024-01-15T10:30:00+05:30
Format: YYYY-MM-DD[THH:MM[:SS[.sss]][Z|+HH:MM|-HH:MM]]
Seconds (:SS) are optional and default to 00. Timestamps are stored internally as Unix milliseconds (i64).
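For illustration, here is a sketch of mapping an ISO 8601 value onto that (unix_millis, tz_offset_minutes) pair using the chrono crate (an assumed dependency; the parser's actual implementation is not shown here). Note that parse_from_rfc3339 handles only the full date-time-with-offset form; date-only values would need separate handling:

```rust
use chrono::DateTime; // assumed dependency: chrono = "0.4"

/// Sketch: convert an ISO 8601 value with an offset into the
/// (unix_millis, tz_offset_minutes) storage pair described above.
fn to_storage(ts: &str) -> Result<(i64, i16), chrono::ParseError> {
    let dt = DateTime::parse_from_rfc3339(ts)?;   // DateTime<FixedOffset>
    let millis = dt.timestamp_millis();           // UTC instant in milliseconds
    let offset_min = (dt.offset().local_minus_utc() / 60) as i16;
    Ok((millis, offset_min))
}

fn main() {
    // A +05:30 offset would be stored as 330 minutes
    println!("{:?}", to_storage("2024-01-15T10:30:00+05:30"));
}
```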
Objects
Curly-brace delimited key-value collections:
# Inline
point: {x: 10, y: 20}
# Multi-line
config: {
host: localhost,
port: 8080,
debug: false,
}
Trailing commas are allowed.
Arrays
Square-bracket delimited ordered collections:
numbers: [1, 2, 3, 4, 5]
mixed: [1, "hello", true, ~]
nested: [[1, 2], [3, 4]]
empty: []
Tuples
Parenthesized value lists. Outside of @table, tuples are parsed as plain arrays:
# This is an array [0, 0], NOT a struct
origin: (0, 0)
Inside a @table context, tuples are bound to the table’s schema:
@struct point (x: int, y: int)
points: @table point [
(0, 0), # bound to point schema
(100, 200),
]
Maps
Ordered key-value maps with the @map directive. Unlike objects, maps support non-string keys:
# String keys
headers: @map {
"Content-Type": "application/json",
"Accept": "*/*",
}
# Integer keys
status_codes: @map {
200: "OK",
404: "Not Found",
500: "Internal Server Error",
}
# Mixed value types
config: @map {
name: "myapp",
port: 8080,
debug: true,
}
Maps preserve insertion order and support heterogeneous key types.
References
Define named values and reuse them:
# Define a reference
!node_a: {label: "Start", value: 1}
!node_b: {label: "End", value: 2}
# Use references
edges: [
{from: !node_a, to: !node_b, weight: 1.0},
{from: !node_b, to: !node_a, weight: 0.5},
]
# References can be used multiple times
nodes: [!node_a, !node_b]
References can be defined at the top level or inside objects.
Tagged Values
A colon prefix adds a discriminator tag to any value:
events: [
:click {x: 100, y: 200},
:scroll {delta: -50},
:keypress {key: "Enter"},
]
Tags are useful for discriminated unions and variant types.
Unions
Named discriminated unions with @union:
@union shape {
circle (radius: float),
rectangle (width: float, height: float),
point (),
}
shapes: [
:circle (5.0),
:rectangle (10.0, 20.0),
:point (),
]
Union definitions are encoded in the binary schema table alongside struct definitions, preserving variant names, field names, and field types through compilation and decompilation.
Root Array
The @root-array directive marks the document as representing a top-level JSON array. This is primarily used for JSON round-trip fidelity.
When a root-level JSON array is imported via from-json, TeaLeaf stores each element as a numbered key (0, 1, 2, …) and emits @root-array so that to-json reconstructs the original array structure:
@root-array
0: {id: 1, name: alice}
1: {id: 2, name: bob}
2: {id: 3, name: carol}
Without @root-array, exporting to JSON would produce {"0": {...}, "1": {...}, ...}. With it, the output is [{...}, {...}, ...].
The directive takes no arguments and must appear before any data pairs.
Unknown Directives
Unknown directives (e.g., @custom) at the document top level are silently ignored. If a same-line argument follows the directive (e.g., @custom foo or @custom [1,2,3]), it is consumed and discarded. Arguments on the next line are not consumed — they are parsed as normal statements. This enables forward compatibility: files authored for a newer spec version can be partially parsed by older implementations that do not recognize new directives.
When an unknown directive appears as a value (e.g., key: @unknown [1,2,3]), it is treated as null. The argument expression is consumed but discarded.
File Includes
Import other TeaLeaf files:
@include "schemas/common.tl"
@include "./shared/config.tl"
Paths are resolved relative to the including file. Included schemas are available for @table use in the including file.
Formatting Rules
- Trailing commas are allowed in objects, arrays, tuples, and maps
- Whitespace is flexible – indent as you like
- Key names follow identifier rules: start with a letter or _, then letters, digits, _, -, .
- Quoted keys are supported for names with special characters:
"Content-Type": "application/json"
Type System
TeaLeaf has a rich type system covering primitives, containers, and type modifiers.
Primitive Types
| Type | Aliases | Description | Binary Size |
|---|---|---|---|
| bool | | Boolean (true/false) | 1 byte |
| int8 | | Signed 8-bit integer | 1 byte |
| int16 | | Signed 16-bit integer | 2 bytes |
| int | int32 | Signed 32-bit integer | 4 bytes |
| int64 | | Signed 64-bit integer | 8 bytes |
| uint8 | | Unsigned 8-bit integer | 1 byte |
| uint16 | | Unsigned 16-bit integer | 2 bytes |
| uint | uint32 | Unsigned 32-bit integer | 4 bytes |
| uint64 | | Unsigned 64-bit integer | 8 bytes |
| float32 | | 32-bit IEEE 754 float | 4 bytes |
| float | float64 | 64-bit IEEE 754 float | 8 bytes |
| string | | UTF-8 text | variable |
| bytes | | Raw binary data | variable |
| json_number | | Arbitrary-precision numeric string (from JSON) | variable |
| timestamp | | Unix milliseconds (i64) + timezone offset (i16) | 10 bytes |
Type Modifiers
field: string # required string
field: string? # nullable string (can be ~)
field: []string # required array of strings
field: []string? # nullable array of strings (the field itself can be ~)
field: []user # array of structs
The ? modifier applies to the field, not array elements. However, the parser does accept ~ (null) values inside arrays, including schema-typed arrays. Null elements are tracked in the null bitmap.
Value Types (Not Schema Types)
The following are value types that appear in data but cannot be declared as field types in @struct:
| Type | Description |
|---|---|
| object | Untyped { key: value } collections |
| map | Ordered @map { key: value } with any key type |
| ref | Reference (!name) to another value |
| tagged | Tagged value (:tag value) |
For structured fields, define a named struct and use it as the field type. For tagged values with a known set of variants, define a @union – this provides schema metadata (variant names, field names, field types) that is preserved in the binary format.
Type Widening
When reading binary data, automatic safe conversions apply:
- int8 → int16 → int32 → int64
- uint8 → uint16 → uint32 → uint64
- float32 → float64
Narrowing conversions are not automatic and require recompilation.
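A minimal sketch of these widening rules, using an illustrative enum rather than the crate's internal representation:

```rust
/// Sketch of the widening rules above with illustrative types
/// (not the crate's internal API).
#[derive(Clone, Copy)]
enum Stored {
    I8(i8),
    I16(i16),
    I32(i32),
    I64(i64),
}

/// Any narrower signed integer widens losslessly to i64.
fn widen_to_i64(v: Stored) -> i64 {
    match v {
        Stored::I8(x) => x as i64,
        Stored::I16(x) => x as i64,
        Stored::I32(x) => x as i64,
        Stored::I64(x) => x,
    }
}

fn main() {
    assert_eq!(widen_to_i64(Stored::I8(-17)), -17);
}
```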
Type Inference
Standalone Values
When writing, the smallest representation is selected:
- Integers: i8 if fits, else i16, else i32, else i64
- Unsigned: u8 if fits, else u16, else u32, else u64
- Floats: always f64 at runtime
Homogeneous Arrays
Arrays of uniform type use optimized encoding:
| Array Contents | Encoding Strategy |
|---|---|
| Schema-typed objects (matching a @struct) | Struct array encoding with null bitmaps |
| Value::Int arrays | Packed Int32 encoding |
| Value::String arrays | String table indices (u32) |
| All other arrays (UInt, Float, Bool, mixed, etc.) | Heterogeneous encoding with per-element type tags |
Type Coercion at Compile Time
When compiling schema-bound data, type mismatches use default values rather than erroring:
| Target Type | Mismatch Behavior |
|---|---|
| Numeric fields | Integers/floats coerce; non-numeric becomes 0 |
| String fields | Non-string becomes empty string "" |
| Bytes fields | Non-bytes becomes empty bytes (length 0) |
| Timestamp fields | Non-timestamp becomes epoch (0) |
This “best effort” approach prioritizes successful compilation over strict validation. Validate at the application level before compilation for strict type checking.
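To make the coercion table concrete, here is an illustrative sketch of that behavior using a simplified value enum (not the library's actual internals):

```rust
/// Sketch of the "best effort" defaults above, with a simplified
/// value enum standing in for the library's value type.
enum Value {
    Null,
    Int(i64),
    Float(f64),
    Str(String),
}

fn coerce_to_string(v: &Value) -> String {
    match v {
        Value::Str(s) => s.clone(),
        _ => String::new(), // non-string becomes ""
    }
}

fn coerce_to_float(v: &Value) -> f64 {
    match v {
        Value::Int(i) => *i as f64, // integers coerce to float
        Value::Float(f) => *f,
        _ => 0.0,                   // non-numeric becomes 0
    }
}

fn main() {
    assert_eq!(coerce_to_string(&Value::Int(7)), "");
    assert_eq!(coerce_to_float(&Value::Null), 0.0);
}
```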
Bytes Literal
The text format supports b"..." hex literals for byte data:
payload: b"cafef00d"
empty: b""
checksum: b"CA FE" # ERROR -- no spaces allowed
- Contents are hex digits only (uppercase or lowercase)
- Length must be even (2 hex chars per byte)
- dumps() and decompile emit b"..." for Value::Bytes, enabling full text round-trip
- JSON export encodes bytes as "0xcafef00d" strings; JSON import does not auto-convert back to bytes
Schemas
Schemas are the foundation of TeaLeaf’s compact encoding. They define structure once so data can use positional encoding.
Defining Schemas
Use @struct to define a schema:
@struct point (x: int, y: int)
With multiple fields and types:
@struct user (
id: int,
name: string,
email: string?,
active: bool,
)
Optional Type Annotations
Field types can be omitted – they default to string:
@struct config (host, port: int, debug: bool)
# host defaults to string type
Using Schemas with @table
The @table directive binds an array of tuples to a schema:
@struct user (id: int, name: string, email: string)
users: @table user [
(1, "Alice", "alice@example.com"),
(2, "Bob", "bob@example.com"),
(3, "Carol", "carol@example.com"),
]
Each tuple’s values are matched positionally to the schema fields.
Nested Structs
Structs can reference other structs. Nested tuples inherit schema binding from their parent field type:
@struct address (street: string, city: string, zip: string)
@struct person (
name: string,
home: address,
work: address?,
)
people: @table person [
(
"Alice Smith",
("123 Main St", "Berlin", "10115"), # Parsed as address
("456 Office Blvd", "Berlin", "10117"), # Parsed as address
),
]
Deep Nesting
Schemas can nest arbitrarily deep:
@struct method (type: string, last_four: string)
@struct payment (amount: float, method: method)
@struct order (id: int, customer: string, payment: payment)
orders: @table order [
(1, "Alice", (99.99, ("credit", "4242"))),
(2, "Bob", (49.50, ("debit", "1234"))),
]
Array Fields
Schema fields can be arrays of primitives or other structs:
@struct employee (
id: int,
name: string,
skills: []string,
scores: []int,
)
employees: @table employee [
(1, "Alice", ["rust", "python"], [95, 88]),
(2, "Bob", ["java"], [72]),
]
Nullable Fields
The ? modifier makes a field nullable:
@struct user (
id: int,
name: string,
email: string?, # can be ~
phone: string?, # can be ~
)
users: @table user [
(1, "Alice", "alice@example.com", "+1-555-0100"),
(2, "Bob", ~, ~), # email and phone are null
]
Binary Encoding Benefits
Schemas enable significant binary compression:
- Positional storage – field names stored once in the schema table, not per row
- Null bitmaps – one bit per nullable field per row, instead of full null markers
- Type-homogeneous arrays – packed encoding when all elements match a schema
- String deduplication – repeated values like city names stored once in the string table
Example Size Savings
For 1,000 user records with 5 fields:
| Approach | Approximate Size |
|---|---|
| JSON (field names repeated) | ~80KB |
| TeaLeaf text (schema + tuples) | ~35KB |
| TeaLeaf binary (compressed) | ~15KB |
Schema Compatibility
Compatible Changes
| Change | Notes |
|---|---|
| Rename field | Data is positional; names are documentation only |
| Widen type | int8 → int64, float32 → float64 (automatic) |
Incompatible Changes (Require Recompile)
| Change | Resolution |
|---|---|
| Add field | Recompile source .tl file |
| Remove field | Recompile source .tl file |
| Reorder fields | Recompile source .tl file |
| Narrow type | Recompile source .tl file |
Recompilation Workflow
The .tl file is the master. When schemas change:
tealeaf compile data.tl -o data.tlbx
TeaLeaf prioritizes simplicity over automatic schema evolution:
- No migration machinery – recompile when schemas change
- No version negotiation – the embedded schema is the source of truth
- Explicit over implicit – tuples require values for all fields
Binary Format
The TeaLeaf binary format (.tlbx) is the compact, machine-efficient representation. This page documents the binary layout.
Constants
| Constant | Value |
|---|---|
| Magic | TLBX (4 bytes, ASCII) |
| Version Major | 2 |
| Version Minor | 0 |
| Header Size | 64 bytes |
File Structure
┌──────────────────┐
│ Header (64 B) │
├──────────────────┤
│ String Table │
├──────────────────┤
│ Schema Table │
├──────────────────┤
│ Section Index │
├──────────────────┤
│ Data Sections │
└──────────────────┘
All multi-byte values are little-endian.
Header (64 bytes)
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | Magic | TLBX |
| 4 | 2 | Version Major | 2 |
| 6 | 2 | Version Minor | 0 |
| 8 | 4 | Flags | bit 0: compress (advisory), bit 1: root_array |
| 12 | 4 | Reserved | (unused) |
| 16 | 8 | String Table Offset | u64 LE |
| 24 | 8 | Schema Table Offset | u64 LE |
| 32 | 8 | Index Offset | u64 LE |
| 40 | 8 | Data Offset | u64 LE |
| 48 | 4 | String Count | u32 LE |
| 52 | 4 | Schema Count | u32 LE |
| 56 | 4 | Section Count | u32 LE |
| 60 | 4 | Reserved | (for future checksum; currently 0) |
Flag semantics:
- Bit 0 (COMPRESS): Advisory. Indicates one or more sections use ZLIB (deflate) compression. Compression is determined per-section via the entry flags in the section index. This flag is a hint for tooling only.
- Bit 1 (ROOT_ARRAY): Indicates the source document was a root-level JSON array.
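As an illustration of the header layout above, here is a sketch that validates the magic and reads the documented offsets and counts. Helper names are hypothetical; this is not the crate's reader code.

```rust
/// Sketch: validate the magic and read the documented 64-byte header fields.
fn read_u32(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes(b[off..off + 4].try_into().unwrap())
}

fn read_u64(b: &[u8], off: usize) -> u64 {
    u64::from_le_bytes(b[off..off + 8].try_into().unwrap())
}

fn parse_header(h: &[u8; 64]) -> Result<(), String> {
    if &h[0..4] != b"TLBX" {
        return Err("bad magic".into());
    }
    let major = u16::from_le_bytes(h[4..6].try_into().unwrap());
    if major != 2 {
        return Err(format!("unsupported major version {major}"));
    }
    let string_table_off = read_u64(h, 16);
    let schema_table_off = read_u64(h, 24);
    let index_off = read_u64(h, 32);
    let data_off = read_u64(h, 40);
    let (strings, schemas, sections) =
        (read_u32(h, 48), read_u32(h, 52), read_u32(h, 56));
    println!(
        "tables at {string_table_off}/{schema_table_off}/{index_off}/{data_off}: \
         {strings} strings, {schemas} schemas, {sections} sections"
    );
    Ok(())
}

fn main() {
    let mut h = [0u8; 64];
    h[..4].copy_from_slice(b"TLBX");
    h[4..6].copy_from_slice(&2u16.to_le_bytes());
    parse_header(&h).unwrap();
}
```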
String Table
All unique strings are deduplicated and stored once:
┌─────────────────────────┐
│ Size: u32 │
│ Count: u32 │
├─────────────────────────┤
│ Offsets: [u32 × Count] │
│ Lengths: [u32 × Count] │
├─────────────────────────┤
│ String Data (UTF-8) │
└─────────────────────────┘
Strings are referenced by 32-bit index throughout the file. This provides:
- Deduplication – "Seattle" stored once, even if used 1,000 times
- Fast lookup – O(1) index-based access
- Compact references – 4 bytes per reference instead of the full string
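A minimal sketch of the interning idea behind the string table (illustrative; not the actual writer implementation):

```rust
use std::collections::HashMap;

/// Sketch: intern strings so each unique value is stored once and
/// data refers to it by u32 index, as described above.
#[derive(Default)]
struct StringTable {
    index: HashMap<String, u32>,
    strings: Vec<String>,
}

impl StringTable {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&i) = self.index.get(s) {
            return i; // "Seattle" the second time costs 4 bytes, not a new copy
        }
        let i = self.strings.len() as u32;
        self.index.insert(s.to_string(), i);
        self.strings.push(s.to_string());
        i
    }
}

fn main() {
    let mut t = StringTable::default();
    let a = t.intern("Seattle");
    let b = t.intern("Seattle");
    assert_eq!(a, b);              // same index both times
    assert_eq!(t.strings.len(), 1); // stored once
}
```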
Schema Table
The schema table stores both struct and union definitions:
┌──────────────────────────────────────┐
│ Size: u32 │
│ Struct Count: u16 │
│ Union Count: u16 │
├──────────────────────────────────────┤
│ Struct Offsets: [u32 × struct_count] │
│ Struct Definitions │
├──────────────────────────────────────┤
│ Union Offsets: [u32 × union_count] │
│ Union Definitions │
└──────────────────────────────────────┘
Backward compatibility: The Union Count field at offset +6 was previously reserved (always 0). Old readers that ignore this field and only read Struct Count structs continue to work – they simply skip the union data.
Struct Definition
Schema:
name_idx: u32 (string table index)
field_count: u16
flags: u16 (reserved)
Field (repeated × field_count):
name_idx: u32 (string table index)
type: u8 (TLType code)
flags: u8 (bit 0: nullable, bit 1: is_array)
extra: u16 (type reference -- see below)
Field extra values:
- For STRUCT (0x22) fields: string table index of the struct type name (0xFFFF = untyped object)
- For TAGGED (0x31) fields: string table index of the union type name (0xFFFF = untyped tagged value)
- For all other field types: 0xFFFF
Union Definition
Union:
name_idx: u32 (string table index)
variant_count: u16
flags: u16 (reserved)
Variant (repeated × variant_count):
name_idx: u32 (string table index)
field_count: u16
flags: u16 (reserved)
Field (repeated × field_count):
name_idx: u32 (string table index)
type: u8 (TLType code)
flags: u8 (bit 0: nullable, bit 1: is_array)
extra: u16 (same semantics as struct field extra)
Each union variant uses the same 8-byte field entry format as struct fields.
Type Codes
0x00 NULL 0x0A FLOAT32 0x20 ARRAY 0x30 REF
0x01 BOOL 0x0B FLOAT64 0x21 OBJECT 0x31 TAGGED
0x02 INT8 0x10 STRING 0x22 STRUCT 0x32 TIMESTAMP
0x03 INT16 0x11 BYTES 0x23 MAP
0x04 INT32 0x12 JSONNUMBER 0x24 TUPLE (reserved)
0x05 INT64
0x06 UINT8
0x07 UINT16
0x08 UINT32
0x09 UINT64
TUPLE (0x24) is reserved but not currently emitted. Tuples in text are parsed as arrays.
JSONNUMBER (0x12) stores arbitrary-precision numeric strings that exceed the range of i64, u64, or f64. Stored as a string table index, identical to STRING encoding.
Section Index
Maps named sections to data locations:
┌─────────────────────────┐
│ Size: u32 │
│ Count: u32 │
├─────────────────────────┤
│ Entries (32 B each) │
└─────────────────────────┘
Each entry (32 bytes):
| Field | Type | Description |
|---|---|---|
| key_idx | u32 | String table index for section name |
| offset | u64 | Absolute file offset to data |
| size | u32 | Compressed size in bytes |
| uncompressed_size | u32 | Original size before compression |
| schema_idx | u16 | Schema index (0xFFFF if none) |
| type | u8 | TLType code |
| flags | u8 | bit 0: compressed, bit 1: is_array |
| item_count | u32 | Count for arrays/maps |
| reserved | u32 | (future use) |
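An illustrative decoder for one entry, assuming the fields are packed in table order with no padding (field and struct names here are hypothetical):

```rust
/// Sketch: decode one 32-byte section index entry per the table above.
struct Entry {
    key_idx: u32,
    offset: u64,
    size: u32,
    uncompressed_size: u32,
    schema_idx: u16,
    type_code: u8,
    flags: u8,
    item_count: u32,
}

fn parse_entry(b: &[u8; 32]) -> Entry {
    Entry {
        key_idx: u32::from_le_bytes(b[0..4].try_into().unwrap()),
        offset: u64::from_le_bytes(b[4..12].try_into().unwrap()),
        size: u32::from_le_bytes(b[12..16].try_into().unwrap()),
        uncompressed_size: u32::from_le_bytes(b[16..20].try_into().unwrap()),
        schema_idx: u16::from_le_bytes(b[20..22].try_into().unwrap()),
        type_code: b[22],
        flags: b[23], // bit 0: compressed, bit 1: is_array
        item_count: u32::from_le_bytes(b[24..28].try_into().unwrap()),
        // bytes 28..32 are reserved
    }
}

fn main() {
    let e = parse_entry(&[0u8; 32]);
    assert_eq!(e.flags & 1, 0); // not compressed
    let _ = (e.key_idx, e.offset, e.size, e.uncompressed_size,
             e.schema_idx, e.type_code, e.item_count);
}
```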
Data Encoding
Primitives
| Type | Encoding |
|---|---|
| Null | 0 bytes |
| Bool | 1 byte (0x00 or 0x01) |
| Int8/UInt8 | 1 byte |
| Int16/UInt16 | 2 bytes, LE |
| Int32/UInt32 | 4 bytes, LE |
| Int64/UInt64 | 8 bytes, LE |
| Float32 | 4 bytes, IEEE 754 LE |
| Float64 | 8 bytes, IEEE 754 LE |
| String | u32 index into string table |
| Bytes | varint length + raw bytes |
| Timestamp | i64 Unix milliseconds (LE, 8 bytes) + i16 timezone offset in minutes (LE, 2 bytes). Total: 10 bytes |
Varint Encoding
Used for bytes length:
- Continuation bit (0x80) + 7 value bits
- Least-significant group first
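A Rust sketch of this scheme (illustrative; the library's own routines are not shown here):

```rust
/// Sketch of the varint scheme above: 7 value bits per byte,
/// continuation bit 0x80, least-significant group first.
fn encode_varint(mut n: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (n & 0x7F) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // more groups follow
    }
}

fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut n = 0u64;
    for (i, &b) in buf.iter().enumerate() {
        n |= ((b & 0x7F) as u64) << (7 * i as u64);
        if b & 0x80 == 0 {
            return Some((n, i + 1)); // value + bytes consumed
        }
    }
    None // ran out of input mid-varint
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    assert_eq!(buf, vec![0xAC, 0x02]);
    assert_eq!(decode_varint(&buf), Some((300, 2)));
}
```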
Arrays (Top-Level, Homogeneous)
For Value::Int (when all values fit in i32) or Value::String arrays:
Count: u32
Element Type: u8 (Int32 or String)
Elements: [packed data]
All other uniform-type arrays (UInt, Bool, Float, Timestamp, Int64) use heterogeneous encoding.
Arrays (Top-Level, Heterogeneous)
For mixed-type arrays:
Count: u32
Element Type: 0xFF (marker)
Elements: [type: u8, data, type: u8, data, ...]
Arrays (Schema-Typed Fields)
Array fields within @struct use homogeneous encoding for ANY element type:
Count: u32
Element Type: u8 (field's declared type)
Elements: [packed typed values]
Objects
Field Count: u16
Fields: [
key_idx: u32 (string table index)
type: u8 (TLType code)
data: [type-specific]
]
Struct Arrays (Optimal Encoding)
Count: u32
Schema Index: u16
Null Bitmap Size: u16
Rows: [
Null Bitmap: [u8 × bitmap_size]
Values: [non-null field values only]
]
The null bitmap tracks which fields are null:
- Bit i set = field i is null
- Only non-null values are stored
- Bitmap size = (field_count + 7) / 8 bytes (integer division, equivalent to ceil(field_count / 8))
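A sketch of building one row's bitmap under these rules. The LSB-first bit order within a byte is an assumption here, not something this page specifies:

```rust
/// Sketch: build a per-row null bitmap as described above.
/// Assumption: bit i maps to bit (i % 8) of byte (i / 8), LSB first.
fn null_bitmap(fields: &[Option<i64>]) -> Vec<u8> {
    let size = (fields.len() + 7) / 8; // integer division = ceil(n / 8)
    let mut bitmap = vec![0u8; size];
    for (i, f) in fields.iter().enumerate() {
        if f.is_none() {
            bitmap[i / 8] |= 1 << (i % 8);
        }
    }
    bitmap
}

fn main() {
    // Row (1, ~, 3): field 1 is null -> bitmap 0b0000_0010
    let row = [Some(1), None, Some(3)];
    assert_eq!(null_bitmap(&row), vec![0b0000_0010]);
}
```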
Maps
Count: u32
Entries: [
key_type: u8
key_data: [type-specific]
value_type: u8
value_data: [type-specific]
]
References
name_idx: u32 (string table index for reference name)
Tagged Values
tag_idx: u32 (string table index for tag name)
value_type: u8 (TLType code)
value_data: [type-specific]
Compression
- Algorithm: ZLIB (deflate)
- Threshold: Compress if data > 64 bytes AND compressed < 90% of original
- Granularity: Per-section (each section compressed independently)
- Flag: Bit 0 of entry flags indicates compression
- Decompression: Readers check the flag and decompress transparently
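A sketch of the per-section decision rule, assuming the flate2 crate for ZLIB (which crate the implementation actually uses is not specified on this page):

```rust
use flate2::{write::ZlibEncoder, Compression}; // assumed: flate2 = "1"
use std::io::Write;

/// Sketch: compress a section only if it exceeds 64 bytes AND the
/// compressed form is under 90% of the original, per the rules above.
fn maybe_compress(section: &[u8]) -> (Vec<u8>, bool) {
    if section.len() > 64 {
        let mut enc = ZlibEncoder::new(Vec::new(), Compression::default());
        enc.write_all(section).unwrap();
        let compressed = enc.finish().unwrap();
        if (compressed.len() as f64) < section.len() as f64 * 0.9 {
            return (compressed, true); // would set bit 0 of the entry flags
        }
    }
    (section.to_vec(), false) // stored uncompressed
}

fn main() {
    let data = vec![b'a'; 1024]; // highly compressible
    let (out, compressed) = maybe_compress(&data);
    assert!(compressed && out.len() < data.len());
}
```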
JSON Interoperability
TeaLeaf provides built-in bidirectional JSON conversion for easy integration with existing tools and systems.
JSON to TeaLeaf
CLI
# JSON to TeaLeaf text (with automatic schema inference)
tealeaf from-json input.json -o output.tl
# JSON to TeaLeaf binary
tealeaf json-to-tlbx input.json -o output.tlbx
Rust API
#![allow(unused)]
fn main() {
let doc = TeaLeaf::from_json(json_string)?;
// With automatic schema inference for arrays
let doc = TeaLeaf::from_json_with_schemas(json_string)?;
}
.NET API
using var doc = TLDocument.FromJson(jsonString);
Type Mappings (JSON → TeaLeaf)
| JSON Type | TeaLeaf Type |
|---|---|
| null | Null |
| true / false | Bool |
| number (integer) | Int (or UInt if > i64::MAX) |
| number (decimal, finite f64) | Float |
| number (exceeds i64/u64/f64) | JsonNumber |
| string | String |
| array | Array |
| object | Object |
Limitations
JSON import is “plain JSON only” – it does not recognize the special JSON forms used for TeaLeaf export:
| JSON Form | Result |
|---|---|
{"$ref": "name"} | Plain Object (not a Ref) |
{"$tag": "...", "$value": ...} | Plain Object (not a Tagged) |
[[key, value], ...] | Plain Array (not a Map) |
| ISO 8601 strings | Plain String (not a Timestamp) |
For full round-trip fidelity with these types, use binary format (.tlbx) or reconstruct programmatically.
TeaLeaf to JSON
CLI
# Text to JSON
tealeaf to-json input.tl -o output.json
# Binary to JSON
tealeaf tlbx-to-json input.tlbx -o output.json
Both commands write to stdout if -o is not specified.
Rust API
#![allow(unused)]
fn main() {
let json = doc.to_json()?; // pretty-printed
let json = doc.to_json_compact()?; // minified
}
.NET API
string json = doc.ToJson(); // pretty-printed
string json = doc.ToJsonCompact(); // minified
Type Mappings (TeaLeaf → JSON)
| TeaLeaf Type | JSON Representation |
|---|---|
| Null | null |
| Bool | true / false |
| Int, UInt | number |
| Float | number |
| JsonNumber | number (parsed back to JSON number) |
| String | string |
| Bytes | string (hex with 0x prefix) |
| Array | array |
| Object | object |
| Map | array of [key, value] pairs |
| Timestamp | string (ISO 8601) |
| Ref | {"$ref": "name"} |
| Tagged | {"$tag": "tagname", "$value": value} |
Schema Inference
When converting JSON to TeaLeaf, the from-json command (and from_json_with_schemas API) can automatically infer schemas from arrays of uniform objects.
How It Works
- Array Detection – identifies arrays of objects with identical field sets
- Name Inference – singularizes parent key names ("products" → product schema)
- Type Inference – determines field types across all array items
- Nullable Detection – fields with any null values become nullable (string?)
- Nested Schemas – creates separate schemas for nested objects within array elements
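A sketch of the array-detection step, using serde_json to stand in for the parsed JSON (illustrative; not the crate's inference code):

```rust
use serde_json::Value; // assumed dependency: serde_json = "1"

/// Sketch of step 1 above: an array qualifies for schema inference
/// when every element is an object with the same key set.
fn uniform_object_array(arr: &[Value]) -> bool {
    let mut first_keys: Option<Vec<String>> = None;
    for v in arr {
        let obj = match v {
            Value::Object(m) => m,
            _ => return false, // a non-object element disqualifies the array
        };
        let mut ks: Vec<String> = obj.keys().cloned().collect();
        ks.sort();
        match &first_keys {
            None => first_keys = Some(ks),
            Some(first) => {
                if *first != ks {
                    return false; // key sets differ between elements
                }
            }
        }
    }
    first_keys.is_some() // empty arrays don't qualify
}

fn main() {
    let arr: Vec<Value> = serde_json::from_str(
        r#"[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]"#,
    ).unwrap();
    assert!(uniform_object_array(&arr));
}
```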
Example
Input JSON:
{
"customers": [
{
"id": 1,
"name": "Alice",
"billing_address": {"street": "123 Main", "city": "Boston"}
},
{
"id": 2,
"name": "Bob",
"billing_address": {"street": "456 Oak", "city": "Denver"}
}
]
}
Inferred TeaLeaf output:
@struct billing_address (city: string, street: string)
@struct customer (billing_address: billing_address, id: int, name: string)
customers: @table customer [
(("Boston", "123 Main"), 1, "Alice"),
(("Denver", "456 Oak"), 2, "Bob"),
]
Nested Schema Inference
When array elements contain nested objects, TeaLeaf creates schemas for those nested objects if they have uniform structure across all items:
- Nested objects become their own @struct definitions
- Parent schemas reference nested schemas by name (not object type)
- Deeply nested objects are handled recursively
Round-Trip Considerations
| Path | Fidelity |
|---|---|
| .tl → .json → .tl | Lossy – schemas, comments, refs, tags, timestamps, maps are simplified |
| .tl → .tlbx → .tl | Lossless for data (comments stripped) |
| .tl → .tlbx → .json | Same as .tl → .json |
| .json → .tl → .json | Generally lossless for JSON-native types |
| .json → .tlbx → .json | Generally lossless for JSON-native types |
For types that don’t round-trip through JSON (Ref, Tagged, Map, Timestamp, Bytes), use the binary format for lossless storage.
Grammar
The formal grammar for the TeaLeaf text format in EBNF notation.
EBNF Grammar
document = { directive | pair | ref_def } ;
directive = struct_def | union_def | include | root_array ;
struct_def = "@struct" name "(" fields ")" ;
union_def = "@union" name "{" variants "}" ;
include = "@include" string ;
root_array = "@root-array" ;
variants = variant { "," variant } ;
variant = name "(" [ fields ] ")" ;
fields = field { "," field } ;
field = name [ ":" type ] ; (* type defaults to string if omitted *)
type = [ "[]" ] base_type [ "?" ] ;
base_type = "bool" | "int" | "int8" | "int16" | "int32" | "int64"
| "uint" | "uint8" | "uint16" | "uint32" | "uint64"
| "float" | "float32" | "float64" | "string" | "bytes"
| "timestamp" | name ;
pair = key ":" value ;
key = name | string ;
value = primitive | object | array | tuple | table | map
| tagged | ref | timestamp ;
primitive = string | bytes_lit | number | bool | "~" ;
bytes_lit = "b\"" { hexdigit hexdigit } "\"" ;
object = "{" [ ( pair | ref_def ) { "," ( pair | ref_def ) } ] "}" ;
array = "[" [ value { "," value } ] "]" ;
tuple = "(" [ value { "," value } ] ")" ;
table = "@table" name array ;
map = "@map" "{" [ map_entry { "," map_entry } ] "}" ;
map_entry = map_key ":" value ;
map_key = string | name | integer ;
tagged = ":" name value ;
ref = "!" name ;
ref_def = "!" name ":" value ;
timestamp = date [ "T" time [ timezone ] ] ;
date = digit{4} "-" digit{2} "-" digit{2} ;
time = digit{2} ":" digit{2} [ ":" digit{2} [ "." digit{1,3} ] ] ;
timezone = "Z" | ( "+" | "-" ) digit{2} [ ":" digit{2} | digit{2} ] ;
string = name | '"' chars '"' | '"""' multiline '"""' ;
number = integer | float | hex | binary ;
integer = [ "-" ] digit+ ;
float = [ "-" ] digit+ "." digit+ [ ("e"|"E") ["+"|"-"] digit+ ]
| [ "-" ] digit+ ("e"|"E") ["+"|"-"] digit+
| "NaN" | "inf" | "-inf" ;
hex = [ "-" ] ("0x" | "0X") hexdigit+ ;
binary = [ "-" ] ("0b" | "0B") ("0"|"1")+ ;
bool = "true" | "false" ;
name = (letter | "_") { letter | digit | "_" | "-" | "." } ;
comment = "#" { any } newline ;
chars = { any_char | escape } ;
escape = "\\" | "\\\"" | "\\n" | "\\t" | "\\r" | "\\b" | "\\f"
| "\\u" hexdigit hexdigit hexdigit hexdigit ;
Production Notes
Document Structure
A document is a sequence of:
- Directives – @struct, @union, @include, @root-array (processed before data)
- Pairs – key: value (the actual data)
- Reference definitions – !name: value (reusable named values)
Key Rules
- Keys can be bare identifiers (name) or quoted strings ("Content-Type")
- Trailing commas are allowed in all list contexts (arrays, objects, tuples, maps, fields)
- Comments (# to end of line) can appear anywhere whitespace is valid
- Whitespace is insignificant except inside strings
Type Defaults
When a field type is omitted in a @struct, it defaults to string:
@struct config (host, port: int, debug: bool)
# "host" is implicitly string
Tuple Semantics
Standalone tuples are parsed as arrays. Only within a @table context do tuples acquire schema binding:
# This is an array [1, 2, 3]
plain: (1, 2, 3)
# These are schema-bound tuples
@struct point (x: int, y: int)
points: @table point [(0, 0), (1, 1)]
Root Array Directive
The @root-array directive marks the document as representing a root-level JSON array rather than a JSON object. This is used for JSON round-trip fidelity – when a JSON array is imported via from-json, the directive is emitted so that to-json produces an array at the top level instead of an object:
@root-array
0: {id: 1, name: alice}
1: {id: 2, name: bob}
Without @root-array, the JSON output would be {"0": {...}, "1": {...}}. With it, the output is [{...}, {...}].
Map Key Restrictions
Map keys are restricted to hashable types: strings, names, and integers. Complex values (objects, arrays) cannot be map keys.
Reference Scoping
References can be defined at:
- Top level – !name: value alongside pairs
- Inside objects – {!ref: value, field: !ref}
References are resolved within the document scope.
CLI Overview
The tealeaf command-line tool provides all operations for working with TeaLeaf files.
Usage
tealeaf <command> [options]
Commands
| Command | Description |
|---|---|
| compile | Compile text (.tl) to binary (.tlbx) |
| decompile | Decompile binary (.tlbx) to text (.tl) |
| info | Show file information (auto-detects format) |
| validate | Validate text format syntax |
| to-json | Convert TeaLeaf text to JSON |
| from-json | Convert JSON to TeaLeaf text |
| tlbx-to-json | Convert TeaLeaf binary to JSON |
| json-to-tlbx | Convert JSON to TeaLeaf binary |
| help | Show help text |
Global Options
tealeaf help # Show usage
tealeaf -h # Show usage
tealeaf --help # Show usage
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (parse error, I/O error, invalid arguments) |
Error messages are written to stderr. Data output goes to stdout (when no -o flag is specified).
Quick Examples
# Full workflow
tealeaf validate data.tl
tealeaf compile data.tl -o data.tlbx
tealeaf info data.tlbx
tealeaf to-json data.tl -o data.json
tealeaf decompile data.tlbx -o recovered.tl
# JSON conversion
tealeaf from-json api_response.json -o structured.tl
tealeaf json-to-tlbx api_response.json -o compact.tlbx
tealeaf tlbx-to-json compact.tlbx -o exported.json
compile
Compile a TeaLeaf text file (.tl) to the compact binary format (.tlbx).
Usage
tealeaf compile <input.tl> -o <output.tlbx>
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.tl> | Yes | Path to the TeaLeaf text file |
| -o <output.tlbx> | Yes | Path for the output binary file |
Description
The compile command:
- Parses the text file (including any @include directives)
- Builds the string table (deduplicates all strings)
- Encodes schemas into the schema table
- Encodes each top-level key-value pair as a data section
- Applies per-section ZLIB compression (enabled by default)
- Writes the binary file with the 64-byte header
Compression is applied to sections larger than 64 bytes where the compressed size is less than 90% of the original.
Examples
# Basic compilation
tealeaf compile config.tl -o config.tlbx
# Compile and inspect
tealeaf compile data.tl -o data.tlbx
tealeaf info data.tlbx
Output
On success, prints:
- Input and output file paths
- Input size, output size, and compression ratio (percentage)
Error Cases
| Error | Cause |
|---|---|
| Parse error | Invalid TeaLeaf syntax in input file |
| I/O error | Input file not found or output path not writable |
| Include error | Referenced @include file not found |
See Also
- decompile – reverse operation
- validate – check syntax without compiling
- Binary Format – binary layout details
decompile
Convert a TeaLeaf binary file (.tlbx) back to the human-readable text format (.tl).
Usage
tealeaf decompile <input.tlbx> -o <output.tl>
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.tlbx> | Yes | Path to the TeaLeaf binary file |
| -o <output.tl> | Yes | Path for the output text file |
Description
The decompile command:
- Opens the binary file and reads the header
- Loads the string table and schema table
- Reads the section index
- Decompresses sections as needed
- Reconstructs @struct definitions from the schema table
- Writes each section as a key-value pair in text format
Notes
- Comments are not preserved – comments from the original .tl are stripped during compilation
- Formatting may differ – the decompiled output uses default formatting, which may differ from the original source
- Data is lossless – all values, schemas, and structure are preserved
- Bytes are lossless – bytes values are written as b"..." hex literals, which round-trip correctly
Examples
# Decompile a binary file
tealeaf decompile data.tlbx -o data_recovered.tl
# Round-trip verification
tealeaf compile original.tl -o compiled.tlbx
tealeaf decompile compiled.tlbx -o roundtrip.tl
tealeaf compile roundtrip.tl -o roundtrip.tlbx
# compiled.tlbx and roundtrip.tlbx should be equivalent
See Also
info
Display information about a TeaLeaf file. Auto-detects whether the file is text or binary format.
Usage
tealeaf info <file>
Arguments
| Argument | Required | Description |
|---|---|---|
| <file> | Yes | Path to a .tl or .tlbx file |
Description
The info command auto-detects the file format (by checking for the TLBX magic bytes) and displays relevant information.
For Text Files (.tl)
- Number of top-level keys
- Key names
- Number of schema definitions
- Schema details (name, fields, types)
For Binary Files (.tlbx)
- Version information
- File size
- Header details (offsets, counts)
- String table statistics (count, total size)
- Schema table details (names, field counts)
- Section index (key names, sizes, compression ratios)
Examples
# Inspect a text file
tealeaf info config.tl
# Inspect a binary file
tealeaf info data.tlbx
See Also
validate
Validate a TeaLeaf text file for syntactic correctness without compiling it.
Usage
tealeaf validate <file.tl>
Arguments
| Argument | Required | Description |
|---|---|---|
| <file.tl> | Yes | Path to the TeaLeaf text file |
Description
The validate command parses the text file and reports any syntax errors. It does not produce any output files.
Validation checks include:
- Lexical analysis (valid tokens, string escaping)
- Structural parsing (matched brackets, valid directives)
- Schema reference validity (@table references a defined @struct)
- Include file resolution
- Type syntax in schema definitions
Examples
# Validate a file
tealeaf validate config.tl
# Validate before compiling
tealeaf validate data.tl && tealeaf compile data.tl -o data.tlbx
Exit Codes
| Code | Meaning |
|---|---|
| 0 | File is valid |
| 1 | Validation errors found |
On success, prints ✓ Valid along with schema and key counts. On failure, prints ✗ Invalid: <error message> and exits with code 1.
See Also
to-json / from-json
Convert between TeaLeaf text format and JSON.
to-json
Convert a TeaLeaf text file to JSON.
Usage
tealeaf to-json <input.tl> [-o <output.json>]
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.tl> | Yes | Path to the TeaLeaf text file |
| -o <output.json> | No | Output file path. If omitted, writes to stdout |
Examples
# Write to file
tealeaf to-json data.tl -o data.json
# Write to stdout
tealeaf to-json data.tl
# Pipe to another tool
tealeaf to-json data.tl | jq '.users'
Output Format
The output is pretty-printed JSON. See JSON Interoperability for type mapping details.
from-json
Convert a JSON file to TeaLeaf text format with automatic schema inference.
Usage
tealeaf from-json <input.json> -o <output.tl>
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.json> | Yes | Path to the JSON file |
| -o <output.tl> | Yes | Path for the output TeaLeaf text file |
Schema Inference
from-json automatically infers schemas from JSON arrays of uniform objects:
- Array Detection – identifies arrays where all elements are objects with identical keys
- Name Inference – singularizes the parent key name ("users" → user schema)
- Type Inference – determines field types across all items
- Nullable Detection – fields with any null become nullable (string?)
- Nested Schemas – creates schemas for nested uniform objects
Examples
# Convert with schema inference
tealeaf from-json api_data.json -o structured.tl
# Full pipeline: JSON → TeaLeaf text → Binary
tealeaf from-json data.json -o data.tl
tealeaf compile data.tl -o data.tlbx
Example: Schema Inference in Action
Input (employees.json):
{
"employees": [
{"id": 1, "name": "Alice", "dept": "Engineering"},
{"id": 2, "name": "Bob", "dept": "Design"}
]
}
Output (employees.tl):
@struct employee (dept: string, id: int, name: string)
employees: @table employee [
("Engineering", 1, "Alice"),
("Design", 2, "Bob"),
]
See Also
- tlbx-to-json / json-to-tlbx – binary format JSON conversion
- JSON Interoperability – type mappings and round-trip details
tlbx-to-json / json-to-tlbx
Convert between TeaLeaf binary format and JSON directly, without going through the text format.
tlbx-to-json
Convert a TeaLeaf binary file to JSON.
Usage
tealeaf tlbx-to-json <input.tlbx> [-o <output.json>]
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.tlbx> | Yes | Path to the TeaLeaf binary file |
| -o <output.json> | No | Output file path. If omitted, writes to stdout |
Examples
# Write to file
tealeaf tlbx-to-json data.tlbx -o data.json
# Write to stdout
tealeaf tlbx-to-json data.tlbx
# Pipe to jq for filtering
tealeaf tlbx-to-json data.tlbx | jq '.config'
Notes
- Produces the same JSON output as to-json on the equivalent text file
- Reads the binary directly – no intermediate text conversion
json-to-tlbx
Convert a JSON file directly to TeaLeaf binary format.
Usage
tealeaf json-to-tlbx <input.json> -o <output.tlbx>
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.json> | Yes | Path to the JSON file |
| -o <output.tlbx> | Yes | Path for the output binary file |
Examples
# Direct JSON to binary
tealeaf json-to-tlbx api_data.json -o compact.tlbx
# Verify the result
tealeaf info compact.tlbx
tealeaf tlbx-to-json compact.tlbx -o verify.json
Notes
- Performs schema inference (same as from-json)
- Compiles directly to binary – no intermediate .tl file
- Compression is enabled by default
Workflow Comparison
# Two-step (via text)
tealeaf from-json data.json -o data.tl
tealeaf compile data.tl -o data.tlbx
# One-step (direct)
tealeaf json-to-tlbx data.json -o data.tlbx
Both approaches produce equivalent binary output.
See Also
- to-json / from-json – text format JSON conversion
- JSON Interoperability – type mappings and limitations
Rust Guide: Overview
TeaLeaf is written in Rust. The tealeaf-core crate provides the full API for parsing, compiling, reading, and converting TeaLeaf documents.
Crates
| Crate | Description |
|---|---|
| tealeaf-core | Core library: parser, compiler, reader, CLI, JSON conversion |
| tealeaf-derive | Proc-macro crate: #[derive(ToTeaLeaf, FromTeaLeaf)] |
| tealeaf-ffi | C-compatible FFI layer for language bindings |
Installation
Add to your Cargo.toml:
[dependencies]
tealeaf-core = { version = "2.0.0-beta.8", features = ["derive"] }
The derive feature pulls in tealeaf-derive for proc-macro support.
Core Types
TeaLeaf
The main document type:
#![allow(unused)]
fn main() {
use tealeaf::TeaLeaf;
// Parse from text
let doc = TeaLeaf::parse("name: Alice\nage: 30")?;
// Load from file
let doc = TeaLeaf::load("data.tl")?;
// Load from JSON
let doc = TeaLeaf::from_json(json_str)?;
// With schema inference
let doc = TeaLeaf::from_json_with_schemas(json_str)?;
}
Value
The value enum representing all TeaLeaf types:
#![allow(unused)]
fn main() {
use tealeaf::Value;
pub enum Value {
Null,
Bool(bool),
Int(i64),
UInt(u64),
Float(f64),
String(String),
Bytes(Vec<u8>),
Array(Vec<Value>),
Object(ObjectMap<String, Value>), // IndexMap alias, preserves insertion order
Map(Vec<(Value, Value)>),
Ref(String),
Tagged(String, Box<Value>),
Timestamp(i64, i16), // (unix_millis, tz_offset_minutes)
JsonNumber(String), // arbitrary-precision number (raw JSON decimal string)
}
}
Schema and Field
Schema definitions:
#![allow(unused)]
fn main() {
use tealeaf::{Schema, Field, FieldType};
let schema = Schema {
name: "user".to_string(),
fields: vec![
Field { name: "id".into(), field_type: FieldType { base: "int".into(), nullable: false, is_array: false } },
Field { name: "name".into(), field_type: FieldType { base: "string".into(), nullable: false, is_array: false } },
Field { name: "email".into(), field_type: FieldType { base: "string".into(), nullable: true, is_array: false } },
],
};
}
Accessing Data
#![allow(unused)]
fn main() {
let doc = TeaLeaf::load("data.tl")?;
// Get a value by key
if let Some(Value::String(name)) = doc.get("name") {
println!("Name: {}", name);
}
// Get a schema
if let Some(schema) = doc.schema("user") {
for field in &schema.fields {
println!(" {}: {}", field.name, field.field_type.base);
}
}
}
Output Operations
#![allow(unused)]
fn main() {
let doc = TeaLeaf::load("data.tl")?;
// Compile to binary
doc.compile("data.tlbx", true)?; // true = enable compression
// Convert to JSON
let json = doc.to_json()?; // pretty-printed
let json = doc.to_json_compact()?; // minified
// Convert to TeaLeaf text (with schemas)
let text = doc.to_tl_with_schemas();
}
Conversion Traits
Two traits enable Rust struct ↔ TeaLeaf conversion:
#![allow(unused)]
fn main() {
pub trait ToTeaLeaf {
fn to_tealeaf_value(&self) -> Value;
fn collect_schemas() -> IndexMap<String, Schema>;
fn tealeaf_field_type() -> FieldType;
}
pub trait FromTeaLeaf: Sized {
fn from_tealeaf_value(value: &Value) -> Result<Self, ConvertError>;
}
}
These are typically derived via #[derive(ToTeaLeaf, FromTeaLeaf)] – see Derive Macros.
Extension Trait
ToTeaLeafExt provides convenience methods for any ToTeaLeaf implementor:
#![allow(unused)]
fn main() {
pub trait ToTeaLeafExt: ToTeaLeaf {
fn to_tealeaf_doc(&self, key: &str) -> TeaLeaf;
fn to_tl_string(&self, key: &str) -> String;
fn to_tlbx(&self, key: &str, path: &str, compress: bool) -> Result<()>;
fn to_tealeaf_json(&self, key: &str) -> Result<String>;
}
}
Example:
#![allow(unused)]
fn main() {
let user = User { id: 1, name: "Alice".into(), active: true };
// One-liner serialization
let text = user.to_tl_string("user");
user.to_tlbx("user", "user.tlbx", true)?;
let json = user.to_tealeaf_json("user")?;
}
Next Steps
- Derive Macros – #[derive(ToTeaLeaf, FromTeaLeaf)]
- Attributes Reference – all #[tealeaf(...)] attributes
- Builder API – programmatic document construction
- Schemas & Types – working with schemas in Rust
- Error Handling – error types and patterns
Derive Macros
The tealeaf-derive crate provides two proc-macros for automatic Rust struct ↔ TeaLeaf conversion.
Setup
Enable the derive feature:
[dependencies]
tealeaf-core = { version = "2.0.0-beta.8", features = ["derive"] }
ToTeaLeaf
Converts a Rust struct or enum to a TeaLeaf Value:
#![allow(unused)]
fn main() {
use tealeaf::{ToTeaLeaf, ToTeaLeafExt};
#[derive(ToTeaLeaf)]
struct Config {
host: String,
port: i32,
debug: bool,
}
let config = Config { host: "localhost".into(), port: 8080, debug: true };
// Serialize to TeaLeaf text
let text = config.to_tl_string("config");
// @struct config (host: string, port: int, debug: bool)
// config: (localhost, 8080, true)
// Compile directly to binary
config.to_tlbx("config", "config.tlbx", true)?;
// Convert to JSON
let json = config.to_tealeaf_json("config")?;
// Get as Value
let value = config.to_tealeaf_value();
// Get schemas
let schemas = Config::collect_schemas();
}
FromTeaLeaf
Deserializes a TeaLeaf Value back to a Rust struct:
#![allow(unused)]
fn main() {
use tealeaf::{Reader, FromTeaLeaf};
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Config {
host: String,
port: i32,
debug: bool,
}
let reader = Reader::open("config.tlbx")?;
let value = reader.get("config")?;
let config = Config::from_tealeaf_value(&value)?;
}
Struct Example
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
id: i64,
name: String,
#[tealeaf(optional)]
email: Option<String>,
active: bool,
#[tealeaf(rename = "join_date", type = "timestamp")]
joined: i64,
}
}
This generates:
- Schema: @struct user (id: int64, name: string, email: string?, active: bool, join_date: timestamp)
- ToTeaLeaf: serializes to a positional tuple matching the schema
- FromTeaLeaf: deserializes from an object or struct-array row
Enum Example
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
enum Shape {
Circle { radius: f64 },
Rectangle { width: f64, height: f64 },
Point,
}
let shapes = vec![
Shape::Circle { radius: 5.0 },
Shape::Rectangle { width: 10.0, height: 20.0 },
Shape::Point,
];
}
Enum variants are serialized as tagged values:
shapes: [:circle {radius: 5.0}, :rectangle {width: 10.0, height: 20.0}, :point ~]
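A minimal round-trip sketch using the trait methods from the previous chapter (assumption: the tag is the snake_case variant name, matching the :circle text form above):
use tealeaf::{FromTeaLeaf, ToTeaLeaf};
let v = Shape::Circle { radius: 5.0 }.to_tealeaf_value();
// v should be a Value::Tagged("circle", ...) wrapper per the encoding above
let back = Shape::from_tealeaf_value(&v)?;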
Nested Structs
Structs can reference other ToTeaLeaf/FromTeaLeaf types:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Address {
street: String,
city: String,
zip: String,
}
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Person {
name: String,
home: Address,
#[tealeaf(optional)]
work: Option<Address>,
}
}
The collect_schemas() method automatically collects schemas from nested types.
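For Person above, a quick check (same pattern as in the Schemas & Types chapter):
let schemas = Person::collect_schemas();
assert!(schemas.contains_key("person"));
assert!(schemas.contains_key("address")); // pulled in via the nested fields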
Collections
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Team {
name: String,
members: Vec<String>, // []string
scores: Vec<i32>, // []int
leads: Vec<Person>, // []person (nested struct array)
}
}
Supported Types
| Rust Type | TeaLeaf Type |
|---|---|
bool | bool |
i8, i16, i32 | int8, int16, int |
i64 | int64 |
u8, u16, u32 | uint8, uint16, uint |
u64 | uint64 |
f32 | float32 |
f64 | float |
String, &str | string |
Vec<u8> | bytes |
Vec<T> | []T |
Option<T> | T? (nullable) |
IndexMap<String, T> | object (order-preserving) |
HashMap<String, T> | object |
| Custom struct (with derive) | named struct reference |
See Also
- Attributes Reference – all #[tealeaf(...)] attributes
- Builder API – manual document construction
Attributes Reference
All attributes use the #[tealeaf(...)] namespace and can be applied to structs, enums, or individual fields.
Container Attributes
Applied to a struct or enum:
rename = "name"
Override the schema name used in TeaLeaf output:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
#[tealeaf(rename = "app_config")]
struct Config {
host: String,
port: i32,
}
// Generates: @struct app_config (host: string, port: int)
}
Without rename, the struct name is converted to snake_case (Config → config).
key = "name"
Override the default document key when serializing:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf)]
#[tealeaf(key = "my_config")]
struct Config { /* ... */ }
}
root_array
Mark a struct as a root-level array element (changes serialization to omit the wrapping key):
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf)]
#[tealeaf(root_array)]
struct LogEntry {
timestamp: i64,
message: String,
}
}
Field Attributes
Applied to individual struct fields:
rename = "name"
Override the field name in the schema:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
#[tealeaf(rename = "user_name")]
name: String,
}
// Generates: @struct user (user_name: string)
}
skip
Exclude a field from serialization/deserialization:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
name: String,
#[tealeaf(skip)]
internal_cache: Option<Vec<u8>>,
}
}
Skipped fields must implement Default for deserialization.
optional
Mark a field as nullable in the schema:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
name: String,
#[tealeaf(optional)]
email: Option<String>, // string?
}
}
Note: Fields of type Option<T> are automatically detected as optional. The #[tealeaf(optional)] attribute is mainly useful for documentation or when using wrapper types.
type = "tealeaf_type"
Override the TeaLeaf type for a field:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Event {
#[tealeaf(type = "timestamp")]
created_at: i64, // Would normally be int64, but we want timestamp
#[tealeaf(type = "uint64")]
large_count: i64, // Override the default signed type
}
}
Valid type names: bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float, float32, float64, string, bytes, timestamp.
flatten
Inline the fields of a nested struct into the parent:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Metadata {
created_by: String,
version: i32,
}
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Document {
title: String,
#[tealeaf(flatten)]
meta: Metadata,
}
// Generates: @struct document (title: string, created_by: string, version: int)
// Instead of: @struct document (title: string, meta: metadata)
}
default
Use Default::default() when deserializing a missing field:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Config {
host: String,
#[tealeaf(default)]
port: i32, // defaults to 0 if missing
}
}
default = "expr"
Use a custom expression for the default value:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Config {
host: String,
#[tealeaf(default = "8080")]
port: i32,
#[tealeaf(default = "true")]
debug: bool,
}
}
Combining Attributes
Multiple attributes can be combined:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Event {
#[tealeaf(rename = "ts", type = "timestamp")]
timestamp: i64,
#[tealeaf(optional, rename = "msg")]
message: Option<String>,
#[tealeaf(skip)]
cached_hash: u64,
#[tealeaf(flatten)]
metadata: EventMeta,
}
}
Attribute Summary Table
| Attribute | Level | Description |
|---|---|---|
rename = "name" | Container or Field | Override schema/field name |
key = "name" | Container | Override document key |
root_array | Container | Serialize as root array element |
skip | Field | Exclude from serialization |
optional | Field | Mark as nullable (T?) |
type = "name" | Field | Override TeaLeaf type |
flatten | Field | Inline nested struct fields |
default | Field | Use Default::default() |
default = "expr" | Field | Use custom default expression |
Builder API
The TeaLeafBuilder provides a fluent API for constructing TeaLeaf documents programmatically.
Basic Usage
#![allow(unused)]
fn main() {
use tealeaf::{TeaLeafBuilder, Value};
let doc = TeaLeafBuilder::new()
.add_value("name", Value::String("Alice".into()))
.add_value("age", Value::Int(30))
.add_value("active", Value::Bool(true))
.build();
// Compile to binary
doc.compile("output.tlbx", true)?;
// Convert to JSON
let json = doc.to_json()?;
}
Methods
new()
Create a new empty builder:
#![allow(unused)]
fn main() {
let builder = TeaLeafBuilder::new();
}
add_value(key, value)
Add a raw Value to the document:
#![allow(unused)]
fn main() {
builder.add_value("count", Value::Int(42))
}
add<T: ToTeaLeaf>(key, dto)
Add a struct that implements ToTeaLeaf. Automatically collects schemas from the type:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf)]
struct Config {
host: String,
port: i32,
}
let config = Config { host: "localhost".into(), port: 8080 };
let doc = TeaLeafBuilder::new()
.add("config", &config)
.build();
}
add_vec<T: ToTeaLeaf>(key, items)
Add an array of ToTeaLeaf items. Automatically collects schemas:
#![allow(unused)]
fn main() {
let users = vec![
User { id: 1, name: "Alice".into() },
User { id: 2, name: "Bob".into() },
];
let doc = TeaLeafBuilder::new()
.add_vec("users", &users)
.build();
}
add_schema(schema)
Manually add a schema definition:
#![allow(unused)]
fn main() {
use tealeaf::{Schema, Field, FieldType};
let schema = Schema {
name: "point".to_string(),
fields: vec![
Field {
name: "x".into(),
field_type: FieldType { base: "int".into(), nullable: false, is_array: false },
},
Field {
name: "y".into(),
field_type: FieldType { base: "int".into(), nullable: false, is_array: false },
},
],
};
let doc = TeaLeafBuilder::new()
.add_schema(schema)
.add_value("origin", Value::Array(vec![Value::Int(0), Value::Int(0)]))
.build();
}
root_array()
Mark the document as a root-level array (rather than a key-value document):
#![allow(unused)]
fn main() {
let doc = TeaLeafBuilder::new()
.root_array()
.add_value("items", Value::Array(vec![
Value::Int(1),
Value::Int(2),
Value::Int(3),
]))
.build();
}
build()
Finalize and return the TeaLeaf document:
#![allow(unused)]
fn main() {
let doc = builder.build();
}
Complete Example
use tealeaf::{TeaLeafBuilder, ToTeaLeaf, FromTeaLeaf, Value};
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Address {
street: String,
city: String,
}
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Employee {
id: i64,
name: String,
address: Address,
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let employees = vec![
Employee {
id: 1,
name: "Alice".into(),
address: Address { street: "123 Main".into(), city: "Seattle".into() },
},
Employee {
id: 2,
name: "Bob".into(),
address: Address { street: "456 Oak".into(), city: "Austin".into() },
},
];
let doc = TeaLeafBuilder::new()
.add_value("company", Value::String("Acme Corp".into()))
.add_vec("employees", &employees)
.add_value("version", Value::Int(1))
.build();
// Output
doc.compile("company.tlbx", true)?;
println!("{}", doc.to_tl_with_schemas());
println!("{}", doc.to_json()?);
Ok(())
}
Schemas & Types
Working with schemas and the type system in Rust.
Schema Structure
#![allow(unused)]
fn main() {
pub struct Schema {
pub name: String,
pub fields: Vec<Field>,
}
pub struct Field {
pub name: String,
pub field_type: FieldType,
}
pub struct FieldType {
pub base: String, // "int", "string", "user", etc.
pub nullable: bool, // field: T?
pub is_array: bool, // field: []T
}
}
Creating Schemas Manually
#![allow(unused)]
fn main() {
use tealeaf::{Schema, Field, FieldType};
let user_schema = Schema {
name: "user".to_string(),
fields: vec![
Field {
name: "id".into(),
field_type: FieldType { base: "int".into(), nullable: false, is_array: false },
},
Field {
name: "name".into(),
field_type: FieldType { base: "string".into(), nullable: false, is_array: false },
},
Field {
name: "tags".into(),
field_type: FieldType { base: "string".into(), nullable: false, is_array: true },
},
Field {
name: "email".into(),
field_type: FieldType { base: "string".into(), nullable: true, is_array: false },
},
],
};
}
Collecting Schemas from Derive
When using #[derive(ToTeaLeaf)], schemas are collected automatically:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf)]
struct Address { street: String, city: String }
#[derive(ToTeaLeaf)]
struct User { name: String, home: Address }
// Collects schemas for both `user` and `address`
let schemas = User::collect_schemas();
assert!(schemas.contains_key("user"));
assert!(schemas.contains_key("address"));
}
Accessing Schemas from Documents
#![allow(unused)]
fn main() {
let doc = TeaLeaf::load("data.tl")?;
// Get a specific schema
if let Some(schema) = doc.schema("user") {
println!("Schema: {} ({} fields)", schema.name, schema.fields.len());
for field in &schema.fields {
let nullable = if field.field_type.nullable { "?" } else { "" };
let array = if field.field_type.is_array { "[]" } else { "" };
println!(" {}: {}{}{}", field.name, array, field.field_type.base, nullable);
}
}
// Iterate all schemas
for (name, schema) in &doc.schemas {
println!("{}: {} fields", name, schema.fields.len());
}
}
Accessing Schemas from Binary Reader
Schemas are embedded in the binary format. The Reader exposes section keys and values directly:
#![allow(unused)]
fn main() {
use tealeaf::Reader;
let reader = Reader::open("data.tlbx")?;
// List available keys
for key in reader.keys() {
let value = reader.get(key)?;
println!("{}: {:?}", key, value);
}
}
For full schema introspection, decompile the binary back to a TeaLeaf document and access doc.schemas.
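A sketch of that flow; the decompile invocation is illustrative and the exact CLI flags may differ:
// Shell step (illustrative): tealeaf decompile data.tlbx > data.tl
let doc = TeaLeaf::load("data.tl")?;
for (name, schema) in &doc.schemas {
    println!("{}: {} fields", name, schema.fields.len());
}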
Value Type System
The Value enum maps to TeaLeaf types:
| Variant | TeaLeaf Type | Notes |
|---|---|---|
Value::Null | null | ~ in text |
Value::Bool(b) | bool | |
Value::Int(i) | int/int8/int16/int32/int64 | Size chosen by inference |
Value::UInt(u) | uint/uint8/uint16/uint32/uint64 | Size chosen by inference |
Value::Float(f) | float/float64 | Always f64 at runtime |
Value::String(s) | string | |
Value::Bytes(b) | bytes | |
Value::Array(v) | array | Heterogeneous or typed |
Value::Object(m) | object | String-keyed map |
Value::Map(pairs) | map | Ordered, any key type |
Value::Ref(name) | ref | !name reference |
Value::Tagged(tag, val) | tagged | :tag value |
Value::Timestamp(ms, tz) | timestamp | Unix milliseconds + timezone offset (minutes) |
Value::JsonNumber(s) | json-number | Arbitrary-precision number (raw JSON decimal string) |
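When consuming untyped documents, matching on Value variants is the usual pattern. A minimal sketch over a few variants:
use tealeaf::Value;
fn describe(v: &Value) -> String {
    match v {
        Value::Null => "null".into(),
        Value::Bool(b) => format!("bool {}", b),
        Value::Int(i) => format!("int {}", i),
        Value::String(s) => format!("string {:?}", s),
        Value::Array(items) => format!("array with {} elements", items.len()),
        Value::Timestamp(ms, tz) => format!("timestamp {} ms (tz offset {} min)", ms, tz),
        _ => "other".into(),
    }
}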
Type Inference at Write Time
When compiling, the writer selects the smallest encoding:
#![allow(unused)]
fn main() {
// Value::Int(42) → int8 in binary (fits in i8)
// Value::Int(1000) → int16 (fits in i16)
// Value::Int(100_000) → int32 (fits in i32)
// Value::Int(5_000_000_000) → int64
}
Schema-Typed Data
When data matches a schema (via @table), binary encoding uses:
- Positional storage (no field name repetition)
- Null bitmaps (one bit per nullable field)
- Type-homogeneous arrays (packed encoding for []int, []string, etc.) – see the example below
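For example, a table with a nullable column (illustrative schema) stores each row as three positional values; the null (~) in the first row lands in the null bitmap rather than being spelled as a named field:
@struct reading (sensor: string, value: float, note: string?)
readings: @table reading [
(temp0, 21.5, ~),
(temp1, 22.1, "calibrated"),
]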
Error Handling
TeaLeaf uses the thiserror crate for structured error types.
Error Types
The main error enum:
| Error Variant | Description |
|---|---|
Io | File I/O error (wraps std::io::Error) |
InvalidMagic | Binary file doesn’t start with TLBX magic bytes |
InvalidVersion | Unsupported binary format version |
InvalidType | Unknown type code in binary data |
InvalidUtf8 | String encoding error |
UnexpectedToken | Parse error – expected one token, got another |
UnexpectedEof | Premature end of input |
UnknownStruct | @table references a struct that hasn’t been defined |
MissingField | Required field not provided in data |
ParseError | Generic parse error with message |
ValueOutOfRange | Numeric value exceeds target type range |
Conversion Errors
The ConvertError type is used by FromTeaLeaf:
#![allow(unused)]
fn main() {
pub enum ConvertError {
MissingField { struct_name: String, field: String },
TypeMismatch { expected: String, got: String, path: String },
Nested { path: String, source: Box<ConvertError> },
Custom(String),
}
}
Handling Errors
Parse Errors
#![allow(unused)]
fn main() {
use tealeaf::TeaLeaf;
match TeaLeaf::parse(input) {
Ok(doc) => { /* use doc */ },
Err(e) => {
eprintln!("Parse error: {}", e);
// e.g., "Unexpected token: expected ':', got '}' at line 5"
}
}
}
I/O Errors
#![allow(unused)]
fn main() {
match TeaLeaf::load("nonexistent.tl") {
Ok(doc) => { /* ... */ },
Err(e) => {
// Will be an Io variant wrapping std::io::Error
eprintln!("Could not load file: {}", e);
}
}
}
Binary Format Errors
#![allow(unused)]
fn main() {
use tealeaf::Reader;
match Reader::open("corrupted.tlbx") {
Ok(reader) => { /* ... */ },
Err(e) => {
// Could be InvalidMagic, InvalidVersion, etc.
eprintln!("Binary read error: {}", e);
}
}
}
Conversion Errors
#![allow(unused)]
fn main() {
use tealeaf::{FromTeaLeaf, Value};
let value = Value::String("not a number".into());
match i32::from_tealeaf_value(&value) {
Ok(n) => println!("Got: {}", n),
Err(e) => {
// ConvertError::TypeMismatch { expected: "Int", got: "String" }
eprintln!("Conversion failed: {}", e);
}
}
}
Error Propagation
All errors implement std::error::Error and Display, so they work with ? and anyhow/eyre:
#![allow(unused)]
fn main() {
fn process_file(path: &str) -> Result<(), Box<dyn std::error::Error>> {
let doc = TeaLeaf::load(path)?;
let json = doc.to_json()?;
doc.compile("output.tlbx", true)?;
Ok(())
}
}
Validation Without Errors
For checking validity without consuming the error:
#![allow(unused)]
fn main() {
let is_valid = TeaLeaf::parse(input).is_ok();
}
The CLI validate command uses this pattern to report validity without stopping on errors.
.NET Guide: Overview
TeaLeaf provides .NET bindings through a NuGet package that includes a C# source generator and a reflection-based serializer, both backed by the native Rust library via P/Invoke.
Architecture
┌─────────────────────────────────────────────┐
│ Your .NET Application │
├─────────────────────┬───────────────────────┤
│ Source Generator │ Reflection Serializer│
│ (compile-time) │ (runtime) │
├─────────────────────┴───────────────────────┤
│ TeaLeaf Managed Layer (TLDocument, TLValue)│
├─────────────────────────────────────────────┤
│ P/Invoke (NativeMethods.cs) │
├─────────────────────────────────────────────┤
│ tealeaf_ffi.dll / .so / .dylib (Rust) │
└─────────────────────────────────────────────┘
Installation
dotnet add package TeaLeaf
The single package bundles everything:
| Component | What it provides |
|---|---|
TeaLeaf | Managed wrapper types (TLDocument, TLValue, TLReader), reflection serializer |
TeaLeaf.Annotations | Attributes ([TeaLeaf], [TLSkip], etc.) – included as a dependency |
TeaLeaf.Generators | C# incremental source generator – bundled as an analyzer |
| Native libraries | tealeaf_ffi for all supported platforms (win/linux/osx, x64/arm64) |
Two Serialization Approaches
1. Source Generator (Recommended)
Zero-reflection, compile-time code generation:
[TeaLeaf]
public partial class User
{
public int Id { get; set; }
public string Name { get; set; } = "";
[TLOptional] public string? Email { get; set; }
}
// Generated methods
string schema = User.GetTeaLeafSchema();
string text = user.ToTeaLeafText();
string json = user.ToTeaLeafJson();
user.CompileToTeaLeaf("user.tlbx");
var loaded = User.FromTeaLeaf(doc);
Requirements:
- Class must be partial
- Annotated with [TeaLeaf]
- Properties must have public getters (and setters for deserialization)
2. Reflection Serializer
For generic types, dynamic scenarios, or types you don’t control:
using var doc = TeaLeafSerializer.ToDocument(user);
string text = TeaLeafSerializer.ToText(user);
string json = TeaLeafSerializer.ToJson(user);
var loaded = TeaLeafSerializer.Deserialize<User>(doc);
Core Types
TLDocument
The in-memory document, wrapping a native handle:
// Parse text
using var doc = TLDocument.Parse("name: alice\nage: 30");
// Load from file
using var doc = TLDocument.ParseFile("data.tl");
// From JSON
using var doc = TLDocument.FromJson(jsonString);
// Access values
string[] keys = doc.Keys;
using var value = doc["name"];
// Output
string text = doc.ToText();
string json = doc.ToJson();
doc.Compile("output.tlbx", compress: true);
TLValue
Represents any TeaLeaf value with type-safe accessors:
using var val = doc["users"];
// Type checking
TLType type = val.Type;
bool isNull = val.IsNull;
// Primitive access
bool? b = val.AsBool();
long? i = val.AsInt();
double? f = val.AsFloat();
string? s = val.AsString();
byte[]? bytes = val.AsBytes();
DateTimeOffset? ts = val.AsDateTime();
// Collection access
int len = val.ArrayLength;
using var elem = val[0];
using var field = val["name"];
string[] keys = val.ObjectKeys;
// Dynamic conversion
object? obj = val.ToObject();
TLReader
Binary file reader with optional memory mapping:
// Standard read
using var reader = TLReader.Open("data.tlbx");
// Memory-mapped (zero-copy for large files)
using var reader = TLReader.OpenMmap("data.tlbx");
// Access
string[] keys = reader.Keys;
using var val = reader["users"];
// Schema introspection
int schemaCount = reader.SchemaCount;
string name = reader.GetSchemaName(0);
Next Steps
- Source Generator – compile-time code generation in detail
- Attributes Reference – all available annotations
- Reflection Serializer – runtime serialization
- Native Types – TLDocument, TLValue, TLReader API
- Diagnostics – compiler warnings and errors
- Platform Support – supported runtimes and architectures
Source Generator
The TeaLeaf source generator is a C# incremental source generator (IIncrementalGenerator) that generates serialization and deserialization code at compile time.
How It Works
- Roslyn detects classes annotated with [TeaLeaf]
- ModelAnalyzer examines the type’s properties, attributes, and nested types
- TLTextEmitter generates serialization methods
- DeserializerEmitter generates deserialization methods
- Generated code is added as a partial class extension
Requirements
- The class must be partial
- Annotated with [TeaLeaf] (from TeaLeaf.Annotations)
- Public properties with getters (and setters for deserialization)
- .NET 8.0+ with incremental source generator support
Basic Example
using TeaLeaf.Annotations;
[TeaLeaf]
public partial class User
{
public int Id { get; set; }
public string Name { get; set; } = "";
[TLOptional] public string? Email { get; set; }
public bool Active { get; set; }
}
Generated Methods
For each [TeaLeaf] class, the generator produces:
GetTeaLeafSchema()
Returns the @struct definition as a string:
string schema = User.GetTeaLeafSchema();
// "@struct user (id: int, name: string, email: string?, active: bool)"
ToTeaLeafText()
Serializes the instance to TeaLeaf text body format:
string text = user.ToTeaLeafText();
// "(1, \"Alice\", \"alice@example.com\", true)"
ToTeaLeafDocument(string key = "user")
Returns a complete TeaLeaf text document with schemas:
string doc = user.ToTeaLeafDocument();
// "@struct user (id: int, name: string, email: string?, active: bool)\nuser: (1, ...)"
ToTLDocument(string key = "user")
Parses through the native engine to create a TLDocument:
using var doc = user.ToTLDocument();
string json = doc.ToJson();
doc.Compile("user.tlbx");
ToTeaLeafJson(string key = "user")
Serializes to JSON via the native engine:
string json = user.ToTeaLeafJson();
CompileToTeaLeaf(string path, string key = "user", bool compress = false)
Compiles directly to a .tlbx binary file:
user.CompileToTeaLeaf("user.tlbx", compress: true);
FromTeaLeaf(TLDocument doc, string key = "user")
Deserializes from a TLDocument:
using var doc = TLDocument.ParseFile("user.tlbx");
var loaded = User.FromTeaLeaf(doc);
FromTeaLeaf(TLValue value)
Deserializes from a TLValue (for nested types):
using var val = doc["user"];
var loaded = User.FromTeaLeaf(val);
Nested Types
Types referencing other [TeaLeaf] types are fully supported:
[TeaLeaf]
public partial class Address
{
public string Street { get; set; } = "";
public string City { get; set; } = "";
}
[TeaLeaf]
public partial class Person
{
public string Name { get; set; } = "";
public Address Home { get; set; } = new();
[TLOptional] public Address? Work { get; set; }
}
Generated schema:
@struct address (street: string, city: string)
@struct person (name: string, home: address, work: address?)
Collections
[TeaLeaf]
public partial class Team
{
public string Name { get; set; } = "";
public List<string> Tags { get; set; } = new();
public List<Person> Members { get; set; } = new();
}
Generated schema:
@struct team (name: string, tags: []string, members: []person)
Enum Support
Enums are serialized as snake_case strings:
public enum Status { Active, Inactive, Suspended }
[TeaLeaf]
public partial class User
{
public string Name { get; set; } = "";
public Status Status { get; set; }
}
In TeaLeaf text: ("Alice", active)
Type Mapping
| C# Type | TeaLeaf Type |
|---|---|
bool | bool |
int | int |
long | int64 |
short | int16 |
sbyte | int8 |
uint | uint |
ulong | uint64 |
ushort | uint16 |
byte | uint8 |
double | float |
float | float32 |
decimal | float |
string | string |
DateTime | timestamp |
DateTimeOffset | timestamp |
byte[] | bytes |
List<T> | []T |
T? / Nullable<T> | T? |
| Enum | string |
[TeaLeaf] class | struct reference |
See Also
- Attributes Reference – all annotation options
- Diagnostics – compiler warnings (TL001-TL006)
.NET Attributes Reference
All TeaLeaf annotations are in the TeaLeaf.Annotations namespace.
Type-Level Attributes
[TeaLeaf] / [TeaLeaf("struct_name")]
Marks a class for source generator processing:
[TeaLeaf] // Schema name: "my_class" (auto snake_case)
public partial class MyClass { }
[TeaLeaf("config")] // Schema name: "config" (explicit)
public partial class AppConfiguration { }
The optional string parameter sets the struct name used in the TeaLeaf schema. If omitted, the class name is converted to snake_case.
The attribute also has an EmitSchema property (defaults to true). When set to false, the source generator skips @struct and @table output for arrays of this type:
[TeaLeaf(EmitSchema = false)] // Data only, no @struct definition
public partial class RawData { }
[TLKey("key_name")]
Overrides the top-level key used when serializing as a document entry:
[TeaLeaf]
[TLKey("app_settings")]
public partial class Config
{
public string Host { get; set; } = "";
public int Port { get; set; }
}
// Default key would be "config", but TLKey overrides to "app_settings"
string doc = config.ToTeaLeafDocument(); // key is "app_settings"
Property-Level Attributes
[TLSkip]
Exclude a property from serialization and deserialization:
[TeaLeaf]
public partial class User
{
public int Id { get; set; }
public string Name { get; set; } = "";
[TLSkip]
public string ComputedDisplayName => $"User #{Id}: {Name}";
}
[TLOptional]
Mark a property as nullable in the schema:
[TeaLeaf]
public partial class User
{
public string Name { get; set; } = "";
[TLOptional]
public string? Email { get; set; }
[TLOptional]
public int? Age { get; set; }
}
// Schema: @struct user (name: string, email: string?, age: int?)
Note: Properties of nullable reference types (string?) or Nullable<T> types (int?) are automatically treated as optional. The [TLOptional] attribute is mainly for explicit documentation.
[TLRename("field_name")]
Override the field name in the TeaLeaf schema:
[TeaLeaf]
public partial class User
{
[TLRename("user_name")]
public string Name { get; set; } = "";
[TLRename("is_active")]
public bool Active { get; set; }
}
// Schema: @struct user (user_name: string, is_active: bool)
Without [TLRename], property names are converted to snake_case (Name → name, IsActive → is_active).
[TLType("type_name")]
Override the TeaLeaf type for a field:
[TeaLeaf]
public partial class Event
{
public string Name { get; set; } = "";
[TLType("timestamp")]
public long CreatedAt { get; set; } // Would be int64, forced to timestamp
[TLType("uint64")]
public long LargeCount { get; set; } // Would be int64, forced to uint64
}
Valid type names: bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float, float32, float64, string, bytes, timestamp.
Attribute Summary
| Attribute | Level | Description |
|---|---|---|
[TeaLeaf] / [TeaLeaf("name")] | Class | Enable source generation, optional struct name |
[TLKey("key")] | Class | Override document key |
[TLSkip] | Property | Exclude from serialization |
[TLOptional] | Property | Mark as nullable in schema |
[TLRename("name")] | Property | Override field name |
[TLType("type")] | Property | Override TeaLeaf type |
Combining Attributes
[TeaLeaf("event_record")]
[TLKey("events")]
public partial class EventRecord
{
[TLRename("event_id")]
public int Id { get; set; }
public string Name { get; set; } = "";
[TLType("timestamp")]
public long CreatedAt { get; set; }
[TLOptional]
[TLRename("extra_data")]
public string? Metadata { get; set; }
[TLSkip]
public string DisplayLabel => $"{Name} ({Id})";
}
Generated schema:
@struct event_record (event_id: int, name: string, created_at: timestamp, extra_data: string?)
Reflection Serializer
The TeaLeafSerializer class provides runtime reflection-based serialization for scenarios where the source generator isn’t suitable.
When to Use
| Scenario | Approach |
|---|---|
| Known types at compile time | Source Generator (recommended) |
Generic types (T) | Reflection Serializer |
| Types you don’t control (third-party) | Reflection Serializer |
| Dynamic/runtime-determined types | Reflection Serializer |
| Maximum performance | Source Generator |
API
All methods are on the static TeaLeafSerializer class.
Serialization
// To document text (schemas + data)
string docText = TeaLeafSerializer.ToDocument<User>(user);
string docText = TeaLeafSerializer.ToDocument<User>(user, key: "custom_key");
// To TeaLeaf text (data only, no schemas)
string text = TeaLeafSerializer.ToText<User>(user);
// To TLDocument (for further operations)
using var doc = TeaLeafSerializer.ToTLDocument<User>(user);
using var doc = TeaLeafSerializer.ToTLDocument<User>(user, key: "custom_key");
// To JSON (via native engine)
string json = TeaLeafSerializer.ToJson<User>(user);
// Compile to binary
TeaLeafSerializer.Compile<User>(user, "output.tlbx", compress: true);
Deserialization
// From TLDocument
using var doc = TLDocument.Parse(tlText);
var user = TeaLeafSerializer.FromDocument<User>(doc);
var user = TeaLeafSerializer.FromDocument<User>(doc, key: "custom_key");
// From TLValue (for nested types)
using var val = doc.Get("user");
var user = TeaLeafSerializer.FromValue<User>(val);
// From text
var user = TeaLeafSerializer.FromText<User>(tlText);
Schema Generation
// Get schema string
string schema = TeaLeafSerializer.GetSchema<User>();
// "@struct user (id: int, name: string, email: string?)"
// Get TeaLeaf type name for a C# type
string typeName = TeaLeafTextHelper.GetTLTypeName(typeof(int)); // "int"
string typeName = TeaLeafTextHelper.GetTLTypeName(typeof(long)); // "int64"
string typeName = TeaLeafTextHelper.GetTLTypeName(typeof(DateTime)); // "timestamp"
Type Mapping
The reflection serializer uses TeaLeafTextHelper.GetTLTypeName() for type resolution:
| C# Type | TeaLeaf Type |
|---|---|
bool | bool |
int | int |
long | int64 |
short | int16 |
sbyte | int8 |
uint | uint |
ulong | uint64 |
ushort | uint16 |
byte | uint8 |
double | float |
float | float32 |
decimal | float |
string | string |
DateTime | timestamp |
DateTimeOffset | timestamp |
byte[] | bytes |
List<T> | []T |
Dictionary<string, T> | object |
| Enum | string |
[TeaLeaf] class | struct reference |
Attributes
The reflection serializer respects the same attributes as the source generator:
- [TeaLeaf] / [TeaLeaf("name")] – struct name
- [TLKey("key")] – document key
- [TLSkip] – skip property
- [TLOptional] – nullable field
- [TLRename("name")] – rename field
- [TLType("type")] – override type
Text Helpers
The TeaLeafTextHelper class provides utilities used by the serializer:
// PascalCase to snake_case
TeaLeafTextHelper.ToSnakeCase("MyProperty"); // "my_property"
// String quoting
TeaLeafTextHelper.NeedsQuoting("hello world"); // true
TeaLeafTextHelper.QuoteIfNeeded("hello world"); // "\"hello world\""
TeaLeafTextHelper.EscapeString("line\nnewline"); // "line\\nnewline"
// Value formatting
var sb = new StringBuilder();
TeaLeafTextHelper.AppendValue(sb, 42, typeof(int)); // "42"
TeaLeafTextHelper.AppendValue(sb, null, typeof(string)); // "~"
Performance Considerations
The reflection serializer uses System.Reflection at runtime, which is slower than the source generator approach. For hot paths or high-throughput scenarios, prefer the source generator.
However, the actual binary compilation and native operations are identical – both approaches use the same native Rust library under the hood. The performance difference is only in the C# serialization/deserialization layer.
Native Types
The managed wrapper types provide safe access to the native TeaLeaf library. All native types implement IDisposable and must be disposed to prevent memory leaks.
TLDocument
Represents a parsed TeaLeaf document.
Construction
// Parse text
using var doc = TLDocument.Parse("name: alice\nage: 30");
// Parse from file (text or binary -- auto-detected)
using var doc = TLDocument.ParseFile("data.tl");
using var doc = TLDocument.ParseFile("data.tlbx");
// From JSON string
using var doc = TLDocument.FromJson("{\"name\": \"alice\"}");
Value Access
// Get value by key
using var val = doc["name"]; // indexer
using var val = doc.Get("name"); // method
// Get all keys
string[] keys = doc.Keys;
Output
// To text
string text = doc.ToText(); // full document (schemas + data)
string data = doc.ToTextDataOnly(); // data only (no schemas)
// To JSON
string json = doc.ToJson(); // pretty-printed
string json = doc.ToJsonCompact(); // minified
// Compile to binary
doc.Compile("output.tlbx", compress: true);
Disposal
TLDocument wraps a native pointer. Always dispose:
using var doc = TLDocument.Parse(text); // using statement (recommended)
// Or manual disposal
var doc = TLDocument.Parse(text);
try { /* use doc */ }
finally { doc.Dispose(); }
TLValue
Represents any TeaLeaf value with type-safe accessors.
Type Checking
TLType type = value.Type; // Enum: Null, Bool, Int, UInt, Float, String, etc.
bool isNull = value.IsNull; // Shorthand for Type == TLType.Null
Primitive Accessors
Each returns null if the value is not the expected type:
bool? b = value.AsBool();
long? i = value.AsInt();
ulong? u = value.AsUInt();
double? f = value.AsFloat();
string? s = value.AsString();
long? ts = value.AsTimestamp(); // Unix milliseconds
short? tz = value.AsTimestampOffset(); // Timezone offset in minutes (0 = UTC)
DateTimeOffset? dt = value.AsDateTime(); // Converted from timestamp (preserves offset)
byte[]? bytes = value.AsBytes();
Object Access
string[] keys = value.ObjectKeys; // All field names
using var field = value.GetField("name"); // Get by key
using var field = value["name"]; // Indexer shorthand
Array Access
int len = value.ArrayLength;
using var elem = value.GetArrayElement(0); // By index
using var elem = value[0]; // Indexer shorthand
foreach (var item in value.AsArray())
{
// item is a TLValue -- caller must dispose
using (item)
{
Console.WriteLine(item.AsString());
}
}
Map Access
int len = value.MapLength;
using var key = value.GetMapKey(0);
using var val = value.GetMapValue(0);
foreach (var (k, v) in value.AsMap())
{
using (k) using (v)
{
Console.WriteLine($"{k.AsString()}: {v.AsString()}");
}
}
Reference and Tag Access
string? refName = value.AsRefName(); // For Ref values
string? tagName = value.AsTagName(); // For Tagged values
using var inner = value.AsTagValue(); // Inner value of a Tagged
Dynamic Conversion
object? obj = value.ToObject();
// Returns: bool, long, ulong, double, string, byte[],
// DateTimeOffset, object[], Dictionary<string, object?>, or null
TLReader
Binary file reader with optional memory-mapped I/O.
Construction
// Standard file read
using var reader = TLReader.Open("data.tlbx");
// Memory-mapped (recommended for large files)
using var reader = TLReader.OpenMmap("data.tlbx");
Value Access
string[] keys = reader.Keys;
using var val = reader["users"];
using var val = reader.Get("users");
Schema Introspection
foreach (var schema in reader.Schemas)
{
Console.WriteLine($"Schema: {schema.Name}");
foreach (var field in schema.Fields)
{
Console.WriteLine($" {field.Name}: {(field.IsArray ? "[]" : "")}{field.Type}{(field.IsNullable ? "?" : "")}");
}
}
// Look up a specific schema by name
var userSchema = reader.GetSchema("user");
if (userSchema != null)
{
Console.WriteLine($"user has {userSchema.Fields.Count} fields");
}
TLType Enum
public enum TLType
{
Null = 0,
Bool = 1,
Int = 2,
UInt = 3,
Float = 4,
String = 5,
Bytes = 6,
Array = 7,
Object = 8,
Map = 9,
Ref = 10,
Tagged = 11,
Timestamp = 12,
}
Memory Management
All native types (TLDocument, TLValue, TLReader) hold native pointers and must be disposed:
// Preferred: using statement
using var doc = TLDocument.Parse(text);
// For values from collections, dispose each item:
foreach (var item in value.AsArray())
{
using (item)
{
// process
}
}
// For map entries:
foreach (var (key, val) in value.AsMap())
{
using (key) using (val)
{
// process
}
}
Accessing a disposed object throws ObjectDisposedException.
Diagnostics
The TeaLeaf source generator reports diagnostics (warnings and errors) through the standard C# compiler diagnostic system.
Diagnostic Codes
| Code | Severity | Message |
|---|---|---|
| TL001 | Error | Type must be declared as partial |
| TL002 | Warning | Unsupported property type |
| TL003 | Error | Invalid TLType attribute value |
| TL004 | Warning | Nested type not annotated with [TeaLeaf] |
| TL005 | Warning | Circular type reference detected |
| TL006 | Error | Open generic types are not supported |
TL001: Type Must Be Partial
The source generator needs to add methods to your class. This requires the partial modifier.
// ERROR: TL001
[TeaLeaf]
public class User { } // Missing 'partial'
// FIXED
[TeaLeaf]
public partial class User { }
TL002: Unsupported Property Type
A property type isn’t directly mappable to a TeaLeaf type.
[TeaLeaf]
public partial class Config
{
public IntPtr NativeHandle { get; set; } // WARNING: TL002
}
The property will be skipped. Supported types include all primitives, string, DateTime, DateTimeOffset, byte[], List<T>, Dictionary<string, T>, enums, and other [TeaLeaf]-annotated classes.
TL003: Invalid TLType Value
The [TLType] attribute was given an unrecognized type name.
[TeaLeaf]
public partial class Event
{
[TLType("datetime")] // ERROR: TL003 -- "datetime" is not a valid type
public long Created { get; set; }
[TLType("timestamp")] // CORRECT
public long Updated { get; set; }
}
Valid values: bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float, float32, float64, string, bytes, timestamp.
TL004: Nested Type Not Annotated
A property references a class type that doesn’t have the [TeaLeaf] attribute.
public class Address // Missing [TeaLeaf]
{
public string City { get; set; } = "";
}
[TeaLeaf]
public partial class User
{
public Address Home { get; set; } = new(); // WARNING: TL004
}
Fix by adding [TeaLeaf] to the nested type:
[TeaLeaf]
public partial class Address
{
public string City { get; set; } = "";
}
TL005: Circular Type Reference
A type references itself (directly or transitively), which may cause a stack overflow at runtime during serialization.
[TeaLeaf]
public partial class TreeNode
{
public string Name { get; set; } = "";
public TreeNode? Child { get; set; } // WARNING: TL005 -- circular reference
}
The code will still compile, but recursive structures must be bounded (e.g., use [TLOptional] with null termination) to avoid infinite recursion.
TL006: Open Generic Types
Generic type parameters are not supported:
// ERROR: TL006
[TeaLeaf]
public partial class Container<T>
{
public T Value { get; set; }
}
Use concrete types instead. For generic scenarios, use the Reflection Serializer.
Viewing Diagnostics
Diagnostics appear in:
- Visual Studio – Error List window
- VS Code – Problems panel (with C# extension)
- dotnet build – terminal output
- MSBuild – build log
Example compiler output:
User.cs(3,22): error TL001: TeaLeaf type 'User' must be declared as partial
Config.cs(8,16): warning TL004: Property 'Address' type is not annotated with [TeaLeaf]
Platform Support
The TeaLeaf NuGet package includes pre-built native libraries for all major platforms.
Supported Platforms
| OS | Architecture | Native Library | Status |
|---|---|---|---|
| Windows | x64 | tealeaf_ffi.dll | Supported |
| Windows | ARM64 | tealeaf_ffi.dll | Supported |
| Linux | x64 (glibc) | libtealeaf_ffi.so | Supported |
| Linux | ARM64 (glibc) | libtealeaf_ffi.so | Supported |
| macOS | x64 (Intel) | libtealeaf_ffi.dylib | Supported |
| macOS | ARM64 (Apple Silicon) | libtealeaf_ffi.dylib | Supported |
.NET Requirements
- .NET 8.0 or later
- C# compiler with incremental source generator support (for the source generator)
NuGet Package Structure
The NuGet package bundles native libraries for all platforms using the runtimes folder convention:
TeaLeaf.nupkg
├── lib/net8.0/
│ ├── TeaLeaf.dll
│ ├── TeaLeaf.Annotations.dll
│ └── TeaLeaf.Generators.dll
└── runtimes/
├── win-x64/native/tealeaf_ffi.dll
├── win-arm64/native/tealeaf_ffi.dll
├── linux-x64/native/libtealeaf_ffi.so
├── linux-arm64/native/libtealeaf_ffi.so
├── osx-x64/native/libtealeaf_ffi.dylib
└── osx-arm64/native/libtealeaf_ffi.dylib
The .NET runtime automatically selects the correct native library based on the host platform.
Native Library Loading
The managed layer uses [DllImport("tealeaf_ffi")] for P/Invoke. The .NET runtime resolves the native library through:
- NuGet runtimes folder – automatic for published apps
- Application directory – for self-contained deployments
- System library path –
PATH(Windows),LD_LIBRARY_PATH(Linux),DYLD_LIBRARY_PATH(macOS)
Deployment
Framework-Dependent
dotnet publish -c Release
The native library is copied to the output directory automatically.
Self-Contained
dotnet publish -c Release --self-contained -r win-x64
dotnet publish -c Release --self-contained -r linux-x64
dotnet publish -c Release --self-contained -r osx-arm64
Docker
For Linux containers, use the appropriate runtime:
FROM mcr.microsoft.com/dotnet/runtime:8.0
# Native library is included in the publish output
COPY --from=build /app/publish .
Building Native Libraries from Source
If you need a platform not included in the NuGet package:
# Clone the repository
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
# Build the FFI library
cargo build --release --package tealeaf-ffi
# Output location
# Windows: target/release/tealeaf_ffi.dll
# Linux: target/release/libtealeaf_ffi.so
# macOS: target/release/libtealeaf_ffi.dylib
Place the built library in your application directory or system library path.
Troubleshooting
DllNotFoundException
The native library could not be found. Check:
- The package includes your platform (dotnet --info to check the RID)
- For self-contained apps, ensure the correct -r flag is used
- For manual builds, ensure the library is in the application directory
BadImageFormatException
Architecture mismatch between the .NET runtime and native library. Ensure both are the same architecture (x64/ARM64).
EntryPointNotFoundException
Version mismatch between the managed and native libraries. Ensure both are from the same release.
FFI Reference: Overview
The tealeaf-ffi crate exposes a C-compatible API for integrating TeaLeaf into any language that supports C FFI (Foreign Function Interface).
Architecture
┌──────────────────────┐
│ Host Language │
│ (.NET, Python, etc.)│
├──────────────────────┤
│ FFI Bindings │
│ (P/Invoke, ctypes) │
├──────────────────────┤
│ tealeaf_ffi │ ← C ABI library
│ (cdylib + staticlib)│
├──────────────────────┤
│ tealeaf-core │ ← Rust core library
└──────────────────────┘
The FFI layer provides:
- Document parsing – parse text, files, and JSON
- Value access – type-safe accessors for all value types
- Binary reader – read .tlbx files with optional memory mapping
- Schema introspection – query schema structure at runtime
- JSON conversion – to/from JSON
- Binary compilation – compile documents to .tlbx
- Error handling – thread-local last-error pattern
- Memory management – explicit free functions for all allocated resources
Output Libraries
The crate builds both dynamic and static libraries:
| Platform | Dynamic Library | Static Library |
|---|---|---|
| Windows | tealeaf_ffi.dll | tealeaf_ffi.lib |
| Linux | libtealeaf_ffi.so | libtealeaf_ffi.a |
| macOS | libtealeaf_ffi.dylib | libtealeaf_ffi.a |
C Header
The build generates a C header via cbindgen:
#include "tealeaf_ffi.h"
// Parse a document
TLDocument* doc = tl_parse("name: alice\nage: 30");
if (!doc) {
char* err = tl_get_last_error();
fprintf(stderr, "Error: %s\n", err);
tl_string_free(err);
return 1;
}
// Access a value
TLValue* val = tl_document_get(doc, "name");
if (val && tl_value_type(val) == TL_STRING) {
char* name = tl_value_as_string(val);
printf("Name: %s\n", name);
tl_string_free(name);
}
tl_value_free(val);
tl_document_free(doc);
Opaque Types
The FFI uses opaque pointer types:
| Type | Description |
|---|---|
TLDocument* | Parsed document handle |
TLValue* | Value handle (any type) |
TLReader* | Binary file reader handle |
All handles must be freed with their corresponding _free function.
Error Model
TeaLeaf FFI uses the thread-local last-error pattern:
- Functions that can fail return NULL (pointers) or a result struct
- On failure, the error message is stored in thread-local storage
- Call tl_get_last_error() to retrieve it
- Call tl_clear_error() to clear it
TLDocument* doc = tl_parse("invalid {");
if (!doc) {
char* err = tl_get_last_error();
// err contains the parse error message
tl_string_free(err);
}
Null Safety
All FFI functions that accept pointers are null-safe:
- Passing NULL returns a safe default (0, false, NULL) rather than crashing
- This makes it safe to chain calls without checking each one
Next Steps
- API Reference – complete function listing
- Memory Management – ownership and freeing rules
- Building from Source – compilation instructions
FFI API Reference
Complete listing of all exported FFI functions.
Error Handling
tl_get_last_error
char* tl_get_last_error(void);
Returns the last error message, or NULL if no error. Caller must free with tl_string_free.
tl_clear_error
void tl_clear_error(void);
Clears the thread-local error state.
Version
tl_version
const char* tl_version(void);
Returns the library version string (e.g., "2.0.0-beta.8"). The returned pointer is static – do not free it.
Document API
tl_parse
TLDocument* tl_parse(const char* text);
Parse a TeaLeaf text string. Returns NULL on failure (check tl_get_last_error).
tl_parse_file
TLDocument* tl_parse_file(const char* path);
Parse a TeaLeaf text file. Returns NULL on failure.
tl_document_free
void tl_document_free(TLDocument* doc);
Free a document. Safe to call with NULL.
tl_document_get
TLValue* tl_document_get(const TLDocument* doc, const char* key);
Get a value by key. Returns NULL if key not found or doc is NULL. Caller must free with tl_value_free.
tl_document_keys
char** tl_document_keys(const TLDocument* doc);
Get all top-level keys as a NULL-terminated array. Caller must free with tl_string_array_free.
tl_document_to_text
char* tl_document_to_text(const TLDocument* doc);
Convert document to TeaLeaf text (with schemas). Caller must free with tl_string_free.
tl_document_to_text_data_only
char* tl_document_to_text_data_only(const TLDocument* doc);
Convert document to TeaLeaf text (data only, no schemas). Caller must free with tl_string_free.
tl_document_compile
TLResult tl_document_compile(const TLDocument* doc, const char* path, bool compress);
Compile document to binary file. Returns a TLResult indicating success or failure.
JSON API
tl_document_from_json
TLDocument* tl_document_from_json(const char* json);
Parse a JSON string into a TLDocument. Returns NULL on failure.
tl_document_to_json
char* tl_document_to_json(const TLDocument* doc);
Convert document to pretty-printed JSON. Caller must free with tl_string_free.
tl_document_to_json_compact
char* tl_document_to_json_compact(const TLDocument* doc);
Convert document to minified JSON. Caller must free with tl_string_free.
Value API
tl_value_type
TLValueType tl_value_type(const TLValue* value);
Get the type of a value. Returns TL_NULL (0) if value is NULL.
tl_value_free
void tl_value_free(TLValue* value);
Free a value. Safe to call with NULL.
Primitive Accessors
bool tl_value_as_bool(const TLValue* value); // false if not bool
int64_t tl_value_as_int(const TLValue* value); // 0 if not int
uint64_t tl_value_as_uint(const TLValue* value); // 0 if not uint
double tl_value_as_float(const TLValue* value); // 0.0 if not float
char* tl_value_as_string(const TLValue* value); // NULL if not string; free with tl_string_free
int64_t tl_value_as_timestamp(const TLValue* value); // 0 if not timestamp (millis only)
int16_t tl_value_as_timestamp_offset(const TLValue* value); // 0 if not timestamp (tz offset in minutes)
Bytes Accessors
size_t tl_value_bytes_len(const TLValue* value); // 0 if not bytes
const uint8_t* tl_value_bytes_data(const TLValue* value); // NULL if not bytes; pointer valid while value lives
Reference/Tag Accessors
char* tl_value_ref_name(const TLValue* value); // NULL if not ref; free with tl_string_free
char* tl_value_tag_name(const TLValue* value); // NULL if not tagged; free with tl_string_free
TLValue* tl_value_tag_value(const TLValue* value); // NULL if not tagged; free with tl_value_free
Array Accessors
size_t tl_value_array_len(const TLValue* value); // 0 if not array
TLValue* tl_value_array_get(const TLValue* value, size_t index); // NULL if out of bounds; free with tl_value_free
Object Accessors
TLValue* tl_value_object_get(const TLValue* value, const char* key); // NULL if not found; free with tl_value_free
char** tl_value_object_keys(const TLValue* value); // NULL-terminated; free with tl_string_array_free
Map Accessors
size_t tl_value_map_len(const TLValue* value); // 0 if not map
TLValue* tl_value_map_get_key(const TLValue* value, size_t index); // NULL if out of bounds; free with tl_value_free
TLValue* tl_value_map_get_value(const TLValue* value, size_t index);// NULL if out of bounds; free with tl_value_free
Binary Reader API
tl_reader_open
TLReader* tl_reader_open(const char* path);
Open a binary file for reading. Returns NULL on failure.
tl_reader_open_mmap
TLReader* tl_reader_open_mmap(const char* path);
Open a binary file with memory-mapped I/O (zero-copy). Returns NULL on failure.
tl_reader_free
void tl_reader_free(TLReader* reader);
Free a reader. Safe to call with NULL.
tl_reader_get
TLValue* tl_reader_get(const TLReader* reader, const char* key);
Get a value by key from binary. Returns NULL if not found. Caller must free with tl_value_free.
tl_reader_keys
char** tl_reader_keys(const TLReader* reader);
Get all section keys. Returns NULL-terminated array. Free with tl_string_array_free.
Schema API
size_t tl_reader_schema_count(const TLReader* reader);
char* tl_reader_schema_name(const TLReader* reader, size_t index);
size_t tl_reader_schema_field_count(const TLReader* reader, size_t schema_index);
char* tl_reader_schema_field_name(const TLReader* reader, size_t schema_index, size_t field_index);
char* tl_reader_schema_field_type(const TLReader* reader, size_t schema_index, size_t field_index);
bool tl_reader_schema_field_nullable(const TLReader* reader, size_t schema_index, size_t field_index);
bool tl_reader_schema_field_is_array(const TLReader* reader, size_t schema_index, size_t field_index);
All char* returns from schema functions must be freed with tl_string_free. Out-of-bounds indices return NULL/0/false.
Memory Management
tl_string_free
void tl_string_free(char* s);
Free a string returned by any FFI function. Safe to call with NULL.
tl_string_array_free
void tl_string_array_free(char** arr);
Free a NULL-terminated string array. Frees each string and the array pointer. Safe to call with NULL.
tl_result_free
void tl_result_free(TLResult* result);
Free any allocated memory inside a TLResult. Safe to call with NULL.
Type Enum
typedef enum {
TL_NULL = 0,
TL_BOOL = 1,
TL_INT = 2,
TL_UINT = 3,
TL_FLOAT = 4,
TL_STRING = 5,
TL_BYTES = 6,
TL_ARRAY = 7,
TL_OBJECT = 8,
TL_MAP = 9,
TL_REF = 10,
TL_TAGGED = 11,
TL_TIMESTAMP = 12,
} TLValueType;
Memory Management
The FFI layer uses explicit manual memory management. Understanding ownership rules is critical for writing correct bindings.
Ownership Rules
Rule 1: Caller Owns Returned Pointers
Every function that returns a heap-allocated pointer transfers ownership to the caller. The caller must free it with the appropriate function:
| Return Type | Free Function |
|---|---|
TLDocument* | tl_document_free() |
TLValue* | tl_value_free() |
TLReader* | tl_reader_free() |
char* | tl_string_free() |
char** | tl_string_array_free() |
TLResult | tl_result_free() |
Rule 2: Borrowed Pointers Are Read-Only
Functions that take const T* parameters borrow the pointer. The FFI layer does not take ownership or free inputs:
// doc is borrowed -- you still own it and must free it later
TLValue* val = tl_document_get(doc, "key");
// ... use val ...
tl_value_free(val); // free the returned value
tl_document_free(doc); // free the document separately
Rule 3: Null Is Always Safe
Every free function and every accessor accepts NULL safely:
tl_document_free(NULL); // no-op
tl_value_free(NULL); // no-op
tl_string_free(NULL); // no-op
TLValue* val = tl_document_get(NULL, "key"); // returns NULL
bool b = tl_value_as_bool(NULL); // returns false
Common Patterns
Parse → Use → Free
TLDocument* doc = tl_parse("name: alice");
if (doc) {
TLValue* name = tl_document_get(doc, "name");
if (name) {
char* str = tl_value_as_string(name);
if (str) {
printf("%s\n", str);
tl_string_free(str);
}
tl_value_free(name);
}
tl_document_free(doc);
}
Iterating Arrays
TLValue* arr = tl_document_get(doc, "items");
size_t len = tl_value_array_len(arr);
for (size_t i = 0; i < len; i++) {
TLValue* elem = tl_value_array_get(arr, i);
// use elem...
tl_value_free(elem); // free each element
}
tl_value_free(arr); // free the array value
Iterating Object Keys
TLValue* obj = tl_document_get(doc, "config");
char** keys = tl_value_object_keys(obj);
if (keys) {
for (int i = 0; keys[i] != NULL; i++) {
TLValue* val = tl_value_object_get(obj, keys[i]);
// use val...
tl_value_free(val);
}
tl_string_array_free(keys); // frees all strings AND the array
}
tl_value_free(obj);
Iterating Maps
TLValue* map = tl_document_get(doc, "headers");
size_t len = tl_value_map_len(map);
for (size_t i = 0; i < len; i++) {
TLValue* key = tl_value_map_get_key(map, i);
TLValue* val = tl_value_map_get_value(map, i);
char* k = tl_value_as_string(key);
char* v = tl_value_as_string(val);
printf("%s: %s\n", k, v);
tl_string_free(k);
tl_string_free(v);
tl_value_free(key);
tl_value_free(val);
}
tl_value_free(map);
String Arrays
char** keys = tl_document_keys(doc);
if (keys) {
for (int i = 0; keys[i] != NULL; i++) {
printf("Key: %s\n", keys[i]);
}
tl_string_array_free(keys); // ONE call frees everything
}
Bytes Data
The tl_value_bytes_data function returns a borrowed pointer valid only while the value lives:
TLValue* val = tl_document_get(doc, "data");
size_t len = tl_value_bytes_len(val);
const uint8_t* data = tl_value_bytes_data(val);
// Copy if you need the data after freeing the value
uint8_t* copy = malloc(len);
memcpy(copy, data, len);
tl_value_free(val); // data pointer is now invalid
// copy is still valid
Error Strings
Error strings are owned by the caller:
char* err = tl_get_last_error();
if (err) {
fprintf(stderr, "Error: %s\n", err);
tl_string_free(err); // must free
}
Common Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| Not freeing returned pointers | Memory leak | Always pair creation with _free |
| Using pointer after free | Use-after-free / crash | Set pointer to NULL after free |
Freeing borrowed bytes_data pointer | Double-free / crash | Only free with tl_value_free on the value |
| Calling wrong free function | Undefined behavior | Match the free to the allocation type |
| Freeing strings from string_array individually | Double-free | Use tl_string_array_free once |
Building from Source
How to build the TeaLeaf FFI library from source.
Prerequisites
- Rust toolchain (1.70+)
- A C compiler (for cbindgen header generation)
Build
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
cargo build --release --package tealeaf-ffi
Output Files
| Platform | Dynamic Library | Static Library |
|---|---|---|
| Windows | target/release/tealeaf_ffi.dll | target/release/tealeaf_ffi.lib |
| Linux | target/release/libtealeaf_ffi.so | target/release/libtealeaf_ffi.a |
| macOS | target/release/libtealeaf_ffi.dylib | target/release/libtealeaf_ffi.a |
C Header
The build generates a C header via cbindgen (configured in tealeaf-ffi/cbindgen.toml):
# Header is generated during build
# Location: target/tealeaf_ffi.h (or as configured)
Cross-Compilation
Linux ARM64
# Install cross-compilation tools
sudo apt install gcc-aarch64-linux-gnu
rustup target add aarch64-unknown-linux-gnu
# Build
cargo build --release --package tealeaf-ffi --target aarch64-unknown-linux-gnu
Windows ARM64
rustup target add aarch64-pc-windows-msvc
cargo build --release --package tealeaf-ffi --target aarch64-pc-windows-msvc
macOS (from any platform via cross)
# Using cross (https://github.com/cross-rs/cross)
cargo install cross
cross build --release --package tealeaf-ffi --target aarch64-apple-darwin
cross build --release --package tealeaf-ffi --target x86_64-apple-darwin
Linking
Dynamic Linking
# GCC/Clang
gcc -o myapp myapp.c -L/path/to/lib -ltealeaf_ffi
# MSVC
cl myapp.c /link tealeaf_ffi.lib
At runtime, ensure the dynamic library is in the library search path.
Static Linking
# GCC/Clang (Linux)
gcc -o myapp myapp.c /path/to/libtealeaf_ffi.a -lpthread -ldl -lm
# macOS
gcc -o myapp myapp.c /path/to/libtealeaf_ffi.a -framework Security -lpthread
Static linking eliminates the runtime dependency but produces a larger binary.
Dependencies
The FFI crate has minimal dependencies:
[dependencies]
tealeaf-core = { workspace = true }
[build-dependencies]
cbindgen = "0.27"
The resulting library links against:
- Linux: libpthread, libdl, libm
- macOS: Security.framework, libpthread
- Windows: standard Windows system libraries
Writing New Language Bindings
To create bindings for a new language:
- Generate or write FFI declarations matching the C header
- Load the dynamic library (or link statically)
- Wrap opaque pointers in your language’s resource management (destructors, Dispose, __del__, etc.)
- Map the error model – check for NULL returns and call tl_get_last_error
- Handle string ownership – copy strings to your language’s string type, then free the C string
Example: Python (ctypes)
import ctypes

lib = ctypes.CDLL("libtealeaf_ffi.so")

# Define function signatures
lib.tl_parse.restype = ctypes.c_void_p
lib.tl_parse.argtypes = [ctypes.c_char_p]
lib.tl_document_get.restype = ctypes.c_void_p
lib.tl_document_get.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
# Return c_void_p rather than c_char_p: ctypes converts c_char_p results to
# bytes and discards the pointer, which would make tl_string_free impossible
lib.tl_value_as_string.restype = ctypes.c_void_p
lib.tl_value_as_string.argtypes = [ctypes.c_void_p]
for f in (lib.tl_string_free, lib.tl_value_free, lib.tl_document_free):
    f.argtypes = [ctypes.c_void_p]  # avoid pointer truncation on 64-bit

# Use it
doc = lib.tl_parse(b"name: alice")
val = lib.tl_document_get(doc, b"name")
name_ptr = lib.tl_value_as_string(val)
print(ctypes.string_at(name_ptr).decode())  # "alice" (copied into Python)
lib.tl_string_free(name_ptr)  # free the C string we own
lib.tl_value_free(val)
lib.tl_document_free(doc)
Testing
# Run FFI tests
cargo test --package tealeaf-ffi
# Run all workspace tests
cargo test --workspace
LLM Context Engineering
TeaLeaf’s primary use case is context engineering for Large Language Model applications. This guide explains why and how.
The Problem
LLM context windows are limited and expensive. Typical structured data (tool definitions, conversation history, user profiles) consumes tokens proportional to format verbosity:
{
"messages": [
{"role": "user", "content": "Hello", "tokens": 2},
{"role": "assistant", "content": "Hi there!", "tokens": 3},
{"role": "user", "content": "What's the weather?", "tokens": 5},
{"role": "assistant", "content": "Let me check...", "tokens": 4}
]
}
Every message repeats "role", "content", "tokens". With 50+ messages, this overhead adds up.
The TeaLeaf Approach
@struct Message (role: string, content: string, tokens: int?)
messages: @table Message [
(user, Hello, 2),
(assistant, "Hi there!", 3),
(user, "What's the weather?", 5),
(assistant, "Let me check...", 4),
]
Field names defined once. Data is positional. For 50 messages, this saves ~40% in text size and ~80% in binary.
Context Assembly Pattern
Define Schemas for Your Context
@struct Tool (name: string, description: string, params: []string)
@struct Message (role: string, content: string, tokens: int?)
@struct UserProfile (id: int, name: string, preferences: []string)
system_prompt: """
You are a helpful assistant with access to the user's profile
and conversation history. Use the tools when appropriate.
"""
user: @table UserProfile [
(42, "Alice", ["concise_responses", "code_examples"]),
]
tools: @table Tool [
(search, "Search the web for information", ["query"]),
(calculate, "Evaluate a mathematical expression", ["expression"]),
(weather, "Get current weather for a location", ["city", "country"]),
]
history: @table Message [
(user, Hello, 2),
(assistant, "Hi there! How can I help?", 7),
]
Binary Caching
Compiled .tlbx files make excellent context caches:
use tealeaf::{TeaLeafBuilder, ToTeaLeaf, Value};

// Build context document
let doc = TeaLeafBuilder::new()
    .add_value("system_prompt", Value::String(system_prompt))
    .add_vec("tools", &tools)
    .add_vec("history", &messages)
    .add("user", &user_profile)
    .build();

// Cache as binary (fast to read back)
doc.compile("context_cache.tlbx", true)?;

// Later: load instantly from binary
let cached = tealeaf::Reader::open("context_cache.tlbx")?;
Sending to LLM
Convert to text for LLM consumption:
let doc = TeaLeaf::load("context.tl")?;
let context_text = doc.to_tl_with_schemas();
// Send context_text as part of the prompt
Or convert specific sections:
let doc = TeaLeaf::load("context.tl")?;
let json = doc.to_json()?;
// Use JSON for APIs that expect it
Size Comparison: Real-World Context
For a typical LLM context with 50 messages, 10 tools, and a user profile:
| Format | Approximate Size |
|---|---|
| JSON | ~15 KB |
| TeaLeaf Text | ~8 KB |
| TeaLeaf Binary | ~4 KB |
| TeaLeaf Binary (compressed) | ~3 KB |
Token savings are significant but less than byte savings. BPE tokenizers partially compress repeated JSON field names, so byte savings overstate token savings by 5-18 percentage points depending on data repetitiveness. For typical structured data, expect ~36% fewer data tokens (median), with savings increasing for larger and more structured datasets.
Token Comparison (verified via OpenAI tokenizer)
| Dataset | JSON tokens | TeaLeaf tokens | Savings |
|---|---|---|---|
| Healthcare records | 903 | 572 | 37% |
| Retail orders | 9,829 | 5,632 | 43% |
At the API level, prompt instructions are identical for both formats, diluting data-only savings (~36%) to ~30% of total input tokens.
Structured Outputs
LLMs can also produce TeaLeaf-formatted responses:
@struct Insight (category: string, finding: string, confidence: float)
analysis: @table Insight [
(revenue, "Q4 revenue grew 15% YoY", 0.92),
(churn, "Customer churn decreased by 3%", 0.87),
(forecast, "Projected 20% growth in Q1", 0.73),
]
This can then be parsed and processed programmatically:
let response = TeaLeaf::parse(&llm_output)?;
if let Some(Value::Array(insights)) = response.get("analysis") {
    for insight in insights {
        // Process each structured insight
    }
}
Best Practices
- Define schemas for all structured context – tool definitions, messages, profiles
- Use @table for arrays of uniform objects – conversation history, search results
- Cache compiled binary for frequently-used context segments
- Use text format for LLM input – models understand the schema notation
- String deduplication helps when context has repetitive strings (roles, tool names)
- Separate static and dynamic context – compile static context once, merge at runtime
Benchmark Results
The accuracy-benchmark suite tests 12 tasks across 10 business domains on Claude Sonnet 4.5 and GPT-5.2:
- ~36% fewer data tokens compared to JSON (savings increase with larger datasets)
- No accuracy loss – scores within noise across all providers
- See the benchmark README for full methodology and results.
Schema Evolution
TeaLeaf takes a deliberately simple approach to schema evolution: when schemas change, recompile.
Design Philosophy
- No migration machinery – no schema versioning or compatibility negotiation
- Source file is master – the .tl file defines the current schema
- Explicit over implicit – tuples require values for all fields
- Binary is a compiled artifact – regenerate it like you would a compiled binary
Compatible Changes
These changes do not require recompilation of existing binary files:
Rename Fields
Field data is stored positionally. Names are documentation only:
# Before
@struct user (name: string, email: string)
# After -- binary still works
@struct user (full_name: string, email_address: string)
Widen Types
Automatic safe widening when reading:
# Before: field was int8
@struct sensor (id: int8, reading: float32)
# After: widened to int32 -- readers auto-widen
@struct sensor (id: int, reading: float)
Widening path: int8 → int16 → int32 → int64, float32 → float64
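A sketch of what reader-side widening amounts to (illustrative type names, not the library's API):

// Any stored integer width decodes into i64 and any stored float width into
// f64; both conversions are lossless, so old narrow-typed binaries stay valid.
enum Stored {
    Int8(i8),
    Int16(i16),
    Int32(i32),
    Int64(i64),
    Float32(f32),
    Float64(f64),
}

enum WideValue {
    Int(i64),
    Float(f64),
}

fn widen(v: Stored) -> WideValue {
    match v {
        Stored::Int8(n) => WideValue::Int(n as i64),
        Stored::Int16(n) => WideValue::Int(n as i64),
        Stored::Int32(n) => WideValue::Int(n as i64),
        Stored::Int64(n) => WideValue::Int(n),
        Stored::Float32(x) => WideValue::Float(x as f64), // f32 -> f64 is exact
        Stored::Float64(x) => WideValue::Float(x),
    }
}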
Incompatible Changes
These changes require recompilation from the .tl source:
Add a Field
# Before
@struct user (id: int, name: string)
# After -- added email field
@struct user (id: int, name: string, email: string?)
Old binary files won’t have the new field. Recompile:
tealeaf compile users.tl -o users.tlbx
Remove a Field
# Before
@struct user (id: int, name: string, legacy_field: string)
# After -- removed legacy_field
@struct user (id: int, name: string)
Reorder Fields
Binary data is positional. Changing field order changes the meaning of stored data:
# Before
@struct point (x: int, y: int)
# After -- DON'T DO THIS without recompiling
@struct point (y: int, x: int)
Narrow Types
Narrowing (e.g., int64 → int8) can lose data:
# Before
@struct data (value: int64)
# After -- potential data loss
@struct data (value: int8)
Recompilation Workflow
When schemas change:
# 1. Edit the .tl source file
# 2. Validate
tealeaf validate data.tl
# 3. Recompile
tealeaf compile data.tl -o data.tlbx
# 4. Verify
tealeaf info data.tlbx
Migration Strategy
For applications that need to handle schema changes:
Approach 1: Version Keys
Use different top-level keys for different schema versions:
@struct user_v1 (id: int, name: string)
@struct user_v2 (id: int, name: string, email: string?, role: string)
# Old data
users_v1: @table user_v1 [(1, alice), (2, bob)]
# New data
users_v2: @table user_v2 [(3, carol, "carol@ex.com", admin)]
Approach 2: Application-Level Migration
Read old binary, transform in code, write new binary:
// Read old binary format
let old_doc = tealeaf::Reader::open("data_v1.tlbx")?;

// Transform
let new_doc = TeaLeafBuilder::new()
    .add_vec("users", &migrate_users(&old_doc.get("users")?))
    .build();

// Write new format
new_doc.compile("data_v2.tlbx", true)?;
Approach 3: Nullable Fields
Add new fields as nullable to maintain backward compatibility:
@struct user (
id: int,
name: string,
email: string?, # new field, nullable
phone: string?, # new field, nullable
)
Old data can have ~ for new fields. New data populates them.
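For example (illustrative data), a table can mix old-shape rows that null out the new fields and new-shape rows that populate them:

users: @table user [
(1, alice, ~, ~),
(2, bob, "bob@ex.com", "555-0100"),
]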
Comparison with Other Formats
| Aspect | TeaLeaf | Protobuf | Avro |
|---|---|---|---|
| Schema location | Inline in data file | External .proto | Embedded in binary |
| Adding fields | Recompile | Compatible (field numbers) | Compatible (defaults) |
| Removing fields | Recompile | Compatible (skip unknown) | Compatible (skip) |
| Migration tool | None (recompile) | protoc | Schema registry |
| Complexity | Low | Medium | High |
TeaLeaf trades automatic evolution for simplicity. If your use case requires frequent schema changes across distributed systems, consider Protobuf or Avro.
Performance
Performance characteristics of TeaLeaf across different operations.
Size Efficiency
Benchmark Results
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Binary | 3.56x | 0.15x | 0.47x |
Analysis
- Small objects: TeaLeaf has a 64-byte header overhead. For objects under ~200 bytes, JSON or MessagePack are more compact.
- Large arrays: TeaLeaf’s string deduplication and schema-based compression shine. For 10K+ records, TeaLeaf achieves 6-7x better compression than JSON.
- Medium datasets (1K records): TeaLeaf is competitive with Protobuf, with the advantage of embedded schemas.
Where Size Matters Most
| Scenario | Recommendation |
|---|---|
| < 100 bytes payload | Use MessagePack or raw JSON |
| 1-10 KB | TeaLeaf text or JSON (overhead amortized) |
| 10 KB - 1 MB | TeaLeaf binary with compression |
| > 1 MB | TeaLeaf binary with compression (best gains) |
Parse/Decode Speed
TeaLeaf’s dynamic key-based access is ~2-5x slower than Protobuf’s generated code:
| Operation | TeaLeaf | Protobuf | JSON (serde) |
|---|---|---|---|
| Parse text | Moderate | N/A | Fast |
| Decode binary | Moderate | Fast | N/A |
| Random key access | O(1) hash | O(1) field | O(n) parse |
| Full iteration | Moderate | Fast | Fast |
Why TeaLeaf Is Slower Than Protobuf
- Dynamic dispatch – TeaLeaf resolves fields by name at runtime; Protobuf uses generated code with known offsets
- String table lookup – each string access requires a table lookup
- Schema resolution – schema structure is parsed from binary at load time
When This Matters
- Hot loops decoding millions of records → consider Protobuf
- Cold reads or moderate throughput → TeaLeaf is fine
- Size-constrained transmission → TeaLeaf’s smaller binary compensates for slower decode
Memory-Mapped Reading
For large binary files, use memory-mapped I/O:
// Rust
let reader = Reader::open_mmap("large_file.tlbx")?;

// .NET
using var reader = TLReader.OpenMmap("large_file.tlbx");
Benefits:
- No upfront allocation – data loaded on demand by the OS
- Shared pages – multiple processes can read the same file
- Lazy loading – only accessed sections are read from disk
Compilation Performance
Compiling .tl to .tlbx:
| Input Size | Compile Time (approximate) |
|---|---|
| 1 KB | < 1 ms |
| 100 KB | ~10 ms |
| 1 MB | ~100 ms |
| 10 MB | ~1 second |
Compression adds ~20-50% to compile time but can reduce output size by 50-90%.
Optimization Tips
1. Use Schemas for Tabular Data
Schema-bound @table data gets optimal encoding:
- Positional storage (no field name repetition)
- Null bitmaps (1 bit per nullable field vs full null markers)
- Type-homogeneous arrays
2. Enable Compression for Large Files
Compression is most effective for:
- Sections larger than 64 bytes
- Data with repeated string values
- Numeric arrays with patterns
tealeaf compile data.tl -o data.tlbx # compression on by default
3. Use Binary Format for Storage
Text is for authoring; binary is for storage and transmission:
Text (.tl) → Author, review, version control
Binary (.tlbx) → Deploy, cache, transmit
4. Cache Compiled Binary
For data that’s read frequently but written rarely:
// Compile once
doc.compile("cache.tlbx", true)?;

// Read many times (fast)
let reader = Reader::open_mmap("cache.tlbx")?;
5. Minimize String Diversity
String deduplication works best when values repeat:
- Enum-like fields ("active", "inactive") → deduplicated
- UUIDs or timestamps → each is unique, no deduplication benefit
6. Use the Right Integer Sizes
The writer auto-selects the smallest representation, but schema types guide encoding:
@struct sensor (
id: uint16, # 2 bytes instead of 4
reading: float32, # 4 bytes instead of 8
flags: uint8, # 1 byte instead of 4
)
Round-Trip Fidelity
Understanding which conversion paths preserve data perfectly and where information can be lost.
Round-Trip Matrix
| Path | Data Preserved | Lost |
|---|---|---|
| .tl → .tlbx → .tl | All data and schemas | Comments, formatting |
| .tl → .json → .tl | Basic types (string, number, bool, null, array, object) | Schemas, comments, refs, tags, maps, timestamps, bytes |
| .tl → .tlbx → .json | Same as .tl → .json | Same losses |
| .json → .tl → .json | All JSON-native types | (generally lossless) |
| .json → .tlbx → .json | All JSON-native types | (generally lossless) |
| .tlbx → .tlbx (recompile) | All data | (lossless) |
Lossless: Text ↔ Binary
The text-to-binary-to-text round-trip preserves all data and schema information:
tealeaf compile original.tl -o compiled.tlbx
tealeaf decompile compiled.tlbx -o roundtrip.tl
tealeaf compile roundtrip.tl -o roundtrip.tlbx
# compiled.tlbx and roundtrip.tlbx contain equivalent data
What’s lost:
- Comments (stripped during compilation)
- Whitespace and formatting
- The decompiled output may have different formatting than the original
What’s preserved:
- All schemas (@struct definitions)
- All values (every type)
- Key ordering
- Schema-typed data (table structure)
Lossy: TeaLeaf → JSON
JSON cannot represent all TeaLeaf types. The following conversions are one-way:
Timestamps → Strings
created: 2024-01-15T10:30:00Z
JSON output:
{"created": "2024-01-15T10:30:00.000Z"}
Reimporting: the ISO 8601 string becomes a plain String, not a Timestamp.
Maps → Arrays
headers: @map {200: "OK", 404: "Not Found"}
JSON output:
{"headers": [[200, "OK"], [404, "Not Found"]]}
Reimporting: becomes a plain nested array, not a Map.
References → Objects
!ref: {x: 1, y: 2}
point: !ref
JSON output:
{"point": {"$ref": "ref"}}
Reimporting: becomes a plain object with $ref key, not a Ref.
Tagged Values → Objects
event: :click {x: 100, y: 200}
JSON output:
{"event": {"$tag": "click", "$value": {"x": 100, "y": 200}}}
Reimporting: becomes a plain object, not a Tagged.
Bytes → Hex Strings (JSON only)
Bytes round-trip losslessly within TeaLeaf text format using b"..." literals:
data: b"cafef00d"
However, JSON export converts bytes to hex strings:
{"data": "0xcafef00d"}
Reimporting from JSON: becomes a plain string, not bytes.
Schemas → Lost
@struct user (id: int, name: string)
users: @table user [(1, alice), (2, bob)]
JSON output:
{"users": [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]}
The @struct definition is not represented in JSON. However, from-json can re-infer schemas from uniform arrays.
Bytes and Text Format
Bytes now round-trip losslessly through text format using the b"..." literal:
Binary (bytes value) → Decompile → Text (b"..." literal) → Compile → Binary (bytes value)
The decompiler emits b"cafef00d" for bytes values, and the parser reads them back as Value::Bytes.
Ensuring Lossless Round-Trips
Use Binary for Storage
If you need to preserve all TeaLeaf types (refs, tags, maps, timestamps, bytes), keep data in .tlbx:
# Lossless cycle
tealeaf compile data.tl -o data.tlbx
tealeaf decompile data.tlbx -o data.tl
# data.tl preserves all types (except comments)
Use JSON Only for Interop
JSON conversion is for integrating with JSON-based tools. Don’t use it as a primary storage format if your data uses TeaLeaf-specific types.
Verify with CLI
# Compile → JSON two ways, compare
tealeaf to-json data.tl -o from_text.json
tealeaf compile data.tl -o data.tlbx
tealeaf tlbx-to-json data.tlbx -o from_binary.json
# from_text.json and from_binary.json should be identical
Type Preservation Summary
| TeaLeaf Type | Binary Round-Trip | JSON Round-Trip |
|---|---|---|
| Null | Lossless | Lossless |
| Bool | Lossless | Lossless |
| Int | Lossless | Lossless |
| UInt | Lossless | Lossless (as number) |
| Float | Lossless | Lossless |
| String | Lossless | Lossless |
| Bytes | Lossless | Lossy (→ hex string) |
| Array | Lossless | Lossless |
| Object | Lossless | Lossless |
| Map | Lossless | Lossy (→ array of pairs) |
| Ref | Lossless | Lossy (→ $ref object) |
| Tagged | Lossless | Lossy (→ $tag/$value object) |
| Timestamp | Lossless | Lossy (→ ISO 8601 string) |
| Schemas | Lossless | Lost (re-inferred on import) |
| Comments | Lost (stripped) | Lost |
Architecture Decision Records
This section documents significant architecture decisions made in the TeaLeaf project. Each record captures the context, decision, and consequences of a choice that affects the project’s design or implementation.
ADR Index
| ADR | Title | Status | Date |
|---|---|---|---|
| ADR-0001 | Use IndexMap for Insertion Order Preservation | Accepted | 2026-02-05 |
| ADR-0002 | Fuzzing Architecture and Strategy | Accepted | 2026-02-06 |
| ADR-0003 | Maximum Nesting Depth Limit (256) | Accepted | 2026-02-06 |
| ADR-0004 | ZLIB Compression for Binary Format | Accepted | 2026-02-06 |
What is an ADR?
An Architecture Decision Record (ADR) is a short document that captures an important architectural decision along with its context and consequences. ADRs help future contributors understand why certain design choices were made, not just what was built.
ADR Lifecycle
Each ADR has one of the following statuses:
- Proposed — Under discussion, not yet implemented
- Accepted — Approved and implemented (or in progress)
- Superseded — Replaced by a newer ADR (linked in the record)
- Deprecated — No longer applicable due to project changes
ADR-0001: Use IndexMap for Insertion Order Preservation
- Status: Accepted
- Date: 2026-02-05
- Applies to: tealeaf-core, tealeaf-derive, tealeaf-ffi
Context
TeaLeaf’s primary use case is context engineering for LLM applications, where structured data passes through multiple format conversions (JSON → .tl → .tlbx and back). Users intentionally order their JSON keys to convey semantic meaning — for example, placing name before description before details to mirror how a human would read the document. Prior to this change, all user-facing maps used HashMap<K, V>, and the text serializer and binary writer explicitly sorted keys alphabetically before output.
This caused two problems:
- Semantic ordering was lost. A user who wrote {"zebra": 1, "apple": 2} in their JSON would get {"apple": 2, "zebra": 1} after a round-trip through TeaLeaf. For LLM prompt engineering, this reordering could change how models interpret the context.
- Sorting was unnecessary work. Every serialization path (dumps(), compile(), write_value(), to_tl_with_schemas()) collected keys into a Vec, sorted them, and then iterated — adding O(n log n) overhead to every output operation.
Alternatives Considered
| Approach | Pros | Cons |
|---|---|---|
| Keep HashMap + sort (status quo) | Deterministic output, no dependency change | Loses user intent, sorting overhead |
| Vec of (key, value) pairs | Order preserved, no new dependency | Loses O(1) key lookup, breaks API surface broadly |
| IndexMap | Order preserved, O(1) lookup, drop-in API | Slightly slower decode (insertion cost), new dependency |
| BTreeMap | Sorted + deterministic | Still not insertion-ordered, lookup O(log n) |
Decision
Replace HashMap with IndexMap (from the indexmap crate v2) in all user-facing ordered containers:
- Value::Object → ObjectMap<String, Value> (type alias for IndexMap)
- TeaLeaf.data, TeaLeaf.schemas, TeaLeaf.unions → IndexMap<String, _>
- Parser output, Reader.sections, trait return types → IndexMap
Internal lookup tables stay as HashMap because they don’t need ordering:
- Writer.string_map, Writer.schema_map, Writer.union_map
- Reader.schema_map, Reader.union_map, Reader.cache
Additionally:
- Enable serde_json’s preserve_order feature so JSON parsing also preserves key order
- Remove all explicit keys.sort() calls from serialization paths
- Re-export IndexMap and ObjectMap from tealeaf-core so derive macros and downstream crates don’t need a direct indexmap dependency
Consequences
Positive
- Round-trip fidelity. JSON → TeaLeaf → JSON now preserves the original key order at every level (sections, object fields, schema definitions).
- Encoding is faster. Removing O(n log n) sort calls from every serialization path yields measurable improvements in encode benchmarks (6–17% for small/medium objects).
- Simpler serialization code. Serialization loops iterate the map directly instead of collecting-sorting-iterating.
- Binary format is unchanged. Old .tlbx files remain fully readable. The reader always produces keys in file order, which for old files happens to be alphabetical.
Negative
- Binary decode is slower. IndexMap::insert() is slower than HashMap::insert() because it maintains a dense insertion-order array alongside the hash table. Benchmarks show +56% to +105% regression for decode-heavy workloads (large arrays of objects, deeply nested structs). For the primary use case (LLM context), this is acceptable because:
  - Documents are typically encoded once and consumed as text (not repeatedly decoded from binary)
  - The absolute times remain in the microsecond-to-millisecond range
  - Encode performance (the more common hot path) improved
- New dependency. indexmap v2 is a well-maintained, widely-used crate (used by serde_json internally), so supply-chain risk is minimal.
- Public API change. TeaLeaf::new() now takes IndexMap instead of HashMap. This is a breaking change, mitigated by:
  - The project is in beta (2.0.0-beta.2)
  - From<HashMap<String, Value>> for Value conversion is retained for backward compatibility
  - Downstream code using .get(), .insert(), .iter() works identically
Benchmark Summary
| Workload | Encode | Decode |
|---|---|---|
| small_object | -16% (faster) | — |
| nested_structs | -10% to -17% (faster) | +56% to +68% (slower) |
| large_array_10000 | -5% (faster) | +105% (slower) |
| tabular_5000 | -69% (faster) | -48% (faster) |
Note: Tabular workloads use struct-array encoding (columnar), which has fewer per-row IndexMap insertions. The decode regression is concentrated in generic object decoding, where each row creates a new ObjectMap with field-by-field inserts.
References
- indexmap crate documentation
- serde_json preserve_order feature
- Implementation PR: HashMap → IndexMap migration across 16+ files
ADR-0002: Fuzzing Architecture and Strategy
- Status: Accepted
- Date: 2026-02-06
- Applies to: tealeaf-core
Context
TeaLeaf is a data format with multiple serialization paths: text parsing, text serialization, binary compilation, binary reading, and JSON import/export. Each path accepts untrusted input in production scenarios (user-supplied .tl files, .tlbx binaries, JSON strings from external APIs). Malformed or adversarial input must never cause undefined behavior, panics in non-roundtrip code paths, or memory safety violations.
The project already had unit tests, canonical fixture tests, and adversarial tests (hand-crafted malformed inputs). However, these approaches have inherent limitations:
- Unit/fixture tests are author-biased. They test cases the developer thought of, missing emergent edge cases from format interactions (e.g., deeply nested structures with unicode escapes inside hex-prefixed numbers).
- Adversarial tests are finite. The hand-crafted corpus in adversarial-tests/ covers known attack patterns but cannot explore the combinatorial input space.
- Round-trip fidelity is hard to test exhaustively. The property “serialize then parse produces the same value” requires testing across all Value variants, nesting depths, and string content — a space too large for manual enumeration.
Alternatives Considered
| Approach | Pros | Cons |
|---|---|---|
| Property-based testing (proptest/quickcheck) | Integrated into cargo test, structure-aware | Limited mutation depth, no coverage feedback, deterministic |
| AFL++ | Mature, multiple mutation strategies | Requires instrumentation harness, harder CI integration on GitHub Actions |
| cargo-fuzz (libFuzzer) | Native Rust support, coverage-guided, dictionary support, easy CI | Requires nightly toolchain, Linux-only |
| Honggfuzz | Hardware-assisted coverage | Less Rust ecosystem integration, complex setup |
Decision
Use cargo-fuzz (libFuzzer) with a three-layer fuzzing strategy:
Layer 1: Byte-level fuzzing (6 targets)
Coverage-guided mutation of raw bytes, testing each attack surface independently:
| Target | Input | Tests |
|---|---|---|
| fuzz_parse | Raw bytes as TL text | Parser robustness against arbitrary byte sequences |
| fuzz_serialize | Raw bytes as TL text | Parse then re-serialize roundtrip fidelity |
| fuzz_roundtrip | Raw bytes as TL text | Full text → parse → serialize → re-parse → value equality |
| fuzz_reader | Raw bytes as .tlbx binary | Binary reader robustness against malformed files |
| fuzz_json | Raw bytes as JSON string | JSON import → TL export → re-import roundtrip |
| fuzz_json_schemas | Raw bytes as JSON string | JSON import with schema inference → roundtrip |
Layer 2: Dictionary-guided fuzzing
libFuzzer dictionaries provide grammar-aware tokens that seed the mutation engine, dramatically improving coverage for structured formats where random bytes rarely produce valid syntax:
| Dictionary | Used by | Key tokens |
|---|---|---|
| tl.dict | fuzz_parse, fuzz_serialize, fuzz_roundtrip | Keywords (true, false, null, NaN, inf), directives (@struct, @table, @union), type names, escape sequences, boundary numbers, timestamp patterns |
| json.dict | fuzz_json, fuzz_json_schemas | JSON delimiters, escape sequences, surrogate pair markers, serde_json magic strings, boundary numbers |
Measured coverage impact (30-second fresh corpus):
| Target | Without dict | With dict | Improvement |
|---|---|---|---|
| fuzz_parse | 1790 edges | 1922 edges | +7.4% |
| fuzz_json | 1339 edges | 1533 edges | +14.5% |
Layer 3: Structure-aware fuzzing (1 target)
The fuzz_structured target bypasses the parser entirely, generating valid Value trees directly from fuzzer bytes using the arbitrary crate. This tests serialization and binary compilation paths with guaranteed-valid inputs that would take byte-level fuzzers much longer to discover:
- Bounded recursion (max depth 3) prevents stack overflow
- 13 Value variants including JsonNumber, Tagged, Ref, Map, Bytes
- Three roundtrip tests per invocation: text serialize/parse, binary compile/read, JSON no-panic
- Reaches 2464 coverage edges in just 733 runs (vs thousands of runs for byte-level targets)
Fuzz infrastructure layout
tealeaf-core/fuzz/
├── Cargo.toml               # Fuzz workspace with libfuzzer-sys + arbitrary
├── fuzz_targets/
│   ├── fuzz_parse.rs        # Layer 1: text parser robustness
│   ├── fuzz_serialize.rs    # Layer 1: text roundtrip
│   ├── fuzz_roundtrip.rs    # Layer 1: full text roundtrip with value equality
│   ├── fuzz_reader.rs       # Layer 1: binary reader robustness
│   ├── fuzz_json.rs         # Layer 1: JSON import roundtrip
│   ├── fuzz_json_schemas.rs # Layer 1: JSON with schema inference roundtrip
│   └── fuzz_structured.rs   # Layer 3: structure-aware value generation
├── dictionaries/
│   ├── tl.dict              # Layer 2: TL text format tokens
│   └── json.dict            # Layer 2: JSON format tokens
├── corpus/                  # Persistent corpus (per-target subdirectories)
└── artifacts/               # Crash artifacts (per-target subdirectories)
CI integration
Fuzz targets run on GitHub Actions ubuntu-latest (2-core, 7 GB RAM) with the following constraints:
- 120 seconds per target (coverage saturates within ~30 seconds; 120s provides buffer for deeper exploration)
- Serial execution — targets run one at a time to avoid memory pressure (each can use up to 512 MB RSS)
- RSS limit: 512 MB per target
- Dictionary-guided runs for text and JSON targets
- Nightly Rust toolchain required (libFuzzer instrumentation)
- Total wall time: ~15 minutes (7 targets × 120s + build overhead)
Value equality semantics
All roundtrip targets use a custom values_equal() function rather than PartialEq to handle expected coercions:
- Int(n) == UInt(n) when n >= 0 (sign-agnostic integer comparison)
- JsonNumber(s) == Int(i) when s parses to i (precision-preserving numbers may roundtrip as integers if they fit)
- Float comparison uses to_bits() for exact bit-level equality (distinguishes +0.0 from -0.0, handles NaN)
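As a rough sketch of those rules (illustrative; the real values_equal() lives in the fuzz targets and handles the full Value enum):

// Minimal Value enum for illustration; the real library has more variants.
#[derive(PartialEq)]
enum Value {
    Int(i64),
    UInt(u64),
    Float(f64),
    JsonNumber(String),
}

fn values_equal(a: &Value, b: &Value) -> bool {
    use Value::*;
    match (a, b) {
        // Sign-agnostic integer comparison: Int(n) == UInt(n) when n >= 0
        (Int(i), UInt(u)) | (UInt(u), Int(i)) => *i >= 0 && *i as u64 == *u,
        // A precision-preserving JsonNumber may roundtrip as a plain Int
        (JsonNumber(s), Int(i)) | (Int(i), JsonNumber(s)) => {
            s.parse::<i64>().map_or(false, |n| n == *i)
        }
        // Bit-level float equality: distinguishes +0.0 / -0.0, handles NaN
        (Float(x), Float(y)) => x.to_bits() == y.to_bits(),
        // All other pairs fall back to structural equality
        _ => a == b,
    }
}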
Consequences
Positive
- Discovered real bugs. Fuzzing found a NaN quoting bug (NaN roundtripped as Float(NaN) instead of being preserved through text format) and the precision loss that motivated Value::JsonNumber.
- Continuous regression detection. CI runs catch regressions in parser/serializer correctness automatically on every push.
- Coverage-guided exploration. libFuzzer’s coverage feedback explores code paths that hand-written tests miss, particularly in error handling and edge case branches.
- Dictionary tokens accelerate exploration. Measured 7-14% coverage improvement with dictionaries, at zero runtime cost (dictionaries only seed the mutation engine).
- Structure-aware fuzzing tests the serializer independently. By generating valid Value trees directly, fuzz_structured achieves deep serializer coverage without depending on parser correctness.
Negative
- Nightly Rust toolchain required. cargo-fuzz requires nightly for -Z flags and sanitizer instrumentation. This is isolated to the fuzz workspace and does not affect the main build.
- Linux-only. libFuzzer doesn’t support Windows natively. Local fuzzing requires WSL on Windows; CI uses Ubuntu runners.
- CI time cost. ~15 minutes per run. Acceptable for a post-push check; not suitable for pre-commit.
- Corpus growth. The persistent corpus grows over time as new coverage-increasing inputs are discovered. Periodic corpus minimization (cargo fuzz cmin) is recommended.
Not covered
- Protocol-level fuzzing. The FFI boundary (tealeaf-ffi) is not fuzzed directly. FFI functions are thin wrappers around the core library, which is fuzzed.
- .NET binding fuzzing. The .NET layer is tested through its own test suite and the adversarial harness, but not through libFuzzer.
- Concurrency testing. All fuzz targets are single-threaded. Thread-safety of Reader (which uses mmap) is tested separately.
References
- cargo-fuzz documentation
- libFuzzer documentation
- libFuzzer dictionary format
- arbitrary crate for structure-aware fuzzing
ADR-0003: Maximum Nesting Depth Limit (256)
- Status: Accepted
- Date: 2026-02-06
- Applies to: tealeaf-core (parser, binary reader)
Context
TeaLeaf accepts untrusted input in production — user-supplied .tl files, .tlbx binaries from external sources, and JSON strings from APIs. Recursive data structures (arrays, objects, maps, tagged values) create call stacks proportional to input nesting depth. Without a limit, an attacker can craft a payload like key: [[[[... with thousands of levels, causing a stack overflow and process termination.
Two constants enforce the limit:
| Constant | File | Value |
|---|---|---|
| MAX_PARSE_DEPTH | parser.rs | 256 |
| MAX_DECODE_DEPTH | reader.rs | 256 |
Both constants are set to the same value to ensure text-binary parity: any document that parses successfully from .tl text can also round-trip through .tlbx binary without hitting a different depth ceiling.
The limit is checked at every recursive entry point:
- Parser: parse_value() — arrays, objects, maps, tuples, tagged values
- Reader: decode_value(), decode_array(), decode_object(), decode_struct(), decode_struct_array(), decode_map()
When exceeded, both paths return a descriptive error rather than panicking or overflowing the stack.
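The guard itself is a single comparison at each recursive entry point; a toy sketch (illustrative, not the project's parser):

const MAX_PARSE_DEPTH: usize = 256;

// Every recursive entry point checks depth first, returning a descriptive
// error instead of recursing toward a stack overflow.
fn skip_nested(bytes: &[u8], mut pos: usize, depth: usize) -> Result<usize, String> {
    if depth > MAX_PARSE_DEPTH {
        return Err(format!("maximum nesting depth {MAX_PARSE_DEPTH} exceeded"));
    }
    while pos < bytes.len() {
        match bytes[pos] {
            b'[' => pos = skip_nested(bytes, pos + 1, depth + 1)?, // one level deeper
            b']' => return Ok(pos + 1),
            _ => pos += 1,
        }
    }
    Ok(pos)
}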
Ecosystem Comparison
| Parser / Library | Default Max Depth | Configurable? |
|---|---|---|
| TeaLeaf | 256 | No (compile-time constant) |
| serde_json (Rust) | 128 | Yes (disable_recursion_limit) |
| serde_yaml (Rust) | 128 | No |
| System.Text.Json (.NET) | 64 | Yes (MaxDepth) |
| ASP.NET Core (default) | 32 | Yes |
| Jackson (Java) | 1000 (v2), 500 (v3) | Yes |
| Go encoding/json | 10,000 | No |
| Python json (stdlib) | ~1,000 (interpreter limit) | Via sys.setrecursionlimit |
| Protocol Buffers (Java/C++) | 100 | Yes |
| Protocol Buffers (Go) | 10,000 | Yes |
| rmp-serde (MessagePack) | 1,024 | Yes |
| CBOR (ciborium, Rust) | 128 | Yes |
| toml (Rust) | None | No (vulnerable to stack overflow) |
Observations
- Conservative defaults are trending down. Jackson reduced from 1,000 to 500 in v3. .NET defaults to 64. Protocol Buffers targets 100.
- 128 is the most common Rust ecosystem default (serde_json, serde_yaml, ciborium).
- No production data format needs > 100 levels. Deeply nested structures indicate either machine-generated intermediate representations or adversarial input.
- Formats without limits have CVEs. The toml crate’s lack of depth limiting is tracked as an open issue. Python’s reliance on interpreter limits has caused production crashes.
Decision
Set MAX_PARSE_DEPTH and MAX_DECODE_DEPTH to 256.
Why 256 over 128?
TeaLeaf schemas add implicit nesting. A @struct with an array of @struct-typed objects creates 3 levels of nesting (object → array → object) for what the user perceives as one level of structure. With schema compositions, 128 could be reached in complex but legitimate documents. 256 provides a 2x margin above the Rust ecosystem default while remaining well within safe stack bounds.
Why not configurable?
- Simplicity. A compile-time constant is zero-cost at runtime (no configuration plumbing, no state to manage).
- Consistent behavior. All TeaLeaf implementations (Rust, FFI, .NET) enforce the same limit. A configurable limit would require coordination across language boundaries.
- 256 is generous enough. No known use case requires deeper nesting. If a legitimate need arises, the constant can be bumped in a patch release without breaking any public API.
Stack safety margin
On x86-64 Linux with the default 8 MB stack, each recursive call uses roughly 200–400 bytes of stack frame. At 256 depth, the worst case is ~100 KB — well under 2% of the available stack. This leaves ample room for the caller’s own stack frames and for platforms with smaller stacks (e.g., 1 MB thread stacks).
Test Coverage
| Test | Location | What it verifies |
|---|---|---|
| test_parse_depth_256_succeeds | parser.rs | 200-level nesting parses successfully |
| test_fuzz_deeply_nested_arrays_no_stack_overflow | parser.rs | 500-level nesting returns error (no crash) |
| parse_deep_nesting_ok | adversarial.rs | 7-level nesting succeeds in adversarial harness |
| fuzz_structured depth=3 | fuzz_structured.rs | Structure-aware fuzzer bounds depth to 3 |
| canonical/large_data.tl | Canonical suite | Deep nesting fixture round-trips correctly |
Consequences
Positive
- Stack overflow protection. Malicious or malformed input with extreme nesting is rejected with a clear error message instead of crashing the process.
- Text-binary parity. The same limit in parser and reader means any document that parses from text will also decode from binary, and vice versa.
- Predictable resource usage. Callers can reason about maximum stack consumption without inspecting input.
Negative
- Theoretical limitation. Documents with more than 256 levels of nesting are rejected. In practice, no known data format use case requires this depth.
- Not configurable. Users who need deeper nesting must rebuild from source with a modified constant. This is an intentional trade-off for simplicity.
Neutral
- No performance cost. The depth check is a single integer comparison per recursive call — unmeasurable relative to the cost of decoding a value.
ADR-0004: ZLIB Compression for Binary Format
- Status: Accepted
- Date: 2026-02-06
- Applies to: tealeaf-core (writer, reader), spec §4.3 and §4.9
Context
The .tlbx binary format compresses individual sections to reduce file size. The implementation has always used ZLIB (deflate) via the flate2 crate. However, the spec contained a contradiction:
- §4.3 (Header Flags) described the COMPRESS flag as indicating “zstd compression” and required readers to detect compression via the zstd frame magic (0xFD2FB528).
- §4.9 (Compression) correctly stated the algorithm as “ZLIB (deflate)”.
This contradiction meant a third-party implementation following §4.3 would look for zstd-compressed data that doesn’t exist, while one following §4.9 would work correctly. The spec needed a single, definitive answer.
Decision
Standardize on ZLIB (deflate) as the sole compression algorithm for .tlbx binary format v2.
Why not zstd?
zstd is a superior algorithm in general-purpose benchmarks, but TeaLeaf’s design neutralizes its advantages:
- String deduplication removes the most compressible data. The string table deduplicates all strings before compression runs. What remains for the compressor is packed integers, null bitmaps, and string table indices — low-entropy binary data with little redundancy.
- Sections are small. The compression threshold is 64 bytes. Most sections are a few hundred bytes to a few KB. At these sizes, zlib and zstd achieve nearly identical compression ratios without dictionaries.
- zstd’s dictionary mode doesn’t help here. Dictionary compression — where zstd’s largest advantage lies for small payloads — requires pre-training on representative data. TeaLeaf documents are schema-variable and content-diverse (the primary use case is LLM context engineering with arbitrary structured data). A static dictionary would not generalize across different schemas and data shapes.
- The 90% threshold filters aggressively. Sections that don’t compress to under 90% of their original size are stored uncompressed. This threshold means most small sections aren’t compressed at all, making the algorithm choice irrelevant for the majority of sections.
- Decompression speed is irrelevant at this scale. zstd decompresses 3-5x faster than zlib, but a few-hundred-byte section decompresses in microseconds with either algorithm. The difference is unmeasurable in practice.
Why zlib?
- Universal availability. ZLIB/deflate is implemented in every language’s standard library or a widely-available package. zstd requires an additional native dependency in most ecosystems.
- No breaking change. Every .tlbx file ever produced uses zlib. Switching would require either a format version bump (breaking all existing files) or dual-algorithm detection logic (complexity for every implementation).
- Simpler for third-party implementations. One algorithm, no magic-byte detection, no conditional dependency. A conformant reader needs only zlib decompression.
- Compression is not the primary size reduction strategy. TeaLeaf’s token efficiency comes from the text format’s conciseness and the binary format’s schema-aware encoding (struct arrays, string deduplication, type-specific packing). Compression is a secondary optimization applied on top.
Spec Changes
| Section | Before | After |
|---|---|---|
| §4.3 (Header Flags) | “zstd compression”, “zstd frame magic” | “ZLIB (deflate) compression”, per-section flag detection |
| §4.9 (Compression) | Already correct (“ZLIB (deflate)”) | No change |
Consequences
Positive
- Spec is internally consistent. §4.3 and §4.9 now agree on ZLIB.
- Third-party interop is unambiguous. Implementers need one algorithm, clearly documented.
- No migration required. All existing .tlbx files remain valid.
Negative
- Foregoes zstd’s speed advantage. In workloads with large sections (tens of KB+), zstd would decompress faster. TeaLeaf’s current section sizes don’t reach this threshold.
Neutral
- Future versions can reconsider. If TeaLeaf v3 introduces large-section use cases (e.g., embedded binary blobs), zstd could be adopted with a format version bump. This ADR applies to binary format v2 only.
Architecture
High-level architecture of the TeaLeaf project.
Crate Structure
tealeaf/
├── tealeaf-core/ # Core library + CLI
│ ├── src/
│ │ ├── main.rs # CLI entry point
│ │ ├── lib.rs # Public API (TeaLeaf, Value, Schema, traits)
│ │ ├── reader.rs # Binary file reader
│ │ ├── writer.rs # Binary file writer (compiler)
│ │ ├── builder.rs # TeaLeafBuilder fluent API
│ │ └── convert.rs # ToTeaLeaf/FromTeaLeaf trait impls for primitives
│ └── tests/
│ ├── canonical.rs # Canonical fixture tests
│ └── derive.rs # Derive macro tests
│
├── tealeaf-derive/ # Proc-macro crate
│ ├── lib.rs # Macro entry points
│ ├── attrs.rs # Attribute parsing
│ ├── to_tealeaf.rs # ToTeaLeaf derive implementation
│ ├── from_tealeaf.rs # FromTeaLeaf derive implementation
│ ├── schema.rs # Schema generation logic
│ └── util.rs # Shared utilities
│
├── tealeaf-ffi/ # C FFI layer
│ ├── src/lib.rs # All FFI exports
│ └── build.rs # cbindgen header generation
│
├── bindings/dotnet/ # .NET bindings
│ ├── TeaLeaf.Annotations/ # Attribute definitions
│ ├── TeaLeaf.Generators/ # Source generator
│ ├── TeaLeaf/ # Managed wrappers + serializer
│ └── TeaLeaf.Tests/ # Test project
│
├── canonical/ # Canonical test fixtures
│ ├── samples/ # .tl text files
│ ├── expected/ # Expected .json outputs
│ ├── binary/ # Pre-compiled .tlbx files
│ └── errors/ # Invalid files for error testing
│
└── spec/ # Format specification
└── TEALEAF_SPEC.md
Data Flow
Parse Pipeline
Text input (.tl)
│
▼
Lexer → Token stream
│
▼
Parser → AST (directives + key-value pairs)
│
├── Schema definitions → IndexMap<String, Schema>
├── Reference definitions → resolved inline
└── Key-value pairs → IndexMap<String, Value>
│
▼
TeaLeaf { schemas, data }
Compile Pipeline
TeaLeaf { schemas, data }
│
▼
String collector → String table (deduplicated)
│
▼
Schema encoder → Schema table (binary)
│
▼
Value encoder → Data sections (per key)
│ │
│ ├── Primitives → fixed-size encoding
│ ├── Strings → string table index (u32)
│ ├── Struct arrays → null bitmap + positional values
│ └── Other → type-tagged encoding
│
▼
Compressor (per section, if > 64 bytes)
│
▼
Writer → .tlbx file
├── Header (64 bytes)
├── String table
├── Schema table
├── Section index
└── Data sections
Read Pipeline
.tlbx file
│
▼
Reader (or MmapReader)
│
├── Header validation (magic, version)
├── String table → lazy access
├── Schema table → lazy access
└── Section index → key → offset mapping
│
▼
Value access (by key)
│
├── Locate section in index
├── Decompress if needed
├── Decode value by type code
└── Return Value enum
Key Design Decisions
Positional Schema Encoding
Field names appear only in the schema table. Data rows use position to identify fields. This trades readability of binary for compactness.
Per-Section Compression
Each top-level key is a separate section compressed independently. This allows:
- Random access without decompressing the entire file
- Selective decompression (only read sections you need)
Thread-Local Error Handling (FFI)
The FFI uses thread-local storage for error messages instead of out-parameters or exceptions. This simplifies the C API while remaining thread-safe.
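A minimal sketch of this pattern on the Rust side (illustrative names, not the crate's exact internals):

use std::cell::RefCell;
use std::ffi::CString;
use std::os::raw::c_char;

thread_local! {
    // One error slot per thread: concurrent FFI calls on different threads
    // can never clobber each other's messages, and no locking is required.
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

fn set_last_error(msg: String) {
    LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(msg));
}

// Caller-owned C string, in the spirit of tl_get_last_error; the caller
// frees it with the library's string-free function.
#[no_mangle]
pub extern "C" fn demo_get_last_error() -> *mut c_char {
    LAST_ERROR.with(|slot| match slot.borrow_mut().take() {
        Some(msg) => CString::new(msg)
            .map(CString::into_raw)
            .unwrap_or(std::ptr::null_mut()),
        None => std::ptr::null_mut(),
    })
}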
Source Generator vs Reflection
The .NET binding offers both approaches because:
- Source generators produce optimal code but require
partialclasses - Reflection works with any type but is slower
- Both share the same native library for actual encoding/decoding
Insertion Order Preservation (IndexMap)
All user-facing maps use IndexMap instead of HashMap to preserve insertion order across format conversions. Internal lookup tables (string interning, schema/union resolution, caches) remain HashMap for performance. See ADR-0001 for the full decision record including benchmark impact.
No Schema Versioning
TeaLeaf deliberately avoids schema evolution machinery. The rationale:
- Simpler implementation and specification
- Source file is always the truth
- Recompilation is explicit and deterministic
- Applications that need evolution can layer it on top
Binary Encoding Details
Deep dive into how values are encoded in the .tlbx binary format.
Encoding Strategy
The encoder selects the encoding strategy based on value type and context:
Top-Level Values
Each top-level key-value pair becomes a section in the binary file. The section’s type code and flags determine how to decode it.
Primitive Encoding
| Type | Encoding | Size |
|---|---|---|
| Null | Nothing (type code alone) | 0 bytes |
| Bool | 0x00 or 0x01 | 1 byte |
| Int8 | Signed byte | 1 byte |
| Int16 | 2 bytes, little-endian | 2 bytes |
| Int32 | 4 bytes, little-endian | 4 bytes |
| Int64 | 8 bytes, little-endian | 8 bytes |
| UInt8-64 | Same as signed, unsigned | 1-8 bytes |
| Float32 | IEEE 754, little-endian | 4 bytes |
| Float64 | IEEE 754, little-endian | 8 bytes |
| String | u32 string table index | 4 bytes |
| Bytes | varint length + raw data | variable |
| Timestamp | i64 Unix ms + i16 tz offset (minutes), LE | 10 bytes |
Integer Size Selection
The writer automatically selects the smallest representation:
Value::Int(42) → Int8 (1 byte) // fits in i8
Value::Int(1000) → Int16 (2 bytes) // fits in i16
Value::Int(100000) → Int32 (4 bytes) // fits in i32
Value::Int(5×10⁹) → Int64 (8 bytes) // needs i64
Struct Array Encoding
The most optimized encoding path is for arrays of schema-typed objects:
┌──────────────────────┐
│ Count: u32 │ Number of rows
│ Schema Index: u16 │ Which schema these rows follow
│ Null Bitmap Size: u16│ Bytes per row for null tracking
├──────────────────────┤
│ Row 0: │
│ Null Bitmap: [u8] │ One bit per field (1 = null)
│ Field 0 data │ Only if not null
│ Field 1 data │ Only if not null
│ ... │
├──────────────────────┤
│ Row 1: │
│ Null Bitmap: [u8] │
│ Field data... │
├──────────────────────┤
│ ... │
└──────────────────────┘
Null Bitmap
- Size: (field_count + 7) / 8 bytes per row (integer division, i.e. ceil(field_count / 8))
- Bit i set = field i is null
- Only non-null fields have data written
For a schema with 5 fields, the bitmap is 1 byte. If bit 2 is set, field 2 is null and its data is skipped.
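A sketch of the bitmap arithmetic (LSB-first bit order within each byte is an assumption of this sketch, not confirmed by the spec text above):

// Bytes needed per row: one bit per field, rounded up to whole bytes.
fn bitmap_size(field_count: usize) -> usize {
    (field_count + 7) / 8
}

// Bit i set means field i is null and its data is skipped.
fn field_is_null(bitmap: &[u8], field_index: usize) -> bool {
    (bitmap[field_index / 8] >> (field_index % 8)) & 1 == 1
}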
Field Data
Each non-null field is encoded according to its schema type:
- Primitive types: fixed-size encoding
- String: u32 string table index
- Nested struct: recursively encoded fields (with their own null bitmap)
- Array field: count + typed elements
Homogeneous Array Encoding
Top-level arrays use homogeneous (packed) encoding only for two types:
Integer Arrays (i32 only)
All elements must be Value::Int and fit within the i32 range (-2³¹ to 2³¹ - 1). Integer arrays where any value exceeds i32 fall through to heterogeneous encoding.
Count: u32
Element Type: 0x04 (Int32)
Elements: [i32 × Count] -- packed, no type tags
String Arrays
Count: u32
Element Type: 0x10 (String)
Elements: [u32 × Count] -- string table indices
All Other Top-Level Arrays
Arrays of UInt, Bool, Float, Timestamp, Int64 (values exceeding i32), and mixed-type arrays all use heterogeneous encoding (see below). This keeps the top-level format simple for third-party implementations.
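A sketch of the fast-path check described above (assuming a minimal stand-in for the library's Value enum; illustrative, not the writer's exact code):

enum Value {
    Int(i64),
    Float(f64),
    // ... other variants elided
}

// Packed-i32 fast path: every element must be Value::Int and fit in i32.
// Anything else falls through to heterogeneous (type-tagged) encoding.
fn can_pack_as_i32(items: &[Value]) -> bool {
    items
        .iter()
        .all(|v| matches!(v, Value::Int(n) if i32::try_from(*n).is_ok()))
}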
Schema-Typed Field Arrays
Arrays within struct fields are a separate case — they use homogeneous encoding for their schema-declared type, regardless of the top-level restrictions:
Count: u32
Element Type: u8 (from schema field type)
Elements: [packed data]
Heterogeneous Array Encoding
For mixed-type arrays and all top-level arrays not covered by Int32/String homogeneous encoding:
Count: u32
Element Type: 0xFF (heterogeneous marker)
Elements: [
type: u8, data,
type: u8, data,
...
]
Each element carries its own type tag.
Object Encoding
Field Count: u16
Fields: [
key_idx: u32 (string table index)
type: u8 (value type code)
data: [...] (type-specific encoding)
]
Objects are the untyped key-value container. Unlike struct arrays, each field carries its name and type.
Map Encoding
Count: u32
Entries: [
key_type: u8, key_data: [...],
value_type: u8, value_data: [...],
]
Both keys and values carry type tags.
Reference Encoding
name_idx: u32 (string table index for the reference name)
A reference is just a string table pointer to the target name.
Tagged Value Encoding
tag_idx: u32 (string table index for the tag name)
value_type: u8 (type code of the inner value)
value_data: [...] (type-specific encoding of the inner value)
Varint Encoding
Used for bytes length:
Value: 300 (0x012C)
Encoded: 0xAC 0x02
Bit layout:
0xAC = 1_0101100 → continuation bit set, value bits: 0101100 (44)
0x02 = 0_0000010 → no continuation, value bits: 0000010 (2)
Result: 44 + (2 << 7) = 44 + 256 = 300
- Continuation bit: 0x80 – if set, more bytes follow
- 7 value bits per byte
- Least-significant group first
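A minimal decoder for this layout (a sketch):

// 7 value bits per byte, least-significant group first; 0x80 marks continuation.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in bytes.iter().enumerate().take(10) { // a u64 needs at most 10 bytes
        value |= u64::from(b & 0x7F) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1)); // (decoded value, bytes consumed)
        }
    }
    None // input ended mid-varint (or varint too long)
}

// decode_varint(&[0xAC, 0x02]) == Some((300, 2)), matching the example above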
Compression
Applied per section:
- Check if uncompressed size > 64 bytes
- Compress with ZLIB (deflate)
- If compressed size < 90% of original, use compressed version
- Set compression flag in section index entry
- Store both size (compressed) and uncompressed_size in the index
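Putting those steps together, a sketch using flate2 (the crate named in ADR-0004; the 64-byte threshold and 90% cutoff are the documented values):

use flate2::{write::ZlibEncoder, Compression};
use std::io::Write;

// Returns the section payload and whether the compression flag should be set.
fn maybe_compress(section: &[u8]) -> (Vec<u8>, bool) {
    if section.len() <= 64 {
        return (section.to_vec(), false); // below the 64-byte threshold
    }
    let mut enc = ZlibEncoder::new(Vec::new(), Compression::default());
    enc.write_all(section).expect("in-memory write cannot fail");
    let compressed = enc.finish().expect("in-memory finish cannot fail");
    // Keep the compressed form only if it beats 90% of the original size
    if compressed.len() * 10 < section.len() * 9 {
        (compressed, true)
    } else {
        (section.to_vec(), false)
    }
}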
String Table
The string table is a core component of the binary format that provides string deduplication.
Purpose
In a typical document with 1,000 user records, field values like "active", "Engineering", or city names repeat frequently. Without deduplication, each occurrence stores the full string. The string table stores each unique string once and uses 4-byte indices everywhere else.
Structure
┌─────────────────────────────┐
│ Total Size: u32 │ Size of the entire string table section
│ Count: u32 │ Number of unique strings
├─────────────────────────────┤
│ Offsets: [u32 × Count] │ Byte offset of each string in the data section
│ Lengths: [u32 × Count] │ Length of each string (up to 4 GB)
├─────────────────────────────┤
│ String Data: [u8...] │ Concatenated UTF-8 string data
└─────────────────────────────┘
How It Works
During Compilation
- The writer traverses all values in the document
- Every unique string is collected (keys, string values, schema names, field names, ref names, tag names)
- Duplicates are eliminated
- Each string gets an index (0, 1, 2, …)
- The string table is written first in the file
- All subsequent encoding uses indices instead of raw strings
During Reading
- The reader loads the string table at startup
- When decoding a string value, it reads a
u32index - The index maps to an offset and length in the string data
- The string is read from the data section
Lookup Performance
String table access is O(1) by index:
index → offsets[index] → offset in data section
index → lengths[index] → number of bytes to read
string = data[offset..offset+length]
Size Impact
Example: 1,000 Users with 5 Fields
Without deduplication:
- Field names repeated 1,000 times each
- Common values (“active”, “Engineering”) repeated many times
- Estimated overhead: ~20-30 KB just for repeated strings
With string table:
- Each unique string stored once
- References are 4 bytes each
- Estimated savings: 60-80% on string data
Extreme Case: Large Tabular Data
For 10,000 rows with 10 fields, field names alone would consume:
| Approach | Field Name Storage |
|---|---|
| JSON (per-field) | ~10 × 10,000 × avg(8 bytes) = ~800 KB |
| TeaLeaf (string table) | 10 × avg(8 bytes) + 100,000 × 4 bytes = ~400 KB |
| TeaLeaf with schema | 10 × avg(8 bytes) = ~80 bytes (field names in schema only!) |
With schema-typed data, field names appear only in the schema table – the string table contains only the actual string values.
What Gets Deduplicated
| String Source | Deduplicated? |
|---|---|
| Top-level key names | Yes |
| Object field names | Yes |
| String values | Yes |
| Schema names | Yes |
| Schema field names | Yes |
| Reference names | Yes |
| Tag names | Yes |
Maximum String Length
String lengths are stored as u32, supporting individual strings up to ~4 GB. The total string table size (all strings + metadata) is also capped at u32::MAX by the table’s Size header field.
Interaction with Compression
The string table itself is not compressed (it’s needed for decoding). However, data sections that reference the string table benefit doubly:
- String references are 4 bytes (already compact)
- ZLIB compression can further compress repetitive index patterns
Implementation Note
The string table uses a HashMap<String, u32> during compilation for O(1) dedup lookups. The final table is written as parallel arrays (offsets + lengths + data) for O(1) indexed access during reading.
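A minimal sketch of that interning scheme:

use std::collections::HashMap;

// Each unique string gets a sequential index; repeats return the existing one.
#[derive(Default)]
struct StringTable {
    map: HashMap<String, u32>, // O(1) dedup lookups during compilation
    strings: Vec<String>,      // later flattened into offsets/lengths/data arrays
}

impl StringTable {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.map.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.map.insert(s.to_owned(), idx);
        self.strings.push(s.to_owned());
        idx
    }
}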
Schema Inference
TeaLeaf can automatically infer schemas from JSON arrays of uniform objects. This page explains the algorithm.
When Schema Inference Runs
Schema inference is triggered by:
- tealeaf from-json CLI command
- tealeaf json-to-tlbx CLI command
- TeaLeaf::from_json_with_schemas() Rust API
It is not triggered by:
- TeaLeaf::from_json() (plain import, no schemas)
- TLDocument.FromJson() (.NET API – plain import)
Algorithm
Step 1: Array Detection
Scan top-level JSON values for arrays where all elements are objects with identical key sets:
{
"users": [ // ← Candidate: array of uniform objects
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"}
],
"tags": ["a", "b"], // ← Not candidate: array of strings
"config": {...} // ← Not candidate: not an array
}
Step 2: Name Inference
The schema name is derived from the parent key by singularization:
| Key | Inferred Schema Name |
|---|---|
"users" | user |
"products" | product |
"employees" | employee |
"addresses" | address |
"data" | data (already singular) |
"items_list" | items_list (compound, kept as-is) |
Basic singularization rules:
- Remove trailing s if the word doesn’t end in ss
- Remove trailing es for -es words
- Trailing ies → y
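One plausible ordering of these rules that reproduces the table above (a sketch; the implementation's exact rule order and -es detection may differ):

fn singularize(key: &str) -> String {
    if let Some(stem) = key.strip_suffix("ies") {
        return format!("{stem}y"); // e.g. "categories" -> "category"
    }
    if let Some(stem) = key.strip_suffix("es") {
        // Treat "-es" words like "addresses" -> "address", but leave
        // "employees" for the plain "s" rule below (assumed heuristic)
        if stem.ends_with('s') || stem.ends_with('x') || stem.ends_with("ch") || stem.ends_with("sh") {
            return stem.to_string();
        }
    }
    if let Some(stem) = key.strip_suffix('s') {
        if !key.ends_with("ss") {
            return stem.to_string(); // "users" -> "user"
        }
    }
    key.to_string() // "data", "items_list": kept as-is
}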
Step 3: Type Inference
For each field, scan all array elements to determine the type:
| JSON Values Seen | Inferred TeaLeaf Type |
|---|---|
| All integers | int |
| All numbers (mixed int/float) | float |
| All strings | string |
| All booleans | bool |
| All objects (uniform keys) | Nested struct reference |
| All arrays | Inferred element type |
| Mixed types | string (fallback) |
Step 4: Nullable Detection
If any element has null for a field, that field becomes nullable:
[
{"id": 1, "email": "alice@ex.com"},
{"id": 2, "email": null} // ← email becomes string?
]
Step 5: Nested Schema Inference
If a field’s value is an object across all array elements, and those objects have identical keys, a nested schema is created:
{
"users": [
{"name": "Alice", "address": {"city": "Seattle", "zip": "98101"}},
{"name": "Bob", "address": {"city": "Austin", "zip": "78701"}}
]
}
Inferred schemas:
@struct address (city: string, zip: string)
@struct user (address: address, name: string)
This is recursive – nested objects can have their own nested schemas.
Output
The inferred schemas are:
- Added to the document as @struct definitions
- The original JSON arrays are converted to @table tuples
- Written in the output file before the data
Example
Input JSON:
{
"products": [
{"id": 1, "name": "Widget", "price": 9.99, "in_stock": true},
{"id": 2, "name": "Gadget", "price": 24.99, "in_stock": false}
]
}
Output TeaLeaf:
@struct product (id: int, in_stock: bool, name: string, price: float)
products: @table product [
(1, true, Widget, 9.99),
(2, false, Gadget, 24.99),
]
Limitations
- Field order – JSON objects have no guaranteed order. Fields are sorted alphabetically in the inferred schema.
- Type ambiguity – JSON numbers don’t distinguish int from float. If any element has a decimal, the field becomes float.
- Non-uniform arrays – arrays where objects have different key sets are not schema-inferred. They remain as plain arrays of objects.
- Deeply nested arrays – only the first level of array → schema inference is applied. Nested arrays within objects are not auto-inferred.
- No timestamp detection – ISO 8601 strings in JSON remain as strings, not timestamps.
Testing
TeaLeaf has a comprehensive test suite spanning the Rust core, FFI layer, and .NET bindings.
Test Structure
tealeaf/
├── tealeaf-core/tests/
│ ├── canonical.rs # Canonical fixture round-trip tests
│ └── derive.rs # Derive macro tests
│
├── tealeaf-ffi/src/lib.rs # FFI safety tests (inline #[cfg(test)])
│
├── bindings/dotnet/
│ ├── TeaLeaf.Tests/ # .NET unit tests
│ └── TeaLeaf.Generators.Tests/ # Source generator tests
│
└── canonical/ # Shared test fixtures
├── samples/ # .tl text files (14 canonical samples)
├── expected/ # Expected .json outputs
├── binary/ # Pre-compiled .tlbx files
└── errors/ # Invalid files for error testing
Running Tests
Rust
# All Rust tests
cargo test --workspace
# Core tests only
cargo test --package tealeaf-core
# Derive macro tests
cargo test --package tealeaf-core --test derive
# Canonical fixture tests
cargo test --package tealeaf-core --test canonical
# FFI tests
cargo test --package tealeaf-ffi
.NET
cd bindings/dotnet
dotnet test
Everything
# Rust
cargo test --workspace
# .NET
cd bindings/dotnet && dotnet test
Canonical Test Fixtures
The canonical/ directory contains 14 sample files that test every feature:
| Sample | Features Tested |
|---|---|
primitives | All primitive types (bool, int, float, string, null) |
arrays | Simple and nested arrays |
objects | Nested objects |
schemas | @struct definitions and @table usage |
nested_schemas | Struct-referencing-struct |
deep_nesting | Multi-level struct nesting |
nullable | Nullable fields with ~ values |
maps | @map with various key types |
references | !ref definitions and usage |
tagged | :tag value tagged values |
timestamps | ISO 8601 timestamp parsing |
mixed | Combination of multiple features |
comments | Comment handling |
strings | Quoted, unquoted, multiline strings |
Each sample has:
- `canonical/samples/{name}.tl` – the text source
- `canonical/expected/{name}.json` – expected JSON output
- `canonical/binary/{name}.tlbx` – pre-compiled binary
Canonical Test Pattern
#[test]
fn test_canonical_sample() {
    let tl = TeaLeaf::load("canonical/samples/primitives.tl").unwrap();
    // Round-trip: text → binary → text
    let tmp = tempfile::NamedTempFile::new().unwrap();
    tl.compile(tmp.path(), true).unwrap();
    let reader = Reader::open(tmp.path()).unwrap();
    // Verify values match
    assert_eq!(reader.get("count").unwrap().as_int(), Some(42));
    // JSON output matches expected
    let json = tl.to_json().unwrap();
    let expected = std::fs::read_to_string("canonical/expected/primitives.json").unwrap();
    assert_json_eq(&json, &expected);
}
Error Fixtures
The canonical/errors/ directory contains intentionally invalid files:
| File | Error Tested |
|---|---|
| Invalid syntax | Parser error handling |
| Missing struct | @table references undefined schema |
| Type mismatches | Schema validation |
| Malformed binary | Binary reader error handling |
Derive Macro Tests
Tests for #[derive(ToTeaLeaf, FromTeaLeaf)]:
- Basic struct serialization/deserialization
- All attribute combinations (`rename`, `skip`, `optional`, `type`, `flatten`, `default`) – sketched below
- Nested structs
- Enum variants
- Collection types (`Vec`, `HashMap`, `IndexMap`, `Option`)
- Edge cases (empty structs, single-field structs)
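For orientation, a hedged sketch of the derive surface these tests exercise; the `#[tealeaf(...)]` attribute spelling is an assumption, so consult `tealeaf-derive` for the real syntax:

```rust
use tealeaf_derive::{FromTeaLeaf, ToTeaLeaf}; // assumed import path

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
    #[tealeaf(rename = "user_id")] // hypothetical attribute syntax
    id: u32,
    name: String,
    #[tealeaf(default)] // hypothetical attribute syntax
    tags: Vec<String>,
    #[tealeaf(skip)] // hypothetical attribute syntax
    cache: Option<String>,
}
```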
.NET Test Categories
The .NET test suite covers:
Source Generator Tests
- Schema generation for all type combinations
- Serialization output (text, JSON, binary)
- Deserialization from documents
- Nested types and collections
- Enum handling
- Attribute processing
Reflection Serializer Tests
- Generic serialization/deserialization
- Type mapping accuracy
- Nullable handling
- Dictionary and List support
Native Type Tests
- `TLDocument` lifecycle (parse, access, dispose)
- `TLValue` type accessors
- `TLReader` binary access
- Error handling (disposed objects, missing keys)
DTO Serialization Tests
- Full round-trip (C# object → TeaLeaf → C# object)
- Edge cases (empty strings, nulls, large numbers)
- Collection serialization
Test Philosophy
- Canonical fixtures – shared across Rust and .NET, ensuring format consistency
- Round-trip testing – text → binary → text verifies no data loss
- JSON equivalence – text → JSON and binary → JSON produce identical output
- Error coverage – every error path has at least one test
- Cross-language – same fixtures tested in Rust, .NET, and via FFI
Benchmarks
TeaLeaf includes a Criterion-based benchmark suite that measures encode/decode performance and output size across multiple serialization formats.
Running Benchmarks
# Run all benchmarks
cargo bench -p tealeaf-core
# Run a specific scenario
cargo bench -p tealeaf-core -- small_object
cargo bench -p tealeaf-core -- large_array_1000
cargo bench -p tealeaf-core -- tabular_5000
# List available benchmarks
cargo bench -p tealeaf-core -- --list
Results are saved to target/criterion/ with HTML reports and JSON data. Criterion tracks historical performance across runs.
Formats Compared
Each scenario benchmarks encode and decode across six formats:
| Format | Library | Notes |
|---|---|---|
| TeaLeaf Parse | tealeaf | Text parsing (.tl → in-memory) |
| TeaLeaf Binary | tealeaf | Binary compile/read (.tlbx) |
| JSON | serde_json | Standard JSON serialization |
| MessagePack | rmp_serde | Binary, schemaless |
| CBOR | ciborium | Binary, schemaless |
| Protobuf | prost | Binary with generated code from .proto definitions |
Note: Protobuf benchmarks use `prost` with code generation via `build.rs`. The generated structs have known field offsets at compile time, giving Protobuf a structural speed advantage over TeaLeaf's dynamic key-based access.
Benchmark Scenarios
| Group | Data Shape | Sizes | What It Tests |
|---|---|---|---|
small_object | Config-like object | 1 | Header overhead, small payload efficiency |
large_array_100 | Array of Point structs | 100 | Array encoding at small scale |
large_array_1000 | Array of Point structs | 1,000 | Array encoding at medium scale |
large_array_10000 | Array of Point structs | 10,000 | Array encoding at large scale, throughput |
nested_structs | Nested objects | 2 levels | Nesting overhead |
nested_structs_100 | Nested objects | 100 levels | Deep nesting scalability |
mixed_types | Heterogeneous data | 1 | Strings, numbers, booleans mixed |
tabular_100 | @table User records | 100 | Schema-bound tabular data, small |
tabular_1000 | @table User records | 1,000 | Schema-bound tabular data, medium |
tabular_5000 | @table User records | 5,000 | Schema-bound tabular data, large |
Each group measures both encode (serialize) and decode (deserialize) operations, using Throughput::Elements for per-element metrics on scaled scenarios.
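For orientation, here is the shape of one scaled Criterion group in that style; the workload is a placeholder rather than real TeaLeaf encoding, and the names are illustrative:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn bench_encode(c: &mut Criterion) {
    let mut group = c.benchmark_group("large_array");
    for size in [100u64, 1_000, 10_000] {
        // Per-element metrics, as described above.
        group.throughput(Throughput::Elements(size));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &n| {
            b.iter(|| (0..n).sum::<u64>()); // placeholder for encode work
        });
    }
    group.finish();
}

criterion_group!(benches, bench_encode);
criterion_main!(benches);
```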
Size Comparison Results
From cargo run --example size_report on tealeaf-core:
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Binary | 3.56x | 0.15x | 0.47x |
Key observations:
- Small objects: TeaLeaf has a 64-byte header overhead. For objects under ~200 bytes, JSON or MessagePack are more compact.
- Large arrays: String deduplication and schema-based compression produce 6-7x better compression than JSON for 10K+ records.
- Tabular data: `@table` encoding with positional storage is competitive with Protobuf, with the advantage of embedded schemas.
Speed Characteristics
TeaLeaf’s dynamic key-based access is ~2-5x slower than Protobuf’s generated code:
| Operation | TeaLeaf | Protobuf | JSON (serde) |
|---|---|---|---|
| Parse text | Moderate | N/A | Fast |
| Decode binary | Moderate | Fast | N/A |
| Random key access | O(1) hash | O(1) field | O(n) parse |
Why TeaLeaf is slower than Protobuf:
- Dynamic dispatch – fields resolved by name at runtime; Protobuf uses generated code with known offsets
- String table lookup – each string access requires a table lookup
- Schema resolution – schema structure parsed from binary at load time
When this matters:
- Hot loops decoding millions of records → consider Protobuf
- Cold reads or moderate throughput → TeaLeaf is fine
- Size-constrained transmission → TeaLeaf’s smaller binary compensates for slower decode
Code Structure
tealeaf-core/benches/
├── benchmarks.rs # Entry point: criterion_group + criterion_main
├── common/
│ ├── mod.rs # Module exports
│ ├── data.rs # Test data generation functions
│ └── structs.rs # Rust struct definitions (serde-compatible)
└── scenarios/
├── mod.rs # Module exports
├── small_object.rs # Small config object benchmarks
├── large_array.rs # Scaled array benchmarks (100-10K)
├── nested_structs.rs # Nesting depth benchmarks (2-100)
├── mixed_types.rs # Heterogeneous data benchmarks
└── tabular_data.rs # @table User record benchmarks (100-5K)
Each scenario module exports bench_encode and bench_decode functions. Scaled scenarios accept a size parameter.
For optimization tips and practical guidance on when to use each format, see Performance.
Accuracy Benchmark
The accuracy benchmark suite evaluates LLM providers’ ability to analyze structured data in TeaLeaf format. It sends analysis prompts with TeaLeaf-formatted business data to multiple providers and scores the responses.
Overview
The workflow:
- Takes JSON data from various business domains
- Converts it to TeaLeaf format using `tealeaf-core`
- Sends analysis prompts to multiple LLM providers
- Evaluates and compares the responses using a scoring framework
Supported Providers
| Provider | Environment Variable | Model |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Claude Sonnet 4.5 (Extended Thinking) |
| OpenAI | OPENAI_API_KEY | GPT-5.2 |
Installation
Pre-built Binaries
Download the latest release from GitHub Releases:
| Platform | Architecture | File |
|---|---|---|
| Windows | x64 | accuracy-benchmark-windows-x64.zip |
| Windows | ARM64 | accuracy-benchmark-windows-arm64.zip |
| macOS | Intel | accuracy-benchmark-macos-x64.tar.gz |
| macOS | Apple Silicon | accuracy-benchmark-macos-arm64.tar.gz |
| Linux | x64 | accuracy-benchmark-linux-x64.tar.gz |
| Linux | ARM64 | accuracy-benchmark-linux-arm64.tar.gz |
| Linux | x64 (static) | accuracy-benchmark-linux-musl-x64.tar.gz |
Build from Source
cargo build -p accuracy-benchmark --release
# Or run directly
cargo run -p accuracy-benchmark -- --help
Usage
# Run with all available providers
cargo run -p accuracy-benchmark -- run
# Run with specific providers
cargo run -p accuracy-benchmark -- run --providers anthropic,openai
# Run specific categories only
cargo run -p accuracy-benchmark -- run --categories finance,retail
# Compare TeaLeaf vs JSON format performance
cargo run -p accuracy-benchmark -- run --compare-formats
# Verbose output
cargo run -p accuracy-benchmark -- -v run
# List available tasks
cargo run -p accuracy-benchmark -- list-tasks
# Generate configuration template
cargo run -p accuracy-benchmark -- init-config -o my-config.json
Benchmark Tasks
The suite includes 12 tasks across 10 business domains:
| ID | Domain | Complexity | Output Type |
|---|---|---|---|
| FIN-001 | Finance | Simple | Calculation |
| FIN-002 | Finance | Moderate | Calculation |
| RET-001 | Retail | Simple | Summary |
| RET-002 | Retail | Complex | Recommendation |
| HLT-001 | Healthcare | Simple | Summary |
| TEC-001 | Technology | Moderate | Analysis |
| MKT-001 | Marketing | Moderate | Calculation |
| LOG-001 | Logistics | Moderate | Analysis |
| HR-001 | HR | Moderate | Analysis |
| MFG-001 | Manufacturing | Moderate | Calculation |
| RE-001 | Real Estate | Complex | Recommendation |
| LEG-001 | Legal | Complex | Analysis |
Data Sources
Each task specifies input data in one of two ways:
Inline JSON:
BenchmarkTask::new("FIN-001", "finance", "Analyze this data:\n\n{tl_data}")
    .with_json_data(serde_json::json!({
        "revenue": 1000000,
        "expenses": 750000
    }))
JSON file reference:
BenchmarkTask::new("LOG-001", "logistics", "Analyze this data:\n\n{tl_data}")
    .with_json_file("tasks/logistics/data/shipments.json")
The {tl_data} placeholder in the prompt template is replaced with TeaLeaf-formatted data before sending to the LLM.
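The substitution itself is plain string templating; a minimal sketch (the helper name is hypothetical):

```rust
fn render_prompt(template: &str, tl_data: &str) -> String {
    template.replace("{tl_data}", tl_data)
}

fn main() {
    let prompt = render_prompt(
        "Analyze this data:\n\n{tl_data}",
        "revenue: 1000000\nexpenses: 750000",
    );
    assert!(prompt.contains("revenue: 1000000"));
}
```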
Analysis Framework
Accuracy Metrics
Responses are evaluated across five dimensions:
| Metric | Weight | Description |
|---|---|---|
| Completeness | 25% | Were all expected elements addressed? |
| Relevance | 25% | How relevant is the response to the task? |
| Coherence | 20% | Is the response well-structured? |
| Factual Accuracy | 20% | Do values match validation patterns? |
| Actionability | 10% | For recommendations – are they actionable? |
Element Detection
Each task defines expected elements that should appear in the response:
// Keyword presence check
.expect("metric", "Total revenue calculation", true)
// Regex pattern validation
.expect_with_pattern("metric", "Percentage value", true, r"\d+\.?\d*%")
- Without pattern: checks for keyword presence from description
- With pattern: validates using regex (e.g., `\$[\d,]+` for dollar amounts)
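An illustrative version of that check, assuming the `regex` crate (the helper name is hypothetical):

```rust
use regex::Regex;

fn element_present(response: &str, keyword: &str, pattern: Option<&str>) -> bool {
    match pattern {
        // With a pattern: regex validation, e.g. r"\$[\d,]+" for dollar amounts
        Some(p) => Regex::new(p).map(|re| re.is_match(response)).unwrap_or(false),
        // Without: case-insensitive keyword presence from the description
        None => response.to_lowercase().contains(&keyword.to_lowercase()),
    }
}
```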
Scoring Rubrics
Different rubrics apply based on output type:
| Output Type | Key Criteria |
|---|---|
| Calculation | Numeric content (5+ numbers), structured output |
| Analysis | Depth, structure, evidence with data |
| Recommendation | Actionable language, prioritization, justification |
| Summary | Completeness, conciseness, organization |
Coherence Checks
- Structure markers: `##`, `###`, `**`, `-`, numbered lists
- Paragraph breaks (3+ paragraphs preferred)
- Reasonable length (100-2000 words)
Actionability Keywords
For recommendation tasks, these keywords are detected:
- recommend, should, suggest, consider, advise
- action, implement, improve, optimize, prioritize
- next step, immediate, critical, important
Format Comparison Results
Run with --compare-formats to compare TeaLeaf vs JSON input efficiency.
Sample run from February 5, 2026 with Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) and GPT-5.2 (gpt-5.2-2025-12-11):
| Provider | TeaLeaf Score | JSON Score | Accuracy Diff | TeaLeaf Input | JSON Input | Input Savings |
|---|---|---|---|---|---|---|
| anthropic | 0.988 | 0.978 | +0.010 | 5,793 | 8,275 | -30.0% |
| openai | 0.901 | 0.899 | +0.002 | 4,868 | 7,089 | -31.3% |
Input tokens = data sent to the model. Output tokens vary by model verbosity.
Key findings:
| Provider | Accuracy | Data Token Efficiency |
|---|---|---|
| anthropic | Comparable (+1.0%) | TeaLeaf uses ~36% fewer data tokens |
| openai | Comparable (+0.2%) | TeaLeaf uses ~36% fewer data tokens |
TeaLeaf data payloads use ~36% fewer tokens than equivalent JSON (median across 12 tasks, validated with tiktoken). Total input savings are ~30% because shared instruction text dilutes the data-only difference. Savings increase with larger and more structured datasets.
Sample Results: Reference benchmark results are available in `accuracy-benchmark/results/sample/` in the repository.
Output Files
Results are saved in two formats:
TeaLeaf Format (analysis.tl)
# Accuracy Benchmark Results
# Generated: 2026-02-05 15:29:42 UTC
run_metadata: {
run_id: "20260205-152419",
started_at: 2026-02-05T15:24:19Z,
completed_at: 2026-02-05T15:29:42Z,
total_tasks: 12,
providers: [anthropic, openai]
}
responses: @table api_response [
(FIN-001, openai, "gpt-5.2-2025-12-11", 315, 490, 6742, 2026-02-05T15:24:38Z, success),
(FIN-001, anthropic, "claude-sonnet-4-5-20250929", 396, 1083, 12309, 2026-02-05T15:24:31Z, success),
...
]
analysis_results: @table analysis_result [
(FIN-001, openai, 0.667, 0.625, 0.943, 0.000),
(FIN-001, anthropic, 1.000, 1.000, 1.000, 1.000),
...
]
comparisons: @table comparison_result [
(FIN-001, [anthropic, openai], anthropic, 0.389),
(RET-001, [anthropic, openai], anthropic, 0.047),
...
]
summary: {
total_tasks: 12,
wins: { anthropic: 11, openai: 1 },
avg_scores: { anthropic: 0.988, openai: 0.901 },
by_category: { ... },
by_complexity: { ... }
}
JSON Summary (summary.json)
{
"run_id": "20260205-152419",
"total_tasks": 12,
"provider_rankings": [
{ "provider": "anthropic", "wins": 11, "avg_score": 0.988 },
{ "provider": "openai", "wins": 1, "avg_score": 0.901 }
],
"category_breakdown": {
"retail": { "leader": "anthropic", "margin": 0.111 },
"finance": { "leader": "anthropic", "margin": 0.197 },
...
},
"detailed_results_file": "analysis.tl"
}
Adding Custom Tasks
From JSON Data
BenchmarkTask::new(
    "CUSTOM-001",
    "custom_category",
    "Analyze this data:\n\n{tl_data}\n\nProvide summary and recommendations."
)
.with_json_file("tasks/custom/data/my_data.json")
.with_complexity(Complexity::Moderate)
.with_output_type(OutputType::Analysis)
.expect("summary", "Data overview", true)
.expect_with_pattern("metric", "Total value", true, r"\d+")
From TeaLeaf File
cargo run -p accuracy-benchmark -- run --tasks path/to/tasks.tl
Extending Providers
Implement the LLMProvider trait:
#[async_trait]
impl LLMProvider for NewProviderClient {
    fn name(&self) -> &str { "newprovider" }

    async fn complete(&self, request: CompletionRequest) -> ProviderResult<CompletionResponse> {
        // Implementation goes here; todo!() keeps the stub type-checking.
        todo!()
    }
}
Then register in src/providers/mod.rs via create_all_providers() and create_providers().
Directory Structure
accuracy-benchmark/
├── src/
│ ├── main.rs # CLI interface (clap)
│ ├── lib.rs # Library exports
│ ├── config.rs # Configuration management
│ ├── providers/ # LLM provider clients
│ │ ├── traits.rs # LLMProvider trait
│ │ ├── anthropic.rs # Claude implementation
│ │ └── openai.rs # GPT implementation
│ ├── tasks/ # Task definitions
│ │ ├── mod.rs # BenchmarkTask, DataSource
│ │ ├── categories.rs # Domain, Complexity, OutputType
│ │ └── loader.rs # TeaLeaf file loader
│ ├── runner/ # Execution engine
│ │ ├── executor.rs # Parallel task execution
│ │ └── rate_limiter.rs
│ ├── analysis/ # Response analysis
│ │ ├── metrics.rs # AccuracyMetrics
│ │ ├── scoring.rs # ScoringRubric
│ │ └── comparator.rs # Cross-provider comparison
│ └── reporting/ # Output generation
│ └── tl_writer.rs # TeaLeaf format output
├── tasks/ # Sample data by domain
│ ├── finance/data/
│ ├── healthcare/data/
│ ├── retail/data/
│ └── ...
├── results/runs/ # Archived run results
└── Cargo.toml
Adversarial Tests
The adversarial test suite validates TeaLeaf’s error handling and robustness using crafted malformed inputs, binary corruption, compression edge cases, and large-corpus stress tests. All tests are isolated in the adversarial-tests/ directory to avoid touching core project files.
Current count: 58 tests across 9 categories.
Running Tests
# Run all adversarial tests
cd adversarial-tests/core-harness
cargo test --test adversarial
# With output
cargo test --test adversarial -- --nocapture
# Run via script (PowerShell)
./adversarial-tests/scripts/run_core_harness.ps1
# CLI adversarial tests
./adversarial-tests/scripts/run_cli_adversarial.ps1
# .NET adversarial harness
./adversarial-tests/scripts/run_dotnet_harness.ps1
Test Input Files
TeaLeaf Format (.tl) — 13 files
Crafted .tl files testing parser error paths:
| File | Error Tested | Expected |
|---|---|---|
bad_unclosed_string.tl | Unclosed string literal ("Alice) | Parse error |
bad_missing_colon.tl | Missing colon in key-value pair | Parse error |
bad_invalid_escape.tl | Invalid escape sequence (\q) | Parse error |
bad_number_overflow.tl | Number exceeding u64 bounds | See note below |
bad_table_wrong_arity.tl | Table row with wrong field count | Parse error |
bad_schema_unclosed.tl | Unclosed @struct definition | Parse error |
bad_unicode_escape_short.tl | Incomplete \u escape (\u12) | Parse error |
bad_unicode_escape_invalid_hex.tl | Invalid hex in \uZZZZ | Parse error |
bad_unicode_escape_surrogate.tl | Unicode surrogate pair (\uD800) | Parse error |
bad_unterminated_multiline.tl | Unterminated """ multiline string | Parse error |
invalid_utf8.tl | Invalid UTF-8 byte sequence | Parse error |
Note: `bad_number_overflow.tl` does not cause a parse error. Numbers exceeding the i64/u64 range are stored as `Value::JsonNumber` (exact decimal string), not rejected.
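A sketch of what that fallback looks like from the Rust API, assuming these accessor and import shapes:

```rust
use tealeaf::{TeaLeaf, Value}; // assumed import paths

fn main() {
    // Exceeds u64::MAX, so it cannot be stored as a native integer.
    let tl = TeaLeaf::parse("big: 18446744073709551616").expect("not a parse error");
    match tl.get("big") {
        Some(Value::JsonNumber(s)) => assert_eq!(s, "18446744073709551616"),
        other => panic!("unexpected value: {other:?}"),
    }
}
```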
Edge cases that should succeed:
| File | What It Tests | Expected |
|---|---|---|
deep_nesting.tl | 7 levels of nested arrays ([[[[[[[1]]]]]]]) | Parse OK |
empty_doc.tl | Empty document | Parse OK |
JSON Format (.json) — 6 files
Files testing from_json error and edge-case paths:
| File | What It Tests | Expected |
|---|---|---|
invalid_json_trailing.json | Trailing comma or content | Parse error |
invalid_json_unclosed.json | Unclosed object or array | Parse error |
large_number.json | Number overflowing f64 | Stored as JsonNumber |
deep_array.json | Deeply nested arrays | Parse OK |
empty_object.json | Empty JSON object {} | Parse OK |
root_array.json | Root-level array [1,2,3] | Preserved as array |
Binary Format (.tlbx) — 4 files (unused)
These fixture files exist but are not referenced by any test. All binary adversarial tests generate malformed data inline using tempfile::tempdir(). These files are only used by the CLI adversarial scripts in results/cli/.
| File | Content |
|---|---|
bad_magic.tlbx | Invalid magic bytes |
bad_version.tlbx | Invalid version field |
random_garbage.tlbx | Random bytes |
truncated_header.tlbx | Incomplete header |
Test Functions
Parse Error Tests (10 tests)
| Function | Input | Assertion |
|---|---|---|
parse_invalid_syntax_unclosed_string | name: "Alice | is_err() |
parse_invalid_escape_sequence | name: "Alice\q" | is_err() |
parse_missing_colon | name "Alice" | is_err() |
parse_schema_unclosed | Unclosed @struct | is_err() |
parse_table_wrong_arity | 3 fields for 2-field schema | is_err() |
parse_unicode_escape_short | \u12 | is_err() |
parse_unicode_escape_invalid_hex | \uZZZZ | is_err() |
parse_unicode_escape_surrogate | \uD800 | is_err() |
parse_unterminated_multiline_string | """unterminated | is_err() |
from_json_invalid | {"a":1,} (trailing comma) | is_err() |
Success / Edge-Case Parse Tests (3 tests)
| Function | Input | Assertion |
|---|---|---|
parse_number_overflow_falls_to_json_number | 18446744073709551616 | Parse succeeds; stored as Value::JsonNumber |
parse_deep_nesting_ok | [[[[[[[1]]]]]]] | Parse succeeds; get("root") returns value |
from_json_root_array_is_preserved | [1,2,3] | Stored under "root" key as Value::Array |
Error Variant Coverage (5 tests)
Tests that exercise specific Error enum variants for code coverage:
| Function | What It Tests | Assertion |
|---|---|---|
parse_unknown_struct_in_table | @table nonexistent references undefined struct | is_err(); message contains struct name |
parse_unexpected_eof_unclosed_brace | obj: {x: 1, | is_err(); message indicates EOF |
parse_unexpected_eof_unclosed_bracket | arr: [1, 2, | is_err() |
reader_missing_field | reader.get("nonexistent") on valid binary | is_err(); message contains key name |
from_json_large_number_falls_to_json_number | {"big": 18446744073709551616} | Parsed as Value::JsonNumber |
Type Coercion Tests (2 tests)
Validates spec §2.5 best-effort numeric coercion during binary compilation:
| Function | Input | Assertion |
|---|---|---|
writer_int_overflow_coerces_to_zero | int8 field with value 999 | Binary roundtrip produces Value::Int(0) |
writer_uint_negative_coerces_to_zero | uint8 field with value -1 | Binary roundtrip produces Value::UInt(0) |
Binary Reader Tests (4 tests)
| Function | Input | Assertion |
|---|---|---|
reader_rejects_bad_magic | [0x58, 0x58, 0x58, 0x58] | Reader::open().is_err() |
reader_rejects_bad_version | Valid magic + version 3 | Reader::open().is_err() |
load_invalid_file_errors | .tl file with bad syntax | TeaLeaf::load().is_err() |
load_invalid_utf8_errors | [0xFF, 0xFE, 0xFA] | TeaLeaf::load().is_err() |
Binary Corruption Tests (12 tests)
Tests that take valid binary output, corrupt specific bytes, and verify the reader does not panic:
| Function | What It Corrupts |
|---|---|
reader_corrupted_magic_byte | Flips first magic byte |
reader_corrupted_string_table_offset | Points string table offset past EOF |
reader_truncated_string_table | Truncates file right after header |
reader_oversized_string_count | Sets string count to u32::MAX |
reader_oversized_section_count | Sets section count to u32::MAX |
reader_corrupted_schema_count | Sets schema count to u32::MAX |
reader_flipped_bytes_in_section_data | Flips bytes in last 10 bytes of section data |
reader_truncated_compressed_data | Removes last 20 bytes from compressed file |
reader_invalid_zlib_stream | Overwrites data section with 0xBA bytes |
reader_zero_length_file | Empty Vec<u8> |
reader_just_magic_no_header | Only b"TLBX" (4 bytes, no header) |
reader_corrupted_type_code | Replaces a type code byte with 0xFE |
All corruption tests assert no panic. Most also verify that Reader::from_bytes() or reader.get() either returns an error or handles the corruption gracefully.
Compression Stress Tests (4 tests)
| Function | What It Tests |
|---|---|
compression_at_threshold_boundary | Data just over 64 bytes triggers compression attempt; roundtrip OK |
compression_skipped_when_not_beneficial | High-entropy data: compressed file not much larger than raw |
compression_all_identical_bytes | 10K zeros: compressed size < half of raw; roundtrip OK |
compression_below_threshold_stored_raw | Small data with compress=true: stored raw (same size as uncompressed) |
Soak / Large-Corpus Tests (8 tests)
Stress tests for parser, writer, and reader with large inputs:
| Function | Scale | What It Tests |
|---|---|---|
soak_deeply_nested_arrays | 200 levels deep | Parser handles deep nesting without stack overflow |
soak_wide_object | 10,000 fields | Parser and Value::Object handle wide objects |
soak_large_array | 100,000 integers | Parser handles large arrays; first/last element correct |
soak_large_array_binary_roundtrip | 100,000 integers | Compile + read roundtrip with compression |
soak_many_sections | 5,000 top-level keys | Binary writer/reader handles many sections |
soak_many_schemas | 500 @struct definitions | Schema table handles large schema counts |
soak_string_deduplication | 15,000 strings (5K dupes) | String dedup in binary writer; roundtrip correct |
soak_long_string | 1 MB string | Binary writer/reader handles large string values |
Memory-Mapped Reader Tests (10 tests)
Validates Reader::open_mmap() produces identical results to Reader::open() and Reader::from_bytes():
| Function | What It Tests |
|---|---|
mmap_roundtrip_all_primitive_types | Int, float, bool, string, timestamp via mmap |
mmap_roundtrip_containers | Arrays, objects, nested arrays via mmap |
mmap_roundtrip_schemas | @struct + @table data via mmap |
mmap_roundtrip_compressed | 500-element compressed array via mmap |
mmap_vs_open_equivalence | All keys: open_mmap values == open values |
mmap_vs_from_bytes_equivalence | All keys: open_mmap values == from_bytes values |
mmap_large_file | 50,000-element array via mmap |
mmap_nonexistent_file | open_mmap on missing path returns error |
mmap_multiple_sections | 100 sections via mmap; boundary keys correct |
mmap_string_dedup | 100 identical string values via mmap; dedup preserved |
Directory Structure
adversarial-tests/
├── inputs/
│ ├── tl/ # 13 crafted .tl files (11 error + 2 success)
│ ├── json/ # 6 crafted .json files
│ └── tlbx/ # 4 .tlbx files (used by CLI scripts, not Rust tests)
├── core-harness/
│ ├── tests/
│ │ └── adversarial.rs # 58 Rust integration tests
│ └── Cargo.toml
├── dotnet-harness/ # C# harness using TeaLeaf bindings
├── scripts/
│ ├── run_core_harness.ps1
│ ├── run_cli_adversarial.ps1
│ └── run_dotnet_harness.ps1
├── results/ # CLI test logs and outputs
└── README.md
Adding New Tests
1. Add an Inline Test (preferred)
Most adversarial tests generate their inputs inline. This avoids stale fixture files and keeps the test self-contained:
#[test]
fn parse_new_error_case() {
    assert_parse_err("malformed: input here");
}
The assert_parse_err helper asserts that TeaLeaf::parse(input).is_err().
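A plausible shape for that helper (the real one lives in `adversarial.rs`):

```rust
fn assert_parse_err(input: &str) {
    assert!(
        TeaLeaf::parse(input).is_err(),
        "expected a parse error for input: {input}"
    );
}
```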
2. For Binary Tests
Use the make_valid_binary helper to produce valid bytes, then corrupt them:
#[test]
fn reader_new_corruption_case() {
    let mut data = make_valid_binary("val: 42", false);
    data[0] ^= 0xFF; // corrupt something
    let result = Reader::from_bytes(data);
    // Assert no panic; error or graceful handling OK
    if let Ok(r) = result {
        let _ = r.get("val");
    }
}
3. Input File Tests (for CLI scripts)
Place malformed input in the appropriate subdirectory for CLI adversarial testing:
adversarial-tests/inputs/tl/bad_new_case.tl
The CLI script run_cli_adversarial.ps1 exercises these files through the tealeaf CLI binary and logs results to results/cli/.
Contributing Guide
Contributions to TeaLeaf are welcome. The full contributing guide lives in CONTRIBUTING.md at the repository root.
That document covers project architecture, build instructions, testing, the canonical test suite, version management, the PR process, and areas of interest for contributors.
This page highlights the key points. See the Development Setup page for environment setup details.
Ways to Contribute
- Bug reports – file issues on GitHub with reproduction steps
- Feature requests – open an issue describing the use case
- Code contributions – submit pull requests
- Documentation – fix typos, improve explanations, add examples
- Language bindings – create bindings for Python, Java, Go, etc.
- Test cases – add canonical test fixtures or edge case tests
Repository
Source code: github.com/krishjag/tealeaf
Pull Request Checklist
- Fork the repository and create a feature branch from `main`
- Make your changes
- Run tests and lints: `cargo test --workspace`, `cargo clippy --workspace`, `cargo fmt --check`
- If you modified .NET bindings: `cd bindings/dotnet && dotnet test`
- Submit a pull request against `main`
CI runs on Linux, macOS, and Windows automatically. Version consistency is validated on every PR.
Code Style
Rust
- Standard `rustfmt` formatting (no custom config)
- Standard `clippy` lints (no custom config)
- Document public APIs with `///` doc comments
- Edition 2021
C# (.NET)
- Standard C# naming conventions
- XML doc comments for public APIs
- Target frameworks: net6.0, net8.0, net10.0, netstandard2.0
Areas of Interest
New Language Bindings
The FFI layer exposes a C-compatible API that can be used from any language. See the FFI Overview for getting started.
Desired bindings:
- Python (via `ctypes` or `cffi`)
- Java/Kotlin (via JNI or JNA)
- Go (via cgo)
- JavaScript/TypeScript (via WASM or N-API)
Format Improvements
- Union support in binary encoding
- Bytes literal syntax in text format
- Streaming/append-only mode
Tooling
- Editor plugins (VS Code syntax highlighting for `.tl`)
- Schema validation tooling
- Web-based playground
License
By contributing, you agree that your contributions will be licensed under the MIT License.
Development Setup
How to set up a development environment for working on TeaLeaf. See also the comprehensive CONTRIBUTING.md in the repository root for project architecture, version management, and PR guidelines.
Prerequisites
| Tool | Version | Purpose |
|---|---|---|
| Rust | 1.70+ | Core library, CLI, FFI |
| .NET SDK | 8.0+ | .NET bindings and tests |
| Git | Any | Version control |
Optional:
- Protobuf compiler (for benchmark suite)
- mdBook (for documentation)
Clone and Build
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
# Build everything
cargo build --workspace
# Build release
cargo build --workspace --release
Project Layout
tealeaf/
├── tealeaf-core/ # Core library + CLI binary
├── tealeaf-derive/ # Proc-macro (derive macros)
├── tealeaf-ffi/ # C FFI layer
├── bindings/dotnet/ # .NET bindings
├── canonical/ # Shared test fixtures
├── spec/ # Format specification
├── examples/ # Example files
├── docs-site/ # Documentation site (mdBook)
└── accuracy-benchmark/ # Accuracy benchmark tool
Running Tests
Rust
# All tests
cargo test --workspace
# Specific package
cargo test --package tealeaf-core
cargo test --package tealeaf-derive
cargo test --package tealeaf-ffi
# Specific test file
cargo test --package tealeaf-core --test canonical
cargo test --package tealeaf-core --test derive
# With output
cargo test --workspace -- --nocapture
.NET
cd bindings/dotnet
dotnet build
dotnet test
Lint
cargo clippy --workspace
cargo fmt --check
Development Workflows
Modifying the Parser
- Edit `tealeaf-core/src/lib.rs` (lexer and parser live here)
- Run `cargo test --package tealeaf-core`
- Check canonical fixtures still pass
- Add new test cases for the change
Modifying the Binary Format
- Edit `tealeaf-core/src/writer.rs` (encoder) and `tealeaf-core/src/reader.rs` (decoder)
- Run canonical round-trip tests: `cargo test --package tealeaf-core --test canonical`
- Regenerate binary fixtures if the format changed
Modifying Derive Macros
- Edit files in `tealeaf-derive/src/`
- Run: `cargo test --package tealeaf-core --test derive`
- Check that derive tests cover your change
Modifying FFI
- Edit `tealeaf-ffi/src/lib.rs`
- Run: `cargo test --package tealeaf-ffi`
- The C header is auto-regenerated by `cbindgen` during build
Modifying .NET Bindings
- Edit files in `bindings/dotnet/`
- Build: `cd bindings/dotnet && dotnet build`
- Test: `dotnet test`
- The native library must be built first: `cargo build --package tealeaf-ffi`
Documentation
Building the Documentation Site
# Install mdBook
cargo install mdbook
# Build
cd docs-site
mdbook build
# Serve locally with live reload
mdbook serve --open
Rust API Docs
cargo doc --workspace --no-deps --open
CI/CD
The project uses GitHub Actions for CI:
| Workflow | Purpose |
|---|---|
rust-cli.yml | Build and test Rust on all platforms |
dotnet-package.yml | Build .NET package with native libraries |
accuracy-benchmark.yml | Benchmark accuracy tests |
All CI runs are triggered on push to main/develop and on pull requests.
Debugging
Rust
# Run with debug output
RUST_LOG=debug cargo run --package tealeaf-core -- info test.tl
# Run with backtrace
RUST_BACKTRACE=1 cargo test --package tealeaf-core
.NET
Use Visual Studio or VS Code with the C# extension for debugging the source generator and managed code.
For native library issues, attach a native debugger to the .NET test process.
Type Reference
Complete reference table for all TeaLeaf types, their text syntax, binary encoding, and language mappings.
Primitive Types
| TeaLeaf Type | Text Syntax | Binary Code | Binary Size | Rust Type | C# Type |
|---|---|---|---|---|---|
bool | true / false | 0x01 | 1 byte | bool | bool |
int8 | 42 | 0x02 | 1 byte | i8 | sbyte |
int16 | 1000 | 0x03 | 2 bytes | i16 | short |
int / int32 | 100000 | 0x04 | 4 bytes | i32 | int |
int64 | 5000000000 | 0x05 | 8 bytes | i64 | long |
uint8 | 255 | 0x06 | 1 byte | u8 | byte |
uint16 | 65535 | 0x07 | 2 bytes | u16 | ushort |
uint / uint32 | 100000 | 0x08 | 4 bytes | u32 | uint |
uint64 | 18446744073709551615 | 0x09 | 8 bytes | u64 | ulong |
float32 | 3.14 | 0x0A | 4 bytes | f32 | float |
float / float64 | 3.14 | 0x0B | 8 bytes | f64 | double |
string | "hello" / hello | 0x10 | 4 bytes (index) | String | string |
bytes | b"cafef00d" | 0x11 | varint + data | Vec<u8> | byte[] |
json_number | (from JSON) | 0x12 | 4 bytes (index) | String | string |
timestamp | 2024-01-15T10:30:00Z | 0x32 | 10 bytes | (i64, i16) | DateTimeOffset |
Special Types
| TeaLeaf Type | Text Syntax | Binary Code | Description |
|---|---|---|---|
null | ~ | 0x00 | Null/missing value |
Container Types
| TeaLeaf Type | Text Syntax | Binary Code | Description |
|---|---|---|---|
| Array | [1, 2, 3] | 0x20 | Ordered collection |
| Object | {key: value} | 0x21 | String-keyed map |
| Struct | (val, val, ...) in @table | 0x22 | Schema-typed record |
| Map | @map {key: value} | 0x23 | Any-keyed ordered map |
| Tuple | (val, val, ...) | 0x24 (reserved) | Currently parsed as array |
Semantic Types
| TeaLeaf Type | Text Syntax | Binary Code | Description |
|---|---|---|---|
| Ref | !name | 0x30 | Named reference |
| Tagged | :tag value | 0x31 | Discriminated value |
Type Modifiers
| Modifier | Syntax | Description |
|---|---|---|
| Nullable | type? | Field can be ~ (null) |
| Array | []type | Array of the given type |
| Nullable array | []type? | The field itself can be null |
Type Widening Path
int8 → int16 → int32 → int64
uint8 → uint16 → uint32 → uint64
float32 → float64
Widening is automatic when reading binary data. Narrowing requires recompilation.
JSON Mapping
| TeaLeaf Type | JSON Output | JSON Input |
|---|---|---|
| Null | null | null → Null |
| Bool | true/false | boolean → Bool |
| Int | number | integer → Int |
| UInt | number | large integer → UInt |
| Float | number | decimal → Float |
| String | "text" | string → String |
| Bytes | "0xhex" | (not auto-detected) |
| JsonNumber | number | large/precise number → JsonNumber |
| Timestamp | "ISO 8601" | (not auto-detected) |
| Array | [...] | array → Array |
| Object | {...} | object → Object |
| Map | [[k,v],...] | (not auto-detected) |
| Ref | {"$ref":"name"} | (not auto-detected) |
| Tagged | {"$tag":"t","$value":v} | (not auto-detected) |
Comparison Matrix
How TeaLeaf compares to other data formats.
Feature Comparison
| Feature | JSON | YAML | Protobuf | Avro | MsgPack | CBOR | TeaLeaf |
|---|---|---|---|---|---|---|---|
| Human-readable text | Yes | Yes | No* | No | No | No | Yes |
| Compact binary | No | No | Yes | Yes | Yes | Yes | Yes |
| Schema in text | No | No | External | External | No | No | Inline |
| Schema in binary | No | No | No | Yes | No | No | Yes |
| No codegen required | Yes | Yes | No | Partial | Yes | Yes | Yes |
| Comments | No | Yes | N/A | N/A | No | No | Yes |
| Built-in JSON conversion | – | – | No | No | No | No | Yes |
| String deduplication | No | No | No | No | No | No | Yes |
| Per-section compression | No | No | No | Yes | No | No | Yes |
| Null bitmaps | No | No | No | Yes | No | No | Yes |
| Random-access reading | No | No | No | No | No | No | Yes |
*Protobuf TextFormat exists but is rarely used.
Size Comparison
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| YAML | ~1.1x | ~1.1x | ~1.1x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| CBOR | ~0.40x | ~0.65x | ~0.42x |
| TeaLeaf Binary | 3.56x | 0.15x | 0.47x |
Speed Comparison
| Operation | JSON (serde) | Protobuf | MsgPack | TeaLeaf |
|---|---|---|---|---|
| Parse text | Fast | N/A | N/A | Moderate |
| Decode binary | N/A | Fast | Fast | Moderate |
| Encode text | Fast | N/A | N/A | Moderate |
| Encode binary | N/A | Fast | Fast | Moderate |
| Random key access | O(n) parse | O(1) generated | N/A | O(1) hash |
When to Use Each Format
Use TeaLeaf When
| Scenario | Why |
|---|---|
| LLM context / prompts | Schema-first reduces token count |
| Config files (human-edited + deployed) | Text for editing, binary for deployment |
| Large tabular data | 6-7x compression with string dedup |
| Self-describing data exchange | No external schema files needed |
| Game save data / asset manifests | Compact, nested, self-describing |
| Scientific/sensor data | Null bitmaps for sparse data |
Use JSON When
| Scenario | Why |
|---|---|
| Web APIs / REST | Universal support |
| Small payloads (< 1 KB) | No overhead |
| JavaScript-heavy applications | Native parsing |
| Human-only data (no binary needed) | Simpler tooling |
Use Protobuf When
| Scenario | Why |
|---|---|
| RPC / gRPC services | First-class streaming support |
| Maximum decode speed | Generated code with known offsets |
| Schema evolution at scale | Field numbers + backward compat |
| Microservice communication | Established ecosystem |
Use Avro When
| Scenario | Why |
|---|---|
| Hadoop / big data pipelines | Ecosystem integration |
| Schema registry workflows | Built-in evolution |
| Large-scale data lake storage | Block compression |
Use MessagePack / CBOR When
| Scenario | Why |
|---|---|
| Tiny payloads (< 100 bytes) | Minimal overhead |
| Schemaless binary | No schema definition needed |
| Drop-in JSON replacement | Similar data model |
Ecosystem Maturity
| Aspect | JSON | Protobuf | Avro | TeaLeaf |
|---|---|---|---|---|
| Language support | Universal | 10+ languages | 5+ languages | Rust, .NET |
| Tooling | Extensive | Extensive | Moderate | CLI + libraries |
| Community | Massive | Large | Medium | Early |
| Specification maturity | RFC 8259 | Stable (proto3) | Apache spec | Beta |
| IDE support | Universal | Plugins | Plugins | Planned |
TeaLeaf is a young format (v2.0.0-beta.8). It fills a specific niche that existing formats don’t serve well – but it doesn’t aim to replace established formats in their core use cases.
Specification Governance
How the TeaLeaf specification, implementation, and tests relate to each other.
Two Sources of Truth
TeaLeaf has a prose specification and an executable specification:
| Prose Spec | Executable Spec | |
|---|---|---|
| Location | spec/TEALEAF_SPEC.md | canonical/ test suite |
| Format | Markdown document | .tl samples + expected JSON + pre-compiled .tlbx |
| Enforced by | Human review | CI (automated on every push and PR) |
| Covers | Full grammar, type system, binary layout | 14 feature areas, 52 tests (42 success + 10 error) |
The canonical test suite is the normative specification. If the prose spec and the tests disagree, the tests are authoritative. The prose spec is documentation that describes intent and rationale.
What the Canonical Suite Validates
Each canonical sample is tested through three paths:
Text (.tl) ──────────────────────────────► JSON (compare with expected/)
Binary (.tlbx) ──────────────────────────► JSON (compare with expected/)
Text (.tl) ──► Binary (.tlbx) ──► Read ──► JSON (full round-trip)
The 14 sample files cover:
| File | Coverage |
|---|---|
primitives.tl | Null, bool, int, float, string, escape sequences |
arrays.tl | Empty, typed, mixed, nested arrays |
objects.tl | Empty, simple, nested, deeply nested objects |
schemas.tl | @struct, @table, nested structs, nullable fields |
special_types.tl | References, tagged values, maps, edge cases |
timestamps.tl | ISO 8601 variants, timezones, milliseconds |
numbers_extended.tl | Hex, binary, scientific notation, i64 limits |
unions.tl | @union, empty/multi-field variants |
multiline_strings.tl | Triple-quoted strings, auto-dedent, code blocks |
unicode_escaping.tl | CJK, Cyrillic, Arabic, emoji, ZWJ sequences |
refs_tags_maps.tl | References, tagged values, maps, compositions |
mixed_schemas.tl | Schema-bound and schemaless data together |
large_data.tl | Stress tests: 100+ element arrays, deep nesting, long strings |
cyclic_refs.tl | Reference cycles, forward references, self-references |
Error tests in canonical/errors/ validate that invalid input produces specific, stable error messages across all interfaces (CLI, FFI, .NET).
Change Process
Adding New Behavior
When adding new syntax, types, or features:
- Design – Describe the change in an issue or PR description
- Implement – Modify the parser/encoder/decoder in `tealeaf-core`
- Add canonical tests – Create or extend a sample in `canonical/samples/`, generate expected JSON and binary fixtures
- Update the prose spec – Update `spec/TEALEAF_SPEC.md` to document the new behavior
- CI validates – All three round-trip paths must pass
A PR that adds implementation without canonical tests is incomplete. A PR that updates the prose spec without tests is documentation-only and does not change behavior.
Modifying Existing Behavior
Behavior changes fall into two categories:
Non-breaking (output changes, error message improvements):
- Update canonical expected outputs (`canonical/expected/*.json`)
- Update error golden tests if error messages changed
- Update the prose spec
Breaking (syntax changes, binary format changes, type system changes):
- Requires a version bump in `release.json`
- Regenerate all binary fixtures (`canonical/binary/*.tlbx`)
- Update the prose spec with a clear note about the breaking change
- Binary format changes must update the format version constant in `writer.rs`
Error Message Stability
Error messages are part of the public contract. The canonical/errors/ directory contains invalid input files paired with expected error messages in expected_errors.json. Changes to error text should be noted in the changelog and may require downstream consumers to update.
What Is Not Covered
The canonical suite focuses on the core format. These areas rely on their own test suites:
| Area | Test Location | Notes |
|---|---|---|
| CLI flags and output formatting | tealeaf-core/tests/cli_integration.rs | Tests CLI behavior, not format correctness |
| Derive macros (Rust) | tealeaf-core/tests/derive.rs | Tests DTO conversion, not parsing |
| FFI memory management | tealeaf-ffi unit tests | Tests allocation/deallocation, not format |
| .NET source generator | TeaLeaf.Generators.Tests | Tests code generation, not format |
| .NET serialization | TeaLeaf.Tests | Tests managed-to-native bridge |
| Accuracy benchmark | accuracy-benchmark | Tests LLM accuracy, not format |
Spec Versioning
The format version is embedded in the binary header (see writer.rs). The prose spec documents the current version. When the binary format changes in a backward-incompatible way:
- The format version constant in `writer.rs` must be incremented
- The reader (`reader.rs`) should handle both old and new versions where feasible
- All binary fixtures in `canonical/binary/` must be regenerated
- The prose spec must document the version change
The project version (release.json) and the binary format version are independent. A project version bump does not necessarily mean a format version bump, and vice versa.
Changelog
v2.0.0-beta.8 (Current)
.NET
- XML documentation in NuGet packages — the `TeaLeaf` and `TeaLeaf.Annotations` packages now include XML doc files (`TeaLeaf.xml`, `TeaLeaf.Annotations.xml`) for all target frameworks. Consumers get IntelliSense tooltips for all public APIs. Previously, `GenerateDocumentationFile` was not enabled and the `.xml` files were absent from the `.nupkg`.
- Added XML doc comments to all undocumented public members: `TLType` enum values (13), `TLDocument.ToString`/`Dispose`, `TLReader.Dispose`, `TLField.ToString`, `TLSchema.ToString`, `TLException` constructors (3)
- Enabled `TreatWarningsAsErrors` for `TeaLeaf` and `TeaLeaf.Annotations` — missing XML docs or other warnings are now compile errors, preventing regressions
Testing
- Added `ToJson_PreservesSpecialCharacters_NoUnicodeEscaping` — verifies `+`, `<`, `>`, `'` survive binary round-trip without Unicode escaping in both the `ToJson()` and `ToJsonCompact()` paths
- Added `ToJson_PreservesFloatDecimalPoint_WholeNumbers` — verifies whole-number floats (`99.0`, `150.0`, `0.0`) retain the `.0` suffix and non-whole floats (`4.5`, `3.75`) preserve decimal digits
v2.0.0-beta.7
.NET
- Fixed `TLReader.ToJson()` escaping non-ASCII-safe characters — `+` in phone numbers rendered as `\u002B`, `<`/`>` as `\u003C`/`\u003E`, etc. `System.Text.Json`'s default `JavaScriptEncoder.Default` HTML-encodes these characters for XSS safety, which is inappropriate for a data serialization library. All three JSON serialization methods (`ToJson`, `ToJsonCompact`, `GetAsJson`) now use `JavaScriptEncoder.UnsafeRelaxedJsonEscaping` via shared `static readonly` options.
- Fixed `TLReader.ToJson()` dropping the `.0` suffix from whole-number floats — `3582.0` in source JSON became `3582` after binary round-trip because `System.Text.Json`'s `JsonValue.Create(double)` strips trailing `.0`. Added a `FloatToJsonNode` helper that uses `F1` formatting for whole-number doubles, preserving formatting fidelity with the Rust CLI path.
v2.0.0-beta.6
Features
- Recursive array schema inference in JSON import — `from_json_with_schemas` now discovers schemas for arrays nested inside objects at arbitrary depth (e.g., `items[].product.stock[]`). Previously, `analyze_nested_objects` only recursed into nested objects but not nested arrays, causing deeply nested arrays to fall back to `[]any`. The CLI and derive-macro paths now produce equivalent schema coverage.
- Deterministic schema declaration order — `analyze_array` and `analyze_nested_objects` now use single-pass field-order traversal (depth-first), matching the derive macro's field-declaration-order strategy. Previously, both functions made two separate passes (arrays first, then objects), causing schema declarations to appear in a different order than the derive/Builder API path. The CLI and Builder API now produce byte-identical `.tl` output for the same data.
Bug Fixes
- Fixed binary encoding corruption for `[]any` typed arrays — `encode_typed_value` incorrectly wrote `TLType::Struct` as the element type for the "any" pseudo-type (the `to_tl_type()` default for unknown names), causing the reader to interpret heterogeneous data as struct schema indices. Arrays with mixed element types inside schema-typed objects (e.g., `order.customer`, `order.payment`) now correctly use heterogeneous `0xFF` encoding when no matching schema exists.
Tooling
- Version sync scripts (`sync-version.ps1`, `sync-version.sh`) now regenerate the workflow diagram (`assets/tealeaf_workflow.png`) via `generate_workflow_diagram.py` on each version bump
Testing
- Added `json_any_array_binary_roundtrip` — focused regression test verifying `[]any` fields inside schema-typed structs survive binary compilation with full data integrity verification
- Added `retail_orders_json_binary_roundtrip` — end-to-end test exercising JSON → infer schemas → compile → binary read with `retail_orders.json` (the exact path that was untested)
- Added .NET `FromJson_HeterogeneousArrayInStruct_BinaryRoundTrips` — mirrors the Rust `[]any` regression test through the FFI layer
- Strengthened .NET `FromJson_RetailOrdersFixture_CompileRoundTrips` — upgraded from a string-contains check to structural JSON verification (10 orders, 4 products, 3 customers, spot-check order ID and item count)
- Added `json_inference_nested_array_inside_object` — verifies arrays nested inside objects (e.g., `items[].product.stock[]`) get their own schema and typed array fields
- Added `gen_retail_orders_api_tl` derive integration test — generates `.tl` from Rust DTOs via the Builder API and confirms byte-identical output with the CLI path
- Added `examples/retail_orders_different_shape_cli.tl` and `retail_orders_different_shape_api.tl` comparison fixtures (2,395 bytes each, zero diff)
- Moved `retail_orders_different_shape.rs` from `examples/` to `tealeaf-core/tests/fixtures/` to keep test dependencies within the crate boundary
- Verified all 7 fuzz targets pass (~566K total runs, zero crashes)
v2.0.0-beta.5
Features
- Schema-aware serialization for Builder API — `to_tl_with_schemas()` now produces compact `@table` output for documents built via `TeaLeafBuilder` with derive-macro schemas. Previously, PascalCase schema names from `#[derive(ToTeaLeaf)]` (e.g., `SalesOrder`) didn't match the serializer's `singularize()` heuristic (e.g., `"orders"` → `"order"`), causing all arrays to fall back to the verbose `[{k: v}]` format. The serializer now resolves schemas via a 4-step chain: declared type from parent schema → singularize → case-insensitive singularize → structural field matching.
Bug Fixes
- Fixed schema inference name collision when a field singularizes to the same name as its parent array's schema — prevented self-referencing schemas (e.g., `@struct root (root: root)`) and data loss during round-trip (found via fuzzing)
- Fixed the `@table` serializer applying the wrong schema when the same field name appears at multiple nesting levels with different object shapes — the serializer now validates that schema fields match the actual object keys before using positional tuple encoding
Testing
- Added 8 Rust regression tests for schema name collisions: `fuzz_repro_dots_in_field_name`, `schema_name_collision_field_matches_parent`, `analyze_node_nesting_stress_test`, `schema_collision_recursive_arrays`, `schema_collision_recursive_same_shape`, `schema_collision_three_level_nesting`, `schema_collision_three_level_divergent_leaves`, `all_orders_cli_vs_api_roundtrip`
- Added derive integration test `test_builder_schema_aware_table_output` — verifies the Builder API with 5 nested PascalCase schemas produces `@table` encoding and round-trips correctly
- Verified all 7 fuzz targets pass (~445K total runs, zero crashes)
v2.0.0-beta.4
Bug Fixes
- Fixed binary encoding crash when compiling JSON with heterogeneous nested objects — `from_json_with_schemas` infers the `any` pseudo-type for fields whose nested objects have varying shapes; the binary encoder now falls back to generic encoding instead of erroring with "schema-typed field 'any' requires a schema"
- Fixed the parser failing to resolve schema names that shadow built-in type keywords — schemas named `bool`, `int`, `string`, etc. now correctly resolve via LParen lookahead disambiguation (struct tuples always start with `(`, primitives never do)
- Fixed `singularize()` producing an empty string for single-character field names (e.g., `"s"` → `""`) — caused `@struct` definitions with missing names and unparseable TL text output
- Fixed `validate_tokens.py` token comparison by converting API input to `int` for safety
.NET
- Added `TLValueExtensions` with `GetRequired()` extension methods for `TLValue` and `TLDocument` — provides non-nullable access patterns, reducing CS8602 warnings in consuming code
- Added TL007 diagnostic: `[TeaLeaf]` classes in the global namespace now produce a compile-time error ("TeaLeaf type must be in a named namespace")
- Removed the `SuppressDependenciesWhenPacking` property from `TeaLeaf.Generators.csproj`
- Exposed `InternalsVisibleTo` for `TeaLeaf.Tests`
CI/CD
- Re-enabled all 6 GitHub Actions workflows after making the repository public (rust-cli, dotnet-package, accuracy-benchmark, docs, coverage, fuzz)
- Fixed coverlet filter quoting in the coverage workflow — commas URL-encoded as `%2c` to prevent shell argument splitting
- Fixed Codecov token handling — made `CODECOV_TOKEN` optional for public-repo tokenless uploads
- Fixed Codecov multi-file upload format — changed from a YAML block scalar to a comma-separated single line
- Refactored the coverage workflow to use `dotnet-coverage` with dedicated settings XML files
- Added CodeQL security analysis workflow
Testing
- Added a Rust regression test for the `any` pseudo-type compile round-trip
- Added 21 Rust tests for schema names shadowing all built-in type keywords (`bool`, `int`, `int8`..`int64`, `uint`..`uint64`, `float`, `float32`, `float64`, `string`, `timestamp`, `bytes`) — covers JSON inference round-trip, direct TL parsing, self-referencing schemas, duplicate declarations, and multiple built-in-named schemas in one document
- Added 4 .NET regression tests covering `TLDocument.FromJson` → `Compile` with heterogeneous nested objects, mixed-structure arrays, complex schema inference, and retail_orders.json end-to-end
- Added .NET tests for JSON serialization of timestamps and byte arrays
- Added .NET coverage tests for multi-word enums and nullable nested objects
- Added .NET source generator tests (524 new lines in `GeneratorTests.cs`) including the TL007 global namespace diagnostic
- Added .NET `TLValue.GetRequired()` extension method tests
- Added .NET `TLReader` binary reader tests (168 new lines)
- Added a cross-platform `FindRepoFile` helper for .NET test fixture discovery (walks up the directory tree instead of a hardcoded relative path depth)
- Verified the full .NET test suite on Linux (WSL Ubuntu 24.04)
Tooling
- Added `--version`/`-V` CLI flag
- Added `delete-caches.ps1` and `delete-caches.sh` GitHub Actions cache cleanup scripts
- Updated `coverage.ps1` to support `dotnet-coverage` collection with XML settings files
Documentation
- Updated binary deserialization method names in quick-start, LLM context guide, schema evolution guide, and derive macros docs
- Updated tealeaf workflow diagram
v2.0.0-beta.3
Features
- Byte literals — `b"..."` hex syntax for byte data in the text format (e.g., `payload: b"cafef00d"`)
- Arbitrary-precision numbers — `Value::JsonNumber` preserves the exact decimal representation for numbers exceeding native type ranges
- Insertion order preservation — `IndexMap` replaces `HashMap` for all user-facing containers; JSON round-trips now preserve original key order (ADR-0001)
- Timestamp timezone support — timestamps encode the timezone offset in minutes (10 bytes: 8 millis + 2 offset); supports `Z`, `+HH:MM`, `-HH:MM`, `+HH` formats
- Special float values — `NaN`, `inf`, `-inf` keywords for IEEE 754 special values (JSON export converts to `null`)
- Extended escape sequences — `\b` (backspace), `\f` (form feed), `\uXXXX` (Unicode code points) for full JSON string escape parity
- Forward compatibility — unknown directives are silently ignored, enabling older implementations to partially parse files with newer features (spec §1.18)
Bug Fixes
- Fixed bounds check failures and bitmap overflow issues in binary decoder
- Fixed lexer infinite loop on certain malformed inputs (found via fuzzing)
- Fixed NaN value quoting causing incorrect round-trip behavior
- Fixed parser crashes on deeply nested structures
- Fixed integer overflow in varint decoding
- Fixed off-by-one errors in array length checks
- Fixed negative hex/binary literal parsing
- Fixed exponent-only numbers (e.g., `1e3`) to parse as floats, not integers
- Fixed timestamp timezone parsing to accept hour-only offsets (`+05` = `+05:00`)
- Rejected value-only types (`object`, `map`, `tuple`, `ref`, `tagged`) as schema field types per spec §2.1
- Fixed .NET package publishing for `TeaLeaf.Annotations` and `TeaLeaf.Generators` to NuGet
Performance
- Removed O(n log n) key sorting from all serialization paths: 6-17% faster for small/medium objects, up to 69% faster for tabular data
- Binary decode is 56-105% slower for generic object workloads due to `IndexMap` insertion cost (an accepted trade-off per ADR-0001; columnar workloads are less affected)
Specification
- Schema table header byte +6 stores Union Count (was reserved)
- String table length encoding changed from `u16` to `u32`, allowing strings longer than 65,535 bytes
- Added type code `0x12` for `JSONNUMBER`
- Timestamp encoding extended to 10 bytes (8 millis + 2 offset)
- Added `bytes_lit` grammar production; extended `number` to include `NaN`/`inf`/`-inf`
- Documented `object`, `map`, `ref`, `tagged` as value-only types (not valid in schema fields)
object,map,ref,taggedas value-only types (not valid in schema fields) - Resolved compression algorithm spec contradiction: binary format v2 uses ZLIB (deflate), not zstd (ADR-0004)
Tooling
- Fuzzing infrastructure — 7 cargo-fuzz targets with custom dictionaries and structure-aware generation (ADR-0002)
- Fuzzing CI workflow — GitHub Actions runs all targets for 120s each (~15 min per run)
- Nesting depth limit — 256-level max for stack overflow protection (ADR-0003)
- VS Code extension — syntax highlighting for `.tl` files (`vscode-tealeaf/`)
- FFI safety — comprehensive `# Safety` docs on all FFI functions; regenerated `tealeaf.h`
- Token validation — `validate_tokens.py` script validates API-reported token counts against tiktoken
- Maintenance scripts — `delete-deployments` and `delete-workflow-runs` for GitHub cleanup
Testing
- 238+ adversarial tests for malformed binary input
- 333+ .NET edge case tests for FFI boundary conditions
- Property-based tests with depth-bounded recursive generation
- Accuracy benchmark token savings updated to ~36% fewer data tokens (validated with tiktoken)
Documentation
- ADR-0001: IndexMap for Insertion Order Preservation
- ADR-0002: Fuzzing Architecture and Strategy
- ADR-0003: Maximum Nesting Depth Limit (256)
- ADR-0004: ZLIB Compression for Binary Format
- Code of Conduct, SECURITY.md, GitHub issue/PR templates
- `examples/showcase.tl` — 736-line comprehensive format demonstration
- Sample accuracy benchmark results
Breaking Changes
- `Value::Object` uses `IndexMap<String, Value>` instead of `HashMap` (type alias `ObjectMap` provided; `From<HashMap>` retained for backward compatibility)
- `Value::Timestamp(i64)` → `Value::Timestamp(i64, i16)` — the second field is the timezone offset in minutes
- `Value::JsonNumber(String)` variant added — match expressions on `Value` need a new arm (see the sketch after this list)
- Binary timestamps are not backward-compatible (beta.2 readers cannot decode beta.3 timestamps; beta.3 readers handle beta.2 files by defaulting the offset to UTC)
- JSON round-trips preserve key order instead of alphabetizing
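The variant shapes above imply the following change in downstream match expressions. A minimal sketch, assuming a `tealeaf_core` import path; everything besides the two variant shapes is illustrative.

```rust
use tealeaf_core::Value; // assumed import path

// Sketch only: the two arms affected by beta.3. The variant shapes come
// from the list above; the function and formatting are illustrative.
fn describe(v: &Value) -> String {
    match v {
        // Second field (timezone offset in minutes) is new in beta.3
        Value::Timestamp(millis, offset_min) => {
            format!("timestamp: {millis} ms at offset {offset_min} min")
        }
        // New variant: previously exhaustive matches need this arm added
        Value::JsonNumber(repr) => format!("big number: {repr}"),
        _ => "other value".to_string(),
    }
}
```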
v2.0.0-beta.2
Format
- `@union` definitions now encoded in the binary schema table (full text → binary → text round-trip)
- Union schema region uses a backward-compatible extension of the schema table header
- Derive macro `collect_unions()` generates union definitions for Rust enums
- `TeaLeafBuilder::add_union()` for programmatic union construction
Improvements
- Version sync automation expanded to cover all project files (16 targets)
- NuGet package icon added to all NuGet packages (TeaLeaf, Annotations, Generators)
- CI badges added to README (Rust CI, .NET CI, crates.io, NuGet, codecov, License)
- crates.io publish ordering fixed (`tealeaf-derive` before `tealeaf-core`)
- Contributing guide added (`CONTRIBUTING.md`)
- Spec governance documentation added
- Accuracy benchmark `dump-prompts` subcommand for offline prompt inspection
- `TeaLeaf.Annotations` published as a separate NuGet package (fixes dependency resolution)
- `benches_proto/` excluded from the crates.io package (removes the `protoc` requirement for consumers)
v2.0.0-beta.1
Initial public beta release.
Format
- Text format (`.tl`) with comments, schemas, and all value types
- Binary format (`.tlbx`) with string deduplication, schema embedding, and per-section compression
- 15 primitive types + 6 container/semantic types
- Inline schemas with `@struct`, `@table`, `@map`, `@union`
- References (`!name`) and tagged values (`:tag value`) — see the sketch after this list
- File includes (`@include`)
- ISO 8601 timestamp support
- JSON bidirectional conversion with schema inference
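A sketch of the reference and tagged-value notations: only the `!name` and `:tag value` shapes come from this list; the anchor declaration and field names are assumptions.

```
# Sketch only – !name and :tag value notations per this changelog;
# the anchor-declaration syntax shown here is an assumption
hq: ("Seattle", "USA")
office: !hq              # reference to a previously named value
reading: :celsius 21.5   # tagged value
```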
CLI
- 8 commands: `compile`, `decompile`, `info`, `validate`, `to-json`, `from-json`, `tlbx-to-json`, `json-to-tlbx`
- Pre-built binaries for 7 platforms (Windows, Linux, macOS – x64 and ARM64)
Rust
- `tealeaf-core` crate with full parser, compiler, and reader
- `tealeaf-derive` crate with `#[derive(ToTeaLeaf, FromTeaLeaf)]` (see the sketch after this list)
- Builder API (`TeaLeafBuilder`)
- Memory-mapped binary reading
- Conversion traits with automatic schema collection
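A minimal sketch of the derive usage named above, assuming a `tealeaf_derive` import path; the struct and its fields are invented for illustration.

```rust
use tealeaf_derive::{FromTeaLeaf, ToTeaLeaf}; // assumed import path

// Sketch only: a plain struct opting into TeaLeaf conversion via the
// derives listed above, with schemas collected automatically.
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Employee {
    id: i64,
    name: String,
    skills: Vec<String>,
}
```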
.NET
- `TeaLeaf` NuGet package with native libraries for all platforms
- C# incremental source generator (`[TeaLeaf]` attribute)
- Reflection-based serializer (`TeaLeafSerializer`)
- Managed wrappers (`TLDocument`, `TLValue`, `TLReader`)
- Schema introspection API
- Diagnostic codes TL001-TL006
FFI
- C-compatible API via `tealeaf-ffi` crate
- 45+ exported functions
- Thread-safe error handling
- Null-safe for all pointer parameters
- C header generation via `cbindgen`
Known Limitations
- Bytes type does not round-trip through the text format (resolved: `b"..."` hex literals added)
- JSON import does not recognize `$ref`, `$tag`, or timestamp strings
- Individual string length limited to ~4 GB (u32) in the binary format
- 64-byte header overhead makes TeaLeaf inefficient for very small objects