
TeaLeaf Data Format

A schema-aware data format with human-readable text and compact binary representation.

~36% fewer data tokens than JSON for LLM applications, with zero accuracy loss.

v2.0.0-beta.8


What is TeaLeaf?

TeaLeaf is a data format that bridges the gap between human-readable configuration and machine-efficient binary storage. A single .tl source file can be read and edited by humans, compiled to a compact .tlbx binary, and converted to/from JSON – all with schemas inline.

TeaLeaf – schemas with nested structures, compact positional data:

# Schema: define structure once
@struct Location (city, country)
@struct Department (name, location: Location)
@struct Employee (
  id: int,
  name,
  role,
  department: Department,
  skills: []string,
)

# Data: field names not repeated
employees: @table Employee [
  (1, "Alice", "Engineer",
    ("Platform", ("Seattle", "USA")),
    ["rust", "python"]),
  (2, "Bob", "Designer",
    ("Product", ("Austin", "USA")),
    ["figma", "css"]),
  (3, "Carol", "Manager",
    ("Platform", ("Seattle", "USA")),
    ["leadership", "agile"]),
]

JSON – no schema, names repeated:

{
  "employees": [
    {
      "id": 1,
      "name": "Alice",
      "role": "Engineer",
      "department": {
        "name": "Platform",
        "location": { "city": "Seattle", "country": "USA" }
      },
      "skills": ["rust", "python"]
    },
    {
      "id": 2,
      "name": "Bob",
      "role": "Designer",
      "department": {
        "name": "Product",
        "location": { "city": "Austin", "country": "USA" }
      },
      "skills": ["figma", "css"]
    },
    {
      "id": 3,
      "name": "Carol",
      "role": "Manager",
      "department": {
        "name": "Platform",
        "location": { "city": "Seattle", "country": "USA" }
      },
      "skills": ["leadership", "agile"]
    }
  ]
}

Key Features

Feature | Description
Dual format | Human-readable text (.tl) and compact binary (.tlbx)
Inline schemas | @struct definitions live alongside data – no external .proto files
JSON interop | Bidirectional conversion with automatic schema inference
String deduplication | Binary format stores each unique string once
Compression | Per-section ZLIB compression with null bitmaps
Comments | # line comments in the text format
Language bindings | Native Rust, .NET (via FFI + source generator)
CLI tooling | tealeaf compile, decompile, validate, info, JSON conversion

Why TeaLeaf?

The existing data format landscape presents trade-offs that TeaLeaf attempts to bridge. TeaLeaf does not aim to replace the formats listed below; it offers a different set of trade-offs that you can compare against your specific use cases.

Format | Observation
JSON | Verbose, no comments, no schema
YAML | Indentation-sensitive, error-prone at scale
Protobuf | External schema, binary-only, requires codegen
Avro | Schema embedded but not human-readable
CSV/TSV | Too simple for nested or typed data
MessagePack/CBOR | Compact but schemaless

TeaLeaf unifies these concerns:

  • Human-readable text format with explicit types and comments
  • Compact binary with embedded schemas – no external schema files needed
  • Schema-first design – field names defined once, not repeated per record
  • No codegen required – schemas discovered at runtime
  • Built-in JSON conversion for easy integration with existing tools

Primary Use Case: LLM API Data Payloads

TeaLeaf is well-suited for assembling and managing context for large language models – sending business data, analytics, and structured payloads to LLM APIs where token efficiency directly impacts API costs.

Why TeaLeaf for LLM context:

  • ~36% fewer data tokens – verified across Claude Sonnet 4.5 and GPT-5.2 (12 tasks, 10 domains; savings increase with larger datasets)
  • Zero accuracy loss – benchmark scores within noise (0.988 vs 0.978 Anthropic, 0.901 vs 0.899 OpenAI)
  • Binary format for fast cached context retrieval
  • String deduplication (roles, field names, common values stored once)
  • Human-readable text for prompt authoring

Token savings example (retail orders dataset):

Format | Characters | Tokens (GPT-5.x) | Savings
JSON | 36,791 | 9,829 | –
TeaLeaf | 14,542 | 5,632 | 43% fewer tokens

Size Comparison

Format | Small Object | 10K Points | 1K Users
JSON | 1.00x | 1.00x | 1.00x
Protobuf | 0.38x | 0.65x | 0.41x
MessagePack | 0.35x | 0.63x | 0.38x
TeaLeaf Text | 1.38x | 0.87x | 0.63x
TeaLeaf Compressed | 3.56x | 0.15x | 0.47x

TeaLeaf has a fixed 64-byte header (not ideal for tiny objects). For large arrays with compression, TeaLeaf achieves 6-7x better compression than JSON.

Trade-off: TeaLeaf decode is ~2-5x slower than Protobuf due to dynamic key-based access. Choose TeaLeaf when size matters more than decode speed.

Project Structure

tealeaf/
├── tealeaf-core/       # Rust core: parser, compiler, reader, CLI
├── tealeaf-derive/     # Rust proc-macro: #[derive(ToTeaLeaf, FromTeaLeaf)]
├── tealeaf-ffi/        # C-compatible FFI layer
├── bindings/
│   └── dotnet/         # .NET bindings + source generator
├── canonical/          # Canonical test fixtures
├── spec/               # Format specification
└── examples/           # Example files and workflows

License

TeaLeaf is licensed under the MIT License.

Source code: github.com/krishjag/tealeaf

Installation

Pre-built Binaries

Download the latest release from GitHub Releases.

Platform | Architecture | Download
Windows | x64 | tealeaf-windows-x64.zip
Windows | ARM64 | tealeaf-windows-arm64.zip
Linux | x64 (glibc) | tealeaf-linux-x64.tar.gz
Linux | ARM64 (glibc) | tealeaf-linux-arm64.tar.gz
Linux | x64 (musl) | tealeaf-linux-musl-x64.tar.gz
macOS | x64 (Intel) | tealeaf-macos-x64.tar.gz
macOS | ARM64 (Apple Silicon) | tealeaf-macos-arm64.tar.gz

Quick Install

Windows (PowerShell)

# Download and extract to current directory
Invoke-WebRequest -Uri "https://github.com/krishjag/tealeaf/releases/latest/download/tealeaf-windows-x64.zip" -OutFile tealeaf.zip
Expand-Archive tealeaf.zip -DestinationPath .

# Optional: add to PATH
$env:PATH += ";$PWD"

Linux / macOS

# Download and extract (replace with your platform)
curl -LO https://github.com/krishjag/tealeaf/releases/latest/download/tealeaf-linux-x64.tar.gz
tar -xzf tealeaf-linux-x64.tar.gz

# Optional: move to PATH
sudo mv tealeaf /usr/local/bin/

Build from Source

Requires the Rust toolchain (1.70+).

git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
cargo build --release --package tealeaf-core

The binary will be at target/release/tealeaf (or tealeaf.exe on Windows).

Verify Installation

tealeaf --version
# tealeaf 2.0.0-beta.8

tealeaf help

Rust Crate

Add tealeaf-core to your Cargo.toml:

[dependencies]
tealeaf-core = { version = "2.0.0-beta.8", features = ["derive"] }

The derive feature enables #[derive(ToTeaLeaf, FromTeaLeaf)] macros.

.NET NuGet Package

dotnet add package TeaLeaf

The NuGet package includes everything needed:

  • TeaLeaf.Annotations – [TeaLeaf], [TLSkip], and other attributes
  • TeaLeaf.Generators – C# incremental source generator (bundled as an analyzer)
  • Native libraries for all supported platforms (Windows, Linux, macOS – x64 and ARM64)

No additional packages are required. Classes annotated with [TeaLeaf] get compile-time serialization methods automatically.

Note: The .NET package requires .NET 8.0 or later. The source generator requires a C# compiler with incremental generator support.

Quick Start

This guide walks through the core TeaLeaf workflow: write text, compile to binary, and convert to/from JSON.

1. Write a TeaLeaf File

Create example.tl:

# Define schemas
@struct address (street: string, city: string, zip: string)
@struct user (
  id: int,
  name: string,
  email: string?,
  address: address,
  active: bool,
)

# Data uses schemas -- field names defined once, not repeated
users: @table user [
  (1, "Alice", "alice@example.com", ("123 Main St", "Seattle", "98101"), true),
  (2, "Bob", ~, ("456 Oak Ave", "Austin", "78701"), false),
]

# Plain key-value pairs
app_version: "2.0.0-beta.2"
debug: false

2. Validate

Check that the file is syntactically correct:

tealeaf validate example.tl

3. Compile to Binary

Compile to the compact binary format:

tealeaf compile example.tl -o example.tlbx

4. Inspect

View information about either format:

tealeaf info example.tl
tealeaf info example.tlbx

5. Convert to JSON

# Text to JSON
tealeaf to-json example.tl -o example.json

# Binary to JSON
tealeaf tlbx-to-json example.tlbx -o example_from_binary.json

6. Convert from JSON

# JSON to TeaLeaf text (with automatic schema inference)
tealeaf from-json example.json -o reconstructed.tl

# JSON to TeaLeaf binary
tealeaf json-to-tlbx example.json -o direct.tlbx

7. Decompile

Convert binary back to text:

tealeaf decompile example.tlbx -o decompiled.tl

Complete Workflow

example.tl ──compile──> example.tlbx ──decompile──> decompiled.tl
    │                       │
    ├──to-json──> example.json <──tlbx-to-json──┘
    │                │
    └──from-json─────┘

Using the Rust API

use tealeaf::TeaLeaf;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse text format
    let doc = TeaLeaf::load("example.tl")?;

    // Access values
    if let Some(users) = doc.get("users") {
        println!("Users: {:?}", users);
    }

    // Compile to binary
    doc.compile("example.tlbx", true)?;

    // Convert to JSON
    let json = doc.to_json()?;
    println!("{}", json);

    Ok(())
}

With Derive Macros

use tealeaf::{TeaLeaf, ToTeaLeaf, FromTeaLeaf, ToTeaLeafExt};

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
    id: i32,
    name: String,
    #[tealeaf(optional)]
    email: Option<String>,
    active: bool,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let user = User {
        id: 1,
        name: "Alice".into(),
        email: Some("alice@example.com".into()),
        active: true,
    };

    // Serialize to TeaLeaf text
    let text = user.to_tl_string("user");
    println!("{}", text);

    // Compile directly to binary
    user.to_tlbx("user", "user.tlbx", false)?;

    // Deserialize from binary
    let reader = tealeaf::Reader::open("user.tlbx")?;
    let loaded = User::from_tealeaf_value(&reader.get("user")?)?;

    Ok(())
}

Using the .NET API

Source Generator (Compile-Time)

using TeaLeaf;
using TeaLeaf.Annotations;

[TeaLeaf]
public partial class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";

    [TLOptional]
    public string? Email { get; set; }

    public bool Active { get; set; }
}

// Serialize
var user = new User { Id = 1, Name = "Alice", Active = true };
string text = user.ToTeaLeafText();
string json = user.ToTeaLeafJson();
user.CompileToTeaLeaf("user.tlbx");

// Deserialize
using var doc = TLDocument.ParseFile("user.tlbx");
var loaded = User.FromTeaLeaf(doc);

Reflection Serializer (Runtime)

using TeaLeaf;

var user = new User { Id = 1, Name = "Alice", Active = true };

// Serialize
string docText = TeaLeafSerializer.ToDocument(user);
TeaLeafSerializer.Compile(user, "user.tlbx");

// Deserialize
var loaded = TeaLeafSerializer.FromText<User>(docText);

Core Concepts

This page introduces the fundamental ideas behind TeaLeaf.

Dual Format

TeaLeaf has two representations of the same data:

Aspect | Text (.tl) | Binary (.tlbx)
Purpose | Authoring, version control, review | Storage, transmission, deployment
Human-readable | Yes | No
Comments | Yes (#) | Stripped during compilation
Schemas | Inline @struct definitions | Embedded in schema table
Size | Larger (field names in data) | Compact (positional, deduplicated)
Speed | Slower to parse | Fast random access via memory mapping

The .tl file is the source of truth. Binary files are compiled artifacts – regenerate them when the source changes.

Schemas

Schemas define the structure of your data using @struct:

@struct point (x: int, y: int)
@struct line (start: point, end: point, color: string?)

Key properties:

  • Inline – schemas live in the same file as data
  • Positional – binary encoding uses field order, not names
  • Nestable – structs can reference other structs
  • Nullable – fields marked with ? accept null (~)

Schemas enable @table for compact tabular data:

points: @table point [
  (0, 0),
  (100, 200),
  (-50, 75),
]

Without schemas, the same data would require repeating field names:

# Without schemas -- verbose
points: [
  {x: 0, y: 0},
  {x: 100, y: 200},
  {x: -50, y: 75},
]

Type System

TeaLeaf has a rich type system with primitives, containers, and modifiers.

Primitives

Type | Description | Example
bool | Boolean | true, false
int / int32 | 32-bit signed integer | 42, -17
int64 | 64-bit signed integer | 9999999999
uint / uint32 | 32-bit unsigned integer | 255
float / float64 | 64-bit float | 3.14, 6.022e23
string | UTF-8 text | "hello", alice
bytes | Raw binary data | b"cafef00d"
timestamp | ISO 8601 date/time | 2024-01-15T10:30:00Z

Containers

Syntax | Description
[]T | Array of type T
T? | Nullable type T
@map { ... } | Ordered key-value map
{ key: value } | Untyped object

Null

The tilde ~ represents null:

optional_field: ~

Key-Value Documents

A TeaLeaf document is a collection of named key-value sections:

# Each top-level entry is a "section" in the binary format
config: {host: localhost, port: 8080}
users: @table user [(1, alice), (2, bob)]
version: "2.0.0-beta.2"

Keys become section names in the binary file. You access values by key at runtime.
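
For example, with the Rust API described later in this book, each section is fetched by its key. A minimal sketch (see the Rust guide for the full API):

use tealeaf::{TeaLeaf, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let doc = TeaLeaf::parse("config: {host: localhost, port: 8080}")?;

    // Each top-level key is one addressable section.
    if let Some(Value::Object(config)) = doc.get("config") {
        println!("host = {:?}", config.get("host"));
    }
    Ok(())
}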

References

References allow data reuse and graph structures:

# Define a reference
!seattle: {city: "Seattle", state: "WA"}

# Use it in multiple places
office: !seattle
warehouse: !seattle

Tagged Values

Tags add a discriminator label to values, enabling sum types:

events: [
  :click {x: 100, y: 200},
  :scroll {delta: -50},
  :keypress {key: "Enter"},
]

Unions

Named discriminated unions:

@union shape {
  circle (radius: float),
  rectangle (width: float, height: float),
  point (),
}

shapes: [:circle (5.0), :rectangle (10.0, 20.0), :point ()]

Union definitions are preserved through binary compilation and decompilation, including variant names, field names, and field types.

Compilation Pipeline

   .tl (text)
      │
      ├── parse ──> in-memory document (TeaLeaf / TLDocument)
      │                    │
      │                    ├── compile ──> .tlbx (binary)
      │                    ├── to_json ──> .json
      │                    └── to_tl_text ──> .tl (round-trip)
      │
   .tlbx (binary)
      │
      ├── reader ──> random-access values (zero-copy with mmap)
      │                    │
      │                    ├── decompile ──> .tl
      │                    └── to_json ──> .json
      │
   .json
      │
      └── from_json ──> in-memory document
                             │
                             └── (with schema inference for arrays)

File Includes

Split large files into modules:

@include "schemas/common.tl"
@include "./data/users.tl"

Paths are resolved relative to the including file.

Next Steps

  • Text Format – complete syntax reference
  • Type System – all types and modifiers in detail
  • Schemas – schema definitions, tables, and nesting

Text Format

The TeaLeaf text format (.tl) is the human-readable representation. This page is the complete syntax reference.

Comments

Comments begin with # and extend to end of line:

# This is a line comment
name: alice  # inline comment

Comments are stripped during compilation to binary.

Strings

Simple (Unquoted)

Bare identifiers that contain no whitespace or special characters:

name: alice
host: localhost
status: active

Valid characters: letters, digits, _, -, .

Quoted

Double-quoted strings with escape sequences:

greeting: "hello world"
path: "C:\\Users\\name"
message: "line1\nline2"
tab_separated: "col1\tcol2"

Escape sequences: \\, \", \n, \t, \r, \b (backspace), \f (form feed), \uXXXX (Unicode code point, 4 hex digits)

Multiline (Triple-Quoted)

Triple-quoted strings with automatic leading whitespace removal:

description: """
  This is a multiline string.
  Leading whitespace is trimmed based on
  the indentation of the first content line.
  Useful for documentation blocks.
"""

Numbers

Integers

count: 42
negative: -17
zero: 0

Floats

price: 3.14
scientific: 6.022e23
negative_exp: 1.5e-10

Numbers with exponent notation but no decimal point (e.g., 1e3) are parsed as floats.

Hexadecimal

color: 0xFF5500
mask: 0x00A1

Binary Literals

flags: 0b1010
byte_val: 0b11110000

Both lowercase (0x, 0b) and uppercase (0X, 0B) prefixes are accepted.

Negative hex and binary literals are supported: -0xFF, -0b1010.

Bytes Literals

payload: b"cafef00d"
empty: b""
checksum: b"CAFE"

Hex digits only (uppercase or lowercase), even length, no spaces.

Special Float Values

not_a_number: NaN
positive_infinity: inf
negative_infinity: -inf

These keywords represent IEEE 754 special values. In JSON export, NaN and infinity values are converted to null.

Boolean and Null

enabled: true
disabled: false
missing: ~

The tilde (~) is the null literal.

Timestamps

ISO 8601 formatted date/time values:

# Date only
created: 2024-01-15

# Date and time (UTC)
updated: 2024-01-15T10:30:00Z

# With milliseconds
precise: 2024-01-15T10:30:00.123Z

# With timezone offset
local: 2024-01-15T10:30:00+05:30

Format: YYYY-MM-DD[THH:MM[:SS[.sss]][Z|+HH:MM|-HH:MM]]

Seconds (:SS) are optional and default to 00. Timestamps are stored internally as Unix milliseconds (i64).
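
To illustrate that storage model, here is a sketch that derives the two stored components from an ISO 8601 string. It uses the chrono crate purely for parsing – an assumption of this example, not a TeaLeaf dependency:

use chrono::DateTime;

fn main() {
    // Parse an ISO 8601 timestamp with an explicit offset.
    let ts = DateTime::parse_from_rfc3339("2024-01-15T10:30:00+05:30").unwrap();

    // TeaLeaf stores Unix milliseconds (i64) plus the offset in minutes.
    let unix_millis: i64 = ts.timestamp_millis();
    let tz_offset_minutes = ts.offset().local_minus_utc() / 60;

    println!("{unix_millis} {tz_offset_minutes}"); // 1705294800000 330
}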

Objects

Curly-brace delimited key-value collections:

# Inline
point: {x: 10, y: 20}

# Multi-line
config: {
  host: localhost,
  port: 8080,
  debug: false,
}

Trailing commas are allowed.

Arrays

Square-bracket delimited ordered collections:

numbers: [1, 2, 3, 4, 5]
mixed: [1, "hello", true, ~]
nested: [[1, 2], [3, 4]]
empty: []

Tuples

Parenthesized value lists. Outside of @table, tuples are parsed as plain arrays:

# This is an array [0, 0], NOT a struct
origin: (0, 0)

Inside a @table context, tuples are bound to the table’s schema:

@struct point (x: int, y: int)
points: @table point [
  (0, 0),       # bound to point schema
  (100, 200),
]

Maps

Ordered key-value maps with the @map directive. Unlike objects, maps support non-string keys:

# String keys
headers: @map {
  "Content-Type": "application/json",
  "Accept": "*/*",
}

# Integer keys
status_codes: @map {
  200: "OK",
  404: "Not Found",
  500: "Internal Server Error",
}

# Mixed value types
config: @map {
  name: "myapp",
  port: 8080,
  debug: true,
}

Maps preserve insertion order and support heterogeneous key types.

References

Define named values and reuse them:

# Define a reference
!node_a: {label: "Start", value: 1}
!node_b: {label: "End", value: 2}

# Use references
edges: [
  {from: !node_a, to: !node_b, weight: 1.0},
  {from: !node_b, to: !node_a, weight: 0.5},
]

# References can be used multiple times
nodes: [!node_a, !node_b]

References can be defined at the top level or inside objects.

Tagged Values

A colon prefix adds a discriminator tag to any value:

events: [
  :click {x: 100, y: 200},
  :scroll {delta: -50},
  :keypress {key: "Enter"},
]

Tags are useful for discriminated unions and variant types.

Unions

Named discriminated unions with @union:

@union shape {
  circle (radius: float),
  rectangle (width: float, height: float),
  point (),
}

shapes: [
  :circle (5.0),
  :rectangle (10.0, 20.0),
  :point (),
]

Union definitions are encoded in the binary schema table alongside struct definitions, preserving variant names, field names, and field types through compilation and decompilation.

Root Array

The @root-array directive marks the document as representing a top-level JSON array. This is primarily used for JSON round-trip fidelity.

When a root-level JSON array is imported via from-json, TeaLeaf stores each element as a numbered key (0, 1, 2, …) and emits @root-array so that to-json reconstructs the original array structure:

@root-array

0: {id: 1, name: alice}
1: {id: 2, name: bob}
2: {id: 3, name: carol}

Without @root-array, exporting to JSON would produce {"0": {...}, "1": {...}, ...}. With it, the output is [{...}, {...}, ...].

The directive takes no arguments and must appear before any data pairs.

Unknown Directives

Unknown directives (e.g., @custom) at the document top level are silently ignored. If a same-line argument follows the directive (e.g., @custom foo or @custom [1,2,3]), it is consumed and discarded. Arguments on the next line are not consumed — they are parsed as normal statements. This enables forward compatibility: files authored for a newer spec version can be partially parsed by older implementations that do not recognize new directives.

When an unknown directive appears as a value (e.g., key: @unknown [1,2,3]), it is treated as null. The argument expression is consumed but discarded.

File Includes

Import other TeaLeaf files:

@include "schemas/common.tl"
@include "./shared/config.tl"

Paths are resolved relative to the including file. Included schemas are available for @table use in the including file.

Formatting Rules

  • Trailing commas are allowed in objects, arrays, tuples, and maps
  • Whitespace is flexible – indent as you like
  • Key names follow identifier rules: start with letter or _, then letters, digits, _, -, .
  • Quoted keys are supported for names with special characters: "Content-Type": "application/json"

Type System

TeaLeaf has a rich type system covering primitives, containers, and type modifiers.

Primitive Types

Type | Aliases | Description | Binary Size
bool | – | true/false | 1 byte
int8 | – | Signed 8-bit integer | 1 byte
int16 | – | Signed 16-bit integer | 2 bytes
int | int32 | Signed 32-bit integer | 4 bytes
int64 | – | Signed 64-bit integer | 8 bytes
uint8 | – | Unsigned 8-bit integer | 1 byte
uint16 | – | Unsigned 16-bit integer | 2 bytes
uint | uint32 | Unsigned 32-bit integer | 4 bytes
uint64 | – | Unsigned 64-bit integer | 8 bytes
float32 | – | 32-bit IEEE 754 float | 4 bytes
float | float64 | 64-bit IEEE 754 float | 8 bytes
string | – | UTF-8 text | variable
bytes | – | Raw binary data | variable
json_number | – | Arbitrary-precision numeric string (from JSON) | variable
timestamp | – | Unix milliseconds (i64) + timezone offset (i16) | 10 bytes

Type Modifiers

field: string          # required string
field: string?         # nullable string (can be ~)
field: []string        # required array of strings
field: []string?       # nullable array of strings (the field itself can be ~)
field: []user          # array of structs

The ? modifier applies to the field, not array elements. However, the parser does accept ~ (null) values inside arrays, including schema-typed arrays. Null elements are tracked in the null bitmap.

Value Types (Not Schema Types)

The following are value types that appear in data but cannot be declared as field types in @struct:

Type | Description
object | Untyped { key: value } collections
map | Ordered @map { key: value } with any key type
ref | Reference (!name) to another value
tagged | Tagged value (:tag value)

For structured fields, define a named struct and use it as the field type. For tagged values with a known set of variants, define a @union – this provides schema metadata (variant names, field names, field types) that is preserved in the binary format.

Type Widening

When reading binary data, automatic safe conversions apply:

  • int8 → int16 → int32 → int64
  • uint8 → uint16 → uint32 → uint64
  • float32 → float64

Narrowing conversions are not automatic and require recompilation.

Type Inference

Standalone Values

When writing, the smallest representation is selected:

  • Integers: i8 if fits, else i16, else i32, else i64
  • Unsigned: u8 if fits, else u16, else u32, else u64
  • Floats: always f64 at runtime
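
A sketch of the signed-integer selection described above (illustrative only; the real writer lives in tealeaf-core):

// Pick the narrowest signed type that can represent `v`,
// mirroring the i8 -> i16 -> i32 -> i64 preference order.
fn smallest_int_type(v: i64) -> &'static str {
    if v >= i8::MIN as i64 && v <= i8::MAX as i64 {
        "int8"
    } else if v >= i16::MIN as i64 && v <= i16::MAX as i64 {
        "int16"
    } else if v >= i32::MIN as i64 && v <= i32::MAX as i64 {
        "int32"
    } else {
        "int64"
    }
}

fn main() {
    assert_eq!(smallest_int_type(42), "int8");
    assert_eq!(smallest_int_type(70_000), "int32");
    assert_eq!(smallest_int_type(9_999_999_999), "int64");
}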

Homogeneous Arrays

Arrays of uniform type use optimized encoding:

Array Contents | Encoding Strategy
Schema-typed objects (matching a @struct) | Struct array encoding with null bitmaps
Value::Int arrays | Packed Int32 encoding
Value::String arrays | String table indices (u32)
All other arrays (UInt, Float, Bool, mixed, etc.) | Heterogeneous encoding with per-element type tags

Type Coercion at Compile Time

When compiling schema-bound data, type mismatches use default values rather than erroring:

Target Type | Mismatch Behavior
Numeric fields | Integers/floats coerce; non-numeric becomes 0
String fields | Non-string becomes empty string ""
Bytes fields | Non-bytes becomes empty bytes (length 0)
Timestamp fields | Non-timestamp becomes epoch (0)

This “best effort” approach prioritizes successful compilation over strict validation. Validate at the application level before compilation for strict type checking.
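
If you need strict checking, a small pre-compile validator can reject the mismatches that the compiler would otherwise coerce. A sketch using simplified stand-ins for the real Value and schema types:

// Trimmed-down stand-ins, just for illustration.
#[derive(Debug)]
enum Value {
    Null,
    Int(i64),
    String(String),
}

struct Field {
    name: &'static str,
    base: &'static str, // "int", "string", ...
    nullable: bool,
}

// Fail instead of silently coercing to 0 / "" / epoch.
fn validate_row(fields: &[Field], row: &[Value]) -> Result<(), String> {
    if fields.len() != row.len() {
        return Err(format!("expected {} values, got {}", fields.len(), row.len()));
    }
    for (f, v) in fields.iter().zip(row) {
        let ok = match v {
            Value::Null => f.nullable,
            Value::Int(_) => f.base == "int",
            Value::String(_) => f.base == "string",
        };
        if !ok {
            return Err(format!("field `{}` ({}) got {:?}", f.name, f.base, v));
        }
    }
    Ok(())
}

fn main() {
    let fields = [
        Field { name: "id", base: "int", nullable: false },
        Field { name: "email", base: "string", nullable: true },
    ];
    // `id` is a string here: strict validation fails instead of coercing to 0.
    let row = [Value::String("oops".into()), Value::Null];
    assert!(validate_row(&fields, &row).is_err());
}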

Bytes Literal

The text format supports b"..." hex literals for byte data:

payload: b"cafef00d"
empty: b""
checksum: b"CA FE"   # ERROR -- no spaces allowed
  • Contents are hex digits only (uppercase or lowercase)
  • Length must be even (2 hex chars per byte)
  • dumps() and decompile emit b"..." for Value::Bytes, enabling full text round-trip
  • JSON export encodes bytes as "0xcafef00d" strings; JSON import does not auto-convert back to bytes

Schemas

Schemas are the foundation of TeaLeaf’s compact encoding. They define structure once so data can use positional encoding.

Defining Schemas

Use @struct to define a schema:

@struct point (x: int, y: int)

With multiple fields and types:

@struct user (
  id: int,
  name: string,
  email: string?,
  active: bool,
)

Optional Type Annotations

Field types can be omitted – they default to string:

@struct config (host, port: int, debug: bool)
# host defaults to string type

Using Schemas with @table

The @table directive binds an array of tuples to a schema:

@struct user (id: int, name: string, email: string)

users: @table user [
  (1, "Alice", "alice@example.com"),
  (2, "Bob", "bob@example.com"),
  (3, "Carol", "carol@example.com"),
]

Each tuple’s values are matched positionally to the schema fields.

Nested Structs

Structs can reference other structs. Nested tuples inherit schema binding from their parent field type:

@struct address (street: string, city: string, zip: string)

@struct person (
  name: string,
  home: address,
  work: address?,
)

people: @table person [
  (
    "Alice Smith",
    ("123 Main St", "Berlin", "10115"),     # Parsed as address
    ("456 Office Blvd", "Berlin", "10117"), # Parsed as address
  ),
]

Deep Nesting

Schemas can nest arbitrarily deep:

@struct method (type: string, last_four: string)
@struct payment (amount: float, method: method)
@struct order (id: int, customer: string, payment: payment)

orders: @table order [
  (1, "Alice", (99.99, ("credit", "4242"))),
  (2, "Bob", (49.50, ("debit", "1234"))),
]

Array Fields

Schema fields can be arrays of primitives or other structs:

@struct employee (
  id: int,
  name: string,
  skills: []string,
  scores: []int,
)

employees: @table employee [
  (1, "Alice", ["rust", "python"], [95, 88]),
  (2, "Bob", ["java"], [72]),
]

Nullable Fields

The ? modifier makes a field nullable:

@struct user (
  id: int,
  name: string,
  email: string?,   # can be ~
  phone: string?,   # can be ~
)

users: @table user [
  (1, "Alice", "alice@example.com", "+1-555-0100"),
  (2, "Bob", ~, ~),  # email and phone are null
]

Binary Encoding Benefits

Schemas enable significant binary compression:

  1. Positional storage – field names stored once in the schema table, not per row
  2. Null bitmaps – one bit per nullable field per row, instead of full null markers
  3. Type-homogeneous arrays – packed encoding when all elements match a schema
  4. String deduplication – repeated values like city names stored once in the string table

Example Size Savings

For 1,000 user records with 5 fields:

Approach | Approximate Size
JSON (field names repeated) | ~80KB
TeaLeaf text (schema + tuples) | ~35KB
TeaLeaf binary (compressed) | ~15KB

Schema Compatibility

Compatible Changes

Change | Notes
Rename field | Data is positional; names are documentation only
Widen type | int8 → int64, float32 → float64 (automatic)

Incompatible Changes (Require Recompile)

Change | Resolution
Add field | Recompile source .tl file
Remove field | Recompile source .tl file
Reorder fields | Recompile source .tl file
Narrow type | Recompile source .tl file

Recompilation Workflow

The .tl file is the master. When schemas change:

tealeaf compile data.tl -o data.tlbx

TeaLeaf prioritizes simplicity over automatic schema evolution:

  • No migration machinery – recompile when schemas change
  • No version negotiation – the embedded schema is the source of truth
  • Explicit over implicit – tuples require values for all fields

Binary Format

The TeaLeaf binary format (.tlbx) is the compact, machine-efficient representation. This page documents the binary layout.

Constants

Constant | Value
Magic | TLBX (4 bytes, ASCII)
Version Major | 2
Version Minor | 0
Header Size | 64 bytes

File Structure

┌──────────────────┐
│ Header (64 B)    │
├──────────────────┤
│ String Table     │
├──────────────────┤
│ Schema Table     │
├──────────────────┤
│ Section Index    │
├──────────────────┤
│ Data Sections    │
└──────────────────┘

All multi-byte values are little-endian.

Header (64 bytes)

Offset | Size | Field | Description
0 | 4 | Magic | TLBX
4 | 2 | Version Major | 2
6 | 2 | Version Minor | 0
8 | 4 | Flags | bit 0: compress (advisory), bit 1: root_array
12 | 4 | Reserved | (unused)
16 | 8 | String Table Offset | u64 LE
24 | 8 | Schema Table Offset | u64 LE
32 | 8 | Index Offset | u64 LE
40 | 8 | Data Offset | u64 LE
48 | 4 | String Count | u32 LE
52 | 4 | Schema Count | u32 LE
56 | 4 | Section Count | u32 LE
60 | 4 | Reserved | (for future checksum; currently 0)

Flag semantics:

  • Bit 0 (COMPRESS): Advisory. Indicates one or more sections use ZLIB (deflate) compression. Compression is determined per-section via the entry flags in the section index. This flag is a hint for tooling only.
  • Bit 1 (ROOT_ARRAY): Indicates the source document was a root-level JSON array.
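
A sketch of reading this fixed header from a byte buffer, assuming only the layout table above (illustrative; tealeaf-core's actual reader is more complete):

// All multi-byte fields are little-endian.
fn read_u32(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes(b[at..at + 4].try_into().unwrap())
}

fn read_u64(b: &[u8], at: usize) -> u64 {
    u64::from_le_bytes(b[at..at + 8].try_into().unwrap())
}

fn parse_header(buf: &[u8]) -> Result<(), String> {
    if buf.len() < 64 {
        return Err("shorter than the 64-byte header".into());
    }
    if &buf[0..4] != b"TLBX" {
        return Err("bad magic".into());
    }
    let major = u16::from_le_bytes(buf[4..6].try_into().unwrap());
    let flags = read_u32(buf, 8);
    let string_table_offset = read_u64(buf, 16);
    let section_count = read_u32(buf, 56);

    let compress_hint = flags & 0b01 != 0; // bit 0: COMPRESS (advisory)
    let root_array = flags & 0b10 != 0;    // bit 1: ROOT_ARRAY

    println!(
        "v{major}, strings at {string_table_offset}, {section_count} sections, \
         compress_hint={compress_hint}, root_array={root_array}"
    );
    Ok(())
}

fn main() {
    // A minimal fabricated header: magic + version 2, everything else zero.
    let mut buf = vec![0u8; 64];
    buf[0..4].copy_from_slice(b"TLBX");
    buf[4..6].copy_from_slice(&2u16.to_le_bytes());
    parse_header(&buf).unwrap();
}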

String Table

All unique strings are deduplicated and stored once:

┌─────────────────────────┐
│ Size: u32               │
│ Count: u32              │
├─────────────────────────┤
│ Offsets: [u32 × Count]  │
│ Lengths: [u32 × Count]  │
├─────────────────────────┤
│ String Data (UTF-8)     │
└─────────────────────────┘

Strings are referenced by 32-bit index throughout the file. This provides:

  • Deduplication – "Seattle" stored once, even if used 1,000 times
  • Fast lookup – O(1) index-based access
  • Compact references – 4 bytes per reference instead of the full string
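
On the writer side, deduplication amounts to string interning. A minimal sketch (not tealeaf-core's actual implementation):

use std::collections::HashMap;

// Store each unique string once and hand out stable u32 indices –
// the same indices that value encodings reference.
struct StringTable {
    strings: Vec<String>,
    index: HashMap<String, u32>,
}

impl StringTable {
    fn new() -> Self {
        StringTable { strings: Vec::new(), index: HashMap::new() }
    }

    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&i) = self.index.get(s) {
            return i; // already stored once
        }
        let i = self.strings.len() as u32;
        self.strings.push(s.to_string());
        self.index.insert(s.to_string(), i);
        i
    }
}

fn main() {
    let mut table = StringTable::new();
    let a = table.intern("Seattle");
    let b = table.intern("Austin");
    let c = table.intern("Seattle"); // deduplicated
    assert_eq!((a, b, c), (0, 1, 0));
    assert_eq!(table.strings.len(), 2);
}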

Schema Table

The schema table stores both struct and union definitions:

┌──────────────────────────────────────┐
│ Size: u32                            │
│ Struct Count: u16                    │
│ Union Count: u16                     │
├──────────────────────────────────────┤
│ Struct Offsets: [u32 × struct_count] │
│ Struct Definitions                   │
├──────────────────────────────────────┤
│ Union Offsets: [u32 × union_count]   │
│ Union Definitions                    │
└──────────────────────────────────────┘

Backward compatibility: The Union Count field at offset +6 was previously reserved (always 0). Old readers that ignore this field and only read Struct Count structs continue to work – they simply skip the union data.

Struct Definition

Schema:
  name_idx: u32      (string table index)
  field_count: u16
  flags: u16         (reserved)

  Field (repeated × field_count):
    name_idx: u32    (string table index)
    type: u8         (TLType code)
    flags: u8        (bit 0: nullable, bit 1: is_array)
    extra: u16       (type reference -- see below)

Field extra values:

  • For STRUCT (0x22) fields: string table index of the struct type name (0xFFFF = untyped object)
  • For TAGGED (0x31) fields: string table index of the union type name (0xFFFF = untyped tagged value)
  • For all other field types: 0xFFFF

Union Definition

Union:
  name_idx: u32         (string table index)
  variant_count: u16
  flags: u16            (reserved)

  Variant (repeated × variant_count):
    name_idx: u32       (string table index)
    field_count: u16
    flags: u16          (reserved)

    Field (repeated × field_count):
      name_idx: u32     (string table index)
      type: u8          (TLType code)
      flags: u8         (bit 0: nullable, bit 1: is_array)
      extra: u16        (same semantics as struct field extra)

Each union variant uses the same 8-byte field entry format as struct fields.

Type Codes

0x00  NULL        0x0A  FLOAT32     0x20  ARRAY      0x30  REF
0x01  BOOL        0x0B  FLOAT64     0x21  OBJECT     0x31  TAGGED
0x02  INT8        0x10  STRING      0x22  STRUCT     0x32  TIMESTAMP
0x03  INT16       0x11  BYTES       0x23  MAP
0x04  INT32       0x12  JSONNUMBER  0x24  TUPLE (reserved)
0x05  INT64
0x06  UINT8
0x07  UINT16
0x08  UINT32
0x09  UINT64

TUPLE (0x24) is reserved but not currently emitted. Tuples in text are parsed as arrays.

JSONNUMBER (0x12) stores arbitrary-precision numeric strings that exceed the range of i64, u64, or f64. Stored as a string table index, identical to STRING encoding.
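
The selection logic implied by the JSON mapping rules (see JSON Interoperability) can be sketched with a trimmed-down stand-in for the Value enum:

#[derive(Debug)]
enum Value {
    Int(i64),
    UInt(u64),
    Float(f64),
    JsonNumber(String),
}

// integer -> Int; positive integer above i64::MAX -> UInt;
// finite decimal -> Float; anything else keeps its exact text.
fn classify_number(raw: &str) -> Value {
    if let Ok(i) = raw.parse::<i64>() {
        return Value::Int(i);
    }
    if let Ok(u) = raw.parse::<u64>() {
        return Value::UInt(u);
    }
    match raw.parse::<f64>() {
        Ok(f) if f.is_finite() => Value::Float(f),
        _ => Value::JsonNumber(raw.to_string()),
    }
}

fn main() {
    println!("{:?}", classify_number("42"));                    // Int(42)
    println!("{:?}", classify_number("18446744073709551615")); // UInt(...)
    println!("{:?}", classify_number("3.14"));                  // Float(3.14)
    println!("{:?}", classify_number("1e999"));                 // JsonNumber("1e999")
}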

Section Index

Maps named sections to data locations:

┌─────────────────────────┐
│ Size: u32               │
│ Count: u32              │
├─────────────────────────┤
│ Entries (32 B each)     │
└─────────────────────────┘

Each entry (32 bytes):

Field | Type | Description
key_idx | u32 | String table index for section name
offset | u64 | Absolute file offset to data
size | u32 | Compressed size in bytes
uncompressed_size | u32 | Original size before compression
schema_idx | u16 | Schema index (0xFFFF if none)
type | u8 | TLType code
flags | u8 | bit 0: compressed, bit 1: is_array
item_count | u32 | Count for arrays/maps
reserved | u32 | (future use)
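
Assuming the fields are packed in the order listed with no padding, decoding one entry looks roughly like this (illustrative):

// Decode one 32-byte section index entry (all fields little-endian).
#[allow(dead_code)] // a sketch; not every field is read below
struct SectionEntry {
    key_idx: u32,
    offset: u64,
    size: u32,
    uncompressed_size: u32,
    schema_idx: u16, // 0xFFFF = no schema
    type_code: u8,   // `type` in the table; renamed (keyword in Rust)
    flags: u8,       // bit 0: compressed, bit 1: is_array
    item_count: u32,
}

fn parse_entry(b: &[u8; 32]) -> SectionEntry {
    SectionEntry {
        key_idx: u32::from_le_bytes(b[0..4].try_into().unwrap()),
        offset: u64::from_le_bytes(b[4..12].try_into().unwrap()),
        size: u32::from_le_bytes(b[12..16].try_into().unwrap()),
        uncompressed_size: u32::from_le_bytes(b[16..20].try_into().unwrap()),
        schema_idx: u16::from_le_bytes(b[20..22].try_into().unwrap()),
        type_code: b[22],
        flags: b[23],
        item_count: u32::from_le_bytes(b[24..28].try_into().unwrap()),
        // bytes 28..32 are reserved
    }
}

fn main() {
    let entry = parse_entry(&[0u8; 32]);
    let compressed = entry.flags & 0b01 != 0;
    println!("section key #{}, compressed={}", entry.key_idx, compressed);
}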

Data Encoding

Primitives

Type | Encoding
Null | 0 bytes
Bool | 1 byte (0x00 or 0x01)
Int8/UInt8 | 1 byte
Int16/UInt16 | 2 bytes, LE
Int32/UInt32 | 4 bytes, LE
Int64/UInt64 | 8 bytes, LE
Float32 | 4 bytes, IEEE 754 LE
Float64 | 8 bytes, IEEE 754 LE
String | u32 index into string table
Bytes | varint length + raw bytes
Timestamp | i64 Unix milliseconds (LE, 8 bytes) + i16 timezone offset in minutes (LE, 2 bytes); 10 bytes total

Varint Encoding

Used for bytes length:

  • Continuation bit (0x80) + 7 value bits
  • Least-significant group first
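
A sketch of this little-endian base-128 scheme:

// Encode: emit 7 bits at a time, least-significant group first,
// setting the continuation bit (0x80) on all but the final byte.
fn write_varint(mut n: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (n & 0x7F) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
}

// Decode: accumulate 7-bit groups until a byte without the
// continuation bit appears. Returns (value, bytes consumed).
fn read_varint(buf: &[u8]) -> (u64, usize) {
    let (mut value, mut shift) = (0u64, 0);
    for (i, &b) in buf.iter().enumerate() {
        value |= ((b & 0x7F) as u64) << shift;
        if b & 0x80 == 0 {
            return (value, i + 1);
        }
        shift += 7;
    }
    panic!("truncated varint");
}

fn main() {
    let mut buf = Vec::new();
    write_varint(300, &mut buf);
    assert_eq!(buf, [0xAC, 0x02]); // 300 = 0b10_0101100
    assert_eq!(read_varint(&buf), (300, 2));
}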

Arrays (Top-Level, Homogeneous)

For Value::Int (when all values fit in i32) or Value::String arrays:

Count: u32
Element Type: u8 (Int32 or String)
Elements: [packed data]

All other uniform-type arrays (UInt, Bool, Float, Timestamp, Int64) use heterogeneous encoding.

Arrays (Top-Level, Heterogeneous)

For mixed-type arrays:

Count: u32
Element Type: 0xFF (marker)
Elements: [type: u8, data, type: u8, data, ...]

Arrays (Schema-Typed Fields)

Array fields within @struct use homogeneous encoding for ANY element type:

Count: u32
Element Type: u8 (field's declared type)
Elements: [packed typed values]

Objects

Field Count: u16
Fields: [
  key_idx: u32    (string table index)
  type: u8        (TLType code)
  data: [type-specific]
]

Struct Arrays (Optimal Encoding)

Count: u32
Schema Index: u16
Null Bitmap Size: u16
Rows: [
  Null Bitmap: [u8 × bitmap_size]
  Values: [non-null field values only]
]

The null bitmap tracks which fields are null:

  • Bit i set = field i is null
  • Only non-null values are stored
  • Bitmap size = (field_count + 7) / 8 bytes, i.e. ceil(field_count / 8)
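
A sketch of the bitmap bookkeeping, assuming bit i is the i-th least significant bit of its byte (the LSB-first ordering is an assumption of this example):

// Per-row bitmap size for a schema with `field_count` fields.
fn bitmap_size(field_count: usize) -> usize {
    (field_count + 7) / 8
}

// Bit i set means field i is null, so its value is simply
// absent from the row's value stream.
fn is_null(bitmap: &[u8], field_index: usize) -> bool {
    bitmap[field_index / 8] & (1 << (field_index % 8)) != 0
}

fn main() {
    assert_eq!(bitmap_size(5), 1); // 5 fields -> 1 bitmap byte
    assert_eq!(bitmap_size(9), 2); // 9 fields -> 2 bitmap bytes

    // Row for a 5-field schema where field 2 is null.
    let bitmap = [0b0000_0100u8];
    assert!(is_null(&bitmap, 2));
    assert!(!is_null(&bitmap, 0));
}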

Maps

Count: u32
Entries: [
  key_type: u8
  key_data: [type-specific]
  value_type: u8
  value_data: [type-specific]
]

References

name_idx: u32    (string table index for reference name)

Tagged Values

tag_idx: u32     (string table index for tag name)
value_type: u8   (TLType code)
value_data: [type-specific]

Compression

  • Algorithm: ZLIB (deflate)
  • Threshold: Compress if data > 64 bytes AND compressed < 90% of original
  • Granularity: Per-section (each section compressed independently)
  • Flag: Bit 0 of entry flags indicates compression
  • Decompression: Readers check the flag and decompress transparently
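
A sketch of the per-section decision using the flate2 crate (the crate choice is an assumption of this example, not a statement about tealeaf-core's dependencies):

use flate2::write::ZlibEncoder;
use flate2::Compression;
use std::io::Write;

// Compress a section only when it is large enough and compression
// actually helps, per the documented threshold.
fn maybe_compress(section: &[u8]) -> std::io::Result<(Vec<u8>, bool)> {
    if section.len() <= 64 {
        return Ok((section.to_vec(), false));
    }
    let mut enc = ZlibEncoder::new(Vec::new(), Compression::default());
    enc.write_all(section)?;
    let compressed = enc.finish()?;

    // Keep the compressed form only if it saves at least 10%.
    if (compressed.len() as f64) < section.len() as f64 * 0.9 {
        Ok((compressed, true)) // set bit 0 in the section entry flags
    } else {
        Ok((section.to_vec(), false))
    }
}

fn main() -> std::io::Result<()> {
    let highly_redundant = vec![b'a'; 1024];
    let (out, was_compressed) = maybe_compress(&highly_redundant)?;
    println!("1024 -> {} bytes, compressed={}", out.len(), was_compressed);
    Ok(())
}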

JSON Interoperability

TeaLeaf provides built-in bidirectional JSON conversion for easy integration with existing tools and systems.

JSON to TeaLeaf

CLI

# JSON to TeaLeaf text (with automatic schema inference)
tealeaf from-json input.json -o output.tl

# JSON to TeaLeaf binary
tealeaf json-to-tlbx input.json -o output.tlbx

Rust API

let doc = TeaLeaf::from_json(json_string)?;

// With automatic schema inference for arrays
let doc = TeaLeaf::from_json_with_schemas(json_string)?;

.NET API

using var doc = TLDocument.FromJson(jsonString);

Type Mappings (JSON → TeaLeaf)

JSON Type | TeaLeaf Type
null | Null
true / false | Bool
number (integer) | Int (or UInt if > i64::MAX)
number (decimal, finite f64) | Float
number (exceeds i64/u64/f64) | JsonNumber
string | String
array | Array
object | Object

Limitations

JSON import is “plain JSON only” – it does not recognize the special JSON forms used for TeaLeaf export:

JSON Form | Result
{"$ref": "name"} | Plain Object (not a Ref)
{"$tag": "...", "$value": ...} | Plain Object (not a Tagged)
[[key, value], ...] | Plain Array (not a Map)
ISO 8601 strings | Plain String (not a Timestamp)

For full round-trip fidelity with these types, use binary format (.tlbx) or reconstruct programmatically.

TeaLeaf to JSON

CLI

# Text to JSON
tealeaf to-json input.tl -o output.json

# Binary to JSON
tealeaf tlbx-to-json input.tlbx -o output.json

Both commands write to stdout if -o is not specified.

Rust API

let json = doc.to_json()?;         // pretty-printed
let json = doc.to_json_compact()?;  // minified

.NET API

string json = doc.ToJson();         // pretty-printed
string json = doc.ToJsonCompact();   // minified

Type Mappings (TeaLeaf → JSON)

TeaLeaf Type | JSON Representation
Null | null
Bool | true / false
Int, UInt | number
Float | number
JsonNumber | number (parsed back to a JSON number)
String | string
Bytes | string (hex with 0x prefix)
Array | array
Object | object
Map | array of [key, value] pairs
Timestamp | string (ISO 8601)
Ref | {"$ref": "name"}
Tagged | {"$tag": "tagname", "$value": value}

Schema Inference

When converting JSON to TeaLeaf, the from-json command (and from_json_with_schemas API) can automatically infer schemas from arrays of uniform objects.

How It Works

  1. Array Detection – identifies arrays of objects with identical field sets
  2. Name Inference – singularizes parent key names ("products" → product schema)
  3. Type Inference – determines field types across all array items
  4. Nullable Detection – fields with any null values become nullable (string?)
  5. Nested Schemas – creates separate schemas for nested objects within array elements

Example

Input JSON:

{
  "customers": [
    {
      "id": 1,
      "name": "Alice",
      "billing_address": {"street": "123 Main", "city": "Boston"}
    },
    {
      "id": 2,
      "name": "Bob",
      "billing_address": {"street": "456 Oak", "city": "Denver"}
    }
  ]
}

Inferred TeaLeaf output:

@struct billing_address (city: string, street: string)
@struct customer (billing_address: billing_address, id: int, name: string)

customers: @table customer [
  (("Boston", "123 Main"), 1, "Alice"),
  (("Denver", "456 Oak"), 2, "Bob"),
]

Nested Schema Inference

When array elements contain nested objects, TeaLeaf creates schemas for those nested objects if they have uniform structure across all items:

  • Nested objects become their own @struct definitions
  • Parent schemas reference nested schemas by name (not object type)
  • Deeply nested objects are handled recursively

Round-Trip Considerations

Path | Fidelity
.tl → .json → .tl | Lossy – schemas, comments, refs, tags, timestamps, maps are simplified
.tl → .tlbx → .tl | Lossless for data (comments stripped)
.tl → .tlbx → .json | Same as .tl → .json
.json → .tl → .json | Generally lossless for JSON-native types
.json → .tlbx → .json | Generally lossless for JSON-native types

For types that don’t round-trip through JSON (Ref, Tagged, Map, Timestamp, Bytes), use the binary format for lossless storage.

Grammar

The formal grammar for the TeaLeaf text format in EBNF notation.

EBNF Grammar

document     = { directive | pair | ref_def } ;

directive    = struct_def | union_def | include | root_array ;
struct_def   = "@struct" name "(" fields ")" ;
union_def    = "@union" name "{" variants "}" ;
include      = "@include" string ;
root_array   = "@root-array" ;

variants     = variant { "," variant } ;
variant      = name "(" [ fields ] ")" ;

fields       = field { "," field } ;
field        = name [ ":" type ] ;  (* type defaults to string if omitted *)
type         = [ "[]" ] base_type [ "?" ] ;
base_type    = "bool" | "int" | "int8" | "int16" | "int32" | "int64"
             | "uint" | "uint8" | "uint16" | "uint32" | "uint64"
             | "float" | "float32" | "float64" | "string" | "bytes"
             | "timestamp" | name ;

pair         = key ":" value ;
key          = name | string ;
value        = primitive | object | array | tuple | table | map
             | tagged | ref | timestamp ;

primitive    = string | bytes_lit | number | bool | "~" ;
bytes_lit    = "b\"" { hexdigit hexdigit } "\"" ;
object       = "{" [ ( pair | ref_def ) { "," ( pair | ref_def ) } ] "}" ;
array        = "[" [ value { "," value } ] "]" ;
tuple        = "(" [ value { "," value } ] ")" ;
table        = "@table" name array ;
map          = "@map" "{" [ map_entry { "," map_entry } ] "}" ;
map_entry    = map_key ":" value ;
map_key      = string | name | integer ;
tagged       = ":" name value ;
ref          = "!" name ;
ref_def      = "!" name ":" value ;
timestamp    = date [ "T" time [ timezone ] ] ;

date         = digit{4} "-" digit{2} "-" digit{2} ;
time         = digit{2} ":" digit{2} [ ":" digit{2} [ "." digit{1,3} ] ] ;
timezone     = "Z" | ( "+" | "-" ) digit{2} [ ":" digit{2} | digit{2} ] ;

string       = name | '"' chars '"' | '"""' multiline '"""' ;
number       = integer | float | hex | binary ;
integer      = [ "-" ] digit+ ;
float        = [ "-" ] digit+ "." digit+ [ ("e"|"E") ["+"|"-"] digit+ ]
             | [ "-" ] digit+ ("e"|"E") ["+"|"-"] digit+
             | "NaN" | "inf" | "-inf" ;
hex          = [ "-" ] ("0x" | "0X") hexdigit+ ;
binary       = [ "-" ] ("0b" | "0B") ("0"|"1")+ ;
bool         = "true" | "false" ;
name         = (letter | "_") { letter | digit | "_" | "-" | "." } ;
comment      = "#" { any } newline ;

chars        = { any_char | escape } ;
escape       = "\\" | "\\\"" | "\\n" | "\\t" | "\\r" | "\\b" | "\\f"
             | "\\u" hexdigit hexdigit hexdigit hexdigit ;

Production Notes

Document Structure

A document is a sequence of:

  • Directives – @struct, @union, @include, @root-array (processed before data)
  • Pairs – key: value (the actual data)
  • Reference definitions – !name: value (reusable named values)

Key Rules

  • Keys can be bare identifiers (name) or quoted strings ("Content-Type")
  • Trailing commas are allowed in all list contexts (arrays, objects, tuples, maps, fields)
  • Comments (# to end of line) can appear anywhere whitespace is valid
  • Whitespace is insignificant except inside strings

Type Defaults

When a field type is omitted in a @struct, it defaults to string:

@struct config (host, port: int, debug: bool)
# "host" is implicitly string

Tuple Semantics

Standalone tuples are parsed as arrays. Only within a @table context do tuples acquire schema binding:

# This is an array [1, 2, 3]
plain: (1, 2, 3)

# These are schema-bound tuples
@struct point (x: int, y: int)
points: @table point [(0, 0), (1, 1)]

Root Array Directive

The @root-array directive marks the document as representing a root-level JSON array rather than a JSON object. This is used for JSON round-trip fidelity – when a JSON array is imported via from-json, the directive is emitted so that to-json produces an array at the top level instead of an object:

@root-array

0: {id: 1, name: alice}
1: {id: 2, name: bob}

Without @root-array, the JSON output would be {"0": {...}, "1": {...}}. With it, the output is [{...}, {...}].

Map Key Restrictions

Map keys are restricted to hashable types: strings, names, and integers. Complex values (objects, arrays) cannot be map keys.

Reference Scoping

References can be defined at:

  • Top level – !name: value alongside pairs
  • Inside objects – {!ref: value, field: !ref}

References are resolved within the document scope.

CLI Overview

The tealeaf command-line tool provides all operations for working with TeaLeaf files.

Usage

tealeaf <command> [options]

Commands

Command | Description
compile | Compile text (.tl) to binary (.tlbx)
decompile | Decompile binary (.tlbx) to text (.tl)
info | Show file information (auto-detects format)
validate | Validate text format syntax
to-json | Convert TeaLeaf text to JSON
from-json | Convert JSON to TeaLeaf text
tlbx-to-json | Convert TeaLeaf binary to JSON
json-to-tlbx | Convert JSON to TeaLeaf binary
help | Show help text

Global Options

tealeaf help         # Show usage
tealeaf -h           # Show usage
tealeaf --help       # Show usage

Exit Codes

Code | Meaning
0 | Success
1 | Error (parse error, I/O error, invalid arguments)

Error messages are written to stderr. Data output goes to stdout (when no -o flag is specified).

Quick Examples

# Full workflow
tealeaf validate data.tl
tealeaf compile data.tl -o data.tlbx
tealeaf info data.tlbx
tealeaf to-json data.tl -o data.json
tealeaf decompile data.tlbx -o recovered.tl

# JSON conversion
tealeaf from-json api_response.json -o structured.tl
tealeaf json-to-tlbx api_response.json -o compact.tlbx
tealeaf tlbx-to-json compact.tlbx -o exported.json

compile

Compile a TeaLeaf text file (.tl) to the compact binary format (.tlbx).

Usage

tealeaf compile <input.tl> -o <output.tlbx>

Arguments

Argument | Required | Description
<input.tl> | Yes | Path to the TeaLeaf text file
-o <output.tlbx> | Yes | Path for the output binary file

Description

The compile command:

  1. Parses the text file (including any @include directives)
  2. Builds the string table (deduplicates all strings)
  3. Encodes schemas into the schema table
  4. Encodes each top-level key-value pair as a data section
  5. Applies per-section ZLIB compression (enabled by default)
  6. Writes the binary file with the 64-byte header

Compression is applied to sections larger than 64 bytes where the compressed size is less than 90% of the original.

Examples

# Basic compilation
tealeaf compile config.tl -o config.tlbx

# Compile and inspect
tealeaf compile data.tl -o data.tlbx
tealeaf info data.tlbx

Output

On success, prints:

  • Input and output file paths
  • Input size, output size, and compression ratio (percentage)

Error Cases

Error | Cause
Parse error | Invalid TeaLeaf syntax in input file
I/O error | Input file not found or output path not writable
Include error | Referenced @include file not found

decompile

Convert a TeaLeaf binary file (.tlbx) back to the human-readable text format (.tl).

Usage

tealeaf decompile <input.tlbx> -o <output.tl>

Arguments

Argument | Required | Description
<input.tlbx> | Yes | Path to the TeaLeaf binary file
-o <output.tl> | Yes | Path for the output text file

Description

The decompile command:

  1. Opens the binary file and reads the header
  2. Loads the string table and schema table
  3. Reads the section index
  4. Decompresses sections as needed
  5. Reconstructs @struct definitions from the schema table
  6. Writes each section as a key-value pair in text format

Notes

  • Comments are not preserved – comments from the original .tl are stripped during compilation
  • Formatting may differ – the decompiled output uses the default formatting, which may differ from the original source
  • Data is lossless – all values, schemas, and structure are preserved
  • Bytes are lossless – bytes values are written as b"..." hex literals, which round-trip correctly

Examples

# Decompile a binary file
tealeaf decompile data.tlbx -o data_recovered.tl

# Round-trip verification
tealeaf compile original.tl -o compiled.tlbx
tealeaf decompile compiled.tlbx -o roundtrip.tl
tealeaf compile roundtrip.tl -o roundtrip.tlbx
# compiled.tlbx and roundtrip.tlbx should be equivalent

See Also

  • compile – reverse operation
  • info – inspect without decompiling

info

Display information about a TeaLeaf file. Auto-detects whether the file is text or binary format.

Usage

tealeaf info <file>

Arguments

Argument | Required | Description
<file> | Yes | Path to a .tl or .tlbx file

Description

The info command auto-detects the file format (by checking for the TLBX magic bytes) and displays relevant information.

For Text Files (.tl)

  • Number of top-level keys
  • Key names
  • Number of schema definitions
  • Schema details (name, fields, types)

For Binary Files (.tlbx)

  • Version information
  • File size
  • Header details (offsets, counts)
  • String table statistics (count, total size)
  • Schema table details (names, field counts)
  • Section index (key names, sizes, compression ratios)

Examples

# Inspect a text file
tealeaf info config.tl

# Inspect a binary file
tealeaf info data.tlbx

validate

Validate a TeaLeaf text file for syntactic correctness without compiling it.

Usage

tealeaf validate <file.tl>

Arguments

Argument | Required | Description
<file.tl> | Yes | Path to the TeaLeaf text file

Description

The validate command parses the text file and reports any syntax errors. It does not produce any output files.

Validation checks include:

  • Lexical analysis (valid tokens, string escaping)
  • Structural parsing (matched brackets, valid directives)
  • Schema reference validity (each @table references a defined @struct)
  • Include file resolution
  • Type syntax in schema definitions

Examples

# Validate a file
tealeaf validate config.tl

# Validate before compiling
tealeaf validate data.tl && tealeaf compile data.tl -o data.tlbx

Exit Codes

Code | Meaning
0 | File is valid
1 | Validation errors found

On success, prints ✓ Valid along with schema and key counts. On failure, prints ✗ Invalid: <error message> and exits with code 1.

See Also

  • info – inspect file contents
  • compile – compile (implies validation)

to-json / from-json

Convert between TeaLeaf text format and JSON.

to-json

Convert a TeaLeaf text file to JSON.

Usage

tealeaf to-json <input.tl> [-o <output.json>]

Arguments

Argument | Required | Description
<input.tl> | Yes | Path to the TeaLeaf text file
-o <output.json> | No | Output file path. If omitted, writes to stdout

Examples

# Write to file
tealeaf to-json data.tl -o data.json

# Write to stdout
tealeaf to-json data.tl

# Pipe to another tool
tealeaf to-json data.tl | jq '.users'

Output Format

The output is pretty-printed JSON. See JSON Interoperability for type mapping details.


from-json

Convert a JSON file to TeaLeaf text format with automatic schema inference.

Usage

tealeaf from-json <input.json> -o <output.tl>

Arguments

Argument | Required | Description
<input.json> | Yes | Path to the JSON file
-o <output.tl> | Yes | Path for the output TeaLeaf text file

Schema Inference

from-json automatically infers schemas from JSON arrays of uniform objects:

  1. Array Detection – identifies arrays where all elements are objects with identical keys
  2. Name Inference – singularizes the parent key name ("users" → user schema)
  3. Type Inference – determines field types across all items
  4. Nullable Detection – fields with any null become nullable (string?)
  5. Nested Schemas – creates schemas for nested uniform objects

Examples

# Convert with schema inference
tealeaf from-json api_data.json -o structured.tl

# Full pipeline: JSON → TeaLeaf text → Binary
tealeaf from-json data.json -o data.tl
tealeaf compile data.tl -o data.tlbx

Example: Schema Inference in Action

Input (employees.json):

{
  "employees": [
    {"id": 1, "name": "Alice", "dept": "Engineering"},
    {"id": 2, "name": "Bob", "dept": "Design"}
  ]
}

Output (employees.tl):

@struct employee (dept: string, id: int, name: string)

employees: @table employee [
  ("Engineering", 1, "Alice"),
  ("Design", 2, "Bob"),
]

tlbx-to-json / json-to-tlbx

Convert between TeaLeaf binary format and JSON directly, without going through the text format.

tlbx-to-json

Convert a TeaLeaf binary file to JSON.

Usage

tealeaf tlbx-to-json <input.tlbx> [-o <output.json>]

Arguments

Argument | Required | Description
<input.tlbx> | Yes | Path to the TeaLeaf binary file
-o <output.json> | No | Output file path. If omitted, writes to stdout

Examples

# Write to file
tealeaf tlbx-to-json data.tlbx -o data.json

# Write to stdout
tealeaf tlbx-to-json data.tlbx

# Pipe to jq for filtering
tealeaf tlbx-to-json data.tlbx | jq '.config'

Notes

  • Produces the same JSON output as to-json on the equivalent text file
  • Reads the binary directly – no intermediate text conversion

json-to-tlbx

Convert a JSON file directly to TeaLeaf binary format.

Usage

tealeaf json-to-tlbx <input.json> -o <output.tlbx>

Arguments

Argument | Required | Description
<input.json> | Yes | Path to the JSON file
-o <output.tlbx> | Yes | Path for the output binary file

Examples

# Direct JSON to binary
tealeaf json-to-tlbx api_data.json -o compact.tlbx

# Verify the result
tealeaf info compact.tlbx
tealeaf tlbx-to-json compact.tlbx -o verify.json

Notes

  • Performs schema inference (same as from-json)
  • Compiles directly to binary – no intermediate .tl file
  • Compression is enabled by default

Workflow Comparison

# Two-step (via text)
tealeaf from-json data.json -o data.tl
tealeaf compile data.tl -o data.tlbx

# One-step (direct)
tealeaf json-to-tlbx data.json -o data.tlbx

Both approaches produce equivalent binary output.

Rust Guide: Overview

TeaLeaf is written in Rust. The tealeaf-core crate provides the full API for parsing, compiling, reading, and converting TeaLeaf documents.

Crates

Crate | Description
tealeaf-core | Core library: parser, compiler, reader, CLI, JSON conversion
tealeaf-derive | Proc-macro crate: #[derive(ToTeaLeaf, FromTeaLeaf)]
tealeaf-ffi | C-compatible FFI layer for language bindings

Installation

Add to your Cargo.toml:

[dependencies]
tealeaf-core = { version = "2.0.0-beta.8", features = ["derive"] }

The derive feature pulls in tealeaf-derive for proc-macro support.

Core Types

TeaLeaf

The main document type:

use tealeaf::TeaLeaf;

// Parse from text
let doc = TeaLeaf::parse("name: Alice\nage: 30")?;

// Load from file
let doc = TeaLeaf::load("data.tl")?;

// Load from JSON
let doc = TeaLeaf::from_json(json_str)?;

// With schema inference
let doc = TeaLeaf::from_json_with_schemas(json_str)?;

Value

The value enum representing all TeaLeaf types:

use tealeaf::Value;

pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    UInt(u64),
    Float(f64),
    String(String),
    Bytes(Vec<u8>),
    Array(Vec<Value>),
    Object(ObjectMap<String, Value>),  // IndexMap alias, preserves insertion order
    Map(Vec<(Value, Value)>),
    Ref(String),
    Tagged(String, Box<Value>),
    Timestamp(i64, i16),  // (unix_millis, tz_offset_minutes)
    JsonNumber(String),   // arbitrary-precision number (raw JSON decimal string)
}

Schema and Field

Schema definitions:

use tealeaf::{Schema, Field, FieldType};

let schema = Schema {
    name: "user".to_string(),
    fields: vec![
        Field { name: "id".into(), field_type: FieldType { base: "int".into(), nullable: false, is_array: false } },
        Field { name: "name".into(), field_type: FieldType { base: "string".into(), nullable: false, is_array: false } },
        Field { name: "email".into(), field_type: FieldType { base: "string".into(), nullable: true, is_array: false } },
    ],
};

Accessing Data

let doc = TeaLeaf::load("data.tl")?;

// Get a value by key
if let Some(Value::String(name)) = doc.get("name") {
    println!("Name: {}", name);
}

// Get a schema
if let Some(schema) = doc.schema("user") {
    for field in &schema.fields {
        println!("  {}: {}", field.name, field.field_type.base);
    }
}

Output Operations

let doc = TeaLeaf::load("data.tl")?;

// Compile to binary
doc.compile("data.tlbx", true)?;  // true = enable compression

// Convert to JSON
let json = doc.to_json()?;         // pretty-printed
let json = doc.to_json_compact()?;  // minified

// Convert to TeaLeaf text (with schemas)
let text = doc.to_tl_with_schemas();

Conversion Traits

Two traits enable Rust struct ↔ TeaLeaf conversion:

pub trait ToTeaLeaf {
    fn to_tealeaf_value(&self) -> Value;
    fn collect_schemas() -> IndexMap<String, Schema>;
    fn tealeaf_field_type() -> FieldType;
}

pub trait FromTeaLeaf: Sized {
    fn from_tealeaf_value(value: &Value) -> Result<Self, ConvertError>;
}

These are typically derived via #[derive(ToTeaLeaf, FromTeaLeaf)] – see Derive Macros.

Extension Trait

ToTeaLeafExt provides convenience methods for any ToTeaLeaf implementor:

pub trait ToTeaLeafExt: ToTeaLeaf {
    fn to_tealeaf_doc(&self, key: &str) -> TeaLeaf;
    fn to_tl_string(&self, key: &str) -> String;
    fn to_tlbx(&self, key: &str, path: &str, compress: bool) -> Result<()>;
    fn to_tealeaf_json(&self, key: &str) -> Result<String>;
}

Example:

let user = User { id: 1, name: "Alice".into(), active: true };

// One-liner serialization
let text = user.to_tl_string("user");
user.to_tlbx("user", "user.tlbx", true)?;
let json = user.to_tealeaf_json("user")?;


Derive Macros

The tealeaf-derive crate provides two proc-macros for automatic Rust struct ↔ TeaLeaf conversion.

Setup

Enable the derive feature:

[dependencies]
tealeaf-core = { version = "2.0.0-beta.8", features = ["derive"] }

ToTeaLeaf

Converts a Rust struct or enum to a TeaLeaf Value:

use tealeaf::{ToTeaLeaf, ToTeaLeafExt};

#[derive(ToTeaLeaf)]
struct Config {
    host: String,
    port: i32,
    debug: bool,
}

let config = Config { host: "localhost".into(), port: 8080, debug: true };

// Serialize to TeaLeaf text
let text = config.to_tl_string("config");
// @struct config (host: string, port: int, debug: bool)
// config: (localhost, 8080, true)

// Compile directly to binary
config.to_tlbx("config", "config.tlbx", true)?;

// Convert to JSON
let json = config.to_tealeaf_json("config")?;

// Get as Value
let value = config.to_tealeaf_value();

// Get schemas
let schemas = Config::collect_schemas();

FromTeaLeaf

Deserializes a TeaLeaf Value back to a Rust struct:

use tealeaf::{Reader, FromTeaLeaf};

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Config {
    host: String,
    port: i32,
    debug: bool,
}

let reader = Reader::open("config.tlbx")?;
let value = reader.get("config")?;
let config = Config::from_tealeaf_value(&value)?;

Struct Example

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
    id: i64,
    name: String,
    #[tealeaf(optional)]
    email: Option<String>,
    active: bool,
    #[tealeaf(rename = "join_date", type = "timestamp")]
    joined: i64,
}

This generates:

  • Schema: @struct user (id: int64, name: string, email: string?, active: bool, join_date: timestamp)
  • ToTeaLeaf: serializes to a positional tuple matching the schema
  • FromTeaLeaf: deserializes from an object or struct-array row

Enum Example

#[derive(ToTeaLeaf, FromTeaLeaf)]
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
    Point,
}

let shapes = vec![
    Shape::Circle { radius: 5.0 },
    Shape::Rectangle { width: 10.0, height: 20.0 },
    Shape::Point,
];

Enum variants are serialized as tagged values:

shapes: [:circle {radius: 5.0}, :rectangle {width: 10.0, height: 20.0}, :point ~]

Nested Structs

Structs can reference other ToTeaLeaf/FromTeaLeaf types:

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Address {
    street: String,
    city: String,
    zip: String,
}

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Person {
    name: String,
    home: Address,
    #[tealeaf(optional)]
    work: Option<Address>,
}

The collect_schemas() method automatically collects schemas from nested types.
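
For example, collecting on the parent type returns schemas for both types (the same behavior is shown in Schemas & Types):

let schemas = Person::collect_schemas();
assert!(schemas.contains_key("person"));
assert!(schemas.contains_key("address"));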

Collections

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Team {
    name: String,
    members: Vec<String>,      // []string
    scores: Vec<i32>,          // []int
    leads: Vec<Person>,        // []person (nested struct array)
}

Supported Types

| Rust Type | TeaLeaf Type |
|---|---|
| bool | bool |
| i8, i16, i32 | int8, int16, int |
| i64 | int64 |
| u8, u16, u32 | uint8, uint16, uint |
| u64 | uint64 |
| f32 | float32 |
| f64 | float |
| String, &str | string |
| Vec<u8> | bytes |
| Vec<T> | []T |
| Option<T> | T? (nullable) |
| IndexMap<String, T> | object (order-preserving) |
| HashMap<String, T> | object |
| Custom struct (with derive) | named struct reference |


Attributes Reference

All attributes use the #[tealeaf(...)] namespace and can be applied to structs, enums, or individual fields.

Container Attributes

Applied to a struct or enum:

rename = "name"

Override the schema name used in TeaLeaf output:

#[derive(ToTeaLeaf, FromTeaLeaf)]
#[tealeaf(rename = "app_config")]
struct Config {
    host: String,
    port: i32,
}
// Generates: @struct app_config (host: string, port: int)

Without rename, the struct name is converted to snake_case (Config → config).

key = "name"

Override the default document key when serializing:

#[derive(ToTeaLeaf)]
#[tealeaf(key = "my_config")]
struct Config { /* ... */ }

root_array

Mark a struct as a root-level array element (changes serialization to omit the wrapping key):

#[derive(ToTeaLeaf)]
#[tealeaf(root_array)]
struct LogEntry {
    timestamp: i64,
    message: String,
}

Field Attributes

Applied to individual struct fields:

rename = "name"

Override the field name in the schema:

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
    #[tealeaf(rename = "user_name")]
    name: String,
}
// Generates: @struct user (user_name: string)

skip

Exclude a field from serialization/deserialization:

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
    name: String,
    #[tealeaf(skip)]
    internal_cache: Option<Vec<u8>>,
}

Skipped fields must implement Default for deserialization.
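
A short sketch of the resulting round-trip, using the User struct above: the skipped field is never written, so it is reset to its Default value on deserialization:

let user = User { name: "Alice".into(), internal_cache: Some(vec![1, 2, 3]) };
let value = user.to_tealeaf_value();

let restored = User::from_tealeaf_value(&value)?;
assert_eq!(restored.internal_cache, None);  // Option<T>::default() is None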

optional

Mark a field as nullable in the schema:

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
    name: String,
    #[tealeaf(optional)]
    email: Option<String>,  // string?
}

Note: Fields of type Option<T> are automatically detected as optional. The #[tealeaf(optional)] attribute is mainly useful for documentation or when using wrapper types.

type = "tealeaf_type"

Override the TeaLeaf type for a field:

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Event {
    #[tealeaf(type = "timestamp")]
    created_at: i64,  // Would normally be int64, but we want timestamp

    #[tealeaf(type = "uint64")]
    large_count: i64,  // Override the default signed type
}

Valid type names: bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float, float32, float64, string, bytes, timestamp.

flatten

Inline the fields of a nested struct into the parent:

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Metadata {
    created_by: String,
    version: i32,
}

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Document {
    title: String,
    #[tealeaf(flatten)]
    meta: Metadata,
}
// Generates: @struct document (title: string, created_by: string, version: int)
// Instead of: @struct document (title: string, meta: metadata)

default

Use Default::default() when deserializing a missing field:

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Config {
    host: String,
    #[tealeaf(default)]
    port: i32,  // defaults to 0 if missing
}

default = "expr"

Use a custom expression for the default value:

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Config {
    host: String,
    #[tealeaf(default = "8080")]
    port: i32,
    #[tealeaf(default = "true")]
    debug: bool,
}

Combining Attributes

Multiple attributes can be combined:

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Event {
    #[tealeaf(rename = "ts", type = "timestamp")]
    timestamp: i64,

    #[tealeaf(optional, rename = "msg")]
    message: Option<String>,

    #[tealeaf(skip)]
    cached_hash: u64,

    #[tealeaf(flatten)]
    metadata: EventMeta,
}

Attribute Summary Table

| Attribute | Level | Description |
|---|---|---|
| rename = "name" | Container or Field | Override schema/field name |
| key = "name" | Container | Override document key |
| root_array | Container | Serialize as root array element |
| skip | Field | Exclude from serialization |
| optional | Field | Mark as nullable (T?) |
| type = "name" | Field | Override TeaLeaf type |
| flatten | Field | Inline nested struct fields |
| default | Field | Use Default::default() |
| default = "expr" | Field | Use custom default expression |

Builder API

The TeaLeafBuilder provides a fluent API for constructing TeaLeaf documents programmatically.

Basic Usage

use tealeaf::{TeaLeafBuilder, Value};

let doc = TeaLeafBuilder::new()
    .add_value("name", Value::String("Alice".into()))
    .add_value("age", Value::Int(30))
    .add_value("active", Value::Bool(true))
    .build();

// Compile to binary
doc.compile("output.tlbx", true)?;

// Convert to JSON
let json = doc.to_json()?;

Methods

new()

Create a new empty builder:

let builder = TeaLeafBuilder::new();

add_value(key, value)

Add a raw Value to the document:

builder.add_value("count", Value::Int(42))

add<T: ToTeaLeaf>(key, dto)

Add a struct that implements ToTeaLeaf. Automatically collects schemas from the type:

#[derive(ToTeaLeaf)]
struct Config {
    host: String,
    port: i32,
}

let config = Config { host: "localhost".into(), port: 8080 };

let doc = TeaLeafBuilder::new()
    .add("config", &config)
    .build();

add_vec<T: ToTeaLeaf>(key, items)

Add an array of ToTeaLeaf items. Automatically collects schemas:

let users = vec![
    User { id: 1, name: "Alice".into() },
    User { id: 2, name: "Bob".into() },
];

let doc = TeaLeafBuilder::new()
    .add_vec("users", &users)
    .build();

add_schema(schema)

Manually add a schema definition:

use tealeaf::{Schema, Field, FieldType};

let schema = Schema {
    name: "point".to_string(),
    fields: vec![
        Field {
            name: "x".into(),
            field_type: FieldType { base: "int".into(), nullable: false, is_array: false },
        },
        Field {
            name: "y".into(),
            field_type: FieldType { base: "int".into(), nullable: false, is_array: false },
        },
    ],
};

let doc = TeaLeafBuilder::new()
    .add_schema(schema)
    .add_value("origin", Value::Array(vec![Value::Int(0), Value::Int(0)]))
    .build();

root_array()

Mark the document as a root-level array (rather than a key-value document):

let doc = TeaLeafBuilder::new()
    .root_array()
    .add_value("items", Value::Array(vec![
        Value::Int(1),
        Value::Int(2),
        Value::Int(3),
    ]))
    .build();

build()

Finalize and return the TeaLeaf document:

let doc = builder.build();

Complete Example

use tealeaf::{TeaLeafBuilder, ToTeaLeaf, FromTeaLeaf, Value};

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Address {
    street: String,
    city: String,
}

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Employee {
    id: i64,
    name: String,
    address: Address,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let employees = vec![
        Employee {
            id: 1,
            name: "Alice".into(),
            address: Address { street: "123 Main".into(), city: "Seattle".into() },
        },
        Employee {
            id: 2,
            name: "Bob".into(),
            address: Address { street: "456 Oak".into(), city: "Austin".into() },
        },
    ];

    let doc = TeaLeafBuilder::new()
        .add_value("company", Value::String("Acme Corp".into()))
        .add_vec("employees", &employees)
        .add_value("version", Value::Int(1))
        .build();

    // Output
    doc.compile("company.tlbx", true)?;
    println!("{}", doc.to_tl_with_schemas());
    println!("{}", doc.to_json()?);

    Ok(())
}

Schemas & Types

Working with schemas and the type system in Rust.

Schema Structure

pub struct Schema {
    pub name: String,
    pub fields: Vec<Field>,
}

pub struct Field {
    pub name: String,
    pub field_type: FieldType,
}

pub struct FieldType {
    pub base: String,       // "int", "string", "user", etc.
    pub nullable: bool,     // field: T?
    pub is_array: bool,     // field: []T
}

Creating Schemas Manually

use tealeaf::{Schema, Field, FieldType};

let user_schema = Schema {
    name: "user".to_string(),
    fields: vec![
        Field {
            name: "id".into(),
            field_type: FieldType { base: "int".into(), nullable: false, is_array: false },
        },
        Field {
            name: "name".into(),
            field_type: FieldType { base: "string".into(), nullable: false, is_array: false },
        },
        Field {
            name: "tags".into(),
            field_type: FieldType { base: "string".into(), nullable: false, is_array: true },
        },
        Field {
            name: "email".into(),
            field_type: FieldType { base: "string".into(), nullable: true, is_array: false },
        },
    ],
};

Collecting Schemas from Derive

When using #[derive(ToTeaLeaf)], schemas are collected automatically:

#[derive(ToTeaLeaf)]
struct Address { street: String, city: String }

#[derive(ToTeaLeaf)]
struct User { name: String, home: Address }

// Collects schemas for both `user` and `address`
let schemas = User::collect_schemas();
assert!(schemas.contains_key("user"));
assert!(schemas.contains_key("address"));

Accessing Schemas from Documents

let doc = TeaLeaf::load("data.tl")?;

// Get a specific schema
if let Some(schema) = doc.schema("user") {
    println!("Schema: {} ({} fields)", schema.name, schema.fields.len());
    for field in &schema.fields {
        let nullable = if field.field_type.nullable { "?" } else { "" };
        let array = if field.field_type.is_array { "[]" } else { "" };
        println!("  {}: {}{}{}", field.name, array, field.field_type.base, nullable);
    }
}

// Iterate all schemas
for (name, schema) in &doc.schemas {
    println!("{}: {} fields", name, schema.fields.len());
}

Accessing Schemas from Binary Reader

Schemas are embedded in the binary format. The Reader exposes section keys and decodes their values directly:

use tealeaf::Reader;

let reader = Reader::open("data.tlbx")?;

// List available keys
for key in reader.keys() {
    let value = reader.get(key)?;
    println!("{}: {:?}", key, value);
}

For full schema introspection, decompile the binary back to a TeaLeaf document and access doc.schemas.

Value Type System

The Value enum maps to TeaLeaf types:

| Variant | TeaLeaf Type | Notes |
|---|---|---|
| Value::Null | null | ~ in text |
| Value::Bool(b) | bool | |
| Value::Int(i) | int/int8/int16/int32/int64 | Size chosen by inference |
| Value::UInt(u) | uint/uint8/uint16/uint32/uint64 | Size chosen by inference |
| Value::Float(f) | float/float64 | Always f64 at runtime |
| Value::String(s) | string | |
| Value::Bytes(b) | bytes | |
| Value::Array(v) | array | Heterogeneous or typed |
| Value::Object(m) | object | String-keyed map |
| Value::Map(pairs) | map | Ordered, any key type |
| Value::Ref(name) | ref | !name reference |
| Value::Tagged(tag, val) | tagged | :tag value |
| Value::Timestamp(ms, tz) | timestamp | Unix milliseconds + timezone offset (minutes) |
| Value::JsonNumber(s) | json-number | Arbitrary-precision number (raw JSON decimal string) |

Type Inference at Write Time

When compiling, the writer selects the smallest encoding:

// Value::Int(42) → int8 in binary (fits in i8)
// Value::Int(1000) → int16 (fits in i16)
// Value::Int(100_000) → int32 (fits in i32)
// Value::Int(5_000_000_000) → int64

Schema-Typed Data

When data matches a schema (via @table), binary encoding uses:

  • Positional storage (no field name repetition)
  • Null bitmaps (one bit per nullable field)
  • Type-homogeneous arrays (packed encoding for []int, []string, etc.)
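
For illustration, a minimal hand-written example of a schema-typed section in text form; when compiled, the rows are stored positionally and the ~ null lands in the null bitmap:

@struct point (x: int, y: int?)

points: @table point [
  (1, 2)
  (3, ~)
]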

Error Handling

TeaLeaf uses the thiserror crate for structured error types.

Error Types

The main error enum:

| Error Variant | Description |
|---|---|
| Io | File I/O error (wraps std::io::Error) |
| InvalidMagic | Binary file doesn’t start with TLBX magic bytes |
| InvalidVersion | Unsupported binary format version |
| InvalidType | Unknown type code in binary data |
| InvalidUtf8 | String encoding error |
| UnexpectedToken | Parse error – expected one token, got another |
| UnexpectedEof | Premature end of input |
| UnknownStruct | @table references a struct that hasn’t been defined |
| MissingField | Required field not provided in data |
| ParseError | Generic parse error with message |
| ValueOutOfRange | Numeric value exceeds target type range |
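
A hedged sketch of matching on individual variants. The export path and variant payloads are assumptions here (the enum is assumed to be re-exported as tealeaf::Error, with Io wrapping std::io::Error as the table states):

use tealeaf::{TeaLeaf, Error};  // assumption: error enum re-exported as `Error`

match TeaLeaf::load("data.tlbx") {
    Ok(doc) => { /* use doc */ }
    Err(Error::Io(e)) => eprintln!("I/O failure: {}", e),  // assumed tuple variant
    Err(e) => eprintln!("TeaLeaf error: {}", e),           // InvalidMagic, ParseError, ...
}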

Conversion Errors

The ConvertError type is used by FromTeaLeaf:

pub enum ConvertError {
    MissingField { struct_name: String, field: String },
    TypeMismatch { expected: String, got: String, path: String },
    Nested { path: String, source: Box<ConvertError> },
    Custom(String),
}

Handling Errors

Parse Errors

use tealeaf::TeaLeaf;

match TeaLeaf::parse(input) {
    Ok(doc) => { /* use doc */ },
    Err(e) => {
        eprintln!("Parse error: {}", e);
        // e.g., "Unexpected token: expected ':', got '}' at line 5"
    }
}

I/O Errors

match TeaLeaf::load("nonexistent.tl") {
    Ok(doc) => { /* ... */ },
    Err(e) => {
        // Will be an Io variant wrapping std::io::Error
        eprintln!("Could not load file: {}", e);
    }
}

Binary Format Errors

use tealeaf::Reader;

match Reader::open("corrupted.tlbx") {
    Ok(reader) => { /* ... */ },
    Err(e) => {
        // Could be InvalidMagic, InvalidVersion, etc.
        eprintln!("Binary read error: {}", e);
    }
}

Conversion Errors

use tealeaf::{FromTeaLeaf, Value};

let value = Value::String("not a number".into());
match i32::from_tealeaf_value(&value) {
    Ok(n) => println!("Got: {}", n),
    Err(e) => {
        // ConvertError::TypeMismatch { expected: "Int", got: "String" }
        eprintln!("Conversion failed: {}", e);
    }
}

Error Propagation

All errors implement std::error::Error and Display, so they work with ? and anyhow/eyre:

fn process_file(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let doc = TeaLeaf::load(path)?;
    let json = doc.to_json()?;
    doc.compile("output.tlbx", true)?;
    Ok(())
}

Validation Without Errors

For checking validity without consuming the error:

let is_valid = TeaLeaf::parse(input).is_ok();

The CLI validate command uses this pattern to report validity without stopping on errors.

.NET Guide: Overview

TeaLeaf provides .NET bindings through a NuGet package that includes a C# source generator and a reflection-based serializer, both backed by the native Rust library via P/Invoke.

Architecture

┌─────────────────────────────────────────────┐
│  Your .NET Application                      │
├─────────────────────┬───────────────────────┤
│  Source Generator   │  Reflection Serializer│
│  (compile-time)     │  (runtime)            │
├─────────────────────┴───────────────────────┤
│  TeaLeaf Managed Layer (TLDocument, TLValue)│
├─────────────────────────────────────────────┤
│  P/Invoke (NativeMethods.cs)                │
├─────────────────────────────────────────────┤
│  tealeaf_ffi.dll / .so / .dylib (Rust)      │
└─────────────────────────────────────────────┘

Installation

dotnet add package TeaLeaf

The single package bundles everything:

| Component | What it provides |
|---|---|
| TeaLeaf | Managed wrapper types (TLDocument, TLValue, TLReader), reflection serializer |
| TeaLeaf.Annotations | Attributes ([TeaLeaf], [TLSkip], etc.) – included as a dependency |
| TeaLeaf.Generators | C# incremental source generator – bundled as an analyzer |
| Native libraries | tealeaf_ffi for all supported platforms (win/linux/osx, x64/arm64) |

Two Serialization Approaches

1. Source Generator

Zero-reflection, compile-time code generation:

[TeaLeaf]
public partial class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    [TLOptional] public string? Email { get; set; }
}

// Generated methods
string schema = User.GetTeaLeafSchema();
string text = user.ToTeaLeafText();
string json = user.ToTeaLeafJson();
user.CompileToTeaLeaf("user.tlbx");
var loaded = User.FromTeaLeaf(doc);

Requirements:

  • Class must be partial
  • Annotated with [TeaLeaf]
  • Properties must have public getters (and setters for deserialization)

2. Reflection Serializer

For generic types, dynamic scenarios, or types you don’t control:

using var doc = TeaLeafSerializer.ToDocument(user);
string text = TeaLeafSerializer.ToText(user);
string json = TeaLeafSerializer.ToJson(user);
var loaded = TeaLeafSerializer.Deserialize<User>(doc);

Core Types

TLDocument

The in-memory document, wrapping a native handle:

// Parse text
using var doc = TLDocument.Parse("name: alice\nage: 30");

// Load from file
using var doc = TLDocument.ParseFile("data.tl");

// From JSON
using var doc = TLDocument.FromJson(jsonString);

// Access values
string[] keys = doc.Keys;
using var value = doc["name"];

// Output
string text = doc.ToText();
string json = doc.ToJson();
doc.Compile("output.tlbx", compress: true);

TLValue

Represents any TeaLeaf value with type-safe accessors:

using var val = doc["users"];

// Type checking
TLType type = val.Type;
bool isNull = val.IsNull;

// Primitive access
bool? b = val.AsBool();
long? i = val.AsInt();
double? f = val.AsFloat();
string? s = val.AsString();
byte[]? bytes = val.AsBytes();
DateTimeOffset? ts = val.AsDateTime();

// Collection access
int len = val.ArrayLength;
using var elem = val[0];
using var field = val["name"];
string[] keys = val.ObjectKeys;

// Dynamic conversion
object? obj = val.ToObject();

TLReader

Binary file reader with optional memory mapping:

// Standard read
using var reader = TLReader.Open("data.tlbx");

// Memory-mapped (zero-copy for large files)
using var reader = TLReader.OpenMmap("data.tlbx");

// Access
string[] keys = reader.Keys;
using var val = reader["users"];

// Schema introspection
int schemaCount = reader.SchemaCount;
string name = reader.GetSchemaName(0);


Source Generator

The TeaLeaf source generator is a C# incremental source generator (IIncrementalGenerator) that generates serialization and deserialization code at compile time.

How It Works

  1. Roslyn detects classes annotated with [TeaLeaf]
  2. ModelAnalyzer examines the type’s properties, attributes, and nested types
  3. TLTextEmitter generates serialization methods
  4. DeserializerEmitter generates deserialization methods
  5. Generated code is added as a partial class extension

Requirements

  • The class must be partial
  • Annotated with [TeaLeaf] (from TeaLeaf.Annotations)
  • Public properties with getters (and setters for deserialization)
  • .NET 8.0+ with incremental source generator support

Basic Example

using TeaLeaf.Annotations;

[TeaLeaf]
public partial class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    [TLOptional] public string? Email { get; set; }
    public bool Active { get; set; }
}

Generated Methods

For each [TeaLeaf] class, the generator produces:

GetTeaLeafSchema()

Returns the @struct definition as a string:

string schema = User.GetTeaLeafSchema();
// "@struct user (id: int, name: string, email: string?, active: bool)"

ToTeaLeafText()

Serializes the instance to TeaLeaf text body format:

string text = user.ToTeaLeafText();
// "(1, \"Alice\", \"alice@example.com\", true)"

ToTeaLeafDocument(string key = "user")

Returns a complete TeaLeaf text document with schemas:

string doc = user.ToTeaLeafDocument();
// "@struct user (id: int, name: string, email: string?, active: bool)\nuser: (1, ...)"

ToTLDocument(string key = "user")

Parses through the native engine to create a TLDocument:

using var doc = user.ToTLDocument();
string json = doc.ToJson();
doc.Compile("user.tlbx");

ToTeaLeafJson(string key = "user")

Serializes to JSON via the native engine:

string json = user.ToTeaLeafJson();

CompileToTeaLeaf(string path, string key = "user", bool compress = false)

Compiles directly to a .tlbx binary file:

user.CompileToTeaLeaf("user.tlbx", compress: true);

FromTeaLeaf(TLDocument doc, string key = "user")

Deserializes from a TLDocument:

using var doc = TLDocument.ParseFile("user.tlbx");
var loaded = User.FromTeaLeaf(doc);

FromTeaLeaf(TLValue value)

Deserializes from a TLValue (for nested types):

using var val = doc["user"];
var loaded = User.FromTeaLeaf(val);

Nested Types

Types referencing other [TeaLeaf] types are fully supported:

[TeaLeaf]
public partial class Address
{
    public string Street { get; set; } = "";
    public string City { get; set; } = "";
}

[TeaLeaf]
public partial class Person
{
    public string Name { get; set; } = "";
    public Address Home { get; set; } = new();
    [TLOptional] public Address? Work { get; set; }
}

Generated schema:

@struct address (street: string, city: string)
@struct person (name: string, home: address, work: address?)

Collections

[TeaLeaf]
public partial class Team
{
    public string Name { get; set; } = "";
    public List<string> Tags { get; set; } = new();
    public List<Person> Members { get; set; } = new();
}

Generated schema:

@struct team (name: string, tags: []string, members: []person)

Enum Support

Enums are serialized as snake_case strings:

public enum Status { Active, Inactive, Suspended }

[TeaLeaf]
public partial class User
{
    public string Name { get; set; } = "";
    public Status Status { get; set; }
}

In TeaLeaf text: ("Alice", active)

Type Mapping

| C# Type | TeaLeaf Type |
|---|---|
| bool | bool |
| int | int |
| long | int64 |
| short | int16 |
| sbyte | int8 |
| uint | uint |
| ulong | uint64 |
| ushort | uint16 |
| byte | uint8 |
| double | float |
| float | float32 |
| decimal | float |
| string | string |
| DateTime | timestamp |
| DateTimeOffset | timestamp |
| byte[] | bytes |
| List<T> | []T |
| T? / Nullable<T> | T? |
| Enum | string |
| [TeaLeaf] class | struct reference |


.NET Attributes Reference

All TeaLeaf annotations are in the TeaLeaf.Annotations namespace.

Type-Level Attributes

[TeaLeaf] / [TeaLeaf("struct_name")]

Marks a class for source generator processing:

[TeaLeaf]           // Schema name: "my_class" (auto snake_case)
public partial class MyClass { }

[TeaLeaf("config")] // Schema name: "config" (explicit)
public partial class AppConfiguration { }

The optional string parameter sets the struct name used in the TeaLeaf schema. If omitted, the class name is converted to snake_case.

The attribute also has an EmitSchema property (defaults to true). When set to false, the source generator skips @struct and @table output for arrays of this type:

[TeaLeaf(EmitSchema = false)]  // Data only, no @struct definition
public partial class RawData { }

[TLKey("key_name")]

Overrides the top-level key used when serializing as a document entry:

[TeaLeaf]
[TLKey("app_settings")]
public partial class Config
{
    public string Host { get; set; } = "";
    public int Port { get; set; }
}

// Default key would be "config", but TLKey overrides to "app_settings"
string doc = config.ToTeaLeafDocument(); // key is "app_settings"

Property-Level Attributes

[TLSkip]

Exclude a property from serialization and deserialization:

[TeaLeaf]
public partial class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";

    [TLSkip]
    public string ComputedDisplayName => $"User #{Id}: {Name}";
}

[TLOptional]

Mark a property as nullable in the schema:

[TeaLeaf]
public partial class User
{
    public string Name { get; set; } = "";

    [TLOptional]
    public string? Email { get; set; }

    [TLOptional]
    public int? Age { get; set; }
}
// Schema: @struct user (name: string, email: string?, age: int?)

Note: Properties of nullable reference types (string?) or Nullable<T> types (int?) are automatically treated as optional. The [TLOptional] attribute is mainly for explicit documentation.

[TLRename("field_name")]

Override the field name in the TeaLeaf schema:

[TeaLeaf]
public partial class User
{
    [TLRename("user_name")]
    public string Name { get; set; } = "";

    [TLRename("is_active")]
    public bool Active { get; set; }
}
// Schema: @struct user (user_name: string, is_active: bool)

Without [TLRename], property names are converted to snake_case (Name → name, IsActive → is_active).

[TLType("type_name")]

Override the TeaLeaf type for a field:

[TeaLeaf]
public partial class Event
{
    public string Name { get; set; } = "";

    [TLType("timestamp")]
    public long CreatedAt { get; set; }  // Would be int64, forced to timestamp

    [TLType("uint64")]
    public long LargeCount { get; set; }  // Would be int64, forced to uint64
}

Valid type names: bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float, float32, float64, string, bytes, timestamp.

Attribute Summary

| Attribute | Level | Description |
|---|---|---|
| [TeaLeaf] / [TeaLeaf("name")] | Class | Enable source generation, optional struct name |
| [TLKey("key")] | Class | Override document key |
| [TLSkip] | Property | Exclude from serialization |
| [TLOptional] | Property | Mark as nullable in schema |
| [TLRename("name")] | Property | Override field name |
| [TLType("type")] | Property | Override TeaLeaf type |

Combining Attributes

[TeaLeaf("event_record")]
[TLKey("events")]
public partial class EventRecord
{
    [TLRename("event_id")]
    public int Id { get; set; }

    public string Name { get; set; } = "";

    [TLType("timestamp")]
    public long CreatedAt { get; set; }

    [TLOptional]
    [TLRename("extra_data")]
    public string? Metadata { get; set; }

    [TLSkip]
    public string DisplayLabel => $"{Name} ({Id})";
}

Generated schema:

@struct event_record (event_id: int, name: string, created_at: timestamp, extra_data: string?)

Reflection Serializer

The TeaLeafSerializer class provides runtime reflection-based serialization for scenarios where the source generator isn’t suitable.

When to Use

| Scenario | Approach |
|---|---|
| Known types at compile time | Source Generator (recommended) |
| Generic types (T) | Reflection Serializer |
| Types you don’t control (third-party) | Reflection Serializer |
| Dynamic/runtime-determined types | Reflection Serializer |
| Maximum performance | Source Generator |

API

All methods are on the static TeaLeafSerializer class.

Serialization

// To document text (schemas + data)
string docText = TeaLeafSerializer.ToDocument<User>(user);
string docText = TeaLeafSerializer.ToDocument<User>(user, key: "custom_key");

// To TeaLeaf text (data only, no schemas)
string text = TeaLeafSerializer.ToText<User>(user);

// To TLDocument (for further operations)
using var doc = TeaLeafSerializer.ToTLDocument<User>(user);
using var doc = TeaLeafSerializer.ToTLDocument<User>(user, key: "custom_key");

// To JSON (via native engine)
string json = TeaLeafSerializer.ToJson<User>(user);

// Compile to binary
TeaLeafSerializer.Compile<User>(user, "output.tlbx", compress: true);

Deserialization

// From TLDocument
using var doc = TLDocument.Parse(tlText);
var user = TeaLeafSerializer.FromDocument<User>(doc);
var user = TeaLeafSerializer.FromDocument<User>(doc, key: "custom_key");

// From TLValue (for nested types)
using var val = doc.Get("user");
var user = TeaLeafSerializer.FromValue<User>(val);

// From text
var user = TeaLeafSerializer.FromText<User>(tlText);

Schema Generation

// Get schema string
string schema = TeaLeafSerializer.GetSchema<User>();
// "@struct user (id: int, name: string, email: string?)"

// Get TeaLeaf type name for a C# type
string typeName = TeaLeafTextHelper.GetTLTypeName(typeof(int));    // "int"
string typeName = TeaLeafTextHelper.GetTLTypeName(typeof(long));   // "int64"
string typeName = TeaLeafTextHelper.GetTLTypeName(typeof(DateTime)); // "timestamp"

Type Mapping

The reflection serializer uses TeaLeafTextHelper.GetTLTypeName() for type resolution:

| C# Type | TeaLeaf Type |
|---|---|
| bool | bool |
| int | int |
| long | int64 |
| short | int16 |
| sbyte | int8 |
| uint | uint |
| ulong | uint64 |
| ushort | uint16 |
| byte | uint8 |
| double | float |
| float | float32 |
| decimal | float |
| string | string |
| DateTime | timestamp |
| DateTimeOffset | timestamp |
| byte[] | bytes |
| List<T> | []T |
| Dictionary<string, T> | object |
| Enum | string |
| [TeaLeaf] class | struct reference |

Attributes

The reflection serializer respects the same attributes as the source generator:

  • [TeaLeaf] / [TeaLeaf("name")] – struct name
  • [TLKey("key")] – document key
  • [TLSkip] – skip property
  • [TLOptional] – nullable field
  • [TLRename("name")] – rename field
  • [TLType("type")] – override type

Text Helpers

The TeaLeafTextHelper class provides utilities used by the serializer:

// PascalCase to snake_case
TeaLeafTextHelper.ToSnakeCase("MyProperty"); // "my_property"

// String quoting
TeaLeafTextHelper.NeedsQuoting("hello world"); // true
TeaLeafTextHelper.QuoteIfNeeded("hello world"); // "\"hello world\""
TeaLeafTextHelper.EscapeString("line\nnewline"); // "line\\nnewline"

// Value formatting
var sb = new StringBuilder();
TeaLeafTextHelper.AppendValue(sb, 42, typeof(int)); // "42"
TeaLeafTextHelper.AppendValue(sb, null, typeof(string)); // "~"

Performance Considerations

The reflection serializer uses System.Reflection at runtime, which is slower than the source generator approach. For hot paths or high-throughput scenarios, prefer the source generator.

However, the actual binary compilation and native operations are identical – both approaches use the same native Rust library under the hood. The performance difference is only in the C# serialization/deserialization layer.

Native Types

The managed wrapper types provide safe access to the native TeaLeaf library. All native types implement IDisposable and must be disposed to prevent memory leaks.

TLDocument

Represents a parsed TeaLeaf document.

Construction

// Parse text
using var doc = TLDocument.Parse("name: alice\nage: 30");

// Parse from file (text or binary -- auto-detected)
using var doc = TLDocument.ParseFile("data.tl");
using var doc = TLDocument.ParseFile("data.tlbx");

// From JSON string
using var doc = TLDocument.FromJson("{\"name\": \"alice\"}");

Value Access

// Get value by key
using var val = doc["name"];       // indexer
using var val = doc.Get("name");   // method

// Get all keys
string[] keys = doc.Keys;

Output

// To text
string text = doc.ToText();               // full document (schemas + data)
string data = doc.ToTextDataOnly();       // data only (no schemas)

// To JSON
string json = doc.ToJson();               // pretty-printed
string json = doc.ToJsonCompact();        // minified

// Compile to binary
doc.Compile("output.tlbx", compress: true);

Disposal

TLDocument wraps a native pointer. Always dispose:

using var doc = TLDocument.Parse(text);  // using statement (recommended)

// Or manual disposal
var doc = TLDocument.Parse(text);
try { /* use doc */ }
finally { doc.Dispose(); }

TLValue

Represents any TeaLeaf value with type-safe accessors.

Type Checking

TLType type = value.Type;    // Enum: Null, Bool, Int, UInt, Float, String, etc.
bool isNull = value.IsNull;  // Shorthand for Type == TLType.Null

Primitive Accessors

Each returns null if the value is not the expected type:

bool? b = value.AsBool();
long? i = value.AsInt();
ulong? u = value.AsUInt();
double? f = value.AsFloat();
string? s = value.AsString();
long? ts = value.AsTimestamp();          // Unix milliseconds
short? tz = value.AsTimestampOffset();  // Timezone offset in minutes (0 = UTC)
DateTimeOffset? dt = value.AsDateTime(); // Converted from timestamp (preserves offset)
byte[]? bytes = value.AsBytes();

Object Access

string[] keys = value.ObjectKeys;          // All field names
using var field = value.GetField("name");  // Get by key
using var field = value["name"];           // Indexer shorthand

Array Access

int len = value.ArrayLength;
using var elem = value.GetArrayElement(0); // By index
using var elem = value[0];                 // Indexer shorthand

foreach (var item in value.AsArray())
{
    // item is a TLValue -- caller must dispose
    using (item)
    {
        Console.WriteLine(item.AsString());
    }
}

Map Access

int len = value.MapLength;
using var key = value.GetMapKey(0);
using var val = value.GetMapValue(0);

foreach (var (k, v) in value.AsMap())
{
    using (k) using (v)
    {
        Console.WriteLine($"{k.AsString()}: {v.AsString()}");
    }
}

Reference and Tag Access

string? refName = value.AsRefName();   // For Ref values
string? tagName = value.AsTagName();   // For Tagged values
using var inner = value.AsTagValue();  // Inner value of a Tagged

Dynamic Conversion

object? obj = value.ToObject();
// Returns: bool, long, ulong, double, string, byte[],
// DateTimeOffset, object[], Dictionary<string, object?>, or null

TLReader

Binary file reader with optional memory-mapped I/O.

Construction

// Standard file read
using var reader = TLReader.Open("data.tlbx");

// Memory-mapped (recommended for large files)
using var reader = TLReader.OpenMmap("data.tlbx");

Value Access

string[] keys = reader.Keys;
using var val = reader["users"];
using var val = reader.Get("users");

Schema Introspection

foreach (var schema in reader.Schemas)
{
    Console.WriteLine($"Schema: {schema.Name}");
    foreach (var field in schema.Fields)
    {
        Console.WriteLine($"  {field.Name}: {(field.IsArray ? "[]" : "")}{field.Type}{(field.IsNullable ? "?" : "")}");
    }
}

// Look up a specific schema by name
var userSchema = reader.GetSchema("user");
if (userSchema != null)
{
    Console.WriteLine($"user has {userSchema.Fields.Count} fields");
}

TLType Enum

public enum TLType
{
    Null = 0,
    Bool = 1,
    Int = 2,
    UInt = 3,
    Float = 4,
    String = 5,
    Bytes = 6,
    Array = 7,
    Object = 8,
    Map = 9,
    Ref = 10,
    Tagged = 11,
    Timestamp = 12,
}

Memory Management

All native types (TLDocument, TLValue, TLReader) hold native pointers and must be disposed:

// Preferred: using statement
using var doc = TLDocument.Parse(text);

// For values from collections, dispose each item:
foreach (var item in value.AsArray())
{
    using (item)
    {
        // process
    }
}

// For map entries:
foreach (var (key, val) in value.AsMap())
{
    using (key) using (val)
    {
        // process
    }
}

Accessing a disposed object throws ObjectDisposedException.

Diagnostics

The TeaLeaf source generator reports diagnostics (warnings and errors) through the standard C# compiler diagnostic system.

Diagnostic Codes

| Code | Severity | Message |
|---|---|---|
| TL001 | Error | Type must be declared as partial |
| TL002 | Warning | Unsupported property type |
| TL003 | Error | Invalid TLType attribute value |
| TL004 | Warning | Nested type not annotated with [TeaLeaf] |
| TL005 | Warning | Circular type reference detected |
| TL006 | Error | Open generic types are not supported |

TL001: Type Must Be Partial

The source generator needs to add methods to your class. This requires the partial modifier.

// ERROR: TL001
[TeaLeaf]
public class User { }  // Missing 'partial'

// FIXED
[TeaLeaf]
public partial class User { }

TL002: Unsupported Property Type

A property type isn’t directly mappable to a TeaLeaf type.

[TeaLeaf]
public partial class Config
{
    public IntPtr NativeHandle { get; set; }  // WARNING: TL002
}

The property will be skipped. Supported types include all primitives, string, DateTime, DateTimeOffset, byte[], List<T>, Dictionary<string, T>, enums, and other [TeaLeaf]-annotated classes.

TL003: Invalid TLType Value

The [TLType] attribute was given an unrecognized type name.

[TeaLeaf]
public partial class Event
{
    [TLType("datetime")]   // ERROR: TL003 -- "datetime" is not a valid type
    public long Created { get; set; }

    [TLType("timestamp")]  // CORRECT
    public long Updated { get; set; }
}

Valid values: bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float, float32, float64, string, bytes, timestamp.

TL004: Nested Type Not Annotated

A property references a class type that doesn’t have the [TeaLeaf] attribute.

public class Address  // Missing [TeaLeaf]
{
    public string City { get; set; } = "";
}

[TeaLeaf]
public partial class User
{
    public Address Home { get; set; } = new();  // WARNING: TL004
}

Fix by adding [TeaLeaf] to the nested type:

[TeaLeaf]
public partial class Address
{
    public string City { get; set; } = "";
}

TL005: Circular Type Reference

A type references itself (directly or transitively), which may cause a stack overflow at runtime during serialization.

[TeaLeaf]
public partial class TreeNode
{
    public string Name { get; set; } = "";
    public TreeNode? Child { get; set; }  // WARNING: TL005 -- circular reference
}

The code will still compile, but recursive structures must be bounded (e.g., use [TLOptional] with null termination) to avoid infinite recursion.

TL006: Open Generic Types

Generic type parameters are not supported:

// ERROR: TL006
[TeaLeaf]
public partial class Container<T>
{
    public T Value { get; set; }
}

Use concrete types instead. For generic scenarios, use the Reflection Serializer.

Viewing Diagnostics

Diagnostics appear in:

  • Visual Studio – Error List window
  • VS Code – Problems panel (with C# extension)
  • dotnet build – terminal output
  • MSBuild – build log

Example compiler output:

User.cs(3,22): error TL001: TeaLeaf type 'User' must be declared as partial
Config.cs(8,16): warning TL004: Property 'Address' type is not annotated with [TeaLeaf]

Platform Support

The TeaLeaf NuGet package includes pre-built native libraries for all major platforms.

Supported Platforms

| OS | Architecture | Native Library | Status |
|---|---|---|---|
| Windows | x64 | tealeaf_ffi.dll | Supported |
| Windows | ARM64 | tealeaf_ffi.dll | Supported |
| Linux | x64 (glibc) | libtealeaf_ffi.so | Supported |
| Linux | ARM64 (glibc) | libtealeaf_ffi.so | Supported |
| macOS | x64 (Intel) | libtealeaf_ffi.dylib | Supported |
| macOS | ARM64 (Apple Silicon) | libtealeaf_ffi.dylib | Supported |

.NET Requirements

  • .NET 8.0 or later
  • C# compiler with incremental source generator support (for the source generator)

NuGet Package Structure

The NuGet package bundles native libraries for all platforms using the runtimes folder convention:

TeaLeaf.nupkg
├── lib/net8.0/
│   ├── TeaLeaf.dll
│   ├── TeaLeaf.Annotations.dll
│   └── TeaLeaf.Generators.dll
└── runtimes/
    ├── win-x64/native/tealeaf_ffi.dll
    ├── win-arm64/native/tealeaf_ffi.dll
    ├── linux-x64/native/libtealeaf_ffi.so
    ├── linux-arm64/native/libtealeaf_ffi.so
    ├── osx-x64/native/libtealeaf_ffi.dylib
    └── osx-arm64/native/libtealeaf_ffi.dylib

The .NET runtime automatically selects the correct native library based on the host platform.

Native Library Loading

The managed layer uses [DllImport("tealeaf_ffi")] for P/Invoke. The .NET runtime resolves the native library through:

  1. NuGet runtimes folder – automatic for published apps
  2. Application directory – for self-contained deployments
  3. System library pathPATH (Windows), LD_LIBRARY_PATH (Linux), DYLD_LIBRARY_PATH (macOS)

Deployment

Framework-Dependent

dotnet publish -c Release

The native library is copied to the output directory automatically.

Self-Contained

dotnet publish -c Release --self-contained -r win-x64
dotnet publish -c Release --self-contained -r linux-x64
dotnet publish -c Release --self-contained -r osx-arm64

Docker

For Linux containers, use the appropriate runtime:

FROM mcr.microsoft.com/dotnet/runtime:8.0
# Native library is included in the publish output
COPY --from=build /app/publish .

Building Native Libraries from Source

If you need a platform not included in the NuGet package:

# Clone the repository
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf

# Build the FFI library
cargo build --release --package tealeaf-ffi

# Output location
# Windows: target/release/tealeaf_ffi.dll
# Linux:   target/release/libtealeaf_ffi.so
# macOS:   target/release/libtealeaf_ffi.dylib

Place the built library in your application directory or system library path.

Troubleshooting

DllNotFoundException

The native library could not be found. Check:

  1. The package includes your platform (dotnet --info to check RID)
  2. For self-contained apps, ensure the correct -r flag is used
  3. For manual builds, ensure the library is in the application directory

BadImageFormatException

Architecture mismatch between the .NET runtime and native library. Ensure both are the same architecture (x64/ARM64).

EntryPointNotFoundException

Version mismatch between the managed and native libraries. Ensure both are from the same release.

FFI Reference: Overview

The tealeaf-ffi crate exposes a C-compatible API for integrating TeaLeaf into any language that supports C FFI (Foreign Function Interface).

Architecture

┌──────────────────────┐
│  Host Language       │
│  (.NET, Python, etc.)│
├──────────────────────┤
│  FFI Bindings        │
│  (P/Invoke, ctypes)  │
├──────────────────────┤
│  tealeaf_ffi         │  ← C ABI library
│  (cdylib + staticlib)│
├──────────────────────┤
│  tealeaf-core        │  ← Rust core library
└──────────────────────┘

The FFI layer provides:

  • Document parsing – parse text, files, and JSON
  • Value access – type-safe accessors for all value types
  • Binary reader – read .tlbx files with optional memory mapping
  • Schema introspection – query schema structure at runtime
  • JSON conversion – to/from JSON
  • Binary compilation – compile documents to .tlbx
  • Error handling – thread-local last-error pattern
  • Memory management – explicit free functions for all allocated resources

Output Libraries

The crate builds both dynamic and static libraries:

| Platform | Dynamic Library | Static Library |
|---|---|---|
| Windows | tealeaf_ffi.dll | tealeaf_ffi.lib |
| Linux | libtealeaf_ffi.so | libtealeaf_ffi.a |
| macOS | libtealeaf_ffi.dylib | libtealeaf_ffi.a |

C Header

The build generates a C header via cbindgen:

#include "tealeaf_ffi.h"

// Parse a document
TLDocument* doc = tl_parse("name: alice\nage: 30");
if (!doc) {
    char* err = tl_get_last_error();
    fprintf(stderr, "Error: %s\n", err);
    tl_string_free(err);
    return 1;
}

// Access a value
TLValue* val = tl_document_get(doc, "name");
if (val && tl_value_type(val) == TL_STRING) {
    char* name = tl_value_as_string(val);
    printf("Name: %s\n", name);
    tl_string_free(name);
}

tl_value_free(val);
tl_document_free(doc);

Opaque Types

The FFI uses opaque pointer types:

| Type | Description |
|---|---|
| TLDocument* | Parsed document handle |
| TLValue* | Value handle (any type) |
| TLReader* | Binary file reader handle |

All handles must be freed with their corresponding _free function.

Error Model

TeaLeaf FFI uses the thread-local last-error pattern:

  1. Functions that can fail return NULL (pointers) or a result struct
  2. On failure, the error message is stored in thread-local storage
  3. Call tl_get_last_error() to retrieve it
  4. Call tl_clear_error() to clear it

TLDocument* doc = tl_parse("invalid {");
if (!doc) {
    char* err = tl_get_last_error();
    // err contains the parse error message
    tl_string_free(err);
}

Null Safety

All FFI functions that accept pointers are null-safe:

  • Passing NULL returns a safe default (0, false, NULL) rather than crashing
  • This makes it safe to chain calls without checking each one


FFI API Reference

Complete listing of all exported FFI functions.

Error Handling

tl_get_last_error

char* tl_get_last_error(void);

Returns the last error message, or NULL if no error. Caller must free with tl_string_free.

tl_clear_error

void tl_clear_error(void);

Clears the thread-local error state.

Version

tl_version

const char* tl_version(void);

Returns the library version string (e.g., "2.0.0-beta.8"). The returned pointer is static – do not free it.

Document API

tl_parse

TLDocument* tl_parse(const char* text);

Parse a TeaLeaf text string. Returns NULL on failure (check tl_get_last_error).

tl_parse_file

TLDocument* tl_parse_file(const char* path);

Parse a TeaLeaf text file. Returns NULL on failure.

tl_document_free

void tl_document_free(TLDocument* doc);

Free a document. Safe to call with NULL.

tl_document_get

TLValue* tl_document_get(const TLDocument* doc, const char* key);

Get a value by key. Returns NULL if key not found or doc is NULL. Caller must free with tl_value_free.

tl_document_keys

char** tl_document_keys(const TLDocument* doc);

Get all top-level keys as a NULL-terminated array. Caller must free with tl_string_array_free.

tl_document_to_text

char* tl_document_to_text(const TLDocument* doc);

Convert document to TeaLeaf text (with schemas). Caller must free with tl_string_free.

tl_document_to_text_data_only

char* tl_document_to_text_data_only(const TLDocument* doc);

Convert document to TeaLeaf text (data only, no schemas). Caller must free with tl_string_free.

tl_document_compile

TLResult tl_document_compile(const TLDocument* doc, const char* path, bool compress);

Compile document to binary file. Returns a TLResult indicating success or failure.

JSON API

tl_document_from_json

TLDocument* tl_document_from_json(const char* json);

Parse a JSON string into a TLDocument. Returns NULL on failure.

tl_document_to_json

char* tl_document_to_json(const TLDocument* doc);

Convert document to pretty-printed JSON. Caller must free with tl_string_free.

tl_document_to_json_compact

char* tl_document_to_json_compact(const TLDocument* doc);

Convert document to minified JSON. Caller must free with tl_string_free.

Value API

tl_value_type

TLValueType tl_value_type(const TLValue* value);

Get the type of a value. Returns TL_NULL (0) if value is NULL.

tl_value_free

void tl_value_free(TLValue* value);

Free a value. Safe to call with NULL.

Primitive Accessors

bool    tl_value_as_bool(const TLValue* value);       // false if not bool
int64_t tl_value_as_int(const TLValue* value);        // 0 if not int
uint64_t tl_value_as_uint(const TLValue* value);      // 0 if not uint
double  tl_value_as_float(const TLValue* value);      // 0.0 if not float
char*   tl_value_as_string(const TLValue* value);     // NULL if not string; free with tl_string_free
int64_t tl_value_as_timestamp(const TLValue* value);  // 0 if not timestamp (millis only)
int16_t tl_value_as_timestamp_offset(const TLValue* value); // 0 if not timestamp (tz offset in minutes)

Bytes Accessors

size_t       tl_value_bytes_len(const TLValue* value);   // 0 if not bytes
const uint8_t* tl_value_bytes_data(const TLValue* value); // NULL if not bytes; pointer valid while value lives

Reference/Tag Accessors

char*    tl_value_ref_name(const TLValue* value);    // NULL if not ref; free with tl_string_free
char*    tl_value_tag_name(const TLValue* value);    // NULL if not tagged; free with tl_string_free
TLValue* tl_value_tag_value(const TLValue* value);   // NULL if not tagged; free with tl_value_free

Array Accessors

size_t   tl_value_array_len(const TLValue* value);                 // 0 if not array
TLValue* tl_value_array_get(const TLValue* value, size_t index);   // NULL if out of bounds; free with tl_value_free

Object Accessors

TLValue* tl_value_object_get(const TLValue* value, const char* key); // NULL if not found; free with tl_value_free
char**   tl_value_object_keys(const TLValue* value);                  // NULL-terminated; free with tl_string_array_free

Map Accessors

size_t   tl_value_map_len(const TLValue* value);                     // 0 if not map
TLValue* tl_value_map_get_key(const TLValue* value, size_t index);   // NULL if out of bounds; free with tl_value_free
TLValue* tl_value_map_get_value(const TLValue* value, size_t index); // NULL if out of bounds; free with tl_value_free

Binary Reader API

tl_reader_open

TLReader* tl_reader_open(const char* path);

Open a binary file for reading. Returns NULL on failure.

tl_reader_open_mmap

TLReader* tl_reader_open_mmap(const char* path);

Open a binary file with memory-mapped I/O (zero-copy). Returns NULL on failure.

tl_reader_free

void tl_reader_free(TLReader* reader);

Free a reader. Safe to call with NULL.

tl_reader_get

TLValue* tl_reader_get(const TLReader* reader, const char* key);

Get a value by key from binary. Returns NULL if not found. Caller must free with tl_value_free.

tl_reader_keys

char** tl_reader_keys(const TLReader* reader);

Get all section keys. Returns NULL-terminated array. Free with tl_string_array_free.

Schema API

size_t tl_reader_schema_count(const TLReader* reader);
char*  tl_reader_schema_name(const TLReader* reader, size_t index);
size_t tl_reader_schema_field_count(const TLReader* reader, size_t schema_index);
char*  tl_reader_schema_field_name(const TLReader* reader, size_t schema_index, size_t field_index);
char*  tl_reader_schema_field_type(const TLReader* reader, size_t schema_index, size_t field_index);
bool   tl_reader_schema_field_nullable(const TLReader* reader, size_t schema_index, size_t field_index);
bool   tl_reader_schema_field_is_array(const TLReader* reader, size_t schema_index, size_t field_index);

All char* returns from schema functions must be freed with tl_string_free. Out-of-bounds indices return NULL/0/false.

Memory Management

tl_string_free

void tl_string_free(char* s);

Free a string returned by any FFI function. Safe to call with NULL.

tl_string_array_free

void tl_string_array_free(char** arr);

Free a NULL-terminated string array. Frees each string and the array pointer. Safe to call with NULL.

tl_result_free

void tl_result_free(TLResult* result);

Free any allocated memory inside a TLResult. Safe to call with NULL.

Type Enum

typedef enum {
    TL_NULL      = 0,
    TL_BOOL      = 1,
    TL_INT       = 2,
    TL_UINT      = 3,
    TL_FLOAT     = 4,
    TL_STRING    = 5,
    TL_BYTES     = 6,
    TL_ARRAY     = 7,
    TL_OBJECT    = 8,
    TL_MAP       = 9,
    TL_REF       = 10,
    TL_TAGGED    = 11,
    TL_TIMESTAMP = 12,
} TLValueType;

Memory Management

The FFI layer uses explicit manual memory management. Understanding ownership rules is critical for writing correct bindings.

Ownership Rules

Rule 1: Caller Owns Returned Pointers

Every function that returns a heap-allocated pointer transfers ownership to the caller. The caller must free it with the appropriate function:

| Return Type | Free Function |
|---|---|
| TLDocument* | tl_document_free() |
| TLValue* | tl_value_free() |
| TLReader* | tl_reader_free() |
| char* | tl_string_free() |
| char** | tl_string_array_free() |
| TLResult | tl_result_free() |

Rule 2: Borrowed Pointers Are Read-Only

Functions that take const T* parameters borrow the pointer. The FFI layer does not take ownership or free inputs:

// doc is borrowed -- you still own it and must free it later
TLValue* val = tl_document_get(doc, "key");
// ... use val ...
tl_value_free(val);  // free the returned value
tl_document_free(doc);  // free the document separately

Rule 3: Null Is Always Safe

Every free function and every accessor accepts NULL safely:

tl_document_free(NULL);  // no-op
tl_value_free(NULL);     // no-op
tl_string_free(NULL);    // no-op

TLValue* val = tl_document_get(NULL, "key");  // returns NULL
bool b = tl_value_as_bool(NULL);              // returns false

Common Patterns

Parse → Use → Free

TLDocument* doc = tl_parse("name: alice");
if (doc) {
    TLValue* name = tl_document_get(doc, "name");
    if (name) {
        char* str = tl_value_as_string(name);
        if (str) {
            printf("%s\n", str);
            tl_string_free(str);
        }
        tl_value_free(name);
    }
    tl_document_free(doc);
}

Iterating Arrays

TLValue* arr = tl_document_get(doc, "items");
size_t len = tl_value_array_len(arr);

for (size_t i = 0; i < len; i++) {
    TLValue* elem = tl_value_array_get(arr, i);
    // use elem...
    tl_value_free(elem);  // free each element
}

tl_value_free(arr);  // free the array value

Iterating Object Keys

TLValue* obj = tl_document_get(doc, "config");
char** keys = tl_value_object_keys(obj);

if (keys) {
    for (int i = 0; keys[i] != NULL; i++) {
        TLValue* val = tl_value_object_get(obj, keys[i]);
        // use val...
        tl_value_free(val);
    }
    tl_string_array_free(keys);  // frees all strings AND the array
}

tl_value_free(obj);

Iterating Maps

TLValue* map = tl_document_get(doc, "headers");
size_t len = tl_value_map_len(map);

for (size_t i = 0; i < len; i++) {
    TLValue* key = tl_value_map_get_key(map, i);
    TLValue* val = tl_value_map_get_value(map, i);

    char* k = tl_value_as_string(key);
    char* v = tl_value_as_string(val);
    printf("%s: %s\n", k, v);

    tl_string_free(k);
    tl_string_free(v);
    tl_value_free(key);
    tl_value_free(val);
}

tl_value_free(map);

String Arrays

char** keys = tl_document_keys(doc);
if (keys) {
    for (int i = 0; keys[i] != NULL; i++) {
        printf("Key: %s\n", keys[i]);
    }
    tl_string_array_free(keys);  // ONE call frees everything
}

Bytes Data

The tl_value_bytes_data function returns a borrowed pointer valid only while the value lives:

TLValue* val = tl_document_get(doc, "data");
size_t len = tl_value_bytes_len(val);
const uint8_t* data = tl_value_bytes_data(val);

// Copy if you need the data after freeing the value
uint8_t* copy = malloc(len);
memcpy(copy, data, len);

tl_value_free(val);  // data pointer is now invalid
// copy is still valid

Error Strings

Error strings are owned by the caller:

char* err = tl_get_last_error();
if (err) {
    fprintf(stderr, "Error: %s\n", err);
    tl_string_free(err);  // must free
}

Common Mistakes

| Mistake | Consequence | Fix |
|---|---|---|
| Not freeing returned pointers | Memory leak | Always pair creation with _free |
| Using pointer after free | Use-after-free / crash | Set pointer to NULL after free |
| Freeing borrowed bytes_data pointer | Double-free / crash | Only free the value with tl_value_free |
| Calling wrong free function | Undefined behavior | Match the free to the allocation type |
| Freeing strings from string_array individually | Double-free | Use tl_string_array_free once |

Building from Source

How to build the TeaLeaf FFI library from source.

Prerequisites

  • Rust toolchain (1.70+)
  • A C compiler (for cbindgen header generation)

Build

git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
cargo build --release --package tealeaf-ffi

Output Files

| Platform | Dynamic Library | Static Library |
|---|---|---|
| Windows | target/release/tealeaf_ffi.dll | target/release/tealeaf_ffi.lib |
| Linux | target/release/libtealeaf_ffi.so | target/release/libtealeaf_ffi.a |
| macOS | target/release/libtealeaf_ffi.dylib | target/release/libtealeaf_ffi.a |

C Header

The build generates a C header via cbindgen (configured in tealeaf-ffi/cbindgen.toml):

# Header is generated during build
# Location: target/tealeaf_ffi.h (or as configured)

Cross-Compilation

Linux ARM64

# Install cross-compilation tools
sudo apt install gcc-aarch64-linux-gnu
rustup target add aarch64-unknown-linux-gnu

# Build
cargo build --release --package tealeaf-ffi --target aarch64-unknown-linux-gnu

Windows ARM64

rustup target add aarch64-pc-windows-msvc
cargo build --release --package tealeaf-ffi --target aarch64-pc-windows-msvc

macOS (from any platform via cross)

# Using cross (https://github.com/cross-rs/cross)
cargo install cross
cross build --release --package tealeaf-ffi --target aarch64-apple-darwin
cross build --release --package tealeaf-ffi --target x86_64-apple-darwin

Linking

Dynamic Linking

# GCC/Clang
gcc -o myapp myapp.c -L/path/to/lib -ltealeaf_ffi

# MSVC
cl myapp.c /link tealeaf_ffi.lib

At runtime, ensure the dynamic library is in the library search path.

Static Linking

# GCC/Clang (Linux)
gcc -o myapp myapp.c /path/to/libtealeaf_ffi.a -lpthread -ldl -lm

# macOS
gcc -o myapp myapp.c /path/to/libtealeaf_ffi.a -framework Security -lpthread

Static linking eliminates the runtime dependency but produces a larger binary.

Dependencies

The FFI crate has minimal dependencies:

[dependencies]
tealeaf-core = { workspace = true }

[build-dependencies]
cbindgen = "0.27"

The resulting library links against:

  • Linux: libpthread, libdl, libm
  • macOS: Security.framework, libpthread
  • Windows: standard Windows system libraries

Writing New Language Bindings

To create bindings for a new language:

  1. Generate or write FFI declarations matching the C header
  2. Load the dynamic library (or link statically)
  3. Wrap opaque pointers in your language’s resource management (destructors, Dispose, __del__, etc.)
  4. Map the error model – check for NULL returns and call tl_get_last_error
  5. Handle string ownership – copy strings to your language’s string type, then free the C string

Example: Python (ctypes)

import ctypes

lib = ctypes.CDLL("libtealeaf_ffi.so")

# Define function signatures
lib.tl_parse.restype = ctypes.c_void_p
lib.tl_parse.argtypes = [ctypes.c_char_p]

lib.tl_document_get.restype = ctypes.c_void_p
lib.tl_document_get.argtypes = [ctypes.c_void_p, ctypes.c_char_p]

lib.tl_value_as_string.restype = ctypes.c_char_p
lib.tl_value_as_string.argtypes = [ctypes.c_void_p]

# Use it
doc = lib.tl_parse(b"name: alice")
val = lib.tl_document_get(doc, b"name")
name = lib.tl_value_as_string(val)
print(name.decode())  # "alice"

lib.tl_value_free(val)
lib.tl_document_free(doc)

Testing

# Run FFI tests
cargo test --package tealeaf-ffi

# Run all workspace tests
cargo test --workspace

LLM Context Engineering

TeaLeaf’s primary use case is context engineering for Large Language Model applications. This guide explains why and how.

The Problem

LLM context windows are limited and expensive. Typical structured data (tool definitions, conversation history, user profiles) consumes tokens proportional to format verbosity:

{
  "messages": [
    {"role": "user", "content": "Hello", "tokens": 2},
    {"role": "assistant", "content": "Hi there!", "tokens": 3},
    {"role": "user", "content": "What's the weather?", "tokens": 5},
    {"role": "assistant", "content": "Let me check...", "tokens": 4}
  ]
}

Every message repeats "role", "content", "tokens". With 50+ messages, this overhead adds up.

The TeaLeaf Approach

@struct Message (role: string, content: string, tokens: int?)

messages: @table Message [
  (user, Hello, 2),
  (assistant, "Hi there!", 3),
  (user, "What's the weather?", 5),
  (assistant, "Let me check...", 4),
]

Field names defined once. Data is positional. For 50 messages, this saves ~40% in text size and ~80% in binary.

Context Assembly Pattern

Define Schemas for Your Context

@struct Tool (name: string, description: string, params: []string)
@struct Message (role: string, content: string, tokens: int?)
@struct UserProfile (id: int, name: string, preferences: []string)

system_prompt: """
  You are a helpful assistant with access to the user's profile
  and conversation history. Use the tools when appropriate.
"""

user: @table UserProfile [
  (42, "Alice", ["concise_responses", "code_examples"]),
]

tools: @table Tool [
  (search, "Search the web for information", ["query"]),
  (calculate, "Evaluate a mathematical expression", ["expression"]),
  (weather, "Get current weather for a location", ["city", "country"]),
]

history: @table Message [
  (user, Hello, 2),
  (assistant, "Hi there! How can I help?", 7),
]

Binary Caching

Compiled .tlbx files make excellent context caches:

#![allow(unused)]
fn main() {
use tealeaf::{TeaLeafBuilder, ToTeaLeaf, Value};

// Build context document
let doc = TeaLeafBuilder::new()
    .add_value("system_prompt", Value::String(system_prompt))
    .add_vec("tools", &tools)
    .add_vec("history", &messages)
    .add("user", &user_profile)
    .build();

// Cache as binary (fast to read back)
doc.compile("context_cache.tlbx", true)?;

// Later: load instantly from binary
let cached = tealeaf::Reader::open("context_cache.tlbx")?;
}

Sending to LLM

Convert to text for LLM consumption:

#![allow(unused)]
fn main() {
let doc = TeaLeaf::load("context.tl")?;
let context_text = doc.to_tl_with_schemas();
// Send context_text as part of the prompt
}

Or convert specific sections:

#![allow(unused)]
fn main() {
let doc = TeaLeaf::load("context.tl")?;
let json = doc.to_json()?;
// Use JSON for APIs that expect it
}

Size Comparison: Real-World Context

For a typical LLM context with 50 messages, 10 tools, and a user profile:

| Format | Approximate Size |
|---|---|
| JSON | ~15 KB |
| TeaLeaf Text | ~8 KB |
| TeaLeaf Binary | ~4 KB |
| TeaLeaf Binary (compressed) | ~3 KB |

Token savings are significant but less than byte savings. BPE tokenizers partially compress repeated JSON field names, so byte savings overstate token savings by 5-18 percentage points depending on data repetitiveness. For typical structured data, expect ~36% fewer data tokens (median), with savings increasing for larger and more structured datasets.

Token Comparison (verified via OpenAI tokenizer)

| Dataset | JSON tokens | TeaLeaf tokens | Savings |
|---|---|---|---|
| Healthcare records | 903 | 572 | 37% |
| Retail orders | 9,829 | 5,632 | 43% |

At the API level, prompt instructions are identical for both formats, diluting data-only savings (~36%) to ~30% of total input tokens.
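
As a worked illustration (hypothetical numbers): with 1,000 instruction tokens and 5,000 data tokens, a 36% data-token saving removes 1,800 tokens, and 1,800 / 6,000 = 30% of the total input.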

Structured Outputs

LLMs can also produce TeaLeaf-formatted responses:

@struct Insight (category: string, finding: string, confidence: float)

analysis: @table Insight [
  (revenue, "Q4 revenue grew 15% YoY", 0.92),
  (churn, "Customer churn decreased by 3%", 0.87),
  (forecast, "Projected 20% growth in Q1", 0.73),
]

This can then be parsed and processed programmatically:

#![allow(unused)]
fn main() {
let response = TeaLeaf::parse(&llm_output)?;
if let Some(Value::Array(insights)) = response.get("analysis") {
    for insight in insights {
        // Process each structured insight
    }
}
}

Best Practices

  1. Define schemas for all structured context – tool definitions, messages, profiles
  2. Use @table for arrays of uniform objects – conversation history, search results
  3. Cache compiled binary for frequently-used context segments
  4. Use text format for LLM input – models understand the schema notation
  5. String deduplication helps when context has repetitive strings (roles, tool names)
  6. Separate static and dynamic context – compile static context once, merge at runtime

Benchmark Results

The accuracy-benchmark suite tests 12 tasks across 10 business domains on Claude Sonnet 4.5 and GPT-5.2:

  • ~36% fewer data tokens compared to JSON (savings increase with larger datasets)
  • No accuracy loss – scores within noise across all providers
  • See the benchmark README for full methodology and results.

Schema Evolution

TeaLeaf takes a deliberately simple approach to schema evolution: when schemas change, recompile.

Design Philosophy

  • No migration machinery – no schema versioning or compatibility negotiation
  • Source file is master – the .tl file defines the current schema
  • Explicit over implicit – tuples require values for all fields
  • Binary is a compiled artifact – regenerate it like you would a compiled binary

Compatible Changes

These changes do not require recompilation of existing binary files:

Rename Fields

Field data is stored positionally. Names are documentation only:

# Before
@struct user (name: string, email: string)

# After -- binary still works
@struct user (full_name: string, email_address: string)

Widen Types

Automatic safe widening when reading:

# Before: field was int8
@struct sensor (id: int8, reading: float32)

# After: widened to int32 -- readers auto-widen
@struct sensor (id: int, reading: float)

Widening path: int8 → int16 → int32 → int64, float32 → float64

Incompatible Changes

These changes require recompilation from the .tl source:

Add a Field

# Before
@struct user (id: int, name: string)

# After -- added email field
@struct user (id: int, name: string, email: string?)

Old binary files won’t have the new field. Recompile:

tealeaf compile users.tl -o users.tlbx

Remove a Field

# Before
@struct user (id: int, name: string, legacy_field: string)

# After -- removed legacy_field
@struct user (id: int, name: string)

Reorder Fields

Binary data is positional. Changing field order changes the meaning of stored data:

# Before
@struct point (x: int, y: int)

# After -- DON'T DO THIS without recompiling
@struct point (y: int, x: int)

Narrow Types

Narrowing (e.g., int64 → int8) can lose data:

# Before
@struct data (value: int64)

# After -- potential data loss
@struct data (value: int8)

Recompilation Workflow

When schemas change:

# 1. Edit the .tl source file
# 2. Validate
tealeaf validate data.tl

# 3. Recompile
tealeaf compile data.tl -o data.tlbx

# 4. Verify
tealeaf info data.tlbx

Migration Strategy

For applications that need to handle schema changes:

Approach 1: Version Keys

Use different top-level keys for different schema versions:

@struct user_v1 (id: int, name: string)
@struct user_v2 (id: int, name: string, email: string?, role: string)

# Old data
users_v1: @table user_v1 [(1, alice), (2, bob)]

# New data
users_v2: @table user_v2 [(3, carol, "carol@ex.com", admin)]

Approach 2: Application-Level Migration

Read old binary, transform in code, write new binary:

#![allow(unused)]
fn main() {
// Read old binary format
let old_doc = tealeaf::Reader::open("data_v1.tlbx")?;

// Transform
let new_doc = TeaLeafBuilder::new()
    .add_vec("users", &migrate_users(&old_doc.get("users")?))
    .build();

// Write new format
new_doc.compile("data_v2.tlbx", true)?;
}

Approach 3: Nullable Fields

Add new fields as nullable to maintain backward compatibility:

@struct user (
  id: int,
  name: string,
  email: string?,    # new field, nullable
  phone: string?,    # new field, nullable
)

Old data can have ~ for new fields. New data populates them.
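
A minimal sketch of how such mixed data reads back through the Rust API (the source string here is illustrative; TeaLeaf::parse is shown elsewhere in this book):

#![allow(unused)]
fn main() {
use tealeaf::TeaLeaf;

// Old rows use ~ for the new nullable fields; new rows populate them.
let src = r#"
@struct user (id: int, name: string, email: string?, phone: string?)
users: @table user [
  (1, alice, ~, ~),
  (2, bob, "bob@ex.com", ~),
]
"#;
let doc = TeaLeaf::parse(src).expect("valid TeaLeaf source");
// Missing fields decode as null values; populated fields keep their data.
}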

Comparison with Other Formats

| Aspect | TeaLeaf | Protobuf | Avro |
|---|---|---|---|
| Schema location | Inline in data file | External .proto | Embedded in binary |
| Adding fields | Recompile | Compatible (field numbers) | Compatible (defaults) |
| Removing fields | Recompile | Compatible (skip unknown) | Compatible (skip) |
| Migration tool | None (recompile) | protoc | Schema registry |
| Complexity | Low | Medium | High |

TeaLeaf trades automatic evolution for simplicity. If your use case requires frequent schema changes across distributed systems, consider Protobuf or Avro.

Performance

Performance characteristics of TeaLeaf across different operations.

Size Efficiency

Benchmark Results

| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Binary | 3.56x | 0.15x | 0.47x |

Analysis

  • Small objects: TeaLeaf has a 64-byte header overhead. For objects under ~200 bytes, JSON or MessagePack are more compact.
  • Large arrays: TeaLeaf’s string deduplication and schema-based compression shine. For 10K+ records, TeaLeaf achieves 6-7x better compression than JSON.
  • Medium datasets (1K records): TeaLeaf is competitive with Protobuf, with the advantage of embedded schemas.

Where Size Matters Most

| Scenario | Recommendation |
|---|---|
| < 100 bytes payload | Use MessagePack or raw JSON |
| 1-10 KB | TeaLeaf text or JSON (overhead amortized) |
| 10 KB - 1 MB | TeaLeaf binary with compression |
| > 1 MB | TeaLeaf binary with compression (best gains) |

Parse/Decode Speed

TeaLeaf’s dynamic key-based access is ~2-5x slower than Protobuf’s generated code:

| Operation | TeaLeaf | Protobuf | JSON (serde) |
|---|---|---|---|
| Parse text | Moderate | N/A | Fast |
| Decode binary | Moderate | Fast | N/A |
| Random key access | O(1) hash | O(1) field | O(n) parse |
| Full iteration | Moderate | Fast | Fast |

Why TeaLeaf Is Slower Than Protobuf

  1. Dynamic dispatch – TeaLeaf resolves fields by name at runtime; Protobuf uses generated code with known offsets
  2. String table lookup – each string access requires a table lookup
  3. Schema resolution – schema structure is parsed from binary at load time

When This Matters

  • Hot loops decoding millions of records → consider Protobuf
  • Cold reads or moderate throughput → TeaLeaf is fine
  • Size-constrained transmission → TeaLeaf’s smaller binary compensates for slower decode

Memory-Mapped Reading

For large binary files, use memory-mapped I/O:

#![allow(unused)]
fn main() {
// Rust
let reader = Reader::open_mmap("large_file.tlbx")?;
}

// .NET
using var reader = TLReader.OpenMmap("large_file.tlbx");

Benefits:

  • No upfront allocation – data loaded on demand by the OS
  • Shared pages – multiple processes can read the same file
  • Lazy loading – only accessed sections are read from disk

Compilation Performance

Compiling .tl to .tlbx:

| Input Size | Compile Time (approximate) |
|---|---|
| 1 KB | < 1 ms |
| 100 KB | ~10 ms |
| 1 MB | ~100 ms |
| 10 MB | ~1 second |

Compression adds ~20-50% to compile time but can reduce output size by 50-90%.

Optimization Tips

1. Use Schemas for Tabular Data

Schema-bound @table data gets optimal encoding:

  • Positional storage (no field name repetition)
  • Null bitmaps (1 bit per nullable field vs full null markers)
  • Type-homogeneous arrays

2. Enable Compression for Large Files

Compression is most effective for:

  • Sections larger than 64 bytes
  • Data with repeated string values
  • Numeric arrays with patterns

tealeaf compile data.tl -o data.tlbx  # compression on by default

3. Use Binary Format for Storage

Text is for authoring; binary is for storage and transmission:

Text (.tl) → Author, review, version control
Binary (.tlbx) → Deploy, cache, transmit

4. Cache Compiled Binary

For data that’s read frequently but written rarely:

#![allow(unused)]
fn main() {
// Compile once
doc.compile("cache.tlbx", true)?;

// Read many times (fast)
let reader = Reader::open_mmap("cache.tlbx")?;
}

5. Minimize String Diversity

String deduplication works best when values repeat:

  • Enum-like fields ("active", "inactive") → deduplicated
  • UUIDs or timestamps → each is unique, no deduplication benefit

6. Use the Right Integer Sizes

The writer auto-selects the smallest representation, but schema types guide encoding:

@struct sensor (
  id: uint16,       # 2 bytes instead of 4
  reading: float32, # 4 bytes instead of 8
  flags: uint8,     # 1 byte instead of 4
)

Round-Trip Fidelity

Understanding which conversion paths preserve data perfectly and where information can be lost.

Round-Trip Matrix

| Path | Data Preserved | Lost |
|---|---|---|
| .tl → .tlbx → .tl | All data and schemas | Comments, formatting |
| .tl → .json → .tl | Basic types (string, number, bool, null, array, object) | Schemas, comments, refs, tags, maps, timestamps, bytes |
| .tl → .tlbx → .json | Same as .tl → .json | Same losses |
| .json → .tl → .json | All JSON-native types | (generally lossless) |
| .json → .tlbx → .json | All JSON-native types | (generally lossless) |
| .tlbx → .tlbx (recompile) | All data | (lossless) |

Lossless: Text ↔ Binary

The text-to-binary-to-text round-trip preserves all data and schema information:

tealeaf compile original.tl -o compiled.tlbx
tealeaf decompile compiled.tlbx -o roundtrip.tl
tealeaf compile roundtrip.tl -o roundtrip.tlbx
# compiled.tlbx and roundtrip.tlbx contain equivalent data

What’s lost:

  • Comments (stripped during compilation)
  • Whitespace and formatting
  • The decompiled output may have different formatting than the original

What’s preserved:

  • All schemas (@struct definitions)
  • All values (every type)
  • Key ordering
  • Schema-typed data (table structure)

Lossy: TeaLeaf → JSON

JSON cannot represent all TeaLeaf types. The following conversions are one-way:

Timestamps → Strings

created: 2024-01-15T10:30:00Z

JSON output:

{"created": "2024-01-15T10:30:00.000Z"}

Reimporting: the ISO 8601 string becomes a plain String, not a Timestamp.

Maps → Arrays

headers: @map {200: "OK", 404: "Not Found"}

JSON output:

{"headers": [[200, "OK"], [404, "Not Found"]]}

Reimporting: becomes a plain nested array, not a Map.

References → Objects

!ref: {x: 1, y: 2}
point: !ref

JSON output:

{"point": {"$ref": "ref"}}

Reimporting: becomes a plain object with $ref key, not a Ref.

Tagged Values → Objects

event: :click {x: 100, y: 200}

JSON output:

{"event": {"$tag": "click", "$value": {"x": 100, "y": 200}}}

Reimporting: becomes a plain object, not a Tagged.

Bytes → Hex Strings (JSON only)

Bytes round-trip losslessly within TeaLeaf text format using b"..." literals:

data: b"cafef00d"

However, JSON export converts bytes to hex strings:

{"data": "0xcafef00d"}

Reimporting from JSON: becomes a plain string, not bytes.

Schemas → Lost

@struct user (id: int, name: string)
users: @table user [(1, alice), (2, bob)]

JSON output:

{"users": [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]}

The @struct definition is not represented in JSON. However, from-json can re-infer schemas from uniform arrays.

Bytes and Text Format

Bytes now round-trip losslessly through text format using the b"..." literal:

Binary (bytes value) → Decompile → Text (b"..." literal) → Compile → Binary (bytes value)

The decompiler emits b"cafef00d" for bytes values, and the parser reads them back as Value::Bytes.

Ensuring Lossless Round-Trips

Use Binary for Storage

If you need to preserve all TeaLeaf types (refs, tags, maps, timestamps, bytes), keep data in .tlbx:

# Lossless cycle
tealeaf compile data.tl -o data.tlbx
tealeaf decompile data.tlbx -o data.tl
# data.tl preserves all types (except comments)

Use JSON Only for Interop

JSON conversion is for integrating with JSON-based tools. Don’t use it as a primary storage format if your data uses TeaLeaf-specific types.

Verify with CLI

# Compile → JSON two ways, compare
tealeaf to-json data.tl -o from_text.json
tealeaf compile data.tl -o data.tlbx
tealeaf tlbx-to-json data.tlbx -o from_binary.json
# from_text.json and from_binary.json should be identical

Type Preservation Summary

| TeaLeaf Type | Binary Round-Trip | JSON Round-Trip |
|---|---|---|
| Null | Lossless | Lossless |
| Bool | Lossless | Lossless |
| Int | Lossless | Lossless |
| UInt | Lossless | Lossless (as number) |
| Float | Lossless | Lossless |
| String | Lossless | Lossless |
| Bytes | Lossless | Lossy (→ hex string) |
| Array | Lossless | Lossless |
| Object | Lossless | Lossless |
| Map | Lossless | Lossy (→ array of pairs) |
| Ref | Lossless | Lossy (→ $ref object) |
| Tagged | Lossless | Lossy (→ $tag/$value object) |
| Timestamp | Lossless | Lossy (→ ISO 8601 string) |
| Schemas | Lossless | Lost (re-inferred on import) |
| Comments | Lost (stripped) | Lost |

Architecture Decision Records

This section documents significant architecture decisions made in the TeaLeaf project. Each record captures the context, decision, and consequences of a choice that affects the project’s design or implementation.

ADR Index

| ADR | Title | Status | Date |
|---|---|---|---|
| ADR-0001 | Use IndexMap for Insertion Order Preservation | Accepted | 2026-02-05 |
| ADR-0002 | Fuzzing Architecture and Strategy | Accepted | 2026-02-06 |
| ADR-0003 | Maximum Nesting Depth Limit (256) | Accepted | 2026-02-06 |
| ADR-0004 | ZLIB Compression for Binary Format | Accepted | 2026-02-06 |

What is an ADR?

An Architecture Decision Record (ADR) is a short document that captures an important architectural decision along with its context and consequences. ADRs help future contributors understand why certain design choices were made, not just what was built.

ADR Lifecycle

Each ADR has one of the following statuses:

  • Proposed — Under discussion, not yet implemented
  • Accepted — Approved and implemented (or in progress)
  • Superseded — Replaced by a newer ADR (linked in the record)
  • Deprecated — No longer applicable due to project changes

ADR-0001: Use IndexMap for Insertion Order Preservation

  • Status: Accepted
  • Date: 2026-02-05
  • Applies to: tealeaf-core, tealeaf-derive, tealeaf-ffi

Context

TeaLeaf’s primary use case is context engineering for LLM applications, where structured data passes through multiple format conversions (JSON → .tl → .tlbx and back). Users intentionally order their JSON keys to convey semantic meaning — for example, placing name before description before details to mirror how a human would read the document. Prior to this change, all user-facing maps used HashMap<K, V>, and the text serializer and binary writer explicitly sorted keys alphabetically before output.

This caused two problems:

  1. Semantic ordering was lost. A user who wrote {"zebra": 1, "apple": 2} in their JSON would get {"apple": 2, "zebra": 1} after a round-trip through TeaLeaf. For LLM prompt engineering, this reordering could change how models interpret the context.

  2. Sorting was unnecessary work. Every serialization path (dumps(), compile(), write_value(), to_tl_with_schemas()) collected keys into a Vec, sorted them, and then iterated — adding O(n log n) overhead to every output operation.

Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| Keep HashMap + sort (status quo) | Deterministic output, no dependency change | Loses user intent, sorting overhead |
| Vec of (key, value) pairs | Order preserved, no new dependency | Loses O(1) key lookup, breaks API surface broadly |
| IndexMap | Order preserved, O(1) lookup, drop-in API | Slightly slower decode (insertion cost), new dependency |
| BTreeMap | Sorted + deterministic | Still not insertion-ordered, lookup O(log n) |

Decision

Replace HashMap with IndexMap (from the indexmap crate v2) in all user-facing ordered containers:

  • Value::Object → ObjectMap<String, Value> (type alias for IndexMap)
  • TeaLeaf.data, TeaLeaf.schemas, TeaLeaf.unions → IndexMap<String, _>
  • Parser output, Reader.sections, trait return types → IndexMap

Internal lookup tables stay as HashMap because they don’t need ordering:

  • Writer.string_map, Writer.schema_map, Writer.union_map
  • Reader.schema_map, Reader.union_map, Reader.cache

Additionally:

  • Enable serde_json’s preserve_order feature so JSON parsing also preserves key order
  • Remove all explicit keys.sort() calls from serialization paths
  • Re-export IndexMap and ObjectMap from tealeaf-core so derive macros and downstream crates don’t need a direct indexmap dependency
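
A small standalone illustration of the ordering behavior, using the indexmap crate directly (not TeaLeaf API):

use indexmap::IndexMap;

fn main() {
    let mut m: IndexMap<&str, i32> = IndexMap::new();
    m.insert("zebra", 1);
    m.insert("apple", 2);

    // Iteration follows insertion order, not hash or alphabetical order.
    let keys: Vec<&str> = m.keys().copied().collect();
    assert_eq!(keys, vec!["zebra", "apple"]);

    // Lookup remains O(1), as with HashMap.
    assert_eq!(m.get("apple"), Some(&2));
}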

Consequences

Positive

  • Round-trip fidelity. JSON → TeaLeaf → JSON now preserves the original key order at every level (sections, object fields, schema definitions).
  • Encoding is faster. Removing O(n log n) sort calls from every serialization path yields measurable improvements in encode benchmarks (6–17% for small/medium objects).
  • Simpler serialization code. Serialization loops iterate the map directly instead of collecting-sorting-iterating.
  • Binary format is unchanged. Old .tlbx files remain fully readable. The reader always produces keys in file order, which for old files happens to be alphabetical.

Negative

  • Binary decode is slower. IndexMap::insert() is slower than HashMap::insert() because it maintains a dense insertion-order array alongside the hash table. Benchmarks show +56% to +105% regression for decode-heavy workloads (large arrays of objects, deeply nested structs). For the primary use case (LLM context), this is acceptable because:

    • Documents are typically encoded once and consumed as text (not repeatedly decoded from binary)
    • The absolute times remain in the microsecond-to-millisecond range
    • Encode performance (the more common hot path) improved
  • New dependency. indexmap v2 is a well-maintained, widely-used crate (used by serde_json internally), so supply-chain risk is minimal.

  • Public API change. TeaLeaf::new() now takes IndexMap instead of HashMap. This is a breaking change, mitigated by:

    • The project is in beta (2.0.0-beta.2)
    • From<HashMap<String, Value>> for Value conversion is retained for backward compatibility
    • Downstream code using .get(), .insert(), .iter() works identically

Benchmark Summary

| Workload | Encode | Decode |
|---|---|---|
| small_object | -16% (faster) | |
| nested_structs | -10% to -17% (faster) | +56% to +68% (slower) |
| large_array_10000 | -5% (faster) | +105% (slower) |
| tabular_5000 | -69% (faster) | -48% (faster) |

Note: Tabular workloads use struct-array encoding (columnar), which has fewer per-row IndexMap insertions. The decode regression is concentrated in generic object decoding where each row creates a new ObjectMap with field-by-field inserts.

ADR-0002: Fuzzing Architecture and Strategy

  • Status: Accepted
  • Date: 2026-02-06
  • Applies to: tealeaf-core

Context

TeaLeaf is a data format with multiple serialization paths: text parsing, text serialization, binary compilation, binary reading, and JSON import/export. Each path accepts untrusted input in production scenarios (user-supplied .tl files, .tlbx binaries, JSON strings from external APIs). Malformed or adversarial input must never cause undefined behavior, panics in non-roundtrip code paths, or memory safety violations.

The project already had unit tests, canonical fixture tests, and adversarial tests (hand-crafted malformed inputs). However, these approaches have inherent limitations:

  1. Unit/fixture tests are author-biased. They test cases the developer thought of, missing emergent edge cases from format interactions (e.g., deeply nested structures with unicode escapes inside hex-prefixed numbers).

  2. Adversarial tests are finite. The hand-crafted corpus in adversarial-tests/ covers known attack patterns but cannot explore the combinatorial input space.

  3. Round-trip fidelity is hard to test exhaustively. The property “serialize then parse produces the same value” requires testing across all Value variants, nesting depths, and string content — a space too large for manual enumeration.

Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| Property-based testing (proptest/quickcheck) | Integrated into cargo test, structure-aware | Limited mutation depth, no coverage feedback, deterministic |
| AFL++ | Mature, multiple mutation strategies | Requires instrumentation harness, harder CI integration on GitHub Actions |
| cargo-fuzz (libFuzzer) | Native Rust support, coverage-guided, dictionary support, easy CI | Requires nightly toolchain, Linux-only |
| Honggfuzz | Hardware-assisted coverage | Less Rust ecosystem integration, complex setup |

Decision

Use cargo-fuzz (libFuzzer) with a three-layer fuzzing strategy:

Layer 1: Byte-level fuzzing (6 targets)

Coverage-guided mutation of raw bytes, testing each attack surface independently:

| Target | Input | Tests |
|---|---|---|
| fuzz_parse | Raw bytes as TL text | Parser robustness against arbitrary byte sequences |
| fuzz_serialize | Raw bytes as TL text | Parse then re-serialize roundtrip fidelity |
| fuzz_roundtrip | Raw bytes as TL text | Full text → parse → serialize → re-parse → value equality |
| fuzz_reader | Raw bytes as .tlbx binary | Binary reader robustness against malformed files |
| fuzz_json | Raw bytes as JSON string | JSON import → TL export → re-import roundtrip |
| fuzz_json_schemas | Raw bytes as JSON string | JSON import with schema inference → roundtrip |

Layer 2: Dictionary-guided fuzzing

libFuzzer dictionaries provide grammar-aware tokens that seed the mutation engine, dramatically improving coverage for structured formats where random bytes rarely produce valid syntax:

| Dictionary | Used by | Key tokens |
|---|---|---|
| tl.dict | fuzz_parse, fuzz_serialize, fuzz_roundtrip | Keywords (true, false, null, NaN, inf), directives (@struct, @table, @union), type names, escape sequences, boundary numbers, timestamp patterns |
| json.dict | fuzz_json, fuzz_json_schemas | JSON delimiters, escape sequences, surrogate pair markers, serde_json magic strings, boundary numbers |

Measured coverage impact (30-second fresh corpus):

| Target | Without dict | With dict | Improvement |
|---|---|---|---|
| fuzz_parse | 1790 edges | 1922 edges | +7.4% |
| fuzz_json | 1339 edges | 1533 edges | +14.5% |

Layer 3: Structure-aware fuzzing (1 target)

The fuzz_structured target bypasses the parser entirely, generating valid Value trees directly from fuzzer bytes using the arbitrary crate. This tests serialization and binary compilation paths with guaranteed-valid inputs that would take byte-level fuzzers much longer to discover:

  • Bounded recursion (max depth 3) prevents stack overflow
  • 13 Value variants including JsonNumber, Tagged, Ref, Map, Bytes
  • Three roundtrip tests per invocation: text serialize/parse, binary compile/read, JSON no-panic
  • Reaches 2464 coverage edges in just 733 runs (vs thousands of runs for byte-level targets)

Fuzz infrastructure layout

tealeaf-core/fuzz/
  Cargo.toml              # Fuzz workspace with libfuzzer-sys + arbitrary
  fuzz_targets/
    fuzz_parse.rs         # Layer 1: text parser robustness
    fuzz_serialize.rs     # Layer 1: text roundtrip
    fuzz_roundtrip.rs     # Layer 1: full text roundtrip with value equality
    fuzz_reader.rs        # Layer 1: binary reader robustness
    fuzz_json.rs          # Layer 1: JSON import roundtrip
    fuzz_json_schemas.rs  # Layer 1: JSON with schema inference roundtrip
    fuzz_structured.rs    # Layer 3: structure-aware value generation
  dictionaries/
    tl.dict               # Layer 2: TL text format tokens
    json.dict             # Layer 2: JSON format tokens
  corpus/                 # Persistent corpus (per-target subdirectories)
  artifacts/              # Crash artifacts (per-target subdirectories)

CI integration

Fuzz targets run on GitHub Actions ubuntu-latest (2-core, 7 GB RAM) with the following constraints:

  • 120 seconds per target (coverage saturates within ~30 seconds; 120s provides buffer for deeper exploration)
  • Serial execution — targets run one at a time to avoid memory pressure (each can use up to 512 MB RSS)
  • RSS limit: 512 MB per target
  • Dictionary-guided runs for text and JSON targets
  • Nightly Rust toolchain required (libFuzzer instrumentation)
  • Total wall time: ~15 minutes (7 targets × 120s + build overhead)

Value equality semantics

All roundtrip targets use a custom values_equal() function rather than PartialEq to handle expected coercions:

  • Int(n) == UInt(n) when n >= 0 (sign-agnostic integer comparison)
  • JsonNumber(s) == Int(i) when s parses to i (precision-preserving numbers may roundtrip as integers if they fit)
  • Float comparison uses to_bits() for exact bit-level equality (distinguishes +0.0 from -0.0, handles NaN)
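
The float rule can be sketched as follows (a simplified stand-in for the fuzz targets' values_equal; the integer coercions follow the same pattern):

fn floats_equal(a: f64, b: f64) -> bool {
    // Bit-level equality: NaN equals NaN, but +0.0 differs from -0.0.
    a.to_bits() == b.to_bits()
}

fn main() {
    assert!(floats_equal(f64::NAN, f64::NAN));
    assert!(!floats_equal(0.0, -0.0));
    assert!(floats_equal(1.5, 1.5));
}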

Consequences

Positive

  • Discovered real bugs. Fuzzing found a NaN quoting bug (NaN roundtripped as Float(NaN) instead of being preserved through text format) and the precision loss that motivated Value::JsonNumber.
  • Continuous regression detection. CI runs catch regressions in parser/serializer correctness automatically on every push.
  • Coverage-guided exploration. libFuzzer’s coverage feedback explores code paths that hand-written tests miss, particularly in error handling and edge case branches.
  • Dictionary tokens accelerate exploration. Measured 7-14% coverage improvement with dictionaries, at zero runtime cost (dictionaries only seed the mutation engine).
  • Structure-aware fuzzing tests serializer independently. By generating valid Value trees directly, fuzz_structured achieves deep serializer coverage without depending on parser correctness.

Negative

  • Nightly Rust toolchain required. cargo-fuzz requires nightly for -Z flags and sanitizer instrumentation. This is isolated to the fuzz workspace and does not affect the main build.
  • Linux-only. libFuzzer doesn’t support Windows natively. Local fuzzing requires WSL on Windows; CI uses Ubuntu runners.
  • CI time cost. ~15 minutes per run. Acceptable for a post-push check; not suitable for pre-commit.
  • Corpus growth. The persistent corpus grows over time as new coverage-increasing inputs are discovered. Periodic corpus minimization (cargo fuzz cmin) is recommended.

Not covered

  • Protocol-level fuzzing. The FFI boundary (tealeaf-ffi) is not fuzzed directly. FFI functions are thin wrappers around the core library, which is fuzzed.
  • .NET binding fuzzing. The .NET layer is tested through its own test suite and the adversarial harness, but not through libFuzzer.
  • Concurrency testing. All fuzz targets are single-threaded. Thread-safety of Reader (which uses mmap) is tested separately.

ADR-0003: Maximum Nesting Depth Limit (256)

  • Status: Accepted
  • Date: 2026-02-06
  • Applies to: tealeaf-core (parser, binary reader)

Context

TeaLeaf accepts untrusted input in production — user-supplied .tl files, .tlbx binaries from external sources, and JSON strings from APIs. Recursive data structures (arrays, objects, maps, tagged values) create call stacks proportional to input nesting depth. Without a limit, an attacker can craft a payload like key: [[[[... with thousands of levels, causing a stack overflow and process termination.

Two constants enforce the limit:

| Constant | File | Value |
|---|---|---|
| MAX_PARSE_DEPTH | parser.rs | 256 |
| MAX_DECODE_DEPTH | reader.rs | 256 |

Both constants are set to the same value to ensure text-binary parity: any document that parses successfully from .tl text can also round-trip through .tlbx binary without hitting a different depth ceiling.

The limit is checked at every recursive entry point:

  • Parser: parse_value() — arrays, objects, maps, tuples, tagged values
  • Reader: decode_value(), decode_array(), decode_object(), decode_struct(), decode_struct_array(), decode_map()

When exceeded, both paths return a descriptive error rather than panicking or overflowing the stack.
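
The guard itself is a single comparison at each entry point. A schematic sketch (signatures are illustrative, not the real parser's):

const MAX_PARSE_DEPTH: usize = 256;

fn parse_nested(depth: usize) -> Result<(), String> {
    // One integer comparison per recursive entry point.
    if depth > MAX_PARSE_DEPTH {
        return Err(format!("maximum nesting depth {MAX_PARSE_DEPTH} exceeded"));
    }
    // ... the real parser consumes tokens here, recursing with depth + 1 ...
    Ok(())
}

fn main() {
    assert!(parse_nested(0).is_ok());
    assert!(parse_nested(257).is_err());
}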

Ecosystem Comparison

| Parser / Library | Default Max Depth | Configurable? |
|---|---|---|
| TeaLeaf | 256 | No (compile-time constant) |
| serde_json (Rust) | 128 | Yes (disable_recursion_limit) |
| serde_yaml (Rust) | 128 | No |
| System.Text.Json (.NET) | 64 | Yes (MaxDepth) |
| ASP.NET Core (default) | 32 | Yes |
| Jackson (Java) | 1000 (v2), 500 (v3) | Yes |
| Go encoding/json | 10,000 | No |
| Python json (stdlib) | ~1,000 (interpreter limit) | Via sys.setrecursionlimit |
| Protocol Buffers (Java/C++) | 100 | Yes |
| Protocol Buffers (Go) | 10,000 | Yes |
| rmp-serde (MessagePack) | 1,024 | Yes |
| CBOR (ciborium, Rust) | 128 | Yes |
| toml (Rust) | None | No (vulnerable to stack overflow) |

Observations

  • Conservative defaults are trending down. Jackson reduced from 1,000 to 500 in v3. .NET defaults to 64. Protocol Buffers targets 100.
  • 128 is the most common Rust ecosystem default (serde_json, serde_yaml, ciborium).
  • No production data format needs > 100 levels. Deeply nested structures indicate either machine-generated intermediate representations or adversarial input.
  • Formats without limits have CVEs. The toml crate’s lack of depth limiting is tracked as an open issue. Python’s reliance on interpreter limits has caused production crashes.

Decision

Set MAX_PARSE_DEPTH and MAX_DECODE_DEPTH to 256.

Why 256 over 128?

TeaLeaf schemas add implicit nesting. A @struct with an array of @struct-typed objects creates 3 levels of nesting (object → array → object) for what the user perceives as one level of structure. With schema compositions, 128 could be reached in complex but legitimate documents. 256 provides a 2x margin above the Rust ecosystem default while remaining well within safe stack bounds.

Why not configurable?

  • Simplicity. A compile-time constant is zero-cost at runtime (no configuration plumbing, no state to manage).
  • Consistent behavior. All TeaLeaf implementations (Rust, FFI, .NET) enforce the same limit. A configurable limit would require coordination across language boundaries.
  • 256 is generous enough. No known use case requires deeper nesting. If a legitimate need arises, the constant can be bumped in a patch release without breaking any public API.

Stack safety margin

On x86-64 Linux with the default 8 MB stack, each recursive call uses roughly 200–400 bytes of stack frame. At 256 depth, the worst case is ~100 KB — well under 2% of the available stack. This leaves ample room for the caller’s own stack frames and for platforms with smaller stacks (e.g., 1 MB thread stacks).

Test Coverage

| Test | Location | What it verifies |
|---|---|---|
| test_parse_depth_256_succeeds | parser.rs | 200-level nesting parses successfully |
| test_fuzz_deeply_nested_arrays_no_stack_overflow | parser.rs | 500-level nesting returns error (no crash) |
| parse_deep_nesting_ok | adversarial.rs | 7-level nesting succeeds in adversarial harness |
| fuzz_structured depth=3 | fuzz_structured.rs | Structure-aware fuzzer bounds depth to 3 |
| canonical/large_data.tl | Canonical suite | Deep nesting fixture round-trips correctly |

Consequences

Positive

  • Stack overflow protection. Malicious or malformed input with extreme nesting is rejected with a clear error message instead of crashing the process.
  • Text-binary parity. The same limit in parser and reader means any document that parses from text will also decode from binary, and vice versa.
  • Predictable resource usage. Callers can reason about maximum stack consumption without inspecting input.

Negative

  • Theoretical limitation. Documents with more than 256 levels of nesting are rejected. In practice, no known data format use case requires this depth.
  • Not configurable. Users who need deeper nesting must rebuild from source with a modified constant. This is an intentional trade-off for simplicity.

Neutral

  • No performance cost. The depth check is a single integer comparison per recursive call — unmeasurable relative to the cost of decoding a value.

ADR-0004: ZLIB Compression for Binary Format

  • Status: Accepted
  • Date: 2026-02-06
  • Applies to: tealeaf-core (writer, reader), spec §4.3 and §4.9

Context

The .tlbx binary format compresses individual sections to reduce file size. The implementation has always used ZLIB (deflate) via the flate2 crate. However, the spec contained a contradiction:

  • §4.3 (Header Flags) described the COMPRESS flag as indicating “zstd compression” and required readers to detect compression via the zstd frame magic (0xFD2FB528).
  • §4.9 (Compression) correctly stated the algorithm as “ZLIB (deflate)”.

This contradiction meant a third-party implementation following §4.3 would look for zstd-compressed data that doesn’t exist, while one following §4.9 would work correctly. The spec needed a single, definitive answer.

Decision

Standardize on ZLIB (deflate) as the sole compression algorithm for .tlbx binary format v2.

Why not zstd?

zstd is a superior algorithm in general-purpose benchmarks, but TeaLeaf’s design neutralizes its advantages:

  1. String deduplication removes the most compressible data. The string table deduplicates all strings before compression runs. What remains for the compressor is packed integers, null bitmaps, and string table indices — low-entropy binary data with little redundancy.

  2. Sections are small. The compression threshold is 64 bytes. Most sections are a few hundred bytes to a few KB. At these sizes, zlib and zstd achieve nearly identical compression ratios without dictionaries.

  3. zstd’s dictionary mode doesn’t help here. Dictionary compression — where zstd’s largest advantage lies for small payloads — requires pre-training on representative data. TeaLeaf documents are schema-variable and content-diverse (the primary use case is LLM context engineering with arbitrary structured data). A static dictionary would not generalize across different schemas and data shapes.

  4. The 90% threshold filters aggressively. Sections that don’t compress to under 90% of their original size are stored uncompressed. This threshold means most small sections aren’t compressed at all, making the algorithm choice irrelevant for the majority of sections.

  5. Decompression speed is irrelevant at this scale. zstd decompresses 3-5x faster than zlib, but a few-hundred-byte section decompresses in microseconds with either algorithm. The difference is unmeasurable in practice.

Why zlib?

  1. Universal availability. ZLIB/deflate is implemented in every language’s standard library or a widely-available package. zstd requires an additional native dependency in most ecosystems.

  2. No breaking change. Every .tlbx file ever produced uses zlib. Switching would require either a format version bump (breaking all existing files) or dual-algorithm detection logic (complexity for every implementation).

  3. Simpler for third-party implementations. One algorithm, no magic-byte detection, no conditional dependency. A conformant reader needs only zlib decompression.

  4. Compression is not the primary size reduction strategy. TeaLeaf’s token efficiency comes from the text format’s conciseness and the binary format’s schema-aware encoding (struct arrays, string deduplication, type-specific packing). Compression is a secondary optimization applied on top.

Spec Changes

| Section | Before | After |
|---|---|---|
| §4.3 (Header Flags) | “zstd compression”, “zstd frame magic” | “ZLIB (deflate) compression”, per-section flag detection |
| §4.9 (Compression) | Already correct (“ZLIB (deflate)”) | No change |

Consequences

Positive

  • Spec is internally consistent. §4.3 and §4.9 now agree on ZLIB.
  • Third-party interop is unambiguous. Implementers need one algorithm, clearly documented.
  • No migration required. All existing .tlbx files remain valid.

Negative

  • Foregoes zstd’s speed advantage. In workloads with large sections (tens of KB+), zstd would decompress faster. TeaLeaf’s current section sizes don’t reach this threshold.

Neutral

  • Future versions can reconsider. If TeaLeaf v3 introduces large-section use cases (e.g., embedded binary blobs), zstd could be adopted with a format version bump. This ADR applies to binary format v2 only.

Architecture

High-level architecture of the TeaLeaf project.

Crate Structure

tealeaf/
├── tealeaf-core/          # Core library + CLI
│   ├── src/
│   │   ├── main.rs        # CLI entry point
│   │   ├── lib.rs         # Public API (TeaLeaf, Value, Schema, traits)
│   │   ├── reader.rs      # Binary file reader
│   │   ├── writer.rs      # Binary file writer (compiler)
│   │   ├── builder.rs     # TeaLeafBuilder fluent API
│   │   └── convert.rs     # ToTeaLeaf/FromTeaLeaf trait impls for primitives
│   └── tests/
│       ├── canonical.rs   # Canonical fixture tests
│       └── derive.rs      # Derive macro tests
│
├── tealeaf-derive/        # Proc-macro crate
│   ├── lib.rs             # Macro entry points
│   ├── attrs.rs           # Attribute parsing
│   ├── to_tealeaf.rs      # ToTeaLeaf derive implementation
│   ├── from_tealeaf.rs    # FromTeaLeaf derive implementation
│   ├── schema.rs          # Schema generation logic
│   └── util.rs            # Shared utilities
│
├── tealeaf-ffi/           # C FFI layer
│   ├── src/lib.rs         # All FFI exports
│   └── build.rs           # cbindgen header generation
│
├── bindings/dotnet/       # .NET bindings
│   ├── TeaLeaf.Annotations/   # Attribute definitions
│   ├── TeaLeaf.Generators/    # Source generator
│   ├── TeaLeaf/               # Managed wrappers + serializer
│   └── TeaLeaf.Tests/         # Test project
│
├── canonical/             # Canonical test fixtures
│   ├── samples/           # .tl text files
│   ├── expected/          # Expected .json outputs
│   ├── binary/            # Pre-compiled .tlbx files
│   └── errors/            # Invalid files for error testing
│
└── spec/                  # Format specification
    └── TEALEAF_SPEC.md

Data Flow

Parse Pipeline

Text input (.tl)
    │
    ▼
Lexer → Token stream
    │
    ▼
Parser → AST (directives + key-value pairs)
    │
    ├── Schema definitions → IndexMap<String, Schema>
    ├── Reference definitions → resolved inline
    └── Key-value pairs → IndexMap<String, Value>
    │
    ▼
TeaLeaf { schemas, data }

Compile Pipeline

TeaLeaf { schemas, data }
    │
    ▼
String collector → String table (deduplicated)
    │
    ▼
Schema encoder → Schema table (binary)
    │
    ▼
Value encoder → Data sections (per key)
    │    │
    │    ├── Primitives → fixed-size encoding
    │    ├── Strings → string table index (u32)
    │    ├── Struct arrays → null bitmap + positional values
    │    └── Other → type-tagged encoding
    │
    ▼
Compressor (per section, if > 64 bytes)
    │
    ▼
Writer → .tlbx file
    ├── Header (64 bytes)
    ├── String table
    ├── Schema table
    ├── Section index
    └── Data sections

Read Pipeline

.tlbx file
    │
    ▼
Reader (or MmapReader)
    │
    ├── Header validation (magic, version)
    ├── String table → lazy access
    ├── Schema table → lazy access
    └── Section index → key → offset mapping
    │
    ▼
Value access (by key)
    │
    ├── Locate section in index
    ├── Decompress if needed
    ├── Decode value by type code
    └── Return Value enum

Key Design Decisions

Positional Schema Encoding

Field names appear only in the schema table. Data rows use position to identify fields. This trades readability of binary for compactness.

Per-Section Compression

Each top-level key is a separate section compressed independently. This allows:

  • Random access without decompressing the entire file
  • Selective decompression (only read sections you need)

Thread-Local Error Handling (FFI)

The FFI uses thread-local storage for error messages instead of out-parameters or exceptions. This simplifies the C API while remaining thread-safe.
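
A sketch of the thread-local pattern (simplified; the real FFI internals may differ):

use std::cell::RefCell;

thread_local! {
    // Each thread sees only its own last error message.
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

fn set_last_error(msg: impl Into<String>) {
    LAST_ERROR.with(|e| *e.borrow_mut() = Some(msg.into()));
}

fn take_last_error() -> Option<String> {
    LAST_ERROR.with(|e| e.borrow_mut().take())
}

fn main() {
    set_last_error("parse failed at line 3");
    assert_eq!(take_last_error().as_deref(), Some("parse failed at line 3"));
    assert_eq!(take_last_error(), None); // cleared after retrieval
}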

Source Generator vs Reflection

The .NET binding offers both approaches because:

  • Source generators produce optimal code but require partial classes
  • Reflection works with any type but is slower
  • Both share the same native library for actual encoding/decoding

Insertion Order Preservation (IndexMap)

All user-facing maps use IndexMap instead of HashMap to preserve insertion order across format conversions. Internal lookup tables (string interning, schema/union resolution, caches) remain HashMap for performance. See ADR-0001 for the full decision record including benchmark impact.

No Schema Versioning

TeaLeaf deliberately avoids schema evolution machinery. The rationale:

  • Simpler implementation and specification
  • Source file is always the truth
  • Recompilation is explicit and deterministic
  • Applications that need evolution can layer it on top

Binary Encoding Details

Deep dive into how values are encoded in the .tlbx binary format.

Encoding Strategy

The encoder selects the encoding strategy based on value type and context:

Top-Level Values

Each top-level key-value pair becomes a section in the binary file. The section’s type code and flags determine how to decode it.

Primitive Encoding

| Type | Encoding | Size |
|---|---|---|
| Null | Nothing (type code alone) | 0 bytes |
| Bool | 0x00 or 0x01 | 1 byte |
| Int8 | Signed byte | 1 byte |
| Int16 | 2 bytes, little-endian | 2 bytes |
| Int32 | 4 bytes, little-endian | 4 bytes |
| Int64 | 8 bytes, little-endian | 8 bytes |
| UInt8-64 | Same as signed, unsigned | 1-8 bytes |
| Float32 | IEEE 754, little-endian | 4 bytes |
| Float64 | IEEE 754, little-endian | 8 bytes |
| String | u32 string table index | 4 bytes |
| Bytes | varint length + raw data | variable |
| Timestamp | i64 Unix ms + i16 tz offset (minutes), LE | 10 bytes |

Integer Size Selection

The writer automatically selects the smallest representation:

Value::Int(42)      → Int8 (1 byte)     // fits in i8
Value::Int(1000)    → Int16 (2 bytes)   // fits in i16
Value::Int(100000)  → Int32 (4 bytes)   // fits in i32
Value::Int(5×10⁹)  → Int64 (8 bytes)   // needs i64

Struct Array Encoding

The most optimized encoding path is for arrays of schema-typed objects:

┌──────────────────────┐
│ Count: u32           │  Number of rows
│ Schema Index: u16    │  Which schema these rows follow
│ Null Bitmap Size: u16│  Bytes per row for null tracking
├──────────────────────┤
│ Row 0:               │
│   Null Bitmap: [u8]  │  One bit per field (1 = null)
│   Field 0 data       │  Only if not null
│   Field 1 data       │  Only if not null
│   ...                │
├──────────────────────┤
│ Row 1:               │
│   Null Bitmap: [u8]  │
│   Field data...      │
├──────────────────────┤
│ ...                  │
└──────────────────────┘

Null Bitmap

  • Size: ceil(field_count / 8) bytes per row, computed as (field_count + 7) / 8 with integer division
  • Bit i set = field i is null
  • Only non-null fields have data written

For a schema with 5 fields, the bitmap is 1 byte. If bit 2 is set, field 2 is null and its data is skipped.
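
A sketch of the bitmap arithmetic (bit order within a byte is an assumption here, shown least-significant-bit first):

fn bitmap_bytes_per_row(field_count: usize) -> usize {
    (field_count + 7) / 8 // integer division; the +7 rounds up
}

fn is_null(bitmap: &[u8], field: usize) -> bool {
    bitmap[field / 8] & (1u8 << (field % 8)) != 0
}

fn main() {
    assert_eq!(bitmap_bytes_per_row(5), 1);
    assert_eq!(bitmap_bytes_per_row(9), 2);

    // Bit 2 set => field 2 is null, so its data is skipped.
    let row_bitmap = [0b0000_0100u8];
    assert!(is_null(&row_bitmap, 2));
    assert!(!is_null(&row_bitmap, 0));
}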

Field Data

Each non-null field is encoded according to its schema type:

  • Primitive types: fixed-size encoding
  • String: u32 string table index
  • Nested struct: recursively encoded fields (with their own null bitmap)
  • Array field: count + typed elements

Homogeneous Array Encoding

Top-level arrays use homogeneous (packed) encoding only for two types:

Integer Arrays (i32 only)

All elements must be Value::Int and fit within the i32 range (-2³¹ to 2³¹ - 1). Integer arrays where any value exceeds i32 fall through to heterogeneous encoding.

Count: u32
Element Type: 0x04 (Int32)
Elements: [i32 × Count]  -- packed, no type tags

String Arrays

Count: u32
Element Type: 0x10 (String)
Elements: [u32 × Count]  -- string table indices

All Other Top-Level Arrays

Arrays of UInt, Bool, Float, Timestamp, Int64 (values exceeding i32), and mixed-type arrays all use heterogeneous encoding (see below). This keeps the top-level format simple for third-party implementations.

Schema-Typed Field Arrays

Arrays within struct fields are a separate case — they use homogeneous encoding for their schema-declared type, regardless of the top-level restrictions:

Count: u32
Element Type: u8 (from schema field type)
Elements: [packed data]

Heterogeneous Array Encoding

For mixed-type arrays and all top-level arrays not covered by Int32/String homogeneous encoding:

Count: u32
Element Type: 0xFF (heterogeneous marker)
Elements: [
  type: u8, data,
  type: u8, data,
  ...
]

Each element carries its own type tag.

Object Encoding

Field Count: u16
Fields: [
  key_idx: u32    (string table index)
  type: u8        (value type code)
  data: [...]     (type-specific encoding)
]

Objects are the untyped key-value container. Unlike struct arrays, each field carries its name and type.

Map Encoding

Count: u32
Entries: [
  key_type: u8,    key_data: [...],
  value_type: u8,  value_data: [...],
]

Both keys and values carry type tags.

Reference Encoding

name_idx: u32    (string table index for the reference name)

A reference is just a string table pointer to the target name.

Tagged Value Encoding

tag_idx: u32     (string table index for the tag name)
value_type: u8   (type code of the inner value)
value_data: [...]  (type-specific encoding of the inner value)

Varint Encoding

Used for bytes length:

Value: 300 (0x012C)
Encoded: 0xAC 0x02

Bit layout:
  0xAC = 1_0101100  → continuation bit set, value bits: 0101100 (44)
  0x02 = 0_0000010  → no continuation, value bits: 0000010 (2)

  Result: 44 + (2 << 7) = 44 + 256 = 300

  • Continuation bit: 0x80 – if set, more bytes follow
  • 7 value bits per byte
  • Least-significant group first
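
A small encoder matching this layout (a sketch, not the library's internal function):

fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let bits = (value & 0x7F) as u8; // low 7 value bits
        value >>= 7;
        if value == 0 {
            out.push(bits); // final byte: continuation bit clear
            break;
        }
        out.push(bits | 0x80); // continuation bit set: more bytes follow
    }
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    assert_eq!(buf, vec![0xAC, 0x02]); // matches the worked example above
}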

Compression

Applied per section:

  1. Check if uncompressed size > 64 bytes
  2. Compress with ZLIB (deflate)
  3. If compressed size < 90% of original, use compressed version
  4. Set compression flag in section index entry
  5. Store both size (compressed) and uncompressed_size in the index
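
The decision can be sketched with the flate2 crate (which the implementation uses), applying the thresholds listed above:

use flate2::write::ZlibEncoder;
use flate2::Compression;
use std::io::Write;

/// Returns the bytes to store and whether the compression flag should be set.
fn compress_section(section: &[u8]) -> (Vec<u8>, bool) {
    // Sections of 64 bytes or fewer are stored uncompressed.
    if section.len() <= 64 {
        return (section.to_vec(), false);
    }
    let mut encoder = ZlibEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(section).expect("writing to a Vec cannot fail");
    let compressed = encoder.finish().expect("zlib stream finalizes");
    // Keep the compressed form only if it is under 90% of the original size.
    if compressed.len() * 10 < section.len() * 9 {
        (compressed, true)
    } else {
        (section.to_vec(), false)
    }
}

fn main() {
    let section = vec![b'a'; 1024]; // highly repetitive, compresses well
    let (stored, compressed) = compress_section(&section);
    assert!(compressed && stored.len() < section.len());
}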

String Table

The string table is a core component of the binary format that provides string deduplication.

Purpose

In a typical document with 1,000 user records, field values like "active", "Engineering", or city names repeat frequently. Without deduplication, each occurrence stores the full string. The string table stores each unique string once and uses 4-byte indices everywhere else.

Structure

┌─────────────────────────────┐
│ Total Size: u32              │  Size of the entire string table section
│ Count: u32                   │  Number of unique strings
├─────────────────────────────┤
│ Offsets: [u32 × Count]       │  Byte offset of each string in the data section
│ Lengths: [u32 × Count]       │  Length of each string (up to 4 GB)
├─────────────────────────────┤
│ String Data: [u8...]         │  Concatenated UTF-8 string data
└─────────────────────────────┘

How It Works

During Compilation

  1. The writer traverses all values in the document
  2. Every unique string is collected (keys, string values, schema names, field names, ref names, tag names)
  3. Duplicates are eliminated
  4. Each string gets an index (0, 1, 2, …)
  5. The string table is written first in the file
  6. All subsequent encoding uses indices instead of raw strings

During Reading

  1. The reader loads the string table at startup
  2. When decoding a string value, it reads a u32 index
  3. The index maps to an offset and length in the string data
  4. The string is read from the data section

Lookup Performance

String table access is O(1) by index:

index → offsets[index] → offset in data section
index → lengths[index] → number of bytes to read
string = data[offset..offset+length]
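
A sketch of this lookup as parallel arrays (field names are assumptions, mirroring the diagram above):

struct StringTable {
    offsets: Vec<u32>,
    lengths: Vec<u32>,
    data: Vec<u8>, // concatenated UTF-8 string bytes
}

impl StringTable {
    fn get(&self, index: usize) -> &str {
        let start = self.offsets[index] as usize;
        let end = start + self.lengths[index] as usize;
        // The writer only stores valid UTF-8, so decoding cannot fail.
        std::str::from_utf8(&self.data[start..end]).expect("valid UTF-8")
    }
}

fn main() {
    let table = StringTable {
        offsets: vec![0, 6],
        lengths: vec![6, 11],
        data: b"activeEngineering".to_vec(),
    };
    assert_eq!(table.get(0), "active");
    assert_eq!(table.get(1), "Engineering");
}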

Size Impact

Example: 1,000 Users with 5 Fields

Without deduplication:

  • Field names repeated 1,000 times each
  • Common values (“active”, “Engineering”) repeated many times
  • Estimated overhead: ~20-30 KB just for repeated strings

With string table:

  • Each unique string stored once
  • References are 4 bytes each
  • Estimated savings: 60-80% on string data

Extreme Case: Large Tabular Data

For 10,000 rows with 10 fields, field names alone would consume:

| Approach | Field Name Storage |
|---|---|
| JSON (per-field) | ~10 × 10,000 × avg(8 bytes) = ~800 KB |
| TeaLeaf (string table) | 10 × avg(8 bytes) + 100,000 × 4 bytes = ~400 KB |
| TeaLeaf with schema | 10 × avg(8 bytes) = ~80 bytes (field names in schema only!) |

With schema-typed data, field names appear only in the schema table – the string table contains only the actual string values.

What Gets Deduplicated

| String Source | Deduplicated? |
|---|---|
| Top-level key names | Yes |
| Object field names | Yes |
| String values | Yes |
| Schema names | Yes |
| Schema field names | Yes |
| Reference names | Yes |
| Tag names | Yes |

Maximum String Length

String lengths are stored as u32, supporting individual strings up to ~4 GB. The total string table size (all strings + metadata) is also capped at u32::MAX by the table’s Size header field.

Interaction with Compression

The string table itself is not compressed (it’s needed for decoding). However, data sections that reference the string table benefit doubly:

  • String references are 4 bytes (already compact)
  • ZLIB compression can further compress repetitive index patterns

Implementation Note

The string table uses a HashMap<String, u32> during compilation for O(1) dedup lookups. The final table is written as parallel arrays (offsets + lengths + data) for O(1) indexed access during reading.

Schema Inference

TeaLeaf can automatically infer schemas from JSON arrays of uniform objects. This page explains the algorithm.

When Schema Inference Runs

Schema inference is triggered by:

  • tealeaf from-json CLI command
  • tealeaf json-to-tlbx CLI command
  • TeaLeaf::from_json_with_schemas() Rust API

It is not triggered by:

  • TeaLeaf::from_json() (plain import, no schemas)
  • TLDocument.FromJson() (.NET API – plain import)

Algorithm

Step 1: Array Detection

Scan top-level JSON values for arrays where all elements are objects with identical key sets:

{
  "users": [           // ← Candidate: array of uniform objects
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}
  ],
  "tags": ["a", "b"],  // ← Not candidate: array of strings
  "config": {...}       // ← Not candidate: not an array
}

Step 2: Name Inference

The schema name is derived from the parent key by singularization:

| Key | Inferred Schema Name |
|---|---|
| "users" | user |
| "products" | product |
| "employees" | employee |
| "addresses" | address |
| "data" | data (already singular) |
| "items_list" | items_list (compound, kept as-is) |

Basic singularization rules (sketched in code below):

  • Remove a trailing s if the word doesn’t end in ss
  • Remove a trailing es for -es words
  • Replace a trailing ies with y
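
A rough Rust rendering of these rules (the real implementation may cover more cases):

fn singularize(key: &str) -> String {
    if let Some(stem) = key.strip_suffix("ies") {
        return format!("{stem}y"); // "categories" -> "category"
    }
    if let Some(stem) = key.strip_suffix("sses") {
        return format!("{stem}ss"); // "addresses" -> "address"
    }
    if key.ends_with('s') && !key.ends_with("ss") {
        return key[..key.len() - 1].to_string(); // "users" -> "user"
    }
    key.to_string() // "data", "items_list" stay as-is
}

fn main() {
    assert_eq!(singularize("users"), "user");
    assert_eq!(singularize("addresses"), "address");
    assert_eq!(singularize("employees"), "employee");
    assert_eq!(singularize("data"), "data");
}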

Step 3: Type Inference

For each field, scan all array elements to determine the type:

| JSON Values Seen | Inferred TeaLeaf Type |
|---|---|
| All integers | int |
| All numbers (mixed int/float) | float |
| All strings | string |
| All booleans | bool |
| All objects (uniform keys) | Nested struct reference |
| All arrays | Inferred element type |
| Mixed types | string (fallback) |
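
Conceptually this is a pairwise widening fold over element types. The sketch below uses a simplified lattice and is illustrative only; the real inference also handles nested structs and arrays:

/// Simplified inferred-type lattice for illustration.
#[derive(Clone, Copy, PartialEq)]
enum Inferred { Unknown, Int, Float, Str, Bool }

/// Combine the running inference with the next element's type:
/// equal types stay, mixed numerics widen to float, and any other
/// conflict falls back to string.
fn unify(acc: Inferred, next: Inferred) -> Inferred {
    use Inferred::*;
    match (acc, next) {
        (Unknown, t) => t,
        (a, b) if a == b => a,
        (Int, Float) | (Float, Int) => Float,
        _ => Str,
    }
}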

Step 4: Nullable Detection

If any element has null for a field, that field becomes nullable:

[
  {"id": 1, "email": "alice@ex.com"},
  {"id": 2, "email": null}           // ← email becomes string?
]

Step 5: Nested Schema Inference

If a field’s value is an object across all array elements, and those objects have identical keys, a nested schema is created:

{
  "users": [
    {"name": "Alice", "address": {"city": "Seattle", "zip": "98101"}},
    {"name": "Bob", "address": {"city": "Austin", "zip": "78701"}}
  ]
}

Inferred schemas:

@struct address (city: string, zip: string)
@struct user (address: address, name: string)

This is recursive – nested objects can have their own nested schemas.

Output

The inferred schemas are:

  1. Added to the document as @struct definitions
  2. Used to convert the original JSON arrays into @table tuples
  3. Written in the output file before the data

Example

Input JSON:

{
  "products": [
    {"id": 1, "name": "Widget", "price": 9.99, "in_stock": true},
    {"id": 2, "name": "Gadget", "price": 24.99, "in_stock": false}
  ]
}

Output TeaLeaf:

@struct product (id: int, in_stock: bool, name: string, price: float)

products: @table product [
  (1, true, Widget, 9.99),
  (2, false, Gadget, 24.99),
]

Limitations

  1. Field order – JSON objects have no guaranteed order. Fields are sorted alphabetically in the inferred schema.

  2. Type ambiguity – JSON numbers don’t distinguish int from float. If any element has a decimal, the field becomes float.

  3. Non-uniform arrays – arrays where objects have different key sets are not schema-inferred. They remain as plain arrays of objects.

  4. Deeply nested arrays – only the first level of array → schema inference is applied. Nested arrays within objects are not auto-inferred.

  5. No timestamp detection – ISO 8601 strings in JSON remain as strings, not timestamps.

Testing

TeaLeaf has a comprehensive test suite spanning the Rust core, FFI layer, and .NET bindings.

Test Structure

tealeaf/
├── tealeaf-core/tests/
│   ├── canonical.rs          # Canonical fixture round-trip tests
│   └── derive.rs             # Derive macro tests
│
├── tealeaf-ffi/src/lib.rs    # FFI safety tests (inline #[cfg(test)])
│
├── bindings/dotnet/
│   ├── TeaLeaf.Tests/        # .NET unit tests
│   └── TeaLeaf.Generators.Tests/  # Source generator tests
│
└── canonical/                # Shared test fixtures
    ├── samples/              # .tl text files (14 canonical samples)
    ├── expected/             # Expected .json outputs
    ├── binary/               # Pre-compiled .tlbx files
    └── errors/               # Invalid files for error testing

Running Tests

Rust

# All Rust tests
cargo test --workspace

# Core tests only
cargo test --package tealeaf-core

# Derive macro tests
cargo test --package tealeaf-core --test derive

# Canonical fixture tests
cargo test --package tealeaf-core --test canonical

# FFI tests
cargo test --package tealeaf-ffi

.NET

cd bindings/dotnet
dotnet test

Everything

# Rust
cargo test --workspace

# .NET
cd bindings/dotnet && dotnet test

Canonical Test Fixtures

The canonical/ directory contains 14 sample files that test every feature:

| Sample | Features Tested |
|---|---|
| primitives | All primitive types (bool, int, float, string, null) |
| arrays | Simple and nested arrays |
| objects | Nested objects |
| schemas | @struct definitions and @table usage |
| nested_schemas | Struct-referencing-struct |
| deep_nesting | Multi-level struct nesting |
| nullable | Nullable fields with ~ values |
| maps | @map with various key types |
| references | !ref definitions and usage |
| tagged | :tag value tagged values |
| timestamps | ISO 8601 timestamp parsing |
| mixed | Combination of multiple features |
| comments | Comment handling |
| strings | Quoted, unquoted, multiline strings |

Each sample has:

  • canonical/samples/{name}.tl – the text source
  • canonical/expected/{name}.json – expected JSON output
  • canonical/binary/{name}.tlbx – pre-compiled binary

Canonical Test Pattern

#[test]
fn test_canonical_sample() {
    let tl = TeaLeaf::load("canonical/samples/primitives.tl").unwrap();

    // Round-trip: text → binary → text
    let tmp = tempfile::NamedTempFile::new().unwrap();
    tl.compile(tmp.path(), true).unwrap();
    let reader = Reader::open(tmp.path()).unwrap();

    // Verify values match
    assert_eq!(reader.get("count").unwrap().as_int(), Some(42));

    // JSON output matches expected
    let json = tl.to_json().unwrap();
    let expected = std::fs::read_to_string("canonical/expected/primitives.json").unwrap();
    assert_json_eq(&json, &expected);
}

Error Fixtures

The canonical/errors/ directory contains intentionally invalid files:

| File | Error Tested |
|---|---|
| Invalid syntax | Parser error handling |
| Missing struct | @table references undefined schema |
| Type mismatches | Schema validation |
| Malformed binary | Binary reader error handling |

Derive Macro Tests

Tests for #[derive(ToTeaLeaf, FromTeaLeaf)]:

  • Basic struct serialization/deserialization
  • All attribute combinations (rename, skip, optional, type, flatten, default)
  • Nested structs
  • Enum variants
  • Collection types (Vec, HashMap, IndexMap, Option)
  • Edge cases (empty structs, single-field structs)

.NET Test Categories

The .NET test suite covers:

Source Generator Tests

  • Schema generation for all type combinations
  • Serialization output (text, JSON, binary)
  • Deserialization from documents
  • Nested types and collections
  • Enum handling
  • Attribute processing

Reflection Serializer Tests

  • Generic serialization/deserialization
  • Type mapping accuracy
  • Nullable handling
  • Dictionary and List support

Native Type Tests

  • TLDocument lifecycle (parse, access, dispose)
  • TLValue type accessors
  • TLReader binary access
  • Schema introspection
  • Error handling (disposed objects, missing keys)

DTO Serialization Tests

  • Full round-trip (C# object → TeaLeaf → C# object)
  • Edge cases (empty strings, nulls, large numbers)
  • Collection serialization

Test Philosophy

  1. Canonical fixtures – shared across Rust and .NET, ensuring format consistency
  2. Round-trip testing – text → binary → text verifies no data loss
  3. JSON equivalence – text → JSON and binary → JSON produce identical output
  4. Error coverage – every error path has at least one test
  5. Cross-language – same fixtures tested in Rust, .NET, and via FFI

Benchmarks

TeaLeaf includes a Criterion-based benchmark suite that measures encode/decode performance and output size across multiple serialization formats.

Running Benchmarks

# Run all benchmarks
cargo bench -p tealeaf-core

# Run a specific scenario
cargo bench -p tealeaf-core -- small_object
cargo bench -p tealeaf-core -- large_array_1000
cargo bench -p tealeaf-core -- tabular_5000

# List available benchmarks
cargo bench -p tealeaf-core -- --list

Results are saved to target/criterion/ with HTML reports and JSON data. Criterion tracks historical performance across runs.

Formats Compared

Each scenario benchmarks encode and decode across six formats:

| Format | Library | Notes |
|---|---|---|
| TeaLeaf Parse | tealeaf | Text parsing (.tl → in-memory) |
| TeaLeaf Binary | tealeaf | Binary compile/read (.tlbx) |
| JSON | serde_json | Standard JSON serialization |
| MessagePack | rmp_serde | Binary, schemaless |
| CBOR | ciborium | Binary, schemaless |
| Protobuf | prost | Binary with generated code from .proto definitions |

Note: Protobuf benchmarks use prost with code generation via build.rs. The generated structs have known field offsets at compile time, giving Protobuf a structural speed advantage over TeaLeaf’s dynamic key-based access.

Benchmark Scenarios

| Group | Data Shape | Sizes | What It Tests |
|---|---|---|---|
| small_object | Config-like object | 1 | Header overhead, small payload efficiency |
| large_array_100 | Array of Point structs | 100 | Array encoding at small scale |
| large_array_1000 | Array of Point structs | 1,000 | Array encoding at medium scale |
| large_array_10000 | Array of Point structs | 10,000 | Array encoding at large scale, throughput |
| nested_structs | Nested objects | 2 levels | Nesting overhead |
| nested_structs_100 | Nested objects | 100 levels | Deep nesting scalability |
| mixed_types | Heterogeneous data | 1 | Strings, numbers, booleans mixed |
| tabular_100 | @table User records | 100 | Schema-bound tabular data, small |
| tabular_1000 | @table User records | 1,000 | Schema-bound tabular data, medium |
| tabular_5000 | @table User records | 5,000 | Schema-bound tabular data, large |

Each group measures both encode (serialize) and decode (deserialize) operations, using Throughput::Elements for per-element metrics on scaled scenarios.
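
For illustration, a scaled group is wired up roughly like this (encode_points is a hypothetical stand-in for the real scenario functions):

use criterion::{BenchmarkId, Criterion, Throughput};

/// Placeholder workload standing in for TeaLeaf encoding.
fn encode_points(n: u64) -> Vec<u8> {
    (0..n).flat_map(|i| i.to_le_bytes()).collect()
}

/// Sketch of a scaled encode benchmark with per-element throughput.
fn bench_encode(c: &mut Criterion, n: u64) {
    let mut group = c.benchmark_group("large_array");
    group.throughput(Throughput::Elements(n));
    group.bench_with_input(BenchmarkId::from_parameter(n), &n, |b, &n| {
        b.iter(|| encode_points(n));
    });
    group.finish();
}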

Size Comparison Results

From cargo run --example size_report on tealeaf-core:

| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Binary | 3.56x | 0.15x | 0.47x |

Key observations:

  • Small objects: TeaLeaf has a 64-byte header overhead. For objects under ~200 bytes, JSON or MessagePack are more compact.
  • Large arrays: String deduplication and schema-based compression produce 6-7x better compression than JSON for 10K+ records.
  • Tabular data: @table encoding with positional storage is competitive with Protobuf, with the advantage of embedded schemas.

Speed Characteristics

TeaLeaf’s dynamic key-based access is ~2-5x slower than Protobuf’s generated code:

| Operation | TeaLeaf | Protobuf | JSON (serde) |
|---|---|---|---|
| Parse text | Moderate | N/A | Fast |
| Decode binary | Moderate | Fast | N/A |
| Random key access | O(1) hash | O(1) field | O(n) parse |

Why TeaLeaf is slower than Protobuf:

  1. Dynamic dispatch – fields resolved by name at runtime; Protobuf uses generated code with known offsets
  2. String table lookup – each string access requires a table lookup
  3. Schema resolution – schema structure parsed from binary at load time

When this matters:

  • Hot loops decoding millions of records → consider Protobuf
  • Cold reads or moderate throughput → TeaLeaf is fine
  • Size-constrained transmission → TeaLeaf’s smaller binary compensates for slower decode

Code Structure

tealeaf-core/benches/
├── benchmarks.rs          # Entry point: criterion_group + criterion_main
├── common/
│   ├── mod.rs             # Module exports
│   ├── data.rs            # Test data generation functions
│   └── structs.rs         # Rust struct definitions (serde-compatible)
└── scenarios/
    ├── mod.rs             # Module exports
    ├── small_object.rs    # Small config object benchmarks
    ├── large_array.rs     # Scaled array benchmarks (100-10K)
    ├── nested_structs.rs  # Nesting depth benchmarks (2-100)
    ├── mixed_types.rs     # Heterogeneous data benchmarks
    └── tabular_data.rs    # @table User record benchmarks (100-5K)

Each scenario module exports bench_encode and bench_decode functions. Scaled scenarios accept a size parameter.

For optimization tips and practical guidance on when to use each format, see Performance.

Accuracy Benchmark

The accuracy benchmark suite evaluates LLM providers’ ability to analyze structured data in TeaLeaf format. It sends analysis prompts with TeaLeaf-formatted business data to multiple providers and scores the responses.

Overview

The workflow:

  1. Takes JSON data from various business domains
  2. Converts it to TeaLeaf format using tealeaf-core
  3. Sends analysis prompts to multiple LLM providers
  4. Evaluates and compares the responses using a scoring framework

Supported Providers

| Provider | Environment Variable | Model |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Claude Sonnet 4.5 (Extended Thinking) |
| OpenAI | OPENAI_API_KEY | GPT-5.2 |

Installation

Pre-built Binaries

Download the latest release from GitHub Releases:

| Platform | Architecture | File |
|---|---|---|
| Windows | x64 | accuracy-benchmark-windows-x64.zip |
| Windows | ARM64 | accuracy-benchmark-windows-arm64.zip |
| macOS | Intel | accuracy-benchmark-macos-x64.tar.gz |
| macOS | Apple Silicon | accuracy-benchmark-macos-arm64.tar.gz |
| Linux | x64 | accuracy-benchmark-linux-x64.tar.gz |
| Linux | ARM64 | accuracy-benchmark-linux-arm64.tar.gz |
| Linux | x64 (static) | accuracy-benchmark-linux-musl-x64.tar.gz |

Build from Source

cargo build -p accuracy-benchmark --release

# Or run directly
cargo run -p accuracy-benchmark -- --help

Usage

# Run with all available providers
cargo run -p accuracy-benchmark -- run

# Run with specific providers
cargo run -p accuracy-benchmark -- run --providers anthropic,openai

# Run specific categories only
cargo run -p accuracy-benchmark -- run --categories finance,retail

# Compare TeaLeaf vs JSON format performance
cargo run -p accuracy-benchmark -- run --compare-formats

# Verbose output
cargo run -p accuracy-benchmark -- -v run

# List available tasks
cargo run -p accuracy-benchmark -- list-tasks

# Generate configuration template
cargo run -p accuracy-benchmark -- init-config -o my-config.json

Benchmark Tasks

The suite includes 12 tasks across 10 business domains:

| ID | Domain | Complexity | Output Type |
|---|---|---|---|
| FIN-001 | Finance | Simple | Calculation |
| FIN-002 | Finance | Moderate | Calculation |
| RET-001 | Retail | Simple | Summary |
| RET-002 | Retail | Complex | Recommendation |
| HLT-001 | Healthcare | Simple | Summary |
| TEC-001 | Technology | Moderate | Analysis |
| MKT-001 | Marketing | Moderate | Calculation |
| LOG-001 | Logistics | Moderate | Analysis |
| HR-001 | HR | Moderate | Analysis |
| MFG-001 | Manufacturing | Moderate | Calculation |
| RE-001 | Real Estate | Complex | Recommendation |
| LEG-001 | Legal | Complex | Analysis |

Data Sources

Each task specifies input data in one of two ways:

Inline JSON:

BenchmarkTask::new("FIN-001", "finance", "Analyze this data:\n\n{tl_data}")
    .with_json_data(serde_json::json!({
        "revenue": 1000000,
        "expenses": 750000
    }))

JSON file reference:

BenchmarkTask::new("LOG-001", "logistics", "Analyze this data:\n\n{tl_data}")
    .with_json_file("tasks/logistics/data/shipments.json")

The {tl_data} placeholder in the prompt template is replaced with TeaLeaf-formatted data before sending to the LLM.
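
Mechanically this is a plain substitution; a sketch with an illustrative helper name:

/// Sketch of prompt assembly: the template is assumed to contain
/// the {tl_data} placeholder.
fn build_prompt(template: &str, tl_data: &str) -> String {
    template.replace("{tl_data}", tl_data)
}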

Analysis Framework

Accuracy Metrics

Responses are evaluated across five dimensions:

| Metric | Weight | Description |
|---|---|---|
| Completeness | 25% | Were all expected elements addressed? |
| Relevance | 25% | How relevant is the response to the task? |
| Coherence | 20% | Is the response well-structured? |
| Factual Accuracy | 20% | Do values match validation patterns? |
| Actionability | 10% | For recommendations – are they actionable? |
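
Given those weights, the overall score reduces to a weighted sum. A sketch, assuming each dimension is normalized to 0.0–1.0 (the helper name is hypothetical; the real scoring lives in analysis/scoring.rs):

/// Weighted overall score per the table above.
fn overall_score(completeness: f64, relevance: f64, coherence: f64,
                 factual_accuracy: f64, actionability: f64) -> f64 {
    0.25 * completeness
        + 0.25 * relevance
        + 0.20 * coherence
        + 0.20 * factual_accuracy
        + 0.10 * actionability
}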

Element Detection

Each task defines expected elements that should appear in the response:

// Keyword presence check
.expect("metric", "Total revenue calculation", true)

// Regex pattern validation
.expect_with_pattern("metric", "Percentage value", true, r"\d+\.?\d*%")

  • Without pattern: checks for keyword presence from description
  • With pattern: validates using regex (e.g., \$[\d,]+ for dollar amounts)
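
A sketch combining both checks (assuming the regex crate; the helper name is illustrative):

use regex::Regex;

/// Regex validation when a pattern is supplied, otherwise a
/// case-insensitive keyword presence check.
fn element_present(response: &str, keyword: &str, pattern: Option<&str>) -> bool {
    match pattern {
        Some(p) => Regex::new(p).map(|re| re.is_match(response)).unwrap_or(false),
        None => response.to_lowercase().contains(&keyword.to_lowercase()),
    }
}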

Scoring Rubrics

Different rubrics apply based on output type:

| Output Type | Key Criteria |
|---|---|
| Calculation | Numeric content (5+ numbers), structured output |
| Analysis | Depth, structure, evidence with data |
| Recommendation | Actionable language, prioritization, justification |
| Summary | Completeness, conciseness, organization |

Coherence Checks

  • Structure markers: ##, ###, **, -, numbered lists
  • Paragraph breaks (3+ paragraphs preferred)
  • Reasonable length (100-2000 words)
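
An illustrative version of these heuristics (the weights and helper name are assumptions, not the actual implementation):

/// Toy coherence score built from the checks listed above.
fn coherence_score(response: &str) -> f64 {
    let has_structure = ["##", "###", "**", "- "].iter().any(|m| response.contains(m));
    let paragraphs = response.split("\n\n").count();
    let words = response.split_whitespace().count();
    let mut score = 0.0;
    if has_structure { score += 0.4; }
    if paragraphs >= 3 { score += 0.3; }               // 3+ paragraphs preferred
    if (100..=2000).contains(&words) { score += 0.3; } // reasonable length
    score
}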

Actionability Keywords

For recommendation tasks, these keywords are detected:

  • recommend, should, suggest, consider, advise
  • action, implement, improve, optimize, prioritize
  • next step, immediate, critical, important

Format Comparison Results

Run with --compare-formats to compare TeaLeaf vs JSON input efficiency.

Sample run from February 5, 2026 with Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) and GPT-5.2 (gpt-5.2-2025-12-11):

| Provider | TeaLeaf Score | JSON Score | Accuracy Diff | TeaLeaf Input | JSON Input | Input Savings |
|---|---|---|---|---|---|---|
| anthropic | 0.988 | 0.978 | +0.010 | 5,793 | 8,275 | -30.0% |
| openai | 0.901 | 0.899 | +0.002 | 4,868 | 7,089 | -31.3% |

Input tokens = data sent to the model. Output tokens vary by model verbosity.

Key findings:

| Provider | Accuracy | Data Token Efficiency |
|---|---|---|
| anthropic | Comparable (+1.0%) | TeaLeaf uses ~36% fewer data tokens |
| openai | Comparable (+0.2%) | TeaLeaf uses ~36% fewer data tokens |

TeaLeaf data payloads use ~36% fewer tokens than equivalent JSON (median across 12 tasks, validated with tiktoken). Total input savings are ~30% because shared instruction text dilutes the data-only difference. Savings increase with larger and more structured datasets.

Sample Results: Reference benchmark results are available in accuracy-benchmark/results/sample/ in the repository.

Output Files

Results are saved in two formats:

TeaLeaf Format (analysis.tl)

# Accuracy Benchmark Results
# Generated: 2026-02-05 15:29:42 UTC

run_metadata: {
    run_id: "20260205-152419",
    started_at: 2026-02-05T15:24:19Z,
    completed_at: 2026-02-05T15:29:42Z,
    total_tasks: 12,
    providers: [anthropic, openai]
}

responses: @table api_response [
    (FIN-001, openai, "gpt-5.2-2025-12-11", 315, 490, 6742, 2026-02-05T15:24:38Z, success),
    (FIN-001, anthropic, "claude-sonnet-4-5-20250929", 396, 1083, 12309, 2026-02-05T15:24:31Z, success),
    ...
]

analysis_results: @table analysis_result [
    (FIN-001, openai, 0.667, 0.625, 0.943, 0.000),
    (FIN-001, anthropic, 1.000, 1.000, 1.000, 1.000),
    ...
]

comparisons: @table comparison_result [
    (FIN-001, [anthropic, openai], anthropic, 0.389),
    (RET-001, [anthropic, openai], anthropic, 0.047),
    ...
]

summary: {
    total_tasks: 12,
    wins: { anthropic: 11, openai: 1 },
    avg_scores: { anthropic: 0.988, openai: 0.901 },
    by_category: { ... },
    by_complexity: { ... }
}

JSON Summary (summary.json)

{
  "run_id": "20260205-152419",
  "total_tasks": 12,
  "provider_rankings": [
    { "provider": "anthropic", "wins": 11, "avg_score": 0.988 },
    { "provider": "openai", "wins": 1, "avg_score": 0.901 }
  ],
  "category_breakdown": {
    "retail": { "leader": "anthropic", "margin": 0.111 },
    "finance": { "leader": "anthropic", "margin": 0.197 },
    ...
  },
  "detailed_results_file": "analysis.tl"
}

Adding Custom Tasks

From JSON Data

BenchmarkTask::new(
    "CUSTOM-001",
    "custom_category",
    "Analyze this data:\n\n{tl_data}\n\nProvide summary and recommendations."
)
.with_json_file("tasks/custom/data/my_data.json")
.with_complexity(Complexity::Moderate)
.with_output_type(OutputType::Analysis)
.expect("summary", "Data overview", true)
.expect_with_pattern("metric", "Total value", true, r"\d+")

From TeaLeaf File

cargo run -p accuracy-benchmark -- run --tasks path/to/tasks.tl

Extending Providers

Implement the LLMProvider trait:

#[async_trait]
impl LLMProvider for NewProviderClient {
    fn name(&self) -> &str { "newprovider" }

    async fn complete(&self, request: CompletionRequest) -> ProviderResult<CompletionResponse> {
        // Implementation
    }
}

Then register in src/providers/mod.rs via create_all_providers() and create_providers().

Directory Structure

accuracy-benchmark/
├── src/
│   ├── main.rs           # CLI interface (clap)
│   ├── lib.rs            # Library exports
│   ├── config.rs         # Configuration management
│   ├── providers/        # LLM provider clients
│   │   ├── traits.rs     # LLMProvider trait
│   │   ├── anthropic.rs  # Claude implementation
│   │   └── openai.rs     # GPT implementation
│   ├── tasks/            # Task definitions
│   │   ├── mod.rs        # BenchmarkTask, DataSource
│   │   ├── categories.rs # Domain, Complexity, OutputType
│   │   └── loader.rs     # TeaLeaf file loader
│   ├── runner/           # Execution engine
│   │   ├── executor.rs   # Parallel task execution
│   │   └── rate_limiter.rs
│   ├── analysis/         # Response analysis
│   │   ├── metrics.rs    # AccuracyMetrics
│   │   ├── scoring.rs    # ScoringRubric
│   │   └── comparator.rs # Cross-provider comparison
│   └── reporting/        # Output generation
│       └── tl_writer.rs  # TeaLeaf format output
├── tasks/                # Sample data by domain
│   ├── finance/data/
│   ├── healthcare/data/
│   ├── retail/data/
│   └── ...
├── results/runs/         # Archived run results
└── Cargo.toml

Adversarial Tests

The adversarial test suite validates TeaLeaf’s error handling and robustness using crafted malformed inputs, binary corruption, compression edge cases, and large-corpus stress tests. All tests are isolated in the adversarial-tests/ directory to avoid touching core project files.

Current count: 58 tests across 9 categories.

Running Tests

# Run all adversarial tests
cd adversarial-tests/core-harness
cargo test --test adversarial

# With output
cargo test --test adversarial -- --nocapture

# Run via script (PowerShell)
./adversarial-tests/scripts/run_core_harness.ps1

# CLI adversarial tests
./adversarial-tests/scripts/run_cli_adversarial.ps1

# .NET adversarial harness
./adversarial-tests/scripts/run_dotnet_harness.ps1

Test Input Files

TeaLeaf Format (.tl) — 13 files

Crafted .tl files testing parser error paths:

| File | Error Tested | Expected |
|---|---|---|
| bad_unclosed_string.tl | Unclosed string literal ("Alice) | Parse error |
| bad_missing_colon.tl | Missing colon in key-value pair | Parse error |
| bad_invalid_escape.tl | Invalid escape sequence (\q) | Parse error |
| bad_number_overflow.tl | Number exceeding u64 bounds | See note below |
| bad_table_wrong_arity.tl | Table row with wrong field count | Parse error |
| bad_schema_unclosed.tl | Unclosed @struct definition | Parse error |
| bad_unicode_escape_short.tl | Incomplete \u escape (\u12) | Parse error |
| bad_unicode_escape_invalid_hex.tl | Invalid hex in \uZZZZ | Parse error |
| bad_unicode_escape_surrogate.tl | Lone UTF-16 surrogate (\uD800) | Parse error |
| bad_unterminated_multiline.tl | Unterminated """ multiline string | Parse error |
| invalid_utf8.tl | Invalid UTF-8 byte sequence | Parse error |

Note: bad_number_overflow.tl does not cause a parse error. Numbers exceeding i64/u64 range are stored as Value::JsonNumber (exact decimal string), not rejected.

Edge cases that should succeed:

| File | What It Tests | Expected |
|---|---|---|
| deep_nesting.tl | 7 levels of nested arrays ([[[[[[[1]]]]]]]) | Parse OK |
| empty_doc.tl | Empty document | Parse OK |

JSON Format (.json) — 6 files

Files testing from_json error and edge-case paths:

| File | What It Tests | Expected |
|---|---|---|
| invalid_json_trailing.json | Trailing comma or content | Parse error |
| invalid_json_unclosed.json | Unclosed object or array | Parse error |
| large_number.json | Number overflowing f64 | Stored as JsonNumber |
| deep_array.json | Deeply nested arrays | Parse OK |
| empty_object.json | Empty JSON object {} | Parse OK |
| root_array.json | Root-level array [1,2,3] | Preserved as array |

Binary Format (.tlbx) — 4 files (unused by Rust tests)

These fixture files are not referenced by any Rust test; all binary adversarial tests generate malformed data inline using tempfile::tempdir(). The files are exercised only by the CLI adversarial scripts, which log results under results/cli/.

| File | Content |
|---|---|
| bad_magic.tlbx | Invalid magic bytes |
| bad_version.tlbx | Invalid version field |
| random_garbage.tlbx | Random bytes |
| truncated_header.tlbx | Incomplete header |

Test Functions

Parse Error Tests (10 tests)

| Function | Input | Assertion |
|---|---|---|
| parse_invalid_syntax_unclosed_string | name: "Alice | is_err() |
| parse_invalid_escape_sequence | name: "Alice\q" | is_err() |
| parse_missing_colon | name "Alice" | is_err() |
| parse_schema_unclosed | Unclosed @struct | is_err() |
| parse_table_wrong_arity | 3 fields for 2-field schema | is_err() |
| parse_unicode_escape_short | \u12 | is_err() |
| parse_unicode_escape_invalid_hex | \uZZZZ | is_err() |
| parse_unicode_escape_surrogate | \uD800 | is_err() |
| parse_unterminated_multiline_string | """unterminated | is_err() |
| from_json_invalid | {"a":1,} (trailing comma) | is_err() |

Success / Edge-Case Parse Tests (3 tests)

| Function | Input | Assertion |
|---|---|---|
| parse_number_overflow_falls_to_json_number | 18446744073709551616 | Parse succeeds; stored as Value::JsonNumber |
| parse_deep_nesting_ok | [[[[[[[1]]]]]]] | Parse succeeds; get("root") returns value |
| from_json_root_array_is_preserved | [1,2,3] | Stored under "root" key as Value::Array |

Error Variant Coverage (5 tests)

Tests that exercise specific Error enum variants for code coverage:

| Function | What It Tests | Assertion |
|---|---|---|
| parse_unknown_struct_in_table | @table nonexistent references undefined struct | is_err(); message contains struct name |
| parse_unexpected_eof_unclosed_brace | obj: {x: 1, | is_err(); message indicates EOF |
| parse_unexpected_eof_unclosed_bracket | arr: [1, 2, | is_err() |
| reader_missing_field | reader.get("nonexistent") on valid binary | is_err(); message contains key name |
| from_json_large_number_falls_to_json_number | {"big": 18446744073709551616} | Parsed as Value::JsonNumber |

Type Coercion Tests (2 tests)

Validates spec §2.5 best-effort numeric coercion during binary compilation:

| Function | Input | Assertion |
|---|---|---|
| writer_int_overflow_coerces_to_zero | int8 field with value 999 | Binary roundtrip produces Value::Int(0) |
| writer_uint_negative_coerces_to_zero | uint8 field with value -1 | Binary roundtrip produces Value::UInt(0) |

Binary Reader Tests (4 tests)

| Function | Input | Assertion |
|---|---|---|
| reader_rejects_bad_magic | [0x58, 0x58, 0x58, 0x58] | Reader::open().is_err() |
| reader_rejects_bad_version | Valid magic + version 3 | Reader::open().is_err() |
| load_invalid_file_errors | .tl file with bad syntax | TeaLeaf::load().is_err() |
| load_invalid_utf8_errors | [0xFF, 0xFE, 0xFA] | TeaLeaf::load().is_err() |

Binary Corruption Tests (12 tests)

Tests that take valid binary output, corrupt specific bytes, and verify the reader does not panic:

| Function | What It Corrupts |
|---|---|
| reader_corrupted_magic_byte | Flips first magic byte |
| reader_corrupted_string_table_offset | Points string table offset past EOF |
| reader_truncated_string_table | Truncates file right after header |
| reader_oversized_string_count | Sets string count to u32::MAX |
| reader_oversized_section_count | Sets section count to u32::MAX |
| reader_corrupted_schema_count | Sets schema count to u32::MAX |
| reader_flipped_bytes_in_section_data | Flips bytes in last 10 bytes of section data |
| reader_truncated_compressed_data | Removes last 20 bytes from compressed file |
| reader_invalid_zlib_stream | Overwrites data section with 0xBA bytes |
| reader_zero_length_file | Empty Vec<u8> |
| reader_just_magic_no_header | Only b"TLBX" (4 bytes, no header) |
| reader_corrupted_type_code | Replaces a type code byte with 0xFE |

All corruption tests assert no panic. Most also verify that Reader::from_bytes() or reader.get() either returns an error or handles the corruption gracefully.

Compression Stress Tests (4 tests)

| Function | What It Tests |
|---|---|
| compression_at_threshold_boundary | Data just over 64 bytes triggers compression attempt; roundtrip OK |
| compression_skipped_when_not_beneficial | High-entropy data: compressed file not much larger than raw |
| compression_all_identical_bytes | 10K zeros: compressed size < half of raw; roundtrip OK |
| compression_below_threshold_stored_raw | Small data with compress=true: stored raw (same size as uncompressed) |
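
Taken together, these tests pin down a store-raw-vs-compress decision along the following lines (a sketch assuming a 64-byte threshold and the flate2 crate, not the actual writer code):

use flate2::{write::ZlibEncoder, Compression};
use std::io::Write;

/// Skip tiny payloads, and keep the raw bytes whenever compression
/// does not actually shrink the data. Returns (compressed?, bytes).
fn maybe_compress(raw: &[u8]) -> (bool, Vec<u8>) {
    if raw.len() <= 64 {
        return (false, raw.to_vec());
    }
    let mut enc = ZlibEncoder::new(Vec::new(), Compression::default());
    enc.write_all(raw).expect("in-memory write cannot fail");
    let compressed = enc.finish().expect("in-memory flush cannot fail");
    if compressed.len() < raw.len() {
        (true, compressed)
    } else {
        (false, raw.to_vec())
    }
}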

Soak / Large-Corpus Tests (8 tests)

Stress tests for parser, writer, and reader with large inputs:

| Function | Scale | What It Tests |
|---|---|---|
| soak_deeply_nested_arrays | 200 levels deep | Parser handles deep nesting without stack overflow |
| soak_wide_object | 10,000 fields | Parser and Value::Object handle wide objects |
| soak_large_array | 100,000 integers | Parser handles large arrays; first/last element correct |
| soak_large_array_binary_roundtrip | 100,000 integers | Compile + read roundtrip with compression |
| soak_many_sections | 5,000 top-level keys | Binary writer/reader handles many sections |
| soak_many_schemas | 500 @struct definitions | Schema table handles large schema counts |
| soak_string_deduplication | 15,000 strings (5K dupes) | String dedup in binary writer; roundtrip correct |
| soak_long_string | 1 MB string | Binary writer/reader handles large string values |

Memory-Mapped Reader Tests (10 tests)

Validates Reader::open_mmap() produces identical results to Reader::open() and Reader::from_bytes():

| Function | What It Tests |
|---|---|
| mmap_roundtrip_all_primitive_types | Int, float, bool, string, timestamp via mmap |
| mmap_roundtrip_containers | Arrays, objects, nested arrays via mmap |
| mmap_roundtrip_schemas | @struct + @table data via mmap |
| mmap_roundtrip_compressed | 500-element compressed array via mmap |
| mmap_vs_open_equivalence | All keys: open_mmap values == open values |
| mmap_vs_from_bytes_equivalence | All keys: open_mmap values == from_bytes values |
| mmap_large_file | 50,000-element array via mmap |
| mmap_nonexistent_file | open_mmap on missing path returns error |
| mmap_multiple_sections | 100 sections via mmap; boundary keys correct |
| mmap_string_dedup | 100 identical string values via mmap; dedup preserved |

Directory Structure

adversarial-tests/
├── inputs/
│   ├── tl/              # 13 crafted .tl files (11 error + 2 success)
│   ├── json/            # 6 crafted .json files
│   └── tlbx/            # 4 .tlbx files (used by CLI scripts, not Rust tests)
├── core-harness/
│   ├── tests/
│   │   └── adversarial.rs   # 58 Rust integration tests
│   └── Cargo.toml
├── dotnet-harness/          # C# harness using TeaLeaf bindings
├── scripts/
│   ├── run_core_harness.ps1
│   ├── run_cli_adversarial.ps1
│   └── run_dotnet_harness.ps1
├── results/                 # CLI test logs and outputs
└── README.md

Adding New Tests

1. Add an Inline Test (preferred)

Most adversarial tests generate their inputs inline. This avoids stale fixture files and keeps the test self-contained:

#[test]
fn parse_new_error_case() {
    assert_parse_err("malformed: input here");
}

The assert_parse_err helper asserts that TeaLeaf::parse(input).is_err().
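
That helper presumably amounts to something like this sketch:

/// Assert that the given source fails to parse.
fn assert_parse_err(input: &str) {
    assert!(TeaLeaf::parse(input).is_err(), "expected parse error for: {input}");
}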

2. For Binary Tests

Use the make_valid_binary helper to produce valid bytes, then corrupt them:

#[test]
fn reader_new_corruption_case() {
    let mut data = make_valid_binary("val: 42", false);
    data[0] ^= 0xFF; // corrupt something
    let result = Reader::from_bytes(data);
    // Assert no panic; error or graceful handling OK
    if let Ok(r) = result {
        let _ = r.get("val");
    }
}

3. Input File Tests (for CLI scripts)

Place malformed input in the appropriate subdirectory for CLI adversarial testing:

adversarial-tests/inputs/tl/bad_new_case.tl

The CLI script run_cli_adversarial.ps1 exercises these files through the tealeaf CLI binary and logs results to results/cli/.

Contributing Guide

Contributions to TeaLeaf are welcome. The full contributing guide lives in the repository root:

CONTRIBUTING.md

That document covers project architecture, build instructions, testing, the canonical test suite, version management, PR process, and areas of interest for contributors.

This page highlights the key points. See the Development Setup page for environment setup details.

Ways to Contribute

  • Bug reports – file issues on GitHub with reproduction steps
  • Feature requests – open an issue describing the use case
  • Code contributions – submit pull requests
  • Documentation – fix typos, improve explanations, add examples
  • Language bindings – create bindings for Python, Java, Go, etc.
  • Test cases – add canonical test fixtures or edge case tests

Repository

Source code: github.com/krishjag/tealeaf

Pull Request Checklist

  1. Fork the repository and create a feature branch from main
  2. Make your changes
  3. Run tests and lints:
    cargo test --workspace
    cargo clippy --workspace
    cargo fmt --check
    
  4. If you modified .NET bindings: cd bindings/dotnet && dotnet test
  5. Submit a pull request against main

CI runs on Linux, macOS, and Windows automatically. Version consistency is validated on every PR.

Code Style

Rust

  • Standard rustfmt formatting (no custom config)
  • Standard clippy lints (no custom config)
  • Document public APIs with /// doc comments
  • Edition 2021

C# (.NET)

  • Standard C# naming conventions
  • XML doc comments for public APIs
  • Target frameworks: net6.0, net8.0, net10.0, netstandard2.0

Areas of Interest

New Language Bindings

The FFI layer exposes a C-compatible API that can be used from any language. See the FFI Overview for getting started.

Desired bindings:

  • Python (via ctypes or cffi)
  • Java/Kotlin (via JNI or JNA)
  • Go (via cgo)
  • JavaScript/TypeScript (via WASM or N-API)

Format Improvements

  • Union support in binary encoding
  • Bytes literal syntax in text format
  • Streaming/append-only mode

Tooling

  • Editor plugins (VS Code syntax highlighting for .tl)
  • Schema validation tooling
  • Web-based playground

License

By contributing, you agree that your contributions will be licensed under the MIT License.

Development Setup

How to set up a development environment for working on TeaLeaf. See also the comprehensive CONTRIBUTING.md in the repository root for project architecture, version management, and PR guidelines.

Prerequisites

| Tool | Version | Purpose |
|---|---|---|
| Rust | 1.70+ | Core library, CLI, FFI |
| .NET SDK | 8.0+ | .NET bindings and tests |
| Git | Any | Version control |

Optional:

Clone and Build

git clone https://github.com/krishjag/tealeaf.git
cd tealeaf

# Build everything
cargo build --workspace

# Build release
cargo build --workspace --release

Project Layout

tealeaf/
├── tealeaf-core/          # Core library + CLI binary
├── tealeaf-derive/        # Proc-macro (derive macros)
├── tealeaf-ffi/           # C FFI layer
├── bindings/dotnet/       # .NET bindings
├── canonical/             # Shared test fixtures
├── spec/                  # Format specification
├── examples/              # Example files
├── docs-site/             # Documentation site (mdBook)
└── accuracy-benchmark/    # Accuracy benchmark tool

Running Tests

Rust

# All tests
cargo test --workspace

# Specific package
cargo test --package tealeaf-core
cargo test --package tealeaf-derive
cargo test --package tealeaf-ffi

# Specific test file
cargo test --package tealeaf-core --test canonical
cargo test --package tealeaf-core --test derive

# With output
cargo test --workspace -- --nocapture

.NET

cd bindings/dotnet
dotnet build
dotnet test

Lint

cargo clippy --workspace
cargo fmt --check

Development Workflows

Modifying the Parser

  1. Edit tealeaf-core/src/lib.rs (lexer and parser live here)
  2. Run cargo test --package tealeaf-core
  3. Check canonical fixtures still pass
  4. Add new test cases for the change

Modifying the Binary Format

  1. Edit tealeaf-core/src/writer.rs (encoder) and tealeaf-core/src/reader.rs (decoder)
  2. Run canonical round-trip tests: cargo test --package tealeaf-core --test canonical
  3. Regenerate binary fixtures if the format changed

Modifying Derive Macros

  1. Edit files in tealeaf-derive/src/
  2. Run: cargo test --package tealeaf-core --test derive
  3. Check that derive tests cover your change

Modifying FFI

  1. Edit tealeaf-ffi/src/lib.rs
  2. Run: cargo test --package tealeaf-ffi
  3. The C header is auto-regenerated by cbindgen during build

Modifying .NET Bindings

  1. Edit files in bindings/dotnet/
  2. Build: cd bindings/dotnet && dotnet build
  3. Test: dotnet test
  4. The native library must be built first: cargo build --package tealeaf-ffi

Documentation

Building the Documentation Site

# Install mdBook
cargo install mdbook

# Build
cd docs-site
mdbook build

# Serve locally with live reload
mdbook serve --open

Rust API Docs

cargo doc --workspace --no-deps --open

CI/CD

The project uses GitHub Actions for CI:

| Workflow | Purpose |
|---|---|
| rust-cli.yml | Build and test Rust on all platforms |
| dotnet-package.yml | Build .NET package with native libraries |
| accuracy-benchmark.yml | Benchmark accuracy tests |

All CI runs are triggered on push to main/develop and on pull requests.

Debugging

Rust

# Run with debug output
RUST_LOG=debug cargo run --package tealeaf-core -- info test.tl

# Run with backtrace
RUST_BACKTRACE=1 cargo test --package tealeaf-core

.NET

Use Visual Studio or VS Code with the C# extension for debugging the source generator and managed code.

For native library issues, attach a native debugger to the .NET test process.

Type Reference

Complete reference table for all TeaLeaf types, their text syntax, binary encoding, and language mappings.

Primitive Types

| TeaLeaf Type | Text Syntax | Binary Code | Binary Size | Rust Type | C# Type |
|---|---|---|---|---|---|
| bool | true / false | 0x01 | 1 byte | bool | bool |
| int8 | 42 | 0x02 | 1 byte | i8 | sbyte |
| int16 | 1000 | 0x03 | 2 bytes | i16 | short |
| int / int32 | 100000 | 0x04 | 4 bytes | i32 | int |
| int64 | 500000000000 | 0x05 | 8 bytes | i64 | long |
| uint8 | 255 | 0x06 | 1 byte | u8 | byte |
| uint16 | 65535 | 0x07 | 2 bytes | u16 | ushort |
| uint / uint32 | 100000 | 0x08 | 4 bytes | u32 | uint |
| uint64 | 18446744073709551615 | 0x09 | 8 bytes | u64 | ulong |
| float32 | 3.14 | 0x0A | 4 bytes | f32 | float |
| float / float64 | 3.14 | 0x0B | 8 bytes | f64 | double |
| string | "hello" / hello | 0x10 | 4 bytes (index) | String | string |
| bytes | b"cafef00d" | 0x11 | varint + data | Vec<u8> | byte[] |
| json_number | (from JSON) | 0x12 | 4 bytes (index) | String | string |
| timestamp | 2024-01-15T10:30:00Z | 0x32 | 10 bytes | (i64, i16) | DateTimeOffset |

Special Types

| TeaLeaf Type | Text Syntax | Binary Code | Description |
|---|---|---|---|
| null | ~ | 0x00 | Null/missing value |

Container Types

| TeaLeaf Type | Text Syntax | Binary Code | Description |
|---|---|---|---|
| Array | [1, 2, 3] | 0x20 | Ordered collection |
| Object | {key: value} | 0x21 | String-keyed map |
| Struct | (val, val, ...) in @table | 0x22 | Schema-typed record |
| Map | @map {key: value} | 0x23 | Any-keyed ordered map |
| Tuple | (val, val, ...) | 0x24 (reserved) | Currently parsed as array |

Semantic Types

| TeaLeaf Type | Text Syntax | Binary Code | Description |
|---|---|---|---|
| Ref | !name | 0x30 | Named reference |
| Tagged | :tag value | 0x31 | Discriminated value |

Type Modifiers

| Modifier | Syntax | Description |
|---|---|---|
| Nullable | type? | Field can be ~ (null) |
| Array | []type | Array of the given type |
| Nullable array | []type? | The field itself can be null |

Type Widening Path

int8 → int16 → int32 → int64
uint8 → uint16 → uint32 → uint64
float32 → float64

Widening is automatic when reading binary data. Narrowing requires recompilation.

JSON Mapping

| TeaLeaf Type | JSON Output | JSON Input |
|---|---|---|
| Null | null | null → Null |
| Bool | true/false | boolean → Bool |
| Int | number | integer → Int |
| UInt | number | large integer → UInt |
| Float | number | decimal → Float |
| String | "text" | string → String |
| Bytes | "0xhex" | (not auto-detected) |
| JsonNumber | number | large/precise number → JsonNumber |
| Timestamp | "ISO 8601" | (not auto-detected) |
| Array | [...] | array → Array |
| Object | {...} | object → Object |
| Map | [[k,v],...] | (not auto-detected) |
| Ref | {"$ref":"name"} | (not auto-detected) |
| Tagged | {"$tag":"t","$value":v} | (not auto-detected) |

Comparison Matrix

How TeaLeaf compares to other data formats.

Feature Comparison

| Feature | JSON | YAML | Protobuf | Avro | MsgPack | CBOR | TeaLeaf |
|---|---|---|---|---|---|---|---|
| Human-readable text | Yes | Yes | No* | No | No | No | Yes |
| Compact binary | No | No | Yes | Yes | Yes | Yes | Yes |
| Schema in text | No | No | External | External | No | No | Inline |
| Schema in binary | No | No | No | Yes | No | No | Yes |
| No codegen required | Yes | Yes | No | Partial | Yes | Yes | Yes |
| Comments | No | Yes | N/A | N/A | No | No | Yes |
| Built-in JSON conversion | No | No | No | No | | | Yes |
| String deduplication | No | No | No | No | No | No | Yes |
| Per-section compression | No | No | No | Yes | No | No | Yes |
| Null bitmaps | No | No | No | Yes | No | No | Yes |
| Random-access reading | No | No | No | No | No | No | Yes |

*Protobuf TextFormat exists but is rarely used.

Size Comparison

| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| YAML | ~1.1x | ~1.1x | ~1.1x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| CBOR | ~0.40x | ~0.65x | ~0.42x |
| TeaLeaf Binary | 3.56x | 0.15x | 0.47x |

Speed Comparison

| Operation | JSON (serde) | Protobuf | MsgPack | TeaLeaf |
|---|---|---|---|---|
| Parse text | Fast | N/A | N/A | Moderate |
| Decode binary | N/A | Fast | Fast | Moderate |
| Encode text | Fast | N/A | N/A | Moderate |
| Encode binary | N/A | Fast | Fast | Moderate |
| Random key access | O(n) parse | O(1) generated | N/A | O(1) hash |

When to Use Each Format

Use TeaLeaf When

| Scenario | Why |
|---|---|
| LLM context / prompts | Schema-first reduces token count |
| Config files (human-edited + deployed) | Text for editing, binary for deployment |
| Large tabular data | 6-7x compression with string dedup |
| Self-describing data exchange | No external schema files needed |
| Game save data / asset manifests | Compact, nested, self-describing |
| Scientific/sensor data | Null bitmaps for sparse data |

Use JSON When

| Scenario | Why |
|---|---|
| Web APIs / REST | Universal support |
| Small payloads (< 1 KB) | No overhead |
| JavaScript-heavy applications | Native parsing |
| Human-only data (no binary needed) | Simpler tooling |

Use Protobuf When

| Scenario | Why |
|---|---|
| RPC / gRPC services | First-class streaming support |
| Maximum decode speed | Generated code with known offsets |
| Schema evolution at scale | Field numbers + backward compat |
| Microservice communication | Established ecosystem |

Use Avro When

| Scenario | Why |
|---|---|
| Hadoop / big data pipelines | Ecosystem integration |
| Schema registry workflows | Built-in evolution |
| Large-scale data lake storage | Block compression |

Use MessagePack / CBOR When

| Scenario | Why |
|---|---|
| Tiny payloads (< 100 bytes) | Minimal overhead |
| Schemaless binary | No schema definition needed |
| Drop-in JSON replacement | Similar data model |

Ecosystem Maturity

| Aspect | JSON | Protobuf | Avro | TeaLeaf |
|---|---|---|---|---|
| Language support | Universal | 10+ languages | 5+ languages | Rust, .NET |
| Tooling | Extensive | Extensive | Moderate | CLI + libraries |
| Community | Massive | Large | Medium | Early |
| Specification maturity | RFC 8259 | Stable (proto3) | Apache spec | Beta |
| IDE support | Universal | Plugins | Plugins | Planned |

TeaLeaf is a young format (v2.0.0-beta.8). It fills a specific niche that existing formats don’t serve well – but it doesn’t aim to replace established formats in their core use cases.

Specification Governance

How the TeaLeaf specification, implementation, and tests relate to each other.

Two Sources of Truth

TeaLeaf has a prose specification and an executable specification:

| | Prose Spec | Executable Spec |
|---|---|---|
| Location | spec/TEALEAF_SPEC.md | canonical/ test suite |
| Format | Markdown document | .tl samples + expected JSON + pre-compiled .tlbx |
| Enforced by | Human review | CI (automated on every push and PR) |
| Covers | Full grammar, type system, binary layout | 14 feature areas, 52 tests (42 success + 10 error) |

The canonical test suite is the normative specification. If the prose spec and the tests disagree, the tests are authoritative. The prose spec is documentation that describes intent and rationale.

What the Canonical Suite Validates

Each canonical sample is tested through three paths:

Text (.tl) ──────────────────────────────► JSON    (compare with expected/)
Binary (.tlbx) ──────────────────────────► JSON    (compare with expected/)
Text (.tl) ──► Binary (.tlbx) ──► Read ──► JSON    (full round-trip)

The 14 sample files cover:

| File | Coverage |
|---|---|
| primitives.tl | Null, bool, int, float, string, escape sequences |
| arrays.tl | Empty, typed, mixed, nested arrays |
| objects.tl | Empty, simple, nested, deeply nested objects |
| schemas.tl | @struct, @table, nested structs, nullable fields |
| special_types.tl | References, tagged values, maps, edge cases |
| timestamps.tl | ISO 8601 variants, timezones, milliseconds |
| numbers_extended.tl | Hex, binary, scientific notation, i64 limits |
| unions.tl | @union, empty/multi-field variants |
| multiline_strings.tl | Triple-quoted strings, auto-dedent, code blocks |
| unicode_escaping.tl | CJK, Cyrillic, Arabic, emoji, ZWJ sequences |
| refs_tags_maps.tl | References, tagged values, maps, compositions |
| mixed_schemas.tl | Schema-bound and schemaless data together |
| large_data.tl | Stress tests: 100+ element arrays, deep nesting, long strings |
| cyclic_refs.tl | Reference cycles, forward references, self-references |

Error tests in canonical/errors/ validate that invalid input produces specific, stable error messages across all interfaces (CLI, FFI, .NET).

Change Process

Adding New Behavior

When adding new syntax, types, or features:

  1. Design – Describe the change in an issue or PR description
  2. Implement – Modify the parser/encoder/decoder in tealeaf-core
  3. Add canonical tests – Create or extend a sample in canonical/samples/, generate expected JSON and binary fixtures
  4. Update the prose spec – Update spec/TEALEAF_SPEC.md to document the new behavior
  5. CI validates – All three round-trip paths must pass

A PR that adds implementation without canonical tests is incomplete. A PR that updates the prose spec without tests is documentation-only and does not change behavior.

Modifying Existing Behavior

Behavior changes fall into two categories:

Non-breaking (output changes, error message improvements):

  • Update canonical expected outputs (canonical/expected/*.json)
  • Update error golden tests if error messages changed
  • Update the prose spec

Breaking (syntax changes, binary format changes, type system changes):

  • Requires a version bump in release.json
  • Regenerate all binary fixtures (canonical/binary/*.tlbx)
  • Update the prose spec with a clear note about the breaking change
  • Binary format changes must update the format version constant in writer.rs

Error Message Stability

Error messages are part of the public contract. The canonical/errors/ directory contains invalid input files paired with expected error messages in expected_errors.json. Changes to error text should be noted in the changelog and may require downstream consumers to update.

What Is Not Covered

The canonical suite focuses on the core format. These areas rely on their own test suites:

| Area | Test Location | Notes |
|---|---|---|
| CLI flags and output formatting | tealeaf-core/tests/cli_integration.rs | Tests CLI behavior, not format correctness |
| Derive macros (Rust) | tealeaf-core/tests/derive.rs | Tests DTO conversion, not parsing |
| FFI memory management | tealeaf-ffi unit tests | Tests allocation/deallocation, not format |
| .NET source generator | TeaLeaf.Generators.Tests | Tests code generation, not format |
| .NET serialization | TeaLeaf.Tests | Tests managed-to-native bridge |
| Accuracy benchmark | accuracy-benchmark | Tests LLM accuracy, not format |

Spec Versioning

The format version is embedded in the binary header (see writer.rs). The prose spec documents the current version. When the binary format changes in a backward-incompatible way:

  1. The format version constant in writer.rs must be incremented
  2. The reader (reader.rs) should handle both old and new versions where feasible
  3. All binary fixtures in canonical/binary/ must be regenerated
  4. The prose spec must document the version change

The project version (release.json) and the binary format version are independent. A project version bump does not necessarily mean a format version bump, and vice versa.

Changelog

v2.0.0-beta.8 (Current)

.NET

  • XML documentation in NuGet packages — the TeaLeaf and TeaLeaf.Annotations packages now include XML doc files (TeaLeaf.xml, TeaLeaf.Annotations.xml) for all target frameworks. Consumers get IntelliSense tooltips for all public APIs. Previously, GenerateDocumentationFile was not enabled and the .xml files were absent from the .nupkg.
  • Added XML doc comments to all undocumented public members: TLType enum values (13), TLDocument.ToString/Dispose, TLReader.Dispose, TLField.ToString, TLSchema.ToString, TLException constructors (3)
  • Enabled TreatWarningsAsErrors for TeaLeaf and TeaLeaf.Annotations — missing XML docs or other warnings are now compile errors, preventing regressions

Testing

  • Added ToJson_PreservesSpecialCharacters_NoUnicodeEscaping — verifies +, <, >, ' survive binary round-trip without Unicode escaping in both ToJson() and ToJsonCompact() paths
  • Added ToJson_PreservesFloatDecimalPoint_WholeNumbers — verifies whole-number floats (99.0, 150.0, 0.0) retain .0 suffix and non-whole floats (4.5, 3.75) preserve decimal digits

v2.0.0-beta.7

.NET

  • Fixed TLReader.ToJson() escaping non-ASCII-safe characters — + in phone numbers rendered as \u002B, </> as \u003C/\u003E, etc. System.Text.Json’s default JavaScriptEncoder.Default HTML-encodes these characters for XSS safety, which is inappropriate for a data serialization library. All three JSON serialization methods (ToJson, ToJsonCompact, GetAsJson) now use JavaScriptEncoder.UnsafeRelaxedJsonEscaping via shared static readonly options.
  • Fixed TLReader.ToJson() dropping .0 suffix from whole-number floats — 3582.0 in source JSON became 3582 after binary round-trip because System.Text.Json’s JsonValue.Create(double) strips trailing .0. Added FloatToJsonNode helper that uses F1 formatting for whole-number doubles, preserving formatting fidelity with the Rust CLI path.

v2.0.0-beta.6

Features

  • Recursive array schema inference in JSON import — from_json_with_schemas now discovers schemas for arrays nested inside objects at arbitrary depth (e.g., items[].product.stock[]). Previously, analyze_nested_objects only recursed into nested objects but not nested arrays, causing deeply nested arrays to fall back to []any. The CLI and derive-macro paths now produce equivalent schema coverage.
  • Deterministic schema declaration order — analyze_array and analyze_nested_objects now use single-pass field-order traversal (depth-first), matching the derive macro’s field-declaration-order strategy. Previously, both functions made two separate passes (arrays first, then objects), causing schema declarations to appear in a different order than the derive/Builder API path. CLI and Builder API now produce byte-identical .tl output for the same data.

Bug Fixes

  • Fixed binary encoding corruption for []any typed arrays — encode_typed_value incorrectly wrote TLType::Struct as the element type for the “any” pseudo-type (the to_tl_type() default for unknown names), causing the reader to interpret heterogeneous data as struct schema indices. Arrays with mixed element types inside schema-typed objects (e.g., order.customer, order.payment) now correctly use heterogeneous 0xFF encoding when no matching schema exists.

Tooling

  • Version sync scripts (sync-version.ps1, sync-version.sh) now regenerate the workflow diagram (assets/tealeaf_workflow.png) via generate_workflow_diagram.py on each version bump

Testing

  • Added json_any_array_binary_roundtrip — focused regression test verifying []any fields inside schema-typed structs survive binary compilation with full data integrity verification
  • Added retail_orders_json_binary_roundtrip — end-to-end test exercising JSON → infer schemas → compile → binary read with retail_orders.json (the exact path that was untested)
  • Added .NET FromJson_HeterogeneousArrayInStruct_BinaryRoundTrips — mirrors the Rust []any regression test through the FFI layer
  • Strengthened .NET FromJson_RetailOrdersFixture_CompileRoundTrips — upgraded from string-contains check to structural JSON verification (10 orders, 4 products, 3 customers, spot-check order ID and item count)
  • Added json_inference_nested_array_inside_object — verifies arrays nested inside objects (e.g., items[].product.stock[]) get their own schema and typed array fields
  • Added gen_retail_orders_api_tl derive integration test — generates .tl from Rust DTOs via Builder API and confirms byte-identical output with CLI path
  • Added examples/retail_orders_different_shape_cli.tl and retail_orders_different_shape_api.tl comparison fixtures (2,395 bytes each, zero diff)
  • Moved retail_orders_different_shape.rs from examples/ to tealeaf-core/tests/fixtures/ to keep test dependencies within the crate boundary
  • Verified all 7 fuzz targets pass (~566K total runs, zero crashes)

v2.0.0-beta.5

Features

  • Schema-aware serialization for Builder API — to_tl_with_schemas() now produces compact @table output for documents built via TeaLeafBuilder with derive-macro schemas. Previously, PascalCase schema names from #[derive(ToTeaLeaf)] (e.g., SalesOrder) didn’t match the serializer’s singularize() heuristic (e.g., "orders" → "order"), causing all arrays to fall back to verbose [{k: v}] format. The serializer now resolves schemas via a 4-step chain: declared type from parent schema → singularize → case-insensitive singularize → structural field matching.

Bug Fixes

  • Fixed schema inference name collision when a field singularizes to the same name as its parent array’s schema — prevented self-referencing schemas (e.g., @struct root (root: root)) and data loss during round-trip (found via fuzzing)
  • Fixed @table serializer applying wrong schema when the same field name appears at multiple nesting levels with different object shapes — serializer now validates schema fields match the actual object keys before using positional tuple encoding

Testing

  • Added 8 Rust regression tests for schema name collisions: fuzz_repro_dots_in_field_name, schema_name_collision_field_matches_parent, analyze_node_nesting_stress_test, schema_collision_recursive_arrays, schema_collision_recursive_same_shape, schema_collision_three_level_nesting, schema_collision_three_level_divergent_leaves, all_orders_cli_vs_api_roundtrip
  • Added derive integration test test_builder_schema_aware_table_output — verifies Builder API with 5 nested PascalCase schemas produces @table encoding and round-trips correctly
  • Verified all 7 fuzz targets pass (~445K total runs, zero crashes)

v2.0.0-beta.4

Bug Fixes

  • Fixed binary encoding crash when compiling JSON with heterogeneous nested objects — from_json_with_schemas infers any pseudo-type for fields whose nested objects have varying shapes; the binary encoder now falls back to generic encoding instead of erroring with “schema-typed field ‘any’ requires a schema”
  • Fixed parser failing to resolve schema names that shadow built-in type keywords — schemas named bool, int, string, etc. now correctly resolve via LParen lookahead disambiguation (struct tuples always start with (, primitives never do)
  • Fixed singularize() producing empty string for single-character field names (e.g., "s" → "") — caused @struct definitions with missing names and unparseable TL text output
  • Fixed validate_tokens.py token comparison by converting API input to int for safety

.NET

  • Added TLValueExtensions with GetRequired() extension methods for TLValue and TLDocument — provides non-nullable access patterns, reducing CS8602 warnings in consuming code
  • Added TL007 diagnostic: [TeaLeaf] classes in the global namespace now produce a compile-time error (“TeaLeaf type must be in a named namespace”)
  • Removed SuppressDependenciesWhenPacking property from TeaLeaf.Generators.csproj
  • Exposed InternalsVisibleTo for TeaLeaf.Tests

CI/CD

  • Re-enabled all 6 GitHub Actions workflows after making the repository public (rust-cli, dotnet-package, accuracy-benchmark, docs, coverage, fuzz)
  • Fixed coverlet filter quoting in coverage workflow — commas URL-encoded as %2c to prevent shell argument splitting
  • Fixed Codecov token handling — made CODECOV_TOKEN optional for public repo tokenless uploads
  • Fixed Codecov multi-file upload format — changed from YAML block scalar to comma-separated single-line
  • Refactored coverage workflow to use dotnet-coverage with dedicated settings XML files
  • Added CodeQL security analysis workflow
  • Fixed accuracy-benchmark workflow permissions

Testing

  • Added Rust regression test for any pseudo-type compile round-trip
  • Added 21 Rust tests for schema names shadowing all built-in type keywords (bool, int, int8..int64, uint..uint64, float, float32, float64, string, timestamp, bytes) — covers JSON inference round-trip, direct TL parsing, self-referencing schemas, duplicate declarations, and multiple built-in-named schemas in one document
  • Added 4 .NET regression tests covering TLDocument.FromJsonCompile with heterogeneous nested objects, mixed-structure arrays, complex schema inference, and retail_orders.json end-to-end
  • Added .NET tests for JSON serialization of timestamps and byte arrays
  • Added .NET coverage tests for multi-word enums and nullable nested objects
  • Added .NET source generator tests (524 new lines in GeneratorTests.cs) including TL007 global namespace diagnostic
  • Added .NET TLValue.GetRequired() extension method tests
  • Added .NET TLReader binary reader tests (168 new lines)
  • Added cross-platform FindRepoFile helper for .NET test fixture discovery (walks up directory tree instead of hardcoded relative path depth)
  • Verified full .NET test suite on Linux (WSL Ubuntu 24.04)

Tooling

  • Added --version / -V CLI flag
  • Added delete-caches.ps1 and delete-caches.sh GitHub Actions cache cleanup scripts
  • Updated coverage.ps1 to support dotnet-coverage collection with XML settings files

Documentation

  • Updated binary deserialization method names in quick-start, LLM context guide, schema evolution guide, and derive macros docs
  • Updated tealeaf workflow diagram

v2.0.0-beta.3

Features

  • Byte literals — b"..." hex syntax for byte data in text format (e.g., payload: b"cafef00d")
  • Arbitrary-precision numbers — Value::JsonNumber preserves exact decimal representation for numbers exceeding native type ranges
  • Insertion order preservation — IndexMap replaces HashMap for all user-facing containers; JSON round-trips now preserve original key order (ADR-0001)
  • Timestamp timezone support — Timestamps encode timezone offset in minutes (10 bytes: 8 millis + 2 offset); supports Z, +HH:MM, -HH:MM, +HH formats
  • Special float values — NaN, inf, -inf keywords for IEEE 754 special values (JSON export converts to null)
  • Extended escape sequences — \b (backspace), \f (form feed), \uXXXX (Unicode code points) for full JSON string escape parity
  • Forward compatibility — Unknown directives silently ignored, enabling older implementations to partially parse files with newer features (spec §1.18)

Bug Fixes

  • Fixed bounds check failures and bitmap overflow issues in binary decoder
  • Fixed lexer infinite loop on certain malformed inputs (found via fuzzing)
  • Fixed NaN value quoting causing incorrect round-trip behavior
  • Fixed parser crashes on deeply nested structures
  • Fixed integer overflow in varint decoding
  • Fixed off-by-one errors in array length checks
  • Fixed negative hex/binary literal parsing
  • Fixed exponent-only numbers (e.g., 1e3) to parse as floats, not integers
  • Fixed timestamp timezone parsing to accept hour-only offsets (+05 = +05:00)
  • Rejected value-only types (object, map, tuple, ref, tagged) as schema field types per spec §2.1
  • Fixed .NET package publishing for TeaLeaf.Annotations and TeaLeaf.Generators to NuGet

Performance

  • Removed O(n log n) key sorting from all serialization paths: 6-17% faster for small/medium objects, up to 69% faster for tabular data
  • Binary decode is 56-105% slower for generic object workloads due to IndexMap insertion cost (an acceptable trade-off per ADR-0001; columnar workloads are less affected)

Specification

  • Schema table header byte +6 now stores the Union Count (previously reserved)
  • String table length encoding widened from u16 to u32 to support strings longer than 65,535 bytes
  • Added type code 0x12 for JSONNUMBER
  • Timestamp encoding extended to 10 bytes (8 millis + 2 offset); see the decoding sketch after this list
  • Added bytes_lit grammar production; extended number to include NaN/inf/-inf
  • Documented object, map, ref, tagged as value-only types (not valid in schema fields)
  • Resolved compression algorithm spec contradiction: binary format v2 uses ZLIB (deflate), not zstd (ADR-0004)
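
A sketch of reading the extended timestamp payload. The 8 + 2 byte split and the minute-granularity offset come from the entries above; the byte order is an assumption, and the spec text remains authoritative.

// Decodes the 10-byte timestamp: 8 bytes of epoch milliseconds followed
// by 2 bytes of timezone offset in minutes. Little-endian is assumed here.
fn decode_timestamp(buf: &[u8; 10]) -> (i64, i16) {
    let millis = i64::from_le_bytes(buf[0..8].try_into().unwrap());
    let offset_minutes = i16::from_le_bytes(buf[8..10].try_into().unwrap());
    (millis, offset_minutes)
}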

Tooling

  • Fuzzing infrastructure — 7 cargo-fuzz targets with custom dictionaries and structure-aware generation (ADR-0002)
  • Fuzzing CI workflow — GitHub Actions runs all targets for 120s each (~15 min per run)
  • Nesting depth limit — 256-level max for stack overflow protection (ADR-0003)
  • VS Code extension — Syntax highlighting for .tl files (vscode-tealeaf/)
  • FFI safety — Comprehensive # Safety docs on all FFI functions; regenerated tealeaf.h
  • Token validation — validate_tokens.py script validates API-reported token counts against tiktoken
  • Maintenance scripts — delete-deployments and delete-workflow-runs for GitHub cleanup

Testing

  • 238+ adversarial tests for malformed binary input
  • 333+ .NET edge case tests for FFI boundary conditions
  • Property-based tests with depth-bounded recursive generation
  • Accuracy benchmark token savings updated to ~36% fewer data tokens (validated with tiktoken)

Documentation

  • ADR-0001: IndexMap for Insertion Order Preservation
  • ADR-0002: Fuzzing Architecture and Strategy
  • ADR-0003: Maximum Nesting Depth Limit (256)
  • ADR-0004: ZLIB Compression for Binary Format
  • Code of Conduct, SECURITY.md, GitHub issue/PR templates
  • examples/showcase.tl — 736-line comprehensive format demonstration
  • Sample accuracy benchmark results

Breaking Changes

  • Value::Object uses IndexMap<String, Value> instead of HashMap (type alias ObjectMap provided; From<HashMap> retained for backward compatibility)
  • Value::Timestamp(i64) → Value::Timestamp(i64, i16) — the second field is the timezone offset in minutes
  • Value::JsonNumber(String) variant added — match expressions on Value need a new arm (see the match sketch after this list)
  • Binary timestamps not backward-compatible (beta.2 readers cannot decode beta.3 timestamps; beta.3 readers handle beta.2 files by defaulting offset to UTC)
  • JSON round-trips preserve key order instead of alphabetizing
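
A sketch of the new match arms implied by this list. The variant shapes are exactly as stated above; the crate path and the helper function are illustrative assumptions.

// Timestamp now carries an i16 offset in minutes, and JsonNumber carries
// the exact decimal text of an out-of-range number.
use tealeaf::Value; // crate path is an assumption

fn describe(value: &Value) -> String {
    match value {
        Value::Timestamp(millis, offset_minutes) => {
            format!("{millis} ms since epoch, offset {offset_minutes} min")
        }
        // New in beta.3: exhaustive matches need this arm.
        Value::JsonNumber(text) => format!("exact number {text}"),
        _ => "other".to_string(),
    }
}

Code that already ends in a wildcard arm compiles unchanged; only exhaustive matches need the new JsonNumber arm.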

v2.0.0-beta.2

Format

  • @union definitions now encoded in binary schema table (full text-binary-text roundtrip)
  • Union schema region uses backward-compatible extension of schema table header
  • Derive macro collect_unions() generates union definitions for Rust enums (sketched after this list)
  • TeaLeafBuilder::add_union() for programmatic union construction
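
A sketch of the enum side of this feature. The derive names come from the beta.1 notes below; the import path, the enum, and its variants are assumptions.

// Deriving on an enum lets the generated collect_unions() emit a matching
// @union definition when schemas are gathered, per the bullets above.
use tealeaf_derive::{FromTeaLeaf, ToTeaLeaf}; // assumed import path

#[derive(ToTeaLeaf, FromTeaLeaf)]
enum PaymentMethod {
    Card,
    Invoice,
    Wire,
}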

Improvements

  • Version sync automation expanded to cover all project files (16 targets)
  • NuGet package icon added to all NuGet packages (TeaLeaf, Annotations, Generators)
  • CI badges added to README (Rust CI, .NET CI, crates.io, NuGet, codecov, License)
  • crates.io publish ordering fixed (tealeaf-derive before tealeaf-core)
  • Contributing guide added (CONTRIBUTING.md)
  • Spec governance documentation added
  • Accuracy benchmark dump-prompts subcommand for offline prompt inspection
  • TeaLeaf.Annotations published as separate NuGet package (fixes dependency resolution)
  • benches_proto/ excluded from crates.io package (removes protoc requirement for consumers)

v2.0.0-beta.1

Initial public beta release.

Format

  • Text format (.tl) with comments, schemas, and all value types
  • Binary format (.tlbx) with string deduplication, schema embedding, and per-section compression
  • 15 primitive types + 6 container/semantic types
  • Inline schemas with @struct, @table, @map, @union
  • References (!name) and tagged values (:tag value); see the sketch after this list
  • File includes (@include)
  • ISO 8601 timestamp support
  • JSON bidirectional conversion with schema inference
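
A hedged sketch of the reference and tag markers. Only the !name and :tag value shapes come from the list above; the keys, values, and resolution semantics shown are invented for illustration.

fn main() {
    // Hypothetical .tl source demonstrating the two markers.
    let src = r#"
region: "us-west-2"
primary: !region            # reference to the named value above
status: :enum "active"      # tagged value
"#;
    println!("{src}");
}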

CLI

  • 8 commands: compile, decompile, info, validate, to-json, from-json, tlbx-to-json, json-to-tlbx
  • Pre-built binaries for 7 platforms (Windows, Linux, macOS – x64 and ARM64)

Rust

  • tealeaf-core crate with full parser, compiler, and reader
  • tealeaf-derive crate with #[derive(ToTeaLeaf, FromTeaLeaf)]
  • Builder API (TeaLeafBuilder)
  • Memory-mapped binary reading
  • Conversion traits with automatic schema collection

.NET

  • TeaLeaf NuGet package with native libraries for all platforms
  • C# incremental source generator ([TeaLeaf] attribute)
  • Reflection-based serializer (TeaLeafSerializer)
  • Managed wrappers (TLDocument, TLValue, TLReader)
  • Schema introspection API
  • Diagnostic codes TL001-TL006

FFI

  • C-compatible API via tealeaf-ffi crate
  • 45+ exported functions
  • Thread-safe error handling
  • Null-safe for all pointer parameters
  • C header generation via cbindgen

Known Limitations

  • Bytes type does not round-trip through text format (resolved in beta.3: b"..." hex literals added)
  • JSON import does not recognize $ref, $tag, or timestamp strings
  • Individual string length limited to ~4 GB (u32) in binary format
  • 64-byte header overhead makes TeaLeaf inefficient for very small objects