TeaLeaf Data Format
A schema-aware data format with human-readable text and compact binary representation.
~36% fewer data tokens than JSON for LLM applications, with zero accuracy loss.
v2.0.0-beta.8
What is TeaLeaf?
TeaLeaf is a data format that bridges the gap between human-readable configuration and machine-efficient binary storage. A single .tl source file can be read and edited by humans, compiled to a compact .tlbx binary, and converted to/from JSON – all with schemas inline.
TeaLeaf – schemas with nested structures, compact positional data:
# Schema: define structure once
@struct Location (city, country)
@struct Department (name, location: Location)
@struct Employee (
id: int,
name,
role,
department: Department,
skills: []string,
)
# Data: field names not repeated
employees: @table Employee [
(1, "Alice", "Engineer",
("Platform", ("Seattle", "USA")),
["rust", "python"])
(2, "Bob", "Designer",
("Product", ("Austin", "USA")),
["figma", "css"])
(3, "Carol", "Manager",
("Platform", ("Seattle", "USA")),
["leadership", "agile"])
]
JSON – no schema, names repeated:
{
"employees": [
{
"id": 1,
"name": "Alice",
"role": "Engineer",
"department": {
"name": "Platform",
"location": { "city": "Seattle", "country": "USA" }
},
"skills": ["rust", "python"]
},
{
"id": 2,
"name": "Bob",
"role": "Designer",
"department": {
"name": "Product",
"location": { "city": "Austin", "country": "USA" }
},
"skills": ["figma", "css"]
},
{
"id": 3,
"name": "Carol",
"role": "Manager",
"department": {
"name": "Platform",
"location": { "city": "Seattle", "country": "USA" }
},
"skills": ["leadership", "agile"]
}
]
}
Key Features
| Feature | Description |
|---|---|
| Dual format | Human-readable text (.tl) and compact binary (.tlbx) |
| Inline schemas | @struct definitions live alongside data – no external .proto files |
| JSON interop | Bidirectional conversion with automatic schema inference |
| String deduplication | Binary format stores each unique string once |
| Compression | Per-section ZLIB compression with null bitmaps |
| Comments | # line comments in the text format |
| Language bindings | Native Rust, .NET (via FFI + source generator) |
| CLI tooling | tealeaf compile, decompile, validate, info, JSON conversion |
Why TeaLeaf?
The existing data format landscape presents trade-offs that TeaLeaf attempts to bridge. TeaLeaf does not try to replace the formats listed below; it occupies a different point in the trade-off space that users can compare against their own use cases.
| Format | Observation |
|---|---|
| JSON | Verbose, no comments, no schema |
| YAML | Indentation-sensitive, error-prone at scale |
| Protobuf | Schema external, binary-only, requires codegen |
| Avro | Schema embedded but not human-readable |
| CSV/TSV | Too simple for nested or typed data |
| MessagePack/CBOR | Compact but schemaless |
TeaLeaf unifies these concerns:
- Human-readable text format with explicit types and comments
- Compact binary with embedded schemas – no external schema files needed
- Schema-first design – field names defined once, not repeated per record
- No codegen required – schemas discovered at runtime
- Built-in JSON conversion for easy integration with existing tools
Primary Use Case: LLM API Data Payloads
TeaLeaf is well-suited for assembling and managing context for large language models – sending business data, analytics, and structured payloads to LLM APIs where token efficiency directly impacts API costs.
Why TeaLeaf for LLM context:
- ~36% fewer data tokens — verified across Claude Sonnet 4.5 and GPT-5.2 (12 tasks, 10 domains; savings increase with larger datasets)
- Zero accuracy loss — benchmark scores within noise (0.988 vs 0.978 Anthropic, 0.901 vs 0.899 OpenAI)
- Binary format for fast cached context retrieval
- String deduplication (roles, field names, common values stored once)
- Human-readable text for prompt authoring
Token savings example (retail orders dataset):
| Format | Characters | Tokens (GPT-5.x) | Savings |
|---|---|---|---|
| JSON | 36,791 | 9,829 | — |
| TeaLeaf | 14,542 | 5,632 | 43% fewer tokens |
Size Comparison
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Text | 1.38x | 0.87x | 0.63x |
| TeaLeaf Compressed | 3.56x | 0.15x | 0.47x |
TeaLeaf has 64-byte header overhead (not ideal for tiny objects). For large arrays with compression, TeaLeaf achieves 6-7x better compression than JSON.
Trade-off: TeaLeaf decode is ~2-5x slower than Protobuf due to dynamic key-based access. Choose TeaLeaf when size matters more than decode speed.
Project Structure
tealeaf/
├── tealeaf-core/ # Rust core: parser, compiler, reader, CLI
├── tealeaf-derive/ # Rust proc-macro: #[derive(ToTeaLeaf, FromTeaLeaf)]
├── tealeaf-ffi/ # C-compatible FFI layer
├── bindings/
│ └── dotnet/ # .NET bindings + source generator
├── canonical/ # Canonical test fixtures
├── spec/ # Format specification
└── examples/ # Example files and workflows
Quick Links
- Getting Started: Installation | Quick Start | Concepts
- Format: Text Format | Type System | Binary Format
- CLI: Command Reference
- Rust: Overview | Derive Macros
- .NET: Overview | Source Generator
- FFI: API Reference
- Guides: LLM Context | Performance
License
TeaLeaf is licensed under the MIT License.
Source code: github.com/krishjag/tealeaf
Installation
Pre-built Binaries
Download the latest release from GitHub Releases.
| Platform | Architecture | Download |
|---|---|---|
| Windows | x64 | tealeaf-windows-x64.zip |
| Windows | ARM64 | tealeaf-windows-arm64.zip |
| Linux | x64 (glibc) | tealeaf-linux-x64.tar.gz |
| Linux | ARM64 (glibc) | tealeaf-linux-arm64.tar.gz |
| Linux | x64 (musl) | tealeaf-linux-musl-x64.tar.gz |
| macOS | x64 (Intel) | tealeaf-macos-x64.tar.gz |
| macOS | ARM64 (Apple Silicon) | tealeaf-macos-arm64.tar.gz |
Quick Install
Windows (PowerShell)
# Download and extract to current directory
Invoke-WebRequest -Uri "https://github.com/krishjag/tealeaf/releases/latest/download/tealeaf-windows-x64.zip" -OutFile tealeaf.zip
Expand-Archive tealeaf.zip -DestinationPath .
# Optional: add to PATH
$env:PATH += ";$PWD"
Linux / macOS
# Download and extract (replace with your platform)
curl -LO https://github.com/krishjag/tealeaf/releases/latest/download/tealeaf-linux-x64.tar.gz
tar -xzf tealeaf-linux-x64.tar.gz
# Optional: move to PATH
sudo mv tealeaf /usr/local/bin/
Build from Source
Requires the Rust toolchain (1.70+).
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
cargo build --release --package tealeaf-core
The binary will be at target/release/tealeaf (or tealeaf.exe on Windows).
Verify Installation
tealeaf --version
# tealeaf 2.0.0-beta.8
tealeaf help
Rust Crate
Add tealeaf-core to your Cargo.toml:
[dependencies]
tealeaf-core = { version = "2.0.0-beta.8", features = ["derive"] }
The derive feature enables #[derive(ToTeaLeaf, FromTeaLeaf)] macros.
.NET NuGet Package
dotnet add package TeaLeaf
The NuGet package includes everything needed:
- TeaLeaf.Annotations – [TeaLeaf], [TLSkip], and other attributes
- TeaLeaf.Generators – C# incremental source generator (bundled as an analyzer)
- Native libraries for all supported platforms (Windows, Linux, macOS – x64 and ARM64)
No additional packages required. [TeaLeaf] classes get compile-time serialization methods automatically.
Note: The .NET package requires .NET 8.0 or later. The source generator requires a C# compiler with incremental generator support.
Quick Start
This guide walks through the core TeaLeaf workflow: write text, compile to binary, and convert to/from JSON.
1. Write a TeaLeaf File
Create example.tl:
# Define schemas
@struct address (street: string, city: string, zip: string)
@struct user (
id: int,
name: string,
email: string?,
address: address,
active: bool,
)
# Data uses schemas -- field names defined once, not repeated
users: @table user [
(1, "Alice", "alice@example.com", ("123 Main St", "Seattle", "98101"), true),
(2, "Bob", ~, ("456 Oak Ave", "Austin", "78701"), false),
]
# Plain key-value pairs
app_version: "2.0.0-beta.2"
debug: false
2. Validate
Check that the file is syntactically correct:
tealeaf validate example.tl
3. Compile to Binary
Compile to the compact binary format:
tealeaf compile example.tl -o example.tlbx
4. Inspect
View information about either format:
tealeaf info example.tl
tealeaf info example.tlbx
5. Convert to JSON
# Text to JSON
tealeaf to-json example.tl -o example.json
# Binary to JSON
tealeaf tlbx-to-json example.tlbx -o example_from_binary.json
6. Convert from JSON
# JSON to TeaLeaf text (with automatic schema inference)
tealeaf from-json example.json -o reconstructed.tl
# JSON to TeaLeaf binary
tealeaf json-to-tlbx example.json -o direct.tlbx
7. Decompile
Convert binary back to text:
tealeaf decompile example.tlbx -o decompiled.tl
Complete Workflow
example.tl ──compile──> example.tlbx ──decompile──> decompiled.tl
│ │
├──to-json──> example.json <──tlbx-to-json──┘
│ │
└──from-json─────┘
Using the Rust API
use tealeaf::TeaLeaf;
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Parse text format
let doc = TeaLeaf::load("example.tl")?;
// Access values
if let Some(users) = doc.get("users") {
println!("Users: {:?}", users);
}
// Compile to binary
doc.compile("example.tlbx", true)?;
// Convert to JSON
let json = doc.to_json()?;
println!("{}", json);
Ok(())
}
With Derive Macros
use tealeaf::{TeaLeaf, ToTeaLeaf, FromTeaLeaf, ToTeaLeafExt};
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
id: i32,
name: String,
#[tealeaf(optional)]
email: Option<String>,
active: bool,
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let user = User {
id: 1,
name: "Alice".into(),
email: Some("alice@example.com".into()),
active: true,
};
// Serialize to TeaLeaf text
let text = user.to_tl_string("user");
println!("{}", text);
// Compile directly to binary
user.to_tlbx("user", "user.tlbx", false)?;
// Deserialize from binary
let reader = tealeaf::Reader::open("user.tlbx")?;
let loaded = User::from_tealeaf_value(&reader.get("user")?)?;
Ok(())
}
Using the .NET API
Source Generator (Compile-Time)
using TeaLeaf;
using TeaLeaf.Annotations;
[TeaLeaf]
public partial class User
{
public int Id { get; set; }
public string Name { get; set; } = "";
[TLOptional]
public string? Email { get; set; }
public bool Active { get; set; }
}
// Serialize
var user = new User { Id = 1, Name = "Alice", Active = true };
string text = user.ToTeaLeafText();
string json = user.ToTeaLeafJson();
user.CompileToTeaLeaf("user.tlbx");
// Deserialize
using var doc = TLDocument.ParseFile("user.tlbx");
var loaded = User.FromTeaLeaf(doc);
Reflection Serializer (Runtime)
using TeaLeaf;
var user = new User { Id = 1, Name = "Alice", Active = true };
// Serialize
string docText = TeaLeafSerializer.ToDocument(user);
TeaLeafSerializer.Compile(user, "user.tlbx");
// Deserialize
var loaded = TeaLeafSerializer.FromText<User>(docText);
Next Steps
- Core Concepts – understand schemas, types, and the text format
- CLI Reference – all available commands
- Rust Guide – Rust API in depth
- .NET Guide – .NET bindings in depth
Core Concepts
This page introduces the fundamental ideas behind TeaLeaf.
Dual Format
TeaLeaf has two representations of the same data:
| | Text (.tl) | Binary (.tlbx) |
|---|---|---|
| Purpose | Authoring, version control, review | Storage, transmission, deployment |
| Human-readable | Yes | No |
| Comments | Yes (#) | Stripped during compilation |
| Schemas | Inline @struct definitions | Embedded in schema table |
| Size | Larger (field names in data) | Compact (positional, deduplicated) |
| Speed | Slower to parse | Fast random-access via memory mapping |
The .tl file is the source of truth. Binary files are compiled artifacts – regenerate them when the source changes.
Schemas
Schemas define the structure of your data using @struct:
@struct point (x: int, y: int)
@struct line (start: point, end: point, color: string?)
Key properties:
- Inline – schemas live in the same file as data
- Positional – binary encoding uses field order, not names
- Nestable – structs can reference other structs
- Nullable – fields marked with ? accept null (~)
Schemas enable @table for compact tabular data:
points: @table point [
(0, 0),
(100, 200),
(-50, 75),
]
Without schemas, the same data would require repeating field names:
# Without schemas -- verbose
points: [
{x: 0, y: 0},
{x: 100, y: 200},
{x: -50, y: 75},
]
Type System
TeaLeaf has a rich type system with primitives, containers, and modifiers.
Primitives
| Type | Description | Example |
|---|---|---|
| bool | Boolean | true, false |
| int / int32 | 32-bit signed integer | 42, -17 |
| int64 | 64-bit signed integer | 9999999999 |
| uint / uint32 | 32-bit unsigned integer | 255 |
| float / float64 | 64-bit float | 3.14, 6.022e23 |
| string | UTF-8 text | "hello", alice |
| bytes | Raw binary data | b"cafef00d" |
| timestamp | ISO 8601 date/time | 2024-01-15T10:30:00Z |
Containers
| Syntax | Description |
|---|---|
| []T | Array of type T |
| T? | Nullable type T |
| @map { ... } | Ordered key-value map |
| { key: value } | Untyped object |
Null
The tilde ~ represents null:
optional_field: ~
Key-Value Documents
A TeaLeaf document is a collection of named key-value sections:
# Each top-level entry is a "section" in the binary format
config: {host: localhost, port: 8080}
users: @table user [(1, alice), (2, bob)]
version: "2.0.0-beta.2"
Keys become section names in the binary file. You access values by key at runtime.
References
References allow data reuse and graph structures:
# Define a reference
!seattle: {city: "Seattle", state: "WA"}
# Use it in multiple places
office: !seattle
warehouse: !seattle
Tagged Values
Tags add a discriminator label to values, enabling sum types:
events: [
:click {x: 100, y: 200},
:scroll {delta: -50},
:keypress {key: "Enter"},
]
Unions
Named discriminated unions:
@union shape {
circle (radius: float),
rectangle (width: float, height: float),
point (),
}
shapes: [:circle (5.0), :rectangle (10.0, 20.0), :point ()]
Union definitions are preserved through binary compilation and decompilation, including variant names, field names, and field types.
Compilation Pipeline
.tl (text)
│
├── parse ──> in-memory document (TeaLeaf / TLDocument)
│ │
│ ├── compile ──> .tlbx (binary)
│ ├── to_json ──> .json
│ └── to_tl_text ──> .tl (round-trip)
│
.tlbx (binary)
│
├── reader ──> random-access values (zero-copy with mmap)
│ │
│ ├── decompile ──> .tl
│ └── to_json ──> .json
│
.json
│
└── from_json ──> in-memory document
│
└── (with schema inference for arrays)
File Includes
Split large files into modules:
@include "schemas/common.tl"
@include "./data/users.tl"
Paths are resolved relative to the including file.
Next Steps
- Text Format – complete syntax reference
- Type System – all types and modifiers in detail
- Schemas – schema definitions, tables, and nesting
Text Format
The TeaLeaf text format (.tl) is the human-readable representation. This page is the complete syntax reference.
Comments
Comments begin with # and extend to end of line:
# This is a line comment
name: alice # inline comment
Comments are stripped during compilation to binary.
Strings
Simple (Unquoted)
Bare identifiers that contain no whitespace or special characters:
name: alice
host: localhost
status: active
Valid characters: letters, digits, _, -, .
Quoted
Double-quoted strings with escape sequences:
greeting: "hello world"
path: "C:\\Users\\name"
message: "line1\nline2"
tab_separated: "col1\tcol2"
Escape sequences: \\, \", \n, \t, \r, \b (backspace), \f (form feed), \uXXXX (Unicode code point, 4 hex digits)
Multiline (Triple-Quoted)
Triple-quoted strings with automatic leading whitespace removal:
description: """
This is a multiline string.
Leading whitespace is trimmed based on
the indentation of the first content line.
Useful for documentation blocks.
"""
Numbers
Integers
count: 42
negative: -17
zero: 0
Floats
price: 3.14
scientific: 6.022e23
negative_exp: 1.5e-10
Numbers with exponent notation but no decimal point (e.g., 1e3) are parsed as floats.
Hexadecimal
color: 0xFF5500
mask: 0x00A1
Binary Literals
flags: 0b1010
byte_val: 0b11110000
Both lowercase (0x, 0b) and uppercase (0X, 0B) prefixes are accepted.
Negative hex and binary literals are supported: -0xFF, -0b1010.
Bytes Literals
payload: b"cafef00d"
empty: b""
checksum: b"CAFE"
Hex digits only (uppercase or lowercase), even length, no spaces.
Special Float Values
not_a_number: NaN
positive_infinity: inf
negative_infinity: -inf
These keywords represent IEEE 754 special values. In JSON export, NaN and infinity values are converted to null.
Boolean and Null
enabled: true
disabled: false
missing: ~
The tilde (~) is the null literal.
Timestamps
ISO 8601 formatted date/time values:
# Date only
created: 2024-01-15
# Date and time (UTC)
updated: 2024-01-15T10:30:00Z
# With milliseconds
precise: 2024-01-15T10:30:00.123Z
# With timezone offset
local: 2024-01-15T10:30:00+05:30
Format: YYYY-MM-DD[THH:MM[:SS[.sss]][Z|+HH:MM|-HH:MM]]
Seconds (:SS) are optional and default to 00. Timestamps are stored internally as Unix milliseconds (i64).
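For illustration, here is a sketch of mapping an ISO 8601 value onto that (unix_millis, tz_offset_minutes) pair using the chrono crate (an assumed dependency; the parser's actual implementation is not shown here). Note that parse_from_rfc3339 handles only the full date-time-with-offset form; date-only values would need separate handling:

```rust
use chrono::DateTime; // assumed dependency: chrono = "0.4"

/// Sketch: convert an ISO 8601 value with an offset into the
/// (unix_millis, tz_offset_minutes) storage pair described above.
fn to_storage(ts: &str) -> Result<(i64, i16), chrono::ParseError> {
    let dt = DateTime::parse_from_rfc3339(ts)?;   // DateTime<FixedOffset>
    let millis = dt.timestamp_millis();           // UTC instant in milliseconds
    let offset_min = (dt.offset().local_minus_utc() / 60) as i16;
    Ok((millis, offset_min))
}

fn main() {
    // A +05:30 offset would be stored as 330 minutes
    println!("{:?}", to_storage("2024-01-15T10:30:00+05:30"));
}
```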
Objects
Curly-brace delimited key-value collections:
# Inline
point: {x: 10, y: 20}
# Multi-line
config: {
host: localhost,
port: 8080,
debug: false,
}
Trailing commas are allowed.
Arrays
Square-bracket delimited ordered collections:
numbers: [1, 2, 3, 4, 5]
mixed: [1, "hello", true, ~]
nested: [[1, 2], [3, 4]]
empty: []
Tuples
Parenthesized value lists. Outside of @table, tuples are parsed as plain arrays:
# This is an array [0, 0], NOT a struct
origin: (0, 0)
Inside a @table context, tuples are bound to the table’s schema:
@struct point (x: int, y: int)
points: @table point [
(0, 0), # bound to point schema
(100, 200),
]
Maps
Ordered key-value maps with the @map directive. Unlike objects, maps support non-string keys:
# String keys
headers: @map {
"Content-Type": "application/json",
"Accept": "*/*",
}
# Integer keys
status_codes: @map {
200: "OK",
404: "Not Found",
500: "Internal Server Error",
}
# Mixed value types
config: @map {
name: "myapp",
port: 8080,
debug: true,
}
Maps preserve insertion order and support heterogeneous key types.
References
Define named values and reuse them:
# Define a reference
!node_a: {label: "Start", value: 1}
!node_b: {label: "End", value: 2}
# Use references
edges: [
{from: !node_a, to: !node_b, weight: 1.0},
{from: !node_b, to: !node_a, weight: 0.5},
]
# References can be used multiple times
nodes: [!node_a, !node_b]
References can be defined at the top level or inside objects.
Tagged Values
A colon prefix adds a discriminator tag to any value:
events: [
:click {x: 100, y: 200},
:scroll {delta: -50},
:keypress {key: "Enter"},
]
Tags are useful for discriminated unions and variant types.
Unions
Named discriminated unions with @union:
@union shape {
circle (radius: float),
rectangle (width: float, height: float),
point (),
}
shapes: [
:circle (5.0),
:rectangle (10.0, 20.0),
:point (),
]
Union definitions are encoded in the binary schema table alongside struct definitions, preserving variant names, field names, and field types through compilation and decompilation.
Root Array
The @root-array directive marks the document as representing a top-level JSON array. This is primarily used for JSON round-trip fidelity.
When a root-level JSON array is imported via from-json, TeaLeaf stores each element as a numbered key (0, 1, 2, …) and emits @root-array so that to-json reconstructs the original array structure:
@root-array
0: {id: 1, name: alice}
1: {id: 2, name: bob}
2: {id: 3, name: carol}
Without @root-array, exporting to JSON would produce {"0": {...}, "1": {...}, ...}. With it, the output is [{...}, {...}, ...].
The directive takes no arguments and must appear before any data pairs.
Unknown Directives
Unknown directives (e.g., @custom) at the document top level are silently ignored. If a same-line argument follows the directive (e.g., @custom foo or @custom [1,2,3]), it is consumed and discarded. Arguments on the next line are not consumed — they are parsed as normal statements. This enables forward compatibility: files authored for a newer spec version can be partially parsed by older implementations that do not recognize new directives.
When an unknown directive appears as a value (e.g., key: @unknown [1,2,3]), it is treated as null. The argument expression is consumed but discarded.
File Includes
Import other TeaLeaf files:
@include "schemas/common.tl"
@include "./shared/config.tl"
Paths are resolved relative to the including file. Included schemas are available for @table use in the including file.
Formatting Rules
- Trailing commas are allowed in objects, arrays, tuples, and maps
- Whitespace is flexible – indent as you like
- Key names follow identifier rules: start with a letter or _, then letters, digits, _, -, .
- Quoted keys are supported for names with special characters:
"Content-Type": "application/json"
Type System
TeaLeaf has a rich type system covering primitives, containers, and type modifiers.
Primitive Types
| Type | Aliases | Description | Binary Size |
|---|---|---|---|
| bool | | Boolean (true/false) | 1 byte |
| int8 | | Signed 8-bit integer | 1 byte |
| int16 | | Signed 16-bit integer | 2 bytes |
| int | int32 | Signed 32-bit integer | 4 bytes |
| int64 | | Signed 64-bit integer | 8 bytes |
| uint8 | | Unsigned 8-bit integer | 1 byte |
| uint16 | | Unsigned 16-bit integer | 2 bytes |
| uint | uint32 | Unsigned 32-bit integer | 4 bytes |
| uint64 | | Unsigned 64-bit integer | 8 bytes |
| float32 | | 32-bit IEEE 754 float | 4 bytes |
| float | float64 | 64-bit IEEE 754 float | 8 bytes |
| string | | UTF-8 text | variable |
| bytes | | Raw binary data | variable |
| json_number | | Arbitrary-precision numeric string (from JSON) | variable |
| timestamp | | Unix milliseconds (i64) + timezone offset (i16) | 10 bytes |
Type Modifiers
field: string # required string
field: string? # nullable string (can be ~)
field: []string # required array of strings
field: []string? # nullable array of strings (the field itself can be ~)
field: []user # array of structs
The ? modifier applies to the field, not array elements. However, the parser does accept ~ (null) values inside arrays, including schema-typed arrays. Null elements are tracked in the null bitmap.
Value Types (Not Schema Types)
The following are value types that appear in data but cannot be declared as field types in @struct:
| Type | Description |
|---|---|
| object | Untyped { key: value } collections |
| map | Ordered @map { key: value } with any key type |
| ref | Reference (!name) to another value |
| tagged | Tagged value (:tag value) |
For structured fields, define a named struct and use it as the field type. For tagged values with a known set of variants, define a @union – this provides schema metadata (variant names, field names, field types) that is preserved in the binary format.
Type Widening
When reading binary data, automatic safe conversions apply:
- int8 → int16 → int32 → int64
- uint8 → uint16 → uint32 → uint64
- float32 → float64
Narrowing conversions are not automatic and require recompilation.
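A minimal sketch of these widening rules, using an illustrative enum rather than the crate's internal representation:

```rust
/// Sketch of the widening rules above with illustrative types
/// (not the crate's internal API).
#[derive(Clone, Copy)]
enum Stored {
    I8(i8),
    I16(i16),
    I32(i32),
    I64(i64),
}

/// Any narrower signed integer widens losslessly to i64.
fn widen_to_i64(v: Stored) -> i64 {
    match v {
        Stored::I8(x) => x as i64,
        Stored::I16(x) => x as i64,
        Stored::I32(x) => x as i64,
        Stored::I64(x) => x,
    }
}

fn main() {
    assert_eq!(widen_to_i64(Stored::I8(-17)), -17);
}
```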
Type Inference
Standalone Values
When writing, the smallest representation is selected:
- Integers: i8 if fits, else i16, else i32, else i64
- Unsigned: u8 if fits, else u16, else u32, else u64
- Floats: always f64 at runtime
Homogeneous Arrays
Arrays of uniform type use optimized encoding:
| Array Contents | Encoding Strategy |
|---|---|
| Schema-typed objects (matching a @struct) | Struct array encoding with null bitmaps |
| Value::Int arrays | Packed Int32 encoding |
| Value::String arrays | String table indices (u32) |
| All other arrays (UInt, Float, Bool, mixed, etc.) | Heterogeneous encoding with per-element type tags |
Type Coercion at Compile Time
When compiling schema-bound data, type mismatches use default values rather than erroring:
| Target Type | Mismatch Behavior |
|---|---|
| Numeric fields | Integers/floats coerce; non-numeric becomes 0 |
| String fields | Non-string becomes empty string "" |
| Bytes fields | Non-bytes becomes empty bytes (length 0) |
| Timestamp fields | Non-timestamp becomes epoch (0) |
This “best effort” approach prioritizes successful compilation over strict validation. Validate at the application level before compilation for strict type checking.
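To make the coercion table concrete, here is an illustrative sketch of that behavior using a simplified value enum (not the library's actual internals):

```rust
/// Sketch of the "best effort" defaults above, with a simplified
/// value enum standing in for the library's value type.
enum Value {
    Null,
    Int(i64),
    Float(f64),
    Str(String),
}

fn coerce_to_string(v: &Value) -> String {
    match v {
        Value::Str(s) => s.clone(),
        _ => String::new(), // non-string becomes ""
    }
}

fn coerce_to_float(v: &Value) -> f64 {
    match v {
        Value::Int(i) => *i as f64, // integers coerce to float
        Value::Float(f) => *f,
        _ => 0.0,                   // non-numeric becomes 0
    }
}

fn main() {
    assert_eq!(coerce_to_string(&Value::Int(7)), "");
    assert_eq!(coerce_to_float(&Value::Null), 0.0);
}
```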
Bytes Literal
The text format supports b"..." hex literals for byte data:
payload: b"cafef00d"
empty: b""
checksum: b"CA FE" # ERROR -- no spaces allowed
- Contents are hex digits only (uppercase or lowercase)
- Length must be even (2 hex chars per byte)
- dumps() and decompile emit b"..." for Value::Bytes, enabling full text round-trip
- JSON export encodes bytes as "0xcafef00d" strings; JSON import does not auto-convert back to bytes
Schemas
Schemas are the foundation of TeaLeaf’s compact encoding. They define structure once so data can use positional encoding.
Defining Schemas
Use @struct to define a schema:
@struct point (x: int, y: int)
With multiple fields and types:
@struct user (
id: int,
name: string,
email: string?,
active: bool,
)
Optional Type Annotations
Field types can be omitted – they default to string:
@struct config (host, port: int, debug: bool)
# host defaults to string type
Using Schemas with @table
The @table directive binds an array of tuples to a schema:
@struct user (id: int, name: string, email: string)
users: @table user [
(1, "Alice", "alice@example.com"),
(2, "Bob", "bob@example.com"),
(3, "Carol", "carol@example.com"),
]
Each tuple’s values are matched positionally to the schema fields.
Nested Structs
Structs can reference other structs. Nested tuples inherit schema binding from their parent field type:
@struct address (street: string, city: string, zip: string)
@struct person (
name: string,
home: address,
work: address?,
)
people: @table person [
(
"Alice Smith",
("123 Main St", "Berlin", "10115"), # Parsed as address
("456 Office Blvd", "Berlin", "10117"), # Parsed as address
),
]
Deep Nesting
Schemas can nest arbitrarily deep:
@struct method (type: string, last_four: string)
@struct payment (amount: float, method: method)
@struct order (id: int, customer: string, payment: payment)
orders: @table order [
(1, "Alice", (99.99, ("credit", "4242"))),
(2, "Bob", (49.50, ("debit", "1234"))),
]
Array Fields
Schema fields can be arrays of primitives or other structs:
@struct employee (
id: int,
name: string,
skills: []string,
scores: []int,
)
employees: @table employee [
(1, "Alice", ["rust", "python"], [95, 88]),
(2, "Bob", ["java"], [72]),
]
Nullable Fields
The ? modifier makes a field nullable:
@struct user (
id: int,
name: string,
email: string?, # can be ~
phone: string?, # can be ~
)
users: @table user [
(1, "Alice", "alice@example.com", "+1-555-0100"),
(2, "Bob", ~, ~), # email and phone are null
]
Binary Encoding Benefits
Schemas enable significant binary compression:
- Positional storage – field names stored once in the schema table, not per row
- Null bitmaps – one bit per nullable field per row, instead of full null markers
- Type-homogeneous arrays – packed encoding when all elements match a schema
- String deduplication – repeated values like city names stored once in the string table
Example Size Savings
For 1,000 user records with 5 fields:
| Approach | Approximate Size |
|---|---|
| JSON (field names repeated) | ~80KB |
| TeaLeaf text (schema + tuples) | ~35KB |
| TeaLeaf binary (compressed) | ~15KB |
Schema Compatibility
Compatible Changes
| Change | Notes |
|---|---|
| Rename field | Data is positional; names are documentation only |
| Widen type | int8 → int64, float32 → float64 (automatic) |
Incompatible Changes (Require Recompile)
| Change | Resolution |
|---|---|
| Add field | Recompile source .tl file |
| Remove field | Recompile source .tl file |
| Reorder fields | Recompile source .tl file |
| Narrow type | Recompile source .tl file |
Recompilation Workflow
The .tl file is the master. When schemas change:
tealeaf compile data.tl -o data.tlbx
TeaLeaf prioritizes simplicity over automatic schema evolution:
- No migration machinery – recompile when schemas change
- No version negotiation – the embedded schema is the source of truth
- Explicit over implicit – tuples require values for all fields
Binary Format
The TeaLeaf binary format (.tlbx) is the compact, machine-efficient representation. This page documents the binary layout.
Constants
| Constant | Value |
|---|---|
| Magic | TLBX (4 bytes, ASCII) |
| Version Major | 2 |
| Version Minor | 0 |
| Header Size | 64 bytes |
File Structure
┌──────────────────┐
│ Header (64 B) │
├──────────────────┤
│ String Table │
├──────────────────┤
│ Schema Table │
├──────────────────┤
│ Section Index │
├──────────────────┤
│ Data Sections │
└──────────────────┘
All multi-byte values are little-endian.
Header (64 bytes)
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 4 | Magic | TLBX |
| 4 | 2 | Version Major | 2 |
| 6 | 2 | Version Minor | 0 |
| 8 | 4 | Flags | bit 0: compress (advisory), bit 1: root_array |
| 12 | 4 | Reserved | (unused) |
| 16 | 8 | String Table Offset | u64 LE |
| 24 | 8 | Schema Table Offset | u64 LE |
| 32 | 8 | Index Offset | u64 LE |
| 40 | 8 | Data Offset | u64 LE |
| 48 | 4 | String Count | u32 LE |
| 52 | 4 | Schema Count | u32 LE |
| 56 | 4 | Section Count | u32 LE |
| 60 | 4 | Reserved | (for future checksum; currently 0) |
Flag semantics:
- Bit 0 (COMPRESS): Advisory. Indicates one or more sections use ZLIB (deflate) compression. Compression is determined per-section via the entry flags in the section index. This flag is a hint for tooling only.
- Bit 1 (ROOT_ARRAY): Indicates the source document was a root-level JSON array.
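As an illustration of the header layout above, here is a sketch that validates the magic and reads the documented offsets and counts. Helper names are hypothetical; this is not the crate's reader code.

```rust
/// Sketch: validate the magic and read the documented 64-byte header fields.
fn read_u32(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes(b[off..off + 4].try_into().unwrap())
}

fn read_u64(b: &[u8], off: usize) -> u64 {
    u64::from_le_bytes(b[off..off + 8].try_into().unwrap())
}

fn parse_header(h: &[u8; 64]) -> Result<(), String> {
    if &h[0..4] != b"TLBX" {
        return Err("bad magic".into());
    }
    let major = u16::from_le_bytes(h[4..6].try_into().unwrap());
    if major != 2 {
        return Err(format!("unsupported major version {major}"));
    }
    let string_table_off = read_u64(h, 16);
    let schema_table_off = read_u64(h, 24);
    let index_off = read_u64(h, 32);
    let data_off = read_u64(h, 40);
    let (strings, schemas, sections) =
        (read_u32(h, 48), read_u32(h, 52), read_u32(h, 56));
    println!(
        "tables at {string_table_off}/{schema_table_off}/{index_off}/{data_off}: \
         {strings} strings, {schemas} schemas, {sections} sections"
    );
    Ok(())
}

fn main() {
    let mut h = [0u8; 64];
    h[..4].copy_from_slice(b"TLBX");
    h[4..6].copy_from_slice(&2u16.to_le_bytes());
    parse_header(&h).unwrap();
}
```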
String Table
All unique strings are deduplicated and stored once:
┌─────────────────────────┐
│ Size: u32 │
│ Count: u32 │
├─────────────────────────┤
│ Offsets: [u32 × Count] │
│ Lengths: [u32 × Count] │
├─────────────────────────┤
│ String Data (UTF-8) │
└─────────────────────────┘
Strings are referenced by 32-bit index throughout the file. This provides:
- Deduplication – "Seattle" stored once, even if used 1,000 times
- Fast lookup – O(1) index-based access
- Compact references – 4 bytes per reference instead of the full string
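A minimal sketch of the interning idea behind the string table (illustrative; not the actual writer implementation):

```rust
use std::collections::HashMap;

/// Sketch: intern strings so each unique value is stored once and
/// data refers to it by u32 index, as described above.
#[derive(Default)]
struct StringTable {
    index: HashMap<String, u32>,
    strings: Vec<String>,
}

impl StringTable {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&i) = self.index.get(s) {
            return i; // "Seattle" the second time costs 4 bytes, not a new copy
        }
        let i = self.strings.len() as u32;
        self.index.insert(s.to_string(), i);
        self.strings.push(s.to_string());
        i
    }
}

fn main() {
    let mut t = StringTable::default();
    let a = t.intern("Seattle");
    let b = t.intern("Seattle");
    assert_eq!(a, b);              // same index both times
    assert_eq!(t.strings.len(), 1); // stored once
}
```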
Schema Table
The schema table stores both struct and union definitions:
┌──────────────────────────────────────┐
│ Size: u32 │
│ Struct Count: u16 │
│ Union Count: u16 │
├──────────────────────────────────────┤
│ Struct Offsets: [u32 × struct_count] │
│ Struct Definitions │
├──────────────────────────────────────┤
│ Union Offsets: [u32 × union_count] │
│ Union Definitions │
└──────────────────────────────────────┘
Backward compatibility: The Union Count field at offset +6 was previously reserved (always 0). Old readers that ignore this field and only read Struct Count structs continue to work – they simply skip the union data.
Struct Definition
Schema:
name_idx: u32 (string table index)
field_count: u16
flags: u16 (reserved)
Field (repeated × field_count):
name_idx: u32 (string table index)
type: u8 (TLType code)
flags: u8 (bit 0: nullable, bit 1: is_array)
extra: u16 (type reference -- see below)
Field extra values:
- For STRUCT (0x22) fields: string table index of the struct type name (0xFFFF = untyped object)
- For TAGGED (0x31) fields: string table index of the union type name (0xFFFF = untyped tagged value)
- For all other field types: 0xFFFF
Union Definition
Union:
name_idx: u32 (string table index)
variant_count: u16
flags: u16 (reserved)
Variant (repeated × variant_count):
name_idx: u32 (string table index)
field_count: u16
flags: u16 (reserved)
Field (repeated × field_count):
name_idx: u32 (string table index)
type: u8 (TLType code)
flags: u8 (bit 0: nullable, bit 1: is_array)
extra: u16 (same semantics as struct field extra)
Each union variant uses the same 8-byte field entry format as struct fields.
Type Codes
0x00 NULL 0x0A FLOAT32 0x20 ARRAY 0x30 REF
0x01 BOOL 0x0B FLOAT64 0x21 OBJECT 0x31 TAGGED
0x02 INT8 0x10 STRING 0x22 STRUCT 0x32 TIMESTAMP
0x03 INT16 0x11 BYTES 0x23 MAP
0x04 INT32 0x12 JSONNUMBER 0x24 TUPLE (reserved)
0x05 INT64
0x06 UINT8
0x07 UINT16
0x08 UINT32
0x09 UINT64
TUPLE (0x24) is reserved but not currently emitted. Tuples in text are parsed as arrays.
JSONNUMBER (0x12) stores arbitrary-precision numeric strings that exceed the range of i64, u64, or f64. Stored as a string table index, identical to STRING encoding.
Section Index
Maps named sections to data locations:
┌─────────────────────────┐
│ Size: u32 │
│ Count: u32 │
├─────────────────────────┤
│ Entries (32 B each) │
└─────────────────────────┘
Each entry (32 bytes):
| Field | Type | Description |
|---|---|---|
| key_idx | u32 | String table index for section name |
| offset | u64 | Absolute file offset to data |
| size | u32 | Compressed size in bytes |
| uncompressed_size | u32 | Original size before compression |
| schema_idx | u16 | Schema index (0xFFFF if none) |
| type | u8 | TLType code |
| flags | u8 | bit 0: compressed, bit 1: is_array |
| item_count | u32 | Count for arrays/maps |
| reserved | u32 | (future use) |
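An illustrative decoder for one entry, assuming the fields are packed in table order with no padding (field and struct names here are hypothetical):

```rust
/// Sketch: decode one 32-byte section index entry per the table above.
struct Entry {
    key_idx: u32,
    offset: u64,
    size: u32,
    uncompressed_size: u32,
    schema_idx: u16,
    type_code: u8,
    flags: u8,
    item_count: u32,
}

fn parse_entry(b: &[u8; 32]) -> Entry {
    Entry {
        key_idx: u32::from_le_bytes(b[0..4].try_into().unwrap()),
        offset: u64::from_le_bytes(b[4..12].try_into().unwrap()),
        size: u32::from_le_bytes(b[12..16].try_into().unwrap()),
        uncompressed_size: u32::from_le_bytes(b[16..20].try_into().unwrap()),
        schema_idx: u16::from_le_bytes(b[20..22].try_into().unwrap()),
        type_code: b[22],
        flags: b[23], // bit 0: compressed, bit 1: is_array
        item_count: u32::from_le_bytes(b[24..28].try_into().unwrap()),
        // bytes 28..32 are reserved
    }
}

fn main() {
    let e = parse_entry(&[0u8; 32]);
    assert_eq!(e.flags & 1, 0); // not compressed
    let _ = (e.key_idx, e.offset, e.size, e.uncompressed_size,
             e.schema_idx, e.type_code, e.item_count);
}
```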
Data Encoding
Primitives
| Type | Encoding |
|---|---|
| Null | 0 bytes |
| Bool | 1 byte (0x00 or 0x01) |
| Int8/UInt8 | 1 byte |
| Int16/UInt16 | 2 bytes, LE |
| Int32/UInt32 | 4 bytes, LE |
| Int64/UInt64 | 8 bytes, LE |
| Float32 | 4 bytes, IEEE 754 LE |
| Float64 | 8 bytes, IEEE 754 LE |
| String | u32 index into string table |
| Bytes | varint length + raw bytes |
| Timestamp | i64 Unix milliseconds (LE, 8 bytes) + i16 timezone offset in minutes (LE, 2 bytes). Total: 10 bytes |
Varint Encoding
Used for bytes length:
- Continuation bit (0x80) + 7 value bits
- Least-significant group first
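A Rust sketch of this scheme (illustrative; the library's own routines are not shown here):

```rust
/// Sketch of the varint scheme above: 7 value bits per byte,
/// continuation bit 0x80, least-significant group first.
fn encode_varint(mut n: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (n & 0x7F) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // more groups follow
    }
}

fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut n = 0u64;
    for (i, &b) in buf.iter().enumerate() {
        n |= ((b & 0x7F) as u64) << (7 * i as u64);
        if b & 0x80 == 0 {
            return Some((n, i + 1)); // value + bytes consumed
        }
    }
    None // ran out of input mid-varint
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    assert_eq!(buf, vec![0xAC, 0x02]);
    assert_eq!(decode_varint(&buf), Some((300, 2)));
}
```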
Arrays (Top-Level, Homogeneous)
For Value::Int (when all values fit in i32) or Value::String arrays:
Count: u32
Element Type: u8 (Int32 or String)
Elements: [packed data]
All other uniform-type arrays (UInt, Bool, Float, Timestamp, Int64) use heterogeneous encoding.
Arrays (Top-Level, Heterogeneous)
For mixed-type arrays:
Count: u32
Element Type: 0xFF (marker)
Elements: [type: u8, data, type: u8, data, ...]
Arrays (Schema-Typed Fields)
Array fields within @struct use homogeneous encoding for ANY element type:
Count: u32
Element Type: u8 (field's declared type)
Elements: [packed typed values]
Objects
Field Count: u16
Fields: [
key_idx: u32 (string table index)
type: u8 (TLType code)
data: [type-specific]
]
Struct Arrays (Optimal Encoding)
Count: u32
Schema Index: u16
Null Bitmap Size: u16
Rows: [
Null Bitmap: [u8 × bitmap_size]
Values: [non-null field values only]
]
The null bitmap tracks which fields are null:
- Bit i set = field i is null
- Only non-null values are stored
- Bitmap size = (field_count + 7) / 8 bytes (integer division, equivalent to ceil(field_count / 8))
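A sketch of building one row's bitmap under these rules. The LSB-first bit order within a byte is an assumption here, not something this page specifies:

```rust
/// Sketch: build a per-row null bitmap as described above.
/// Assumption: bit i maps to bit (i % 8) of byte (i / 8), LSB first.
fn null_bitmap(fields: &[Option<i64>]) -> Vec<u8> {
    let size = (fields.len() + 7) / 8; // integer division = ceil(n / 8)
    let mut bitmap = vec![0u8; size];
    for (i, f) in fields.iter().enumerate() {
        if f.is_none() {
            bitmap[i / 8] |= 1 << (i % 8);
        }
    }
    bitmap
}

fn main() {
    // Row (1, ~, 3): field 1 is null -> bitmap 0b0000_0010
    let row = [Some(1), None, Some(3)];
    assert_eq!(null_bitmap(&row), vec![0b0000_0010]);
}
```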
Maps
Count: u32
Entries: [
key_type: u8
key_data: [type-specific]
value_type: u8
value_data: [type-specific]
]
References
name_idx: u32 (string table index for reference name)
Tagged Values
tag_idx: u32 (string table index for tag name)
value_type: u8 (TLType code)
value_data: [type-specific]
Compression
- Algorithm: ZLIB (deflate)
- Threshold: Compress if data > 64 bytes AND compressed < 90% of original
- Granularity: Per-section (each section compressed independently)
- Flag: Bit 0 of entry flags indicates compression
- Decompression: Readers check the flag and decompress transparently
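A sketch of the per-section decision rule, assuming the flate2 crate for ZLIB (which crate the implementation actually uses is not specified on this page):

```rust
use flate2::{write::ZlibEncoder, Compression}; // assumed: flate2 = "1"
use std::io::Write;

/// Sketch: compress a section only if it exceeds 64 bytes AND the
/// compressed form is under 90% of the original, per the rules above.
fn maybe_compress(section: &[u8]) -> (Vec<u8>, bool) {
    if section.len() > 64 {
        let mut enc = ZlibEncoder::new(Vec::new(), Compression::default());
        enc.write_all(section).unwrap();
        let compressed = enc.finish().unwrap();
        if (compressed.len() as f64) < section.len() as f64 * 0.9 {
            return (compressed, true); // would set bit 0 of the entry flags
        }
    }
    (section.to_vec(), false) // stored uncompressed
}

fn main() {
    let data = vec![b'a'; 1024]; // highly compressible
    let (out, compressed) = maybe_compress(&data);
    assert!(compressed && out.len() < data.len());
}
```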
JSON Interoperability
TeaLeaf provides built-in bidirectional JSON conversion for easy integration with existing tools and systems.
JSON to TeaLeaf
CLI
# JSON to TeaLeaf text (with automatic schema inference)
tealeaf from-json input.json -o output.tl
# JSON to TeaLeaf binary
tealeaf json-to-tlbx input.json -o output.tlbx
Rust API
#![allow(unused)]
fn main() {
let doc = TeaLeaf::from_json(json_string)?;
// With automatic schema inference for arrays
let doc = TeaLeaf::from_json_with_schemas(json_string)?;
}
.NET API
using var doc = TLDocument.FromJson(jsonString);
Type Mappings (JSON → TeaLeaf)
| JSON Type | TeaLeaf Type |
|---|---|
| null | Null |
| true / false | Bool |
| number (integer) | Int (or UInt if > i64::MAX) |
| number (decimal, finite f64) | Float |
| number (exceeds i64/u64/f64) | JsonNumber |
| string | String |
| array | Array |
| object | Object |
Limitations
JSON import is “plain JSON only” – it does not recognize the special JSON forms used for TeaLeaf export:
| JSON Form | Result |
|---|---|
{"$ref": "name"} | Plain Object (not a Ref) |
{"$tag": "...", "$value": ...} | Plain Object (not a Tagged) |
[[key, value], ...] | Plain Array (not a Map) |
| ISO 8601 strings | Plain String (not a Timestamp) |
For full round-trip fidelity with these types, use binary format (.tlbx) or reconstruct programmatically.
TeaLeaf to JSON
CLI
# Text to JSON
tealeaf to-json input.tl -o output.json
# Binary to JSON
tealeaf tlbx-to-json input.tlbx -o output.json
Both commands write to stdout if -o is not specified.
Rust API
#![allow(unused)]
fn main() {
let json = doc.to_json()?; // pretty-printed
let json = doc.to_json_compact()?; // minified
}
.NET API
string json = doc.ToJson(); // pretty-printed
string json = doc.ToJsonCompact(); // minified
Type Mappings (TeaLeaf → JSON)
| TeaLeaf Type | JSON Representation |
|---|---|
| Null | null |
| Bool | true / false |
| Int, UInt | number |
| Float | number |
| JsonNumber | number (parsed back to JSON number) |
| String | string |
| Bytes | string (hex with 0x prefix) |
| Array | array |
| Object | object |
| Map | array of [key, value] pairs |
| Timestamp | string (ISO 8601) |
| Ref | {"$ref": "name"} |
| Tagged | {"$tag": "tagname", "$value": value} |
Schema Inference
When converting JSON to TeaLeaf, the from-json command (and from_json_with_schemas API) can automatically infer schemas from arrays of uniform objects.
How It Works
- Array Detection – identifies arrays of objects with identical field sets
- Name Inference – singularizes parent key names ("products" → product schema)
- Type Inference – determines field types across all array items
- Nullable Detection – fields with any null values become nullable (string?)
- Nested Schemas – creates separate schemas for nested objects within array elements
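A sketch of the array-detection step, using serde_json to stand in for the parsed JSON (illustrative; not the crate's inference code):

```rust
use serde_json::Value; // assumed dependency: serde_json = "1"

/// Sketch of step 1 above: an array qualifies for schema inference
/// when every element is an object with the same key set.
fn uniform_object_array(arr: &[Value]) -> bool {
    let mut first_keys: Option<Vec<String>> = None;
    for v in arr {
        let obj = match v {
            Value::Object(m) => m,
            _ => return false, // a non-object element disqualifies the array
        };
        let mut ks: Vec<String> = obj.keys().cloned().collect();
        ks.sort();
        match &first_keys {
            None => first_keys = Some(ks),
            Some(first) => {
                if *first != ks {
                    return false; // key sets differ between elements
                }
            }
        }
    }
    first_keys.is_some() // empty arrays don't qualify
}

fn main() {
    let arr: Vec<Value> = serde_json::from_str(
        r#"[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]"#,
    ).unwrap();
    assert!(uniform_object_array(&arr));
}
```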
Example
Input JSON:
{
"customers": [
{
"id": 1,
"name": "Alice",
"billing_address": {"street": "123 Main", "city": "Boston"}
},
{
"id": 2,
"name": "Bob",
"billing_address": {"street": "456 Oak", "city": "Denver"}
}
]
}
Inferred TeaLeaf output:
@struct billing_address (city: string, street: string)
@struct customer (billing_address: billing_address, id: int, name: string)
customers: @table customer [
(("Boston", "123 Main"), 1, "Alice"),
(("Denver", "456 Oak"), 2, "Bob"),
]
Nested Schema Inference
When array elements contain nested objects, TeaLeaf creates schemas for those nested objects if they have uniform structure across all items:
- Nested objects become their own @struct definitions
- Parent schemas reference nested schemas by name (not object type)
- Deeply nested objects are handled recursively
Round-Trip Considerations
| Path | Fidelity |
|---|---|
| .tl → .json → .tl | Lossy – schemas, comments, refs, tags, timestamps, maps are simplified |
| .tl → .tlbx → .tl | Lossless for data (comments stripped) |
| .tl → .tlbx → .json | Same as .tl → .json |
| .json → .tl → .json | Generally lossless for JSON-native types |
| .json → .tlbx → .json | Generally lossless for JSON-native types |
For types that don’t round-trip through JSON (Ref, Tagged, Map, Timestamp, Bytes), use the binary format for lossless storage.
Grammar
The formal grammar for the TeaLeaf text format in EBNF notation.
EBNF Grammar
document = { directive | pair | ref_def } ;
directive = struct_def | union_def | include | root_array ;
struct_def = "@struct" name "(" fields ")" ;
union_def = "@union" name "{" variants "}" ;
include = "@include" string ;
root_array = "@root-array" ;
variants = variant { "," variant } ;
variant = name "(" [ fields ] ")" ;
fields = field { "," field } ;
field = name [ ":" type ] ; (* type defaults to string if omitted *)
type = [ "[]" ] base_type [ "?" ] ;
base_type = "bool" | "int" | "int8" | "int16" | "int32" | "int64"
| "uint" | "uint8" | "uint16" | "uint32" | "uint64"
| "float" | "float32" | "float64" | "string" | "bytes"
| "timestamp" | name ;
pair = key ":" value ;
key = name | string ;
value = primitive | object | array | tuple | table | map
| tagged | ref | timestamp ;
primitive = string | bytes_lit | number | bool | "~" ;
bytes_lit = "b\"" { hexdigit hexdigit } "\"" ;
object = "{" [ ( pair | ref_def ) { "," ( pair | ref_def ) } ] "}" ;
array = "[" [ value { "," value } ] "]" ;
tuple = "(" [ value { "," value } ] ")" ;
table = "@table" name array ;
map = "@map" "{" [ map_entry { "," map_entry } ] "}" ;
map_entry = map_key ":" value ;
map_key = string | name | integer ;
tagged = ":" name value ;
ref = "!" name ;
ref_def = "!" name ":" value ;
timestamp = date [ "T" time [ timezone ] ] ;
date = digit{4} "-" digit{2} "-" digit{2} ;
time = digit{2} ":" digit{2} [ ":" digit{2} [ "." digit{1,3} ] ] ;
timezone = "Z" | ( "+" | "-" ) digit{2} [ ":" digit{2} | digit{2} ] ;
string = name | '"' chars '"' | '"""' multiline '"""' ;
number = integer | float | hex | binary ;
integer = [ "-" ] digit+ ;
float = [ "-" ] digit+ "." digit+ [ ("e"|"E") ["+"|"-"] digit+ ]
| [ "-" ] digit+ ("e"|"E") ["+"|"-"] digit+
| "NaN" | "inf" | "-inf" ;
hex = [ "-" ] ("0x" | "0X") hexdigit+ ;
binary = [ "-" ] ("0b" | "0B") ("0"|"1")+ ;
bool = "true" | "false" ;
name = (letter | "_") { letter | digit | "_" | "-" | "." } ;
comment = "#" { any } newline ;
chars = { any_char | escape } ;
escape = "\\" | "\\\"" | "\\n" | "\\t" | "\\r" | "\\b" | "\\f"
| "\\u" hexdigit hexdigit hexdigit hexdigit ;
Production Notes
Document Structure
A document is a sequence of:
- Directives – @struct, @union, @include, @root-array (processed before data)
- Pairs – key: value (the actual data)
- Reference definitions – !name: value (reusable named values)
Key Rules
- Keys can be bare identifiers (name) or quoted strings ("Content-Type")
- Trailing commas are allowed in all list contexts (arrays, objects, tuples, maps, fields)
- Comments (# to end of line) can appear anywhere whitespace is valid
- Whitespace is insignificant except inside strings
Type Defaults
When a field type is omitted in a @struct, it defaults to string:
@struct config (host, port: int, debug: bool)
# "host" is implicitly string
Tuple Semantics
Standalone tuples are parsed as arrays. Only within a @table context do tuples acquire schema binding:
# This is an array [1, 2, 3]
plain: (1, 2, 3)
# These are schema-bound tuples
@struct point (x: int, y: int)
points: @table point [(0, 0), (1, 1)]
Root Array Directive
The @root-array directive marks the document as representing a root-level JSON array rather than a JSON object. This is used for JSON round-trip fidelity – when a JSON array is imported via from-json, the directive is emitted so that to-json produces an array at the top level instead of an object:
@root-array
0: {id: 1, name: alice}
1: {id: 2, name: bob}
Without @root-array, the JSON output would be {"0": {...}, "1": {...}}. With it, the output is [{...}, {...}].
Map Key Restrictions
Map keys are restricted to hashable types: strings, names, and integers. Complex values (objects, arrays) cannot be map keys.
Reference Scoping
References can be defined at:
- Top level – !name: value alongside pairs
- Inside objects – {!ref: value, field: !ref}
References are resolved within the document scope.
CLI Overview
The tealeaf command-line tool provides all operations for working with TeaLeaf files.
Usage
tealeaf <command> [options]
Commands
| Command | Description |
|---|---|
| compile | Compile text (.tl) to binary (.tlbx) |
| decompile | Decompile binary (.tlbx) to text (.tl) |
| info | Show file information (auto-detects format) |
| validate | Validate text format syntax |
| to-json | Convert TeaLeaf text to JSON |
| from-json | Convert JSON to TeaLeaf text |
| tlbx-to-json | Convert TeaLeaf binary to JSON |
| json-to-tlbx | Convert JSON to TeaLeaf binary |
| help | Show help text |
Global Options
tealeaf help # Show usage
tealeaf -h # Show usage
tealeaf --help # Show usage
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (parse error, I/O error, invalid arguments) |
Error messages are written to stderr. Data output goes to stdout (when no -o flag is specified).
Quick Examples
# Full workflow
tealeaf validate data.tl
tealeaf compile data.tl -o data.tlbx
tealeaf info data.tlbx
tealeaf to-json data.tl -o data.json
tealeaf decompile data.tlbx -o recovered.tl
# JSON conversion
tealeaf from-json api_response.json -o structured.tl
tealeaf json-to-tlbx api_response.json -o compact.tlbx
tealeaf tlbx-to-json compact.tlbx -o exported.json
compile
Compile a TeaLeaf text file (.tl) to the compact binary format (.tlbx).
Usage
tealeaf compile <input.tl> -o <output.tlbx>
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.tl> | Yes | Path to the TeaLeaf text file |
| -o <output.tlbx> | Yes | Path for the output binary file |
Description
The compile command:
- Parses the text file (including any @include directives)
- Builds the string table (deduplicates all strings)
- Encodes schemas into the schema table
- Encodes each top-level key-value pair as a data section
- Applies per-section ZLIB compression (enabled by default)
- Writes the binary file with the 64-byte header
Compression is applied to sections larger than 64 bytes where the compressed size is less than 90% of the original.
Examples
# Basic compilation
tealeaf compile config.tl -o config.tlbx
# Compile and inspect
tealeaf compile data.tl -o data.tlbx
tealeaf info data.tlbx
Output
On success, prints:
- Input and output file paths
- Input size, output size, and compression ratio (percentage)
Error Cases
| Error | Cause |
|---|---|
| Parse error | Invalid TeaLeaf syntax in input file |
| I/O error | Input file not found or output path not writable |
| Include error | Referenced @include file not found |
See Also
- decompile – reverse operation
- validate – check syntax without compiling
- Binary Format – binary layout details
decompile
Convert a TeaLeaf binary file (.tlbx) back to the human-readable text format (.tl).
Usage
tealeaf decompile <input.tlbx> -o <output.tl>
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.tlbx> | Yes | Path to the TeaLeaf binary file |
| -o <output.tl> | Yes | Path for the output text file |
Description
The decompile command:
- Opens the binary file and reads the header
- Loads the string table and schema table
- Reads the section index
- Decompresses sections as needed
- Reconstructs @struct definitions from the schema table
- Writes each section as a key-value pair in text format
Notes
- Comments are not preserved – comments from the original .tl are stripped during compilation
- Formatting may differ – the decompiled output uses default formatting, which may differ from the original source
- Data is lossless – all values, schemas, and structure are preserved
- Bytes are lossless – bytes values are written as b"..." hex literals, which round-trip correctly
Examples
# Decompile a binary file
tealeaf decompile data.tlbx -o data_recovered.tl
# Round-trip verification
tealeaf compile original.tl -o compiled.tlbx
tealeaf decompile compiled.tlbx -o roundtrip.tl
tealeaf compile roundtrip.tl -o roundtrip.tlbx
# compiled.tlbx and roundtrip.tlbx should be equivalent
See Also
info
Display information about a TeaLeaf file. Auto-detects whether the file is text or binary format.
Usage
tealeaf info <file>
Arguments
| Argument | Required | Description |
|---|---|---|
| <file> | Yes | Path to a .tl or .tlbx file |
Description
The info command auto-detects the file format (by checking for the TLBX magic bytes) and displays relevant information.
For Text Files (.tl)
- Number of top-level keys
- Key names
- Number of schema definitions
- Schema details (name, fields, types)
For Binary Files (.tlbx)
- Version information
- File size
- Header details (offsets, counts)
- String table statistics (count, total size)
- Schema table details (names, field counts)
- Section index (key names, sizes, compression ratios)
Examples
# Inspect a text file
tealeaf info config.tl
# Inspect a binary file
tealeaf info data.tlbx
See Also
validate
Validate a TeaLeaf text file for syntactic correctness without compiling it.
Usage
tealeaf validate <file.tl>
Arguments
| Argument | Required | Description |
|---|---|---|
| <file.tl> | Yes | Path to the TeaLeaf text file |
Description
The validate command parses the text file and reports any syntax errors. It does not produce any output files.
Validation checks include:
- Lexical analysis (valid tokens, string escaping)
- Structural parsing (matched brackets, valid directives)
- Schema reference validity (@table references a defined @struct)
- Include file resolution
- Type syntax in schema definitions
Examples
# Validate a file
tealeaf validate config.tl
# Validate before compiling
tealeaf validate data.tl && tealeaf compile data.tl -o data.tlbx
Exit Codes
| Code | Meaning |
|---|---|
| 0 | File is valid |
| 1 | Validation errors found |
On success, prints ✓ Valid along with schema and key counts. On failure, prints ✗ Invalid: <error message> and exits with code 1.
See Also
to-json / from-json
Convert between TeaLeaf text format and JSON.
to-json
Convert a TeaLeaf text file to JSON.
Usage
tealeaf to-json <input.tl> [-o <output.json>]
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.tl> | Yes | Path to the TeaLeaf text file |
| -o <output.json> | No | Output file path. If omitted, writes to stdout |
Examples
# Write to file
tealeaf to-json data.tl -o data.json
# Write to stdout
tealeaf to-json data.tl
# Pipe to another tool
tealeaf to-json data.tl | jq '.users'
Output Format
The output is pretty-printed JSON. See JSON Interoperability for type mapping details.
from-json
Convert a JSON file to TeaLeaf text format with automatic schema inference.
Usage
tealeaf from-json <input.json> -o <output.tl>
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.json> | Yes | Path to the JSON file |
| -o <output.tl> | Yes | Path for the output TeaLeaf text file |
Schema Inference
from-json automatically infers schemas from JSON arrays of uniform objects:
- Array Detection – identifies arrays where all elements are objects with identical keys
- Name Inference – singularizes the parent key name ("users" → user schema)
- Type Inference – determines field types across all items
- Nullable Detection – fields with any null become nullable (string?)
- Nested Schemas – creates schemas for nested uniform objects
Examples
# Convert with schema inference
tealeaf from-json api_data.json -o structured.tl
# Full pipeline: JSON → TeaLeaf text → Binary
tealeaf from-json data.json -o data.tl
tealeaf compile data.tl -o data.tlbx
Example: Schema Inference in Action
Input (employees.json):
{
"employees": [
{"id": 1, "name": "Alice", "dept": "Engineering"},
{"id": 2, "name": "Bob", "dept": "Design"}
]
}
Output (employees.tl):
@struct employee (dept: string, id: int, name: string)
employees: @table employee [
("Engineering", 1, "Alice"),
("Design", 2, "Bob"),
]
See Also
- tlbx-to-json / json-to-tlbx – binary format JSON conversion
- JSON Interoperability – type mappings and round-trip details
tlbx-to-json / json-to-tlbx
Convert between TeaLeaf binary format and JSON directly, without going through the text format.
tlbx-to-json
Convert a TeaLeaf binary file to JSON.
Usage
tealeaf tlbx-to-json <input.tlbx> [-o <output.json>]
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.tlbx> | Yes | Path to the TeaLeaf binary file |
| -o <output.json> | No | Output file path. If omitted, writes to stdout |
Examples
# Write to file
tealeaf tlbx-to-json data.tlbx -o data.json
# Write to stdout
tealeaf tlbx-to-json data.tlbx
# Pipe to jq for filtering
tealeaf tlbx-to-json data.tlbx | jq '.config'
Notes
- Produces the same JSON output as to-json on the equivalent text file
- Reads the binary directly – no intermediate text conversion
json-to-tlbx
Convert a JSON file directly to TeaLeaf binary format.
Usage
tealeaf json-to-tlbx <input.json> -o <output.tlbx>
Arguments
| Argument | Required | Description |
|---|---|---|
| <input.json> | Yes | Path to the JSON file |
| -o <output.tlbx> | Yes | Path for the output binary file |
Examples
# Direct JSON to binary
tealeaf json-to-tlbx api_data.json -o compact.tlbx
# Verify the result
tealeaf info compact.tlbx
tealeaf tlbx-to-json compact.tlbx -o verify.json
Notes
- Performs schema inference (same as from-json)
- Compiles directly to binary – no intermediate .tl file
- Compression is enabled by default
Workflow Comparison
# Two-step (via text)
tealeaf from-json data.json -o data.tl
tealeaf compile data.tl -o data.tlbx
# One-step (direct)
tealeaf json-to-tlbx data.json -o data.tlbx
Both approaches produce equivalent binary output.
See Also
- to-json / from-json – text format JSON conversion
- JSON Interoperability – type mappings and limitations
Rust Guide: Overview
TeaLeaf is written in Rust. The tealeaf-core crate provides the full API for parsing, compiling, reading, and converting TeaLeaf documents.
Crates
| Crate | Description |
|---|---|
| tealeaf-core | Core library: parser, compiler, reader, CLI, JSON conversion |
| tealeaf-derive | Proc-macro crate: #[derive(ToTeaLeaf, FromTeaLeaf)] |
| tealeaf-ffi | C-compatible FFI layer for language bindings |
Installation
Add to your Cargo.toml:
[dependencies]
tealeaf-core = { version = "2.0.0-beta.8", features = ["derive"] }
The derive feature pulls in tealeaf-derive for proc-macro support.
Core Types
TeaLeaf
The main document type:
#![allow(unused)]
fn main() {
use tealeaf::TeaLeaf;
// Parse from text
let doc = TeaLeaf::parse("name: Alice\nage: 30")?;
// Load from file
let doc = TeaLeaf::load("data.tl")?;
// Load from JSON
let doc = TeaLeaf::from_json(json_str)?;
// With schema inference
let doc = TeaLeaf::from_json_with_schemas(json_str)?;
}
Value
The value enum representing all TeaLeaf types:
#![allow(unused)]
fn main() {
use tealeaf::Value;
pub enum Value {
Null,
Bool(bool),
Int(i64),
UInt(u64),
Float(f64),
String(String),
Bytes(Vec<u8>),
Array(Vec<Value>),
Object(ObjectMap<String, Value>), // IndexMap alias, preserves insertion order
Map(Vec<(Value, Value)>),
Ref(String),
Tagged(String, Box<Value>),
Timestamp(i64, i16), // (unix_millis, tz_offset_minutes)
JsonNumber(String), // arbitrary-precision number (raw JSON decimal string)
}
}
Schema and Field
Schema definitions:
#![allow(unused)]
fn main() {
use tealeaf::{Schema, Field, FieldType};
let schema = Schema {
name: "user".to_string(),
fields: vec![
Field { name: "id".into(), field_type: FieldType { base: "int".into(), nullable: false, is_array: false } },
Field { name: "name".into(), field_type: FieldType { base: "string".into(), nullable: false, is_array: false } },
Field { name: "email".into(), field_type: FieldType { base: "string".into(), nullable: true, is_array: false } },
],
};
}
Accessing Data
#![allow(unused)]
fn main() {
let doc = TeaLeaf::load("data.tl")?;
// Get a value by key
if let Some(Value::String(name)) = doc.get("name") {
println!("Name: {}", name);
}
// Get a schema
if let Some(schema) = doc.schema("user") {
for field in &schema.fields {
println!(" {}: {}", field.name, field.field_type.base);
}
}
}
Output Operations
#![allow(unused)]
fn main() {
let doc = TeaLeaf::load("data.tl")?;
// Compile to binary
doc.compile("data.tlbx", true)?; // true = enable compression
// Convert to JSON
let json = doc.to_json()?; // pretty-printed
let json = doc.to_json_compact()?; // minified
// Convert to TeaLeaf text (with schemas)
let text = doc.to_tl_with_schemas();
}
Conversion Traits
Two traits enable Rust struct ↔ TeaLeaf conversion:
#![allow(unused)]
fn main() {
pub trait ToTeaLeaf {
fn to_tealeaf_value(&self) -> Value;
fn collect_schemas() -> IndexMap<String, Schema>;
fn tealeaf_field_type() -> FieldType;
}
pub trait FromTeaLeaf: Sized {
fn from_tealeaf_value(value: &Value) -> Result<Self, ConvertError>;
}
}
These are typically derived via #[derive(ToTeaLeaf, FromTeaLeaf)] – see Derive Macros.
Extension Trait
ToTeaLeafExt provides convenience methods for any ToTeaLeaf implementor:
#![allow(unused)]
fn main() {
pub trait ToTeaLeafExt: ToTeaLeaf {
fn to_tealeaf_doc(&self, key: &str) -> TeaLeaf;
fn to_tl_string(&self, key: &str) -> String;
fn to_tlbx(&self, key: &str, path: &str, compress: bool) -> Result<()>;
fn to_tealeaf_json(&self, key: &str) -> Result<String>;
}
}
Example:
#![allow(unused)]
fn main() {
let user = User { id: 1, name: "Alice".into(), active: true };
// One-liner serialization
let text = user.to_tl_string("user");
user.to_tlbx("user", "user.tlbx", true)?;
let json = user.to_tealeaf_json("user")?;
}
Next Steps
- Derive Macros – #[derive(ToTeaLeaf, FromTeaLeaf)]
- Attributes Reference – all #[tealeaf(...)] attributes
- Builder API – programmatic document construction
- Schemas & Types – working with schemas in Rust
- Error Handling – error types and patterns
Derive Macros
The tealeaf-derive crate provides two proc-macros for automatic Rust struct ↔ TeaLeaf conversion.
Setup
Enable the derive feature:
[dependencies]
tealeaf-core = { version = "2.0.0-beta.8", features = ["derive"] }
ToTeaLeaf
Converts a Rust struct or enum to a TeaLeaf Value:
#![allow(unused)]
fn main() {
use tealeaf::{ToTeaLeaf, ToTeaLeafExt};
#[derive(ToTeaLeaf)]
struct Config {
host: String,
port: i32,
debug: bool,
}
let config = Config { host: "localhost".into(), port: 8080, debug: true };
// Serialize to TeaLeaf text
let text = config.to_tl_string("config");
// @struct config (host: string, port: int, debug: bool)
// config: (localhost, 8080, true)
// Compile directly to binary
config.to_tlbx("config", "config.tlbx", true)?;
// Convert to JSON
let json = config.to_tealeaf_json("config")?;
// Get as Value
let value = config.to_tealeaf_value();
// Get schemas
let schemas = Config::collect_schemas();
}
FromTeaLeaf
Deserializes a TeaLeaf Value back to a Rust struct:
#![allow(unused)]
fn main() {
use tealeaf::{Reader, FromTeaLeaf};
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Config {
host: String,
port: i32,
debug: bool,
}
let reader = Reader::open("config.tlbx")?;
let value = reader.get("config")?;
let config = Config::from_tealeaf_value(&value)?;
}
Struct Example
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
id: i64,
name: String,
#[tealeaf(optional)]
email: Option<String>,
active: bool,
#[tealeaf(rename = "join_date", type = "timestamp")]
joined: i64,
}
}
This generates:
- Schema: @struct user (id: int64, name: string, email: string?, active: bool, join_date: timestamp)
- ToTeaLeaf: serializes to a positional tuple matching the schema
- FromTeaLeaf: deserializes from an object or struct-array row
Enum Example
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
enum Shape {
Circle { radius: f64 },
Rectangle { width: f64, height: f64 },
Point,
}
let shapes = vec![
Shape::Circle { radius: 5.0 },
Shape::Rectangle { width: 10.0, height: 20.0 },
Shape::Point,
];
}
Enum variants are serialized as tagged values:
shapes: [:circle {radius: 5.0}, :rectangle {width: 10.0, height: 20.0}, :point ~]
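A minimal round-trip sketch using the trait methods from the previous chapter (assumption: the tag is the snake_case variant name, matching the :circle text form above):
use tealeaf::{FromTeaLeaf, ToTeaLeaf};
let v = Shape::Circle { radius: 5.0 }.to_tealeaf_value();
// v should be a Value::Tagged("circle", ...) wrapper per the encoding above
let back = Shape::from_tealeaf_value(&v)?;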
Nested Structs
Structs can reference other ToTeaLeaf/FromTeaLeaf types:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Address {
street: String,
city: String,
zip: String,
}
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Person {
name: String,
home: Address,
#[tealeaf(optional)]
work: Option<Address>,
}
}
The collect_schemas() method automatically collects schemas from nested types.
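For Person above, a quick check (same pattern as in the Schemas & Types chapter):
let schemas = Person::collect_schemas();
assert!(schemas.contains_key("person"));
assert!(schemas.contains_key("address")); // pulled in via the nested fields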
Collections
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Team {
name: String,
members: Vec<String>, // []string
scores: Vec<i32>, // []int
leads: Vec<Person>, // []person (nested struct array)
}
}
Supported Types
| Rust Type | TeaLeaf Type |
|---|---|
bool | bool |
i8, i16, i32 | int8, int16, int |
i64 | int64 |
u8, u16, u32 | uint8, uint16, uint |
u64 | uint64 |
f32 | float32 |
f64 | float |
String, &str | string |
Vec<u8> | bytes |
Vec<T> | []T |
Option<T> | T? (nullable) |
IndexMap<String, T> | object (order-preserving) |
HashMap<String, T> | object |
| Custom struct (with derive) | named struct reference |
See Also
- Attributes Reference – all #[tealeaf(...)] attributes
- Builder API – manual document construction
Attributes Reference
All attributes use the #[tealeaf(...)] namespace and can be applied to structs, enums, or individual fields.
Container Attributes
Applied to a struct or enum:
rename = "name"
Override the schema name used in TeaLeaf output:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
#[tealeaf(rename = "app_config")]
struct Config {
host: String,
port: i32,
}
// Generates: @struct app_config (host: string, port: int)
}
Without rename, the struct name is converted to snake_case (Config → config).
key = "name"
Override the default document key when serializing:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf)]
#[tealeaf(key = "my_config")]
struct Config { /* ... */ }
}
root_array
Mark a struct as a root-level array element (changes serialization to omit the wrapping key):
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf)]
#[tealeaf(root_array)]
struct LogEntry {
timestamp: i64,
message: String,
}
}
Field Attributes
Applied to individual struct fields:
rename = "name"
Override the field name in the schema:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
#[tealeaf(rename = "user_name")]
name: String,
}
// Generates: @struct user (user_name: string)
}
skip
Exclude a field from serialization/deserialization:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
name: String,
#[tealeaf(skip)]
internal_cache: Option<Vec<u8>>,
}
}
Skipped fields must implement Default for deserialization.
optional
Mark a field as nullable in the schema:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
name: String,
#[tealeaf(optional)]
email: Option<String>, // string?
}
}
Note: Fields of type Option<T> are automatically detected as optional. The #[tealeaf(optional)] attribute is mainly useful for documentation or when using wrapper types.
type = "tealeaf_type"
Override the TeaLeaf type for a field:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Event {
#[tealeaf(type = "timestamp")]
created_at: i64, // Would normally be int64, but we want timestamp
#[tealeaf(type = "uint64")]
large_count: i64, // Override the default signed type
}
}
Valid type names: bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float, float32, float64, string, bytes, timestamp.
flatten
Inline the fields of a nested struct into the parent:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Metadata {
created_by: String,
version: i32,
}
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Document {
title: String,
#[tealeaf(flatten)]
meta: Metadata,
}
// Generates: @struct document (title: string, created_by: string, version: int)
// Instead of: @struct document (title: string, meta: metadata)
}
default
Use Default::default() when deserializing a missing field:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Config {
host: String,
#[tealeaf(default)]
port: i32, // defaults to 0 if missing
}
}
default = "expr"
Use a custom expression for the default value:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Config {
host: String,
#[tealeaf(default = "8080")]
port: i32,
#[tealeaf(default = "true")]
debug: bool,
}
}
Combining Attributes
Multiple attributes can be combined:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Event {
#[tealeaf(rename = "ts", type = "timestamp")]
timestamp: i64,
#[tealeaf(optional, rename = "msg")]
message: Option<String>,
#[tealeaf(skip)]
cached_hash: u64,
#[tealeaf(flatten)]
metadata: EventMeta,
}
}
Attribute Summary Table
| Attribute | Level | Description |
|---|---|---|
rename = "name" | Container or Field | Override schema/field name |
key = "name" | Container | Override document key |
root_array | Container | Serialize as root array element |
skip | Field | Exclude from serialization |
optional | Field | Mark as nullable (T?) |
type = "name" | Field | Override TeaLeaf type |
flatten | Field | Inline nested struct fields |
default | Field | Use Default::default() |
default = "expr" | Field | Use custom default expression |
Builder API
The TeaLeafBuilder provides a fluent API for constructing TeaLeaf documents programmatically.
Basic Usage
#![allow(unused)]
fn main() {
use tealeaf::{TeaLeafBuilder, Value};
let doc = TeaLeafBuilder::new()
.add_value("name", Value::String("Alice".into()))
.add_value("age", Value::Int(30))
.add_value("active", Value::Bool(true))
.build();
// Compile to binary
doc.compile("output.tlbx", true)?;
// Convert to JSON
let json = doc.to_json()?;
}
Methods
new()
Create a new empty builder:
#![allow(unused)]
fn main() {
let builder = TeaLeafBuilder::new();
}
add_value(key, value)
Add a raw Value to the document:
#![allow(unused)]
fn main() {
builder.add_value("count", Value::Int(42))
}
add<T: ToTeaLeaf>(key, dto)
Add a struct that implements ToTeaLeaf. Automatically collects schemas from the type:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf)]
struct Config {
host: String,
port: i32,
}
let config = Config { host: "localhost".into(), port: 8080 };
let doc = TeaLeafBuilder::new()
.add("config", &config)
.build();
}
add_vec<T: ToTeaLeaf>(key, items)
Add an array of ToTeaLeaf items. Automatically collects schemas:
#![allow(unused)]
fn main() {
let users = vec![
User { id: 1, name: "Alice".into() },
User { id: 2, name: "Bob".into() },
];
let doc = TeaLeafBuilder::new()
.add_vec("users", &users)
.build();
}
add_schema(schema)
Manually add a schema definition:
#![allow(unused)]
fn main() {
use tealeaf::{Schema, Field, FieldType};
let schema = Schema {
name: "point".to_string(),
fields: vec![
Field {
name: "x".into(),
field_type: FieldType { base: "int".into(), nullable: false, is_array: false },
},
Field {
name: "y".into(),
field_type: FieldType { base: "int".into(), nullable: false, is_array: false },
},
],
};
let doc = TeaLeafBuilder::new()
.add_schema(schema)
.add_value("origin", Value::Array(vec![Value::Int(0), Value::Int(0)]))
.build();
}
root_array()
Mark the document as a root-level array (rather than a key-value document):
#![allow(unused)]
fn main() {
let doc = TeaLeafBuilder::new()
.root_array()
.add_value("items", Value::Array(vec![
Value::Int(1),
Value::Int(2),
Value::Int(3),
]))
.build();
}
build()
Finalize and return the TeaLeaf document:
#![allow(unused)]
fn main() {
let doc = builder.build();
}
Complete Example
use tealeaf::{TeaLeafBuilder, ToTeaLeaf, FromTeaLeaf, Value};
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Address {
street: String,
city: String,
}
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Employee {
id: i64,
name: String,
address: Address,
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
let employees = vec![
Employee {
id: 1,
name: "Alice".into(),
address: Address { street: "123 Main".into(), city: "Seattle".into() },
},
Employee {
id: 2,
name: "Bob".into(),
address: Address { street: "456 Oak".into(), city: "Austin".into() },
},
];
let doc = TeaLeafBuilder::new()
.add_value("company", Value::String("Acme Corp".into()))
.add_vec("employees", &employees)
.add_value("version", Value::Int(1))
.build();
// Output
doc.compile("company.tlbx", true)?;
println!("{}", doc.to_tl_with_schemas());
println!("{}", doc.to_json()?);
Ok(())
}
Schemas & Types
Working with schemas and the type system in Rust.
Schema Structure
#![allow(unused)]
fn main() {
pub struct Schema {
pub name: String,
pub fields: Vec<Field>,
}
pub struct Field {
pub name: String,
pub field_type: FieldType,
}
pub struct FieldType {
pub base: String, // "int", "string", "user", etc.
pub nullable: bool, // field: T?
pub is_array: bool, // field: []T
}
}
Creating Schemas Manually
#![allow(unused)]
fn main() {
use tealeaf::{Schema, Field, FieldType};
let user_schema = Schema {
name: "user".to_string(),
fields: vec![
Field {
name: "id".into(),
field_type: FieldType { base: "int".into(), nullable: false, is_array: false },
},
Field {
name: "name".into(),
field_type: FieldType { base: "string".into(), nullable: false, is_array: false },
},
Field {
name: "tags".into(),
field_type: FieldType { base: "string".into(), nullable: false, is_array: true },
},
Field {
name: "email".into(),
field_type: FieldType { base: "string".into(), nullable: true, is_array: false },
},
],
};
}
Collecting Schemas from Derive
When using #[derive(ToTeaLeaf)], schemas are collected automatically:
#![allow(unused)]
fn main() {
#[derive(ToTeaLeaf)]
struct Address { street: String, city: String }
#[derive(ToTeaLeaf)]
struct User { name: String, home: Address }
// Collects schemas for both `user` and `address`
let schemas = User::collect_schemas();
assert!(schemas.contains_key("user"));
assert!(schemas.contains_key("address"));
}
Accessing Schemas from Documents
#![allow(unused)]
fn main() {
let doc = TeaLeaf::load("data.tl")?;
// Get a specific schema
if let Some(schema) = doc.schema("user") {
println!("Schema: {} ({} fields)", schema.name, schema.fields.len());
for field in &schema.fields {
let nullable = if field.field_type.nullable { "?" } else { "" };
let array = if field.field_type.is_array { "[]" } else { "" };
println!(" {}: {}{}{}", field.name, array, field.field_type.base, nullable);
}
}
// Iterate all schemas
for (name, schema) in &doc.schemas {
println!("{}: {} fields", name, schema.fields.len());
}
}
Accessing Schemas from Binary Reader
Schemas are embedded in the binary format. The Reader exposes section keys and values directly:
#![allow(unused)]
fn main() {
use tealeaf::Reader;
let reader = Reader::open("data.tlbx")?;
// List available keys
for key in reader.keys() {
let value = reader.get(key)?;
println!("{}: {:?}", key, value);
}
}
For full schema introspection, decompile the binary back to a TeaLeaf document and access doc.schemas.
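A sketch of that flow; the decompile invocation is illustrative and the exact CLI flags may differ:
// Shell step (illustrative): tealeaf decompile data.tlbx > data.tl
let doc = TeaLeaf::load("data.tl")?;
for (name, schema) in &doc.schemas {
    println!("{}: {} fields", name, schema.fields.len());
}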
Value Type System
The Value enum maps to TeaLeaf types:
| Variant | TeaLeaf Type | Notes |
|---|---|---|
Value::Null | null | ~ in text |
Value::Bool(b) | bool | |
Value::Int(i) | int/int8/int16/int32/int64 | Size chosen by inference |
Value::UInt(u) | uint/uint8/uint16/uint32/uint64 | Size chosen by inference |
Value::Float(f) | float/float64 | Always f64 at runtime |
Value::String(s) | string | |
Value::Bytes(b) | bytes | |
Value::Array(v) | array | Heterogeneous or typed |
Value::Object(m) | object | String-keyed map |
Value::Map(pairs) | map | Ordered, any key type |
Value::Ref(name) | ref | !name reference |
Value::Tagged(tag, val) | tagged | :tag value |
Value::Timestamp(ms, tz) | timestamp | Unix milliseconds + timezone offset (minutes) |
Value::JsonNumber(s) | json-number | Arbitrary-precision number (raw JSON decimal string) |
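When consuming untyped documents, matching on Value variants is the usual pattern. A minimal sketch over a few variants:
use tealeaf::Value;
fn describe(v: &Value) -> String {
    match v {
        Value::Null => "null".into(),
        Value::Bool(b) => format!("bool {}", b),
        Value::Int(i) => format!("int {}", i),
        Value::String(s) => format!("string {:?}", s),
        Value::Array(items) => format!("array with {} elements", items.len()),
        Value::Timestamp(ms, tz) => format!("timestamp {} ms (tz offset {} min)", ms, tz),
        _ => "other".into(),
    }
}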
Type Inference at Write Time
When compiling, the writer selects the smallest encoding:
#![allow(unused)]
fn main() {
// Value::Int(42) → int8 in binary (fits in i8)
// Value::Int(1000) → int16 (fits in i16)
// Value::Int(100_000) → int32 (fits in i32)
// Value::Int(5_000_000_000) → int64
}
Schema-Typed Data
When data matches a schema (via @table), binary encoding uses:
- Positional storage (no field name repetition)
- Null bitmaps (one bit per nullable field)
- Type-homogeneous arrays (packed encoding for []int, []string, etc.) – see the example below
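For example, a table with a nullable column (illustrative schema) stores each row as three positional values; the null (~) in the first row lands in the null bitmap rather than being spelled as a named field:
@struct reading (sensor: string, value: float, note: string?)
readings: @table reading [
(temp0, 21.5, ~),
(temp1, 22.1, "calibrated"),
]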
Error Handling
TeaLeaf uses the thiserror crate for structured error types.
Error Types
The main error enum:
| Error Variant | Description |
|---|---|
Io | File I/O error (wraps std::io::Error) |
InvalidMagic | Binary file doesn’t start with TLBX magic bytes |
InvalidVersion | Unsupported binary format version |
InvalidType | Unknown type code in binary data |
InvalidUtf8 | String encoding error |
UnexpectedToken | Parse error – expected one token, got another |
UnexpectedEof | Premature end of input |
UnknownStruct | @table references a struct that hasn’t been defined |
MissingField | Required field not provided in data |
ParseError | Generic parse error with message |
ValueOutOfRange | Numeric value exceeds target type range |
Conversion Errors
The ConvertError type is used by FromTeaLeaf:
#![allow(unused)]
fn main() {
pub enum ConvertError {
MissingField { struct_name: String, field: String },
TypeMismatch { expected: String, got: String, path: String },
Nested { path: String, source: Box<ConvertError> },
Custom(String),
}
}
Handling Errors
Parse Errors
#![allow(unused)]
fn main() {
use tealeaf::TeaLeaf;
match TeaLeaf::parse(input) {
Ok(doc) => { /* use doc */ },
Err(e) => {
eprintln!("Parse error: {}", e);
// e.g., "Unexpected token: expected ':', got '}' at line 5"
}
}
}
I/O Errors
#![allow(unused)]
fn main() {
match TeaLeaf::load("nonexistent.tl") {
Ok(doc) => { /* ... */ },
Err(e) => {
// Will be an Io variant wrapping std::io::Error
eprintln!("Could not load file: {}", e);
}
}
}
Binary Format Errors
#![allow(unused)]
fn main() {
use tealeaf::Reader;
match Reader::open("corrupted.tlbx") {
Ok(reader) => { /* ... */ },
Err(e) => {
// Could be InvalidMagic, InvalidVersion, etc.
eprintln!("Binary read error: {}", e);
}
}
}
Conversion Errors
#![allow(unused)]
fn main() {
use tealeaf::{FromTeaLeaf, Value};
let value = Value::String("not a number".into());
match i32::from_tealeaf_value(&value) {
Ok(n) => println!("Got: {}", n),
Err(e) => {
// ConvertError::TypeMismatch { expected: "Int", got: "String" }
eprintln!("Conversion failed: {}", e);
}
}
}
Error Propagation
All errors implement std::error::Error and Display, so they work with ? and anyhow/eyre:
#![allow(unused)]
fn main() {
fn process_file(path: &str) -> Result<(), Box<dyn std::error::Error>> {
let doc = TeaLeaf::load(path)?;
let json = doc.to_json()?;
doc.compile("output.tlbx", true)?;
Ok(())
}
}
Validation Without Errors
For checking validity without consuming the error:
#![allow(unused)]
fn main() {
let is_valid = TeaLeaf::parse(input).is_ok();
}
The CLI validate command uses this pattern to report validity without stopping on errors.
.NET Guide: Overview
TeaLeaf provides .NET bindings through a NuGet package that includes a C# source generator and a reflection-based serializer, both backed by the native Rust library via P/Invoke.
Architecture
┌─────────────────────────────────────────────┐
│ Your .NET Application │
├─────────────────────┬───────────────────────┤
│ Source Generator │ Reflection Serializer│
│ (compile-time) │ (runtime) │
├─────────────────────┴───────────────────────┤
│ TeaLeaf Managed Layer (TLDocument, TLValue)│
├─────────────────────────────────────────────┤
│ P/Invoke (NativeMethods.cs) │
├─────────────────────────────────────────────┤
│ tealeaf_ffi.dll / .so / .dylib (Rust) │
└─────────────────────────────────────────────┘
Installation
dotnet add package TeaLeaf
The single package bundles everything:
| Component | What it provides |
|---|---|
TeaLeaf | Managed wrapper types (TLDocument, TLValue, TLReader), reflection serializer |
TeaLeaf.Annotations | Attributes ([TeaLeaf], [TLSkip], etc.) – included as a dependency |
TeaLeaf.Generators | C# incremental source generator – bundled as an analyzer |
| Native libraries | tealeaf_ffi for all supported platforms (win/linux/osx, x64/arm64) |
Two Serialization Approaches
1. Source Generator (Recommended)
Zero-reflection, compile-time code generation:
[TeaLeaf]
public partial class User
{
public int Id { get; set; }
public string Name { get; set; } = "";
[TLOptional] public string? Email { get; set; }
}
// Generated methods
string schema = User.GetTeaLeafSchema();
string text = user.ToTeaLeafText();
string json = user.ToTeaLeafJson();
user.CompileToTeaLeaf("user.tlbx");
var loaded = User.FromTeaLeaf(doc);
Requirements:
- Class must be partial
- Annotated with [TeaLeaf]
- Properties must have public getters (and setters for deserialization)
2. Reflection Serializer
For generic types, dynamic scenarios, or types you don’t control:
using var doc = TeaLeafSerializer.ToDocument(user);
string text = TeaLeafSerializer.ToText(user);
string json = TeaLeafSerializer.ToJson(user);
var loaded = TeaLeafSerializer.Deserialize<User>(doc);
Core Types
TLDocument
The in-memory document, wrapping a native handle:
// Parse text
using var doc = TLDocument.Parse("name: alice\nage: 30");
// Load from file
using var doc = TLDocument.ParseFile("data.tl");
// From JSON
using var doc = TLDocument.FromJson(jsonString);
// Access values
string[] keys = doc.Keys;
using var value = doc["name"];
// Output
string text = doc.ToText();
string json = doc.ToJson();
doc.Compile("output.tlbx", compress: true);
TLValue
Represents any TeaLeaf value with type-safe accessors:
using var val = doc["users"];
// Type checking
TLType type = val.Type;
bool isNull = val.IsNull;
// Primitive access
bool? b = val.AsBool();
long? i = val.AsInt();
double? f = val.AsFloat();
string? s = val.AsString();
byte[]? bytes = val.AsBytes();
DateTimeOffset? ts = val.AsDateTime();
// Collection access
int len = val.ArrayLength;
using var elem = val[0];
using var field = val["name"];
string[] keys = val.ObjectKeys;
// Dynamic conversion
object? obj = val.ToObject();
TLReader
Binary file reader with optional memory mapping:
// Standard read
using var reader = TLReader.Open("data.tlbx");
// Memory-mapped (zero-copy for large files)
using var reader = TLReader.OpenMmap("data.tlbx");
// Access
string[] keys = reader.Keys;
using var val = reader["users"];
// Schema introspection
int schemaCount = reader.SchemaCount;
string name = reader.GetSchemaName(0);
Next Steps
- Source Generator – compile-time code generation in detail
- Attributes Reference – all available annotations
- Reflection Serializer – runtime serialization
- Native Types – TLDocument, TLValue, TLReader API
- Diagnostics – compiler warnings and errors
- Platform Support – supported runtimes and architectures
Source Generator
The TeaLeaf source generator is a C# incremental source generator (IIncrementalGenerator) that generates serialization and deserialization code at compile time.
How It Works
- Roslyn detects classes annotated with [TeaLeaf]
- ModelAnalyzer examines the type’s properties, attributes, and nested types
- TLTextEmitter generates serialization methods
- DeserializerEmitter generates deserialization methods
- Generated code is added as a partial class extension
Requirements
- The class must be partial
- Annotated with [TeaLeaf] (from TeaLeaf.Annotations)
- Public properties with getters (and setters for deserialization)
- .NET 8.0+ with incremental source generator support
Basic Example
using TeaLeaf.Annotations;
[TeaLeaf]
public partial class User
{
public int Id { get; set; }
public string Name { get; set; } = "";
[TLOptional] public string? Email { get; set; }
public bool Active { get; set; }
}
Generated Methods
For each [TeaLeaf] class, the generator produces:
GetTeaLeafSchema()
Returns the @struct definition as a string:
string schema = User.GetTeaLeafSchema();
// "@struct user (id: int, name: string, email: string?, active: bool)"
ToTeaLeafText()
Serializes the instance to TeaLeaf text body format:
string text = user.ToTeaLeafText();
// "(1, \"Alice\", \"alice@example.com\", true)"
ToTeaLeafDocument(string key = "user")
Returns a complete TeaLeaf text document with schemas:
string doc = user.ToTeaLeafDocument();
// "@struct user (id: int, name: string, email: string?, active: bool)\nuser: (1, ...)"
ToTLDocument(string key = "user")
Parses through the native engine to create a TLDocument:
using var doc = user.ToTLDocument();
string json = doc.ToJson();
doc.Compile("user.tlbx");
ToTeaLeafJson(string key = "user")
Serializes to JSON via the native engine:
string json = user.ToTeaLeafJson();
CompileToTeaLeaf(string path, string key = "user", bool compress = false)
Compiles directly to a .tlbx binary file:
user.CompileToTeaLeaf("user.tlbx", compress: true);
FromTeaLeaf(TLDocument doc, string key = "user")
Deserializes from a TLDocument:
using var doc = TLDocument.ParseFile("user.tlbx");
var loaded = User.FromTeaLeaf(doc);
FromTeaLeaf(TLValue value)
Deserializes from a TLValue (for nested types):
using var val = doc["user"];
var loaded = User.FromTeaLeaf(val);
Nested Types
Types referencing other [TeaLeaf] types are fully supported:
[TeaLeaf]
public partial class Address
{
public string Street { get; set; } = "";
public string City { get; set; } = "";
}
[TeaLeaf]
public partial class Person
{
public string Name { get; set; } = "";
public Address Home { get; set; } = new();
[TLOptional] public Address? Work { get; set; }
}
Generated schema:
@struct address (street: string, city: string)
@struct person (name: string, home: address, work: address?)
Collections
[TeaLeaf]
public partial class Team
{
public string Name { get; set; } = "";
public List<string> Tags { get; set; } = new();
public List<Person> Members { get; set; } = new();
}
Generated schema:
@struct team (name: string, tags: []string, members: []person)
Enum Support
Enums are serialized as snake_case strings:
public enum Status { Active, Inactive, Suspended }
[TeaLeaf]
public partial class User
{
public string Name { get; set; } = "";
public Status Status { get; set; }
}
In TeaLeaf text: ("Alice", active)
Type Mapping
| C# Type | TeaLeaf Type |
|---|---|
bool | bool |
int | int |
long | int64 |
short | int16 |
sbyte | int8 |
uint | uint |
ulong | uint64 |
ushort | uint16 |
byte | uint8 |
double | float |
float | float32 |
decimal | float |
string | string |
DateTime | timestamp |
DateTimeOffset | timestamp |
byte[] | bytes |
List<T> | []T |
T? / Nullable<T> | T? |
| Enum | string |
[TeaLeaf] class | struct reference |
See Also
- Attributes Reference – all annotation options
- Diagnostics – compiler warnings (TL001-TL006)
.NET Attributes Reference
All TeaLeaf annotations are in the TeaLeaf.Annotations namespace.
Type-Level Attributes
[TeaLeaf] / [TeaLeaf("struct_name")]
Marks a class for source generator processing:
[TeaLeaf] // Schema name: "my_class" (auto snake_case)
public partial class MyClass { }
[TeaLeaf("config")] // Schema name: "config" (explicit)
public partial class AppConfiguration { }
The optional string parameter sets the struct name used in the TeaLeaf schema. If omitted, the class name is converted to snake_case.
The attribute also has an EmitSchema property (defaults to true). When set to false, the source generator skips @struct and @table output for arrays of this type:
[TeaLeaf(EmitSchema = false)] // Data only, no @struct definition
public partial class RawData { }
[TLKey("key_name")]
Overrides the top-level key used when serializing as a document entry:
[TeaLeaf]
[TLKey("app_settings")]
public partial class Config
{
public string Host { get; set; } = "";
public int Port { get; set; }
}
// Default key would be "config", but TLKey overrides to "app_settings"
string doc = config.ToTeaLeafDocument(); // key is "app_settings"
Property-Level Attributes
[TLSkip]
Exclude a property from serialization and deserialization:
[TeaLeaf]
public partial class User
{
public int Id { get; set; }
public string Name { get; set; } = "";
[TLSkip]
public string ComputedDisplayName => $"User #{Id}: {Name}";
}
[TLOptional]
Mark a property as nullable in the schema:
[TeaLeaf]
public partial class User
{
public string Name { get; set; } = "";
[TLOptional]
public string? Email { get; set; }
[TLOptional]
public int? Age { get; set; }
}
// Schema: @struct user (name: string, email: string?, age: int?)
Note: Properties of nullable reference types (string?) or Nullable<T> types (int?) are automatically treated as optional. The [TLOptional] attribute is mainly for explicit documentation.
[TLRename("field_name")]
Override the field name in the TeaLeaf schema:
[TeaLeaf]
public partial class User
{
[TLRename("user_name")]
public string Name { get; set; } = "";
[TLRename("is_active")]
public bool Active { get; set; }
}
// Schema: @struct user (user_name: string, is_active: bool)
Without [TLRename], property names are converted to snake_case (Name → name, IsActive → is_active).
[TLType("type_name")]
Override the TeaLeaf type for a field:
[TeaLeaf]
public partial class Event
{
public string Name { get; set; } = "";
[TLType("timestamp")]
public long CreatedAt { get; set; } // Would be int64, forced to timestamp
[TLType("uint64")]
public long LargeCount { get; set; } // Would be int64, forced to uint64
}
Valid type names: bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float, float32, float64, string, bytes, timestamp.
Attribute Summary
| Attribute | Level | Description |
|---|---|---|
[TeaLeaf] / [TeaLeaf("name")] | Class | Enable source generation, optional struct name |
[TLKey("key")] | Class | Override document key |
[TLSkip] | Property | Exclude from serialization |
[TLOptional] | Property | Mark as nullable in schema |
[TLRename("name")] | Property | Override field name |
[TLType("type")] | Property | Override TeaLeaf type |
Combining Attributes
[TeaLeaf("event_record")]
[TLKey("events")]
public partial class EventRecord
{
[TLRename("event_id")]
public int Id { get; set; }
public string Name { get; set; } = "";
[TLType("timestamp")]
public long CreatedAt { get; set; }
[TLOptional]
[TLRename("extra_data")]
public string? Metadata { get; set; }
[TLSkip]
public string DisplayLabel => $"{Name} ({Id})";
}
Generated schema:
@struct event_record (event_id: int, name: string, created_at: timestamp, extra_data: string?)
Reflection Serializer
The TeaLeafSerializer class provides runtime reflection-based serialization for scenarios where the source generator isn’t suitable.
When to Use
| Scenario | Approach |
|---|---|
| Known types at compile time | Source Generator (recommended) |
Generic types (T) | Reflection Serializer |
| Types you don’t control (third-party) | Reflection Serializer |
| Dynamic/runtime-determined types | Reflection Serializer |
| Maximum performance | Source Generator |
API
All methods are on the static TeaLeafSerializer class.
Serialization
// To document text (schemas + data)
string docText = TeaLeafSerializer.ToDocument<User>(user);
string docText = TeaLeafSerializer.ToDocument<User>(user, key: "custom_key");
// To TeaLeaf text (data only, no schemas)
string text = TeaLeafSerializer.ToText<User>(user);
// To TLDocument (for further operations)
using var doc = TeaLeafSerializer.ToTLDocument<User>(user);
using var doc = TeaLeafSerializer.ToTLDocument<User>(user, key: "custom_key");
// To JSON (via native engine)
string json = TeaLeafSerializer.ToJson<User>(user);
// Compile to binary
TeaLeafSerializer.Compile<User>(user, "output.tlbx", compress: true);
Deserialization
// From TLDocument
using var doc = TLDocument.Parse(tlText);
var user = TeaLeafSerializer.FromDocument<User>(doc);
var user = TeaLeafSerializer.FromDocument<User>(doc, key: "custom_key");
// From TLValue (for nested types)
using var val = doc.Get("user");
var user = TeaLeafSerializer.FromValue<User>(val);
// From text
var user = TeaLeafSerializer.FromText<User>(tlText);
Schema Generation
// Get schema string
string schema = TeaLeafSerializer.GetSchema<User>();
// "@struct user (id: int, name: string, email: string?)"
// Get TeaLeaf type name for a C# type
string typeName = TeaLeafTextHelper.GetTLTypeName(typeof(int)); // "int"
string typeName = TeaLeafTextHelper.GetTLTypeName(typeof(long)); // "int64"
string typeName = TeaLeafTextHelper.GetTLTypeName(typeof(DateTime)); // "timestamp"
Type Mapping
The reflection serializer uses TeaLeafTextHelper.GetTLTypeName() for type resolution:
| C# Type | TeaLeaf Type |
|---|---|
bool | bool |
int | int |
long | int64 |
short | int16 |
sbyte | int8 |
uint | uint |
ulong | uint64 |
ushort | uint16 |
byte | uint8 |
double | float |
float | float32 |
decimal | float |
string | string |
DateTime | timestamp |
DateTimeOffset | timestamp |
byte[] | bytes |
List<T> | []T |
Dictionary<string, T> | object |
| Enum | string |
[TeaLeaf] class | struct reference |
Attributes
The reflection serializer respects the same attributes as the source generator:
- [TeaLeaf] / [TeaLeaf("name")] – struct name
- [TLKey("key")] – document key
- [TLSkip] – skip property
- [TLOptional] – nullable field
- [TLRename("name")] – rename field
- [TLType("type")] – override type
Text Helpers
The TeaLeafTextHelper class provides utilities used by the serializer:
// PascalCase to snake_case
TeaLeafTextHelper.ToSnakeCase("MyProperty"); // "my_property"
// String quoting
TeaLeafTextHelper.NeedsQuoting("hello world"); // true
TeaLeafTextHelper.QuoteIfNeeded("hello world"); // "\"hello world\""
TeaLeafTextHelper.EscapeString("line\nnewline"); // "line\\nnewline"
// Value formatting
var sb = new StringBuilder();
TeaLeafTextHelper.AppendValue(sb, 42, typeof(int)); // "42"
TeaLeafTextHelper.AppendValue(sb, null, typeof(string)); // "~"
Performance Considerations
The reflection serializer uses System.Reflection at runtime, which is slower than the source generator approach. For hot paths or high-throughput scenarios, prefer the source generator.
However, the actual binary compilation and native operations are identical – both approaches use the same native Rust library under the hood. The performance difference is only in the C# serialization/deserialization layer.
Native Types
The managed wrapper types provide safe access to the native TeaLeaf library. All native types implement IDisposable and must be disposed to prevent memory leaks.
TLDocument
Represents a parsed TeaLeaf document.
Construction
// Parse text
using var doc = TLDocument.Parse("name: alice\nage: 30");
// Parse from file (text or binary -- auto-detected)
using var doc = TLDocument.ParseFile("data.tl");
using var doc = TLDocument.ParseFile("data.tlbx");
// From JSON string
using var doc = TLDocument.FromJson("{\"name\": \"alice\"}");
Value Access
// Get value by key
using var val = doc["name"]; // indexer
using var val = doc.Get("name"); // method
// Get all keys
string[] keys = doc.Keys;
Output
// To text
string text = doc.ToText(); // full document (schemas + data)
string data = doc.ToTextDataOnly(); // data only (no schemas)
// To JSON
string json = doc.ToJson(); // pretty-printed
string json = doc.ToJsonCompact(); // minified
// Compile to binary
doc.Compile("output.tlbx", compress: true);
Disposal
TLDocument wraps a native pointer. Always dispose:
using var doc = TLDocument.Parse(text); // using statement (recommended)
// Or manual disposal
var doc = TLDocument.Parse(text);
try { /* use doc */ }
finally { doc.Dispose(); }
TLValue
Represents any TeaLeaf value with type-safe accessors.
Type Checking
TLType type = value.Type; // Enum: Null, Bool, Int, UInt, Float, String, etc.
bool isNull = value.IsNull; // Shorthand for Type == TLType.Null
Primitive Accessors
Each returns null if the value is not the expected type:
bool? b = value.AsBool();
long? i = value.AsInt();
ulong? u = value.AsUInt();
double? f = value.AsFloat();
string? s = value.AsString();
long? ts = value.AsTimestamp(); // Unix milliseconds
short? tz = value.AsTimestampOffset(); // Timezone offset in minutes (0 = UTC)
DateTimeOffset? dt = value.AsDateTime(); // Converted from timestamp (preserves offset)
byte[]? bytes = value.AsBytes();
Object Access
string[] keys = value.ObjectKeys; // All field names
using var field = value.GetField("name"); // Get by key
using var field = value["name"]; // Indexer shorthand
Array Access
int len = value.ArrayLength;
using var elem = value.GetArrayElement(0); // By index
using var elem = value[0]; // Indexer shorthand
foreach (var item in value.AsArray())
{
// item is a TLValue -- caller must dispose
using (item)
{
Console.WriteLine(item.AsString());
}
}
Map Access
int len = value.MapLength;
using var key = value.GetMapKey(0);
using var val = value.GetMapValue(0);
foreach (var (k, v) in value.AsMap())
{
using (k) using (v)
{
Console.WriteLine($"{k.AsString()}: {v.AsString()}");
}
}
Reference and Tag Access
string? refName = value.AsRefName(); // For Ref values
string? tagName = value.AsTagName(); // For Tagged values
using var inner = value.AsTagValue(); // Inner value of a Tagged
Dynamic Conversion
object? obj = value.ToObject();
// Returns: bool, long, ulong, double, string, byte[],
// DateTimeOffset, object[], Dictionary<string, object?>, or null
TLReader
Binary file reader with optional memory-mapped I/O.
Construction
// Standard file read
using var reader = TLReader.Open("data.tlbx");
// Memory-mapped (recommended for large files)
using var reader = TLReader.OpenMmap("data.tlbx");
Value Access
string[] keys = reader.Keys;
using var val = reader["users"];
using var val = reader.Get("users");
Schema Introspection
foreach (var schema in reader.Schemas)
{
Console.WriteLine($"Schema: {schema.Name}");
foreach (var field in schema.Fields)
{
Console.WriteLine($" {field.Name}: {(field.IsArray ? "[]" : "")}{field.Type}{(field.IsNullable ? "?" : "")}");
}
}
// Look up a specific schema by name
var userSchema = reader.GetSchema("user");
if (userSchema != null)
{
Console.WriteLine($"user has {userSchema.Fields.Count} fields");
}
TLType Enum
public enum TLType
{
Null = 0,
Bool = 1,
Int = 2,
UInt = 3,
Float = 4,
String = 5,
Bytes = 6,
Array = 7,
Object = 8,
Map = 9,
Ref = 10,
Tagged = 11,
Timestamp = 12,
}
Memory Management
All native types (TLDocument, TLValue, TLReader) hold native pointers and must be disposed:
// Preferred: using statement
using var doc = TLDocument.Parse(text);
// For values from collections, dispose each item:
foreach (var item in value.AsArray())
{
using (item)
{
// process
}
}
// For map entries:
foreach (var (key, val) in value.AsMap())
{
using (key) using (val)
{
// process
}
}
Accessing a disposed object throws ObjectDisposedException.
Diagnostics
The TeaLeaf source generator reports diagnostics (warnings and errors) through the standard C# compiler diagnostic system.
Diagnostic Codes
| Code | Severity | Message |
|---|---|---|
| TL001 | Error | Type must be declared as partial |
| TL002 | Warning | Unsupported property type |
| TL003 | Error | Invalid TLType attribute value |
| TL004 | Warning | Nested type not annotated with [TeaLeaf] |
| TL005 | Warning | Circular type reference detected |
| TL006 | Error | Open generic types are not supported |
TL001: Type Must Be Partial
The source generator needs to add methods to your class. This requires the partial modifier.
// ERROR: TL001
[TeaLeaf]
public class User { } // Missing 'partial'
// FIXED
[TeaLeaf]
public partial class User { }
TL002: Unsupported Property Type
A property type isn’t directly mappable to a TeaLeaf type.
[TeaLeaf]
public partial class Config
{
public IntPtr NativeHandle { get; set; } // WARNING: TL002
}
The property will be skipped. Supported types include all primitives, string, DateTime, DateTimeOffset, byte[], List<T>, Dictionary<string, T>, enums, and other [TeaLeaf]-annotated classes.
TL003: Invalid TLType Value
The [TLType] attribute was given an unrecognized type name.
[TeaLeaf]
public partial class Event
{
[TLType("datetime")] // ERROR: TL003 -- "datetime" is not a valid type
public long Created { get; set; }
[TLType("timestamp")] // CORRECT
public long Updated { get; set; }
}
Valid values: bool, int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, float, float32, float64, string, bytes, timestamp.
TL004: Nested Type Not Annotated
A property references a class type that doesn’t have the [TeaLeaf] attribute.
public class Address // Missing [TeaLeaf]
{
public string City { get; set; } = "";
}
[TeaLeaf]
public partial class User
{
public Address Home { get; set; } = new(); // WARNING: TL004
}
Fix by adding [TeaLeaf] to the nested type:
[TeaLeaf]
public partial class Address
{
public string City { get; set; } = "";
}
TL005: Circular Type Reference
A type references itself (directly or transitively), which may cause a stack overflow at runtime during serialization.
[TeaLeaf]
public partial class TreeNode
{
public string Name { get; set; } = "";
public TreeNode? Child { get; set; } // WARNING: TL005 -- circular reference
}
The code will still compile, but recursive structures must be bounded (e.g., use [TLOptional] with null termination) to avoid infinite recursion.
TL006: Open Generic Types
Generic type parameters are not supported:
// ERROR: TL006
[TeaLeaf]
public partial class Container<T>
{
public T Value { get; set; }
}
Use concrete types instead. For generic scenarios, use the Reflection Serializer.
Viewing Diagnostics
Diagnostics appear in:
- Visual Studio – Error List window
- VS Code – Problems panel (with C# extension)
- dotnet build – terminal output
- MSBuild – build log
Example compiler output:
User.cs(3,22): error TL001: TeaLeaf type 'User' must be declared as partial
Config.cs(8,16): warning TL004: Property 'Address' type is not annotated with [TeaLeaf]
Platform Support
The TeaLeaf NuGet package includes pre-built native libraries for all major platforms.
Supported Platforms
| OS | Architecture | Native Library | Status |
|---|---|---|---|
| Windows | x64 | tealeaf_ffi.dll | Supported |
| Windows | ARM64 | tealeaf_ffi.dll | Supported |
| Linux | x64 (glibc) | libtealeaf_ffi.so | Supported |
| Linux | ARM64 (glibc) | libtealeaf_ffi.so | Supported |
| macOS | x64 (Intel) | libtealeaf_ffi.dylib | Supported |
| macOS | ARM64 (Apple Silicon) | libtealeaf_ffi.dylib | Supported |
.NET Requirements
- .NET 8.0 or later
- C# compiler with incremental source generator support (for the source generator)
NuGet Package Structure
The NuGet package bundles native libraries for all platforms using the runtimes folder convention:
TeaLeaf.nupkg
├── lib/net8.0/
│ ├── TeaLeaf.dll
│ ├── TeaLeaf.Annotations.dll
│ └── TeaLeaf.Generators.dll
└── runtimes/
├── win-x64/native/tealeaf_ffi.dll
├── win-arm64/native/tealeaf_ffi.dll
├── linux-x64/native/libtealeaf_ffi.so
├── linux-arm64/native/libtealeaf_ffi.so
├── osx-x64/native/libtealeaf_ffi.dylib
└── osx-arm64/native/libtealeaf_ffi.dylib
The .NET runtime automatically selects the correct native library based on the host platform.
Native Library Loading
The managed layer uses [DllImport("tealeaf_ffi")] for P/Invoke. The .NET runtime resolves the native library through:
- NuGet runtimes folder – automatic for published apps
- Application directory – for self-contained deployments
- System library path –
PATH(Windows),LD_LIBRARY_PATH(Linux),DYLD_LIBRARY_PATH(macOS)
Deployment
Framework-Dependent
dotnet publish -c Release
The native library is copied to the output directory automatically.
Self-Contained
dotnet publish -c Release --self-contained -r win-x64
dotnet publish -c Release --self-contained -r linux-x64
dotnet publish -c Release --self-contained -r osx-arm64
Docker
For Linux containers, use the appropriate runtime:
FROM mcr.microsoft.com/dotnet/runtime:8.0
# Native library is included in the publish output
COPY --from=build /app/publish .
Building Native Libraries from Source
If you need a platform not included in the NuGet package:
# Clone the repository
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
# Build the FFI library
cargo build --release --package tealeaf-ffi
# Output location
# Windows: target/release/tealeaf_ffi.dll
# Linux: target/release/libtealeaf_ffi.so
# macOS: target/release/libtealeaf_ffi.dylib
Place the built library in your application directory or system library path.
Troubleshooting
DllNotFoundException
The native library could not be found. Check:
- The package includes your platform (dotnet --info to check the RID)
- For self-contained apps, ensure the correct -r flag is used
- For manual builds, ensure the library is in the application directory
BadImageFormatException
Architecture mismatch between the .NET runtime and native library. Ensure both are the same architecture (x64/ARM64).
EntryPointNotFoundException
Version mismatch between the managed and native libraries. Ensure both are from the same release.
FFI Reference: Overview
The tealeaf-ffi crate exposes a C-compatible API for integrating TeaLeaf into any language that supports C FFI (Foreign Function Interface).
Architecture
┌──────────────────────┐
│ Host Language │
│ (.NET, Python, etc.)│
├──────────────────────┤
│ FFI Bindings │
│ (P/Invoke, ctypes) │
├──────────────────────┤
│ tealeaf_ffi │ ← C ABI library
│ (cdylib + staticlib)│
├──────────────────────┤
│ tealeaf-core │ ← Rust core library
└──────────────────────┘
The FFI layer provides:
- Document parsing – parse text, files, and JSON
- Value access – type-safe accessors for all value types
- Binary reader – read .tlbx files with optional memory mapping
- Schema introspection – query schema structure at runtime
- JSON conversion – to/from JSON
- Binary compilation – compile documents to .tlbx
- Error handling – thread-local last-error pattern
- Memory management – explicit free functions for all allocated resources
Output Libraries
The crate builds both dynamic and static libraries:
| Platform | Dynamic Library | Static Library |
|---|---|---|
| Windows | tealeaf_ffi.dll | tealeaf_ffi.lib |
| Linux | libtealeaf_ffi.so | libtealeaf_ffi.a |
| macOS | libtealeaf_ffi.dylib | libtealeaf_ffi.a |
C Header
The build generates a C header via cbindgen:
#include "tealeaf_ffi.h"
// Parse a document
TLDocument* doc = tl_parse("name: alice\nage: 30");
if (!doc) {
char* err = tl_get_last_error();
fprintf(stderr, "Error: %s\n", err);
tl_string_free(err);
return 1;
}
// Access a value
TLValue* val = tl_document_get(doc, "name");
if (val && tl_value_type(val) == TL_STRING) {
char* name = tl_value_as_string(val);
printf("Name: %s\n", name);
tl_string_free(name);
}
tl_value_free(val);
tl_document_free(doc);
Opaque Types
The FFI uses opaque pointer types:
| Type | Description |
|---|---|
TLDocument* | Parsed document handle |
TLValue* | Value handle (any type) |
TLReader* | Binary file reader handle |
All handles must be freed with their corresponding _free function.
Error Model
TeaLeaf FFI uses the thread-local last-error pattern:
- Functions that can fail return NULL (pointers) or a result struct
- On failure, the error message is stored in thread-local storage
- Call tl_get_last_error() to retrieve it
- Call tl_clear_error() to clear it
TLDocument* doc = tl_parse("invalid {");
if (!doc) {
char* err = tl_get_last_error();
// err contains the parse error message
tl_string_free(err);
}
Null Safety
All FFI functions that accept pointers are null-safe:
- Passing NULL returns a safe default (0, false, NULL) rather than crashing
- This makes it safe to chain calls without checking each one
Next Steps
- API Reference – complete function listing
- Memory Management – ownership and freeing rules
- Building from Source – compilation instructions
FFI API Reference
Complete listing of all exported FFI functions.
Error Handling
tl_get_last_error
char* tl_get_last_error(void);
Returns the last error message, or NULL if no error. Caller must free with tl_string_free.
tl_clear_error
void tl_clear_error(void);
Clears the thread-local error state.
Version
tl_version
const char* tl_version(void);
Returns the library version string (e.g., "2.0.0-beta.8"). The returned pointer is static – do not free it.
Document API
tl_parse
TLDocument* tl_parse(const char* text);
Parse a TeaLeaf text string. Returns NULL on failure (check tl_get_last_error).
tl_parse_file
TLDocument* tl_parse_file(const char* path);
Parse a TeaLeaf text file. Returns NULL on failure.
tl_document_free
void tl_document_free(TLDocument* doc);
Free a document. Safe to call with NULL.
tl_document_get
TLValue* tl_document_get(const TLDocument* doc, const char* key);
Get a value by key. Returns NULL if key not found or doc is NULL. Caller must free with tl_value_free.
tl_document_keys
char** tl_document_keys(const TLDocument* doc);
Get all top-level keys as a NULL-terminated array. Caller must free with tl_string_array_free.
tl_document_to_text
char* tl_document_to_text(const TLDocument* doc);
Convert document to TeaLeaf text (with schemas). Caller must free with tl_string_free.
tl_document_to_text_data_only
char* tl_document_to_text_data_only(const TLDocument* doc);
Convert document to TeaLeaf text (data only, no schemas). Caller must free with tl_string_free.
tl_document_compile
TLResult tl_document_compile(const TLDocument* doc, const char* path, bool compress);
Compile document to binary file. Returns a TLResult indicating success or failure.
JSON API
tl_document_from_json
TLDocument* tl_document_from_json(const char* json);
Parse a JSON string into a TLDocument. Returns NULL on failure.
tl_document_to_json
char* tl_document_to_json(const TLDocument* doc);
Convert document to pretty-printed JSON. Caller must free with tl_string_free.
tl_document_to_json_compact
char* tl_document_to_json_compact(const TLDocument* doc);
Convert document to minified JSON. Caller must free with tl_string_free.
Value API
tl_value_type
TLValueType tl_value_type(const TLValue* value);
Get the type of a value. Returns TL_NULL (0) if value is NULL.
tl_value_free
void tl_value_free(TLValue* value);
Free a value. Safe to call with NULL.
Primitive Accessors
bool tl_value_as_bool(const TLValue* value); // false if not bool
int64_t tl_value_as_int(const TLValue* value); // 0 if not int
uint64_t tl_value_as_uint(const TLValue* value); // 0 if not uint
double tl_value_as_float(const TLValue* value); // 0.0 if not float
char* tl_value_as_string(const TLValue* value); // NULL if not string; free with tl_string_free
int64_t tl_value_as_timestamp(const TLValue* value); // 0 if not timestamp (millis only)
int16_t tl_value_as_timestamp_offset(const TLValue* value); // 0 if not timestamp (tz offset in minutes)
Bytes Accessors
size_t tl_value_bytes_len(const TLValue* value); // 0 if not bytes
const uint8_t* tl_value_bytes_data(const TLValue* value); // NULL if not bytes; pointer valid while value lives
Reference/Tag Accessors
char* tl_value_ref_name(const TLValue* value); // NULL if not ref; free with tl_string_free
char* tl_value_tag_name(const TLValue* value); // NULL if not tagged; free with tl_string_free
TLValue* tl_value_tag_value(const TLValue* value); // NULL if not tagged; free with tl_value_free
Array Accessors
size_t tl_value_array_len(const TLValue* value); // 0 if not array
TLValue* tl_value_array_get(const TLValue* value, size_t index); // NULL if out of bounds; free with tl_value_free
Object Accessors
TLValue* tl_value_object_get(const TLValue* value, const char* key); // NULL if not found; free with tl_value_free
char** tl_value_object_keys(const TLValue* value); // NULL-terminated; free with tl_string_array_free
Map Accessors
size_t tl_value_map_len(const TLValue* value); // 0 if not map
TLValue* tl_value_map_get_key(const TLValue* value, size_t index); // NULL if out of bounds; free with tl_value_free
TLValue* tl_value_map_get_value(const TLValue* value, size_t index);// NULL if out of bounds; free with tl_value_free
Binary Reader API
tl_reader_open
TLReader* tl_reader_open(const char* path);
Open a binary file for reading. Returns NULL on failure.
tl_reader_open_mmap
TLReader* tl_reader_open_mmap(const char* path);
Open a binary file with memory-mapped I/O (zero-copy). Returns NULL on failure.
tl_reader_free
void tl_reader_free(TLReader* reader);
Free a reader. Safe to call with NULL.
tl_reader_get
TLValue* tl_reader_get(const TLReader* reader, const char* key);
Get a value by key from binary. Returns NULL if not found. Caller must free with tl_value_free.
tl_reader_keys
char** tl_reader_keys(const TLReader* reader);
Get all section keys. Returns NULL-terminated array. Free with tl_string_array_free.
Schema API
size_t tl_reader_schema_count(const TLReader* reader);
char* tl_reader_schema_name(const TLReader* reader, size_t index);
size_t tl_reader_schema_field_count(const TLReader* reader, size_t schema_index);
char* tl_reader_schema_field_name(const TLReader* reader, size_t schema_index, size_t field_index);
char* tl_reader_schema_field_type(const TLReader* reader, size_t schema_index, size_t field_index);
bool tl_reader_schema_field_nullable(const TLReader* reader, size_t schema_index, size_t field_index);
bool tl_reader_schema_field_is_array(const TLReader* reader, size_t schema_index, size_t field_index);
All char* returns from schema functions must be freed with tl_string_free. Out-of-bounds indices return NULL/0/false.
Memory Management
tl_string_free
void tl_string_free(char* s);
Free a string returned by any FFI function. Safe to call with NULL.
tl_string_array_free
void tl_string_array_free(char** arr);
Free a NULL-terminated string array. Frees each string and the array pointer. Safe to call with NULL.
tl_result_free
void tl_result_free(TLResult* result);
Free any allocated memory inside a TLResult. Safe to call with NULL.
Type Enum
typedef enum {
TL_NULL = 0,
TL_BOOL = 1,
TL_INT = 2,
TL_UINT = 3,
TL_FLOAT = 4,
TL_STRING = 5,
TL_BYTES = 6,
TL_ARRAY = 7,
TL_OBJECT = 8,
TL_MAP = 9,
TL_REF = 10,
TL_TAGGED = 11,
TL_TIMESTAMP = 12,
} TLValueType;
Memory Management
The FFI layer uses explicit manual memory management. Understanding ownership rules is critical for writing correct bindings.
Ownership Rules
Rule 1: Caller Owns Returned Pointers
Every function that returns a heap-allocated pointer transfers ownership to the caller. The caller must free it with the appropriate function:
| Return Type | Free Function |
|---|---|
TLDocument* | tl_document_free() |
TLValue* | tl_value_free() |
TLReader* | tl_reader_free() |
char* | tl_string_free() |
char** | tl_string_array_free() |
TLResult | tl_result_free() |
Rule 2: Borrowed Pointers Are Read-Only
Functions that take const T* parameters borrow the pointer. The FFI layer does not take ownership or free inputs:
// doc is borrowed -- you still own it and must free it later
TLValue* val = tl_document_get(doc, "key");
// ... use val ...
tl_value_free(val); // free the returned value
tl_document_free(doc); // free the document separately
Rule 3: Null Is Always Safe
Every free function and every accessor accepts NULL safely:
tl_document_free(NULL); // no-op
tl_value_free(NULL); // no-op
tl_string_free(NULL); // no-op
TLValue* val = tl_document_get(NULL, "key"); // returns NULL
bool b = tl_value_as_bool(NULL); // returns false
Common Patterns
Parse → Use → Free
TLDocument* doc = tl_parse("name: alice");
if (doc) {
TLValue* name = tl_document_get(doc, "name");
if (name) {
char* str = tl_value_as_string(name);
if (str) {
printf("%s\n", str);
tl_string_free(str);
}
tl_value_free(name);
}
tl_document_free(doc);
}
Iterating Arrays
TLValue* arr = tl_document_get(doc, "items");
size_t len = tl_value_array_len(arr);
for (size_t i = 0; i < len; i++) {
TLValue* elem = tl_value_array_get(arr, i);
// use elem...
tl_value_free(elem); // free each element
}
tl_value_free(arr); // free the array value
Iterating Object Keys
TLValue* obj = tl_document_get(doc, "config");
char** keys = tl_value_object_keys(obj);
if (keys) {
for (int i = 0; keys[i] != NULL; i++) {
TLValue* val = tl_value_object_get(obj, keys[i]);
// use val...
tl_value_free(val);
}
tl_string_array_free(keys); // frees all strings AND the array
}
tl_value_free(obj);
Iterating Maps
TLValue* map = tl_document_get(doc, "headers");
size_t len = tl_value_map_len(map);
for (size_t i = 0; i < len; i++) {
TLValue* key = tl_value_map_get_key(map, i);
TLValue* val = tl_value_map_get_value(map, i);
char* k = tl_value_as_string(key);
char* v = tl_value_as_string(val);
printf("%s: %s\n", k, v);
tl_string_free(k);
tl_string_free(v);
tl_value_free(key);
tl_value_free(val);
}
tl_value_free(map);
String Arrays
char** keys = tl_document_keys(doc);
if (keys) {
for (int i = 0; keys[i] != NULL; i++) {
printf("Key: %s\n", keys[i]);
}
tl_string_array_free(keys); // ONE call frees everything
}
Bytes Data
The tl_value_bytes_data function returns a borrowed pointer valid only while the value lives:
TLValue* val = tl_document_get(doc, "data");
size_t len = tl_value_bytes_len(val);
const uint8_t* data = tl_value_bytes_data(val);
// Copy if you need the data after freeing the value
uint8_t* copy = malloc(len);
memcpy(copy, data, len);
tl_value_free(val); // data pointer is now invalid
// copy is still valid
Error Strings
Error strings are owned by the caller:
char* err = tl_get_last_error();
if (err) {
fprintf(stderr, "Error: %s\n", err);
tl_string_free(err); // must free
}
Common Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| Not freeing returned pointers | Memory leak | Always pair creation with _free |
| Using pointer after free | Use-after-free / crash | Set pointer to NULL after free |
Freeing borrowed bytes_data pointer | Double-free / crash | Only free with tl_value_free on the value |
| Calling wrong free function | Undefined behavior | Match the free to the allocation type |
| Freeing strings from string_array individually | Double-free | Use tl_string_array_free once |
Building from Source
How to build the TeaLeaf FFI library from source.
Prerequisites
- Rust toolchain (1.70+)
- A C compiler (for cbindgen header generation)
Build
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
cargo build --release --package tealeaf-ffi
Output Files
| Platform | Dynamic Library | Static Library |
|---|---|---|
| Windows | target/release/tealeaf_ffi.dll | target/release/tealeaf_ffi.lib |
| Linux | target/release/libtealeaf_ffi.so | target/release/libtealeaf_ffi.a |
| macOS | target/release/libtealeaf_ffi.dylib | target/release/libtealeaf_ffi.a |
C Header
The build generates a C header via cbindgen (configured in tealeaf-ffi/cbindgen.toml):
# Header is generated during build
# Location: target/tealeaf_ffi.h (or as configured)
Cross-Compilation
Linux ARM64
# Install cross-compilation tools
sudo apt install gcc-aarch64-linux-gnu
rustup target add aarch64-unknown-linux-gnu
# Build
cargo build --release --package tealeaf-ffi --target aarch64-unknown-linux-gnu
Windows ARM64
rustup target add aarch64-pc-windows-msvc
cargo build --release --package tealeaf-ffi --target aarch64-pc-windows-msvc
macOS (from any platform via cross)
# Using cross (https://github.com/cross-rs/cross)
cargo install cross
cross build --release --package tealeaf-ffi --target aarch64-apple-darwin
cross build --release --package tealeaf-ffi --target x86_64-apple-darwin
Linking
Dynamic Linking
# GCC/Clang
gcc -o myapp myapp.c -L/path/to/lib -ltealeaf_ffi
# MSVC
cl myapp.c /link tealeaf_ffi.lib
At runtime, ensure the dynamic library is in the library search path.
Static Linking
# GCC/Clang (Linux)
gcc -o myapp myapp.c /path/to/libtealeaf_ffi.a -lpthread -ldl -lm
# macOS
gcc -o myapp myapp.c /path/to/libtealeaf_ffi.a -framework Security -lpthread
Static linking eliminates the runtime dependency but produces a larger binary.
Dependencies
The FFI crate has minimal dependencies:
[dependencies]
tealeaf-core = { workspace = true }
[build-dependencies]
cbindgen = "0.27"
The resulting library links against:
- Linux: libpthread, libdl, libm
- macOS: Security.framework, libpthread
- Windows: standard Windows system libraries
Writing New Language Bindings
To create bindings for a new language:
- Generate or write FFI declarations matching the C header
- Load the dynamic library (or link statically)
- Wrap opaque pointers in your language’s resource management (destructors, Dispose, __del__, etc.)
- Map the error model – check for NULL returns and call tl_get_last_error
- Handle string ownership – copy strings to your language’s string type, then free the C string
Example: Python (ctypes)
import ctypes

lib = ctypes.CDLL("libtealeaf_ffi.so")

# Define function signatures
lib.tl_parse.restype = ctypes.c_void_p
lib.tl_parse.argtypes = [ctypes.c_char_p]
lib.tl_document_get.restype = ctypes.c_void_p
lib.tl_document_get.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
# Return c_void_p rather than c_char_p: ctypes converts c_char_p results to
# bytes and discards the pointer, which would make tl_string_free impossible
lib.tl_value_as_string.restype = ctypes.c_void_p
lib.tl_value_as_string.argtypes = [ctypes.c_void_p]
for f in (lib.tl_string_free, lib.tl_value_free, lib.tl_document_free):
    f.argtypes = [ctypes.c_void_p]  # avoid pointer truncation on 64-bit

# Use it
doc = lib.tl_parse(b"name: alice")
val = lib.tl_document_get(doc, b"name")
name_ptr = lib.tl_value_as_string(val)
print(ctypes.string_at(name_ptr).decode())  # "alice" (copied into Python)
lib.tl_string_free(name_ptr)  # free the C string we own
lib.tl_value_free(val)
lib.tl_document_free(doc)
Testing
# Run FFI tests
cargo test --package tealeaf-ffi
# Run all workspace tests
cargo test --workspace
LLM Context Engineering
TeaLeaf’s primary use case is context engineering for Large Language Model applications. This guide explains why and how.
The Problem
LLM context windows are limited and expensive. Typical structured data (tool definitions, conversation history, user profiles) consumes tokens proportional to format verbosity:
{
"messages": [
{"role": "user", "content": "Hello", "tokens": 2},
{"role": "assistant", "content": "Hi there!", "tokens": 3},
{"role": "user", "content": "What's the weather?", "tokens": 5},
{"role": "assistant", "content": "Let me check...", "tokens": 4}
]
}
Every message repeats "role", "content", "tokens". With 50+ messages, this overhead adds up.
The TeaLeaf Approach
@struct Message (role: string, content: string, tokens: int?)
messages: @table Message [
(user, Hello, 2),
(assistant, "Hi there!", 3),
(user, "What's the weather?", 5),
(assistant, "Let me check...", 4),
]
Field names defined once. Data is positional. For 50 messages, this saves ~40% in text size and ~80% in binary.
Context Assembly Pattern
Define Schemas for Your Context
@struct Tool (name: string, description: string, params: []string)
@struct Message (role: string, content: string, tokens: int?)
@struct UserProfile (id: int, name: string, preferences: []string)
system_prompt: """
You are a helpful assistant with access to the user's profile
and conversation history. Use the tools when appropriate.
"""
user: @table UserProfile [
(42, "Alice", ["concise_responses", "code_examples"]),
]
tools: @table Tool [
(search, "Search the web for information", ["query"]),
(calculate, "Evaluate a mathematical expression", ["expression"]),
(weather, "Get current weather for a location", ["city", "country"]),
]
history: @table Message [
(user, Hello, 2),
(assistant, "Hi there! How can I help?", 7),
]
Binary Caching
Compiled .tlbx files make excellent context caches:
use tealeaf::{TeaLeafBuilder, ToTeaLeaf, Value};

// Build context document
let doc = TeaLeafBuilder::new()
    .add_value("system_prompt", Value::String(system_prompt))
    .add_vec("tools", &tools)
    .add_vec("history", &messages)
    .add("user", &user_profile)
    .build();

// Cache as binary (fast to read back)
doc.compile("context_cache.tlbx", true)?;

// Later: load instantly from binary
let cached = tealeaf::Reader::open("context_cache.tlbx")?;
Sending to LLM
Convert to text for LLM consumption:
let doc = TeaLeaf::load("context.tl")?;
let context_text = doc.to_tl_with_schemas();
// Send context_text as part of the prompt
Or convert specific sections:
let doc = TeaLeaf::load("context.tl")?;
let json = doc.to_json()?;
// Use JSON for APIs that expect it
Size Comparison: Real-World Context
For a typical LLM context with 50 messages, 10 tools, and a user profile:
| Format | Approximate Size |
|---|---|
| JSON | ~15 KB |
| TeaLeaf Text | ~8 KB |
| TeaLeaf Binary | ~4 KB |
| TeaLeaf Binary (compressed) | ~3 KB |
Token savings are significant but less than byte savings. BPE tokenizers partially compress repeated JSON field names, so byte savings overstate token savings by 5-18 percentage points depending on data repetitiveness. For typical structured data, expect ~36% fewer data tokens (median), with savings increasing for larger and more structured datasets.
Token Comparison (verified via OpenAI tokenizer)
| Dataset | JSON tokens | TeaLeaf tokens | Savings |
|---|---|---|---|
| Healthcare records | 903 | 572 | 37% |
| Retail orders | 9,829 | 5,632 | 43% |
At the API level, prompt instructions are identical for both formats, diluting data-only savings (~36%) to ~30% of total input tokens.
Structured Outputs
LLMs can also produce TeaLeaf-formatted responses:
@struct Insight (category: string, finding: string, confidence: float)
analysis: @table Insight [
(revenue, "Q4 revenue grew 15% YoY", 0.92),
(churn, "Customer churn decreased by 3%", 0.87),
(forecast, "Projected 20% growth in Q1", 0.73),
]
This can then be parsed and processed programmatically:
let response = TeaLeaf::parse(&llm_output)?;
if let Some(Value::Array(insights)) = response.get("analysis") {
    for insight in insights {
        // Process each structured insight
    }
}
Best Practices
- Define schemas for all structured context – tool definitions, messages, profiles
- Use @table for arrays of uniform objects – conversation history, search results
- Cache compiled binary for frequently-used context segments
- Use text format for LLM input – models understand the schema notation
- String deduplication helps when context has repetitive strings (roles, tool names)
- Separate static and dynamic context – compile static context once, merge at runtime
Benchmark Results
The accuracy-benchmark suite tests 12 tasks across 10 business domains on Claude Sonnet 4.5 and GPT-5.2:
- ~36% fewer data tokens compared to JSON (savings increase with larger datasets)
- No accuracy loss – scores within noise across all providers
- See the benchmark README for full methodology and results.
Schema Evolution
TeaLeaf takes a deliberately simple approach to schema evolution: when schemas change, recompile.
Design Philosophy
- No migration machinery – no schema versioning or compatibility negotiation
- Source file is master – the .tl file defines the current schema
- Explicit over implicit – tuples require values for all fields
- Binary is a compiled artifact – regenerate it like you would a compiled binary
Compatible Changes
These changes do not require recompilation of existing binary files:
Rename Fields
Field data is stored positionally. Names are documentation only:
# Before
@struct user (name: string, email: string)
# After -- binary still works
@struct user (full_name: string, email_address: string)
Widen Types
Automatic safe widening when reading:
# Before: field was int8
@struct sensor (id: int8, reading: float32)
# After: widened to int32 -- readers auto-widen
@struct sensor (id: int, reading: float)
Widening path: int8 → int16 → int32 → int64, float32 → float64
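A sketch of what reader-side widening amounts to (illustrative type names, not the library's API):

// Any stored integer width decodes into i64 and any stored float width into
// f64; both conversions are lossless, so old narrow-typed binaries stay valid.
enum Stored {
    Int8(i8),
    Int16(i16),
    Int32(i32),
    Int64(i64),
    Float32(f32),
    Float64(f64),
}

enum WideValue {
    Int(i64),
    Float(f64),
}

fn widen(v: Stored) -> WideValue {
    match v {
        Stored::Int8(n) => WideValue::Int(n as i64),
        Stored::Int16(n) => WideValue::Int(n as i64),
        Stored::Int32(n) => WideValue::Int(n as i64),
        Stored::Int64(n) => WideValue::Int(n),
        Stored::Float32(x) => WideValue::Float(x as f64), // f32 -> f64 is exact
        Stored::Float64(x) => WideValue::Float(x),
    }
}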
Incompatible Changes
These changes require recompilation from the .tl source:
Add a Field
# Before
@struct user (id: int, name: string)
# After -- added email field
@struct user (id: int, name: string, email: string?)
Old binary files won’t have the new field. Recompile:
tealeaf compile users.tl -o users.tlbx
Remove a Field
# Before
@struct user (id: int, name: string, legacy_field: string)
# After -- removed legacy_field
@struct user (id: int, name: string)
Reorder Fields
Binary data is positional. Changing field order changes the meaning of stored data:
# Before
@struct point (x: int, y: int)
# After -- DON'T DO THIS without recompiling
@struct point (y: int, x: int)
Narrow Types
Narrowing (e.g., int64 → int8) can lose data:
# Before
@struct data (value: int64)
# After -- potential data loss
@struct data (value: int8)
Recompilation Workflow
When schemas change:
# 1. Edit the .tl source file
# 2. Validate
tealeaf validate data.tl
# 3. Recompile
tealeaf compile data.tl -o data.tlbx
# 4. Verify
tealeaf info data.tlbx
Migration Strategy
For applications that need to handle schema changes:
Approach 1: Version Keys
Use different top-level keys for different schema versions:
@struct user_v1 (id: int, name: string)
@struct user_v2 (id: int, name: string, email: string?, role: string)
# Old data
users_v1: @table user_v1 [(1, alice), (2, bob)]
# New data
users_v2: @table user_v2 [(3, carol, "carol@ex.com", admin)]
Approach 2: Application-Level Migration
Read old binary, transform in code, write new binary:
// Read old binary format
let old_doc = tealeaf::Reader::open("data_v1.tlbx")?;

// Transform
let new_doc = TeaLeafBuilder::new()
    .add_vec("users", &migrate_users(&old_doc.get("users")?))
    .build();

// Write new format
new_doc.compile("data_v2.tlbx", true)?;
Approach 3: Nullable Fields
Add new fields as nullable to maintain backward compatibility:
@struct user (
id: int,
name: string,
email: string?, # new field, nullable
phone: string?, # new field, nullable
)
Old data can have ~ for new fields. New data populates them.
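For example (illustrative data), a table can mix old-shape rows that null out the new fields and new-shape rows that populate them:

users: @table user [
(1, alice, ~, ~),
(2, bob, "bob@ex.com", "555-0100"),
]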
Comparison with Other Formats
| Aspect | TeaLeaf | Protobuf | Avro |
|---|---|---|---|
| Schema location | Inline in data file | External .proto | Embedded in binary |
| Adding fields | Recompile | Compatible (field numbers) | Compatible (defaults) |
| Removing fields | Recompile | Compatible (skip unknown) | Compatible (skip) |
| Migration tool | None (recompile) | protoc | Schema registry |
| Complexity | Low | Medium | High |
TeaLeaf trades automatic evolution for simplicity. If your use case requires frequent schema changes across distributed systems, consider Protobuf or Avro.
Performance
Performance characteristics of TeaLeaf across different operations.
Size Efficiency
Benchmark Results
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Binary | 3.56x | 0.15x | 0.47x |
Analysis
- Small objects: TeaLeaf has a 64-byte header overhead. For objects under ~200 bytes, JSON or MessagePack are more compact.
- Large arrays: TeaLeaf’s string deduplication and schema-based compression shine. For 10K+ records, TeaLeaf achieves 6-7x better compression than JSON.
- Medium datasets (1K records): TeaLeaf is competitive with Protobuf, with the advantage of embedded schemas.
Where Size Matters Most
| Scenario | Recommendation |
|---|---|
| < 100 bytes payload | Use MessagePack or raw JSON |
| 1-10 KB | TeaLeaf text or JSON (overhead amortized) |
| 10 KB - 1 MB | TeaLeaf binary with compression |
| > 1 MB | TeaLeaf binary with compression (best gains) |
Parse/Decode Speed
TeaLeaf’s dynamic key-based access is ~2-5x slower than Protobuf’s generated code:
| Operation | TeaLeaf | Protobuf | JSON (serde) |
|---|---|---|---|
| Parse text | Moderate | N/A | Fast |
| Decode binary | Moderate | Fast | N/A |
| Random key access | O(1) hash | O(1) field | O(n) parse |
| Full iteration | Moderate | Fast | Fast |
Why TeaLeaf Is Slower Than Protobuf
- Dynamic dispatch – TeaLeaf resolves fields by name at runtime; Protobuf uses generated code with known offsets
- String table lookup – each string access requires a table lookup
- Schema resolution – schema structure is parsed from binary at load time
When This Matters
- Hot loops decoding millions of records → consider Protobuf
- Cold reads or moderate throughput → TeaLeaf is fine
- Size-constrained transmission → TeaLeaf’s smaller binary compensates for slower decode
Memory-Mapped Reading
For large binary files, use memory-mapped I/O:
// Rust
let reader = Reader::open_mmap("large_file.tlbx")?;

// .NET
using var reader = TLReader.OpenMmap("large_file.tlbx");
Benefits:
- No upfront allocation – data loaded on demand by the OS
- Shared pages – multiple processes can read the same file
- Lazy loading – only accessed sections are read from disk
Compilation Performance
Compiling .tl to .tlbx:
| Input Size | Compile Time (approximate) |
|---|---|
| 1 KB | < 1 ms |
| 100 KB | ~10 ms |
| 1 MB | ~100 ms |
| 10 MB | ~1 second |
Compression adds ~20-50% to compile time but can reduce output size by 50-90%.
Optimization Tips
1. Use Schemas for Tabular Data
Schema-bound @table data gets optimal encoding:
- Positional storage (no field name repetition)
- Null bitmaps (1 bit per nullable field vs full null markers)
- Type-homogeneous arrays
2. Enable Compression for Large Files
Compression is most effective for:
- Sections larger than 64 bytes
- Data with repeated string values
- Numeric arrays with patterns
tealeaf compile data.tl -o data.tlbx # compression on by default
3. Use Binary Format for Storage
Text is for authoring; binary is for storage and transmission:
Text (.tl) → Author, review, version control
Binary (.tlbx) → Deploy, cache, transmit
4. Cache Compiled Binary
For data that’s read frequently but written rarely:
// Compile once
doc.compile("cache.tlbx", true)?;

// Read many times (fast)
let reader = Reader::open_mmap("cache.tlbx")?;
5. Minimize String Diversity
String deduplication works best when values repeat:
- Enum-like fields ("active", "inactive") → deduplicated
- UUIDs or timestamps → each is unique, no deduplication benefit
6. Use the Right Integer Sizes
The writer auto-selects the smallest representation, but schema types guide encoding:
@struct sensor (
id: uint16, # 2 bytes instead of 4
reading: float32, # 4 bytes instead of 8
flags: uint8, # 1 byte instead of 4
)
Round-Trip Fidelity
Understanding which conversion paths preserve data perfectly and where information can be lost.
Round-Trip Matrix
| Path | Data Preserved | Lost |
|---|---|---|
| .tl → .tlbx → .tl | All data and schemas | Comments, formatting |
| .tl → .json → .tl | Basic types (string, number, bool, null, array, object) | Schemas, comments, refs, tags, maps, timestamps, bytes |
| .tl → .tlbx → .json | Same as .tl → .json | Same losses |
| .json → .tl → .json | All JSON-native types | (generally lossless) |
| .json → .tlbx → .json | All JSON-native types | (generally lossless) |
| .tlbx → .tlbx (recompile) | All data | (lossless) |
Lossless: Text ↔ Binary
The text-to-binary-to-text round-trip preserves all data and schema information:
tealeaf compile original.tl -o compiled.tlbx
tealeaf decompile compiled.tlbx -o roundtrip.tl
tealeaf compile roundtrip.tl -o roundtrip.tlbx
# compiled.tlbx and roundtrip.tlbx contain equivalent data
What’s lost:
- Comments (stripped during compilation)
- Whitespace and formatting
- The decompiled output may have different formatting than the original
What’s preserved:
- All schemas (@struct definitions)
- All values (every type)
- Key ordering
- Schema-typed data (table structure)
Lossy: TeaLeaf → JSON
JSON cannot represent all TeaLeaf types. The following conversions are one-way:
Timestamps → Strings
created: 2024-01-15T10:30:00Z
JSON output:
{"created": "2024-01-15T10:30:00.000Z"}
Reimporting: the ISO 8601 string becomes a plain String, not a Timestamp.
Maps → Arrays
headers: @map {200: "OK", 404: "Not Found"}
JSON output:
{"headers": [[200, "OK"], [404, "Not Found"]]}
Reimporting: becomes a plain nested array, not a Map.
References → Objects
!ref: {x: 1, y: 2}
point: !ref
JSON output:
{"point": {"$ref": "ref"}}
Reimporting: becomes a plain object with $ref key, not a Ref.
Tagged Values → Objects
event: :click {x: 100, y: 200}
JSON output:
{"event": {"$tag": "click", "$value": {"x": 100, "y": 200}}}
Reimporting: becomes a plain object, not a Tagged.
Bytes → Hex Strings (JSON only)
Bytes round-trip losslessly within TeaLeaf text format using b"..." literals:
data: b"cafef00d"
However, JSON export converts bytes to hex strings:
{"data": "0xcafef00d"}
Reimporting from JSON: becomes a plain string, not bytes.
Schemas → Lost
@struct user (id: int, name: string)
users: @table user [(1, alice), (2, bob)]
JSON output:
{"users": [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]}
The @struct definition is not represented in JSON. However, from-json can re-infer schemas from uniform arrays.
Bytes and Text Format
Bytes now round-trip losslessly through text format using the b"..." literal:
Binary (bytes value) → Decompile → Text (b"..." literal) → Compile → Binary (bytes value)
The decompiler emits b"cafef00d" for bytes values, and the parser reads them back as Value::Bytes.
Ensuring Lossless Round-Trips
Use Binary for Storage
If you need to preserve all TeaLeaf types (refs, tags, maps, timestamps, bytes), keep data in .tlbx:
# Lossless cycle
tealeaf compile data.tl -o data.tlbx
tealeaf decompile data.tlbx -o data.tl
# data.tl preserves all types (except comments)
Use JSON Only for Interop
JSON conversion is for integrating with JSON-based tools. Don’t use it as a primary storage format if your data uses TeaLeaf-specific types.
Verify with CLI
# Compile → JSON two ways, compare
tealeaf to-json data.tl -o from_text.json
tealeaf compile data.tl -o data.tlbx
tealeaf tlbx-to-json data.tlbx -o from_binary.json
# from_text.json and from_binary.json should be identical
Type Preservation Summary
| TeaLeaf Type | Binary Round-Trip | JSON Round-Trip |
|---|---|---|
| Null | Lossless | Lossless |
| Bool | Lossless | Lossless |
| Int | Lossless | Lossless |
| UInt | Lossless | Lossless (as number) |
| Float | Lossless | Lossless |
| String | Lossless | Lossless |
| Bytes | Lossless | Lossy (→ hex string) |
| Array | Lossless | Lossless |
| Object | Lossless | Lossless |
| Map | Lossless | Lossy (→ array of pairs) |
| Ref | Lossless | Lossy (→ $ref object) |
| Tagged | Lossless | Lossy (→ $tag/$value object) |
| Timestamp | Lossless | Lossy (→ ISO 8601 string) |
| Schemas | Lossless | Lost (re-inferred on import) |
| Comments | Lost (stripped) | Lost |
Architecture Decision Records
This section documents significant architecture decisions made in the TeaLeaf project. Each record captures the context, decision, and consequences of a choice that affects the project’s design or implementation.
ADR Index
| ADR | Title | Status | Date |
|---|---|---|---|
| ADR-0001 | Use IndexMap for Insertion Order Preservation | Accepted | 2026-02-05 |
| ADR-0002 | Fuzzing Architecture and Strategy | Accepted | 2026-02-06 |
| ADR-0003 | Maximum Nesting Depth Limit (256) | Accepted | 2026-02-06 |
| ADR-0004 | ZLIB Compression for Binary Format | Accepted | 2026-02-06 |
What is an ADR?
An Architecture Decision Record (ADR) is a short document that captures an important architectural decision along with its context and consequences. ADRs help future contributors understand why certain design choices were made, not just what was built.
ADR Lifecycle
Each ADR has one of the following statuses:
- Proposed — Under discussion, not yet implemented
- Accepted — Approved and implemented (or in progress)
- Superseded — Replaced by a newer ADR (linked in the record)
- Deprecated — No longer applicable due to project changes
ADR-0001: Use IndexMap for Insertion Order Preservation
- Status: Accepted
- Date: 2026-02-05
- Applies to: tealeaf-core, tealeaf-derive, tealeaf-ffi
Context
TeaLeaf’s primary use case is context engineering for LLM applications, where structured data passes through multiple format conversions (JSON → .tl → .tlbx and back). Users intentionally order their JSON keys to convey semantic meaning — for example, placing name before description before details to mirror how a human would read the document. Prior to this change, all user-facing maps used HashMap<K, V>, and the text serializer and binary writer explicitly sorted keys alphabetically before output.
This caused two problems:
- Semantic ordering was lost. A user who wrote {"zebra": 1, "apple": 2} in their JSON would get {"apple": 2, "zebra": 1} after a round-trip through TeaLeaf. For LLM prompt engineering, this reordering could change how models interpret the context.
- Sorting was unnecessary work. Every serialization path (dumps(), compile(), write_value(), to_tl_with_schemas()) collected keys into a Vec, sorted them, and then iterated — adding O(n log n) overhead to every output operation.
Alternatives Considered
| Approach | Pros | Cons |
|---|---|---|
| Keep HashMap + sort (status quo) | Deterministic output, no dependency change | Loses user intent, sorting overhead |
| Vec of (key, value) pairs | Order preserved, no new dependency | Loses O(1) key lookup, breaks API surface broadly |
| IndexMap | Order preserved, O(1) lookup, drop-in API | Slightly slower decode (insertion cost), new dependency |
| BTreeMap | Sorted + deterministic | Still not insertion-ordered, lookup O(log n) |
Decision
Replace HashMap with IndexMap (from the indexmap crate v2) in all user-facing ordered containers:
- Value::Object → ObjectMap<String, Value> (type alias for IndexMap)
- TeaLeaf.data, TeaLeaf.schemas, TeaLeaf.unions → IndexMap<String, _>
- Parser output, Reader.sections, trait return types → IndexMap
Internal lookup tables stay as HashMap because they don’t need ordering:
- Writer.string_map, Writer.schema_map, Writer.union_map
- Reader.schema_map, Reader.union_map, Reader.cache
Additionally:
- Enable serde_json’s preserve_order feature so JSON parsing also preserves key order
- Remove all explicit keys.sort() calls from serialization paths
- Re-export IndexMap and ObjectMap from tealeaf-core so derive macros and downstream crates don’t need a direct indexmap dependency
Consequences
Positive
- Round-trip fidelity. JSON → TeaLeaf → JSON now preserves the original key order at every level (sections, object fields, schema definitions).
- Encoding is faster. Removing O(n log n) sort calls from every serialization path yields measurable improvements in encode benchmarks (6–17% for small/medium objects).
- Simpler serialization code. Serialization loops iterate the map directly instead of collecting-sorting-iterating.
- Binary format is unchanged. Old .tlbx files remain fully readable. The reader always produces keys in file order, which for old files happens to be alphabetical.
Negative
- Binary decode is slower. IndexMap::insert() is slower than HashMap::insert() because it maintains a dense insertion-order array alongside the hash table. Benchmarks show +56% to +105% regression for decode-heavy workloads (large arrays of objects, deeply nested structs). For the primary use case (LLM context), this is acceptable because:
  - Documents are typically encoded once and consumed as text (not repeatedly decoded from binary)
  - The absolute times remain in the microsecond-to-millisecond range
  - Encode performance (the more common hot path) improved
- New dependency. indexmap v2 is a well-maintained, widely-used crate (used by serde_json internally), so supply-chain risk is minimal.
- Public API change. TeaLeaf::new() now takes IndexMap instead of HashMap. This is a breaking change, mitigated by:
  - The project is in beta (2.0.0-beta.2)
  - From<HashMap<String, Value>> for Value conversion is retained for backward compatibility
  - Downstream code using .get(), .insert(), .iter() works identically
Benchmark Summary
| Workload | Encode | Decode |
|---|---|---|
| small_object | -16% (faster) | — |
| nested_structs | -10% to -17% (faster) | +56% to +68% (slower) |
| large_array_10000 | -5% (faster) | +105% (slower) |
| tabular_5000 | -69% (faster) | -48% (faster) |
Note: Tabular workloads use struct-array encoding (columnar), which has fewer per-row IndexMap insertions. The decode regression is concentrated in generic object decoding, where each row creates a new ObjectMap with field-by-field inserts.
References
- indexmap crate documentation
- serde_json preserve_order feature
- Implementation PR: HashMap → IndexMap migration across 16+ files
ADR-0002: Fuzzing Architecture and Strategy
- Status: Accepted
- Date: 2026-02-06
- Applies to: tealeaf-core
Context
TeaLeaf is a data format with multiple serialization paths: text parsing, text serialization, binary compilation, binary reading, and JSON import/export. Each path accepts untrusted input in production scenarios (user-supplied .tl files, .tlbx binaries, JSON strings from external APIs). Malformed or adversarial input must never cause undefined behavior, panics in non-roundtrip code paths, or memory safety violations.
The project already had unit tests, canonical fixture tests, and adversarial tests (hand-crafted malformed inputs). However, these approaches have inherent limitations:
- Unit/fixture tests are author-biased. They test cases the developer thought of, missing emergent edge cases from format interactions (e.g., deeply nested structures with unicode escapes inside hex-prefixed numbers).
- Adversarial tests are finite. The hand-crafted corpus in adversarial-tests/ covers known attack patterns but cannot explore the combinatorial input space.
- Round-trip fidelity is hard to test exhaustively. The property “serialize then parse produces the same value” requires testing across all Value variants, nesting depths, and string content — a space too large for manual enumeration.
Alternatives Considered
| Approach | Pros | Cons |
|---|---|---|
| Property-based testing (proptest/quickcheck) | Integrated into cargo test, structure-aware | Limited mutation depth, no coverage feedback, deterministic |
| AFL++ | Mature, multiple mutation strategies | Requires instrumentation harness, harder CI integration on GitHub Actions |
| cargo-fuzz (libFuzzer) | Native Rust support, coverage-guided, dictionary support, easy CI | Requires nightly toolchain, Linux-only |
| Honggfuzz | Hardware-assisted coverage | Less Rust ecosystem integration, complex setup |
Decision
Use cargo-fuzz (libFuzzer) with a three-layer fuzzing strategy:
Layer 1: Byte-level fuzzing (6 targets)
Coverage-guided mutation of raw bytes, testing each attack surface independently:
| Target | Input | Tests |
|---|---|---|
| fuzz_parse | Raw bytes as TL text | Parser robustness against arbitrary byte sequences |
| fuzz_serialize | Raw bytes as TL text | Parse then re-serialize roundtrip fidelity |
| fuzz_roundtrip | Raw bytes as TL text | Full text → parse → serialize → re-parse → value equality |
| fuzz_reader | Raw bytes as .tlbx binary | Binary reader robustness against malformed files |
| fuzz_json | Raw bytes as JSON string | JSON import → TL export → re-import roundtrip |
| fuzz_json_schemas | Raw bytes as JSON string | JSON import with schema inference → roundtrip |
Layer 2: Dictionary-guided fuzzing
libFuzzer dictionaries provide grammar-aware tokens that seed the mutation engine, dramatically improving coverage for structured formats where random bytes rarely produce valid syntax:
| Dictionary | Used by | Key tokens |
|---|---|---|
| tl.dict | fuzz_parse, fuzz_serialize, fuzz_roundtrip | Keywords (true, false, null, NaN, inf), directives (@struct, @table, @union), type names, escape sequences, boundary numbers, timestamp patterns |
| json.dict | fuzz_json, fuzz_json_schemas | JSON delimiters, escape sequences, surrogate pair markers, serde_json magic strings, boundary numbers |
Measured coverage impact (30-second fresh corpus):
| Target | Without dict | With dict | Improvement |
|---|---|---|---|
| fuzz_parse | 1790 edges | 1922 edges | +7.4% |
| fuzz_json | 1339 edges | 1533 edges | +14.5% |
Layer 3: Structure-aware fuzzing (1 target)
The fuzz_structured target bypasses the parser entirely, generating valid Value trees directly from fuzzer bytes using the arbitrary crate. This tests serialization and binary compilation paths with guaranteed-valid inputs that would take byte-level fuzzers much longer to discover:
- Bounded recursion (max depth 3) prevents stack overflow
- 13 Value variants including JsonNumber, Tagged, Ref, Map, Bytes
- Three roundtrip tests per invocation: text serialize/parse, binary compile/read, JSON no-panic
- Reaches 2464 coverage edges in just 733 runs (vs thousands of runs for byte-level targets)
Fuzz infrastructure layout
tealeaf-core/fuzz/
├── Cargo.toml               # Fuzz workspace with libfuzzer-sys + arbitrary
├── fuzz_targets/
│   ├── fuzz_parse.rs        # Layer 1: text parser robustness
│   ├── fuzz_serialize.rs    # Layer 1: text roundtrip
│   ├── fuzz_roundtrip.rs    # Layer 1: full text roundtrip with value equality
│   ├── fuzz_reader.rs       # Layer 1: binary reader robustness
│   ├── fuzz_json.rs         # Layer 1: JSON import roundtrip
│   ├── fuzz_json_schemas.rs # Layer 1: JSON with schema inference roundtrip
│   └── fuzz_structured.rs   # Layer 3: structure-aware value generation
├── dictionaries/
│   ├── tl.dict              # Layer 2: TL text format tokens
│   └── json.dict            # Layer 2: JSON format tokens
├── corpus/                  # Persistent corpus (per-target subdirectories)
└── artifacts/               # Crash artifacts (per-target subdirectories)
CI integration
Fuzz targets run on GitHub Actions ubuntu-latest (2-core, 7 GB RAM) with the following constraints:
- 120 seconds per target (coverage saturates within ~30 seconds; 120s provides buffer for deeper exploration)
- Serial execution — targets run one at a time to avoid memory pressure (each can use up to 512 MB RSS)
- RSS limit: 512 MB per target
- Dictionary-guided runs for text and JSON targets
- Nightly Rust toolchain required (libFuzzer instrumentation)
- Total wall time: ~15 minutes (7 targets × 120s + build overhead)
Value equality semantics
All roundtrip targets use a custom values_equal() function rather than PartialEq to handle expected coercions:
- Int(n) == UInt(n) when n >= 0 (sign-agnostic integer comparison)
- JsonNumber(s) == Int(i) when s parses to i (precision-preserving numbers may roundtrip as integers if they fit)
- Float comparison uses to_bits() for exact bit-level equality (distinguishes +0.0 from -0.0, handles NaN)
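As a rough sketch of those rules (illustrative; the real values_equal() lives in the fuzz targets and handles the full Value enum):

// Minimal Value enum for illustration; the real library has more variants.
#[derive(PartialEq)]
enum Value {
    Int(i64),
    UInt(u64),
    Float(f64),
    JsonNumber(String),
}

fn values_equal(a: &Value, b: &Value) -> bool {
    use Value::*;
    match (a, b) {
        // Sign-agnostic integer comparison: Int(n) == UInt(n) when n >= 0
        (Int(i), UInt(u)) | (UInt(u), Int(i)) => *i >= 0 && *i as u64 == *u,
        // A precision-preserving JsonNumber may roundtrip as a plain Int
        (JsonNumber(s), Int(i)) | (Int(i), JsonNumber(s)) => {
            s.parse::<i64>().map_or(false, |n| n == *i)
        }
        // Bit-level float equality: distinguishes +0.0 / -0.0, handles NaN
        (Float(x), Float(y)) => x.to_bits() == y.to_bits(),
        // All other pairs fall back to structural equality
        _ => a == b,
    }
}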
Consequences
Positive
- Discovered real bugs. Fuzzing found a NaN quoting bug (NaN roundtripped as Float(NaN) instead of being preserved through text format) and the precision loss that motivated Value::JsonNumber.
- Continuous regression detection. CI runs catch regressions in parser/serializer correctness automatically on every push.
- Coverage-guided exploration. libFuzzer’s coverage feedback explores code paths that hand-written tests miss, particularly in error handling and edge case branches.
- Dictionary tokens accelerate exploration. Measured 7-14% coverage improvement with dictionaries, at zero runtime cost (dictionaries only seed the mutation engine).
- Structure-aware fuzzing tests the serializer independently. By generating valid Value trees directly, fuzz_structured achieves deep serializer coverage without depending on parser correctness.
Negative
- Nightly Rust toolchain required. cargo-fuzz requires nightly for -Z flags and sanitizer instrumentation. This is isolated to the fuzz workspace and does not affect the main build.
- Linux-only. libFuzzer doesn’t support Windows natively. Local fuzzing requires WSL on Windows; CI uses Ubuntu runners.
- CI time cost. ~15 minutes per run. Acceptable for a post-push check; not suitable for pre-commit.
- Corpus growth. The persistent corpus grows over time as new coverage-increasing inputs are discovered. Periodic corpus minimization (cargo fuzz cmin) is recommended.
Not covered
- Protocol-level fuzzing. The FFI boundary (tealeaf-ffi) is not fuzzed directly. FFI functions are thin wrappers around the core library, which is fuzzed.
- .NET binding fuzzing. The .NET layer is tested through its own test suite and the adversarial harness, but not through libFuzzer.
- Concurrency testing. All fuzz targets are single-threaded. Thread-safety of Reader (which uses mmap) is tested separately.
References
- cargo-fuzz documentation
- libFuzzer documentation
- libFuzzer dictionary format
- arbitrary crate for structure-aware fuzzing
ADR-0003: Maximum Nesting Depth Limit (256)
- Status: Accepted
- Date: 2026-02-06
- Applies to: tealeaf-core (parser, binary reader)
Context
TeaLeaf accepts untrusted input in production — user-supplied .tl files, .tlbx binaries from external sources, and JSON strings from APIs. Recursive data structures (arrays, objects, maps, tagged values) create call stacks proportional to input nesting depth. Without a limit, an attacker can craft a payload like key: [[[[... with thousands of levels, causing a stack overflow and process termination.
Two constants enforce the limit:
| Constant | File | Value |
|---|---|---|
| MAX_PARSE_DEPTH | parser.rs | 256 |
| MAX_DECODE_DEPTH | reader.rs | 256 |
Both constants are set to the same value to ensure text-binary parity: any document that parses successfully from .tl text can also round-trip through .tlbx binary without hitting a different depth ceiling.
The limit is checked at every recursive entry point:
- Parser: parse_value() — arrays, objects, maps, tuples, tagged values
- Reader: decode_value(), decode_array(), decode_object(), decode_struct(), decode_struct_array(), decode_map()
When exceeded, both paths return a descriptive error rather than panicking or overflowing the stack.
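The guard itself is a single comparison at each recursive entry point; a toy sketch (illustrative, not the project's parser):

const MAX_PARSE_DEPTH: usize = 256;

// Every recursive entry point checks depth first, returning a descriptive
// error instead of recursing toward a stack overflow.
fn skip_nested(bytes: &[u8], mut pos: usize, depth: usize) -> Result<usize, String> {
    if depth > MAX_PARSE_DEPTH {
        return Err(format!("maximum nesting depth {MAX_PARSE_DEPTH} exceeded"));
    }
    while pos < bytes.len() {
        match bytes[pos] {
            b'[' => pos = skip_nested(bytes, pos + 1, depth + 1)?, // one level deeper
            b']' => return Ok(pos + 1),
            _ => pos += 1,
        }
    }
    Ok(pos)
}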
Ecosystem Comparison
| Parser / Library | Default Max Depth | Configurable? |
|---|---|---|
| TeaLeaf | 256 | No (compile-time constant) |
| serde_json (Rust) | 128 | Yes (disable_recursion_limit) |
| serde_yaml (Rust) | 128 | No |
| System.Text.Json (.NET) | 64 | Yes (MaxDepth) |
| ASP.NET Core (default) | 32 | Yes |
| Jackson (Java) | 1000 (v2), 500 (v3) | Yes |
| Go encoding/json | 10,000 | No |
| Python json (stdlib) | ~1,000 (interpreter limit) | Via sys.setrecursionlimit |
| Protocol Buffers (Java/C++) | 100 | Yes |
| Protocol Buffers (Go) | 10,000 | Yes |
| rmp-serde (MessagePack) | 1,024 | Yes |
| CBOR (ciborium, Rust) | 128 | Yes |
| toml (Rust) | None | No (vulnerable to stack overflow) |
Observations
- Conservative defaults are trending down. Jackson reduced from 1,000 to 500 in v3. .NET defaults to 64. Protocol Buffers targets 100.
- 128 is the most common Rust ecosystem default (serde_json, serde_yaml, ciborium).
- No production data format needs > 100 levels. Deeply nested structures indicate either machine-generated intermediate representations or adversarial input.
- Formats without limits have CVEs. The toml crate’s lack of depth limiting is tracked as an open issue. Python’s reliance on interpreter limits has caused production crashes.
Decision
Set MAX_PARSE_DEPTH and MAX_DECODE_DEPTH to 256.
Why 256 over 128?
TeaLeaf schemas add implicit nesting. A @struct with an array of @struct-typed objects creates 3 levels of nesting (object → array → object) for what the user perceives as one level of structure. With schema compositions, 128 could be reached in complex but legitimate documents. 256 provides a 2x margin above the Rust ecosystem default while remaining well within safe stack bounds.
Why not configurable?
- Simplicity. A compile-time constant is zero-cost at runtime (no configuration plumbing, no state to manage).
- Consistent behavior. All TeaLeaf implementations (Rust, FFI, .NET) enforce the same limit. A configurable limit would require coordination across language boundaries.
- 256 is generous enough. No known use case requires deeper nesting. If a legitimate need arises, the constant can be bumped in a patch release without breaking any public API.
Stack safety margin
On x86-64 Linux with the default 8 MB stack, each recursive call uses roughly 200–400 bytes of stack frame. At 256 depth, the worst case is ~100 KB — well under 2% of the available stack. This leaves ample room for the caller’s own stack frames and for platforms with smaller stacks (e.g., 1 MB thread stacks).
Test Coverage
| Test | Location | What it verifies |
|---|---|---|
| test_parse_depth_256_succeeds | parser.rs | 200-level nesting parses successfully |
| test_fuzz_deeply_nested_arrays_no_stack_overflow | parser.rs | 500-level nesting returns error (no crash) |
| parse_deep_nesting_ok | adversarial.rs | 7-level nesting succeeds in adversarial harness |
| fuzz_structured depth=3 | fuzz_structured.rs | Structure-aware fuzzer bounds depth to 3 |
| canonical/large_data.tl | Canonical suite | Deep nesting fixture round-trips correctly |
Consequences
Positive
- Stack overflow protection. Malicious or malformed input with extreme nesting is rejected with a clear error message instead of crashing the process.
- Text-binary parity. The same limit in parser and reader means any document that parses from text will also decode from binary, and vice versa.
- Predictable resource usage. Callers can reason about maximum stack consumption without inspecting input.
Negative
- Theoretical limitation. Documents with more than 256 levels of nesting are rejected. In practice, no known data format use case requires this depth.
- Not configurable. Users who need deeper nesting must rebuild from source with a modified constant. This is an intentional trade-off for simplicity.
Neutral
- No performance cost. The depth check is a single integer comparison per recursive call — unmeasurable relative to the cost of decoding a value.
ADR-0004: ZLIB Compression for Binary Format
- Status: Accepted
- Date: 2026-02-06
- Applies to: tealeaf-core (writer, reader), spec §4.3 and §4.9
Context
The .tlbx binary format compresses individual sections to reduce file size. The implementation has always used ZLIB (deflate) via the flate2 crate. However, the spec contained a contradiction:
- §4.3 (Header Flags) described the COMPRESS flag as indicating “zstd compression” and required readers to detect compression via the zstd frame magic (0xFD2FB528).
- §4.9 (Compression) correctly stated the algorithm as “ZLIB (deflate)”.
This contradiction meant a third-party implementation following §4.3 would look for zstd-compressed data that doesn’t exist, while one following §4.9 would work correctly. The spec needed a single, definitive answer.
Decision
Standardize on ZLIB (deflate) as the sole compression algorithm for .tlbx binary format v2.
Why not zstd?
zstd is a superior algorithm in general-purpose benchmarks, but TeaLeaf’s design neutralizes its advantages:
- String deduplication removes the most compressible data. The string table deduplicates all strings before compression runs. What remains for the compressor is packed integers, null bitmaps, and string table indices — low-entropy binary data with little redundancy.
- Sections are small. The compression threshold is 64 bytes. Most sections are a few hundred bytes to a few KB. At these sizes, zlib and zstd achieve nearly identical compression ratios without dictionaries.
- zstd’s dictionary mode doesn’t help here. Dictionary compression — where zstd’s largest advantage lies for small payloads — requires pre-training on representative data. TeaLeaf documents are schema-variable and content-diverse (the primary use case is LLM context engineering with arbitrary structured data). A static dictionary would not generalize across different schemas and data shapes.
- The 90% threshold filters aggressively. Sections that don’t compress to under 90% of their original size are stored uncompressed. This threshold means most small sections aren’t compressed at all, making the algorithm choice irrelevant for the majority of sections.
- Decompression speed is irrelevant at this scale. zstd decompresses 3-5x faster than zlib, but a few-hundred-byte section decompresses in microseconds with either algorithm. The difference is unmeasurable in practice.
Why zlib?
- Universal availability. ZLIB/deflate is implemented in every language’s standard library or a widely-available package. zstd requires an additional native dependency in most ecosystems.
- No breaking change. Every .tlbx file ever produced uses zlib. Switching would require either a format version bump (breaking all existing files) or dual-algorithm detection logic (complexity for every implementation).
- Simpler for third-party implementations. One algorithm, no magic-byte detection, no conditional dependency. A conformant reader needs only zlib decompression.
- Compression is not the primary size reduction strategy. TeaLeaf’s token efficiency comes from the text format’s conciseness and the binary format’s schema-aware encoding (struct arrays, string deduplication, type-specific packing). Compression is a secondary optimization applied on top.
Spec Changes
| Section | Before | After |
|---|---|---|
| §4.3 (Header Flags) | “zstd compression”, “zstd frame magic” | “ZLIB (deflate) compression”, per-section flag detection |
| §4.9 (Compression) | Already correct (“ZLIB (deflate)”) | No change |
Consequences
Positive
- Spec is internally consistent. §4.3 and §4.9 now agree on ZLIB.
- Third-party interop is unambiguous. Implementers need one algorithm, clearly documented.
- No migration required. All existing .tlbx files remain valid.
Negative
- Foregoes zstd’s speed advantage. In workloads with large sections (tens of KB+), zstd would decompress faster. TeaLeaf’s current section sizes don’t reach this threshold.
Neutral
- Future versions can reconsider. If TeaLeaf v3 introduces large-section use cases (e.g., embedded binary blobs), zstd could be adopted with a format version bump. This ADR applies to binary format v2 only.
Architecture
High-level architecture of the TeaLeaf project.
Crate Structure
tealeaf/
├── tealeaf-core/ # Core library + CLI
│ ├── src/
│ │ ├── main.rs # CLI entry point
│ │ ├── lib.rs # Public API (TeaLeaf, Value, Schema, traits)
│ │ ├── reader.rs # Binary file reader
│ │ ├── writer.rs # Binary file writer (compiler)
│ │ ├── builder.rs # TeaLeafBuilder fluent API
│ │ └── convert.rs # ToTeaLeaf/FromTeaLeaf trait impls for primitives
│ └── tests/
│ ├── canonical.rs # Canonical fixture tests
│ └── derive.rs # Derive macro tests
│
├── tealeaf-derive/ # Proc-macro crate
│ ├── lib.rs # Macro entry points
│ ├── attrs.rs # Attribute parsing
│ ├── to_tealeaf.rs # ToTeaLeaf derive implementation
│ ├── from_tealeaf.rs # FromTeaLeaf derive implementation
│ ├── schema.rs # Schema generation logic
│ └── util.rs # Shared utilities
│
├── tealeaf-ffi/ # C FFI layer
│ ├── src/lib.rs # All FFI exports
│ └── build.rs # cbindgen header generation
│
├── bindings/dotnet/ # .NET bindings
│ ├── TeaLeaf.Annotations/ # Attribute definitions
│ ├── TeaLeaf.Generators/ # Source generator
│ ├── TeaLeaf/ # Managed wrappers + serializer
│ └── TeaLeaf.Tests/ # Test project
│
├── canonical/ # Canonical test fixtures
│ ├── samples/ # .tl text files
│ ├── expected/ # Expected .json outputs
│ ├── binary/ # Pre-compiled .tlbx files
│ └── errors/ # Invalid files for error testing
│
└── spec/ # Format specification
└── TEALEAF_SPEC.md
Data Flow
Parse Pipeline
Text input (.tl)
│
▼
Lexer → Token stream
│
▼
Parser → AST (directives + key-value pairs)
│
├── Schema definitions → IndexMap<String, Schema>
├── Reference definitions → resolved inline
└── Key-value pairs → IndexMap<String, Value>
│
▼
TeaLeaf { schemas, data }
Compile Pipeline
TeaLeaf { schemas, data }
│
▼
String collector → String table (deduplicated)
│
▼
Schema encoder → Schema table (binary)
│
▼
Value encoder → Data sections (per key)
│ │
│ ├── Primitives → fixed-size encoding
│ ├── Strings → string table index (u32)
│ ├── Struct arrays → null bitmap + positional values
│ └── Other → type-tagged encoding
│
▼
Compressor (per section, if > 64 bytes)
│
▼
Writer → .tlbx file
├── Header (64 bytes)
├── String table
├── Schema table
├── Section index
└── Data sections
Read Pipeline
.tlbx file
│
▼
Reader (or MmapReader)
│
├── Header validation (magic, version)
├── String table → lazy access
├── Schema table → lazy access
└── Section index → key → offset mapping
│
▼
Value access (by key)
│
├── Locate section in index
├── Decompress if needed
├── Decode value by type code
└── Return Value enum
Key Design Decisions
Positional Schema Encoding
Field names appear only in the schema table. Data rows use position to identify fields. This trades readability of binary for compactness.
Per-Section Compression
Each top-level key is a separate section compressed independently. This allows:
- Random access without decompressing the entire file
- Selective decompression (only read sections you need)
Thread-Local Error Handling (FFI)
The FFI uses thread-local storage for error messages instead of out-parameters or exceptions. This simplifies the C API while remaining thread-safe.
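A minimal sketch of this pattern on the Rust side (illustrative names, not the crate's exact internals):

use std::cell::RefCell;
use std::ffi::CString;
use std::os::raw::c_char;

thread_local! {
    // One error slot per thread: concurrent FFI calls on different threads
    // can never clobber each other's messages, and no locking is required.
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

fn set_last_error(msg: String) {
    LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(msg));
}

// Caller-owned C string, in the spirit of tl_get_last_error; the caller
// frees it with the library's string-free function.
#[no_mangle]
pub extern "C" fn demo_get_last_error() -> *mut c_char {
    LAST_ERROR.with(|slot| match slot.borrow_mut().take() {
        Some(msg) => CString::new(msg)
            .map(CString::into_raw)
            .unwrap_or(std::ptr::null_mut()),
        None => std::ptr::null_mut(),
    })
}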
Source Generator vs Reflection
The .NET binding offers both approaches because:
- Source generators produce optimal code but require
partialclasses - Reflection works with any type but is slower
- Both share the same native library for actual encoding/decoding
Insertion Order Preservation (IndexMap)
All user-facing maps use IndexMap instead of HashMap to preserve insertion order across format conversions. Internal lookup tables (string interning, schema/union resolution, caches) remain HashMap for performance. See ADR-0001 for the full decision record including benchmark impact.
No Schema Versioning
TeaLeaf deliberately avoids schema evolution machinery. The rationale:
- Simpler implementation and specification
- Source file is always the truth
- Recompilation is explicit and deterministic
- Applications that need evolution can layer it on top
Binary Encoding Details
Deep dive into how values are encoded in the .tlbx binary format.
Encoding Strategy
The encoder selects the encoding strategy based on value type and context:
Top-Level Values
Each top-level key-value pair becomes a section in the binary file. The section’s type code and flags determine how to decode it.
Primitive Encoding
| Type | Encoding | Size |
|---|---|---|
| Null | Nothing (type code alone) | 0 bytes |
| Bool | 0x00 or 0x01 | 1 byte |
| Int8 | Signed byte | 1 byte |
| Int16 | 2 bytes, little-endian | 2 bytes |
| Int32 | 4 bytes, little-endian | 4 bytes |
| Int64 | 8 bytes, little-endian | 8 bytes |
| UInt8-64 | Same as signed, unsigned | 1-8 bytes |
| Float32 | IEEE 754, little-endian | 4 bytes |
| Float64 | IEEE 754, little-endian | 8 bytes |
| String | u32 string table index | 4 bytes |
| Bytes | varint length + raw data | variable |
| Timestamp | i64 Unix ms + i16 tz offset (minutes), LE | 10 bytes |
Integer Size Selection
The writer automatically selects the smallest representation:
Value::Int(42) → Int8 (1 byte) // fits in i8
Value::Int(1000) → Int16 (2 bytes) // fits in i16
Value::Int(100000) → Int32 (4 bytes) // fits in i32
Value::Int(5×10⁹) → Int64 (8 bytes) // needs i64
Struct Array Encoding
The most optimized encoding path is for arrays of schema-typed objects:
┌──────────────────────┐
│ Count: u32 │ Number of rows
│ Schema Index: u16 │ Which schema these rows follow
│ Null Bitmap Size: u16│ Bytes per row for null tracking
├──────────────────────┤
│ Row 0: │
│ Null Bitmap: [u8] │ One bit per field (1 = null)
│ Field 0 data │ Only if not null
│ Field 1 data │ Only if not null
│ ... │
├──────────────────────┤
│ Row 1: │
│ Null Bitmap: [u8] │
│ Field data... │
├──────────────────────┤
│ ... │
└──────────────────────┘
Null Bitmap
- Size: (field_count + 7) / 8 bytes per row (integer division, i.e. ceil(field_count / 8))
- Bit i set = field i is null
- Only non-null fields have data written
For a schema with 5 fields, the bitmap is 1 byte. If bit 2 is set, field 2 is null and its data is skipped.
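A sketch of the bitmap arithmetic (LSB-first bit order within each byte is an assumption of this sketch, not confirmed by the spec text above):

// Bytes needed per row: one bit per field, rounded up to whole bytes.
fn bitmap_size(field_count: usize) -> usize {
    (field_count + 7) / 8
}

// Bit i set means field i is null and its data is skipped.
fn field_is_null(bitmap: &[u8], field_index: usize) -> bool {
    (bitmap[field_index / 8] >> (field_index % 8)) & 1 == 1
}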
Field Data
Each non-null field is encoded according to its schema type:
- Primitive types: fixed-size encoding
- String: u32 string table index
- Nested struct: recursively encoded fields (with their own null bitmap)
- Array field: count + typed elements
Homogeneous Array Encoding
Top-level arrays use homogeneous (packed) encoding only for two types:
Integer Arrays (i32 only)
All elements must be Value::Int and fit within the i32 range (-2³¹ to 2³¹ - 1). Integer arrays where any value exceeds i32 fall through to heterogeneous encoding.
Count: u32
Element Type: 0x04 (Int32)
Elements: [i32 × Count] -- packed, no type tags
String Arrays
Count: u32
Element Type: 0x10 (String)
Elements: [u32 × Count] -- string table indices
All Other Top-Level Arrays
Arrays of UInt, Bool, Float, Timestamp, Int64 (values exceeding i32), and mixed-type arrays all use heterogeneous encoding (see below). This keeps the top-level format simple for third-party implementations.
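A sketch of the fast-path check described above (assuming a minimal stand-in for the library's Value enum; illustrative, not the writer's exact code):

enum Value {
    Int(i64),
    Float(f64),
    // ... other variants elided
}

// Packed-i32 fast path: every element must be Value::Int and fit in i32.
// Anything else falls through to heterogeneous (type-tagged) encoding.
fn can_pack_as_i32(items: &[Value]) -> bool {
    items
        .iter()
        .all(|v| matches!(v, Value::Int(n) if i32::try_from(*n).is_ok()))
}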
Schema-Typed Field Arrays
Arrays within struct fields are a separate case — they use homogeneous encoding for their schema-declared type, regardless of the top-level restrictions:
Count: u32
Element Type: u8 (from schema field type)
Elements: [packed data]
Heterogeneous Array Encoding
For mixed-type arrays and all top-level arrays not covered by Int32/String homogeneous encoding:
Count: u32
Element Type: 0xFF (heterogeneous marker)
Elements: [
type: u8, data,
type: u8, data,
...
]
Each element carries its own type tag.
Object Encoding
Field Count: u16
Fields: [
key_idx: u32 (string table index)
type: u8 (value type code)
data: [...] (type-specific encoding)
]
Objects are the untyped key-value container. Unlike struct arrays, each field carries its name and type.
Map Encoding
Count: u32
Entries: [
key_type: u8, key_data: [...],
value_type: u8, value_data: [...],
]
Both keys and values carry type tags.
Reference Encoding
name_idx: u32 (string table index for the reference name)
A reference is just a string table pointer to the target name.
Tagged Value Encoding
tag_idx: u32 (string table index for the tag name)
value_type: u8 (type code of the inner value)
value_data: [...] (type-specific encoding of the inner value)
Varint Encoding
Used for bytes length:
Value: 300 (0x012C)
Encoded: 0xAC 0x02
Bit layout:
0xAC = 1_0101100 → continuation bit set, value bits: 0101100 (44)
0x02 = 0_0000010 → no continuation, value bits: 0000010 (2)
Result: 44 + (2 << 7) = 44 + 256 = 300
- Continuation bit: 0x80 – if set, more bytes follow
- 7 value bits per byte
- Least-significant group first
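A minimal decoder for this layout (a sketch):

// 7 value bits per byte, least-significant group first; 0x80 marks continuation.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in bytes.iter().enumerate().take(10) { // a u64 needs at most 10 bytes
        value |= u64::from(b & 0x7F) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1)); // (decoded value, bytes consumed)
        }
    }
    None // input ended mid-varint (or varint too long)
}

// decode_varint(&[0xAC, 0x02]) == Some((300, 2)), matching the example above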
Compression
Applied per section:
- Check if uncompressed size > 64 bytes
- Compress with ZLIB (deflate)
- If compressed size < 90% of original, use compressed version
- Set compression flag in section index entry
- Store both size (compressed) and uncompressed_size in the index
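Putting those steps together, a sketch using flate2 (the crate named in ADR-0004; the 64-byte threshold and 90% cutoff are the documented values):

use flate2::{write::ZlibEncoder, Compression};
use std::io::Write;

// Returns the section payload and whether the compression flag should be set.
fn maybe_compress(section: &[u8]) -> (Vec<u8>, bool) {
    if section.len() <= 64 {
        return (section.to_vec(), false); // below the 64-byte threshold
    }
    let mut enc = ZlibEncoder::new(Vec::new(), Compression::default());
    enc.write_all(section).expect("in-memory write cannot fail");
    let compressed = enc.finish().expect("in-memory finish cannot fail");
    // Keep the compressed form only if it beats 90% of the original size
    if compressed.len() * 10 < section.len() * 9 {
        (compressed, true)
    } else {
        (section.to_vec(), false)
    }
}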
String Table
The string table is a core component of the binary format that provides string deduplication.
Purpose
In a typical document with 1,000 user records, field values like "active", "Engineering", or city names repeat frequently. Without deduplication, each occurrence stores the full string. The string table stores each unique string once and uses 4-byte indices everywhere else.
Structure
┌─────────────────────────────┐
│ Total Size: u32 │ Size of the entire string table section
│ Count: u32 │ Number of unique strings
├─────────────────────────────┤
│ Offsets: [u32 × Count] │ Byte offset of each string in the data section
│ Lengths: [u32 × Count] │ Length of each string (up to 4 GB)
├─────────────────────────────┤
│ String Data: [u8...] │ Concatenated UTF-8 string data
└─────────────────────────────┘
How It Works
During Compilation
- The writer traverses all values in the document
- Every unique string is collected (keys, string values, schema names, field names, ref names, tag names)
- Duplicates are eliminated
- Each string gets an index (0, 1, 2, …)
- The string table is written first in the file
- All subsequent encoding uses indices instead of raw strings
During Reading
- The reader loads the string table at startup
- When decoding a string value, it reads a
u32index - The index maps to an offset and length in the string data
- The string is read from the data section
Lookup Performance
String table access is O(1) by index:
index → offsets[index] → offset in data section
index → lengths[index] → number of bytes to read
string = data[offset..offset+length]
Size Impact
Example: 1,000 Users with 5 Fields
Without deduplication:
- Field names repeated 1,000 times each
- Common values (“active”, “Engineering”) repeated many times
- Estimated overhead: ~20-30 KB just for repeated strings
With string table:
- Each unique string stored once
- References are 4 bytes each
- Estimated savings: 60-80% on string data
Extreme Case: Large Tabular Data
For 10,000 rows with 10 fields, field names alone would consume:
| Approach | Field Name Storage |
|---|---|
| JSON (per-field) | ~10 × 10,000 × avg(8 bytes) = ~800 KB |
| TeaLeaf (string table) | 10 × avg(8 bytes) + 100,000 × 4 bytes = ~400 KB |
| TeaLeaf with schema | 10 × avg(8 bytes) = ~80 bytes (field names in schema only!) |
With schema-typed data, field names appear only in the schema table – the string table contains only the actual string values.
What Gets Deduplicated
| String Source | Deduplicated? |
|---|---|
| Top-level key names | Yes |
| Object field names | Yes |
| String values | Yes |
| Schema names | Yes |
| Schema field names | Yes |
| Reference names | Yes |
| Tag names | Yes |
Maximum String Length
String lengths are stored as u32, supporting individual strings up to ~4 GB. The total string table size (all strings + metadata) is also capped at u32::MAX by the table’s Size header field.
Interaction with Compression
The string table itself is not compressed (it’s needed for decoding). However, data sections that reference the string table benefit doubly:
- String references are 4 bytes (already compact)
- ZLIB compression can further compress repetitive index patterns
Implementation Note
The string table uses a HashMap<String, u32> during compilation for O(1) dedup lookups. The final table is written as parallel arrays (offsets + lengths + data) for O(1) indexed access during reading.
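A minimal sketch of that interning scheme:

use std::collections::HashMap;

// Each unique string gets a sequential index; repeats return the existing one.
#[derive(Default)]
struct StringTable {
    map: HashMap<String, u32>, // O(1) dedup lookups during compilation
    strings: Vec<String>,      // later flattened into offsets/lengths/data arrays
}

impl StringTable {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.map.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.map.insert(s.to_owned(), idx);
        self.strings.push(s.to_owned());
        idx
    }
}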
Schema Inference
TeaLeaf can automatically infer schemas from JSON arrays of uniform objects. This page explains the algorithm.
When Schema Inference Runs
Schema inference is triggered by:
- tealeaf from-json CLI command
- tealeaf json-to-tlbx CLI command
- TeaLeaf::from_json_with_schemas() Rust API
It is not triggered by:
- TeaLeaf::from_json() (plain import, no schemas)
- TLDocument.FromJson() (.NET API – plain import)
Algorithm
Step 1: Array Detection
Scan top-level JSON values for arrays where all elements are objects with identical key sets:
{
"users": [ // ← Candidate: array of uniform objects
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"}
],
"tags": ["a", "b"], // ← Not candidate: array of strings
"config": {...} // ← Not candidate: not an array
}
Step 2: Name Inference
The schema name is derived from the parent key by singularization:
| Key | Inferred Schema Name |
|---|---|
"users" | user |
"products" | product |
"employees" | employee |
"addresses" | address |
"data" | data (already singular) |
"items_list" | items_list (compound, kept as-is) |
Basic singularization rules:
- Remove trailing s if the word doesn’t end in ss
- Remove trailing es for -es words
- Trailing ies → y
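One plausible ordering of these rules that reproduces the table above (a sketch; the implementation's exact rule order and -es detection may differ):

fn singularize(key: &str) -> String {
    if let Some(stem) = key.strip_suffix("ies") {
        return format!("{stem}y"); // e.g. "categories" -> "category"
    }
    if let Some(stem) = key.strip_suffix("es") {
        // Treat "-es" words like "addresses" -> "address", but leave
        // "employees" for the plain "s" rule below (assumed heuristic)
        if stem.ends_with('s') || stem.ends_with('x') || stem.ends_with("ch") || stem.ends_with("sh") {
            return stem.to_string();
        }
    }
    if let Some(stem) = key.strip_suffix('s') {
        if !key.ends_with("ss") {
            return stem.to_string(); // "users" -> "user"
        }
    }
    key.to_string() // "data", "items_list": kept as-is
}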
Step 3: Type Inference
For each field, scan all array elements to determine the type:
| JSON Values Seen | Inferred TeaLeaf Type |
|---|---|
| All integers | int |
| All numbers (mixed int/float) | float |
| All strings | string |
| All booleans | bool |
| All objects (uniform keys) | Nested struct reference |
| All arrays | Inferred element type |
| Mixed types | string (fallback) |
Step 4: Nullable Detection
If any element has null for a field, that field becomes nullable:
[
{"id": 1, "email": "alice@ex.com"},
{"id": 2, "email": null} // ← email becomes string?
]
Step 5: Nested Schema Inference
If a field’s value is an object across all array elements, and those objects have identical keys, a nested schema is created:
{
"users": [
{"name": "Alice", "address": {"city": "Seattle", "zip": "98101"}},
{"name": "Bob", "address": {"city": "Austin", "zip": "78701"}}
]
}
Inferred schemas:
@struct address (city: string, zip: string)
@struct user (address: address, name: string)
This is recursive – nested objects can have their own nested schemas.
Output
The inferred schemas are:
- Added to the document as @struct definitions
- The original JSON arrays are converted to @table tuples
- Written in the output file before the data
Example
Input JSON:
{
"products": [
{"id": 1, "name": "Widget", "price": 9.99, "in_stock": true},
{"id": 2, "name": "Gadget", "price": 24.99, "in_stock": false}
]
}
Output TeaLeaf:
@struct product (id: int, in_stock: bool, name: string, price: float)
products: @table product [
(1, true, Widget, 9.99),
(2, false, Gadget, 24.99),
]
Limitations
- Field order – JSON objects have no guaranteed order. Fields are sorted alphabetically in the inferred schema.
- Type ambiguity – JSON numbers don’t distinguish int from float. If any element has a decimal, the field becomes float.
- Non-uniform arrays – arrays where objects have different key sets are not schema-inferred. They remain as plain arrays of objects.
- Deeply nested arrays – only the first level of array → schema inference is applied. Nested arrays within objects are not auto-inferred.
- No timestamp detection – ISO 8601 strings in JSON remain as strings, not timestamps.
Testing
TeaLeaf has a comprehensive test suite spanning the Rust core, FFI layer, and .NET bindings.
Test Structure
tealeaf/
├── tealeaf-core/tests/
│ ├── canonical.rs # Canonical fixture round-trip tests
│ └── derive.rs # Derive macro tests
│
├── tealeaf-ffi/src/lib.rs # FFI safety tests (inline #[cfg(test)])
│
├── bindings/dotnet/
│ ├── TeaLeaf.Tests/ # .NET unit tests
│ └── TeaLeaf.Generators.Tests/ # Source generator tests
│
└── canonical/ # Shared test fixtures
├── samples/ # .tl text files (14 canonical samples)
├── expected/ # Expected .json outputs
├── binary/ # Pre-compiled .tlbx files
└── errors/ # Invalid files for error testing
Running Tests
Rust
# All Rust tests
cargo test --workspace
# Core tests only
cargo test --package tealeaf-core
# Derive macro tests
cargo test --package tealeaf-core --test derive
# Canonical fixture tests
cargo test --package tealeaf-core --test canonical
# FFI tests
cargo test --package tealeaf-ffi
.NET
cd bindings/dotnet
dotnet test
Everything
# Rust
cargo test --workspace
# .NET
cd bindings/dotnet && dotnet test
Canonical Test Fixtures
The canonical/ directory contains 14 sample files that test every feature:
| Sample | Features Tested |
|---|---|
primitives | All primitive types (bool, int, float, string, null) |
arrays | Simple and nested arrays |
objects | Nested objects |
schemas | @struct definitions and @table usage |
nested_schemas | Struct-referencing-struct |
deep_nesting | Multi-level struct nesting |
nullable | Nullable fields with ~ values |
maps | @map with various key types |
references | !ref definitions and usage |
tagged | :tag value tagged values |
timestamps | ISO 8601 timestamp parsing |
mixed | Combination of multiple features |
comments | Comment handling |
strings | Quoted, unquoted, multiline strings |
Each sample has:
- `canonical/samples/{name}.tl` – the text source
- `canonical/expected/{name}.json` – expected JSON output
- `canonical/binary/{name}.tlbx` – pre-compiled binary
Canonical Test Pattern
#[test]
fn test_canonical_sample() {
    let tl = TeaLeaf::load("canonical/samples/primitives.tl").unwrap();
    // Round-trip: text → binary → text
    let tmp = tempfile::NamedTempFile::new().unwrap();
    tl.compile(tmp.path(), true).unwrap();
    let reader = Reader::open(tmp.path()).unwrap();
    // Verify values match
    assert_eq!(reader.get("count").unwrap().as_int(), Some(42));
    // JSON output matches expected
    let json = tl.to_json().unwrap();
    let expected = std::fs::read_to_string("canonical/expected/primitives.json").unwrap();
    assert_json_eq(&json, &expected);
}
Error Fixtures
The canonical/errors/ directory contains intentionally invalid files:
| File | Error Tested |
|---|---|
| Invalid syntax | Parser error handling |
| Missing struct | @table references undefined schema |
| Type mismatches | Schema validation |
| Malformed binary | Binary reader error handling |
Derive Macro Tests
Tests for #[derive(ToTeaLeaf, FromTeaLeaf)]:
- Basic struct serialization/deserialization
- All attribute combinations (`rename`, `skip`, `optional`, `type`, `flatten`, `default`) – sketched below
- Nested structs
- Enum variants
- Collection types (`Vec`, `HashMap`, `IndexMap`, `Option`)
- Edge cases (empty structs, single-field structs)
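For orientation, a hedged sketch of the derive surface these tests exercise; the `#[tealeaf(...)]` attribute spelling is an assumption, so consult `tealeaf-derive` for the real syntax:

```rust
use tealeaf_derive::{FromTeaLeaf, ToTeaLeaf}; // assumed import path

#[derive(ToTeaLeaf, FromTeaLeaf)]
struct User {
    #[tealeaf(rename = "user_id")] // hypothetical attribute syntax
    id: u32,
    name: String,
    #[tealeaf(default)] // hypothetical attribute syntax
    tags: Vec<String>,
    #[tealeaf(skip)] // hypothetical attribute syntax
    cache: Option<String>,
}
```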
.NET Test Categories
The .NET test suite covers:
Source Generator Tests
- Schema generation for all type combinations
- Serialization output (text, JSON, binary)
- Deserialization from documents
- Nested types and collections
- Enum handling
- Attribute processing
Reflection Serializer Tests
- Generic serialization/deserialization
- Type mapping accuracy
- Nullable handling
- Dictionary and List support
Native Type Tests
- `TLDocument` lifecycle (parse, access, dispose)
- `TLValue` type accessors
- `TLReader` binary access
- Error handling (disposed objects, missing keys)
DTO Serialization Tests
- Full round-trip (C# object → TeaLeaf → C# object)
- Edge cases (empty strings, nulls, large numbers)
- Collection serialization
Test Philosophy
- Canonical fixtures – shared across Rust and .NET, ensuring format consistency
- Round-trip testing – text → binary → text verifies no data loss
- JSON equivalence – text → JSON and binary → JSON produce identical output
- Error coverage – every error path has at least one test
- Cross-language – same fixtures tested in Rust, .NET, and via FFI
Benchmarks
TeaLeaf includes a Criterion-based benchmark suite that measures encode/decode performance and output size across multiple serialization formats.
Running Benchmarks
# Run all benchmarks
cargo bench -p tealeaf-core
# Run a specific scenario
cargo bench -p tealeaf-core -- small_object
cargo bench -p tealeaf-core -- large_array_1000
cargo bench -p tealeaf-core -- tabular_5000
# List available benchmarks
cargo bench -p tealeaf-core -- --list
Results are saved to target/criterion/ with HTML reports and JSON data. Criterion tracks historical performance across runs.
Formats Compared
Each scenario benchmarks encode and decode across six formats:
| Format | Library | Notes |
|---|---|---|
| TeaLeaf Parse | tealeaf | Text parsing (.tl → in-memory) |
| TeaLeaf Binary | tealeaf | Binary compile/read (.tlbx) |
| JSON | serde_json | Standard JSON serialization |
| MessagePack | rmp_serde | Binary, schemaless |
| CBOR | ciborium | Binary, schemaless |
| Protobuf | prost | Binary with generated code from .proto definitions |
Note: Protobuf benchmarks use `prost` with code generation via `build.rs`. The generated structs have known field offsets at compile time, giving Protobuf a structural speed advantage over TeaLeaf's dynamic key-based access.
Benchmark Scenarios
| Group | Data Shape | Sizes | What It Tests |
|---|---|---|---|
small_object | Config-like object | 1 | Header overhead, small payload efficiency |
large_array_100 | Array of Point structs | 100 | Array encoding at small scale |
large_array_1000 | Array of Point structs | 1,000 | Array encoding at medium scale |
large_array_10000 | Array of Point structs | 10,000 | Array encoding at large scale, throughput |
nested_structs | Nested objects | 2 levels | Nesting overhead |
nested_structs_100 | Nested objects | 100 levels | Deep nesting scalability |
mixed_types | Heterogeneous data | 1 | Strings, numbers, booleans mixed |
tabular_100 | @table User records | 100 | Schema-bound tabular data, small |
tabular_1000 | @table User records | 1,000 | Schema-bound tabular data, medium |
tabular_5000 | @table User records | 5,000 | Schema-bound tabular data, large |
Each group measures both encode (serialize) and decode (deserialize) operations, using Throughput::Elements for per-element metrics on scaled scenarios.
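For orientation, here is the shape of one scaled Criterion group in that style; the workload is a placeholder rather than real TeaLeaf encoding, and the names are illustrative:

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn bench_encode(c: &mut Criterion) {
    let mut group = c.benchmark_group("large_array");
    for size in [100u64, 1_000, 10_000] {
        // Per-element metrics, as described above.
        group.throughput(Throughput::Elements(size));
        group.bench_with_input(BenchmarkId::from_parameter(size), &size, |b, &n| {
            b.iter(|| (0..n).sum::<u64>()); // placeholder for encode work
        });
    }
    group.finish();
}

criterion_group!(benches, bench_encode);
criterion_main!(benches);
```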
Size Comparison Results
From cargo run --example size_report on tealeaf-core:
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| TeaLeaf Binary | 3.56x | 0.15x | 0.47x |
Key observations:
- Small objects: TeaLeaf has a 64-byte header overhead. For objects under ~200 bytes, JSON or MessagePack are more compact.
- Large arrays: String deduplication and schema-based compression produce 6-7x better compression than JSON for 10K+ records.
- Tabular data: `@table` encoding with positional storage is competitive with Protobuf, with the advantage of embedded schemas.
Speed Characteristics
TeaLeaf’s dynamic key-based access is ~2-5x slower than Protobuf’s generated code:
| Operation | TeaLeaf | Protobuf | JSON (serde) |
|---|---|---|---|
| Parse text | Moderate | N/A | Fast |
| Decode binary | Moderate | Fast | N/A |
| Random key access | O(1) hash | O(1) field | O(n) parse |
Why TeaLeaf is slower than Protobuf:
- Dynamic dispatch – fields resolved by name at runtime; Protobuf uses generated code with known offsets
- String table lookup – each string access requires a table lookup
- Schema resolution – schema structure parsed from binary at load time
When this matters:
- Hot loops decoding millions of records → consider Protobuf
- Cold reads or moderate throughput → TeaLeaf is fine
- Size-constrained transmission → TeaLeaf’s smaller binary compensates for slower decode
Code Structure
tealeaf-core/benches/
├── benchmarks.rs # Entry point: criterion_group + criterion_main
├── common/
│ ├── mod.rs # Module exports
│ ├── data.rs # Test data generation functions
│ └── structs.rs # Rust struct definitions (serde-compatible)
└── scenarios/
├── mod.rs # Module exports
├── small_object.rs # Small config object benchmarks
├── large_array.rs # Scaled array benchmarks (100-10K)
├── nested_structs.rs # Nesting depth benchmarks (2-100)
├── mixed_types.rs # Heterogeneous data benchmarks
└── tabular_data.rs # @table User record benchmarks (100-5K)
Each scenario module exports bench_encode and bench_decode functions. Scaled scenarios accept a size parameter.
For optimization tips and practical guidance on when to use each format, see Performance.
Accuracy Benchmark
The accuracy benchmark suite evaluates LLM providers’ ability to analyze structured data in TeaLeaf format. It sends analysis prompts with TeaLeaf-formatted business data to multiple providers and scores the responses.
Overview
The workflow:
- Takes JSON data from various business domains
- Converts it to TeaLeaf format using `tealeaf-core`
- Sends analysis prompts to multiple LLM providers
- Evaluates and compares the responses using a scoring framework
Supported Providers
| Provider | Environment Variable | Model |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Claude Sonnet 4.5 (Extended Thinking) |
| OpenAI | OPENAI_API_KEY | GPT-5.2 |
Installation
Pre-built Binaries
Download the latest release from GitHub Releases:
| Platform | Architecture | File |
|---|---|---|
| Windows | x64 | accuracy-benchmark-windows-x64.zip |
| Windows | ARM64 | accuracy-benchmark-windows-arm64.zip |
| macOS | Intel | accuracy-benchmark-macos-x64.tar.gz |
| macOS | Apple Silicon | accuracy-benchmark-macos-arm64.tar.gz |
| Linux | x64 | accuracy-benchmark-linux-x64.tar.gz |
| Linux | ARM64 | accuracy-benchmark-linux-arm64.tar.gz |
| Linux | x64 (static) | accuracy-benchmark-linux-musl-x64.tar.gz |
Build from Source
cargo build -p accuracy-benchmark --release
# Or run directly
cargo run -p accuracy-benchmark -- --help
Usage
# Run with all available providers
cargo run -p accuracy-benchmark -- run
# Run with specific providers
cargo run -p accuracy-benchmark -- run --providers anthropic,openai
# Run specific categories only
cargo run -p accuracy-benchmark -- run --categories finance,retail
# Compare TeaLeaf vs JSON format performance
cargo run -p accuracy-benchmark -- run --compare-formats
# Verbose output
cargo run -p accuracy-benchmark -- -v run
# List available tasks
cargo run -p accuracy-benchmark -- list-tasks
# Generate configuration template
cargo run -p accuracy-benchmark -- init-config -o my-config.json
Benchmark Tasks
The suite includes 12 tasks across 10 business domains:
| ID | Domain | Complexity | Output Type |
|---|---|---|---|
| FIN-001 | Finance | Simple | Calculation |
| FIN-002 | Finance | Moderate | Calculation |
| RET-001 | Retail | Simple | Summary |
| RET-002 | Retail | Complex | Recommendation |
| HLT-001 | Healthcare | Simple | Summary |
| TEC-001 | Technology | Moderate | Analysis |
| MKT-001 | Marketing | Moderate | Calculation |
| LOG-001 | Logistics | Moderate | Analysis |
| HR-001 | HR | Moderate | Analysis |
| MFG-001 | Manufacturing | Moderate | Calculation |
| RE-001 | Real Estate | Complex | Recommendation |
| LEG-001 | Legal | Complex | Analysis |
Data Sources
Each task specifies input data in one of two ways:
Inline JSON:
BenchmarkTask::new("FIN-001", "finance", "Analyze this data:\n\n{tl_data}")
    .with_json_data(serde_json::json!({
        "revenue": 1000000,
        "expenses": 750000
    }))
JSON file reference:
BenchmarkTask::new("LOG-001", "logistics", "Analyze this data:\n\n{tl_data}")
    .with_json_file("tasks/logistics/data/shipments.json")
The {tl_data} placeholder in the prompt template is replaced with TeaLeaf-formatted data before sending to the LLM.
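The substitution itself is plain string templating; a minimal sketch (the helper name is hypothetical):

```rust
fn render_prompt(template: &str, tl_data: &str) -> String {
    template.replace("{tl_data}", tl_data)
}

fn main() {
    let prompt = render_prompt(
        "Analyze this data:\n\n{tl_data}",
        "revenue: 1000000\nexpenses: 750000",
    );
    assert!(prompt.contains("revenue: 1000000"));
}
```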
Analysis Framework
Accuracy Metrics
Responses are evaluated across five dimensions:
| Metric | Weight | Description |
|---|---|---|
| Completeness | 25% | Were all expected elements addressed? |
| Relevance | 25% | How relevant is the response to the task? |
| Coherence | 20% | Is the response well-structured? |
| Factual Accuracy | 20% | Do values match validation patterns? |
| Actionability | 10% | For recommendations – are they actionable? |
Element Detection
Each task defines expected elements that should appear in the response:
// Keyword presence check
.expect("metric", "Total revenue calculation", true)
// Regex pattern validation
.expect_with_pattern("metric", "Percentage value", true, r"\d+\.?\d*%")
- Without pattern: checks for keyword presence from description
- With pattern: validates using regex (e.g., `\$[\d,]+` for dollar amounts)
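An illustrative version of that check, assuming the `regex` crate (the helper name is hypothetical):

```rust
use regex::Regex;

fn element_present(response: &str, keyword: &str, pattern: Option<&str>) -> bool {
    match pattern {
        // With a pattern: regex validation, e.g. r"\$[\d,]+" for dollar amounts
        Some(p) => Regex::new(p).map(|re| re.is_match(response)).unwrap_or(false),
        // Without: case-insensitive keyword presence from the description
        None => response.to_lowercase().contains(&keyword.to_lowercase()),
    }
}
```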
Scoring Rubrics
Different rubrics apply based on output type:
| Output Type | Key Criteria |
|---|---|
| Calculation | Numeric content (5+ numbers), structured output |
| Analysis | Depth, structure, evidence with data |
| Recommendation | Actionable language, prioritization, justification |
| Summary | Completeness, conciseness, organization |
Coherence Checks
- Structure markers: `##`, `###`, `**`, `-`, numbered lists
- Paragraph breaks (3+ paragraphs preferred)
- Reasonable length (100-2000 words)
Actionability Keywords
For recommendation tasks, these keywords are detected:
- recommend, should, suggest, consider, advise
- action, implement, improve, optimize, prioritize
- next step, immediate, critical, important
Format Comparison Results
Run with --compare-formats to compare TeaLeaf vs JSON input efficiency.
Sample run from February 5, 2026 with Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) and GPT-5.2 (gpt-5.2-2025-12-11):
| Provider | TeaLeaf Score | JSON Score | Accuracy Diff | TeaLeaf Input | JSON Input | Input Savings |
|---|---|---|---|---|---|---|
| anthropic | 0.988 | 0.978 | +0.010 | 5,793 | 8,275 | -30.0% |
| openai | 0.901 | 0.899 | +0.002 | 4,868 | 7,089 | -31.3% |
Input tokens = data sent to the model. Output tokens vary by model verbosity.
Key findings:
| Provider | Accuracy | Data Token Efficiency |
|---|---|---|
| anthropic | Comparable (+1.0%) | TeaLeaf uses ~36% fewer data tokens |
| openai | Comparable (+0.2%) | TeaLeaf uses ~36% fewer data tokens |
TeaLeaf data payloads use ~36% fewer tokens than equivalent JSON (median across 12 tasks, validated with tiktoken). Total input savings are ~30% because shared instruction text dilutes the data-only difference. Savings increase with larger and more structured datasets.
Sample Results: Reference benchmark results are available in `accuracy-benchmark/results/sample/` in the repository.
Output Files
Results are saved in two formats:
TeaLeaf Format (analysis.tl)
# Accuracy Benchmark Results
# Generated: 2026-02-05 15:29:42 UTC
run_metadata: {
run_id: "20260205-152419",
started_at: 2026-02-05T15:24:19Z,
completed_at: 2026-02-05T15:29:42Z,
total_tasks: 12,
providers: [anthropic, openai]
}
responses: @table api_response [
(FIN-001, openai, "gpt-5.2-2025-12-11", 315, 490, 6742, 2026-02-05T15:24:38Z, success),
(FIN-001, anthropic, "claude-sonnet-4-5-20250929", 396, 1083, 12309, 2026-02-05T15:24:31Z, success),
...
]
analysis_results: @table analysis_result [
(FIN-001, openai, 0.667, 0.625, 0.943, 0.000),
(FIN-001, anthropic, 1.000, 1.000, 1.000, 1.000),
...
]
comparisons: @table comparison_result [
(FIN-001, [anthropic, openai], anthropic, 0.389),
(RET-001, [anthropic, openai], anthropic, 0.047),
...
]
summary: {
total_tasks: 12,
wins: { anthropic: 11, openai: 1 },
avg_scores: { anthropic: 0.988, openai: 0.901 },
by_category: { ... },
by_complexity: { ... }
}
JSON Summary (summary.json)
{
"run_id": "20260205-152419",
"total_tasks": 12,
"provider_rankings": [
{ "provider": "anthropic", "wins": 11, "avg_score": 0.988 },
{ "provider": "openai", "wins": 1, "avg_score": 0.901 }
],
"category_breakdown": {
"retail": { "leader": "anthropic", "margin": 0.111 },
"finance": { "leader": "anthropic", "margin": 0.197 },
...
},
"detailed_results_file": "analysis.tl"
}
Adding Custom Tasks
From JSON Data
BenchmarkTask::new(
    "CUSTOM-001",
    "custom_category",
    "Analyze this data:\n\n{tl_data}\n\nProvide summary and recommendations."
)
.with_json_file("tasks/custom/data/my_data.json")
.with_complexity(Complexity::Moderate)
.with_output_type(OutputType::Analysis)
.expect("summary", "Data overview", true)
.expect_with_pattern("metric", "Total value", true, r"\d+")
From TeaLeaf File
cargo run -p accuracy-benchmark -- run --tasks path/to/tasks.tl
Extending Providers
Implement the LLMProvider trait:
#[async_trait]
impl LLMProvider for NewProviderClient {
    fn name(&self) -> &str { "newprovider" }

    async fn complete(&self, request: CompletionRequest) -> ProviderResult<CompletionResponse> {
        // Implementation goes here; todo!() keeps the stub type-checking.
        todo!()
    }
}
Then register in src/providers/mod.rs via create_all_providers() and create_providers().
Directory Structure
accuracy-benchmark/
├── src/
│ ├── main.rs # CLI interface (clap)
│ ├── lib.rs # Library exports
│ ├── config.rs # Configuration management
│ ├── providers/ # LLM provider clients
│ │ ├── traits.rs # LLMProvider trait
│ │ ├── anthropic.rs # Claude implementation
│ │ └── openai.rs # GPT implementation
│ ├── tasks/ # Task definitions
│ │ ├── mod.rs # BenchmarkTask, DataSource
│ │ ├── categories.rs # Domain, Complexity, OutputType
│ │ └── loader.rs # TeaLeaf file loader
│ ├── runner/ # Execution engine
│ │ ├── executor.rs # Parallel task execution
│ │ └── rate_limiter.rs
│ ├── analysis/ # Response analysis
│ │ ├── metrics.rs # AccuracyMetrics
│ │ ├── scoring.rs # ScoringRubric
│ │ └── comparator.rs # Cross-provider comparison
│ └── reporting/ # Output generation
│ └── tl_writer.rs # TeaLeaf format output
├── tasks/ # Sample data by domain
│ ├── finance/data/
│ ├── healthcare/data/
│ ├── retail/data/
│ └── ...
├── results/runs/ # Archived run results
└── Cargo.toml
Adversarial Tests
The adversarial test suite validates TeaLeaf’s error handling and robustness using crafted malformed inputs, binary corruption, compression edge cases, and large-corpus stress tests. All tests are isolated in the adversarial-tests/ directory to avoid touching core project files.
Current count: 58 tests across 9 categories.
Running Tests
# Run all adversarial tests
cd adversarial-tests/core-harness
cargo test --test adversarial
# With output
cargo test --test adversarial -- --nocapture
# Run via script (PowerShell)
./adversarial-tests/scripts/run_core_harness.ps1
# CLI adversarial tests
./adversarial-tests/scripts/run_cli_adversarial.ps1
# .NET adversarial harness
./adversarial-tests/scripts/run_dotnet_harness.ps1
Test Input Files
TeaLeaf Format (.tl) — 13 files
Crafted .tl files testing parser error paths:
| File | Error Tested | Expected |
|---|---|---|
bad_unclosed_string.tl | Unclosed string literal ("Alice) | Parse error |
bad_missing_colon.tl | Missing colon in key-value pair | Parse error |
bad_invalid_escape.tl | Invalid escape sequence (\q) | Parse error |
bad_number_overflow.tl | Number exceeding u64 bounds | See note below |
bad_table_wrong_arity.tl | Table row with wrong field count | Parse error |
bad_schema_unclosed.tl | Unclosed @struct definition | Parse error |
bad_unicode_escape_short.tl | Incomplete \u escape (\u12) | Parse error |
bad_unicode_escape_invalid_hex.tl | Invalid hex in \uZZZZ | Parse error |
bad_unicode_escape_surrogate.tl | Unicode surrogate pair (\uD800) | Parse error |
bad_unterminated_multiline.tl | Unterminated """ multiline string | Parse error |
invalid_utf8.tl | Invalid UTF-8 byte sequence | Parse error |
Note: `bad_number_overflow.tl` does not cause a parse error. Numbers exceeding the i64/u64 range are stored as `Value::JsonNumber` (exact decimal string), not rejected.
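A sketch of what that fallback looks like from the Rust API, assuming these accessor and import shapes:

```rust
use tealeaf::{TeaLeaf, Value}; // assumed import paths

fn main() {
    // Exceeds u64::MAX, so it cannot be stored as a native integer.
    let tl = TeaLeaf::parse("big: 18446744073709551616").expect("not a parse error");
    match tl.get("big") {
        Some(Value::JsonNumber(s)) => assert_eq!(s, "18446744073709551616"),
        other => panic!("unexpected value: {other:?}"),
    }
}
```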
Edge cases that should succeed:
| File | What It Tests | Expected |
|---|---|---|
deep_nesting.tl | 7 levels of nested arrays ([[[[[[[1]]]]]]]) | Parse OK |
empty_doc.tl | Empty document | Parse OK |
JSON Format (.json) — 6 files
Files testing from_json error and edge-case paths:
| File | What It Tests | Expected |
|---|---|---|
invalid_json_trailing.json | Trailing comma or content | Parse error |
invalid_json_unclosed.json | Unclosed object or array | Parse error |
large_number.json | Number overflowing f64 | Stored as JsonNumber |
deep_array.json | Deeply nested arrays | Parse OK |
empty_object.json | Empty JSON object {} | Parse OK |
root_array.json | Root-level array [1,2,3] | Preserved as array |
Binary Format (.tlbx) — 4 files (unused)
These fixture files exist but are not referenced by any test. All binary adversarial tests generate malformed data inline using tempfile::tempdir(). These files are only used by the CLI adversarial scripts in results/cli/.
| File | Content |
|---|---|
bad_magic.tlbx | Invalid magic bytes |
bad_version.tlbx | Invalid version field |
random_garbage.tlbx | Random bytes |
truncated_header.tlbx | Incomplete header |
Test Functions
Parse Error Tests (10 tests)
| Function | Input | Assertion |
|---|---|---|
parse_invalid_syntax_unclosed_string | name: "Alice | is_err() |
parse_invalid_escape_sequence | name: "Alice\q" | is_err() |
parse_missing_colon | name "Alice" | is_err() |
parse_schema_unclosed | Unclosed @struct | is_err() |
parse_table_wrong_arity | 3 fields for 2-field schema | is_err() |
parse_unicode_escape_short | \u12 | is_err() |
parse_unicode_escape_invalid_hex | \uZZZZ | is_err() |
parse_unicode_escape_surrogate | \uD800 | is_err() |
parse_unterminated_multiline_string | """unterminated | is_err() |
from_json_invalid | {"a":1,} (trailing comma) | is_err() |
Success / Edge-Case Parse Tests (3 tests)
| Function | Input | Assertion |
|---|---|---|
parse_number_overflow_falls_to_json_number | 18446744073709551616 | Parse succeeds; stored as Value::JsonNumber |
parse_deep_nesting_ok | [[[[[[[1]]]]]]] | Parse succeeds; get("root") returns value |
from_json_root_array_is_preserved | [1,2,3] | Stored under "root" key as Value::Array |
Error Variant Coverage (5 tests)
Tests that exercise specific Error enum variants for code coverage:
| Function | What It Tests | Assertion |
|---|---|---|
parse_unknown_struct_in_table | @table nonexistent references undefined struct | is_err(); message contains struct name |
parse_unexpected_eof_unclosed_brace | obj: {x: 1, | is_err(); message indicates EOF |
parse_unexpected_eof_unclosed_bracket | arr: [1, 2, | is_err() |
reader_missing_field | reader.get("nonexistent") on valid binary | is_err(); message contains key name |
from_json_large_number_falls_to_json_number | {"big": 18446744073709551616} | Parsed as Value::JsonNumber |
Type Coercion Tests (2 tests)
Validates spec §2.5 best-effort numeric coercion during binary compilation:
| Function | Input | Assertion |
|---|---|---|
writer_int_overflow_coerces_to_zero | int8 field with value 999 | Binary roundtrip produces Value::Int(0) |
writer_uint_negative_coerces_to_zero | uint8 field with value -1 | Binary roundtrip produces Value::UInt(0) |
Binary Reader Tests (4 tests)
| Function | Input | Assertion |
|---|---|---|
reader_rejects_bad_magic | [0x58, 0x58, 0x58, 0x58] | Reader::open().is_err() |
reader_rejects_bad_version | Valid magic + version 3 | Reader::open().is_err() |
load_invalid_file_errors | .tl file with bad syntax | TeaLeaf::load().is_err() |
load_invalid_utf8_errors | [0xFF, 0xFE, 0xFA] | TeaLeaf::load().is_err() |
Binary Corruption Tests (12 tests)
Tests that take valid binary output, corrupt specific bytes, and verify the reader does not panic:
| Function | What It Corrupts |
|---|---|
reader_corrupted_magic_byte | Flips first magic byte |
reader_corrupted_string_table_offset | Points string table offset past EOF |
reader_truncated_string_table | Truncates file right after header |
reader_oversized_string_count | Sets string count to u32::MAX |
reader_oversized_section_count | Sets section count to u32::MAX |
reader_corrupted_schema_count | Sets schema count to u32::MAX |
reader_flipped_bytes_in_section_data | Flips bytes in last 10 bytes of section data |
reader_truncated_compressed_data | Removes last 20 bytes from compressed file |
reader_invalid_zlib_stream | Overwrites data section with 0xBA bytes |
reader_zero_length_file | Empty Vec<u8> |
reader_just_magic_no_header | Only b"TLBX" (4 bytes, no header) |
reader_corrupted_type_code | Replaces a type code byte with 0xFE |
All corruption tests assert no panic. Most also verify that Reader::from_bytes() or reader.get() either returns an error or handles the corruption gracefully.
Compression Stress Tests (4 tests)
| Function | What It Tests |
|---|---|
compression_at_threshold_boundary | Data just over 64 bytes triggers compression attempt; roundtrip OK |
compression_skipped_when_not_beneficial | High-entropy data: compressed file not much larger than raw |
compression_all_identical_bytes | 10K zeros: compressed size < half of raw; roundtrip OK |
compression_below_threshold_stored_raw | Small data with compress=true: stored raw (same size as uncompressed) |
Soak / Large-Corpus Tests (8 tests)
Stress tests for parser, writer, and reader with large inputs:
| Function | Scale | What It Tests |
|---|---|---|
soak_deeply_nested_arrays | 200 levels deep | Parser handles deep nesting without stack overflow |
soak_wide_object | 10,000 fields | Parser and Value::Object handle wide objects |
soak_large_array | 100,000 integers | Parser handles large arrays; first/last element correct |
soak_large_array_binary_roundtrip | 100,000 integers | Compile + read roundtrip with compression |
soak_many_sections | 5,000 top-level keys | Binary writer/reader handles many sections |
soak_many_schemas | 500 @struct definitions | Schema table handles large schema counts |
soak_string_deduplication | 15,000 strings (5K dupes) | String dedup in binary writer; roundtrip correct |
soak_long_string | 1 MB string | Binary writer/reader handles large string values |
Memory-Mapped Reader Tests (10 tests)
Validates Reader::open_mmap() produces identical results to Reader::open() and Reader::from_bytes():
| Function | What It Tests |
|---|---|
mmap_roundtrip_all_primitive_types | Int, float, bool, string, timestamp via mmap |
mmap_roundtrip_containers | Arrays, objects, nested arrays via mmap |
mmap_roundtrip_schemas | @struct + @table data via mmap |
mmap_roundtrip_compressed | 500-element compressed array via mmap |
mmap_vs_open_equivalence | All keys: open_mmap values == open values |
mmap_vs_from_bytes_equivalence | All keys: open_mmap values == from_bytes values |
mmap_large_file | 50,000-element array via mmap |
mmap_nonexistent_file | open_mmap on missing path returns error |
mmap_multiple_sections | 100 sections via mmap; boundary keys correct |
mmap_string_dedup | 100 identical string values via mmap; dedup preserved |
Directory Structure
adversarial-tests/
├── inputs/
│ ├── tl/ # 13 crafted .tl files (11 error + 2 success)
│ ├── json/ # 6 crafted .json files
│ └── tlbx/ # 4 .tlbx files (used by CLI scripts, not Rust tests)
├── core-harness/
│ ├── tests/
│ │ └── adversarial.rs # 58 Rust integration tests
│ └── Cargo.toml
├── dotnet-harness/ # C# harness using TeaLeaf bindings
├── scripts/
│ ├── run_core_harness.ps1
│ ├── run_cli_adversarial.ps1
│ └── run_dotnet_harness.ps1
├── results/ # CLI test logs and outputs
└── README.md
Adding New Tests
1. Add an Inline Test (preferred)
Most adversarial tests generate their inputs inline. This avoids stale fixture files and keeps the test self-contained:
#[test]
fn parse_new_error_case() {
    assert_parse_err("malformed: input here");
}
The assert_parse_err helper asserts that TeaLeaf::parse(input).is_err().
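A plausible shape for that helper (the real one lives in `adversarial.rs`):

```rust
fn assert_parse_err(input: &str) {
    assert!(
        TeaLeaf::parse(input).is_err(),
        "expected a parse error for input: {input}"
    );
}
```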
2. For Binary Tests
Use the make_valid_binary helper to produce valid bytes, then corrupt them:
#[test]
fn reader_new_corruption_case() {
    let mut data = make_valid_binary("val: 42", false);
    data[0] ^= 0xFF; // corrupt something
    let result = Reader::from_bytes(data);
    // Assert no panic; error or graceful handling OK
    if let Ok(r) = result {
        let _ = r.get("val");
    }
}
3. Input File Tests (for CLI scripts)
Place malformed input in the appropriate subdirectory for CLI adversarial testing:
adversarial-tests/inputs/tl/bad_new_case.tl
The CLI script run_cli_adversarial.ps1 exercises these files through the tealeaf CLI binary and logs results to results/cli/.
Contributing Guide
Contributions to TeaLeaf are welcome. The full contributing guide lives in CONTRIBUTING.md at the repository root.
That document covers project architecture, build instructions, testing, the canonical test suite, version management, the PR process, and areas of interest for contributors.
This page highlights the key points. See the Development Setup page for environment setup details.
Ways to Contribute
- Bug reports – file issues on GitHub with reproduction steps
- Feature requests – open an issue describing the use case
- Code contributions – submit pull requests
- Documentation – fix typos, improve explanations, add examples
- Language bindings – create bindings for Python, Java, Go, etc.
- Test cases – add canonical test fixtures or edge case tests
Repository
Source code: github.com/krishjag/tealeaf
Pull Request Checklist
- Fork the repository and create a feature branch from `main`
- Make your changes
- Run tests and lints: `cargo test --workspace`, `cargo clippy --workspace`, `cargo fmt --check`
- If you modified .NET bindings: `cd bindings/dotnet && dotnet test`
- Submit a pull request against `main`
CI runs on Linux, macOS, and Windows automatically. Version consistency is validated on every PR.
Code Style
Rust
- Standard `rustfmt` formatting (no custom config)
- Standard `clippy` lints (no custom config)
- Document public APIs with `///` doc comments
- Edition 2021
C# (.NET)
- Standard C# naming conventions
- XML doc comments for public APIs
- Target frameworks: net6.0, net8.0, net10.0, netstandard2.0
Areas of Interest
New Language Bindings
The FFI layer exposes a C-compatible API that can be used from any language. See the FFI Overview for getting started.
Desired bindings:
- Python (via `ctypes` or `cffi`)
- Java/Kotlin (via JNI or JNA)
- Go (via cgo)
- JavaScript/TypeScript (via WASM or N-API)
Format Improvements
- Union support in binary encoding
- Bytes literal syntax in text format
- Streaming/append-only mode
Tooling
- Editor plugins (VS Code syntax highlighting for `.tl`)
- Schema validation tooling
- Web-based playground
License
By contributing, you agree that your contributions will be licensed under the MIT License.
Development Setup
How to set up a development environment for working on TeaLeaf. See also the comprehensive CONTRIBUTING.md in the repository root for project architecture, version management, and PR guidelines.
Prerequisites
| Tool | Version | Purpose |
|---|---|---|
| Rust | 1.70+ | Core library, CLI, FFI |
| .NET SDK | 8.0+ | .NET bindings and tests |
| Git | Any | Version control |
Optional:
- Protobuf compiler (for benchmark suite)
- mdBook (for documentation)
Clone and Build
git clone https://github.com/krishjag/tealeaf.git
cd tealeaf
# Build everything
cargo build --workspace
# Build release
cargo build --workspace --release
Project Layout
tealeaf/
├── tealeaf-core/ # Core library + CLI binary
├── tealeaf-derive/ # Proc-macro (derive macros)
├── tealeaf-ffi/ # C FFI layer
├── bindings/dotnet/ # .NET bindings
├── canonical/ # Shared test fixtures
├── spec/ # Format specification
├── examples/ # Example files
├── docs-site/ # Documentation site (mdBook)
└── accuracy-benchmark/ # Accuracy benchmark tool
Running Tests
Rust
# All tests
cargo test --workspace
# Specific package
cargo test --package tealeaf-core
cargo test --package tealeaf-derive
cargo test --package tealeaf-ffi
# Specific test file
cargo test --package tealeaf-core --test canonical
cargo test --package tealeaf-core --test derive
# With output
cargo test --workspace -- --nocapture
.NET
cd bindings/dotnet
dotnet build
dotnet test
Lint
cargo clippy --workspace
cargo fmt --check
Development Workflows
Modifying the Parser
- Edit `tealeaf-core/src/lib.rs` (lexer and parser live here)
- Run `cargo test --package tealeaf-core`
- Check canonical fixtures still pass
- Add new test cases for the change
Modifying the Binary Format
- Edit `tealeaf-core/src/writer.rs` (encoder) and `tealeaf-core/src/reader.rs` (decoder)
- Run canonical round-trip tests: `cargo test --package tealeaf-core --test canonical`
- Regenerate binary fixtures if the format changed
Modifying Derive Macros
- Edit files in `tealeaf-derive/src/`
- Run: `cargo test --package tealeaf-core --test derive`
- Check that derive tests cover your change
Modifying FFI
- Edit `tealeaf-ffi/src/lib.rs`
- Run: `cargo test --package tealeaf-ffi`
- The C header is auto-regenerated by `cbindgen` during build
Modifying .NET Bindings
- Edit files in `bindings/dotnet/`
- Build: `cd bindings/dotnet && dotnet build`
- Test: `dotnet test`
- The native library must be built first: `cargo build --package tealeaf-ffi`
Documentation
Building the Documentation Site
# Install mdBook
cargo install mdbook
# Build
cd docs-site
mdbook build
# Serve locally with live reload
mdbook serve --open
Rust API Docs
cargo doc --workspace --no-deps --open
CI/CD
The project uses GitHub Actions for CI:
| Workflow | Purpose |
|---|---|
rust-cli.yml | Build and test Rust on all platforms |
dotnet-package.yml | Build .NET package with native libraries |
accuracy-benchmark.yml | Benchmark accuracy tests |
All CI runs are triggered on push to main/develop and on pull requests.
Debugging
Rust
# Run with debug output
RUST_LOG=debug cargo run --package tealeaf-core -- info test.tl
# Run with backtrace
RUST_BACKTRACE=1 cargo test --package tealeaf-core
.NET
Use Visual Studio or VS Code with the C# extension for debugging the source generator and managed code.
For native library issues, attach a native debugger to the .NET test process.
Type Reference
Complete reference table for all TeaLeaf types, their text syntax, binary encoding, and language mappings.
Primitive Types
| TeaLeaf Type | Text Syntax | Binary Code | Binary Size | Rust Type | C# Type |
|---|---|---|---|---|---|
bool | true / false | 0x01 | 1 byte | bool | bool |
int8 | 42 | 0x02 | 1 byte | i8 | sbyte |
int16 | 1000 | 0x03 | 2 bytes | i16 | short |
int / int32 | 100000 | 0x04 | 4 bytes | i32 | int |
int64 | 5000000000 | 0x05 | 8 bytes | i64 | long |
uint8 | 255 | 0x06 | 1 byte | u8 | byte |
uint16 | 65535 | 0x07 | 2 bytes | u16 | ushort |
uint / uint32 | 100000 | 0x08 | 4 bytes | u32 | uint |
uint64 | 18446744073709551615 | 0x09 | 8 bytes | u64 | ulong |
float32 | 3.14 | 0x0A | 4 bytes | f32 | float |
float / float64 | 3.14 | 0x0B | 8 bytes | f64 | double |
string | "hello" / hello | 0x10 | 4 bytes (index) | String | string |
bytes | b"cafef00d" | 0x11 | varint + data | Vec<u8> | byte[] |
json_number | (from JSON) | 0x12 | 4 bytes (index) | String | string |
timestamp | 2024-01-15T10:30:00Z | 0x32 | 10 bytes | (i64, i16) | DateTimeOffset |
Special Types
| TeaLeaf Type | Text Syntax | Binary Code | Description |
|---|---|---|---|
null | ~ | 0x00 | Null/missing value |
Container Types
| TeaLeaf Type | Text Syntax | Binary Code | Description |
|---|---|---|---|
| Array | [1, 2, 3] | 0x20 | Ordered collection |
| Object | {key: value} | 0x21 | String-keyed map |
| Struct | (val, val, ...) in @table | 0x22 | Schema-typed record |
| Map | @map {key: value} | 0x23 | Any-keyed ordered map |
| Tuple | (val, val, ...) | 0x24 (reserved) | Currently parsed as array |
Semantic Types
| TeaLeaf Type | Text Syntax | Binary Code | Description |
|---|---|---|---|
| Ref | !name | 0x30 | Named reference |
| Tagged | :tag value | 0x31 | Discriminated value |
Type Modifiers
| Modifier | Syntax | Description |
|---|---|---|
| Nullable | type? | Field can be ~ (null) |
| Array | []type | Array of the given type |
| Nullable array | []type? | The field itself can be null |
Type Widening Path
int8 → int16 → int32 → int64
uint8 → uint16 → uint32 → uint64
float32 → float64
Widening is automatic when reading binary data. Narrowing requires recompilation.
JSON Mapping
| TeaLeaf Type | JSON Output | JSON Input |
|---|---|---|
| Null | null | null → Null |
| Bool | true/false | boolean → Bool |
| Int | number | integer → Int |
| UInt | number | large integer → UInt |
| Float | number | decimal → Float |
| String | "text" | string → String |
| Bytes | "0xhex" | (not auto-detected) |
| JsonNumber | number | large/precise number → JsonNumber |
| Timestamp | "ISO 8601" | (not auto-detected) |
| Array | [...] | array → Array |
| Object | {...} | object → Object |
| Map | [[k,v],...] | (not auto-detected) |
| Ref | {"$ref":"name"} | (not auto-detected) |
| Tagged | {"$tag":"t","$value":v} | (not auto-detected) |
Comparison Matrix
How TeaLeaf compares to other data formats.
Feature Comparison
| Feature | JSON | YAML | Protobuf | Avro | MsgPack | CBOR | TeaLeaf |
|---|---|---|---|---|---|---|---|
| Human-readable text | Yes | Yes | No* | No | No | No | Yes |
| Compact binary | No | No | Yes | Yes | Yes | Yes | Yes |
| Schema in text | No | No | External | External | No | No | Inline |
| Schema in binary | No | No | No | Yes | No | No | Yes |
| No codegen required | Yes | Yes | No | Partial | Yes | Yes | Yes |
| Comments | No | Yes | N/A | N/A | No | No | Yes |
| Built-in JSON conversion | – | – | No | No | No | No | Yes |
| String deduplication | No | No | No | No | No | No | Yes |
| Per-section compression | No | No | No | Yes | No | No | Yes |
| Null bitmaps | No | No | No | Yes | No | No | Yes |
| Random-access reading | No | No | No | No | No | No | Yes |
*Protobuf TextFormat exists but is rarely used.
Size Comparison
| Format | Small Object | 10K Points | 1K Users |
|---|---|---|---|
| JSON | 1.00x | 1.00x | 1.00x |
| YAML | ~1.1x | ~1.1x | ~1.1x |
| Protobuf | 0.38x | 0.65x | 0.41x |
| MessagePack | 0.35x | 0.63x | 0.38x |
| CBOR | ~0.40x | ~0.65x | ~0.42x |
| TeaLeaf Binary | 3.56x | 0.15x | 0.47x |
Speed Comparison
| Operation | JSON (serde) | Protobuf | MsgPack | TeaLeaf |
|---|---|---|---|---|
| Parse text | Fast | N/A | N/A | Moderate |
| Decode binary | N/A | Fast | Fast | Moderate |
| Encode text | Fast | N/A | N/A | Moderate |
| Encode binary | N/A | Fast | Fast | Moderate |
| Random key access | O(n) parse | O(1) generated | N/A | O(1) hash |
When to Use Each Format
Use TeaLeaf When
| Scenario | Why |
|---|---|
| LLM context / prompts | Schema-first reduces token count |
| Config files (human-edited + deployed) | Text for editing, binary for deployment |
| Large tabular data | 6-7x compression with string dedup |
| Self-describing data exchange | No external schema files needed |
| Game save data / asset manifests | Compact, nested, self-describing |
| Scientific/sensor data | Null bitmaps for sparse data |
Use JSON When
| Scenario | Why |
|---|---|
| Web APIs / REST | Universal support |
| Small payloads (< 1 KB) | No overhead |
| JavaScript-heavy applications | Native parsing |
| Human-only data (no binary needed) | Simpler tooling |
Use Protobuf When
| Scenario | Why |
|---|---|
| RPC / gRPC services | First-class streaming support |
| Maximum decode speed | Generated code with known offsets |
| Schema evolution at scale | Field numbers + backward compat |
| Microservice communication | Established ecosystem |
Use Avro When
| Scenario | Why |
|---|---|
| Hadoop / big data pipelines | Ecosystem integration |
| Schema registry workflows | Built-in evolution |
| Large-scale data lake storage | Block compression |
Use MessagePack / CBOR When
| Scenario | Why |
|---|---|
| Tiny payloads (< 100 bytes) | Minimal overhead |
| Schemaless binary | No schema definition needed |
| Drop-in JSON replacement | Similar data model |
Ecosystem Maturity
| Aspect | JSON | Protobuf | Avro | TeaLeaf |
|---|---|---|---|---|
| Language support | Universal | 10+ languages | 5+ languages | Rust, .NET |
| Tooling | Extensive | Extensive | Moderate | CLI + libraries |
| Community | Massive | Large | Medium | Early |
| Specification maturity | RFC 8259 | Stable (proto3) | Apache spec | Beta |
| IDE support | Universal | Plugins | Plugins | Planned |
TeaLeaf is a young format (v2.0.0-beta.8). It fills a specific niche that existing formats don’t serve well – but it doesn’t aim to replace established formats in their core use cases.
Specification Governance
How the TeaLeaf specification, implementation, and tests relate to each other.
Two Sources of Truth
TeaLeaf has a prose specification and an executable specification:
| Prose Spec | Executable Spec | |
|---|---|---|
| Location | spec/TEALEAF_SPEC.md | canonical/ test suite |
| Format | Markdown document | .tl samples + expected JSON + pre-compiled .tlbx |
| Enforced by | Human review | CI (automated on every push and PR) |
| Covers | Full grammar, type system, binary layout | 14 feature areas, 52 tests (42 success + 10 error) |
The canonical test suite is the normative specification. If the prose spec and the tests disagree, the tests are authoritative. The prose spec is documentation that describes intent and rationale.
What the Canonical Suite Validates
Each canonical sample is tested through three paths:
Text (.tl) ──────────────────────────────► JSON (compare with expected/)
Binary (.tlbx) ──────────────────────────► JSON (compare with expected/)
Text (.tl) ──► Binary (.tlbx) ──► Read ──► JSON (full round-trip)
The 14 sample files cover:
| File | Coverage |
|---|---|
primitives.tl | Null, bool, int, float, string, escape sequences |
arrays.tl | Empty, typed, mixed, nested arrays |
objects.tl | Empty, simple, nested, deeply nested objects |
schemas.tl | @struct, @table, nested structs, nullable fields |
special_types.tl | References, tagged values, maps, edge cases |
timestamps.tl | ISO 8601 variants, timezones, milliseconds |
numbers_extended.tl | Hex, binary, scientific notation, i64 limits |
unions.tl | @union, empty/multi-field variants |
multiline_strings.tl | Triple-quoted strings, auto-dedent, code blocks |
unicode_escaping.tl | CJK, Cyrillic, Arabic, emoji, ZWJ sequences |
refs_tags_maps.tl | References, tagged values, maps, compositions |
mixed_schemas.tl | Schema-bound and schemaless data together |
large_data.tl | Stress tests: 100+ element arrays, deep nesting, long strings |
cyclic_refs.tl | Reference cycles, forward references, self-references |
Error tests in canonical/errors/ validate that invalid input produces specific, stable error messages across all interfaces (CLI, FFI, .NET).
Change Process
Adding New Behavior
When adding new syntax, types, or features:
- Design – Describe the change in an issue or PR description
- Implement – Modify the parser/encoder/decoder in `tealeaf-core`
- Add canonical tests – Create or extend a sample in `canonical/samples/`, generate expected JSON and binary fixtures
- Update the prose spec – Update `spec/TEALEAF_SPEC.md` to document the new behavior
- CI validates – All three round-trip paths must pass
A PR that adds implementation without canonical tests is incomplete. A PR that updates the prose spec without tests is documentation-only and does not change behavior.
Modifying Existing Behavior
Behavior changes fall into two categories:
Non-breaking (output changes, error message improvements):
- Update canonical expected outputs (`canonical/expected/*.json`)
- Update error golden tests if error messages changed
- Update the prose spec
Breaking (syntax changes, binary format changes, type system changes):
- Requires a version bump in `release.json`
- Regenerate all binary fixtures (`canonical/binary/*.tlbx`)
- Update the prose spec with a clear note about the breaking change
- Binary format changes must update the format version constant in `writer.rs`
Error Message Stability
Error messages are part of the public contract. The canonical/errors/ directory contains invalid input files paired with expected error messages in expected_errors.json. Changes to error text should be noted in the changelog and may require downstream consumers to update.
What Is Not Covered
The canonical suite focuses on the core format. These areas rely on their own test suites:
| Area | Test Location | Notes |
|---|---|---|
| CLI flags and output formatting | tealeaf-core/tests/cli_integration.rs | Tests CLI behavior, not format correctness |
| Derive macros (Rust) | tealeaf-core/tests/derive.rs | Tests DTO conversion, not parsing |
| FFI memory management | tealeaf-ffi unit tests | Tests allocation/deallocation, not format |
| .NET source generator | TeaLeaf.Generators.Tests | Tests code generation, not format |
| .NET serialization | TeaLeaf.Tests | Tests managed-to-native bridge |
| Accuracy benchmark | accuracy-benchmark | Tests LLM accuracy, not format |
Spec Versioning
The format version is embedded in the binary header (see writer.rs). The prose spec documents the current version. When the binary format changes in a backward-incompatible way:
- The format version constant in `writer.rs` must be incremented
- The reader (`reader.rs`) should handle both old and new versions where feasible
- All binary fixtures in `canonical/binary/` must be regenerated
- The prose spec must document the version change
The project version (release.json) and the binary format version are independent. A project version bump does not necessarily mean a format version bump, and vice versa.
Changelog
v2.0.0-beta.8 (Current)
.NET
- XML documentation in NuGet packages — the `TeaLeaf` and `TeaLeaf.Annotations` packages now include XML doc files (`TeaLeaf.xml`, `TeaLeaf.Annotations.xml`) for all target frameworks. Consumers get IntelliSense tooltips for all public APIs. Previously, `GenerateDocumentationFile` was not enabled and the `.xml` files were absent from the `.nupkg`.
- Added XML doc comments to all undocumented public members: `TLType` enum values (13), `TLDocument.ToString`/`Dispose`, `TLReader.Dispose`, `TLField.ToString`, `TLSchema.ToString`, `TLException` constructors (3)
- Enabled `TreatWarningsAsErrors` for `TeaLeaf` and `TeaLeaf.Annotations` — missing XML docs or other warnings are now compile errors, preventing regressions
Testing
- Added `ToJson_PreservesSpecialCharacters_NoUnicodeEscaping` — verifies `+`, `<`, `>`, `'` survive binary round-trip without Unicode escaping in both the `ToJson()` and `ToJsonCompact()` paths
- Added `ToJson_PreservesFloatDecimalPoint_WholeNumbers` — verifies whole-number floats (`99.0`, `150.0`, `0.0`) retain the `.0` suffix and non-whole floats (`4.5`, `3.75`) preserve decimal digits
v2.0.0-beta.7
.NET
- Fixed `TLReader.ToJson()` escaping non-ASCII-safe characters — `+` in phone numbers rendered as `\u002B`, `<`/`>` as `\u003C`/`\u003E`, etc. `System.Text.Json`'s default `JavaScriptEncoder.Default` HTML-encodes these characters for XSS safety, which is inappropriate for a data serialization library. All three JSON serialization methods (`ToJson`, `ToJsonCompact`, `GetAsJson`) now use `JavaScriptEncoder.UnsafeRelaxedJsonEscaping` via shared `static readonly` options.
- Fixed `TLReader.ToJson()` dropping the `.0` suffix from whole-number floats — `3582.0` in source JSON became `3582` after binary round-trip because `System.Text.Json`'s `JsonValue.Create(double)` strips trailing `.0`. Added a `FloatToJsonNode` helper that uses `F1` formatting for whole-number doubles, preserving formatting fidelity with the Rust CLI path.
v2.0.0-beta.6
Features
- Recursive array schema inference in JSON import — `from_json_with_schemas` now discovers schemas for arrays nested inside objects at arbitrary depth (e.g., `items[].product.stock[]`). Previously, `analyze_nested_objects` only recursed into nested objects but not nested arrays, causing deeply nested arrays to fall back to `[]any`. The CLI and derive-macro paths now produce equivalent schema coverage.
- Deterministic schema declaration order — `analyze_array` and `analyze_nested_objects` now use single-pass field-order traversal (depth-first), matching the derive macro's field-declaration-order strategy. Previously, both functions made two separate passes (arrays first, then objects), causing schema declarations to appear in a different order than the derive/Builder API path. The CLI and Builder API now produce byte-identical `.tl` output for the same data.
Bug Fixes
- Fixed binary encoding corruption for `[]any` typed arrays — `encode_typed_value` incorrectly wrote `TLType::Struct` as the element type for the "any" pseudo-type (the `to_tl_type()` default for unknown names), causing the reader to interpret heterogeneous data as struct schema indices. Arrays with mixed element types inside schema-typed objects (e.g., `order.customer`, `order.payment`) now correctly use heterogeneous `0xFF` encoding when no matching schema exists.
Tooling
- Version sync scripts (`sync-version.ps1`, `sync-version.sh`) now regenerate the workflow diagram (`assets/tealeaf_workflow.png`) via `generate_workflow_diagram.py` on each version bump
Testing
- Added `json_any_array_binary_roundtrip` — focused regression test verifying `[]any` fields inside schema-typed structs survive binary compilation with full data integrity verification
- Added `retail_orders_json_binary_roundtrip` — end-to-end test exercising JSON → infer schemas → compile → binary read with `retail_orders.json` (the exact path that was untested)
- Added .NET `FromJson_HeterogeneousArrayInStruct_BinaryRoundTrips` — mirrors the Rust `[]any` regression test through the FFI layer
- Strengthened .NET `FromJson_RetailOrdersFixture_CompileRoundTrips` — upgraded from a string-contains check to structural JSON verification (10 orders, 4 products, 3 customers, spot-check order ID and item count)
- Added `json_inference_nested_array_inside_object` — verifies arrays nested inside objects (e.g., `items[].product.stock[]`) get their own schema and typed array fields
- Added `gen_retail_orders_api_tl` derive integration test — generates `.tl` from Rust DTOs via the Builder API and confirms byte-identical output with the CLI path
- Added `examples/retail_orders_different_shape_cli.tl` and `retail_orders_different_shape_api.tl` comparison fixtures (2,395 bytes each, zero diff)
- Moved `retail_orders_different_shape.rs` from `examples/` to `tealeaf-core/tests/fixtures/` to keep test dependencies within the crate boundary
- Verified all 7 fuzz targets pass (~566K total runs, zero crashes)
v2.0.0-beta.5
Features
- Schema-aware serialization for Builder API — `to_tl_with_schemas()` now produces compact `@table` output for documents built via `TeaLeafBuilder` with derive-macro schemas. Previously, PascalCase schema names from `#[derive(ToTeaLeaf)]` (e.g., `SalesOrder`) didn't match the serializer's `singularize()` heuristic (e.g., `"orders"` → `"order"`), causing all arrays to fall back to the verbose `[{k: v}]` format. The serializer now resolves schemas via a 4-step chain: declared type from parent schema → singularize → case-insensitive singularize → structural field matching.
Bug Fixes
- Fixed schema inference name collision when a field singularizes to the same name as its parent array's schema — prevented self-referencing schemas (e.g., `@struct root (root: root)`) and data loss during round-trip (found via fuzzing)
- Fixed the `@table` serializer applying the wrong schema when the same field name appears at multiple nesting levels with different object shapes — the serializer now validates that schema fields match the actual object keys before using positional tuple encoding
Testing
- Added 8 Rust regression tests for schema name collisions: `fuzz_repro_dots_in_field_name`, `schema_name_collision_field_matches_parent`, `analyze_node_nesting_stress_test`, `schema_collision_recursive_arrays`, `schema_collision_recursive_same_shape`, `schema_collision_three_level_nesting`, `schema_collision_three_level_divergent_leaves`, `all_orders_cli_vs_api_roundtrip`
- Added derive integration test `test_builder_schema_aware_table_output` — verifies the Builder API with 5 nested PascalCase schemas produces `@table` encoding and round-trips correctly
- Verified all 7 fuzz targets pass (~445K total runs, zero crashes)
v2.0.0-beta.4
Bug Fixes
- Fixed binary encoding crash when compiling JSON with heterogeneous nested objects — `from_json_with_schemas` infers the `any` pseudo-type for fields whose nested objects have varying shapes; the binary encoder now falls back to generic encoding instead of erroring with "schema-typed field 'any' requires a schema"
- Fixed the parser failing to resolve schema names that shadow built-in type keywords — schemas named `bool`, `int`, `string`, etc. now correctly resolve via LParen lookahead disambiguation (struct tuples always start with `(`, primitives never do)
- Fixed `singularize()` producing an empty string for single-character field names (e.g., `"s"` → `""`) — caused `@struct` definitions with missing names and unparseable TL text output
- Fixed `validate_tokens.py` token comparison by converting API input to `int` for safety
.NET
- Added `TLValueExtensions` with `GetRequired()` extension methods for `TLValue` and `TLDocument` — provides non-nullable access patterns, reducing CS8602 warnings in consuming code
- Added TL007 diagnostic: `[TeaLeaf]` classes in the global namespace now produce a compile-time error ("TeaLeaf type must be in a named namespace")
- Removed the `SuppressDependenciesWhenPacking` property from `TeaLeaf.Generators.csproj`
- Exposed `InternalsVisibleTo` for `TeaLeaf.Tests`
CI/CD
- Re-enabled all 6 GitHub Actions workflows after making the repository public (rust-cli, dotnet-package, accuracy-benchmark, docs, coverage, fuzz)
- Fixed coverlet filter quoting in the coverage workflow — commas URL-encoded as `%2c` to prevent shell argument splitting
- Fixed Codecov token handling — made `CODECOV_TOKEN` optional for public-repo tokenless uploads
- Fixed Codecov multi-file upload format — changed from a YAML block scalar to a comma-separated single line
- Refactored the coverage workflow to use `dotnet-coverage` with dedicated settings XML files
- Added CodeQL security analysis workflow
Testing
- Added a Rust regression test for the `any` pseudo-type compile round-trip
- Added 21 Rust tests for schema names shadowing all built-in type keywords (`bool`, `int`, `int8`..`int64`, `uint`..`uint64`, `float`, `float32`, `float64`, `string`, `timestamp`, `bytes`) — covers JSON inference round-trip, direct TL parsing, self-referencing schemas, duplicate declarations, and multiple built-in-named schemas in one document
- Added 4 .NET regression tests covering `TLDocument.FromJson` → `Compile` with heterogeneous nested objects, mixed-structure arrays, complex schema inference, and retail_orders.json end-to-end
- Added .NET tests for JSON serialization of timestamps and byte arrays
- Added .NET coverage tests for multi-word enums and nullable nested objects
- Added .NET source generator tests (524 new lines in `GeneratorTests.cs`) including the TL007 global namespace diagnostic
- Added .NET `TLValue.GetRequired()` extension method tests
- Added .NET `TLReader` binary reader tests (168 new lines)
- Added a cross-platform `FindRepoFile` helper for .NET test fixture discovery (walks up the directory tree instead of a hardcoded relative path depth)
- Verified the full .NET test suite on Linux (WSL Ubuntu 24.04)
Tooling
- Added `--version`/`-V` CLI flag
- Added `delete-caches.ps1` and `delete-caches.sh` GitHub Actions cache cleanup scripts
- Updated `coverage.ps1` to support `dotnet-coverage` collection with XML settings files
Documentation
- Updated binary deserialization method names in quick-start, LLM context guide, schema evolution guide, and derive macros docs
- Updated tealeaf workflow diagram
v2.0.0-beta.3
Features
- Byte literals — `b"..."` hex syntax for byte data in the text format (e.g., `payload: b"cafef00d"`)
- Arbitrary-precision numbers — `Value::JsonNumber` preserves the exact decimal representation for numbers exceeding native type ranges
- Insertion order preservation — `IndexMap` replaces `HashMap` for all user-facing containers; JSON round-trips now preserve original key order (ADR-0001)
- Timestamp timezone support — timestamps encode the timezone offset in minutes (10 bytes: 8 millis + 2 offset); supports `Z`, `+HH:MM`, `-HH:MM`, `+HH` formats
- Special float values — `NaN`, `inf`, `-inf` keywords for IEEE 754 special values (JSON export converts to `null`)
- Extended escape sequences — `\b` (backspace), `\f` (form feed), `\uXXXX` (Unicode code points) for full JSON string escape parity
- Forward compatibility — unknown directives are silently ignored, enabling older implementations to partially parse files with newer features (spec §1.18)
Bug Fixes
- Fixed bounds check failures and bitmap overflow issues in binary decoder
- Fixed lexer infinite loop on certain malformed inputs (found via fuzzing)
- Fixed NaN value quoting causing incorrect round-trip behavior
- Fixed parser crashes on deeply nested structures
- Fixed integer overflow in varint decoding
- Fixed off-by-one errors in array length checks
- Fixed negative hex/binary literal parsing
- Fixed exponent-only numbers (e.g., `1e3`) to parse as floats, not integers
- Fixed timestamp timezone parsing to accept hour-only offsets (`+05` = `+05:00`)
- Rejected value-only types (`object`, `map`, `tuple`, `ref`, `tagged`) as schema field types per spec §2.1
- Fixed .NET package publishing for `TeaLeaf.Annotations` and `TeaLeaf.Generators` to NuGet
Performance
- Removed O(n log n) key sorting from all serialization paths: 6-17% faster for small/medium objects, up to 69% faster for tabular data
- Binary decode is 56-105% slower for generic object workloads due to `IndexMap` insertion cost (an accepted trade-off per ADR-0001; columnar workloads are less affected)
Specification
- Schema table header byte +6 stores Union Count (was reserved)
- String table length encoding changed from `u16` to `u32`, allowing strings longer than 65,535 bytes
- Added type code `0x12` for `JSONNUMBER`
- Timestamp encoding extended to 10 bytes (8 millis + 2 offset)
- Added `bytes_lit` grammar production; extended `number` to include `NaN`/`inf`/`-inf`
- Documented `object`, `map`, `ref`, `tagged` as value-only types (not valid in schema fields)
object,map,ref,taggedas value-only types (not valid in schema fields) - Resolved compression algorithm spec contradiction: binary format v2 uses ZLIB (deflate), not zstd (ADR-0004)
Tooling
- Fuzzing infrastructure — 7 cargo-fuzz targets with custom dictionaries and structure-aware generation (ADR-0002)
- Fuzzing CI workflow — GitHub Actions runs all targets for 120s each (~15 min per run)
- Nesting depth limit — 256-level max for stack overflow protection (ADR-0003)
- VS Code extension — syntax highlighting for `.tl` files (`vscode-tealeaf/`)
- FFI safety — comprehensive `# Safety` docs on all FFI functions; regenerated `tealeaf.h`
- Token validation — `validate_tokens.py` script validates API-reported token counts against tiktoken
- Maintenance scripts — `delete-deployments` and `delete-workflow-runs` for GitHub cleanup
Testing
- 238+ adversarial tests for malformed binary input
- 333+ .NET edge case tests for FFI boundary conditions
- Property-based tests with depth-bounded recursive generation
- Accuracy benchmark token savings updated to ~36% fewer data tokens (validated with tiktoken)
Documentation
- ADR-0001: IndexMap for Insertion Order Preservation
- ADR-0002: Fuzzing Architecture and Strategy
- ADR-0003: Maximum Nesting Depth Limit (256)
- ADR-0004: ZLIB Compression for Binary Format
- Code of Conduct, SECURITY.md, GitHub issue/PR templates
- `examples/showcase.tl` — 736-line comprehensive format demonstration
- Sample accuracy benchmark results
Breaking Changes
- `Value::Object` uses `IndexMap<String, Value>` instead of `HashMap` (type alias `ObjectMap` provided; `From<HashMap>` retained for backward compatibility)
- `Value::Timestamp(i64)` → `Value::Timestamp(i64, i16)` — the second field is the timezone offset in minutes
- `Value::JsonNumber(String)` variant added — match expressions on `Value` need a new arm (see the sketch after this list)
- Binary timestamps are not backward-compatible (beta.2 readers cannot decode beta.3 timestamps; beta.3 readers handle beta.2 files by defaulting the offset to UTC)
- JSON round-trips preserve key order instead of alphabetizing
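The variant shapes above imply the following change in downstream match expressions. A minimal sketch, assuming a `tealeaf_core` import path; everything besides the two variant shapes is illustrative.

```rust
use tealeaf_core::Value; // assumed import path

// Sketch only: the two arms affected by beta.3. The variant shapes come
// from the list above; the function and formatting are illustrative.
fn describe(v: &Value) -> String {
    match v {
        // Second field (timezone offset in minutes) is new in beta.3
        Value::Timestamp(millis, offset_min) => {
            format!("timestamp: {millis} ms at offset {offset_min} min")
        }
        // New variant: previously exhaustive matches need this arm added
        Value::JsonNumber(repr) => format!("big number: {repr}"),
        _ => "other value".to_string(),
    }
}
```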
v2.0.0-beta.2
Format
- `@union` definitions now encoded in the binary schema table (full text → binary → text round-trip)
- Union schema region uses a backward-compatible extension of the schema table header
- Derive macro `collect_unions()` generates union definitions for Rust enums
- `TeaLeafBuilder::add_union()` for programmatic union construction
Improvements
- Version sync automation expanded to cover all project files (16 targets)
- NuGet package icon added to all NuGet packages (TeaLeaf, Annotations, Generators)
- CI badges added to README (Rust CI, .NET CI, crates.io, NuGet, codecov, License)
- crates.io publish ordering fixed (`tealeaf-derive` before `tealeaf-core`)
- Contributing guide added (`CONTRIBUTING.md`)
- Spec governance documentation added
- Accuracy benchmark `dump-prompts` subcommand for offline prompt inspection
- `TeaLeaf.Annotations` published as a separate NuGet package (fixes dependency resolution)
- `benches_proto/` excluded from the crates.io package (removes the `protoc` requirement for consumers)
v2.0.0-beta.1
Initial public beta release.
Format
- Text format (`.tl`) with comments, schemas, and all value types
- Binary format (`.tlbx`) with string deduplication, schema embedding, and per-section compression
- 15 primitive types + 6 container/semantic types
- Inline schemas with `@struct`, `@table`, `@map`, `@union`
- References (`!name`) and tagged values (`:tag value`) — see the sketch after this list
- File includes (`@include`)
- ISO 8601 timestamp support
- JSON bidirectional conversion with schema inference
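A sketch of the reference and tagged-value notations: only the `!name` and `:tag value` shapes come from this list; the anchor declaration and field names are assumptions.

```
# Sketch only – !name and :tag value notations per this changelog;
# the anchor-declaration syntax shown here is an assumption
hq: ("Seattle", "USA")
office: !hq              # reference to a previously named value
reading: :celsius 21.5   # tagged value
```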
CLI
- 8 commands: `compile`, `decompile`, `info`, `validate`, `to-json`, `from-json`, `tlbx-to-json`, `json-to-tlbx`
- Pre-built binaries for 7 platforms (Windows, Linux, macOS – x64 and ARM64)
Rust
- `tealeaf-core` crate with full parser, compiler, and reader
- `tealeaf-derive` crate with `#[derive(ToTeaLeaf, FromTeaLeaf)]` (see the sketch after this list)
- Builder API (`TeaLeafBuilder`)
- Memory-mapped binary reading
- Conversion traits with automatic schema collection
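A minimal sketch of the derive usage named above, assuming a `tealeaf_derive` import path; the struct and its fields are invented for illustration.

```rust
use tealeaf_derive::{FromTeaLeaf, ToTeaLeaf}; // assumed import path

// Sketch only: a plain struct opting into TeaLeaf conversion via the
// derives listed above, with schemas collected automatically.
#[derive(ToTeaLeaf, FromTeaLeaf)]
struct Employee {
    id: i64,
    name: String,
    skills: Vec<String>,
}
```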
.NET
- `TeaLeaf` NuGet package with native libraries for all platforms
- C# incremental source generator (`[TeaLeaf]` attribute)
- Reflection-based serializer (`TeaLeafSerializer`)
- Managed wrappers (`TLDocument`, `TLValue`, `TLReader`)
- Schema introspection API
- Diagnostic codes TL001-TL006
FFI
- C-compatible API via `tealeaf-ffi` crate
- 45+ exported functions
- Thread-safe error handling
- Null-safe for all pointer parameters
- C header generation via `cbindgen`
Known Limitations
- Bytes type does not round-trip through the text format (resolved: `b"..."` hex literals added)
- JSON import does not recognize `$ref`, `$tag`, or timestamp strings
- Individual string length limited to ~4 GB (u32) in the binary format
- 64-byte header overhead makes TeaLeaf inefficient for very small objects