Changelog

v2.0.0-beta.14 (Current)

Features

null keyword for explicit null values — The text format now supports null as a keyword alongside ~ (tilde). In most contexts they are interchangeable. In @table tuples, they have distinct semantics:
- ~ — absent field (the field was not present in the source object). For nullable fields, the parser drops the field entirely from the reconstructed object. For non-nullable fields, it is preserved as null.
- null — explicit null (the field was present with a null value in the source JSON). Always preserved as null in the output regardless of field nullability.
- The text serializer (write_tuple) writes null for fields that are explicitly null and ~ for fields that are absent.
- Lexer: new ExplicitNull token for the null keyword, distinct from Null (~).
Two-bit field state encoding in binary format — The binary struct array encoding now uses a two-bitmap layout (lo + hi) instead of the previous single null bitmap. Each field uses a two-bit state code: code = lo_bit(i) | (hi_bit(i) << 1):
- 0 (lo=0, hi=0) — has value: field data follows inline
- 1 (lo=1, hi=0) — explicit null: always preserved as null in output
- 2 (lo=0, hi=1) — absent: dropped for nullable fields, preserved as null for non-nullable fields
- Bitmap size per row is 2 × bms bytes (where bms = ceil(field_count / 8)), stored as lo bitmap followed by hi bitmap
- A null array element has all fields set to code=2 (lo bits all zero, hi bits all set)
- Applied to all 4 bitmap sites: encode_struct_array, encode_typed_value(Struct), decode_struct_array, decode_struct

Performance

Hex encoding optimization — Replaced format!("{:02x}") per-byte hex encoding with lookup-table-based push_hex_byte()/push_hex_bytes() helpers. Applies to bytes literals in text serialization (b"...") and JSON export ("0x..." strings). Eliminates allocation per byte.
Zero-copy binary section reads — Reader::read_section() now uses Cow<[u8]> — uncompressed sections borrow directly from the memory-mapped file (Cow::Borrowed) instead of copying to a Vec<u8>. Compressed sections still allocate (Cow::Owned).
Schema lookup via HashMap — Binary writer now uses schema_map (HashMap by name) for O(1) schema resolution instead of O(n) linear scans via self.schemas.iter().find(). Affects encode_struct_array, encode_typed_value, and nested struct field encoding.
Reduced redundant schema resolution — write_tuple in the text serializer now calls resolve_schema() once per field and reuses the result for both array and non-array branches, eliminating duplicate lookups.

Specification

null keyword in grammar — Added "null" as a primitive production: primitive = string | bytes_lit | number | bool | "~" | "null"
Two-bit field state encoding — Updated §4.8 struct array encoding from single null bitmap to lo/hi two-bitmap layout with three-state field codes
Duplicate key semantics — Documented that duplicate keys in objects use last-value-wins (§1.6)
Reference semantics — Documented compile-time resolution, define-before-use, multi-use, circular reference handling, and document-global scope (§1.13)
Tagged value semantics — Documented schema-free and schema-typed tagged value behavior (§1.14)
Include semantics — Documented recursive includes, circular detection, 32-level depth limit (§1.16)
Map key restrictions — Documented hashable key type restriction (string, int, uint) for map keys (§1.12)
§1.19 Limits and Constraints — New section documenting all format limits: nesting depth (256), object fields (u16), array elements (u32), string length (u32), section count (u32), schema count (u16), max decompressed section (256 MB), and empty collection encoding
§1.20 File Encoding — New section documenting UTF-8 requirement, BOM handling, byte order, and whitespace rules
Reserved type code ranges — Documented reserved type code ranges that readers should treat as errors (§4.6)
String table lookup — Documented O(1) index-based access and linear scan for name-based lookups (§4.4)
Top-level array encoding rationale — Documented why only Int32 and String use homogeneous encoding at the top level (§2.4)
Type coercion rationale — Documented LLM context use case for silent coercion (§2.5)

Accuracy Benchmark

Real-world dataset expansion — Added 7 new real-world datasets across 7 domains: SEC EDGAR (finance), BLS JOLTS (HR/labor), CDC PLACES (healthcare), Wisconsin Circuit Court (legal), NVD CVEs (technology), NYC PLUTO (real estate), USDA FoodData (retail). Each domain includes data processing scripts, README documentation, and multiple task configurations.
Updated benchmark results — 14 tasks, 7 domains. ~51% input token savings (TL vs JSON), ~20% (TOON vs JSON). Accuracy within noise (TL 0.942 vs JSON 0.945 on Claude Sonnet 4.5).
Evidence archive — New evidence/2026-02-14-01/ with full prompts, responses, analysis.tl, summary.json, and charts for reproducibility.
Chart generation — Added scripts/generate_charts.py for accuracy and token comparison visualizations.

Documentation

ADR-0005: Direct Value Representation — Architecture decision record explaining why TeaLeaf uses direct value representation without an intermediate AST.
Sitemap — Added docs-site/src/sitemap.xml with Google search console integration for doc-site discoverability.
Updated accuracy benchmark documentation with real-world dataset details.

Canonical Fixtures

Updated schemas.json, mixed_schemas.json, quoted_keys.json — nullable fields with ~ in source now produce absent fields (no "field": null) in expected JSON output. Explicit null values are preserved.
Recompiled all .tlbx binary fixtures with new two-bit bitmap encoding: schemas.tlbx, mixed_schemas.tlbx, quoted_keys.tlbx, large_data.tlbx, timestamps.tlbx.

Testing

Added test_explicit_null_roundtrip_text_and_binary — regression test for crash input [{"key":null}] verifying explicit null preservation through both text and binary roundtrips
Verified all 8 fuzz targets pass (1,524,164 total runs at 240s each, zero crashes)
All 860 Rust tests + 560 .NET tests passing

Bug Fixes

Fuzz crash on explicit null in JSON — Fixed crash on input [{"k4e2-eRy4eye-24ey":null}] where explicit null values in JSON were silently dropped during roundtrip. The format now correctly distinguishes between absent fields and explicitly null fields in both text and binary representations.

Breaking Changes (Binary Format)

Binary bitmap layout change — The struct array null bitmap is now a two-bitmap (lo + hi) layout instead of a single bitmap. All .tlbx files compiled with beta.13 or earlier must be recompiled. Text .tl files are unaffected.

v2.0.0-beta.13

.NET

@table positional encoding for nested object arrays — Serializer (both reflection-based and source generator) now emits @table struct_name [(v1, v2), ...] for List<NestedType> instead of [{field: val, ...}]. Eliminates repeated field names, conforming to the TeaLeaf spec for schema-bound data.
- Reflection serializer: new WriteTupleValue(), WriteTupleList(), WriteTupleDictionary() methods; modified WriteList() and ToText<T>(IEnumerable<T>)
- Source generator: emits WriteTeaLeafTupleValue() on each [TeaLeaf(Generate = true)] class; modified list property emission to use @table format
- Handles all field types in tuples: primitives, strings, enums, nested objects (recursive tuples), lists, dictionaries, DateTime, byte[], Guid, TimeSpan
- Deserialization unaffected — Rust parser handles @table to objects transparently
Parameterized constructor deserialization — Both reflection serializer and source generator now support types without parameterless constructors (e.g., records, immutable DTOs).
- Matches constructor parameter names to property names (case-insensitive)
- Supports constructor default values (int score = 100)
- Sets remaining settable properties after construction
- Removed new() constraint from FromDocument<T>, FromValue<T>, FromText<T>, FromList<T>
Unmatched constructor fallback — When a parameterized constructor has parameters that don’t match any serialized property and have no default value, uses RuntimeHelpers.GetUninitializedObject() to bypass the constructor entirely and set properties via setters. Prevents NullReferenceException for types like SecurityGroupAttributes(string groupName) where the constructor parameter serves initialization logic unrelated to serialized fields.

Bug Fixes

String quoting for special characters — Added %, =, ?, ', !, and non-ASCII (> 0x7F) to the NeedsQuoting character list in both the reflection serializer (TeaLeafTextHelper) and source generator (TLTextEmitter). Strings containing these characters are now correctly double-quoted, preventing parse errors on round-trip.
@table trailing comma — Fixed trailing comma after the last tuple in @table output. Both reflection serializer and source generator now use a separator pattern (first_ flag) instead of a terminator pattern, producing valid @table syntax that parses without errors.

Testing

Added 11 reflection serializer tests for parameterized constructor deserialization: basic immutable types, list properties, mixed constructor + setter, default values, nested objects, enums, round-trip, FromDocument, FromList
Added 4 source generator tests: parameterized constructor code generation, mixed constructor/setter, constructor defaults, parameterless constructor backward compatibility
Added 12 source-generated quoting regression tests: %, non-ASCII (°C), =, ?, ' round-trip and serialization format assertions, @table no-trailing-comma, combined special chars
Added 13 reflection serializer quoting regression tests: same special character coverage plus Unicode emoji (❤), collection @table trailing comma
Added 8 NeedsQuoting unit tests for special character detection (%, =, ?, ', !, non-ASCII, safe identifiers)
Added 3 source generator @table code emission tests: List<NestedType> emits @table, deeply nested types emit WriteTeaLeafTupleValue, separator pattern verification
Added 9 deeply nested @table tests (both paths): TeamWithMembers → List<PersonWithAddress> → Address round-trip, text format assertions, empty/single-item edge cases, ToText<T>(IEnumerable<T>) collection round-trip
Added 21 compact/compactFloats round-trip tests (both paths): compact output smaller than pretty, compact round-trips correctly, compactFloats strips .0 from whole-number floats, fractional floats preserved, combined compact+compactFloats on @table data, nested objects, deeply nested tables, collection serialization

v2.0.0-beta.12

Features

Clap CLI migration — Replaced hand-rolled env::args() parsing with clap v4 derive macros. Provides auto-generated colored help, typo suggestions (e.g., compil → “did you mean ‘compile’?”), --help/-h on every subcommand, and --version/-V flag — all derived from annotated structs with zero manual usage text.
- Added clap = { version = "4", features = ["derive", "color"] } and clap_complete = "4" dependencies
- try_parse() pattern preserves exit code 1 for all errors (including no-args help via arg_required_else_help)
- Deleted print_usage() function — clap auto-generates help from doc comments
Shell completions subcommand — tealeaf completions <shell> generates tab-completion scripts for bash, zsh, fish, powershell, and elvish. Completes subcommands, flags, and help arguments.
get_path dot-path navigation — Navigate nested values using dot-path expressions with array indexing.
- Rust: TeaLeaf::get_path("order.items[0].name") at document level, Value::get_path("items[0].price") within values
- FFI: tl_document_get_path(doc, path) and tl_value_get_path(value, path)
- .NET: TLDocument.GetPath(path) and TLValue.GetPath(path) with null-safe returns
- Path syntax: key.field[N].field — dot-separated field access, [N] for array indexing
Object field iteration API — Enumerate object fields by index without knowing keys upfront.
- FFI: tl_value_object_len, tl_value_object_get_key_at, tl_value_object_get_value_at
- .NET: TLValue.ObjectFieldCount, GetObjectKeyAt(index), GetObjectValueAt(index), AsObject() for (key, value) tuples, Keys property, GetEnumerator() for foreach support on objects and arrays
Document schema introspection API — Inspect schema definitions at runtime without parsing text.
- FFI: tl_document_schema_count, tl_document_schema_name, tl_document_schema_field_count, tl_document_schema_field_name, tl_document_schema_field_type, tl_document_schema_field_nullable, tl_document_schema_field_is_array (7 new C functions)
- .NET: TLDocument.Schemas (IReadOnlyList), TLDocument.GetSchema(name), TLDocument.SchemaCount; TLSchema and TLField classes with Name, Type, IsNullable, IsArray properties
TLDocumentBuilder (.NET) — Fluent API for building multi-key documents with schema support.
- Scalar overloads: Add(key, string|int|long|double|float|bool|DateTimeOffset)
- List overloads: AddList(key, IEnumerable<string|int|double>)
- Object support: Add<T>(key, value) and AddList<T>(key, items) for [TeaLeaf]-attributed types
- Document merging: AddDocument(TLDocument) with automatic schema deduplication
- Builder pattern: all methods return this for chaining
- Build() produces a TLDocument with merged schemas

Bug Fixes

Schema deduplication in .NET source generator — Diamond dependency scenarios (type A and type B both reference type C) no longer emit duplicate @struct definitions. New CollectTeaLeafSchemas(StringBuilder, HashSet<string>) method performs dependency-order traversal with a shared emitted set. Cross-assembly fallback detects types from referenced assemblies compiled with older generator versions.
Fixed memory leak in TLValue.AsMap() where failed key/value pairs weren’t disposed

CLI

Improved --compact description: “Omit insignificant whitespace for token-efficient output” (was “Use compact single-line output”)
Improved --compact-floats description: “Write whole-number floats as integers (e.g. 42.0 becomes 42)” (was “Use compact float representation”)

Testing

Added 26 TLDocumentBuilder tests — single/multi-key documents, nested types, list support, AddDocument merging, JSON and binary round-trips, scalar overloads, method chaining
Added 14 GetPath .NET tests — document and value-level navigation across 6-level nested objects with arrays and mixed types
Added 15 adversarial get_path edge-case tests — empty string, nested fields, array indexing (valid/out-of-bounds/negative), unclosed brackets, empty brackets, non-numeric indices, double dots, leading/trailing dots, huge index overflow, scalar rejection
Added fuzz_get_path fuzzer target — dedicated fuzzer splitting input between document and path strings, testing both TeaLeaf::get_path and Value::get_path
Updated fuzz_structured fuzzer — added get_path coverage with arbitrary and well-formed paths
Added 2 source generator deduplication tests — SharedNestedType_DeduplicatesSchemas (diamond) and DeepDiamondDependency_DeduplicatesSchemas
Added .NET tests for GetEnumerator() on objects/arrays/scalars and Keys property
Updated 3 CLI integration test assertions for clap error message format ("Unknown command" → "frobnicate" substring, "unrecognized subcommand" negative check)

Documentation

New completions CLI reference page — usage, arguments, install instructions for 5 shells, completion coverage details
Updated CLI overview — added completions to command table
Updated SUMMARY.md — added completions to CLI Reference navigation
Updated README.md — added completions to command list
Updated TEALEAF_SPEC.md — added completions under new “Utilities” subsection in Section 7

v2.0.0-beta.11

Features

Optional/nullable field support in schema inference — analyze_array() and analyze_nested_objects() no longer require strict field uniformity across all objects. Fields not present in every object are automatically marked nullable (?), enabling schema inference for real-world datasets where records share most fields but differ in optional ones (e.g., DCAT metadata, API responses). Requires at least 1 field present in all objects.
- Union-based field collection: computes union of all field names across objects in an array, tracks per-field presence counts, marks fields below 100% presence as nullable
- New object_matches_schema() helper allows nullable fields to be absent when matching objects to schemas — used in write_value_with_schemas() verification and InferredType::to_field_type() schema lookup
- InferredType::Object::merge() now keeps intersection of common fields instead of returning Mixed when field counts differ
- Nested array analysis collects ALL items from ALL parent objects (not just first representative), discovering the full field union across nested structures
- Structural @table fallback guarded with hint_name.is_some() || declared_type.is_some() to prevent matching inside tuple values
Absent-field preservation — Nullable fields with null/absent values (~ in text, null-bit in binary) are omitted from the reconstructed object instead of being inserted as explicit null. This preserves the semantic distinction between “field absent” and “field explicitly null”, ensuring JSON → TL → JSON roundtrip does not add spurious "field": null entries for fields that were simply missing in the original.
- Text parser: parse_tuple_with_schema() skips inserting null for nullable fields
- Binary reader: both bitmap decode paths skip inserting null for nullable fields
Most-complete object field ordering — Schema inference now uses the field ordering from the object with the most fields (the “most representative” object) as the canonical schema field order, instead of first-seen ordering across all objects. Fields present only in other objects are appended at the end. This preserves the original JSON field ordering for the majority of records during roundtrip.

Bug Fixes

Fixed JSON → TL → JSON roundtrip adding extra "field": null entries for nullable fields that were absent in the original data
Fixed field ordering loss during schema inference — first (often smallest) object determined schema field order, which didn’t match the majority of objects

Canonical Fixtures

Updated schemas.json, mixed_schemas.json, quoted_keys.json — nullable fields with ~ in source now produce absent fields (no "field": null) in expected JSON output
Recompiled mixed_schemas.tlbx to match updated binary reader behavior

Testing

Added test_schema_inference_optional_fields — array with optional field c marked as int?
Added test_schema_inference_optional_fields_roundtrip — JSON → TL → JSON with absent optional fields preserved
Added test_schema_inference_no_common_fields_skipped — [{x:1}, {y:2}] produces no schema (zero common fields)
Added test_schema_inference_single_common_field — [{id:1,a:2}, {id:3,b:4}] infers schema with id common
Added test_schema_inference_optional_nested_array — nested array field marked nullable when absent from some objects
Added test_schema_inference_optional_nested_object — nested object field marked nullable when absent
Added test_schema_inference_wa_health_data_pattern — pattern matching the WA health DCAT dataset structure
Added test_write_schemas_nullable_field_matching — @table applied correctly when objects are missing nullable fields
Added test_schema_field_ordering_uses_most_complete_object — most-fields object determines schema field order
Added test_schema_field_ordering_appends_extra_fields — fields from smaller objects appended after representative
Added test_schema_field_ordering_roundtrip_preserves_order — roundtrip preserves field ordering from most-complete object
Updated test_struct_with_nullable_field — expects absent field instead of explicit null
Updated .NET StructArray_Table_BinaryRoundTrip_PreservesData — expects null (absent) instead of TLType.Null for nullable fields in binary roundtrip

v2.0.0-beta.10

Features

Quoted field names in @struct definitions — Schema inference no longer skips arrays of objects when field names contain special characters (@, $, #, :, /, spaces, etc.). Previously, needs_quoting(field_name) in analyze_array() and analyze_nested_objects() rejected the entire array, falling back to verbose inline map notation. Now the guard only checks the schema name (which appears unquoted in @struct name(...) and @table name [...]); field names that need quoting are emitted with quotes in the definition (e.g., @struct record("@type":string, name:string)).
- Affects JSON-LD (@type, @id, @context), JSON Schema/OpenAPI ($ref, $id), OData (@odata.type, @odata.id), XML-to-JSON (#text, #comment), XML namespaces (xsi:type, dc:title), RDF/JSON (URI keys), and spreadsheet exports (space-separated keys)
- Parser (parse_struct_def()) now accepts TokenKind::String in addition to TokenKind::Word for field names
- Serialization required no changes — write_key() already quoted field names via needs_quoting()
Token counting script (accuracy-benchmark/scripts/count_tokens.py) — Python script that uses Anthropic’s messages.count_tokens() API to measure exact token usage for benchmark prompts across TL, JSON, and TOON formats without incurring completion costs. Generates per-task comparison table with TL vs JSON, TOON vs JSON, and TL vs TOON percentage columns, plus data-only token estimates (subtracting shared instruction overhead).

Canonical Fixtures

New: quoted_keys (15th canonical sample) — 8 schemas covering all special-character key patterns:
- jsonld_record — "@type", "@id" (JSON-LD)
- schema_def — "$ref", "$id" (JSON Schema / OpenAPI)
- xml_node — "#text", "#comment" (XML-to-JSON)
- ns_element — "xsi:type", "dc:title" (XML namespaces)
- odata_entity — "@odata.type", "@odata.id" (OData)
- rdf_triple — "http://schema.org/name", "http://schema.org/age" (RDF/JSON URIs)
- contact — "First Name", "Last Name" (space-separated keys)
- catalog_item — mixed quoted ("@type", "$id", "sku:code") and unquoted (name) fields

Testing

Added test_schema_inference_with_at_prefixed_keys — verifies JSON-LD @type fields trigger schema inference with quoted field names in @struct output
Added test_schema_inference_quoted_field_roundtrip — full JSON → TL → JSON roundtrip with @type keys
Added test_schema_inference_skips_when_schema_name_needs_quoting — confirms @items array key (which would produce schema name @item) correctly skips inference
Added test_schema_inference_root_array_with_at_keys — root-level @root-array with @type keys
Added test_schema_inference_dollar_prefixed_keys — $ref, $id (JSON Schema / OpenAPI)
Added test_schema_inference_hash_prefixed_keys — #text, #comment (XML-to-JSON)
Added test_schema_inference_colon_in_keys — xsi:type, dc:title (XML namespaces)
Added test_schema_inference_odata_keys — @odata.type, @odata.id (OData)
Added test_schema_inference_uri_keys — http://schema.org/name (RDF/JSON full URIs)
Added test_schema_inference_space_in_keys — First Name, Last Name (spreadsheet exports)
Added test_schema_inference_mixed_special_keys — @type + $id + sku:code + unquoted name in one schema, verifying only special-character fields are quoted
Added test_parse_struct_with_quoted_fields — parser accepts @struct foo("@type":string, name:string) and correctly deserializes @table rows
Added 5 canonical tests: quoted_keys text-to-JSON, binary roundtrip, full roundtrip (text → binary → JSON), compact roundtrip, compact-is-not-larger

Known Limitations (Resolved)

~~JSON import does not recognize $ref, $tag, or timestamp strings~~ — $ref and other special-character keys now get full schema inference and positional compression (timestamps and $tag remain unrecognized)

v2.0.0-beta.9

Features

FormatOptions struct with compact_floats — new FormatOptions struct replaces bare compact: bool throughout the serialization pipeline, adding compact_floats to strip .0 from whole-number floats (e.g., 35934000000.0 → 35934000000). Saves additional characters/tokens for financial and scientific datasets. Trade-off: re-parsing produces Int instead of Float for whole-number values.
- Rust: FormatOptions::compact().with_compact_floats() via doc.to_tl_with_options(&opts)
- CLI: --compact-floats flag on from-json and decompile commands
- FFI: tl_document_to_text_with_options(doc, compact, compact_floats) and tl_document_to_text_data_only_with_options(doc, compact, compact_floats)
- .NET: unified doc.ToText(compact, compactFloats, ignoreSchemas) with all-optional parameters
Lexer/parser refactor for colon handling — removed Tag(String) token kind from lexer; colon now always emits as Colon token. Tags (:name value) are parsed in the parser as Colon + Word sequence. This enables no-space syntax: key:value and key::tag now parse correctly without requiring a space after the colon.

.NET

Source generator: ResolveStructName() — nested types now honor [TeaLeaf(StructName = "...")] attribute when resolving struct names for arrays, lists, and nested objects. Previously always used ToSnakeCase(type.Name), ignoring the override.
Unified ToText() and ToTextDataOnly() into a single ToText(bool compact = false, bool compactFloats = false, bool ignoreSchemas = false) method with all-optional parameters. Replaces the previous 4-method API (ToText, ToTextDataOnly, ToTextCompact, ToTextCompactDataOnly) with named-parameter calling style (e.g., doc.ToText(compact: true), doc.ToText(ignoreSchemas: true)).

Accuracy Benchmark

Real dataset support — benchmark now supports real-world datasets alongside synthetic data. Starting with finance domain using SEC EDGAR 2025 Q4 10-K filings (AAPL, MSFT, GOOGL, AMZN, NVDA).
Three-format comparison in analysis.tl — TLWriter now emits @struct schema definitions for all table types (api_response, analysis_result, comparison_result) and two new format comparison tables (format_accuracy, format_tokens) when --compare-formats is used. String values in table rows are properly quoted for valid TeaLeaf parsing.
Reorganized task data into synthetic-data/ subdirectories to separate from real datasets.
Added task configuration files (real.json, synthetic.json) for independent benchmark runs.
Added utility examples: convert_formats.rs, tl_roundtrip.rs, toon_roundtrip.rs.
convert_json_to_tl() now uses FormatOptions::compact().with_compact_floats() for maximum token savings.

Testing

Added test_format_float_compact_floats — verifies whole-number stripping, non-whole preservation, special value handling (NaN/inf), and scientific notation passthrough
Added test_dumps_with_compact_floats — integration test for FormatOptions with mixed int/float data
Added test_colon_then_word — verifies lexer emits Colon + Word instead of Tag for :Circle syntax
Added test_tagged_value_no_space_after_colon — verifies key::tag value parsing
Added test_key_value_no_space_after_colon — verifies key:value parsing without spaces
Added 4 source generator tests for ResolveStructName: nested type with StructName override, list/array of nested type with override, and default snake_case fallback
Added 4 .NET tests for compact text and unified ToText API: ToTextCompact_RemovesInsignificantWhitespace, ToTextCompact_WithSchemas_IsSmallerThanPretty, ToTextCompact_RoundTrips, ToText_IgnoreSchemas_ExcludesSchemas
Updated 4 fuzz targets (fuzz_serialize, fuzz_parse, fuzz_structured, fuzz_json_schemas) with compact and compact_floats roundtrip coverage and values_numeric_equal comparator for Float↔Int/UInt coercion

Documentation

Added Compact Floats: Intentional Lossy Optimization section to round-trip fidelity guide
Updated CLI docs for --compact-floats flag on decompile and from-json
Updated Rust overview with FormatOptions section and to_tl_with_options API
Updated FFI API reference with tl_document_to_text_with_options and tl_document_to_text_data_only_with_options
Updated .NET overview with new ToText/ToTextDataOnly overloads
Updated LLM context guide with FormatOptions examples and compaction options table
Updated crates.io and NuGet README files
Updated accuracy benchmark documentation with analysis.tl structure, three-format comparison tables, and latest benchmark results (~43% input token savings on real-world data)
Updated token savings claims across README.md, tealeaf-core/README.md, introduction.md, and CLAUDE.md to reflect latest benchmark data

v2.0.0-beta.8

.NET

XML documentation in NuGet packages — TeaLeaf and TeaLeaf.Annotations packages now include XML doc files (TeaLeaf.xml, TeaLeaf.Annotations.xml) for all target frameworks. Consumers get IntelliSense tooltips for all public APIs. Previously, GenerateDocumentationFile was not enabled and the .xml files were absent from the .nupkg.
Added XML doc comments to all undocumented public members: TLType enum values (13), TLDocument.ToString/Dispose, TLReader.Dispose, TLField.ToString, TLSchema.ToString, TLException constructors (3)
Enabled TreatWarningsAsErrors for TeaLeaf and TeaLeaf.Annotations — missing XML docs or other warnings are now compile errors, preventing regressions

Testing

Added ToJson_PreservesSpecialCharacters_NoUnicodeEscaping — verifies +, <, >, ' survive binary round-trip without Unicode escaping in both ToJson() and ToJsonCompact() paths
Added ToJson_PreservesFloatDecimalPoint_WholeNumbers — verifies whole-number floats (99.0, 150.0, 0.0) retain .0 suffix and non-whole floats (4.5, 3.75) preserve decimal digits

v2.0.0-beta.7

.NET

Fixed TLReader.ToJson() escaping non-ASCII-safe characters — + in phone numbers rendered as \u002B, </> as \u003C/\u003E, etc. System.Text.Json’s default JavaScriptEncoder.Default HTML-encodes these characters for XSS safety, which is inappropriate for a data serialization library. All three JSON serialization methods (ToJson, ToJsonCompact, GetAsJson) now use JavaScriptEncoder.UnsafeRelaxedJsonEscaping via shared static readonly options.
Fixed TLReader.ToJson() dropping .0 suffix from whole-number floats — 3582.0 in source JSON became 3582 after binary round-trip because System.Text.Json’s JsonValue.Create(double) strips trailing .0. Added FloatToJsonNode helper that uses F1 formatting for whole-number doubles, preserving formatting fidelity with the Rust CLI path.

v2.0.0-beta.6

Features

Recursive array schema inference in JSON import — from_json_with_schemas now discovers schemas for arrays nested inside objects at arbitrary depth (e.g., items[].product.stock[]). Previously, analyze_nested_objects only recursed into nested objects but not nested arrays, causing deeply nested arrays to fall back to []any. The CLI and derive-macro paths now produce equivalent schema coverage.
Deterministic schema declaration order — analyze_array and analyze_nested_objects now use single-pass field-order traversal (depth-first), matching the derive macro’s field-declaration-order strategy. Previously, both functions made two separate passes (arrays first, then objects), causing schema declarations to appear in a different order than the derive/Builder API path. CLI and Builder API now produce byte-identical .tl output for the same data.

Bug Fixes

Fixed binary encoding corruption for []any typed arrays — encode_typed_value incorrectly wrote TLType::Struct as the element type for the “any” pseudo-type (the to_tl_type() default for unknown names), causing the reader to interpret heterogeneous data as struct schema indices. Arrays with mixed element types inside schema-typed objects (e.g., order.customer, order.payment) now correctly use heterogeneous 0xFF encoding when no matching schema exists.

Tooling

Version sync scripts (sync-version.ps1, sync-version.sh) now regenerate the workflow diagram (assets/tealeaf_workflow.png) via generate_workflow_diagram.py on each version bump

Testing

Added json_any_array_binary_roundtrip — focused regression test verifying []any fields inside schema-typed structs survive binary compilation with full data integrity verification
Added retail_orders_json_binary_roundtrip — end-to-end test exercising JSON → infer schemas → compile → binary read with retail_orders.json (the exact path that was untested)
Added .NET FromJson_HeterogeneousArrayInStruct_BinaryRoundTrips — mirrors the Rust []any regression test through the FFI layer
Strengthened .NET FromJson_RetailOrdersFixture_CompileRoundTrips — upgraded from string-contains check to structural JSON verification (10 orders, 4 products, 3 customers, spot-check order ID and item count)
Added json_inference_nested_array_inside_object — verifies arrays nested inside objects (e.g., items[].product.stock[]) get their own schema and typed array fields
Added gen_retail_orders_api_tl derive integration test — generates .tl from Rust DTOs via Builder API and confirms byte-identical output with CLI path
Added examples/retail_orders_different_shape_cli.tl and retail_orders_different_shape_api.tl comparison fixtures (2,395 bytes each, zero diff)
Moved retail_orders_different_shape.rs from examples/ to tealeaf-core/tests/fixtures/ to keep test dependencies within the crate boundary
Verified all 7 fuzz targets pass (~566K total runs, zero crashes)

v2.0.0-beta.5

Features

Schema-aware serialization for Builder API — to_tl_with_schemas() now produces compact @table output for documents built via TeaLeafBuilder with derive-macro schemas. Previously, PascalCase schema names from #[derive(ToTeaLeaf)] (e.g., SalesOrder) didn’t match the serializer’s singularize() heuristic (e.g., "orders" → "order"), causing all arrays to fall back to verbose [{k: v}] format. The serializer now resolves schemas via a 4-step chain: declared type from parent schema → singularize → case-insensitive singularize → structural field matching.

Bug Fixes

Fixed schema inference name collision when a field singularizes to the same name as its parent array’s schema — prevented self-referencing schemas (e.g., @struct root (root: root)) and data loss during round-trip (found via fuzzing)
Fixed @table serializer applying wrong schema when the same field name appears at multiple nesting levels with different object shapes — serializer now validates schema fields match the actual object keys before using positional tuple encoding

Testing

Added 8 Rust regression tests for schema name collisions: fuzz_repro_dots_in_field_name, schema_name_collision_field_matches_parent, analyze_node_nesting_stress_test, schema_collision_recursive_arrays, schema_collision_recursive_same_shape, schema_collision_three_level_nesting, schema_collision_three_level_divergent_leaves, all_orders_cli_vs_api_roundtrip
Added derive integration test test_builder_schema_aware_table_output — verifies Builder API with 5 nested PascalCase schemas produces @table encoding and round-trips correctly
Verified all 7 fuzz targets pass (~445K total runs, zero crashes)

v2.0.0-beta.4

Bug Fixes

Fixed binary encoding crash when compiling JSON with heterogeneous nested objects — from_json_with_schemas infers any pseudo-type for fields whose nested objects have varying shapes; the binary encoder now falls back to generic encoding instead of erroring with “schema-typed field ‘any’ requires a schema”
Fixed parser failing to resolve schema names that shadow built-in type keywords — schemas named bool, int, string, etc. now correctly resolve via LParen lookahead disambiguation (struct tuples always start with (, primitives never do)
Fixed singularize() producing empty string for single-character field names (e.g., "s" → "") — caused @struct definitions with missing names and unparseable TL text output
Fixed validate_tokens.py token comparison by converting API input to int for safety

.NET

Added TLValueExtensions with GetRequired() extension methods for TLValue and TLDocument — provides non-nullable access patterns, reducing CS8602 warnings in consuming code
Added TL007 diagnostic: [TeaLeaf] classes in the global namespace now produce a compile-time error (“TeaLeaf type must be in a named namespace”)
Removed SuppressDependenciesWhenPacking property from TeaLeaf.Generators.csproj
Exposed InternalsVisibleTo for TeaLeaf.Tests

CI/CD

Re-enabled all 6 GitHub Actions workflows after making the repository public (rust-cli, dotnet-package, accuracy-benchmark, docs, coverage, fuzz)
Fixed coverlet filter quoting in coverage workflow — commas URL-encoded as %2c to prevent shell argument splitting
Fixed Codecov token handling — made CODECOV_TOKEN optional for public repo tokenless uploads
Fixed Codecov multi-file upload format — changed from YAML block scalar to comma-separated single-line
Refactored coverage workflow to use dotnet-coverage with dedicated settings XML files
Added CodeQL security analysis workflow
Fixed accuracy-benchmark workflow permissions

Testing

Added Rust regression test for any pseudo-type compile round-trip
Added 21 Rust tests for schema names shadowing all built-in type keywords (bool, int, int8..int64, uint..uint64, float, float32, float64, string, timestamp, bytes) — covers JSON inference round-trip, direct TL parsing, self-referencing schemas, duplicate declarations, and multiple built-in-named schemas in one document
Added 4 .NET regression tests covering TLDocument.FromJson → Compile with heterogeneous nested objects, mixed-structure arrays, complex schema inference, and retail_orders.json end-to-end
Added .NET tests for JSON serialization of timestamps and byte arrays
Added .NET coverage tests for multi-word enums and nullable nested objects
Added .NET source generator tests (524 new lines in GeneratorTests.cs) including TL007 global namespace diagnostic
Added .NET TLValue.GetRequired() extension method tests
Added .NET TLReader binary reader tests (168 new lines)
Added cross-platform FindRepoFile helper for .NET test fixture discovery (walks up directory tree instead of hardcoded relative path depth)
Verified full .NET test suite on Linux (WSL Ubuntu 24.04)

Tooling

Added --version / -V CLI flag
Added delete-caches.ps1 and delete-caches.sh GitHub Actions cache cleanup scripts
Updated coverage.ps1 to support dotnet-coverage collection with XML settings files

Documentation

Updated binary deserialization method names in quick-start, LLM context guide, schema evolution guide, and derive macros docs
Updated tealeaf workflow diagram

v2.0.0-beta.3

Features

Byte literals — b"..." hex syntax for byte data in text format (e.g., payload: b"cafef00d")
Arbitrary-precision numbers — Value::JsonNumber preserves exact decimal representation for numbers exceeding native type ranges
Insertion order preservation — IndexMap replaces HashMap for all user-facing containers; JSON round-trips now preserve original key order (ADR-0001)
Timestamp timezone support — Timestamps encode timezone offset in minutes (10 bytes: 8 millis + 2 offset); supports Z, +HH:MM, -HH:MM, +HH formats
Special float values — NaN, inf, -inf keywords for IEEE 754 special values (JSON export converts to null)
Extended escape sequences — \b (backspace), \f (form feed), \uXXXX (Unicode code points) for full JSON string escape parity
Forward compatibility — Unknown directives silently ignored, enabling older implementations to partially parse files with newer features (spec §1.18)

Bug Fixes

Fixed bounds check failures and bitmap overflow issues in binary decoder
Fixed lexer infinite loop on certain malformed inputs (found via fuzzing)
Fixed NaN value quoting causing incorrect round-trip behavior
Fixed parser crashes on deeply nested structures
Fixed integer overflow in varint decoding
Fixed off-by-one errors in array length checks
Fixed negative hex/binary literal parsing
Fixed exponent-only numbers (e.g., 1e3) to parse as floats, not integers
Fixed timestamp timezone parsing to accept hour-only offsets (+05 = +05:00)
Rejected value-only types (object, map, tuple, ref, tagged) as schema field types per spec §2.1
Fixed .NET package publishing for TeaLeaf.Annotations and TeaLeaf.Generators to NuGet

Performance

Removed O(n log n) key sorting from all serialization paths: 6-17% faster for small/medium objects, up to 69% faster for tabular data
Binary decode 56-105% slower for generic object workloads due to IndexMap insertion cost (acceptable trade-off per ADR-0001; columnar workloads less affected)

Specification

Schema table header byte +6 stores Union Count (was reserved)
String table length encoding changed from u16 to u32 for strings > 65KB
Added type code 0x12 for JSONNUMBER
Timestamp encoding extended to 10 bytes (8 millis + 2 offset)
Added bytes_lit grammar production; extended number to include NaN/inf/-inf
Documented object, map, ref, tagged as value-only types (not valid in schema fields)
Resolved compression algorithm spec contradiction: binary format v2 uses ZLIB (deflate), not zstd (ADR-0004)

Tooling

Fuzzing infrastructure — 7 cargo-fuzz targets with custom dictionaries and structure-aware generation (ADR-0002)
Fuzzing CI workflow — GitHub Actions runs all targets for 120s each (~15 min per run)
Nesting depth limit — 256-level max for stack overflow protection (ADR-0003)
VS Code extension — Syntax highlighting for .tl files (vscode-tealeaf/)
FFI safety — Comprehensive # Safety docs on all FFI functions; regenerated tealeaf.h
Token validation — validate_tokens.py script validates API-reported token counts against tiktoken
Maintenance scripts — delete-deployments and delete-workflow-runs for GitHub cleanup

Testing

238+ adversarial tests for malformed binary input
333+ .NET edge case tests for FFI boundary conditions
Property-based tests with depth-bounded recursive generation
Accuracy benchmark token savings updated to ~36% fewer data tokens (validated with tiktoken)

Documentation

ADR-0001: IndexMap for Insertion Order Preservation
ADR-0002: Fuzzing Architecture and Strategy
ADR-0003: Maximum Nesting Depth Limit (256)
ADR-0004: ZLIB Compression for Binary Format
Code of Conduct, SECURITY.md, GitHub issue/PR templates
examples/showcase.tl — 736-line comprehensive format demonstration
Sample accuracy benchmark results

Breaking Changes

Value::Object uses IndexMap<String, Value> instead of HashMap (type alias ObjectMap provided; From<HashMap> retained for backward compatibility)
Value::Timestamp(i64) → Value::Timestamp(i64, i16) — second field is timezone offset in minutes
Value::JsonNumber(String) variant added — match expressions on Value need new arm
Binary timestamps not backward-compatible (beta.2 readers cannot decode beta.3 timestamps; beta.3 readers handle beta.2 files by defaulting offset to UTC)
JSON round-trips preserve key order instead of alphabetizing

v2.0.0-beta.2

Format

@union definitions now encoded in binary schema table (full text-binary-text roundtrip)
Union schema region uses backward-compatible extension of schema table header
Derive macro collect_unions() generates union definitions for Rust enums
TeaLeafBuilder::add_union() for programmatic union construction

Improvements

Version sync automation expanded to cover all project files (16 targets)
NuGet package icon added to all NuGet packages (TeaLeaf, Annotations, Generators)
CI badges added to README (Rust CI, .NET CI, crates.io, NuGet, codecov, License)
crates.io publish ordering fixed (tealeaf-derive before tealeaf-core)
Contributing guide added (CONTRIBUTING.md)
Spec governance documentation added
Accuracy benchmark dump-prompts subcommand for offline prompt inspection
TeaLeaf.Annotations published as separate NuGet package (fixes dependency resolution)
benches_proto/ excluded from crates.io package (removes protoc requirement for consumers)

v2.0.0-beta.1

Initial public beta release.

Format

Text format (.tl) with comments, schemas, and all value types
Binary format (.tlbx) with string deduplication, schema embedding, and per-section compression
15 primitive types + 6 container/semantic types
Inline schemas with @struct, @table, @map, @union
References (!name) and tagged values (:tag value)
File includes (@include)
ISO 8601 timestamp support
JSON bidirectional conversion with schema inference

CLI

8 commands: compile, decompile, info, validate, to-json, from-json, tlbx-to-json, json-to-tlbx
Pre-built binaries for 7 platforms (Windows, Linux, macOS – x64 and ARM64)

Rust

tealeaf-core crate with full parser, compiler, and reader
tealeaf-derive crate with #[derive(ToTeaLeaf, FromTeaLeaf)]
Builder API (TeaLeafBuilder)
Memory-mapped binary reading
Conversion traits with automatic schema collection

.NET

TeaLeaf NuGet package with native libraries for all platforms
C# incremental source generator ([TeaLeaf] attribute)
Reflection-based serializer (TeaLeafSerializer)
Managed wrappers (TLDocument, TLValue, TLReader)
Schema introspection API
Diagnostic codes TL001-TL006

FFI

C-compatible API via tealeaf-ffi crate
45+ exported functions
Thread-safe error handling
Null-safe for all pointer parameters
C header generation via cbindgen

Known Limitations

~~Bytes type does not round-trip through text format~~ (resolved: b"..." hex literals added)
JSON import does not recognize $ref, $tag, or timestamp strings
Individual string length limited to ~4 GB (u32) in binary format
64-byte header overhead makes TeaLeaf inefficient for very small objects

Keyboard shortcuts

TeaLeaf Documentation