Changelog
v2.0.0-beta.9 (Current)
Features
FormatOptionsstruct withcompact_floats— newFormatOptionsstruct replaces barecompact: boolthroughout the serialization pipeline, addingcompact_floatsto strip.0from whole-number floats (e.g.,35934000000.0→35934000000). Saves additional characters/tokens for financial and scientific datasets. Trade-off: re-parsing producesIntinstead ofFloatfor whole-number values.- Rust:
FormatOptions::compact().with_compact_floats()viadoc.to_tl_with_options(&opts) - CLI:
--compact-floatsflag onfrom-jsonanddecompilecommands - FFI:
tl_document_to_text_with_options(doc, compact, compact_floats)andtl_document_to_text_data_only_with_options(doc, compact, compact_floats) - .NET: unified
doc.ToText(compact, compactFloats, ignoreSchemas)with all-optional parameters
- Rust:
- Lexer/parser refactor for colon handling — removed
Tag(String)token kind from lexer; colon now always emits asColontoken. Tags (:name value) are parsed in the parser asColon + Wordsequence. This enables no-space syntax:key:valueandkey::tagnow parse correctly without requiring a space after the colon.
.NET
- Source generator:
ResolveStructName()— nested types now honor[TeaLeaf(StructName = "...")]attribute when resolving struct names for arrays, lists, and nested objects. Previously always usedToSnakeCase(type.Name), ignoring the override. - Unified
ToText()andToTextDataOnly()into a singleToText(bool compact = false, bool compactFloats = false, bool ignoreSchemas = false)method with all-optional parameters. Replaces the previous 4-method API (ToText,ToTextDataOnly,ToTextCompact,ToTextCompactDataOnly) with named-parameter calling style (e.g.,doc.ToText(compact: true),doc.ToText(ignoreSchemas: true)).
Accuracy Benchmark
- Real dataset support — benchmark now supports real-world datasets alongside synthetic data. Starting with finance domain using SEC EDGAR 2025 Q4 10-K filings (AAPL, MSFT, GOOGL, AMZN, NVDA).
- Three-format comparison in
analysis.tl—TLWriternow emits@structschema definitions for all table types (api_response,analysis_result,comparison_result) and two new format comparison tables (format_accuracy,format_tokens) when--compare-formatsis used. String values in table rows are properly quoted for valid TeaLeaf parsing. - Reorganized task data into
synthetic-data/subdirectories to separate from real datasets. - Added task configuration files (
real.json,synthetic.json) for independent benchmark runs. - Added utility examples:
convert_formats.rs,tl_roundtrip.rs,toon_roundtrip.rs. convert_json_to_tl()now usesFormatOptions::compact().with_compact_floats()for maximum token savings.
Testing
- Added
test_format_float_compact_floats— verifies whole-number stripping, non-whole preservation, special value handling (NaN/inf), and scientific notation passthrough - Added
test_dumps_with_compact_floats— integration test forFormatOptionswith mixed int/float data - Added
test_colon_then_word— verifies lexer emitsColon + Wordinstead ofTagfor:Circlesyntax - Added
test_tagged_value_no_space_after_colon— verifieskey::tag valueparsing - Added
test_key_value_no_space_after_colon— verifieskey:valueparsing without spaces - Added 4 source generator tests for
ResolveStructName: nested type withStructNameoverride, list/array of nested type with override, and default snake_case fallback - Added 4 .NET tests for compact text and unified
ToTextAPI:ToTextCompact_RemovesInsignificantWhitespace,ToTextCompact_WithSchemas_IsSmallerThanPretty,ToTextCompact_RoundTrips,ToText_IgnoreSchemas_ExcludesSchemas - Updated 4 fuzz targets (
fuzz_serialize,fuzz_parse,fuzz_structured,fuzz_json_schemas) with compact andcompact_floatsroundtrip coverage andvalues_numeric_equalcomparator for Float↔Int/UInt coercion
Documentation
- Added Compact Floats: Intentional Lossy Optimization section to round-trip fidelity guide
- Updated CLI docs for
--compact-floatsflag ondecompileandfrom-json - Updated Rust overview with
FormatOptionssection andto_tl_with_optionsAPI - Updated FFI API reference with
tl_document_to_text_with_optionsandtl_document_to_text_data_only_with_options - Updated .NET overview with new
ToText/ToTextDataOnlyoverloads - Updated LLM context guide with
FormatOptionsexamples and compaction options table - Updated crates.io and NuGet README files
- Updated accuracy benchmark documentation with
analysis.tlstructure, three-format comparison tables, and latest benchmark results (~43% input token savings on real-world data) - Updated token savings claims across README.md, tealeaf-core/README.md, introduction.md, and CLAUDE.md to reflect latest benchmark data
v2.0.0-beta.8
.NET
- XML documentation in NuGet packages —
TeaLeafandTeaLeaf.Annotationspackages now include XML doc files (TeaLeaf.xml,TeaLeaf.Annotations.xml) for all target frameworks. Consumers get IntelliSense tooltips for all public APIs. Previously,GenerateDocumentationFilewas not enabled and the.xmlfiles were absent from the.nupkg. - Added XML doc comments to all undocumented public members:
TLTypeenum values (13),TLDocument.ToString/Dispose,TLReader.Dispose,TLField.ToString,TLSchema.ToString,TLExceptionconstructors (3) - Enabled
TreatWarningsAsErrorsforTeaLeafandTeaLeaf.Annotations— missing XML docs or other warnings are now compile errors, preventing regressions
Testing
- Added
ToJson_PreservesSpecialCharacters_NoUnicodeEscaping— verifies+,<,>,'survive binary round-trip without Unicode escaping in bothToJson()andToJsonCompact()paths - Added
ToJson_PreservesFloatDecimalPoint_WholeNumbers— verifies whole-number floats (99.0,150.0,0.0) retain.0suffix and non-whole floats (4.5,3.75) preserve decimal digits
v2.0.0-beta.7
.NET
- Fixed
TLReader.ToJson()escaping non-ASCII-safe characters —+in phone numbers rendered as\u002B,</>as\u003C/\u003E, etc.System.Text.Json’s defaultJavaScriptEncoder.DefaultHTML-encodes these characters for XSS safety, which is inappropriate for a data serialization library. All three JSON serialization methods (ToJson,ToJsonCompact,GetAsJson) now useJavaScriptEncoder.UnsafeRelaxedJsonEscapingvia sharedstatic readonlyoptions. - Fixed
TLReader.ToJson()dropping.0suffix from whole-number floats —3582.0in source JSON became3582after binary round-trip becauseSystem.Text.Json’sJsonValue.Create(double)strips trailing.0. AddedFloatToJsonNodehelper that usesF1formatting for whole-number doubles, preserving formatting fidelity with the Rust CLI path.
v2.0.0-beta.6
Features
- Recursive array schema inference in JSON import —
from_json_with_schemasnow discovers schemas for arrays nested inside objects at arbitrary depth (e.g.,items[].product.stock[]). Previously,analyze_nested_objectsonly recursed into nested objects but not nested arrays, causing deeply nested arrays to fall back to[]any. The CLI and derive-macro paths now produce equivalent schema coverage. - Deterministic schema declaration order —
analyze_arrayandanalyze_nested_objectsnow use single-pass field-order traversal (depth-first), matching the derive macro’s field-declaration-order strategy. Previously, both functions made two separate passes (arrays first, then objects), causing schema declarations to appear in a different order than the derive/Builder API path. CLI and Builder API now produce byte-identical.tloutput for the same data.
Bug Fixes
- Fixed binary encoding corruption for
[]anytyped arrays —encode_typed_valueincorrectly wroteTLType::Structas the element type for the “any” pseudo-type (theto_tl_type()default for unknown names), causing the reader to interpret heterogeneous data as struct schema indices. Arrays with mixed element types inside schema-typed objects (e.g.,order.customer,order.payment) now correctly use heterogeneous0xFFencoding when no matching schema exists.
Tooling
- Version sync scripts (
sync-version.ps1,sync-version.sh) now regenerate the workflow diagram (assets/tealeaf_workflow.png) viagenerate_workflow_diagram.pyon each version bump
Testing
- Added
json_any_array_binary_roundtrip— focused regression test verifying[]anyfields inside schema-typed structs survive binary compilation with full data integrity verification - Added
retail_orders_json_binary_roundtrip— end-to-end test exercising JSON → infer schemas → compile → binary read withretail_orders.json(the exact path that was untested) - Added .NET
FromJson_HeterogeneousArrayInStruct_BinaryRoundTrips— mirrors the Rust[]anyregression test through the FFI layer - Strengthened .NET
FromJson_RetailOrdersFixture_CompileRoundTrips— upgraded from string-contains check to structural JSON verification (10 orders, 4 products, 3 customers, spot-check order ID and item count) - Added
json_inference_nested_array_inside_object— verifies arrays nested inside objects (e.g.,items[].product.stock[]) get their own schema and typed array fields - Added
gen_retail_orders_api_tlderive integration test — generates.tlfrom Rust DTOs via Builder API and confirms byte-identical output with CLI path - Added
examples/retail_orders_different_shape_cli.tlandretail_orders_different_shape_api.tlcomparison fixtures (2,395 bytes each, zero diff) - Moved
retail_orders_different_shape.rsfromexamples/totealeaf-core/tests/fixtures/to keep test dependencies within the crate boundary - Verified all 7 fuzz targets pass (~566K total runs, zero crashes)
v2.0.0-beta.5
Features
- Schema-aware serialization for Builder API —
to_tl_with_schemas()now produces compact@tableoutput for documents built viaTeaLeafBuilderwith derive-macro schemas. Previously, PascalCase schema names from#[derive(ToTeaLeaf)](e.g.,SalesOrder) didn’t match the serializer’ssingularize()heuristic (e.g.,"orders"→"order"), causing all arrays to fall back to verbose[{k: v}]format. The serializer now resolves schemas via a 4-step chain: declared type from parent schema → singularize → case-insensitive singularize → structural field matching.
Bug Fixes
- Fixed schema inference name collision when a field singularizes to the same name as its parent array’s schema — prevented self-referencing schemas (e.g.,
@struct root (root: root)) and data loss during round-trip (found via fuzzing) - Fixed
@tableserializer applying wrong schema when the same field name appears at multiple nesting levels with different object shapes — serializer now validates schema fields match the actual object keys before using positional tuple encoding
Testing
- Added 8 Rust regression tests for schema name collisions:
fuzz_repro_dots_in_field_name,schema_name_collision_field_matches_parent,analyze_node_nesting_stress_test,schema_collision_recursive_arrays,schema_collision_recursive_same_shape,schema_collision_three_level_nesting,schema_collision_three_level_divergent_leaves,all_orders_cli_vs_api_roundtrip - Added derive integration test
test_builder_schema_aware_table_output— verifies Builder API with 5 nested PascalCase schemas produces@tableencoding and round-trips correctly - Verified all 7 fuzz targets pass (~445K total runs, zero crashes)
v2.0.0-beta.4
Bug Fixes
- Fixed binary encoding crash when compiling JSON with heterogeneous nested objects —
from_json_with_schemasinfersanypseudo-type for fields whose nested objects have varying shapes; the binary encoder now falls back to generic encoding instead of erroring with “schema-typed field ‘any’ requires a schema” - Fixed parser failing to resolve schema names that shadow built-in type keywords — schemas named
bool,int,string, etc. now correctly resolve via LParen lookahead disambiguation (struct tuples always start with(, primitives never do) - Fixed
singularize()producing empty string for single-character field names (e.g.,"s"→"") — caused@structdefinitions with missing names and unparseable TL text output - Fixed
validate_tokens.pytoken comparison by converting API input tointfor safety
.NET
- Added
TLValueExtensionswithGetRequired()extension methods forTLValueandTLDocument— provides non-nullable access patterns, reducing CS8602 warnings in consuming code - Added TL007 diagnostic:
[TeaLeaf]classes in the global namespace now produce a compile-time error (“TeaLeaf type must be in a named namespace”) - Removed
SuppressDependenciesWhenPackingproperty fromTeaLeaf.Generators.csproj - Exposed
InternalsVisibleToforTeaLeaf.Tests
CI/CD
- Re-enabled all 6 GitHub Actions workflows after making the repository public (rust-cli, dotnet-package, accuracy-benchmark, docs, coverage, fuzz)
- Fixed coverlet filter quoting in coverage workflow — commas URL-encoded as
%2cto prevent shell argument splitting - Fixed Codecov token handling — made
CODECOV_TOKENoptional for public repo tokenless uploads - Fixed Codecov multi-file upload format — changed from YAML block scalar to comma-separated single-line
- Refactored coverage workflow to use
dotnet-coveragewith dedicated settings XML files - Added CodeQL security analysis workflow
- Fixed accuracy-benchmark workflow permissions
Testing
- Added Rust regression test for
anypseudo-type compile round-trip - Added 21 Rust tests for schema names shadowing all built-in type keywords (
bool,int,int8..int64,uint..uint64,float,float32,float64,string,timestamp,bytes) — covers JSON inference round-trip, direct TL parsing, self-referencing schemas, duplicate declarations, and multiple built-in-named schemas in one document - Added 4 .NET regression tests covering
TLDocument.FromJson→Compilewith heterogeneous nested objects, mixed-structure arrays, complex schema inference, and retail_orders.json end-to-end - Added .NET tests for JSON serialization of timestamps and byte arrays
- Added .NET coverage tests for multi-word enums and nullable nested objects
- Added .NET source generator tests (524 new lines in
GeneratorTests.cs) including TL007 global namespace diagnostic - Added .NET
TLValue.GetRequired()extension method tests - Added .NET
TLReaderbinary reader tests (168 new lines) - Added cross-platform
FindRepoFilehelper for .NET test fixture discovery (walks up directory tree instead of hardcoded relative path depth) - Verified full .NET test suite on Linux (WSL Ubuntu 24.04)
Tooling
- Added
--version/-VCLI flag - Added
delete-caches.ps1anddelete-caches.shGitHub Actions cache cleanup scripts - Updated
coverage.ps1to supportdotnet-coveragecollection with XML settings files
Documentation
- Updated binary deserialization method names in quick-start, LLM context guide, schema evolution guide, and derive macros docs
- Updated tealeaf workflow diagram
v2.0.0-beta.3
Features
- Byte literals —
b"..."hex syntax for byte data in text format (e.g.,payload: b"cafef00d") - Arbitrary-precision numbers —
Value::JsonNumberpreserves exact decimal representation for numbers exceeding native type ranges - Insertion order preservation —
IndexMapreplacesHashMapfor all user-facing containers; JSON round-trips now preserve original key order (ADR-0001) - Timestamp timezone support — Timestamps encode timezone offset in minutes (10 bytes: 8 millis + 2 offset); supports
Z,+HH:MM,-HH:MM,+HHformats - Special float values —
NaN,inf,-infkeywords for IEEE 754 special values (JSON export converts tonull) - Extended escape sequences —
\b(backspace),\f(form feed),\uXXXX(Unicode code points) for full JSON string escape parity - Forward compatibility — Unknown directives silently ignored, enabling older implementations to partially parse files with newer features (spec §1.18)
Bug Fixes
- Fixed bounds check failures and bitmap overflow issues in binary decoder
- Fixed lexer infinite loop on certain malformed inputs (found via fuzzing)
- Fixed NaN value quoting causing incorrect round-trip behavior
- Fixed parser crashes on deeply nested structures
- Fixed integer overflow in varint decoding
- Fixed off-by-one errors in array length checks
- Fixed negative hex/binary literal parsing
- Fixed exponent-only numbers (e.g.,
1e3) to parse as floats, not integers - Fixed timestamp timezone parsing to accept hour-only offsets (
+05=+05:00) - Rejected value-only types (
object,map,tuple,ref,tagged) as schema field types per spec §2.1 - Fixed .NET package publishing for
TeaLeaf.AnnotationsandTeaLeaf.Generatorsto NuGet
Performance
- Removed O(n log n) key sorting from all serialization paths: 6-17% faster for small/medium objects, up to 69% faster for tabular data
- Binary decode 56-105% slower for generic object workloads due to
IndexMapinsertion cost (acceptable trade-off per ADR-0001; columnar workloads less affected)
Specification
- Schema table header byte +6 stores Union Count (was reserved)
- String table length encoding changed from
u16tou32for strings > 65KB - Added type code
0x12forJSONNUMBER - Timestamp encoding extended to 10 bytes (8 millis + 2 offset)
- Added
bytes_litgrammar production; extendednumberto includeNaN/inf/-inf - Documented
object,map,ref,taggedas value-only types (not valid in schema fields) - Resolved compression algorithm spec contradiction: binary format v2 uses ZLIB (deflate), not zstd (ADR-0004)
Tooling
- Fuzzing infrastructure — 7 cargo-fuzz targets with custom dictionaries and structure-aware generation (ADR-0002)
- Fuzzing CI workflow — GitHub Actions runs all targets for 120s each (~15 min per run)
- Nesting depth limit — 256-level max for stack overflow protection (ADR-0003)
- VS Code extension — Syntax highlighting for
.tlfiles (vscode-tealeaf/) - FFI safety — Comprehensive
# Safetydocs on all FFI functions; regeneratedtealeaf.h - Token validation —
validate_tokens.pyscript validates API-reported token counts against tiktoken - Maintenance scripts —
delete-deploymentsanddelete-workflow-runsfor GitHub cleanup
Testing
- 238+ adversarial tests for malformed binary input
- 333+ .NET edge case tests for FFI boundary conditions
- Property-based tests with depth-bounded recursive generation
- Accuracy benchmark token savings updated to ~36% fewer data tokens (validated with tiktoken)
Documentation
- ADR-0001: IndexMap for Insertion Order Preservation
- ADR-0002: Fuzzing Architecture and Strategy
- ADR-0003: Maximum Nesting Depth Limit (256)
- ADR-0004: ZLIB Compression for Binary Format
- Code of Conduct, SECURITY.md, GitHub issue/PR templates
examples/showcase.tl— 736-line comprehensive format demonstration- Sample accuracy benchmark results
Breaking Changes
Value::ObjectusesIndexMap<String, Value>instead ofHashMap(type aliasObjectMapprovided;From<HashMap>retained for backward compatibility)Value::Timestamp(i64)→Value::Timestamp(i64, i16)— second field is timezone offset in minutesValue::JsonNumber(String)variant added — match expressions onValueneed new arm- Binary timestamps not backward-compatible (beta.2 readers cannot decode beta.3 timestamps; beta.3 readers handle beta.2 files by defaulting offset to UTC)
- JSON round-trips preserve key order instead of alphabetizing
v2.0.0-beta.2
Format
@uniondefinitions now encoded in binary schema table (full text-binary-text roundtrip)- Union schema region uses backward-compatible extension of schema table header
- Derive macro
collect_unions()generates union definitions for Rust enums TeaLeafBuilder::add_union()for programmatic union construction
Improvements
- Version sync automation expanded to cover all project files (16 targets)
- NuGet package icon added to all NuGet packages (TeaLeaf, Annotations, Generators)
- CI badges added to README (Rust CI, .NET CI, crates.io, NuGet, codecov, License)
- crates.io publish ordering fixed (
tealeaf-derivebeforetealeaf-core) - Contributing guide added (
CONTRIBUTING.md) - Spec governance documentation added
- Accuracy benchmark
dump-promptssubcommand for offline prompt inspection TeaLeaf.Annotationspublished as separate NuGet package (fixes dependency resolution)benches_proto/excluded from crates.io package (removesprotocrequirement for consumers)
v2.0.0-beta.1
Initial public beta release.
Format
- Text format (
.tl) with comments, schemas, and all value types - Binary format (
.tlbx) with string deduplication, schema embedding, and per-section compression - 15 primitive types + 6 container/semantic types
- Inline schemas with
@struct,@table,@map,@union - References (
!name) and tagged values (:tag value) - File includes (
@include) - ISO 8601 timestamp support
- JSON bidirectional conversion with schema inference
CLI
- 8 commands:
compile,decompile,info,validate,to-json,from-json,tlbx-to-json,json-to-tlbx - Pre-built binaries for 7 platforms (Windows, Linux, macOS – x64 and ARM64)
Rust
tealeaf-corecrate with full parser, compiler, and readertealeaf-derivecrate with#[derive(ToTeaLeaf, FromTeaLeaf)]- Builder API (
TeaLeafBuilder) - Memory-mapped binary reading
- Conversion traits with automatic schema collection
.NET
TeaLeafNuGet package with native libraries for all platforms- C# incremental source generator (
[TeaLeaf]attribute) - Reflection-based serializer (
TeaLeafSerializer) - Managed wrappers (
TLDocument,TLValue,TLReader) - Schema introspection API
- Diagnostic codes TL001-TL006
FFI
- C-compatible API via
tealeaf-fficrate - 45+ exported functions
- Thread-safe error handling
- Null-safe for all pointer parameters
- C header generation via
cbindgen
Known Limitations
Bytes type does not round-trip through text format(resolved:b"..."hex literals added)- JSON import does not recognize
$ref,$tag, or timestamp strings - Individual string length limited to ~4 GB (u32) in binary format
- 64-byte header overhead makes TeaLeaf inefficient for very small objects