Changelog
v2.0.0-beta.14 (Current)
Features
nullkeyword for explicit null values — The text format now supportsnullas a keyword alongside~(tilde). In most contexts they are interchangeable. In@tabletuples, they have distinct semantics:~— absent field (the field was not present in the source object). For nullable fields, the parser drops the field entirely from the reconstructed object. For non-nullable fields, it is preserved as null.null— explicit null (the field was present with anullvalue in the source JSON). Always preserved asnullin the output regardless of field nullability.- The text serializer (
write_tuple) writesnullfor fields that are explicitly null and~for fields that are absent. - Lexer: new
ExplicitNulltoken for thenullkeyword, distinct fromNull(~).
- Two-bit field state encoding in binary format — The binary struct array encoding now uses a two-bitmap layout (
lo+hi) instead of the previous single null bitmap. Each field uses a two-bit state code:code = lo_bit(i) | (hi_bit(i) << 1):- 0 (lo=0, hi=0) — has value: field data follows inline
- 1 (lo=1, hi=0) — explicit null: always preserved as
nullin output - 2 (lo=0, hi=1) — absent: dropped for nullable fields, preserved as null for non-nullable fields
- Bitmap size per row is
2 × bmsbytes (wherebms = ceil(field_count / 8)), stored as lo bitmap followed by hi bitmap - A null array element has all fields set to code=2 (lo bits all zero, hi bits all set)
- Applied to all 4 bitmap sites:
encode_struct_array,encode_typed_value(Struct),decode_struct_array,decode_struct
Performance
- Hex encoding optimization — Replaced
format!("{:02x}")per-byte hex encoding with lookup-table-basedpush_hex_byte()/push_hex_bytes()helpers. Applies to bytes literals in text serialization (b"...") and JSON export ("0x..."strings). Eliminates allocation per byte. - Zero-copy binary section reads —
Reader::read_section()now usesCow<[u8]>— uncompressed sections borrow directly from the memory-mapped file (Cow::Borrowed) instead of copying to aVec<u8>. Compressed sections still allocate (Cow::Owned). - Schema lookup via HashMap — Binary writer now uses
schema_map(HashMap by name) for O(1) schema resolution instead of O(n) linear scans viaself.schemas.iter().find(). Affectsencode_struct_array,encode_typed_value, and nested struct field encoding. - Reduced redundant schema resolution —
write_tuplein the text serializer now callsresolve_schema()once per field and reuses the result for both array and non-array branches, eliminating duplicate lookups.
Specification
nullkeyword in grammar — Added"null"as a primitive production:primitive = string | bytes_lit | number | bool | "~" | "null"- Two-bit field state encoding — Updated §4.8 struct array encoding from single null bitmap to lo/hi two-bitmap layout with three-state field codes
- Duplicate key semantics — Documented that duplicate keys in objects use last-value-wins (§1.6)
- Reference semantics — Documented compile-time resolution, define-before-use, multi-use, circular reference handling, and document-global scope (§1.13)
- Tagged value semantics — Documented schema-free and schema-typed tagged value behavior (§1.14)
- Include semantics — Documented recursive includes, circular detection, 32-level depth limit (§1.16)
- Map key restrictions — Documented hashable key type restriction (string, int, uint) for map keys (§1.12)
- §1.19 Limits and Constraints — New section documenting all format limits: nesting depth (256), object fields (u16), array elements (u32), string length (u32), section count (u32), schema count (u16), max decompressed section (256 MB), and empty collection encoding
- §1.20 File Encoding — New section documenting UTF-8 requirement, BOM handling, byte order, and whitespace rules
- Reserved type code ranges — Documented reserved type code ranges that readers should treat as errors (§4.6)
- String table lookup — Documented O(1) index-based access and linear scan for name-based lookups (§4.4)
- Top-level array encoding rationale — Documented why only Int32 and String use homogeneous encoding at the top level (§2.4)
- Type coercion rationale — Documented LLM context use case for silent coercion (§2.5)
Accuracy Benchmark
- Real-world dataset expansion — Added 7 new real-world datasets across 7 domains: SEC EDGAR (finance), BLS JOLTS (HR/labor), CDC PLACES (healthcare), Wisconsin Circuit Court (legal), NVD CVEs (technology), NYC PLUTO (real estate), USDA FoodData (retail). Each domain includes data processing scripts, README documentation, and multiple task configurations.
- Updated benchmark results — 14 tasks, 7 domains. ~51% input token savings (TL vs JSON), ~20% (TOON vs JSON). Accuracy within noise (TL 0.942 vs JSON 0.945 on Claude Sonnet 4.5).
- Evidence archive — New
evidence/2026-02-14-01/with full prompts, responses, analysis.tl, summary.json, and charts for reproducibility. - Chart generation — Added
scripts/generate_charts.pyfor accuracy and token comparison visualizations.
Documentation
- ADR-0005: Direct Value Representation — Architecture decision record explaining why TeaLeaf uses direct value representation without an intermediate AST.
- Sitemap — Added
docs-site/src/sitemap.xmlwith Google search console integration for doc-site discoverability. - Updated accuracy benchmark documentation with real-world dataset details.
Canonical Fixtures
- Updated
schemas.json,mixed_schemas.json,quoted_keys.json— nullable fields with~in source now produce absent fields (no"field": null) in expected JSON output. Explicitnullvalues are preserved. - Recompiled all
.tlbxbinary fixtures with new two-bit bitmap encoding:schemas.tlbx,mixed_schemas.tlbx,quoted_keys.tlbx,large_data.tlbx,timestamps.tlbx.
Testing
- Added
test_explicit_null_roundtrip_text_and_binary— regression test for crash input[{"key":null}]verifying explicit null preservation through both text and binary roundtrips - Verified all 8 fuzz targets pass (1,524,164 total runs at 240s each, zero crashes)
- All 860 Rust tests + 560 .NET tests passing
Bug Fixes
- Fuzz crash on explicit null in JSON — Fixed crash on input
[{"k4e2-eRy4eye-24ey":null}]where explicitnullvalues in JSON were silently dropped during roundtrip. The format now correctly distinguishes between absent fields and explicitly null fields in both text and binary representations.
Breaking Changes (Binary Format)
- Binary bitmap layout change — The struct array null bitmap is now a two-bitmap (lo + hi) layout instead of a single bitmap. All
.tlbxfiles compiled with beta.13 or earlier must be recompiled. Text.tlfiles are unaffected.
v2.0.0-beta.13
.NET
@tablepositional encoding for nested object arrays — Serializer (both reflection-based and source generator) now emits@table struct_name [(v1, v2), ...]forList<NestedType>instead of[{field: val, ...}]. Eliminates repeated field names, conforming to the TeaLeaf spec for schema-bound data.- Reflection serializer: new
WriteTupleValue(),WriteTupleList(),WriteTupleDictionary()methods; modifiedWriteList()andToText<T>(IEnumerable<T>) - Source generator: emits
WriteTeaLeafTupleValue()on each[TeaLeaf(Generate = true)]class; modified list property emission to use@tableformat - Handles all field types in tuples: primitives, strings, enums, nested objects (recursive tuples), lists, dictionaries, DateTime, byte[], Guid, TimeSpan
- Deserialization unaffected — Rust parser handles
@tableto objects transparently
- Reflection serializer: new
- Parameterized constructor deserialization — Both reflection serializer and source generator now support types without parameterless constructors (e.g., records, immutable DTOs).
- Matches constructor parameter names to property names (case-insensitive)
- Supports constructor default values (
int score = 100) - Sets remaining settable properties after construction
- Removed
new()constraint fromFromDocument<T>,FromValue<T>,FromText<T>,FromList<T>
- Unmatched constructor fallback — When a parameterized constructor has parameters that don’t match any serialized property and have no default value, uses
RuntimeHelpers.GetUninitializedObject()to bypass the constructor entirely and set properties via setters. Prevents NullReferenceException for types likeSecurityGroupAttributes(string groupName)where the constructor parameter serves initialization logic unrelated to serialized fields.
Bug Fixes
- String quoting for special characters — Added
%,=,?,',!, and non-ASCII (> 0x7F) to theNeedsQuotingcharacter list in both the reflection serializer (TeaLeafTextHelper) and source generator (TLTextEmitter). Strings containing these characters are now correctly double-quoted, preventing parse errors on round-trip. @tabletrailing comma — Fixed trailing comma after the last tuple in@tableoutput. Both reflection serializer and source generator now use a separator pattern (first_flag) instead of a terminator pattern, producing valid@tablesyntax that parses without errors.
Testing
- Added 11 reflection serializer tests for parameterized constructor deserialization: basic immutable types, list properties, mixed constructor + setter, default values, nested objects, enums, round-trip, FromDocument, FromList
- Added 4 source generator tests: parameterized constructor code generation, mixed constructor/setter, constructor defaults, parameterless constructor backward compatibility
- Added 12 source-generated quoting regression tests:
%, non-ASCII (°C),=,?,'round-trip and serialization format assertions,@tableno-trailing-comma, combined special chars - Added 13 reflection serializer quoting regression tests: same special character coverage plus Unicode emoji (
❤), collection@tabletrailing comma - Added 8
NeedsQuotingunit tests for special character detection (%,=,?,',!, non-ASCII, safe identifiers) - Added 3 source generator
@tablecode emission tests:List<NestedType>emits@table, deeply nested types emitWriteTeaLeafTupleValue, separator pattern verification - Added 9 deeply nested
@tabletests (both paths):TeamWithMembers→List<PersonWithAddress>→Addressround-trip, text format assertions, empty/single-item edge cases,ToText<T>(IEnumerable<T>)collection round-trip - Added 21 compact/compactFloats round-trip tests (both paths): compact output smaller than pretty, compact round-trips correctly,
compactFloatsstrips.0from whole-number floats, fractional floats preserved, combined compact+compactFloats on@tabledata, nested objects, deeply nested tables, collection serialization
v2.0.0-beta.12
Features
- Clap CLI migration — Replaced hand-rolled
env::args()parsing withclapv4 derive macros. Provides auto-generated colored help, typo suggestions (e.g.,compil→ “did you mean ‘compile’?”),--help/-hon every subcommand, and--version/-Vflag — all derived from annotated structs with zero manual usage text.- Added
clap = { version = "4", features = ["derive", "color"] }andclap_complete = "4"dependencies try_parse()pattern preserves exit code 1 for all errors (including no-args help viaarg_required_else_help)- Deleted
print_usage()function — clap auto-generates help from doc comments
- Added
- Shell completions subcommand —
tealeaf completions <shell>generates tab-completion scripts for bash, zsh, fish, powershell, and elvish. Completes subcommands, flags, andhelparguments. get_pathdot-path navigation — Navigate nested values using dot-path expressions with array indexing.- Rust:
TeaLeaf::get_path("order.items[0].name")at document level,Value::get_path("items[0].price")within values - FFI:
tl_document_get_path(doc, path)andtl_value_get_path(value, path) - .NET:
TLDocument.GetPath(path)andTLValue.GetPath(path)with null-safe returns - Path syntax:
key.field[N].field— dot-separated field access,[N]for array indexing
- Rust:
- Object field iteration API — Enumerate object fields by index without knowing keys upfront.
- FFI:
tl_value_object_len,tl_value_object_get_key_at,tl_value_object_get_value_at - .NET:
TLValue.ObjectFieldCount,GetObjectKeyAt(index),GetObjectValueAt(index),AsObject()for(key, value)tuples,Keysproperty,GetEnumerator()forforeachsupport on objects and arrays
- FFI:
- Document schema introspection API — Inspect schema definitions at runtime without parsing text.
- FFI:
tl_document_schema_count,tl_document_schema_name,tl_document_schema_field_count,tl_document_schema_field_name,tl_document_schema_field_type,tl_document_schema_field_nullable,tl_document_schema_field_is_array(7 new C functions) - .NET:
TLDocument.Schemas(IReadOnlyList),TLDocument.GetSchema(name),TLDocument.SchemaCount;TLSchemaandTLFieldclasses withName,Type,IsNullable,IsArrayproperties
- FFI:
TLDocumentBuilder(.NET) — Fluent API for building multi-key documents with schema support.- Scalar overloads:
Add(key, string|int|long|double|float|bool|DateTimeOffset) - List overloads:
AddList(key, IEnumerable<string|int|double>) - Object support:
Add<T>(key, value)andAddList<T>(key, items)for[TeaLeaf]-attributed types - Document merging:
AddDocument(TLDocument)with automatic schema deduplication - Builder pattern: all methods return
thisfor chaining Build()produces aTLDocumentwith merged schemas
- Scalar overloads:
Bug Fixes
- Schema deduplication in .NET source generator — Diamond dependency scenarios (type A and type B both reference type C) no longer emit duplicate
@structdefinitions. NewCollectTeaLeafSchemas(StringBuilder, HashSet<string>)method performs dependency-order traversal with a sharedemittedset. Cross-assembly fallback detects types from referenced assemblies compiled with older generator versions. - Fixed memory leak in
TLValue.AsMap()where failed key/value pairs weren’t disposed
CLI
- Improved
--compactdescription: “Omit insignificant whitespace for token-efficient output” (was “Use compact single-line output”) - Improved
--compact-floatsdescription: “Write whole-number floats as integers (e.g. 42.0 becomes 42)” (was “Use compact float representation”)
Testing
- Added 26
TLDocumentBuildertests — single/multi-key documents, nested types, list support,AddDocumentmerging, JSON and binary round-trips, scalar overloads, method chaining - Added 14
GetPath.NET tests — document and value-level navigation across 6-level nested objects with arrays and mixed types - Added 15 adversarial
get_pathedge-case tests — empty string, nested fields, array indexing (valid/out-of-bounds/negative), unclosed brackets, empty brackets, non-numeric indices, double dots, leading/trailing dots, huge index overflow, scalar rejection - Added
fuzz_get_pathfuzzer target — dedicated fuzzer splitting input between document and path strings, testing bothTeaLeaf::get_pathandValue::get_path - Updated
fuzz_structuredfuzzer — addedget_pathcoverage with arbitrary and well-formed paths - Added 2 source generator deduplication tests —
SharedNestedType_DeduplicatesSchemas(diamond) andDeepDiamondDependency_DeduplicatesSchemas - Added .NET tests for
GetEnumerator()on objects/arrays/scalars andKeysproperty - Updated 3 CLI integration test assertions for clap error message format (
"Unknown command"→"frobnicate"substring,"unrecognized subcommand"negative check)
Documentation
- New completions CLI reference page — usage, arguments, install instructions for 5 shells, completion coverage details
- Updated CLI overview — added
completionsto command table - Updated SUMMARY.md — added completions to CLI Reference navigation
- Updated README.md — added
completionsto command list - Updated TEALEAF_SPEC.md — added
completionsunder new “Utilities” subsection in Section 7
v2.0.0-beta.11
Features
- Optional/nullable field support in schema inference —
analyze_array()andanalyze_nested_objects()no longer require strict field uniformity across all objects. Fields not present in every object are automatically marked nullable (?), enabling schema inference for real-world datasets where records share most fields but differ in optional ones (e.g., DCAT metadata, API responses). Requires at least 1 field present in all objects.- Union-based field collection: computes union of all field names across objects in an array, tracks per-field presence counts, marks fields below 100% presence as nullable
- New
object_matches_schema()helper allows nullable fields to be absent when matching objects to schemas — used inwrite_value_with_schemas()verification andInferredType::to_field_type()schema lookup InferredType::Object::merge()now keeps intersection of common fields instead of returningMixedwhen field counts differ- Nested array analysis collects ALL items from ALL parent objects (not just first representative), discovering the full field union across nested structures
- Structural
@tablefallback guarded withhint_name.is_some() || declared_type.is_some()to prevent matching inside tuple values
- Absent-field preservation — Nullable fields with null/absent values (
~in text, null-bit in binary) are omitted from the reconstructed object instead of being inserted as explicitnull. This preserves the semantic distinction between “field absent” and “field explicitly null”, ensuring JSON → TL → JSON roundtrip does not add spurious"field": nullentries for fields that were simply missing in the original.- Text parser:
parse_tuple_with_schema()skips inserting null for nullable fields - Binary reader: both bitmap decode paths skip inserting null for nullable fields
- Text parser:
- Most-complete object field ordering — Schema inference now uses the field ordering from the object with the most fields (the “most representative” object) as the canonical schema field order, instead of first-seen ordering across all objects. Fields present only in other objects are appended at the end. This preserves the original JSON field ordering for the majority of records during roundtrip.
Bug Fixes
- Fixed JSON → TL → JSON roundtrip adding extra
"field": nullentries for nullable fields that were absent in the original data - Fixed field ordering loss during schema inference — first (often smallest) object determined schema field order, which didn’t match the majority of objects
Canonical Fixtures
- Updated
schemas.json,mixed_schemas.json,quoted_keys.json— nullable fields with~in source now produce absent fields (no"field": null) in expected JSON output - Recompiled
mixed_schemas.tlbxto match updated binary reader behavior
Testing
- Added
test_schema_inference_optional_fields— array with optional fieldcmarked asint? - Added
test_schema_inference_optional_fields_roundtrip— JSON → TL → JSON with absent optional fields preserved - Added
test_schema_inference_no_common_fields_skipped—[{x:1}, {y:2}]produces no schema (zero common fields) - Added
test_schema_inference_single_common_field—[{id:1,a:2}, {id:3,b:4}]infers schema withidcommon - Added
test_schema_inference_optional_nested_array— nested array field marked nullable when absent from some objects - Added
test_schema_inference_optional_nested_object— nested object field marked nullable when absent - Added
test_schema_inference_wa_health_data_pattern— pattern matching the WA health DCAT dataset structure - Added
test_write_schemas_nullable_field_matching—@tableapplied correctly when objects are missing nullable fields - Added
test_schema_field_ordering_uses_most_complete_object— most-fields object determines schema field order - Added
test_schema_field_ordering_appends_extra_fields— fields from smaller objects appended after representative - Added
test_schema_field_ordering_roundtrip_preserves_order— roundtrip preserves field ordering from most-complete object - Updated
test_struct_with_nullable_field— expects absent field instead of explicit null - Updated .NET
StructArray_Table_BinaryRoundTrip_PreservesData— expectsnull(absent) instead ofTLType.Nullfor nullable fields in binary roundtrip
v2.0.0-beta.10
Features
- Quoted field names in
@structdefinitions — Schema inference no longer skips arrays of objects when field names contain special characters (@,$,#,:,/, spaces, etc.). Previously,needs_quoting(field_name)inanalyze_array()andanalyze_nested_objects()rejected the entire array, falling back to verbose inline map notation. Now the guard only checks the schema name (which appears unquoted in@struct name(...)and@table name [...]); field names that need quoting are emitted with quotes in the definition (e.g.,@struct record("@type":string, name:string)).- Affects JSON-LD (
@type,@id,@context), JSON Schema/OpenAPI ($ref,$id), OData (@odata.type,@odata.id), XML-to-JSON (#text,#comment), XML namespaces (xsi:type,dc:title), RDF/JSON (URI keys), and spreadsheet exports (space-separated keys) - Parser (
parse_struct_def()) now acceptsTokenKind::Stringin addition toTokenKind::Wordfor field names - Serialization required no changes —
write_key()already quoted field names vianeeds_quoting()
- Affects JSON-LD (
- Token counting script (
accuracy-benchmark/scripts/count_tokens.py) — Python script that uses Anthropic’smessages.count_tokens()API to measure exact token usage for benchmark prompts across TL, JSON, and TOON formats without incurring completion costs. Generates per-task comparison table with TL vs JSON, TOON vs JSON, and TL vs TOON percentage columns, plus data-only token estimates (subtracting shared instruction overhead).
Canonical Fixtures
- New:
quoted_keys(15th canonical sample) — 8 schemas covering all special-character key patterns:jsonld_record—"@type","@id"(JSON-LD)schema_def—"$ref","$id"(JSON Schema / OpenAPI)xml_node—"#text","#comment"(XML-to-JSON)ns_element—"xsi:type","dc:title"(XML namespaces)odata_entity—"@odata.type","@odata.id"(OData)rdf_triple—"http://schema.org/name","http://schema.org/age"(RDF/JSON URIs)contact—"First Name","Last Name"(space-separated keys)catalog_item— mixed quoted ("@type","$id","sku:code") and unquoted (name) fields
Testing
- Added
test_schema_inference_with_at_prefixed_keys— verifies JSON-LD@typefields trigger schema inference with quoted field names in@structoutput - Added
test_schema_inference_quoted_field_roundtrip— full JSON → TL → JSON roundtrip with@typekeys - Added
test_schema_inference_skips_when_schema_name_needs_quoting— confirms@itemsarray key (which would produce schema name@item) correctly skips inference - Added
test_schema_inference_root_array_with_at_keys— root-level@root-arraywith@typekeys - Added
test_schema_inference_dollar_prefixed_keys—$ref,$id(JSON Schema / OpenAPI) - Added
test_schema_inference_hash_prefixed_keys—#text,#comment(XML-to-JSON) - Added
test_schema_inference_colon_in_keys—xsi:type,dc:title(XML namespaces) - Added
test_schema_inference_odata_keys—@odata.type,@odata.id(OData) - Added
test_schema_inference_uri_keys—http://schema.org/name(RDF/JSON full URIs) - Added
test_schema_inference_space_in_keys—First Name,Last Name(spreadsheet exports) - Added
test_schema_inference_mixed_special_keys—@type+$id+sku:code+ unquotednamein one schema, verifying only special-character fields are quoted - Added
test_parse_struct_with_quoted_fields— parser accepts@struct foo("@type":string, name:string)and correctly deserializes@tablerows - Added 5 canonical tests:
quoted_keystext-to-JSON, binary roundtrip, full roundtrip (text → binary → JSON), compact roundtrip, compact-is-not-larger
Known Limitations (Resolved)
JSON import does not recognize—$ref,$tag, or timestamp strings$refand other special-character keys now get full schema inference and positional compression (timestamps and$tagremain unrecognized)
v2.0.0-beta.9
Features
FormatOptionsstruct withcompact_floats— newFormatOptionsstruct replaces barecompact: boolthroughout the serialization pipeline, addingcompact_floatsto strip.0from whole-number floats (e.g.,35934000000.0→35934000000). Saves additional characters/tokens for financial and scientific datasets. Trade-off: re-parsing producesIntinstead ofFloatfor whole-number values.- Rust:
FormatOptions::compact().with_compact_floats()viadoc.to_tl_with_options(&opts) - CLI:
--compact-floatsflag onfrom-jsonanddecompilecommands - FFI:
tl_document_to_text_with_options(doc, compact, compact_floats)andtl_document_to_text_data_only_with_options(doc, compact, compact_floats) - .NET: unified
doc.ToText(compact, compactFloats, ignoreSchemas)with all-optional parameters
- Rust:
- Lexer/parser refactor for colon handling — removed
Tag(String)token kind from lexer; colon now always emits asColontoken. Tags (:name value) are parsed in the parser asColon + Wordsequence. This enables no-space syntax:key:valueandkey::tagnow parse correctly without requiring a space after the colon.
.NET
- Source generator:
ResolveStructName()— nested types now honor[TeaLeaf(StructName = "...")]attribute when resolving struct names for arrays, lists, and nested objects. Previously always usedToSnakeCase(type.Name), ignoring the override. - Unified
ToText()andToTextDataOnly()into a singleToText(bool compact = false, bool compactFloats = false, bool ignoreSchemas = false)method with all-optional parameters. Replaces the previous 4-method API (ToText,ToTextDataOnly,ToTextCompact,ToTextCompactDataOnly) with named-parameter calling style (e.g.,doc.ToText(compact: true),doc.ToText(ignoreSchemas: true)).
Accuracy Benchmark
- Real dataset support — benchmark now supports real-world datasets alongside synthetic data. Starting with finance domain using SEC EDGAR 2025 Q4 10-K filings (AAPL, MSFT, GOOGL, AMZN, NVDA).
- Three-format comparison in
analysis.tl—TLWriternow emits@structschema definitions for all table types (api_response,analysis_result,comparison_result) and two new format comparison tables (format_accuracy,format_tokens) when--compare-formatsis used. String values in table rows are properly quoted for valid TeaLeaf parsing. - Reorganized task data into
synthetic-data/subdirectories to separate from real datasets. - Added task configuration files (
real.json,synthetic.json) for independent benchmark runs. - Added utility examples:
convert_formats.rs,tl_roundtrip.rs,toon_roundtrip.rs. convert_json_to_tl()now usesFormatOptions::compact().with_compact_floats()for maximum token savings.
Testing
- Added
test_format_float_compact_floats— verifies whole-number stripping, non-whole preservation, special value handling (NaN/inf), and scientific notation passthrough - Added
test_dumps_with_compact_floats— integration test forFormatOptionswith mixed int/float data - Added
test_colon_then_word— verifies lexer emitsColon + Wordinstead ofTagfor:Circlesyntax - Added
test_tagged_value_no_space_after_colon— verifieskey::tag valueparsing - Added
test_key_value_no_space_after_colon— verifieskey:valueparsing without spaces - Added 4 source generator tests for
ResolveStructName: nested type withStructNameoverride, list/array of nested type with override, and default snake_case fallback - Added 4 .NET tests for compact text and unified
ToTextAPI:ToTextCompact_RemovesInsignificantWhitespace,ToTextCompact_WithSchemas_IsSmallerThanPretty,ToTextCompact_RoundTrips,ToText_IgnoreSchemas_ExcludesSchemas - Updated 4 fuzz targets (
fuzz_serialize,fuzz_parse,fuzz_structured,fuzz_json_schemas) with compact andcompact_floatsroundtrip coverage andvalues_numeric_equalcomparator for Float↔Int/UInt coercion
Documentation
- Added Compact Floats: Intentional Lossy Optimization section to round-trip fidelity guide
- Updated CLI docs for
--compact-floatsflag ondecompileandfrom-json - Updated Rust overview with
FormatOptionssection andto_tl_with_optionsAPI - Updated FFI API reference with
tl_document_to_text_with_optionsandtl_document_to_text_data_only_with_options - Updated .NET overview with new
ToText/ToTextDataOnlyoverloads - Updated LLM context guide with
FormatOptionsexamples and compaction options table - Updated crates.io and NuGet README files
- Updated accuracy benchmark documentation with
analysis.tlstructure, three-format comparison tables, and latest benchmark results (~43% input token savings on real-world data) - Updated token savings claims across README.md, tealeaf-core/README.md, introduction.md, and CLAUDE.md to reflect latest benchmark data
v2.0.0-beta.8
.NET
- XML documentation in NuGet packages —
TeaLeafandTeaLeaf.Annotationspackages now include XML doc files (TeaLeaf.xml,TeaLeaf.Annotations.xml) for all target frameworks. Consumers get IntelliSense tooltips for all public APIs. Previously,GenerateDocumentationFilewas not enabled and the.xmlfiles were absent from the.nupkg. - Added XML doc comments to all undocumented public members:
TLTypeenum values (13),TLDocument.ToString/Dispose,TLReader.Dispose,TLField.ToString,TLSchema.ToString,TLExceptionconstructors (3) - Enabled
TreatWarningsAsErrorsforTeaLeafandTeaLeaf.Annotations— missing XML docs or other warnings are now compile errors, preventing regressions
Testing
- Added
ToJson_PreservesSpecialCharacters_NoUnicodeEscaping— verifies+,<,>,'survive binary round-trip without Unicode escaping in bothToJson()andToJsonCompact()paths - Added
ToJson_PreservesFloatDecimalPoint_WholeNumbers— verifies whole-number floats (99.0,150.0,0.0) retain.0suffix and non-whole floats (4.5,3.75) preserve decimal digits
v2.0.0-beta.7
.NET
- Fixed
TLReader.ToJson()escaping non-ASCII-safe characters —+in phone numbers rendered as\u002B,</>as\u003C/\u003E, etc.System.Text.Json’s defaultJavaScriptEncoder.DefaultHTML-encodes these characters for XSS safety, which is inappropriate for a data serialization library. All three JSON serialization methods (ToJson,ToJsonCompact,GetAsJson) now useJavaScriptEncoder.UnsafeRelaxedJsonEscapingvia sharedstatic readonlyoptions. - Fixed
TLReader.ToJson()dropping.0suffix from whole-number floats —3582.0in source JSON became3582after binary round-trip becauseSystem.Text.Json’sJsonValue.Create(double)strips trailing.0. AddedFloatToJsonNodehelper that usesF1formatting for whole-number doubles, preserving formatting fidelity with the Rust CLI path.
v2.0.0-beta.6
Features
- Recursive array schema inference in JSON import —
from_json_with_schemasnow discovers schemas for arrays nested inside objects at arbitrary depth (e.g.,items[].product.stock[]). Previously,analyze_nested_objectsonly recursed into nested objects but not nested arrays, causing deeply nested arrays to fall back to[]any. The CLI and derive-macro paths now produce equivalent schema coverage. - Deterministic schema declaration order —
analyze_arrayandanalyze_nested_objectsnow use single-pass field-order traversal (depth-first), matching the derive macro’s field-declaration-order strategy. Previously, both functions made two separate passes (arrays first, then objects), causing schema declarations to appear in a different order than the derive/Builder API path. CLI and Builder API now produce byte-identical.tloutput for the same data.
Bug Fixes
- Fixed binary encoding corruption for
[]anytyped arrays —encode_typed_valueincorrectly wroteTLType::Structas the element type for the “any” pseudo-type (theto_tl_type()default for unknown names), causing the reader to interpret heterogeneous data as struct schema indices. Arrays with mixed element types inside schema-typed objects (e.g.,order.customer,order.payment) now correctly use heterogeneous0xFFencoding when no matching schema exists.
Tooling
- Version sync scripts (
sync-version.ps1,sync-version.sh) now regenerate the workflow diagram (assets/tealeaf_workflow.png) viagenerate_workflow_diagram.pyon each version bump
Testing
- Added
json_any_array_binary_roundtrip— focused regression test verifying[]anyfields inside schema-typed structs survive binary compilation with full data integrity verification - Added
retail_orders_json_binary_roundtrip— end-to-end test exercising JSON → infer schemas → compile → binary read withretail_orders.json(the exact path that was untested) - Added .NET
FromJson_HeterogeneousArrayInStruct_BinaryRoundTrips— mirrors the Rust[]anyregression test through the FFI layer - Strengthened .NET
FromJson_RetailOrdersFixture_CompileRoundTrips— upgraded from string-contains check to structural JSON verification (10 orders, 4 products, 3 customers, spot-check order ID and item count) - Added
json_inference_nested_array_inside_object— verifies arrays nested inside objects (e.g.,items[].product.stock[]) get their own schema and typed array fields - Added
gen_retail_orders_api_tlderive integration test — generates.tlfrom Rust DTOs via Builder API and confirms byte-identical output with CLI path - Added
examples/retail_orders_different_shape_cli.tlandretail_orders_different_shape_api.tlcomparison fixtures (2,395 bytes each, zero diff) - Moved
retail_orders_different_shape.rsfromexamples/totealeaf-core/tests/fixtures/to keep test dependencies within the crate boundary - Verified all 7 fuzz targets pass (~566K total runs, zero crashes)
v2.0.0-beta.5
Features
- Schema-aware serialization for Builder API —
to_tl_with_schemas()now produces compact@tableoutput for documents built viaTeaLeafBuilderwith derive-macro schemas. Previously, PascalCase schema names from#[derive(ToTeaLeaf)](e.g.,SalesOrder) didn’t match the serializer’ssingularize()heuristic (e.g.,"orders"→"order"), causing all arrays to fall back to verbose[{k: v}]format. The serializer now resolves schemas via a 4-step chain: declared type from parent schema → singularize → case-insensitive singularize → structural field matching.
Bug Fixes
- Fixed schema inference name collision when a field singularizes to the same name as its parent array’s schema — prevented self-referencing schemas (e.g.,
@struct root (root: root)) and data loss during round-trip (found via fuzzing) - Fixed
@tableserializer applying wrong schema when the same field name appears at multiple nesting levels with different object shapes — serializer now validates schema fields match the actual object keys before using positional tuple encoding
Testing
- Added 8 Rust regression tests for schema name collisions:
fuzz_repro_dots_in_field_name,schema_name_collision_field_matches_parent,analyze_node_nesting_stress_test,schema_collision_recursive_arrays,schema_collision_recursive_same_shape,schema_collision_three_level_nesting,schema_collision_three_level_divergent_leaves,all_orders_cli_vs_api_roundtrip - Added derive integration test
test_builder_schema_aware_table_output— verifies Builder API with 5 nested PascalCase schemas produces@tableencoding and round-trips correctly - Verified all 7 fuzz targets pass (~445K total runs, zero crashes)
v2.0.0-beta.4
Bug Fixes
- Fixed binary encoding crash when compiling JSON with heterogeneous nested objects —
from_json_with_schemasinfersanypseudo-type for fields whose nested objects have varying shapes; the binary encoder now falls back to generic encoding instead of erroring with “schema-typed field ‘any’ requires a schema” - Fixed parser failing to resolve schema names that shadow built-in type keywords — schemas named
bool,int,string, etc. now correctly resolve via LParen lookahead disambiguation (struct tuples always start with(, primitives never do) - Fixed
singularize()producing empty string for single-character field names (e.g.,"s"→"") — caused@structdefinitions with missing names and unparseable TL text output - Fixed
validate_tokens.pytoken comparison by converting API input tointfor safety
.NET
- Added
TLValueExtensionswithGetRequired()extension methods forTLValueandTLDocument— provides non-nullable access patterns, reducing CS8602 warnings in consuming code - Added TL007 diagnostic:
[TeaLeaf]classes in the global namespace now produce a compile-time error (“TeaLeaf type must be in a named namespace”) - Removed
SuppressDependenciesWhenPackingproperty fromTeaLeaf.Generators.csproj - Exposed
InternalsVisibleToforTeaLeaf.Tests
CI/CD
- Re-enabled all 6 GitHub Actions workflows after making the repository public (rust-cli, dotnet-package, accuracy-benchmark, docs, coverage, fuzz)
- Fixed coverlet filter quoting in coverage workflow — commas URL-encoded as
%2cto prevent shell argument splitting - Fixed Codecov token handling — made
CODECOV_TOKENoptional for public repo tokenless uploads - Fixed Codecov multi-file upload format — changed from YAML block scalar to comma-separated single-line
- Refactored coverage workflow to use
dotnet-coveragewith dedicated settings XML files - Added CodeQL security analysis workflow
- Fixed accuracy-benchmark workflow permissions
Testing
- Added Rust regression test for
anypseudo-type compile round-trip - Added 21 Rust tests for schema names shadowing all built-in type keywords (
bool,int,int8..int64,uint..uint64,float,float32,float64,string,timestamp,bytes) — covers JSON inference round-trip, direct TL parsing, self-referencing schemas, duplicate declarations, and multiple built-in-named schemas in one document - Added 4 .NET regression tests covering
TLDocument.FromJson→Compilewith heterogeneous nested objects, mixed-structure arrays, complex schema inference, and retail_orders.json end-to-end - Added .NET tests for JSON serialization of timestamps and byte arrays
- Added .NET coverage tests for multi-word enums and nullable nested objects
- Added .NET source generator tests (524 new lines in
GeneratorTests.cs) including TL007 global namespace diagnostic - Added .NET
TLValue.GetRequired()extension method tests - Added .NET
TLReaderbinary reader tests (168 new lines) - Added cross-platform
FindRepoFilehelper for .NET test fixture discovery (walks up directory tree instead of hardcoded relative path depth) - Verified full .NET test suite on Linux (WSL Ubuntu 24.04)
Tooling
- Added
--version/-VCLI flag - Added
delete-caches.ps1anddelete-caches.shGitHub Actions cache cleanup scripts - Updated
coverage.ps1to supportdotnet-coveragecollection with XML settings files
Documentation
- Updated binary deserialization method names in quick-start, LLM context guide, schema evolution guide, and derive macros docs
- Updated tealeaf workflow diagram
v2.0.0-beta.3
Features
- Byte literals —
b"..."hex syntax for byte data in text format (e.g.,payload: b"cafef00d") - Arbitrary-precision numbers —
Value::JsonNumberpreserves exact decimal representation for numbers exceeding native type ranges - Insertion order preservation —
IndexMapreplacesHashMapfor all user-facing containers; JSON round-trips now preserve original key order (ADR-0001) - Timestamp timezone support — Timestamps encode timezone offset in minutes (10 bytes: 8 millis + 2 offset); supports
Z,+HH:MM,-HH:MM,+HHformats - Special float values —
NaN,inf,-infkeywords for IEEE 754 special values (JSON export converts tonull) - Extended escape sequences —
\b(backspace),\f(form feed),\uXXXX(Unicode code points) for full JSON string escape parity - Forward compatibility — Unknown directives silently ignored, enabling older implementations to partially parse files with newer features (spec §1.18)
Bug Fixes
- Fixed bounds check failures and bitmap overflow issues in binary decoder
- Fixed lexer infinite loop on certain malformed inputs (found via fuzzing)
- Fixed NaN value quoting causing incorrect round-trip behavior
- Fixed parser crashes on deeply nested structures
- Fixed integer overflow in varint decoding
- Fixed off-by-one errors in array length checks
- Fixed negative hex/binary literal parsing
- Fixed exponent-only numbers (e.g.,
1e3) to parse as floats, not integers - Fixed timestamp timezone parsing to accept hour-only offsets (
+05=+05:00) - Rejected value-only types (
object,map,tuple,ref,tagged) as schema field types per spec §2.1 - Fixed .NET package publishing for
TeaLeaf.AnnotationsandTeaLeaf.Generatorsto NuGet
Performance
- Removed O(n log n) key sorting from all serialization paths: 6-17% faster for small/medium objects, up to 69% faster for tabular data
- Binary decode 56-105% slower for generic object workloads due to
IndexMapinsertion cost (acceptable trade-off per ADR-0001; columnar workloads less affected)
Specification
- Schema table header byte +6 stores Union Count (was reserved)
- String table length encoding changed from
u16tou32for strings > 65KB - Added type code
0x12forJSONNUMBER - Timestamp encoding extended to 10 bytes (8 millis + 2 offset)
- Added
bytes_litgrammar production; extendednumberto includeNaN/inf/-inf - Documented
object,map,ref,taggedas value-only types (not valid in schema fields) - Resolved compression algorithm spec contradiction: binary format v2 uses ZLIB (deflate), not zstd (ADR-0004)
Tooling
- Fuzzing infrastructure — 7 cargo-fuzz targets with custom dictionaries and structure-aware generation (ADR-0002)
- Fuzzing CI workflow — GitHub Actions runs all targets for 120s each (~15 min per run)
- Nesting depth limit — 256-level max for stack overflow protection (ADR-0003)
- VS Code extension — Syntax highlighting for
.tlfiles (vscode-tealeaf/) - FFI safety — Comprehensive
# Safetydocs on all FFI functions; regeneratedtealeaf.h - Token validation —
validate_tokens.pyscript validates API-reported token counts against tiktoken - Maintenance scripts —
delete-deploymentsanddelete-workflow-runsfor GitHub cleanup
Testing
- 238+ adversarial tests for malformed binary input
- 333+ .NET edge case tests for FFI boundary conditions
- Property-based tests with depth-bounded recursive generation
- Accuracy benchmark token savings updated to ~36% fewer data tokens (validated with tiktoken)
Documentation
- ADR-0001: IndexMap for Insertion Order Preservation
- ADR-0002: Fuzzing Architecture and Strategy
- ADR-0003: Maximum Nesting Depth Limit (256)
- ADR-0004: ZLIB Compression for Binary Format
- Code of Conduct, SECURITY.md, GitHub issue/PR templates
examples/showcase.tl— 736-line comprehensive format demonstration- Sample accuracy benchmark results
Breaking Changes
Value::ObjectusesIndexMap<String, Value>instead ofHashMap(type aliasObjectMapprovided;From<HashMap>retained for backward compatibility)Value::Timestamp(i64)→Value::Timestamp(i64, i16)— second field is timezone offset in minutesValue::JsonNumber(String)variant added — match expressions onValueneed new arm- Binary timestamps not backward-compatible (beta.2 readers cannot decode beta.3 timestamps; beta.3 readers handle beta.2 files by defaulting offset to UTC)
- JSON round-trips preserve key order instead of alphabetizing
v2.0.0-beta.2
Format
@uniondefinitions now encoded in binary schema table (full text-binary-text roundtrip)- Union schema region uses backward-compatible extension of schema table header
- Derive macro
collect_unions()generates union definitions for Rust enums TeaLeafBuilder::add_union()for programmatic union construction
Improvements
- Version sync automation expanded to cover all project files (16 targets)
- NuGet package icon added to all NuGet packages (TeaLeaf, Annotations, Generators)
- CI badges added to README (Rust CI, .NET CI, crates.io, NuGet, codecov, License)
- crates.io publish ordering fixed (
tealeaf-derivebeforetealeaf-core) - Contributing guide added (
CONTRIBUTING.md) - Spec governance documentation added
- Accuracy benchmark
dump-promptssubcommand for offline prompt inspection TeaLeaf.Annotationspublished as separate NuGet package (fixes dependency resolution)benches_proto/excluded from crates.io package (removesprotocrequirement for consumers)
v2.0.0-beta.1
Initial public beta release.
Format
- Text format (
.tl) with comments, schemas, and all value types - Binary format (
.tlbx) with string deduplication, schema embedding, and per-section compression - 15 primitive types + 6 container/semantic types
- Inline schemas with
@struct,@table,@map,@union - References (
!name) and tagged values (:tag value) - File includes (
@include) - ISO 8601 timestamp support
- JSON bidirectional conversion with schema inference
CLI
- 8 commands:
compile,decompile,info,validate,to-json,from-json,tlbx-to-json,json-to-tlbx - Pre-built binaries for 7 platforms (Windows, Linux, macOS – x64 and ARM64)
Rust
tealeaf-corecrate with full parser, compiler, and readertealeaf-derivecrate with#[derive(ToTeaLeaf, FromTeaLeaf)]- Builder API (
TeaLeafBuilder) - Memory-mapped binary reading
- Conversion traits with automatic schema collection
.NET
TeaLeafNuGet package with native libraries for all platforms- C# incremental source generator (
[TeaLeaf]attribute) - Reflection-based serializer (
TeaLeafSerializer) - Managed wrappers (
TLDocument,TLValue,TLReader) - Schema introspection API
- Diagnostic codes TL001-TL006
FFI
- C-compatible API via
tealeaf-fficrate - 45+ exported functions
- Thread-safe error handling
- Null-safe for all pointer parameters
- C header generation via
cbindgen
Known Limitations
Bytes type does not round-trip through text format(resolved:b"..."hex literals added)- JSON import does not recognize
$ref,$tag, or timestamp strings - Individual string length limited to ~4 GB (u32) in binary format
- 64-byte header overhead makes TeaLeaf inefficient for very small objects