# ADR-0001: Use IndexMap for Insertion Order Preservation
- Status: Accepted
- Date: 2026-02-05
- Applies to: tealeaf-core, tealeaf-derive, tealeaf-ffi
## Context
TeaLeaf’s primary use case is context engineering for LLM applications, where structured data passes through multiple format conversions (JSON → .tl → .tlbx and back). Users intentionally order their JSON keys to convey semantic meaning — for example, placing name before description before details to mirror how a human would read the document. Prior to this change, all user-facing maps used HashMap<K, V>, and the text serializer and binary writer explicitly sorted keys alphabetically before output.
This caused two problems:

- **Semantic ordering was lost.** A user who wrote `{"zebra": 1, "apple": 2}` in their JSON would get `{"apple": 2, "zebra": 1}` after a round-trip through TeaLeaf (see the sketch after this list). For LLM prompt engineering, this reordering could change how models interpret the context.
- **Sorting was unnecessary work.** Every serialization path (`dumps()`, `compile()`, `write_value()`, `to_tl_with_schemas()`) collected keys into a `Vec`, sorted them, and then iterated, adding O(n log n) overhead to every output operation.
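The behavior in question is easy to reproduce outside TeaLeaf. A minimal standalone sketch (plain `HashMap`/`IndexMap`, not TeaLeaf types) of why the old approach had to sort, and what that sort did to user ordering:

```rust
use std::collections::HashMap;
use indexmap::IndexMap;

fn main() {
    // HashMap iteration order is arbitrary, so the old serializer sorted keys
    // for determinism -- which is exactly what moved "apple" before "zebra".
    let mut hashed: HashMap<&str, i64> = HashMap::new();
    hashed.insert("zebra", 1);
    hashed.insert("apple", 2);
    let mut keys: Vec<&&str> = hashed.keys().collect();
    keys.sort(); // old behaviour: deterministic, but alphabetical
    assert_eq!(keys, vec![&"apple", &"zebra"]);

    // IndexMap iterates in insertion order, so no sort is needed and the
    // user's ordering survives.
    let mut ordered: IndexMap<&str, i64> = IndexMap::new();
    ordered.insert("zebra", 1);
    ordered.insert("apple", 2);
    let keys: Vec<&&str> = ordered.keys().collect();
    assert_eq!(keys, vec![&"zebra", &"apple"]);
}
```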
## Alternatives Considered
| Approach | Pros | Cons |
|---|---|---|
| Keep `HashMap` + sort (status quo) | Deterministic output, no dependency change | Loses user intent, sorting overhead |
| `Vec` of (key, value) pairs | Order preserved, no new dependency | Loses O(1) key lookup, breaks API surface broadly |
| `IndexMap` | Order preserved, O(1) lookup, drop-in API | Slightly slower decode (insertion cost), new dependency |
| `BTreeMap` | Sorted + deterministic | Still not insertion-ordered, lookup O(log n) |
## Decision
Replace `HashMap` with `IndexMap` (from the `indexmap` crate v2) in all user-facing ordered containers:

- `Value::Object` → `ObjectMap<String, Value>` (type alias for `IndexMap`)
- `TeaLeaf.data`, `TeaLeaf.schemas`, `TeaLeaf.unions` → `IndexMap<String, _>`
- `Parser` output, `Reader.sections`, trait return types → `IndexMap`
Internal lookup tables stay as `HashMap` because they don’t need ordering (see the type sketch after this list):

- `Writer.string_map`, `Writer.schema_map`, `Writer.union_map`
- `Reader.schema_map`, `Reader.union_map`, `Reader.cache`
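A rough sketch of the resulting type layout. Only the `ObjectMap` alias, the insertion-ordered user-facing fields, and the `HashMap`-backed internal tables come from the decision above; the `Value` variants, the schema/union value types, and the `u32` index values are placeholders:

```rust
use std::collections::HashMap;
use indexmap::IndexMap;

// Type alias named in the decision; the exact definition here is illustrative.
pub type ObjectMap<K, V> = IndexMap<K, V>;

// Sketch of a Value enum whose Object variant uses the ordered map.
pub enum Value {
    Null,
    Number(f64),
    Text(String),
    Array(Vec<Value>),
    Object(ObjectMap<String, Value>), // user-facing: insertion-ordered
}

// User-facing container: ordering matters, so IndexMap throughout.
pub struct TeaLeaf {
    pub data: IndexMap<String, Value>,
    pub schemas: IndexMap<String, Value>, // schema/union value types simplified
    pub unions: IndexMap<String, Value>,
}

// Internal lookup tables: only key-to-index lookups, so HashMap is fine.
struct Writer {
    string_map: HashMap<String, u32>,
    schema_map: HashMap<String, u32>,
    union_map: HashMap<String, u32>,
}
```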
Additionally:
- Enable `serde_json`’s `preserve_order` feature so JSON parsing also preserves key order
- Remove all explicit `keys.sort()` calls from serialization paths (see the before/after sketch below)
- Re-export `IndexMap` and `ObjectMap` from `tealeaf-core` so derive macros and downstream crates don’t need a direct `indexmap` dependency
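A before/after sketch of the serialization-loop change. The function names and the key/value output format are invented for illustration; they are not TeaLeaf's actual `dumps()`/`write_value()` code:

```rust
use indexmap::IndexMap;

fn write_object_sorted(obj: &IndexMap<String, i64>, out: &mut String) {
    // Old pattern: collect keys, sort them, then iterate -- O(n log n) per
    // object and alphabetical output regardless of the user's ordering.
    let mut keys: Vec<&String> = obj.keys().collect();
    keys.sort();
    for key in keys {
        out.push_str(&format!("{} = {}\n", key, obj[key]));
    }
}

fn write_object_in_order(obj: &IndexMap<String, i64>, out: &mut String) {
    // New pattern: iterate the map directly; IndexMap yields entries in
    // insertion order, so there is no intermediate Vec and no sort.
    for (key, value) in obj {
        out.push_str(&format!("{} = {}\n", key, value));
    }
}
```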
## Consequences
### Positive
- **Round-trip fidelity.** JSON → TeaLeaf → JSON now preserves the original key order at every level (sections, object fields, schema definitions).
- **Encoding is faster.** Removing O(n log n) sort calls from every serialization path yields measurable improvements in encode benchmarks (6–17% for small/medium objects).
- **Simpler serialization code.** Serialization loops iterate the map directly instead of collecting, sorting, and iterating.
- **Binary format is unchanged.** Old `.tlbx` files remain fully readable. The reader always produces keys in file order, which for old files happens to be alphabetical.
### Negative
- **Binary decode is slower.** `IndexMap::insert()` is slower than `HashMap::insert()` because it maintains a dense insertion-order array alongside the hash table (a micro-benchmark sketch follows this list). Benchmarks show a +56% to +105% regression for decode-heavy workloads (large arrays of objects, deeply nested structs). For the primary use case (LLM context), this is acceptable because:
  - Documents are typically encoded once and consumed as text (not repeatedly decoded from binary)
  - The absolute times remain in the microsecond-to-millisecond range
  - Encode performance (the more common hot path) improved
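To make the per-insert overhead concrete, here is a small standalone micro-benchmark of the row-building pattern described above. It is not part of the TeaLeaf benchmark suite; the field count, row count, and crude `Instant` timing are arbitrary choices, so only the relative gap between the two loops is meaningful:

```rust
use std::collections::HashMap;
use std::time::Instant;
use indexmap::IndexMap;

fn main() {
    const FIELDS: usize = 8;
    const ROWS: usize = 100_000;
    let keys: Vec<String> = (0..FIELDS).map(|i| format!("field_{i}")).collect();

    // Simulates generic object decoding: each "row" builds a fresh map with
    // one insert per field, which is where the decode regression concentrates.
    let start = Instant::now();
    for _ in 0..ROWS {
        let mut row: HashMap<&str, u64> = HashMap::with_capacity(FIELDS);
        for (i, k) in keys.iter().enumerate() {
            row.insert(k, i as u64);
        }
        std::hint::black_box(&row);
    }
    println!("HashMap:  {:?}", start.elapsed());

    let start = Instant::now();
    for _ in 0..ROWS {
        let mut row: IndexMap<&str, u64> = IndexMap::with_capacity(FIELDS);
        for (i, k) in keys.iter().enumerate() {
            row.insert(k, i as u64); // also appends to the insertion-order array
        }
        std::hint::black_box(&row);
    }
    println!("IndexMap: {:?}", start.elapsed());
}
```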
- **New dependency.** `indexmap` v2 is a well-maintained, widely-used crate (used by `serde_json` internally), so supply-chain risk is minimal.
- **Public API change.** `TeaLeaf::new()` now takes `IndexMap` instead of `HashMap`. This is a breaking change, mitigated by:
  - The project is in beta (`2.0.0-beta.2`)
  - The `From<HashMap<String, Value>> for Value` conversion is retained for backward compatibility (see the usage sketch below)
  - Downstream code using `.get()`, `.insert()`, `.iter()` works identically
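A usage sketch of the retained conversion. The `From<HashMap<String, Value>> for Value` impl and the `Value::Object` variant are stated above; the `tealeaf_core::Value` import path is an assumption made for illustration:

```rust
use std::collections::HashMap;
use tealeaf_core::Value; // assumed import path

fn main() {
    // Inner value built from a HashMap via the retained From impl.
    let inner: Value = HashMap::<String, Value>::new().into();

    // Existing code that builds HashMaps keeps compiling; the resulting
    // Value::Object simply starts in whatever order the HashMap yields.
    let mut legacy: HashMap<String, Value> = HashMap::new();
    legacy.insert("details".to_string(), inner);

    let value: Value = legacy.into(); // From<HashMap<String, Value>> for Value
    if let Value::Object(obj) = &value {
        // obj is an ObjectMap (IndexMap), so .get(), .insert(), .iter()
        // behave just as they did against the old HashMap-backed type.
        assert!(obj.get("details").is_some());
    }
}
```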
## Benchmark Summary
| Workload | Encode | Decode |
|---|---|---|
| small_object | -16% (faster) | — |
| nested_structs | -10% to -17% (faster) | +56% to +68% (slower) |
| large_array_10000 | -5% (faster) | +105% (slower) |
| tabular_5000 | -69% (faster) | -48% (faster) |
Note: Tabular workloads use struct-array encoding (columnar), which has fewer per-row `IndexMap` insertions. The decode regression is concentrated in generic object decoding, where each row creates a new `ObjectMap` with field-by-field inserts.
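The difference in insert counts behind that split can be sketched generically. This is not TeaLeaf's decoder, just an illustration of row-wise versus columnar map construction:

```rust
use indexmap::IndexMap;

// Generic row-wise decoding: rows * fields.len() IndexMap inserts,
// which is the decode-regression path.
fn generic_rows(rows: usize, fields: &[&str]) -> Vec<IndexMap<String, u64>> {
    (0..rows as u64)
        .map(|r| fields.iter().map(|f| (f.to_string(), r)).collect())
        .collect()
}

// Columnar (struct-array style) data: only fields.len() IndexMap inserts,
// regardless of how many rows there are.
fn columnar(rows: usize, fields: &[&str]) -> IndexMap<String, Vec<u64>> {
    fields
        .iter()
        .map(|f| (f.to_string(), (0..rows as u64).collect()))
        .collect()
}
```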
## References
- `indexmap` crate documentation
- `serde_json` `preserve_order` feature
- Implementation PR: HashMap → IndexMap migration across 16+ files