Performance optimizations to the NOMT storage engine while preserving
correctness (all triecompare cross-validation tests pass at 10K+ scale):
- Pool SHA256 hashers via sync.Pool in HashInternal and HashStem
- Replace allStems map with sorted slice + O(N+M) merge (in-place fast
path for incremental updates avoids allocation entirely)
- Add UpdateSorted to db.DB, skipping redundant sort of pre-sorted ops
- Simplify canonicalRoot to use pre-sorted allStems directly
- Optimize StemSharedBits with byte-level XOR + bits.LeadingZeros8
- Replace stemLess loops with bytes.Compare in all locations
- Eliminate per-stem map alloc in groupAndHashStems (use [256]bool dirty)
- Use stack-allocated [248]bool for downBits in BuildInternalTree
- Remove unused stemPathCmp function
BenchmarkHash/10000/nomt: 9.8ms → 8.2ms (-16%)
BenchmarkBlockWorkload/nomt: 7.7ms → 6.6ms (-14%)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove leaf compaction from PageWalker.compactStep, replace KeyValue
with StemKeyValue throughout the merkle engine and worker, and update
DB.Update to accept pre-hashed stem key-value pairs.
Key change: singleThreadedUpdate now uses the same depth-7 child-index
partitioning as the parallel path, ensuring identical intermediate
hashes without leaf compaction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Keccak256+MSB tagging with SHA256. Remove leaf nodes entirely,
replacing them with opaque stem hashes. Simplify NodeKind to just
Terminator and Internal. Add HashStem (8-level binary SHA256 tree
matching bintrie StemNode.Hash). Reduce max trie depth from 256 to 248
(31-byte stem path). Replace BuildTrie/LeafOp with
BuildInternalTree/StemKeyValue.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parallelize the PageWalker trie update across multiple goroutines by
partitioning sorted operations by the root page's 64 child subtrees
(first 6 bits of each key path).
Each worker runs an independent PageWalker constrained to child pages
below the root (using parentPage mechanism), producing ChildPageRoots.
After all workers complete, a root walker places the child roots using
AdvanceAndPlaceNode and concludes with the final trie root.
Workers operate on disjoint page subtrees so no synchronization is
needed during computation — only sync.WaitGroup for goroutine join.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement crash-safe persistence for the Bitbox hash table:
- wal.go: WAL format with START/CLEAR/UPDATE/END entries, builder/reader
- sync.go: 3-phase sync protocol (BeginSync → WriteWAL → CommitSync)
- recover.go: WAL replay for crash recovery
The WAL records page diffs (not full pages) for compact logging. The
3-phase protocol ensures: WAL fsynced before HT modification, HT fsynced
before WAL truncation, providing at-least-once delivery of page updates.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement the on-disk open-addressing hash table for storing trie pages:
- htfile.go: HT file layout with header, meta pages, and data pages
- metamap.go: in-memory meta byte map with dirty page tracking
- probe.go: triangular probing with xxhash64 page ID hashing
- db.go: Bitbox DB with StorePage, LoadPage, DeletePage, FlushMeta, Sync
The hash table uses 1-byte meta tags (top 7 bits of hash) for fast
filtering before reading full 4096-byte data pages. Triangular probing
with power-of-2 capacity guarantees all buckets are visited.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>