Drains the binaryHasher's LeafProducer side-channel in StateDB.commit and
threads the stem writes through stateUpdate.encodeBinary into the pathdb
state set as per-offset accountData entries (key = stem||offset, value =
32-byte leaf or nil for clears).
The flat-state codec gains a Flush method that owns the in-memory→disk
write path, replacing the codec-agnostic per-entry loop in writeStates.
The merkle codec preserves its historical per-entry behavior verbatim;
the bintrie codec aggregates per-offset writes by stem so each stem hits
disk via a single read-modify-write, satisfying the codec's pre-aggregation
requirement and updating the clean cache with the merged blob it just
produced (no extra disk read).
stateUpdate.encodeBinary returns empty origin maps for the bintrie path:
state-history rollback for bintrie is deferred to a follow-up PR (see
BINTRIE_FLAT_STATE_REORG_GAP.md), and the diskLayer.revert path will
panic before consuming origins anyway.
Introduce the codec and on-disk blob format for the bintrie flat-state
layer. This commit only defines the types; the codec is NOT wired into
pathdb.Database.New yet (that happens in a later commit once the
leaf-production hook in binaryHasher and the stateUpdate wiring are in
place).
Three pieces:
1. trie/bintrie/pack.go
Canonical PackBasicData / UnpackBasicData helpers that encode an
account's (codeSize, nonce, balance) into the 32-byte BasicData leaf
defined by EIP-7864. Preserves the existing BinaryTrie.UpdateAccount
layout byte-for-byte (4-byte code_size at offset 4 rather than the
spec's 3-byte field at offset 5 — any realistic code size has byte 4
always zero and the two encodings are bit-equivalent in practice).
BinaryTrie.UpdateAccount is refactored to delegate to PackBasicData
so the flat-state codec can produce a bit-identical BasicData
encoding without duplicating the layout logic.
2. triedb/pathdb/stem_blob.go
Packed encoding of the populated (offset, value) pairs at a bintrie
stem. A stem can hold up to 256 offsets per EIP-7864 but in practice
only a handful are set; the layout is a 32-byte bitmap followed by
N 32-byte values in ascending offset order, where N = popcount.
Empty stems encode to nil so the caller knows to delete the on-disk
key rather than write a zero-length value.
Provides encodeStemBlob / decodeStemBlob / extractStemOffset /
mergeStemBlob and a stemBuilder type for accumulating writes. The
tombstone convention (32 zero bytes = "present with zero" as used
by DeleteStorage) is preserved.
11 unit tests cover: empty blob, BasicData+CodeHash roundtrip, all
256 offsets populated, sparse high offsets, set/clear roundtrip,
load-from-existing-blob RMW, merge helper, merge-to-empty, tombstone
zero bytes, malformed input detection, bitmap rank sanity.
3. triedb/pathdb/flat_codec_bintrie.go
bintrieFlatCodec implements flatStateCodec over the stem-blob layout.
Unlike merkleFlatCodec it is stateful: it holds a ethdb.KeyValueReader
reference used by applyWrites to read the existing stem blob before
merging in new writes. ethdb.Batch is write-only so the batch passed
to Write* cannot be used to fetch current state.
Pre-aggregation requirement is documented explicitly: within a single
flush, the caller must NOT issue two Write* calls targeting the same
stem, because the RMW read comes from the store (not the in-flight
batch). Commit 8 of the bintrie flat-state plan restructures
writeStates to pre-aggregate per-stem writes so callers don't have
to handle this manually.
Cache keys are prefix-disambiguated with a one-byte 0x01 to keep
bintrie stem lookups disjoint from merkle 32-byte account keys and
64-byte storage keys in the shared clean-state fastcache.
SplitMarker is a single-tier (stem-only) format, not the merkle
two-tier (account, account+storage) format.
7 unit tests cover: account roundtrip, storage roundtrip, multiple
writes to the same stem, DeleteAccount preserving unrelated offsets,
DeleteStorage removing the final offset collapsing the key, cache
key disjointness from merkle, SplitMarker semantics.
The codec is not dispatched by anything yet; MPT continues through the
merkle codec and bintrie mode still runs on the (soon-to-be-replaced)
keccak-shaped path until Commit 10 wires things up.