go-ethereum

forks/go-ethereum

Fork 1

mirror of https://github.com/ethereum/go-ethereum.git synced 2026-06-14 10:51:35 +00:00

Commit graph

Author	SHA1	Message	Date
CPerezz	9fc733c2e2	core/state,triedb/pathdb: add defensive length checks in encodeBinary and encodeStemBlob Addresses review findings I13 and S6. encodeBinary: reject non-nil bintrie leaves with length != 32 at the trust boundary between the hasher and the state update. Previously a wrong-length leaf silently made it into the diff layer's accountData and only surfaced as a panic deep in the Flush path (stemBuilder.set). encodeStemBlob: add an upper-bound check on the value count (must be <= 256, the maximum offsets per stem). Previously a buggy producer could pass an arbitrarily long values slice.	2026-04-15 15:00:41 +02:00
CPerezz	45de1c3cc1	triedb/pathdb: fix mid-stem generator resume via mergeStemBlob RMW Addresses review finding C1. Before this commit, flushStem in generateBinTrieStems used builder.encode() to overwrite the on-disk stem blob unconditionally. When a crash+restart interrupted generation mid-stem (e.g., at offset 3 of stemA), the resume iterator positioned at stemA\|\|3, the builder accumulated only offsets 3+, and flushStem overwrote the disk blob with a partial result — silently losing offsets 0, 1, 2 that were written in the prior pass. Fix: make flushStem a read-modify-write. It now reads the existing on-disk stem blob (if any), converts the builder's accumulated offsets to []stemOffsetValue via a new toOffsetValues() helper, and merges them via the existing mergeStemBlob function. The merge semantics are "builder values win" — new offsets overwrite their existing counterparts, and gaps are filled from the prior blob. This makes the RMW idempotent across resume cycles: the same stem can be re-walked from any midpoint and the final disk blob always contains the union of all passes. New helper: stemBuilder.toOffsetValues() converts the builder's populated bitmap entries into a []stemOffsetValue slice suitable for mergeStemBlob. ~20 LOC in stem_blob.go. Tests: * TestBintrieGeneratorResumeMidStem — pre-seeds disk with a partial stem (offsets 0, 1), resumes generator at offset 1, asserts all offsets survive including the pre-seeded offset 0. Before the fix this test fails with "BasicData lost after mid-stem resume". * TestBintrieGeneratorResumeStemBoundary — renamed from the original TestBintrieGeneratorResume, unchanged behavior (stem-boundary resume).	2026-04-15 15:00:41 +02:00
CPerezz	437a53bbe0	triedb/pathdb: implement bintrieFlatCodec + stem blob helpers Introduce the codec and on-disk blob format for the bintrie flat-state layer. This commit only defines the types; the codec is NOT wired into pathdb.Database.New yet (that happens in a later commit once the leaf-production hook in binaryHasher and the stateUpdate wiring are in place). Three pieces: 1. trie/bintrie/pack.go Canonical PackBasicData / UnpackBasicData helpers that encode an account's (codeSize, nonce, balance) into the 32-byte BasicData leaf defined by EIP-7864. Preserves the existing BinaryTrie.UpdateAccount layout byte-for-byte (4-byte code_size at offset 4 rather than the spec's 3-byte field at offset 5 — any realistic code size has byte 4 always zero and the two encodings are bit-equivalent in practice). BinaryTrie.UpdateAccount is refactored to delegate to PackBasicData so the flat-state codec can produce a bit-identical BasicData encoding without duplicating the layout logic. 2. triedb/pathdb/stem_blob.go Packed encoding of the populated (offset, value) pairs at a bintrie stem. A stem can hold up to 256 offsets per EIP-7864 but in practice only a handful are set; the layout is a 32-byte bitmap followed by N 32-byte values in ascending offset order, where N = popcount. Empty stems encode to nil so the caller knows to delete the on-disk key rather than write a zero-length value. Provides encodeStemBlob / decodeStemBlob / extractStemOffset / mergeStemBlob and a stemBuilder type for accumulating writes. The tombstone convention (32 zero bytes = "present with zero" as used by DeleteStorage) is preserved. 11 unit tests cover: empty blob, BasicData+CodeHash roundtrip, all 256 offsets populated, sparse high offsets, set/clear roundtrip, load-from-existing-blob RMW, merge helper, merge-to-empty, tombstone zero bytes, malformed input detection, bitmap rank sanity. 3. triedb/pathdb/flat_codec_bintrie.go bintrieFlatCodec implements flatStateCodec over the stem-blob layout. Unlike merkleFlatCodec it is stateful: it holds a ethdb.KeyValueReader reference used by applyWrites to read the existing stem blob before merging in new writes. ethdb.Batch is write-only so the batch passed to Write* cannot be used to fetch current state. Pre-aggregation requirement is documented explicitly: within a single flush, the caller must NOT issue two Write* calls targeting the same stem, because the RMW read comes from the store (not the in-flight batch). Commit 8 of the bintrie flat-state plan restructures writeStates to pre-aggregate per-stem writes so callers don't have to handle this manually. Cache keys are prefix-disambiguated with a one-byte 0x01 to keep bintrie stem lookups disjoint from merkle 32-byte account keys and 64-byte storage keys in the shared clean-state fastcache. SplitMarker is a single-tier (stem-only) format, not the merkle two-tier (account, account+storage) format. 7 unit tests cover: account roundtrip, storage roundtrip, multiple writes to the same stem, DeleteAccount preserving unrelated offsets, DeleteStorage removing the final offset collapsing the key, cache key disjointness from merkle, SplitMarker semantics. The codec is not dispatched by anything yet; MPT continues through the merkle codec and bintrie mode still runs on the (soon-to-be-replaced) keccak-shaped path until Commit 10 wires things up.	2026-04-15 15:00:40 +02:00

Author

SHA1

Message

Date

CPerezz

9fc733c2e2

core/state,triedb/pathdb: add defensive length checks in encodeBinary and encodeStemBlob

Addresses review findings I13 and S6.

encodeBinary: reject non-nil bintrie leaves with length != 32 at the
trust boundary between the hasher and the state update. Previously a
wrong-length leaf silently made it into the diff layer's accountData
and only surfaced as a panic deep in the Flush path (stemBuilder.set).

encodeStemBlob: add an upper-bound check on the value count (must be
<= 256, the maximum offsets per stem). Previously a buggy producer
could pass an arbitrarily long values slice.

2026-04-15 15:00:41 +02:00

CPerezz

45de1c3cc1

triedb/pathdb: fix mid-stem generator resume via mergeStemBlob RMW

Addresses review finding C1.

Before this commit, flushStem in generateBinTrieStems used
builder.encode() to overwrite the on-disk stem blob unconditionally.
When a crash+restart interrupted generation mid-stem (e.g., at offset 3
of stemA), the resume iterator positioned at stemA||3, the builder
accumulated only offsets 3+, and flushStem overwrote the disk blob with
a partial result — silently losing offsets 0, 1, 2 that were written in
the prior pass.

Fix: make flushStem a read-modify-write. It now reads the existing
on-disk stem blob (if any), converts the builder's accumulated offsets
to []stemOffsetValue via a new toOffsetValues() helper, and merges them
via the existing mergeStemBlob function. The merge semantics are
"builder values win" — new offsets overwrite their existing counterparts,
and gaps are filled from the prior blob. This makes the RMW idempotent
across resume cycles: the same stem can be re-walked from any midpoint
and the final disk blob always contains the union of all passes.

New helper: stemBuilder.toOffsetValues() converts the builder's
populated bitmap entries into a []stemOffsetValue slice suitable for
mergeStemBlob. ~20 LOC in stem_blob.go.

Tests:
  * TestBintrieGeneratorResumeMidStem — pre-seeds disk with a partial
    stem (offsets 0, 1), resumes generator at offset 1, asserts all
    offsets survive including the pre-seeded offset 0. Before the fix
    this test fails with "BasicData lost after mid-stem resume".
  * TestBintrieGeneratorResumeStemBoundary — renamed from the original
    TestBintrieGeneratorResume, unchanged behavior (stem-boundary
    resume).

2026-04-15 15:00:41 +02:00

CPerezz

437a53bbe0

triedb/pathdb: implement bintrieFlatCodec + stem blob helpers

Introduce the codec and on-disk blob format for the bintrie flat-state
layer. This commit only defines the types; the codec is NOT wired into
pathdb.Database.New yet (that happens in a later commit once the
leaf-production hook in binaryHasher and the stateUpdate wiring are in
place).

Three pieces:

1. trie/bintrie/pack.go

   Canonical PackBasicData / UnpackBasicData helpers that encode an
   account's (codeSize, nonce, balance) into the 32-byte BasicData leaf
   defined by EIP-7864. Preserves the existing BinaryTrie.UpdateAccount
   layout byte-for-byte (4-byte code_size at offset 4 rather than the
   spec's 3-byte field at offset 5 — any realistic code size has byte 4
   always zero and the two encodings are bit-equivalent in practice).

   BinaryTrie.UpdateAccount is refactored to delegate to PackBasicData
   so the flat-state codec can produce a bit-identical BasicData
   encoding without duplicating the layout logic.

2. triedb/pathdb/stem_blob.go

   Packed encoding of the populated (offset, value) pairs at a bintrie
   stem. A stem can hold up to 256 offsets per EIP-7864 but in practice
   only a handful are set; the layout is a 32-byte bitmap followed by
   N 32-byte values in ascending offset order, where N = popcount.
   Empty stems encode to nil so the caller knows to delete the on-disk
   key rather than write a zero-length value.

   Provides encodeStemBlob / decodeStemBlob / extractStemOffset /
   mergeStemBlob and a stemBuilder type for accumulating writes. The
   tombstone convention (32 zero bytes = "present with zero" as used
   by DeleteStorage) is preserved.

   11 unit tests cover: empty blob, BasicData+CodeHash roundtrip, all
   256 offsets populated, sparse high offsets, set/clear roundtrip,
   load-from-existing-blob RMW, merge helper, merge-to-empty, tombstone
   zero bytes, malformed input detection, bitmap rank sanity.

3. triedb/pathdb/flat_codec_bintrie.go

   bintrieFlatCodec implements flatStateCodec over the stem-blob layout.
   Unlike merkleFlatCodec it is stateful: it holds a ethdb.KeyValueReader
   reference used by applyWrites to read the existing stem blob before
   merging in new writes. ethdb.Batch is write-only so the batch passed
   to Write* cannot be used to fetch current state.

   Pre-aggregation requirement is documented explicitly: within a single
   flush, the caller must NOT issue two Write* calls targeting the same
   stem, because the RMW read comes from the store (not the in-flight
   batch). Commit 8 of the bintrie flat-state plan restructures
   writeStates to pre-aggregate per-stem writes so callers don't have
   to handle this manually.

   Cache keys are prefix-disambiguated with a one-byte 0x01 to keep
   bintrie stem lookups disjoint from merkle 32-byte account keys and
   64-byte storage keys in the shared clean-state fastcache.

   SplitMarker is a single-tier (stem-only) format, not the merkle
   two-tier (account, account+storage) format.

   7 unit tests cover: account roundtrip, storage roundtrip, multiple
   writes to the same stem, DeleteAccount preserving unrelated offsets,
   DeleteStorage removing the final offset collapsing the key, cache
   key disjointness from merkle, SplitMarker semantics.

The codec is not dispatched by anything yet; MPT continues through the
merkle codec and bintrie mode still runs on the (soon-to-be-replaced)
keccak-shaped path until Commit 10 wires things up.

2026-04-15 15:00:40 +02:00

3 commits