go-ethereum

mirror of https://github.com/ethereum/go-ethereum.git synced 2026-06-16 11:51:35 +00:00

Commit graph

Author	SHA1	Message	Date

CPerezz

45de1c3cc1

triedb/pathdb: fix mid-stem generator resume via mergeStemBlob RMW

Addresses review finding C1.

Before this commit, flushStem in generateBinTrieStems used
builder.encode() to overwrite the on-disk stem blob unconditionally.
When a crash+restart interrupted generation mid-stem (e.g., at offset 3
of stemA), the resume iterator positioned at stemA||3, the builder
accumulated only offsets 3+, and flushStem overwrote the disk blob with
a partial result — silently losing offsets 0, 1, 2 that were written in
the prior pass.

Fix: make flushStem a read-modify-write. It now reads the existing
on-disk stem blob (if any), converts the builder's accumulated offsets
to []stemOffsetValue via a new toOffsetValues() helper, and merges them
via the existing mergeStemBlob function. The merge semantics are
"builder values win" — new offsets overwrite their existing counterparts,
and gaps are filled from the prior blob. This makes the RMW idempotent
across resume cycles: the same stem can be re-walked from any midpoint
and the final disk blob always contains the union of all passes.
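
The "builder values win" merge described above can be sketched as follows. This is a minimal illustration only: the commit message does not show the real blob encoding or the signature of mergeStemBlob, so the map-based stemBlob type, the offsetValue type, and the mergeOffsets function are all hypothetical stand-ins.

```go
package main

import "fmt"

// offsetValue is a hypothetical stand-in for the commit's stemOffsetValue:
// one (offset, value) pair inside a stem.
type offsetValue struct {
	Offset byte
	Value  []byte
}

// stemBlob is a simplified in-memory model of an on-disk stem blob.
// The real on-disk encoding is not shown in the commit message.
type stemBlob map[byte][]byte

// mergeOffsets returns a new blob holding every entry of existing,
// overlaid with updates. On a conflicting offset the update wins.
func mergeOffsets(existing stemBlob, updates []offsetValue) stemBlob {
	merged := make(stemBlob, len(existing)+len(updates))
	for off, val := range existing {
		merged[off] = val
	}
	for _, u := range updates {
		merged[u.Offset] = u.Value
	}
	return merged
}

func main() {
	// Disk state left behind by the interrupted pass: offsets 0, 1, 2.
	onDisk := stemBlob{0: []byte("a"), 1: []byte("b"), 2: []byte("c")}
	// The resumed pass only walked offsets 3 and 4.
	resumed := []offsetValue{
		{Offset: 3, Value: []byte("d")},
		{Offset: 4, Value: []byte("e")},
	}

	merged := mergeOffsets(onDisk, resumed)
	fmt.Println(len(merged)) // 5: offsets 0..2 survive the resume

	// Replaying the same pass changes nothing, so the flush is idempotent.
	again := mergeOffsets(merged, resumed)
	fmt.Println(len(again)) // 5
}
```

An unconditional overwrite corresponds to dropping the first loop in mergeOffsets, which is exactly how offsets 0, 1, 2 would be lost.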

New helper: stemBuilder.toOffsetValues() converts the builder's
populated bitmap entries into a []stemOffsetValue slice suitable for
mergeStemBlob. ~20 LOC in stem_blob.go.
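
For illustration, a bitmap-to-slice conversion of this kind might look like the sketch below. The layout of the real stemBuilder is not given in the commit message; the sketchBuilder type, its 256-bit presence bitmap, and its per-offset value slots are assumptions made for this example.

```go
package main

import "fmt"

// offsetValue is a hypothetical stand-in for stemOffsetValue.
type offsetValue struct {
	Offset byte
	Value  []byte
}

// sketchBuilder is an assumed model of a per-stem builder: a 256-bit
// presence bitmap plus one value slot per offset.
type sketchBuilder struct {
	present [32]byte
	values  [256][]byte
}

// set records a value at offset and marks the offset as populated.
func (b *sketchBuilder) set(offset byte, value []byte) {
	b.present[offset/8] |= byte(1) << (offset % 8)
	b.values[offset] = value
}

// toOffsetValues returns the populated entries in ascending offset order.
func (b *sketchBuilder) toOffsetValues() []offsetValue {
	var out []offsetValue
	for i := 0; i < 256; i++ {
		if b.present[i/8]&(byte(1)<<uint(i%8)) != 0 {
			out = append(out, offsetValue{Offset: byte(i), Value: b.values[i]})
		}
	}
	return out
}

func main() {
	var b sketchBuilder
	b.set(200, []byte("z"))
	b.set(3, []byte("d"))
	for _, ov := range b.toOffsetValues() {
		fmt.Println(ov.Offset, string(ov.Value))
	}
}
```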

Tests:
  * TestBintrieGeneratorResumeMidStem — pre-seeds disk with a partial
    stem (offsets 0, 1), resumes generator at offset 1, asserts all
    offsets survive including the pre-seeded offset 0. Before the fix
    this test fails with "BasicData lost after mid-stem resume".
  * TestBintrieGeneratorResumeStemBoundary — renamed from the original
    TestBintrieGeneratorResume, unchanged behavior (stem-boundary
    resume).

2026-04-15 15:00:41 +02:00

CPerezz

0508d40aaf

triedb/pathdb: bintrie snapshot generator

Adds generateBinTrieStems, the bintrie analogue of generateAccounts. It
opens the bintrie via a sha256-aware bintrieDiskStore (the merkle disk
store would always fail root validation against a binary node), iterates
all leaves with binaryNodeIterator, aggregates them into per-stem
builders, and emits one stem blob per stem boundary.
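
The per-stem aggregation can be pictured with the sketch below. It assumes that a 32-byte key splits into a 31-byte stem followed by a 1-byte offset, and that leaves arrive in ascending key order; the leaf and stemGroup types and the groupByStem and key helpers are hypothetical and do not come from the commit.

```go
package main

import (
	"bytes"
	"fmt"
)

// stemLen is the assumed stem length; the final key byte is the offset.
const stemLen = 31

type leaf struct {
	Key   [32]byte
	Value []byte
}

// stemGroup collects the offsets seen for one stem.
type stemGroup struct {
	Stem    [stemLen]byte
	Offsets []byte
}

// groupByStem walks leaves in key order and starts a new group whenever
// the stem changes, that is, at every stem boundary.
func groupByStem(leaves []leaf) []stemGroup {
	var groups []stemGroup
	for _, l := range leaves {
		stem := l.Key[:stemLen]
		n := len(groups)
		if n == 0 || !bytes.Equal(groups[n-1].Stem[:], stem) {
			var g stemGroup
			copy(g.Stem[:], stem)
			groups = append(groups, g)
			n++
		}
		groups[n-1].Offsets = append(groups[n-1].Offsets, l.Key[stemLen])
	}
	return groups
}

// key builds a test key whose stem bytes all equal fill.
func key(fill, offset byte) [32]byte {
	var k [32]byte
	for i := 0; i < stemLen; i++ {
		k[i] = fill
	}
	k[stemLen] = offset
	return k
}

func main() {
	leaves := []leaf{{Key: key(1, 0)}, {Key: key(1, 3)}, {Key: key(2, 0)}}
	fmt.Println(len(groupByStem(leaves))) // 2
}
```

In the generator itself, each completed group would be flushed as one stem blob rather than collected in a slice.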

Resume support is structural: ctx.marker is fed straight to the trie's
NodeIterator, which uses binaryNodeIterator.seek (Commit 1) to position
on the first leaf >= marker. Range proofs are deliberately skipped — the
bintrie's Prove path is unimplemented and an iteration-only generation
cycle is acceptable for a one-time startup cost.
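
The "first leaf >= marker" positioning can be modeled over a sorted key list with a binary search. This is only an analogy for what a seek does: the real binaryNodeIterator walks trie nodes, and the seek function here is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// seek returns the index of the first key that is >= marker in
// lexicographic byte order, or len(keys) if there is none.
// keys must already be sorted ascending.
func seek(keys [][]byte, marker []byte) int {
	return sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], marker) >= 0
	})
}

func main() {
	keys := [][]byte{{0x01, 0x00}, {0x01, 0x03}, {0x02, 0x00}}
	fmt.Println(seek(keys, []byte{0x01, 0x01})) // 1
	fmt.Println(seek(keys, []byte{0x01, 0x03})) // 1
	fmt.Println(seek(keys, []byte{0x03, 0x00})) // 3
}
```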

A bintrieGeneratorContext mirrors generatorContext but is much smaller:
no holdable iterators (we walk the trie, not the existing flat state)
and no two-tier marker (the bintrie key space is unified). checkAndFlushBin
journals progress as a single 32-byte (stem || offset) key so resume
can pick up mid-stem.
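
A progress marker of this shape could be packed and unpacked as in the sketch below. It assumes the 32 bytes split into a 31-byte stem plus a 1-byte offset; encodeMarker and decodeMarker are hypothetical names, not functions from the commit.

```go
package main

import "fmt"

// stemLen is the assumed stem length within the 32-byte marker.
const stemLen = 31

// encodeMarker packs stem || offset into a single 32-byte key.
func encodeMarker(stem [stemLen]byte, offset byte) [32]byte {
	var m [32]byte
	copy(m[:stemLen], stem[:])
	m[stemLen] = offset
	return m
}

// decodeMarker splits a 32-byte marker back into stem and offset.
func decodeMarker(m [32]byte) (stem [stemLen]byte, offset byte) {
	copy(stem[:], m[:stemLen])
	return stem, m[stemLen]
}

func main() {
	var stem [stemLen]byte
	stem[0] = 0xaa
	m := encodeMarker(stem, 3)
	s, off := decodeMarker(m)
	fmt.Println(s == stem, off) // true 3
}
```

Because the offset occupies the least significant byte, markers sort in the same order as the leaf keys, so a journaled marker can be handed directly to a seek.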

generator.run dispatches on codec type so callers see a uniform
lifecycle whether the underlying scheme is merkle or bintrie.

2026-04-15 15:00:40 +02:00

2 commits