go-ethereum

forks/go-ethereum

Fork 1

mirror of https://github.com/ethereum/go-ethereum.git synced 2026-06-14 10:51:35 +00:00

Commit graph

Author	SHA1	Message	Date
CPerezz	fcc0587ec3	core/state,triedb/pathdb: fix bintrieFlatReader disk-layer shape via per-offset extraction Addresses review finding C2 (+ I5, S5, T2, T3, T12). Before this commit, bintrieFlatCodec.ReadAccount returned the FULL variable-length stem blob from disk while the in-memory diff-layer buffer stored per-offset 32-byte values. The consumer, bintrieFlatReader.Account, enforced len(basicBlob)!=32 → error, so every disk-layer hit produced "bintrie BasicData leaf invalid length" in production the moment the write buffer flushed. TestBintrieFlatReaderEndToEnd did not catch this because it never forced a buffer → disk flush. Fix: make bintrieFlatCodec.ReadAccount extract the offset from the stem blob (mirroring ReadStorage), so the disk path and the buffer path return the same 32-byte per-offset shape. Update AccountCacheKey/StorageCacheKey to embed the full 32-byte key (prefix + 31-byte stem + 1-byte offset), since caching under a stem-only key would collapse BasicData and CodeHash into the same slot and return the wrong value on the second hit. Update Flush's cache-update loop to store per-offset entries from the aggregated write set. Design note: I considered the alternative of introducing a new StemBlob(stem) interface method that returns the full blob synthesized from a stem-level lookup index. Rejected because (a) the index is a new data structure with its own consistency invariants, (b) the per-offset approach is strictly local to the codec + reader, and (c) the "1 Pebble read per Account" locality benefit is preserved at the OS page cache level — both offsets at the same stem live in the same Pebble block, so the second read is effectively free. bintrieFlatReader.Account still does two AccountRLP lookups; the torn-read hazard is gated by a new load-bearing invariant test, TestBinaryHasherWritesBothBasicAndCodeHash, which asserts that binaryHasher.updateAccount always emits both BasicData and CodeHash leaves together. A future code-only update that broke this invariant would fail the test. Tests added: * TestBintrieFlatReaderEndToEndAfterFlush — explicitly flushes via tdb.Commit(root, false) and re-reads through a fresh StateReader. This is the smoking-gun regression for C2. * TestBintrieFlatReaderMultipleOffsetsPerStem — multiple offsets at the same stem (BasicData, CodeHash, header storage slots) all round-trip post-flush. * TestBintrieCodecCrossFlushRMW — two Flush calls to the same stem from different "blocks" correctly merge on disk, with prior offsets preserved. * TestBinaryHasherWritesBothBasicAndCodeHash — locks down the hasher co-write invariant that bintrieFlatReader.Account relies on. Existing tests updated to match the new per-offset ReadAccount semantics: * TestBintrieCodecAccountRoundTrip, TestBintrieCodecMultipleWritesSameStem, TestBintrieCodecDeleteAccount — now read per-offset rather than calling extractStemOffset on the raw blob. * TestBintrieCodecCacheKeysDisjoint — additionally verifies two offsets at the same stem produce distinct cache keys. Error messages in bintrieFlatReader now include address and length context (S5).	2026-04-15 15:00:40 +02:00
CPerezz	bfb77d98f6	core/state,triedb/pathdb: enable bintrie flat state reads end-to-end Wires the pieces from Commits 1-9 into a running system: * triedb/pathdb.New: install the bintrieFlatCodec when isVerkle is set, backed by the same verkle-namespaced db used for trie nodes. * triedb/pathdb.database.go: drop isVerkle from the noBuild guard so the bintrie generator (Commit 9) runs on startup, and remove it from the generateSnapshot call path for the same reason. * triedb/pathdb.disklayer.revert: hard-fail on bintrie because the reorg path would replay merkle-shaped origin records against a per-stem layout. Tracked in BINTRIE_FLAT_STATE_REORG_GAP.md. * triedb/pathdb.journal: add IsBintrie to journalGenerator (rlp:"optional" so v3 journals still decode) and make journalProgress a method on generator so it stamps the active scheme; loadGenerator discards any journal whose scheme does not match the database, forcing a fresh regeneration. * triedb/pathdb.reader: export RawStateReader, a small extension of database.StateReader that exposes AccountRLP so callers outside the package can reach the raw flat-state bytes without going through the slim-RLP decode path that assumes merkle shape. * core/state.reader: add bintrieFlatReader, the bintrie equivalent of flatReader. It derives the EIP-7864 stem keys from (addr, slot), performs two AccountRLP lookups per Account call (BasicData + CodeHash), and decodes via bintrie.UnpackBasicData. Storage reads go through a single AccountRLP lookup at the slot's full bintrie key. * core/state.database.StateReader: dispatch to bintrieFlatReader when the path database is in verkle mode; merkle path unchanged. Depends on the lookup sentinel fix in the previous commit; without it missing-account reads on bintrie misreport as "layer stale".	2026-04-15 15:00:40 +02:00

Author

SHA1

Message

Date

CPerezz

fcc0587ec3

core/state,triedb/pathdb: fix bintrieFlatReader disk-layer shape via per-offset extraction

Addresses review finding C2 (+ I5, S5, T2, T3, T12).

Before this commit, bintrieFlatCodec.ReadAccount returned the FULL
variable-length stem blob from disk while the in-memory diff-layer
buffer stored per-offset 32-byte values. The consumer,
bintrieFlatReader.Account, enforced len(basicBlob)!=32 → error, so
every disk-layer hit produced "bintrie BasicData leaf invalid length"
in production the moment the write buffer flushed. TestBintrieFlatReaderEndToEnd
did not catch this because it never forced a buffer → disk flush.

Fix: make bintrieFlatCodec.ReadAccount extract the offset from the
stem blob (mirroring ReadStorage), so the disk path and the buffer
path return the same 32-byte per-offset shape. Update
AccountCacheKey/StorageCacheKey to embed the full 32-byte key
(prefix + 31-byte stem + 1-byte offset), since caching under a
stem-only key would collapse BasicData and CodeHash into the same
slot and return the wrong value on the second hit. Update
Flush's cache-update loop to store per-offset entries from the
aggregated write set.

Design note: I considered the alternative of introducing a new
StemBlob(stem) interface method that returns the full blob synthesized
from a stem-level lookup index. Rejected because (a) the index is a
new data structure with its own consistency invariants, (b) the
per-offset approach is strictly local to the codec + reader, and (c)
the "1 Pebble read per Account" locality benefit is preserved at the
OS page cache level — both offsets at the same stem live in the same
Pebble block, so the second read is effectively free.

bintrieFlatReader.Account still does two AccountRLP lookups; the
torn-read hazard is gated by a new load-bearing invariant test,
TestBinaryHasherWritesBothBasicAndCodeHash, which asserts that
binaryHasher.updateAccount always emits both BasicData and CodeHash
leaves together. A future code-only update that broke this invariant
would fail the test.

Tests added:
  * TestBintrieFlatReaderEndToEndAfterFlush — explicitly flushes via
    tdb.Commit(root, false) and re-reads through a fresh StateReader.
    This is the smoking-gun regression for C2.
  * TestBintrieFlatReaderMultipleOffsetsPerStem — multiple offsets at
    the same stem (BasicData, CodeHash, header storage slots) all
    round-trip post-flush.
  * TestBintrieCodecCrossFlushRMW — two Flush calls to the same stem
    from different "blocks" correctly merge on disk, with prior
    offsets preserved.
  * TestBinaryHasherWritesBothBasicAndCodeHash — locks down the hasher
    co-write invariant that bintrieFlatReader.Account relies on.

Existing tests updated to match the new per-offset ReadAccount
semantics:
  * TestBintrieCodecAccountRoundTrip, TestBintrieCodecMultipleWritesSameStem,
    TestBintrieCodecDeleteAccount — now read per-offset rather than
    calling extractStemOffset on the raw blob.
  * TestBintrieCodecCacheKeysDisjoint — additionally verifies two
    offsets at the same stem produce distinct cache keys.

Error messages in bintrieFlatReader now include address and length
context (S5).

2026-04-15 15:00:40 +02:00

CPerezz

bfb77d98f6

core/state,triedb/pathdb: enable bintrie flat state reads end-to-end

Wires the pieces from Commits 1-9 into a running system:

* triedb/pathdb.New: install the bintrieFlatCodec when isVerkle is set,
  backed by the same verkle-namespaced db used for trie nodes.
* triedb/pathdb.database.go: drop isVerkle from the noBuild guard so the
  bintrie generator (Commit 9) runs on startup, and remove it from the
  generateSnapshot call path for the same reason.
* triedb/pathdb.disklayer.revert: hard-fail on bintrie because the
  reorg path would replay merkle-shaped origin records against a
  per-stem layout. Tracked in BINTRIE_FLAT_STATE_REORG_GAP.md.
* triedb/pathdb.journal: add IsBintrie to journalGenerator (rlp:"optional"
  so v3 journals still decode) and make journalProgress a method on
  generator so it stamps the active scheme; loadGenerator discards any
  journal whose scheme does not match the database, forcing a fresh
  regeneration.
* triedb/pathdb.reader: export RawStateReader, a small extension of
  database.StateReader that exposes AccountRLP so callers outside the
  package can reach the raw flat-state bytes without going through the
  slim-RLP decode path that assumes merkle shape.
* core/state.reader: add bintrieFlatReader, the bintrie equivalent of
  flatReader. It derives the EIP-7864 stem keys from (addr, slot),
  performs two AccountRLP lookups per Account call (BasicData +
  CodeHash), and decodes via bintrie.UnpackBasicData. Storage reads go
  through a single AccountRLP lookup at the slot's full bintrie key.
* core/state.database.StateReader: dispatch to bintrieFlatReader when
  the path database is in verkle mode; merkle path unchanged.

Depends on the lookup sentinel fix in the previous commit; without it
missing-account reads on bintrie misreport as "layer stale".

2026-04-15 15:00:40 +02:00

2 commits