go-ethereum

mirror of https://github.com/ethereum/go-ethereum.git synced 2026-06-13 18:31:35 +00:00

Author	SHA1	Message	Date
CPerezz	69c0028094	triedb/pathdb: replace crit panic shim with error propagation through Flush Addresses review finding I2. The bintrieFlatCodec had a crit() helper whose doc claimed "delegates to log.Crit" but whose body was panic(fmt.Sprintf(...)). A corrupt on-disk stem blob would cause the buffer flush goroutine to panic, killing the process. On restart the same blob would cause the same panic, producing an unrecoverable crash loop. Fix: applyWrites now returns ([]byte, error) instead of panicking. The Flush method on flatStateCodec gains an error return: (int, int) → (int, int, error). The error propagates up through writeStates → stateSet.write → buffer.flush → flushErr. A corrupted stem blob now causes a flush failure that the database can react to instead of a crash loop. The per-entry methods (WriteAccount, WriteStorage, DeleteAccount, DeleteStorage), which are NOT on the production flush path, use log.Crit (the real function, not the deleted shim) on error, matching the merkle codec's existing convention for unrecoverable corruption at the per-entry level. The crit shim is deleted entirely.	2026-04-15 15:00:41 +02:00
CPerezz	fcc0587ec3	core/state,triedb/pathdb: fix bintrieFlatReader disk-layer shape via per-offset extraction Addresses review finding C2 (+ I5, S5, T2, T3, T12). Before this commit, bintrieFlatCodec.ReadAccount returned the FULL variable-length stem blob from disk while the in-memory diff-layer buffer stored per-offset 32-byte values. The consumer, bintrieFlatReader.Account, enforced len(basicBlob)!=32 → error, so every disk-layer hit produced "bintrie BasicData leaf invalid length" in production the moment the write buffer flushed. TestBintrieFlatReaderEndToEnd did not catch this because it never forced a buffer → disk flush. Fix: make bintrieFlatCodec.ReadAccount extract the offset from the stem blob (mirroring ReadStorage), so the disk path and the buffer path return the same 32-byte per-offset shape. Update AccountCacheKey/StorageCacheKey to embed the full 32-byte key (prefix + 31-byte stem + 1-byte offset), since caching under a stem-only key would collapse BasicData and CodeHash into the same slot and return the wrong value on the second hit. Update Flush's cache-update loop to store per-offset entries from the aggregated write set. Design note: I considered the alternative of introducing a new StemBlob(stem) interface method that returns the full blob synthesized from a stem-level lookup index. Rejected because (a) the index is a new data structure with its own consistency invariants, (b) the per-offset approach is strictly local to the codec + reader, and (c) the "1 Pebble read per Account" locality benefit is preserved at the OS page cache level: both offsets at the same stem live in the same Pebble block, so the second read is effectively free. bintrieFlatReader.Account still does two AccountRLP lookups; the torn-read hazard is gated by a new load-bearing invariant test, TestBinaryHasherWritesBothBasicAndCodeHash, which asserts that binaryHasher.updateAccount always emits both BasicData and CodeHash leaves together. A future code-only update that broke this invariant would fail the test. Tests added: * TestBintrieFlatReaderEndToEndAfterFlush: explicitly flushes via tdb.Commit(root, false) and re-reads through a fresh StateReader. This is the smoking-gun regression for C2. * TestBintrieFlatReaderMultipleOffsetsPerStem: multiple offsets at the same stem (BasicData, CodeHash, header storage slots) all round-trip post-flush. * TestBintrieCodecCrossFlushRMW: two Flush calls to the same stem from different "blocks" correctly merge on disk, with prior offsets preserved. * TestBinaryHasherWritesBothBasicAndCodeHash: locks down the hasher co-write invariant that bintrieFlatReader.Account relies on. Existing tests updated to match the new per-offset ReadAccount semantics: * TestBintrieCodecAccountRoundTrip, TestBintrieCodecMultipleWritesSameStem, TestBintrieCodecDeleteAccount: now read per-offset rather than calling extractStemOffset on the raw blob. * TestBintrieCodecCacheKeysDisjoint: additionally verifies two offsets at the same stem produce distinct cache keys. Error messages in bintrieFlatReader now include address and length context (S5).	2026-04-15 15:00:40 +02:00
CPerezz	a1ff36d9e1	core/state,triedb/pathdb: wire bintrie leaves through stateUpdate Drains the binaryHasher's LeafProducer side-channel in StateDB.commit and threads the stem writes through stateUpdate.encodeBinary into the pathdb state set as per-offset accountData entries (key = stem||offset, value = 32-byte leaf or nil for clears). The flat-state codec gains a Flush method that owns the in-memory→disk write path, replacing the codec-agnostic per-entry loop in writeStates. The merkle codec preserves its historical per-entry behavior verbatim; the bintrie codec aggregates per-offset writes by stem so each stem hits disk via a single read-modify-write, satisfying the codec's pre-aggregation requirement and updating the clean cache with the merged blob it just produced (no extra disk read). stateUpdate.encodeBinary returns empty origin maps for the bintrie path: state-history rollback for bintrie is deferred to a follow-up PR (see BINTRIE_FLAT_STATE_REORG_GAP.md), and the diskLayer.revert path will panic before consuming origins anyway.	2026-04-15 15:00:40 +02:00
CPerezz	437a53bbe0	triedb/pathdb: implement bintrieFlatCodec + stem blob helpers Introduce the codec and on-disk blob format for the bintrie flat-state layer. This commit only defines the types; the codec is NOT wired into pathdb.Database.New yet (that happens in a later commit once the leaf-production hook in binaryHasher and the stateUpdate wiring are in place). Three pieces: 1. trie/bintrie/pack.go Canonical PackBasicData / UnpackBasicData helpers that encode an account's (codeSize, nonce, balance) into the 32-byte BasicData leaf defined by EIP-7864. Preserves the existing BinaryTrie.UpdateAccount layout byte-for-byte (4-byte code_size at offset 4 rather than the spec's 3-byte field at offset 5; any realistic code size has byte 4 always zero and the two encodings are bit-equivalent in practice). BinaryTrie.UpdateAccount is refactored to delegate to PackBasicData so the flat-state codec can produce a bit-identical BasicData encoding without duplicating the layout logic. 2. triedb/pathdb/stem_blob.go Packed encoding of the populated (offset, value) pairs at a bintrie stem. A stem can hold up to 256 offsets per EIP-7864 but in practice only a handful are set; the layout is a 32-byte bitmap followed by N 32-byte values in ascending offset order, where N = popcount. Empty stems encode to nil so the caller knows to delete the on-disk key rather than write a zero-length value. Provides encodeStemBlob / decodeStemBlob / extractStemOffset / mergeStemBlob and a stemBuilder type for accumulating writes. The tombstone convention (32 zero bytes = "present with zero" as used by DeleteStorage) is preserved. 11 unit tests cover: empty blob, BasicData+CodeHash roundtrip, all 256 offsets populated, sparse high offsets, set/clear roundtrip, load-from-existing-blob RMW, merge helper, merge-to-empty, tombstone zero bytes, malformed input detection, bitmap rank sanity. 3. triedb/pathdb/flat_codec_bintrie.go bintrieFlatCodec implements flatStateCodec over the stem-blob layout. Unlike merkleFlatCodec it is stateful: it holds an ethdb.KeyValueReader reference used by applyWrites to read the existing stem blob before merging in new writes. ethdb.Batch is write-only so the batch passed to Write* cannot be used to fetch current state. Pre-aggregation requirement is documented explicitly: within a single flush, the caller must NOT issue two Write* calls targeting the same stem, because the RMW read comes from the store (not the in-flight batch). Commit 8 of the bintrie flat-state plan restructures writeStates to pre-aggregate per-stem writes so callers don't have to handle this manually. Cache keys are prefix-disambiguated with a one-byte 0x01 to keep bintrie stem lookups disjoint from merkle 32-byte account keys and 64-byte storage keys in the shared clean-state fastcache. SplitMarker is a single-tier (stem-only) format, not the merkle two-tier (account, account+storage) format. 7 unit tests cover: account roundtrip, storage roundtrip, multiple writes to the same stem, DeleteAccount preserving unrelated offsets, DeleteStorage removing the final offset collapsing the key, cache key disjointness from merkle, SplitMarker semantics. The codec is not dispatched by anything yet; MPT continues through the merkle codec and bintrie mode still runs on the (soon-to-be-replaced) keccak-shaped path until Commit 10 wires things up.	2026-04-15 15:00:40 +02:00
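
The stem-blob layout described in commit 437a53bbe0 (a 32-byte bitmap followed by one 32-byte value per set bit, in ascending offset order) can be illustrated with a minimal Go sketch. This is not the code from the commit: the names encodeSketch and extractSketch are hypothetical, and the bit ordering inside the bitmap (least-significant bit first within each byte) is an assumption, since the commit message does not state it.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

const (
	bitmapLen = 32 // 256 possible offsets, one bit each
	valueLen  = 32 // every populated offset holds a 32-byte value
)

// encodeSketch packs the populated offsets into bitmap || values.
// ASSUMPTION: offset i maps to bit (i % 8) of bitmap byte (i / 8).
// An empty set encodes to nil, signalling "delete the on-disk key".
func encodeSketch(vals map[byte][valueLen]byte) []byte {
	if len(vals) == 0 {
		return nil
	}
	blob := make([]byte, bitmapLen, bitmapLen+len(vals)*valueLen)
	for off := 0; off < 256; off++ { // ascending offset order
		v, ok := vals[byte(off)]
		if !ok {
			continue
		}
		blob[off/8] |= byte(1) << uint(off%8)
		blob = append(blob, v[:]...)
	}
	return blob
}

// extractSketch returns the value stored at off, if any. The value's
// position is its rank: the number of set bits below off in the bitmap.
func extractSketch(blob []byte, off byte) ([]byte, bool, error) {
	if len(blob) == 0 {
		return nil, false, nil // empty stem
	}
	if len(blob) < bitmapLen {
		return nil, false, errors.New("stem blob shorter than bitmap")
	}
	n := 0
	for _, b := range blob[:bitmapLen] {
		n += bits.OnesCount8(b)
	}
	if len(blob) != bitmapLen+n*valueLen {
		return nil, false, errors.New("stem blob length does not match popcount")
	}
	idx, bit := int(off)/8, uint(off)%8
	if blob[idx]&(byte(1)<<bit) == 0 {
		return nil, false, nil // offset not populated
	}
	rank := 0
	for _, b := range blob[:idx] {
		rank += bits.OnesCount8(b)
	}
	rank += bits.OnesCount8(blob[idx] & (byte(1)<<bit - 1))
	start := bitmapLen + rank*valueLen
	return blob[start : start+valueLen], true, nil
}

func main() {
	vals := map[byte][valueLen]byte{0: {0xaa}, 1: {0xbb}, 200: {0xcc}}
	blob := encodeSketch(vals)
	fmt.Println(len(blob)) // 128: 32-byte bitmap + 3 values

	v, ok, err := extractSketch(blob, 200)
	fmt.Println(ok, err, v[0]) // true <nil> 204
}
```

Returning an error for a malformed blob, rather than panicking, matches the direction taken in commit 69c0028094, where corruption is reported to the caller of Flush instead of crashing the process.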

4 commits