go-ethereum

mirror of https://github.com/ethereum/go-ethereum.git synced 2026-06-12 09:51:36 +00:00

Author	SHA1	Message	Date
CPerezz	bfb77d98f6	core/state,triedb/pathdb: enable bintrie flat state reads end-to-end Wires the pieces from Commits 1-9 into a running system: * triedb/pathdb.New: install the bintrieFlatCodec when isVerkle is set, backed by the same verkle-namespaced db used for trie nodes. * triedb/pathdb.database.go: drop isVerkle from the noBuild guard so the bintrie generator (Commit 9) runs on startup, and remove it from the generateSnapshot call path for the same reason. * triedb/pathdb.disklayer.revert: hard-fail on bintrie because the reorg path would replay merkle-shaped origin records against a per-stem layout. Tracked in BINTRIE_FLAT_STATE_REORG_GAP.md. * triedb/pathdb.journal: add IsBintrie to journalGenerator (rlp:"optional" so v3 journals still decode) and make journalProgress a method on generator so it stamps the active scheme; loadGenerator discards any journal whose scheme does not match the database, forcing a fresh regeneration. * triedb/pathdb.reader: export RawStateReader, a small extension of database.StateReader that exposes AccountRLP so callers outside the package can reach the raw flat-state bytes without going through the slim-RLP decode path that assumes merkle shape. * core/state.reader: add bintrieFlatReader, the bintrie equivalent of flatReader. It derives the EIP-7864 stem keys from (addr, slot), performs two AccountRLP lookups per Account call (BasicData + CodeHash), and decodes via bintrie.UnpackBasicData. Storage reads go through a single AccountRLP lookup at the slot's full bintrie key. * core/state.database.StateReader: dispatch to bintrieFlatReader when the path database is in verkle mode; merkle path unchanged. Depends on the lookup sentinel fix in the previous commit; without it missing-account reads on bintrie misreport as "layer stale".	2026-04-15 15:00:40 +02:00
CPerezz	0508d40aaf	triedb/pathdb: bintrie snapshot generator Adds generateBinTrieStems, the bintrie analogue of generateAccounts. It opens the bintrie via a sha256-aware bintrieDiskStore (the merkle disk store would always fail root validation against a binary node), iterates all leaves with binaryNodeIterator, aggregates them into per-stem builders, and emits one stem blob per stem boundary. Resume support is structural: ctx.marker is fed straight to the trie's NodeIterator, which uses binaryNodeIterator.seek (Commit 1) to position on the first leaf >= marker. Range proofs are deliberately skipped: the bintrie's Prove path is unimplemented and an iteration-only generation cycle is acceptable for a one-time startup cost. A bintrieGeneratorContext mirrors generatorContext but is much smaller: no holdable iterators (we walk the trie, not the existing flat state) and no two-tier marker (the bintrie key space is unified). checkAndFlushBin journals progress as a single 32-byte (stem || offset) key so resume can pick up mid-stem. generator.run dispatches on codec type so callers see a uniform lifecycle whether the underlying scheme is merkle or bintrie.	2026-04-15 15:00:40 +02:00
CPerezz	a1ff36d9e1	core/state,triedb/pathdb: wire bintrie leaves through stateUpdate Drains the binaryHasher's LeafProducer side-channel in StateDB.commit and threads the stem writes through stateUpdate.encodeBinary into the pathdb state set as per-offset accountData entries (key = stem||offset, value = 32-byte leaf or nil for clears). The flat-state codec gains a Flush method that owns the in-memory→disk write path, replacing the codec-agnostic per-entry loop in writeStates. The merkle codec preserves its historical per-entry behavior verbatim; the bintrie codec aggregates per-offset writes by stem so each stem hits disk via a single read-modify-write, satisfying the codec's pre-aggregation requirement and updating the clean cache with the merged blob it just produced (no extra disk read). stateUpdate.encodeBinary returns empty origin maps for the bintrie path: state-history rollback for bintrie is deferred to a follow-up PR (see BINTRIE_FLAT_STATE_REORG_GAP.md), and the diskLayer.revert path will panic before consuming origins anyway.	2026-04-15 15:00:40 +02:00
CPerezz	437a53bbe0	triedb/pathdb: implement bintrieFlatCodec + stem blob helpers Introduce the codec and on-disk blob format for the bintrie flat-state layer. This commit only defines the types; the codec is NOT wired into pathdb.Database.New yet (that happens in a later commit once the leaf-production hook in binaryHasher and the stateUpdate wiring are in place). Three pieces: 1. trie/bintrie/pack.go Canonical PackBasicData / UnpackBasicData helpers that encode an account's (codeSize, nonce, balance) into the 32-byte BasicData leaf defined by EIP-7864. Preserves the existing BinaryTrie.UpdateAccount layout byte-for-byte (4-byte code_size at offset 4 rather than the spec's 3-byte field at offset 5 — any realistic code size has byte 4 always zero and the two encodings are bit-equivalent in practice). BinaryTrie.UpdateAccount is refactored to delegate to PackBasicData so the flat-state codec can produce a bit-identical BasicData encoding without duplicating the layout logic. 2. triedb/pathdb/stem_blob.go Packed encoding of the populated (offset, value) pairs at a bintrie stem. A stem can hold up to 256 offsets per EIP-7864 but in practice only a handful are set; the layout is a 32-byte bitmap followed by N 32-byte values in ascending offset order, where N = popcount. Empty stems encode to nil so the caller knows to delete the on-disk key rather than write a zero-length value. Provides encodeStemBlob / decodeStemBlob / extractStemOffset / mergeStemBlob and a stemBuilder type for accumulating writes. The tombstone convention (32 zero bytes = "present with zero" as used by DeleteStorage) is preserved. 11 unit tests cover: empty blob, BasicData+CodeHash roundtrip, all 256 offsets populated, sparse high offsets, set/clear roundtrip, load-from-existing-blob RMW, merge helper, merge-to-empty, tombstone zero bytes, malformed input detection, bitmap rank sanity. 3. triedb/pathdb/flat_codec_bintrie.go bintrieFlatCodec implements flatStateCodec over the stem-blob layout. 
Unlike merkleFlatCodec it is stateful: it holds an ethdb.KeyValueReader reference used by applyWrites to read the existing stem blob before merging in new writes. ethdb.Batch is write-only so the batch passed to Write* cannot be used to fetch current state. Pre-aggregation requirement is documented explicitly: within a single flush, the caller must NOT issue two Write* calls targeting the same stem, because the RMW read comes from the store (not the in-flight batch). Commit 8 of the bintrie flat-state plan restructures writeStates to pre-aggregate per-stem writes so callers don't have to handle this manually. Cache keys are prefix-disambiguated with a one-byte 0x01 to keep bintrie stem lookups disjoint from merkle 32-byte account keys and 64-byte storage keys in the shared clean-state fastcache. SplitMarker is a single-tier (stem-only) format, not the merkle two-tier (account, account+storage) format. 7 unit tests cover: account roundtrip, storage roundtrip, multiple writes to the same stem, DeleteAccount preserving unrelated offsets, DeleteStorage removing the final offset collapsing the key, cache key disjointness from merkle, SplitMarker semantics. The codec is not dispatched by anything yet; MPT continues through the merkle codec and bintrie mode still runs on the (soon-to-be-replaced) keccak-shaped path until Commit 10 wires things up.	2026-04-15 15:00:40 +02:00
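The stem-blob layout described in commit 437a53bbe0 (a 32-byte bitmap followed by one 32-byte value per set bit, in ascending offset order, with empty stems encoding to nil) can be sketched roughly as below. This is an illustration only, not the code from `stem_blob.go`: the function names `encodeStem` and `decodeStem`, the map-based input, and the bit order inside the bitmap (offset 0 as the most significant bit of byte 0) are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

const (
	bitmapSize = 32 // 256 offsets, one bit each
	valueSize  = 32 // every populated offset holds a 32-byte value
)

// encodeStem is a hypothetical helper that packs the populated offsets of a
// single stem. An empty stem encodes to nil so the caller can delete the key.
func encodeStem(values map[byte][32]byte) []byte {
	if len(values) == 0 {
		return nil
	}
	blob := make([]byte, bitmapSize, bitmapSize+len(values)*valueSize)
	for off := 0; off < 256; off++ {
		v, ok := values[byte(off)]
		if !ok {
			continue
		}
		// Assumed bit order: offset 0 is the most significant bit of byte 0.
		blob[off/8] |= byte(0x80) >> uint(off%8)
		blob = append(blob, v[:]...)
	}
	return blob
}

// decodeStem is the hypothetical inverse. It rejects blobs whose length does
// not match the popcount of the bitmap.
func decodeStem(blob []byte) (map[byte][32]byte, error) {
	out := make(map[byte][32]byte)
	if len(blob) == 0 {
		return out, nil
	}
	if len(blob) < bitmapSize {
		return nil, errors.New("stem blob shorter than bitmap")
	}
	n := 0
	for _, b := range blob[:bitmapSize] {
		n += bits.OnesCount8(b)
	}
	if len(blob) != bitmapSize+n*valueSize {
		return nil, errors.New("stem blob length does not match bitmap popcount")
	}
	pos := bitmapSize
	for off := 0; off < 256; off++ {
		if blob[off/8]&(byte(0x80)>>uint(off%8)) == 0 {
			continue
		}
		var v [32]byte
		copy(v[:], blob[pos:pos+valueSize])
		out[byte(off)] = v
		pos += valueSize
	}
	return out, nil
}

func main() {
	// Two populated offsets: 32-byte bitmap + 2 values = 96 bytes.
	blob := encodeStem(map[byte][32]byte{0: {0xaa}, 1: {0xbb}})
	fmt.Println(len(blob))
}
```

Because values are stored in ascending offset order, a reader can locate the value for one offset by counting the set bits before it in the bitmap, without decoding the whole blob.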
CPerezz	0fb4d9226b	triedb/pathdb: bump journal version to 4 Reserve journal version 4 for the upcoming bintrie flat-state layout (per-stem blobs). Bumping now — with no on-disk format change yet — ensures that any v3 journals belonging to a bintrie database are discarded on load, so the new layout can be introduced cleanly in follow-up commits without a migration shim. MPT behavior is unchanged at this point: the only codec wired to the pathdb Database is still merkleFlatCodec. All pathdb, core/state, core/rawdb, and trie tests pass.	2026-04-15 15:00:40 +02:00
CPerezz	f1d7143afa	triedb/pathdb: thread flatStateCodec through internals Route the flatStateCodec from Database through every flat-state call site so that the trie-specific aspects of persistence and key derivation live behind a single abstraction. Pure refactor: merkle behavior and on-disk layout are unchanged because the only codec wired up is merkleFlatCodec, whose methods are thin wrappers over the existing rawdb accessors. Threaded sites: disklayer.account/storage use codec.{Read,AccountCacheKey, StorageCacheKey} instead of direct rawdb calls and bare hash slicing. flush.writeStates takes a codec parameter; persistence goes through codec.{Write,Delete} {Account,Storage}. buffer.flush carries the codec down into writeStates. states.write/dbsize takes the codec for prefix-size accounting. generate.go (g.codec) the generator owns a codec, used by generateAccounts/generateStorages callbacks; the unused top-level splitMarker helper is removed in favor of codec.SplitMarker. context.go the generator context owns the codec and uses codec.{AccountPrefix, StoragePrefix,Account/StorageKeyLength} to construct iterators. reader.go (HistoricalState) uses codec.{Account,Storage}Key for caller-side key derivation. The marker comparisons in writeStates remain merkle-shaped (two-tier account+storage marker) because the bintrie path will use a separate writer over single-tier stem markers in a later commit. All existing pathdb tests pass.	2026-04-15 15:00:39 +02:00
CPerezz	eaf5523a5a	triedb/pathdb: introduce flatStateCodec abstraction Introduce flatStateCodec, a small interface that captures the trie-specific aspects of flat-state storage: key derivation from (address, slot), persistence of account/storage entries, clean-cache key disambiguation, iterator setup, and progress-marker handling. Mirrors the existing nodeHasher pattern and complements the Hasher interface from state-hasher-iface-2 (which abstracts trie-side hashing and commit). The codec is stored on Database alongside the existing hasher field, ready to be threaded through the flat-state call sites (disklayer, flush, generator, reader) in the next commit. Provides merkleFlatCodec, a thin wrapper over the existing rawdb snapshot accessors and helpers. This is a pure refactor: behavior is unchanged. The bintrie-side codec implementation is added in a later commit, after all call sites have been routed through the abstraction.	2026-04-15 15:00:39 +02:00
CPerezz	3772bb536a	triedb/pathdb: fix lookup sentinel collision with zero disk layer root (#34680)	2026-04-09 13:39:38 +08:00
Diego López León	52b8c09fdf	triedb/pathdb: skip duplicate-root layer insertion (#34642) PathDB keys diff layers by state root, not by block hash. That means a side-chain block can legitimately collide with an existing canonical diff layer when both blocks produce the same post-state (for example same parent, same coinbase, no txs). Today `layerTree.add` blindly inserts that second layer. If the root already exists, this overwrites `tree.layers[root]` and appends the same root to the mutation lookup again. Later account/storage lookups resolve that root to the wrong diff layer, which can corrupt reads for descendant canonical states. At runtime, the corruption is silent: no error is logged and no invariant check fires. State reads against affected descendants simply return stale data from the wrong diff layer (for example, an account balance that reflects one fewer block reward), which can propagate into RPC responses and block validation. This change makes duplicate-root inserts idempotent. A second layer with the same state root does not add any new retrievable state to a tree that is already keyed by root; keeping the original layer preserves the existing parent chain and avoids polluting the lookup history with duplicate roots. The regression test imports a canonical chain of two layers followed by a fork layer at height 1 with the same state root but a different block hash. Before the fix, account and storage lookups at the head resolve the fork layer instead of the canonical one. After the fix, the duplicate insert is skipped and lookups remain correct.	2026-04-07 21:31:41 +08:00
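The idempotent insert described in commit 52b8c09fdf can be illustrated with a toy layer tree keyed by state root. The types below (`hash`, `layer`, `layerTree`) and their fields are simplified stand-ins invented for this sketch; only the idea (skip the insert when the root is already present) comes from the commit message.

```go
package main

import "fmt"

type hash [32]byte

// layer is a simplified stand-in for a diff layer.
type layer struct {
	root  hash   // post-state root, the key of the tree
	label string // illustrative tag to tell layers apart
}

// layerTree is a toy tree keyed by state root, plus a lookup history.
type layerTree struct {
	layers map[hash]*layer
	lookup []hash
}

// add inserts l unless a layer with the same root already exists.
// It reports whether the layer was actually inserted.
func (t *layerTree) add(l *layer) bool {
	if _, ok := t.layers[l.root]; ok {
		// Same root means no new retrievable state: keep the original layer
		// and leave the lookup history untouched.
		return false
	}
	t.layers[l.root] = l
	t.lookup = append(t.lookup, l.root)
	return true
}

func main() {
	tree := &layerTree{layers: make(map[hash]*layer)}
	root := hash{0x01}
	canonical := &layer{root: root, label: "canonical"}
	fork := &layer{root: root, label: "fork"}
	fmt.Println(tree.add(canonical), tree.add(fork), tree.layers[root].label)
}
```

Without the early return, the second `add` would overwrite `layers[root]` and append the root to `lookup` a second time, which is the failure mode the commit describes.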
Jonny Rhea	bd6530a1d4	triedb, triedb/internal, triedb/pathdb: add GenerateTrie + extract shared pipeline into triedb/internal (#34654) This PR adds `GenerateTrie(db, scheme, root)` to the `triedb` package, which rebuilds all tries from flat snapshot KV data. This is needed by snap/2 sync so it can rebuild the trie after downloading the flat state. The shared trie generation pipeline from `pathdb/verifier.go` was moved into `triedb/internal/conversion.go` so both `GenerateTrie` and `VerifyState` reuse the same code.	2026-04-07 14:36:53 +08:00
rjl493456442	d8cb8a962b	core, eth, ethclient, triedb: report trienode index progress (#34633) The trienode history indexing progress is also exposed via an RPC endpoint and contributes to the eth_syncing status.	2026-04-04 21:00:07 +08:00
rjl493456442	db6c7d06a2	triedb/pathdb: implement history index pruner (#33999) This PR implements the missing functionality for archive nodes by pruning stale index data. The current mechanism is relatively simple but sufficient for now: it periodically iterates over index entries and deletes outdated data on a per-block basis. The pruning process is triggered every 90,000 new blocks (approximately every 12 days), and the iteration typically takes ~30 minutes on a mainnet node. This mechanism applies only when `gcmode=archive` is enabled and has no impact on normal full nodes.	2026-04-02 00:21:58 +02:00
rjl493456442	9b2ce121dc	triedb/pathdb: enhance history index initer (#33640) This PR improves the pbss archive mode. Initial sync of an archive node (one with the --gcmode archive flag enabled) will be significantly sped up. It achieves that with the following changes: The indexer now attempts to process histories in batch whenever possible. Batch indexing is enforced when the node is still syncing and the local chain head is behind the network chain head. In this scenario, instead of scheduling indexing frequently alongside block insertion, the indexer waits until a sufficient amount of history has accumulated and then processes it in a batch, which is significantly more efficient. --------- Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-17 15:29:30 +01:00
rjl493456442	7d13acd030	core/rawdb, triedb/pathdb: enable trienode history alongside existing data (#33934) Fixes https://github.com/ethereum/go-ethereum/issues/33907 Notably, there is a behavioral change: - Previously, Geth would refuse to restart if the existing trienode history was gapped with the state data. - With this PR, the gapped trienode history is entirely reset and reconstructed from scratch.	2026-03-12 09:21:54 +08:00
rjl493456442	dd202d4283	core, ethdb, triedb: add batch close (#33708) Pebble maintains a batch pool to recycle batch objects. Unfortunately, a batch object must be explicitly returned to the pool via the `batch.Close` function. This PR extends the batch interface with a close function and invokes batch.Close in some critical code paths. Memory allocation must be measured before merging this change. It also remains an open question whether batch.Close should be applied in every invocation.	2026-03-04 11:17:47 +01:00
sashass1315	919b238c82	triedb/pathdb: return nodeLoc by value to avoid heap allocation (#33819)	2026-02-11 22:14:43 +08:00
0xFloki	a951aacb70	triedb/pathdb: preallocate slices in encode methods (#33736) Preallocates slices with known capacity in `stateSet.encode()` and `StateSetWithOrigin.encode()` methods to eliminate redundant reallocations during serialization.	2026-02-02 15:27:37 +08:00
alex017	cb97c48cb6	triedb/pathdb: preallocate slices in decodeRestartTrailer (#33715) Preallocate capacity for `keyOffsets` and `valOffsets` slices in `decodeRestartTrailer` since the exact size (`nRestarts`) is known upfront. --------- Co-authored-by: rjl493456442 <garyrong0905@gmail.com>	2026-01-30 21:14:15 +08:00
rjl493456442	181a3ae9e0	triedb/pathdb: improve trienode reader for searching (#33681) This PR optimizes the historical trie node reader by reworking how data is accessed and memory is managed, reducing allocation overhead significantly. Specifically: - Instead of decoding an entire history object to locate a specific trie node, the reader now searches directly within the history. - Slice pre-allocation also avoids a significant amount of unnecessary deep copying.	2026-01-27 20:05:35 +08:00
rjl493456442	1022c7637d	core, eth, internal, triedb/pathdb: enable eth_getProofs for history (#32727) This PR enables the `eth_getProofs` endpoint against the historical states.	2026-01-22 09:19:27 +08:00
cui	d0af257aa2	triedb/pathdb: double check the list availability before regeneration (#33622) Co-authored-by: rjl493456442 <garyrong0905@gmail.com>	2026-01-19 20:45:31 +08:00
rjl493456442	add1890a57	triedb/pathdb: enable trienode history (#32621) This is part 4 of the trienode history work. This PR enables trienode history persistence via the flag `history.trienode <non-negative-number>`.	2026-01-17 21:23:48 +08:00
rjl493456442	588dd94aad	triedb/pathdb: implement trienode history indexing scheme (#33551) This PR implements the indexing scheme for trie node history. Check https://github.com/ethereum/go-ethereum/pull/33399 for more details.	2026-01-17 20:28:37 +08:00
rjl493456442	494908a852	triedb/pathdb: change the bitmap to big endian (#33584) The bitmap is used in compact-encoded trie nodes to indicate which elements have been modified. The bitmap format has been updated to use big-endian encoding. Bit positions are numbered from 0 to 15, where position 0 corresponds to the most significant bit of b[0], and position 15 corresponds to the least significant bit of b[1].	2026-01-15 17:28:57 +08:00
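The bit numbering described in commit 494908a852 (position 0 is the most significant bit of b[0], position 15 is the least significant bit of b[1]) can be sketched as follows. The helper names `setBit` and `hasBit` are hypothetical and are not taken from the pathdb source.

```go
package main

import "fmt"

// setBit marks position pos (0..15) in a two-byte big-endian bitmap.
// Position 0 is the most significant bit of b[0]. A position of 16 or more
// is out of range and panics on the array index.
func setBit(b *[2]byte, pos uint) {
	b[pos/8] |= byte(0x80) >> (pos % 8)
}

// hasBit reports whether position pos (0..15) is set.
func hasBit(b [2]byte, pos uint) bool {
	return b[pos/8]&(byte(0x80)>>(pos%8)) != 0
}

func main() {
	var b [2]byte
	setBit(&b, 0)
	setBit(&b, 15)
	fmt.Printf("%08b %08b\n", b[0], b[1]) // prints "10000000 00000001"
	fmt.Println(hasBit(b, 0), hasBit(b, 1), hasBit(b, 15))
}
```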
rjl493456442	f51870e40e	rlp, trie, triedb/pathdb: compress trienode history (#32913) This pull request introduces a mechanism to compress trienode history by storing only the node diffs between consecutive versions. - For full nodes, only the modified children are recorded in the history; - For short nodes, only the modified value is stored; If the node type has changed, or if the node is newly created or deleted, the entire node value is stored instead. To mitigate the overhead of reassembling nodes from diffs during history reads, checkpoints are introduced by periodically storing full node values. The current checkpoint interval is set to every 16 mutations, though this parameter may be made configurable in the future.	2026-01-08 21:58:02 +08:00
rjl493456442	d5efd34010	triedb/pathdb: introduce extension to history index structure (#33399) This PR is based on #33303 and introduces an approach for trienode history indexing. --- In the current archive node design, resolving a historical trie node at a specific block involves the following steps: - Look up the corresponding trie node index and locate the first entry whose state ID is greater than the target state ID. - Resolve the trie node from the associated trienode history object. A naive approach would be to store mutation records for every trie node, similar to how flat state mutations are recorded. However, the total number of trie nodes is extremely large (approximately 2.4 billion), and the vast majority of them are rarely modified. Creating an index entry for each individual trie node would be very wasteful in both storage and indexing overhead. To address this, we aggregate multiple trie nodes into chunks and index mutations at the chunk level instead. --- For a storage trie, the trie is vertically partitioned into multiple sub tries, each spanning three consecutive levels. The top three levels (1 + 16 + 256 nodes) form the first chunk, and every subsequent three-level segment forms another chunk. ``` Original trie structure Level 0 [ ROOT ] 1 node Level 1 [0] [1] [2] ... [f] 16 nodes Level 2 [00] [01] ... [0f] [10] ... [ff] 256 nodes Level 3 [000] [001] ... [00f] [010] ... [fff] 4096 nodes Level 4 [0000] ... [000f] [0010] ... [001f] ... [ffff] 65536 nodes Vertical split into chunks (3 levels per chunk) Level0 [ ROOT ] 1 chunk Level3 [000] ... [fff] 4096 chunks Level6 [000000] ... [ffffff] 16777216 chunks ``` Within each chunk, there are 273 nodes in total, regardless of the chunk's depth in the trie. ``` Level 0 [ 0 ] 1 node Level 1 [ 1 ] … [ 16 ] 16 nodes Level 2 [ 17 ] … … [ 272 ] 256 nodes ``` Each chunk is uniquely identified by the path prefix of the root node of its corresponding sub-trie. 
Within a chunk, nodes are identified by a numeric index ranging from 0 to 272. For example, suppose that at block 100, the nodes with paths `[]`, `[0]`, `[f]`, `[00]`, and `[ff]` are modified. The mutation record for chunk 0 is then appended with the following entry: `[100 → [0, 1, 16, 17, 272]]`, `272` is the numeric ID of path `[ff]`. Furthermore, due to the structural properties of the Merkle Patricia Trie, if a child node is modified, all of its ancestors along the same path must also be updated. As a result, in the above example, recording mutations for nodes `00` and `ff` alone is sufficient, as this implicitly indicates that their ancestor nodes `[]`, `[0]` and `[f]` were also modified at block 100. --- Query processing is slightly more complicated. Since trie nodes are indexed at the chunk level, each individual trie node lookup requires an additional filtering step to ensure that a given mutation record actually corresponds to the target trie node. As mentioned earlier, mutation records store only the numeric identifiers of leaf nodes, while ancestor nodes are omitted for storage efficiency. Consequently, when querying an ancestor node, additional checks are required to determine whether the mutation record implicitly represents a modification to that ancestor. Moreover, since trie nodes are indexed at the chunk level, some trie nodes may be updated frequently, causing their mutation records to dominate the index. Queries targeting rarely modified trie nodes would then scan a large amount of irrelevant index data, significantly degrading performance. To address this issue, a bitmap is introduced for each index block and stored in the chunk's metadata. Before loading a specific index block, the bitmap is checked to determine whether the block contains mutation records relevant to the target trie node. If the bitmap indicates that the block does not contain such records, the block is skipped entirely.	2026-01-08 09:57:35 +01:00
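The in-chunk numbering from commit d5efd34010 (IDs 0 to 272, assigned level by level) can be reproduced with a small helper. This is a sketch under assumptions: the function names are hypothetical, paths are given as nibble slices relative to the chunk root, and nodes on the same level are assumed to be numbered in nibble order, which matches the example `[0, 1, 16, 17, 272]` in the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeID maps a path relative to the chunk root (0, 1 or 2 nibbles) to the
// numeric index inside the chunk: 1 root + 16 children + 256 grandchildren.
func nodeID(rel []byte) (int, error) {
	for _, n := range rel {
		if n > 0xf {
			return 0, errors.New("nibble out of range")
		}
	}
	switch len(rel) {
	case 0:
		return 0, nil
	case 1:
		return 1 + int(rel[0]), nil
	case 2:
		return 17 + int(rel[0])*16 + int(rel[1]), nil
	default:
		return 0, errors.New("path leaves the three-level chunk")
	}
}

// impliedAncestors lists the in-chunk IDs of all ancestors of rel. If only
// the deepest modified nodes are recorded, these IDs are implicitly modified.
func impliedAncestors(rel []byte) []int {
	ids := make([]int, 0, len(rel))
	for l := 0; l < len(rel); l++ {
		if id, err := nodeID(rel[:l]); err == nil {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	paths := [][]byte{{}, {0x0}, {0xf}, {0x0, 0x0}, {0xf, 0xf}}
	for _, p := range paths {
		id, _ := nodeID(p)
		fmt.Print(id, " ") // prints the IDs 0 1 16 17 272
	}
	fmt.Println()
	fmt.Println(impliedAncestors([]byte{0xf, 0xf})) // [0 16]
}
```

A query for an ancestor node (for example ID 16, path `[f]`) therefore has to check whether any recorded ID in the mutation entry lies beneath it, which is the extra filtering step the commit message mentions.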
rjl493456442	b3e7d9ee44	triedb/pathdb: optimize history indexing efficiency (#33303) This pull request optimizes history indexing by splitting a single large database batch into multiple smaller chunks. Originally, the indexer would resolve a batch of state histories and commit all corresponding index entries atomically together with the indexing marker. While indexing more state histories in a single batch improves efficiency, excessively large batches can cause significant memory issues. To mitigate this, the pull request splits the mega-batch into several smaller batches and flushes them independently during indexing. However, this introduces a potential inconsistency: some index entries may be flushed while the indexing marker is not, and an unclean shutdown may leave the database in a partially updated state. This can corrupt index data. To address this, head truncation is introduced. After a restart, any excess index entries beyond the expected indexing marker are removed, ensuring the index remains consistent after an unclean shutdown.	2025-12-30 16:05:13 +01:00
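The head truncation described in commit b3e7d9ee44 can be illustrated on a plain sorted slice of state IDs. This is a simplified model, not the pathdb implementation: the function name `truncateHead` and the in-memory representation of the index are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// truncateHead drops every state ID greater than marker from an ascending
// list of indexed IDs. Entries above the marker were flushed by a batch whose
// marker update never reached disk, so they cannot be trusted.
func truncateHead(ids []uint64, marker uint64) []uint64 {
	n := sort.Search(len(ids), func(i int) bool { return ids[i] > marker })
	return ids[:n]
}

func main() {
	// The marker says histories up to ID 3 are indexed, but entries for
	// 7 and 9 were flushed before an unclean shutdown.
	ids := []uint64{1, 2, 3, 7, 9}
	fmt.Println(truncateHead(ids, 3)) // [1 2 3]
}
```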
rjl493456442	bf141fbfb1	core, eth: add lock protection in snap sync (#33428) Fixes #33396, #33397, #33398	2025-12-19 09:36:48 +01:00
Delweng	1b702f71d9	triedb/pathdb: use copy instead of append to reduce memory alloc (#33044)	2025-12-11 09:37:16 +08:00
Forostovec	6f2cbb7a27	triedb/pathdb: allow single-element history ranges (#33329)	2025-12-01 10:19:21 +08:00
rjl493456442	960c87a944	triedb/pathdb: implement iterator of history index (#32981) This change introduces an iterator for the history index in the pathdb. It provides sequential access to historical entries, enabling efficient scanning and future features built on top of historical state traversal.	2025-11-26 16:07:16 +08:00
Guillaume Ballet	2a2f106a01	cmd/evm/internal/t8ntool, trie: support for verkle-at-genesis, use UBT, and move the transition tree to its own package (#32445) This is broken off of #31730 to only focus on testing networks that start with verkle at genesis. The PR has seen a lot of work since its creation, and it now targets creating and re-executing tests for a binary tree testnet without the transition (so it starts at genesis). The transition tree has been moved to its own package. It also replaces verkle with the binary tree for this specific application. --------- Co-authored-by: Gary Rong <garyrong0905@gmail.com>	2025-11-14 15:25:30 +01:00
Forostovec	eb8f32588b	triedb/pathdb: fix ID assignment in history inspection (#33103)	2025-11-13 14:51:41 +08:00
Delweng	d2a5dba48f	triedb/pathdb: fix 32-bit integer overflow in history trienode decoder (#33098) Failed on 32-bit: ``` --- FAIL: TestDecodeSingleCorruptedData (0.00s) panic: runtime error: slice bounds out of range [:-1501805520] [recovered, repanicked] goroutine 38872 [running]: testing.tRunner.func1.2({0x838db20, 0xa355620}) /opt/actions-runner/_work/_tool/go/1.25.3/x64/src/testing/testing.go:1872 +0x29b testing.tRunner.func1() /opt/actions-runner/_work/_tool/go/1.25.3/x64/src/testing/testing.go:1875 +0x414 panic({0x838db20, 0xa355620}) /opt/actions-runner/_work/_tool/go/1.25.3/x64/src/runtime/panic.go:783 +0x103 github.com/ethereum/go-ethereum/triedb/pathdb.decodeSingle({0x9e57500, 0x1432, 0x1432}, 0x0) /opt/actions-runner/_work/go-ethereum/go-ethereum/triedb/pathdb/history_trienode.go:399 +0x18d6 github.com/ethereum/go-ethereum/triedb/pathdb.TestDecodeSingleCorruptedData(0xa2db9e8) /opt/actions-runner/_work/go-ethereum/go-ethereum/triedb/pathdb/history_trienode_test.go:698 +0x180 testing.tRunner(0xa2db9e8, 0x83c86e8) /opt/actions-runner/_work/_tool/go/1.25.3/x64/src/testing/testing.go:1934 +0x114 created by testing.(*T).Run in goroutine 1 /opt/actions-runner/_work/_tool/go/1.25.3/x64/src/testing/testing.go:1997 +0x4b4 FAIL github.com/ethereum/go-ethereum/triedb/pathdb 41.453s ? github.com/ethereum/go-ethereum/version [no test files] FAIL ``` Found in https://github.com/ethereum/go-ethereum/actions/runs/18912701345/job/53990136071?pr=33052	2025-11-07 23:06:15 +01:00
rjl493456442	cfa3b96103	core/rawdb, triedb/pathdb: re-structure the trienode history header (#32907) In this PR, several changes have been made: (a) Restructure the trienode history header section. Previously, the offsets of the key and value sections were recorded before encoding data into these sections. As a result, these offsets referred to the start position of each chunk rather than the end position. This caused an issue where the end position of the last chunk was unknown, making it incompatible with the freezer partial-read APIs. With this update, all offsets now refer to the end position, and the start position of the first chunk is always 0. (b) Enable partial freezer read for trienode data retrieval. The partial freezer read feature is now utilized in trienode data retrieval, improving efficiency.	2025-10-25 16:16:16 +08:00
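The end-offset convention from commit cfa3b96103 means that every chunk's byte range is recoverable from the offset list alone, including the last chunk. A minimal sketch, assuming the offsets are held in a `[]uint32` and using a hypothetical helper name `chunkRange`:

```go
package main

import (
	"errors"
	"fmt"
)

// chunkRange returns the half-open byte range [start, end) of chunk i when
// ends[i] stores the END position of chunk i. The first chunk starts at 0 and
// every other chunk starts where its predecessor ends.
func chunkRange(ends []uint32, i int) (start, end uint32, err error) {
	if i < 0 || i >= len(ends) {
		return 0, 0, errors.New("chunk index out of range")
	}
	if i > 0 {
		start = ends[i-1]
	}
	return start, ends[i], nil
}

func main() {
	ends := []uint32{10, 25, 40}
	for i := range ends {
		s, e, _ := chunkRange(ends, i)
		fmt.Printf("chunk %d: [%d, %d)\n", i, s, e)
	}
}
```

With start offsets instead, the range of the last chunk would need the total section length from somewhere else, which is the incompatibility with partial reads that the commit message describes.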
rjl493456442	0a8b820725	triedb/pathdb: make batch with pre-allocated size (#32914) In this PR, the database batch for writing the history index data is pre-allocated. It's observed that database batch repeatedly grows the size of the mega-batch, causing significant memory allocation pressure. This approach can effectively mitigate the overhead.	2025-10-21 13:11:36 +02:00
hero5512	11c0fb98af	triedb/pathdb: fix index out of range panic in decodeSingle (#32937) Fixes TestCorruptedKeySection flaky test failure. https://github.com/ethereum/go-ethereum/actions/runs/18600235182/job/53037084761?pr=32920	2025-10-20 10:29:46 +08:00
Guillaume Ballet	52c484de86	triedb/pathdb: catch int conversion overflow in 32-bit (#32899) The limit check for `MaxUint32` is done after the cast to `int`. On 64-bit machines, that will work without a problem. On 32-bit machines, that will always fail. The compiler catches it and refuses to build. Note that this only fixes the compiler build. ~~If the limit is above `MaxInt32` but strictly below `MaxUint32` then this will fail at runtime and we have another issue.~~ I checked and this should not happen during regular execution, although it might happen in tests.	2025-10-14 09:23:05 +08:00
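The pattern behind commit 52c484de86 can be sketched as follows: compare against the limit while the value is still unsigned, and only then convert to `int`. The helper name `checkedLen` is hypothetical and the code is not taken from pathdb; it only illustrates why an expression such as `int(n) > math.MaxUint32` cannot build on a 32-bit target, where the constant overflows `int`.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// checkedLen converts n to int after validating it in the unsigned domain.
// Both comparisons are done on uint64, so the code builds on 32-bit and
// 64-bit targets alike.
func checkedLen(n uint64) (int, error) {
	if n > math.MaxUint32 {
		return 0, errors.New("value exceeds MaxUint32")
	}
	if n > uint64(math.MaxInt) {
		// Only reachable where int is 32 bits wide.
		return 0, errors.New("value does not fit in int on this platform")
	}
	return int(n), nil
}

func main() {
	fmt.Println(checkedLen(5))
	_, err := checkedLen(1 << 33)
	fmt.Println(err)
}
```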
Delweng	a7359ceb69	triedb, core/rawdb: implement the partial read in freezer (#32132) This PR implements partial read functionality in the freezer, optimizing the state history reader by resolving less data from the freezer. --------- Signed-off-by: jsvisa <delweng@gmail.com> Co-authored-by: Gary Rong <garyrong0905@gmail.com>	2025-10-13 19:40:03 +08:00
rjl493456442	de24450dbf	core/rawdb, triedb/pathdb: introduce trienode history (#32596) This pull request is based on #32523 and implements the structure of trienode history.	2025-10-10 14:51:27 +08:00
rjl493456442	ada2db4304	triedb/pathdb: move head truncation log (#32649) Print the `Truncating from head` log only if head truncation is needed.	2025-09-22 14:45:15 +08:00
rjl493456442	21769f3474	triedb/pathdb: generalize the history indexer (#32523) This pull request is based on #32306 and is the second part of shipping trienode history. Specifically, it generalizes the existing index mechanism, making it usable by both state history and trienode history in the near future.	2025-09-17 15:57:16 +02:00
rjl493456442	ca6e2d141b	triedb/pathdb: sync ancient store before journal (#32557) This pull request addresses a corrupted path database with the log indicating: `history head truncation out of range, tail: 122557, head: 212208, target: 212557` This is a rare edge case where the in-memory layers, including the write buffer in the disk layer, are fully persisted (e.g., written to file), but the state history freezer is not properly closed (e.g., Geth is terminated after journaling but before freezer.Close). In this situation, the recent state history writes will be truncated on the next startup, while the in-memory layers resolve correctly. As a result, the state history falls behind the disk layer (including the write buffer). In this pull request, the state history freezer is always synced before journaling, ensuring the state history writes are always persisted before the others. Edit: It's confirmed that the devops team has a 10s container termination setting. This explains why Geth did not finish the entire termination and the state history was left unclosed. https://github.com/ethpandaops/fusaka-devnets/pull/63/files	2025-09-09 14:39:54 +02:00
rjl493456442	bc4ee71a5d	triedb/pathdb: add recovery mechanism in state indexer (#32447) Alternative to #32335, enhancing the history indexer's recovery after an unclean shutdown.	2025-09-08 16:07:00 +08:00
Delweng	c4ec4504bb	core/state: state size tracking (#32362) Add state size tracking and a retrieval API. Start geth with `--state.size-tracking`; an initial bootstrap is required (around 1h on mainnet). After the bootstrap, use the `debug_stateSize()` RPC to retrieve the state size: ``` > debug.stateSize() { accountBytes: "0x39681967b", accountTrienodeBytes: "0xc57939f0c", accountTrienodes: "0x198b36ac", accounts: "0x129da14a", blockNumber: "0x1635e90", contractCodeBytes: "0x2b63ef481", contractCodes: "0x1c7b45", stateRoot: "0x9c36a3ec3745d72eea8700bd27b90dcaa66de0494b187c5600750044151e620a", storageBytes: "0x18a6e7d3f1", storageTrienodeBytes: "0x2e7f53fae6", storageTrienodes: "0x6e49a234", storages: "0x517859c5" } ``` --------- Signed-off-by: jsvisa <delweng@gmail.com> Co-authored-by: Gary Rong <garyrong0905@gmail.com>	2025-09-08 14:00:23 +08:00
rjl493456442	902ec5baae	cmd, core, eth, triedb/pathdb: track node origins in the path database (#32418) This PR is the first step in the trienode history series. It introduces the `nodeWithOrigin` struct in the path database, which tracks the original values of dirty nodes to support trienode history construction. Note that the original value is always empty in this PR, so it won't break encoding and decoding of the existing journal. Journal compatibility should be handled in the following PR.	2025-09-05 10:37:05 +08:00
Mars	0e69530c6e	all: improve ETA calculation across all progress indicators (#32521) ### Summary Fixes long-standing ETA calculation errors in progress indicators that have been present since February 2021. The current implementation produces increasingly inaccurate estimates due to integer division precision loss. ### Problem `3aeccadd04/triedb/pathdb/history_indexer.go (L541-L553)` The ETA calculation has two critical issues: 1. Integer division precision loss: `speed` is calculated as `uint64`. 2. Off-by-one: `speed` uses `+ 1` (twice) to avoid division by zero, but this introduces an error in the final calculation. This results in wildly inaccurate time estimates that don't improve as progress continues. ### Example Current output during state history indexing: ``` lvl=info msg="Indexing state history" processed=16858580 left=41802252 elapsed=18h22m59.848s eta=11h36m42.252s ``` Expected calculation: - Speed: 16858580 ÷ 66179848ms = 0.255 blocks/ms - ETA: 41802252 ÷ 0.255 = ~45.6 hours Current buggy calculation: - Speed: rounds to 1 block/ms - ETA: 41802252 ÷ 1 = ~11.6 hours ❌ ### Solution - Created a centralized `CalculateETA()` function in the common package - Replaced all 8 duplicate code copies across the codebase ### Testing Verified accurate ETA calculations during archive node reindexing with significantly improved time estimates.	2025-09-01 13:47:02 +08:00
rjl493456442	7f78fa6912	triedb/pathdb, core: keep root->id mappings after truncation (#32502) This pull request preserves the root->ID mappings in the path database even after the associated state histories are truncated, regardless of whether the truncation occurs at the head or the tail. The motivation is to support an additional history type, trienode history. Since the root->ID mappings are shared between two history instances, they must not be removed by either one. As a consequence, the root->ID mappings remain in the database even after the corresponding histories are pruned. While these mappings may become dangling, it is safe and cheap to keep them. Additionally, this pull request enhances validation during historical reader construction, ensuring that only canonical historical state will be served.	2025-08-29 15:43:58 +08:00
Zach Brown	2a795c14f4	all: fix problematic function name in comment (#32513) Fix problematic function name in comment. Do my best to correct them all with a script to avoid spamming PRs.	2025-08-29 08:54:23 +08:00
rjl493456442	95ab643bb8	triedb/pathdb: refactor state history write (#32497) This pull request refactors the internal implementation of the path database a bit, specifically: - purge the state index data in batch - simplify the logic of state history construction and indexing, making it more readable	2025-08-26 21:53:55 +08:00
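
The end-offset layout described in cfa3b96103 can be illustrated with a minimal sketch. It is not the actual pathdb header format: `chunkBounds`, `ends`, and the sample data are hypothetical names chosen for illustration. The point it shows is that when every recorded offset is an end position and the first chunk starts at 0, the byte range of any chunk, including the last one, is known without reading the rest of the data.

```go
package main

import "fmt"

// chunkBounds is a hypothetical helper (not part of go-ethereum).
// Given the end offset of every chunk, it returns the half-open byte
// range [start, end) of chunk i. The first chunk always starts at 0;
// every other chunk starts where the previous one ended.
func chunkBounds(ends []uint32, i int) (start, end uint32) {
	if i > 0 {
		start = ends[i-1]
	}
	return start, ends[i]
}

func main() {
	// Three chunks "aa", "bbb", "c" stored back to back (assumed sample data).
	data := []byte("aabbbc")
	ends := []uint32{2, 5, 6}

	for i := range ends {
		s, e := chunkBounds(ends, i)
		// Only data[s:e] is needed, which is what a partial read would fetch.
		fmt.Printf("chunk %d: [%d, %d) %q\n", i, s, e, data[s:e])
	}
}
```

With start offsets instead, the range of the last chunk would require knowing the total section length separately, which is the incompatibility the commit message mentions.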


119 commits