This change introduces an iterator for the history index in the pathdb.
It provides sequential access to historical entries, enabling efficient
scanning and future features built on top of historical state traversal.
This is broken off of #31730 to only focus on testing networks that
start with verkle at genesis.
The PR has seen a lot of work since its creation, and it now targets
creating and re-executing tests for a binary tree testnet without the
transition (so it starts at genesis). The transition tree has been moved
to its own package. It also replaces verkle with the binary tree for
this specific application.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
In this PR, several changes have been made:
(a) restructure the trienode history header section
Previously, the offsets of the key and value sections were recorded before
encoding data into these sections. As a result, these offsets referred to the
start position of each chunk rather than the end position.
This caused an issue where the end position of the last chunk was
unknown, making it incompatible with the freezer partial-read APIs.
With this update, all offsets now refer to the end position, and the
start position of the first chunk is always 0.
(b) Enable partial freezer read for trienode data retrieval
The partial freezer read feature is now utilized in trienode data
retrieval, improving efficiency.
In this PR, the database batch for writing the history index data is
pre-allocated.
It's observed that database batch repeatedly grows the size of the
mega-batch,
causing significant memory allocation pressure. This approach can
effectively
mitigate the overhead.
The limit check for `MaxUint32` is done after the cast to `int`. On 64
bits machines, that will work without a problem. On 32 bits machines,
that will always fail. The compiler catches it and refuses to build.
Note that this only fixes the compiler build. ~~If the limit is above
`MaxInt32` but strictly below `MaxUint32` then this will fail at runtime
and we have another issue.~~ I checked and this should not happen during
regular execution, although it might happen in tests.
This PR implements the partial read functionalities in the freezer, optimizing
the state history reader by resolving less data from freezer.
---------
Signed-off-by: jsvisa <delweng@gmail.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This pull request is based on #32306 , is the second part for shipping
trienode history.
Specifically, this pull request generalize the existing index mechanism,
making is usable
by both state history and trienode history in the near future.
This pull request addresses the corrupted path database with log
indicating:
`history head truncation out of range, tail: 122557, head: 212208,
target: 212557`
This is a rare edge case where the in-memory layers, including the write
buffer
in the disk layer, are fully persisted (e.g., written to file), but the
state history
freezer is not properly closed (e.g., Geth is terminated after
journaling but
before freezer.Close). In this situation, the recent state history
writes will be
truncated on the next startup, while the in-memory layers resolve
correctly.
As a result, the state history falls behind the disk layer (including
the write buffer).
In this pull request, the state history freezer is always synced before
journal,
ensuring the state history writes are always persisted before the
others.
Edit:
It's confirmed that devops team has 10s container termination setting.
It
explains why Geth didn't finish the entire termination without state
history
being closed.
https://github.com/ethpandaops/fusaka-devnets/pull/63/files
Add state size tracking and retrieve api, start geth with `--state.size-tracking`,
the initial bootstrap is required (around 1h on mainnet), after the bootstrap,
use `debug_stateSize()` RPC to retrieve the state size:
```
> debug.stateSize()
{
accountBytes: "0x39681967b",
accountTrienodeBytes: "0xc57939f0c",
accountTrienodes: "0x198b36ac",
accounts: "0x129da14a",
blockNumber: "0x1635e90",
contractCodeBytes: "0x2b63ef481",
contractCodes: "0x1c7b45",
stateRoot: "0x9c36a3ec3745d72eea8700bd27b90dcaa66de0494b187c5600750044151e620a",
storageBytes: "0x18a6e7d3f1",
storageTrienodeBytes: "0x2e7f53fae6",
storageTrienodes: "0x6e49a234",
storages: "0x517859c5"
}
```
---------
Signed-off-by: jsvisa <delweng@gmail.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This PR is the first step in the trienode history series.
It introduces the `nodeWithOrigin` struct in the path database, which tracks
the original values of dirty nodes to support trienode history construction.
Note, the original value is always empty in this PR, so it won't break the
existing journal for encoding and decoding. The compatibility of journal
should be handled in the following PR.
### Summary
Fixes long-standing ETA calculation errors in progress indicators that
have been present since February 2021. The current implementation
produces increasingly inaccurate estimates due to integer division
precision loss.
### Problem
3aeccadd04/triedb/pathdb/history_indexer.go (L541-L553)
The ETA calculation has two critical issues:
1. **Integer division precision loss**: `speed` is calculated as
`uint64`
2. **Off-by-one**: `speed` uses `+ 1`(2 times) to avoid division by
zero, however it makes mistake in the final calculation
This results in wildly inaccurate time estimates that don't improve as
progress continues.
### Example
Current output during state history indexing:
```
lvl=info msg="Indexing state history" processed=16858580 left=41802252 elapsed=18h22m59.848s eta=11h36m42.252s
```
**Expected calculation:**
- Speed: 16858580 ÷ 66179848ms = 0.255 blocks/ms
- ETA: 41802252 ÷ 0.255 = ~45.6 hours
**Current buggy calculation:**
- Speed: rounds to 1 block/ms
- ETA: 41802252 ÷ 1 = ~11.6 hours ❌
### Solution
- Created centralized `CalculateETA()` function in common package
- Replaced all 8 duplicate code copies across the codebase
### Testing
Verified accurate ETA calculations during archive node reindexing with
significantly improved time estimates.
This pull request preserves the root->ID mappings in the path database
even after the associated state histories are truncated, regardless of
whether the truncation occurs at the head or the tail.
The motivation is to support an additional history type, trienode history.
Since the root->ID mappings are shared between two history instances,
they must not be removed by either one.
As a consequence, the root->ID mappings remain in the database even
after the corresponding histories are pruned. While these mappings may
become dangling, it is safe and cheap to keep them.
Additionally, this pull request enhances validation during historical
reader construction, ensuring that only canonical historical state will be
served.
This pull request refactors the internal implementation in path database
a bit, specifically:
- purge the state index data in batch
- simplify the logic of state history construction and index, make it more readable
This is a internal refactoring PR, renaming the history to stateHistory.
It's a pre-requisite PR for merging trienode history, avoid the name
conflict.
Seems the `signal.result` was not sent back in shorten case, this will
cause a deadlock.
---------
Signed-off-by: jsvisa <delweng@gmail.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
`binary.AppendUvarint` offers better performance than using append
directly, because it avoids unnecessary memory allocation and copying.
In our case, it can increase the performance by +35.8% for the
`blockWriter.append` function:
```
benchmark old ns/op new ns/op delta
BenchmarkBlockWriterAppend-8 5.97 3.83 -35.80%
```
---------
Signed-off-by: jsvisa <delweng@gmail.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
The implementation of `parseIndexBlock` used a reverse loop with slice
appends to build the restart points, which was less cache-friendly and
involved unnecessary allocations and operations. In this PR we change
the implementation to read and validate the restart points in one single
forward loop.
Here is the benchmark test:
```bash
go test -benchmem -bench=BenchmarkParseIndexBlock ./triedb/pathdb/
```
The result as below:
```
benchmark old ns/op new ns/op delta
BenchmarkParseIndexBlock-8 52.9 37.5 -29.05%
```
about 29% improvements
---------
Signed-off-by: jsvisa <delweng@gmail.com>
Introduce file-based state journal in path database, fixing
the Pebble restriction when the journal size exceeds 4GB.
---------
Signed-off-by: jsvisa <delweng@gmail.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Fix the issue after initial snap sync with `gcmode=archive` enabled.
```
NewPayload: inserting block failed error="history indexing is out of order, last: null, requested: 1"
```
---------
Signed-off-by: Delweng <delweng@gmail.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This pull request tracks the state indexing progress in eth_syncing
RPC response, i.e. we will return non-null syncing status until indexing
has finished.
This pull request fixes a flaw in the PBSS state iterator, which
could return empty account or storage data.
In PBSS, multiple in-memory diff layers and a write buffer are
maintained. These layers are persisted to the database and reloaded after
node restarts. However, since the state data is encoded using RLP, the
distinction between nil and an empty byte slice is lost during the encode/decode
process. As a result, invalid state values such as `[]byte{}` can appear in PBSS
and ultimately be returned by the state iterator.
Checkout
https://github.com/ethereum/go-ethereum/blob/master/triedb/pathdb/iterator_fast.go#L270
for more iterator details.
It's a long-term existent issue and now be activated since the snapshot
integration.
The error `err="range contains deletion"` will occur when Geth tries to
serve other
peers with SNAP protocol request.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This pull request fixes a flaw in PBSS archive mode that significantly
degrades performance when the mode is enabled.
Originally, in hash mode, the dirty trie cache is completely disabled
when archive mode is active, in order to disable the in-memory garbage
collection mechanism. However, the internal logic in path mode differs
significantly, and the dirty trie node cache is essential for maintaining
chain insertion performance. Therefore, the cache is now retained in
path mode.
Previously, PathDB used a single buffer to aggregate database writes,
which needed to be flushed atomically. However, flushing large amounts
of data (e.g., 256MB) caused significant overhead, often blocking the
system for around 3 seconds during the flush.
To mitigate this overhead and reduce performance spikes, a double-buffer
mechanism is introduced. When the active buffer fills up, it is marked
as frozen and a background flushing process is triggered. Meanwhile, a
new buffer is allocated for incoming writes, allowing operations to
continue uninterrupted.
This approach reduces system blocking times and provides flexibility in
adjusting buffer parameters for improved performance.
As https://github.com/ethereum/go-ethereum/pull/31769 defined a global
hash pool, so we can reuse it, and also remove the unnecessary
KeccakState buffering
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
As the preimage will only be stored if `t.preimages != nil`, so no need
to save them into local cache if not enabled. This will reduce the memory
wasted to copy the bytes
---------
Signed-off-by: jsvisa <delweng@gmail.com>
This implements a backing store for chain history based on era1 files.
The new store is integrated with the freezer. Queries for blocks and receipts
below the current freezer tail are handled by the era store.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
Co-authored-by: lightclient <lightclient@protonmail.com>
This pull request introduces a mechanism to improve state lookup
efficiency in pathdb by maintaining a lookup structure that eliminates
unnecessary iteration over diff layers.
The core idea is to track a mutation history for each dirty state entry
residing in the diff layers. This history records the state roots of all layers
in which the entry was modified, sorted from oldest to newest.
During state lookup, this mutation history is queried to find the most
recent layer whose state root either matches the target root or is a
descendant of it. This allows us to quickly identify the layer containing
the relevant data, avoiding the need to iterate through all diff layers from
top to bottom.
Besides, the overhead for state lookup is constant, no matter how many
diff layers are retained in the pathdb, which unlocks the potential to hold
more diff layers.
Of course, maintaining this lookup structure introduces some overhead.
For each state transition, we need to:
(a) update the mutation records for the modified state entries, and
(b) remove stale mutation records associated with outdated layers.
On our benchmark machine, it will introduce around 1ms overhead which is
acceptable.
In this pull request, snapshot generation in pathdb has been ported from
the legacy state snapshot implementation. Additionally, when running in
path mode, legacy state snapshot data is now managed by the pathdb
based snapshot logic.
Note: Existing snapshot data will be re-generated, regardless of whether
it was previously fully constructed.
This PR creates a global hasher pool that can be used by all packages.
It also removes a bunch of the package local pools.
It also updates a few locations to use available hashers or the global
hashing pool to reduce allocations all over the codebase.
This change should reduce global allocation count by ~1%
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This pull request enhances the block prefetcher by executing transactions
in parallel to warm the cache alongside the main block processor.
Unlike the original prefetcher, which only executes the next block and
is limited to chain syncing, the new implementation can be applied to any
block. This makes it useful not only during chain sync but also for regular
block insertion after the initial sync.
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
This PR fixes an issue that could lead to data corruption.
Writing the state history may fail due to insufficient disk space or
other potential errors. With this change, the entire state insertion
will be aborted instead of silently ignoring the error.
Without this fix, state transitions would continue while the associated
state history is lost. After a restart, the resulting gap would be detected,
making recovery impossible.
This pull request introduces a SyncKeyValue function to the
ethdb.KeyValueStore
interface, providing the ability to forcibly flush all previous writes
to disk.
This functionality is critical for go-ethereum, which internally uses
two independent
database engines: a key-value store (such as Pebble, LevelDB, or
memoryDB for
testing) and a flat-file–based freezer. To ensure write-order
consistency between
these engines, the key-value store must be explicitly synced before
writing to the
freezer and vice versa.
Fixes
- https://github.com/ethereum/go-ethereum/issues/31405
- https://github.com/ethereum/go-ethereum/issues/29819
This PR refactors the `nodeSet` structure in the path database to use
separate maps for account and storage trie nodes, resulting in
performance improvements. The change maintains the same API while
optimizing the internal data structure.