Commit graph

6 commits

Author SHA1 Message Date
rjl493456442
36bcc24fbe
triedb/pathdb: fix journal resolution in pathdb (#32097)
This pull request fixes a flaw in the PBSS state iterator, which
could return empty account or storage data.

In PBSS, multiple in-memory diff layers and a write buffer are
maintained. These layers are persisted to the database and reloaded after
node restarts. However, since the state data is encoded using RLP, the
distinction between nil and an empty byte slice is lost during the encode/decode
process. As a result, invalid state values such as `[]byte{}` can appear in PBSS
and ultimately be returned by the state iterator.


Checkout
https://github.com/ethereum/go-ethereum/blob/master/triedb/pathdb/iterator_fast.go#L270
for more iterator details.

It's a long-term existent issue and now be activated since the snapshot
integration.
The error `err="range contains deletion"` will occur when Geth tries to
serve other
peers with SNAP protocol request.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-06-26 17:19:02 +02:00
rjl493456442
21920207e4
triedb/pathdb, eth: use double-buffer mechanism in pathdb (#30464)
Some checks are pending
/ Linux Build (push) Waiting to run
/ Linux Build (arm) (push) Waiting to run
/ Docker Image (push) Waiting to run
Previously, PathDB used a single buffer to aggregate database writes,
which needed to be flushed atomically. However, flushing large amounts
of data (e.g., 256MB) caused significant overhead, often blocking the
system for around 3 seconds during the flush.

To mitigate this overhead and reduce performance spikes, a double-buffer
mechanism is introduced. When the active buffer fills up, it is marked
as frozen and a background flushing process is triggered. Meanwhile, a
new buffer is allocated for incoming writes, allowing operations to
continue uninterrupted.

This approach reduces system blocking times and provides flexibility in
adjusting buffer parameters for improved performance.
2025-06-22 20:40:54 +08:00
rjl493456442
892a661ee2
core, triedb/pathdb: final integration (snapshot integration pt 5) (#30661)
In this pull request, snapshot generation in pathdb has been ported from 
the legacy state snapshot implementation. Additionally, when running in 
path mode, legacy state snapshot data is now managed by the pathdb
based snapshot logic.

Note: Existing snapshot data will be re-generated, regardless of whether 
it was previously fully constructed.
2025-05-16 18:29:38 +08:00
Shude Li
4ff5093df1
all: use fmt.Appendf instead of fmt.Sprintf where possible (#31301) 2025-03-25 14:53:02 +01:00
rjl493456442
a7f9523ae1
all: implement state history v2 (#30107)
This pull request delivers the new version of the state history, where
the raw storage key is used instead of the hash.

Before the cancun fork, it's supported by protocol to destruct a
specific account and therefore, all the storage slot owned by it should
be wiped in the same transition.

Technically, storage wiping should be performed through storage
iteration, and only the storage key hash will be available for traversal
if the state snapshot is not available. Therefore, the storage key hash
is chosen as the identifier in the old version state history.

Fortunately, account self-destruction has been deprecated by the
protocol since the Cancun fork, and there are no empty accounts eligible
for deletion under EIP-158. Therefore, we can conclude that no storage
wiping should occur after the Cancun fork. In this case, it makes no
sense to keep using hash.

Besides, another big reason for making this change is the current format
state history is unusable if verkle is activated. Verkle tree has a
different key derivation scheme (merkle uses keccak256), the preimage of
key hash must be provided in order to make verkle rollback functional.
This pull request is a prerequisite for landing verkle.

Additionally, the raw storage key is more human-friendly for those who
want to manually check the history, even though Solidity already
performs some hashing to derive the storage location.

---

This pull request doesn't bump the database version, as I believe the
database should still be compatible if users degrade from the new geth
version to old one, the only side effect is the persistent new version
state history will be unusable.

---------

Co-authored-by: Zsolt Felfoldi <zsfelfoldi@gmail.com>
2025-01-17 02:59:02 +01:00
rjl493456442
bc1ec69008
trie/pathdb: state iterator (snapshot integration pt 4) (#30654)
In this pull request, the state iterator is implemented. It's mostly a copy-paste
from the original state snapshot package, but still has some important changes
to highlight here:

(a) The iterator for the disk layer consists of a diff iterator and a disk iterator.

Originally, the disk layer in the state snapshot was a wrapper around the disk, 
and its corresponding iterator was also a wrapper around the disk iterator.
However, due to structural differences, the disk layer iterator is divided into
two parts:

- The disk iterator, which traverses the content stored on disk.
- The diff iterator, which traverses the aggregated state buffer.

Checkout `BinaryIterator` and `FastIterator` for more details.

(b) The staleness management is improved in the diffAccountIterator and
diffStorageIterator

Originally, in the `diffAccountIterator`, the layer’s staleness had to be checked 
within the Next function to ensure the iterator remained usable. Additionally, 
a read lock on the associated diff layer was required to first retrieve the account 
blob. This read lock protection is essential to prevent concurrent map read/write. 
Afterward, a staleness check was performed to ensure the retrieved data was 
not outdated.

The entire logic can be simplified as follows: a loadAccount callback is provided 
to retrieve account data. If the corresponding state is immutable (e.g., diff layers
in the path database), the staleness check can be skipped, and a single account 
data retrieval is sufficient. However, if the corresponding state is mutable (e.g., 
the disk layer in the path database), the callback can operate as follows:

```go
func(hash common.Hash) ([]byte, error) {
    dl.lock.RLock()
    defer dl.lock.RUnlock()

    if dl.stale {
        return nil, errSnapshotStale
    }
    return dl.buffer.states.mustAccount(hash)
}
```

The callback solution can eliminate the complexity for managing
concurrency with the read lock for atomic operation.
2024-12-16 21:10:08 +08:00