Wires the pieces from Commits 1-9 into a running system:
* triedb/pathdb.New: install the bintrieFlatCodec when isVerkle is set,
backed by the same verkle-namespaced db used for trie nodes.
* triedb/pathdb.database.go: drop isVerkle from the noBuild guard so the
bintrie generator (Commit 9) runs on startup, and remove it from the
generateSnapshot call path for the same reason.
* triedb/pathdb.disklayer.revert: hard-fail on bintrie because the
reorg path would replay merkle-shaped origin records against a
per-stem layout. Tracked in BINTRIE_FLAT_STATE_REORG_GAP.md.
* triedb/pathdb.journal: add IsBintrie to journalGenerator (rlp:"optional"
so v3 journals still decode) and make journalProgress a method on
generator so it stamps the active scheme; loadGenerator discards any
journal whose scheme does not match the database, forcing a fresh
regeneration.
* triedb/pathdb.reader: export RawStateReader, a small extension of
database.StateReader that exposes AccountRLP so callers outside the
package can reach the raw flat-state bytes without going through the
slim-RLP decode path that assumes merkle shape.
* core/state.reader: add bintrieFlatReader, the bintrie equivalent of
flatReader. It derives the EIP-7864 stem keys from (addr, slot),
performs two AccountRLP lookups per Account call (BasicData +
CodeHash), and decodes via bintrie.UnpackBasicData. Storage reads go
through a single AccountRLP lookup at the slot's full bintrie key.
* core/state.database.StateReader: dispatch to bintrieFlatReader when
the path database is in verkle mode; merkle path unchanged.
Depends on the lookup sentinel fix in the previous commit; without it
missing-account reads on bintrie misreport as "layer stale".
Route the flatStateCodec from Database through every flat-state call
site so that the trie-specific aspects of persistence and key derivation
live behind a single abstraction. Pure refactor: merkle behavior and
on-disk layout are unchanged because the only codec wired up is
merkleFlatCodec, whose methods are thin wrappers over the existing
rawdb accessors.
Threaded sites:
disklayer.account/storage use codec.{Read,AccountCacheKey,
StorageCacheKey} instead of direct
rawdb calls and bare hash slicing.
flush.writeStates takes a codec parameter; persistence
goes through codec.{Write,Delete}
{Account,Storage}.
buffer.flush carries the codec down into writeStates.
states.write/dbsize takes the codec for prefix-size
accounting.
generate.go (g.codec) the generator owns a codec, used by
generateAccounts/generateStorages
callbacks; the unused top-level
splitMarker helper is removed in favor
of codec.SplitMarker.
context.go the generator context owns the codec
and uses codec.{AccountPrefix,
StoragePrefix,Account/StorageKeyLength}
to construct iterators.
reader.go (HistoricalState) uses codec.{Account,Storage}Key for
caller-side key derivation.
The marker comparisons in writeStates remain merkle-shaped (two-tier
account+storage marker) because the bintrie path will use a separate
writer over single-tier stem markers in a later commit.
All existing pathdb tests pass.
Fixes https://github.com/ethereum/go-ethereum/issues/33907
Notably there is a behavioral change:
- Previously Geth will refuse to restart if the existing trienode
history is gapped with the state data
- With this PR, the gapped trienode history will be entirely reset and
being constructed from scratch
This pull request preserves the root->ID mappings in the path database
even after the associated state histories are truncated, regardless of
whether the truncation occurs at the head or the tail.
The motivation is to support an additional history type, trienode history.
Since the root->ID mappings are shared between two history instances,
they must not be removed by either one.
As a consequence, the root->ID mappings remain in the database even
after the corresponding histories are pruned. While these mappings may
become dangling, it is safe and cheap to keep them.
Additionally, this pull request enhances validation during historical
reader construction, ensuring that only canonical historical state will be
served.
This is a internal refactoring PR, renaming the history to stateHistory.
It's a pre-requisite PR for merging trienode history, avoid the name
conflict.
This pull request introduces a mechanism to improve state lookup
efficiency in pathdb by maintaining a lookup structure that eliminates
unnecessary iteration over diff layers.
The core idea is to track a mutation history for each dirty state entry
residing in the diff layers. This history records the state roots of all layers
in which the entry was modified, sorted from oldest to newest.
During state lookup, this mutation history is queried to find the most
recent layer whose state root either matches the target root or is a
descendant of it. This allows us to quickly identify the layer containing
the relevant data, avoiding the need to iterate through all diff layers from
top to bottom.
Besides, the overhead for state lookup is constant, no matter how many
diff layers are retained in the pathdb, which unlocks the potential to hold
more diff layers.
Of course, maintaining this lookup structure introduces some overhead.
For each state transition, we need to:
(a) update the mutation records for the modified state entries, and
(b) remove stale mutation records associated with outdated layers.
On our benchmark machine, it will introduce around 1ms overhead which is
acceptable.
In this pull request, the state iterator is implemented. It's mostly a copy-paste
from the original state snapshot package, but still has some important changes
to highlight here:
(a) The iterator for the disk layer consists of a diff iterator and a disk iterator.
Originally, the disk layer in the state snapshot was a wrapper around the disk,
and its corresponding iterator was also a wrapper around the disk iterator.
However, due to structural differences, the disk layer iterator is divided into
two parts:
- The disk iterator, which traverses the content stored on disk.
- The diff iterator, which traverses the aggregated state buffer.
Checkout `BinaryIterator` and `FastIterator` for more details.
(b) The staleness management is improved in the diffAccountIterator and
diffStorageIterator
Originally, in the `diffAccountIterator`, the layer’s staleness had to be checked
within the Next function to ensure the iterator remained usable. Additionally,
a read lock on the associated diff layer was required to first retrieve the account
blob. This read lock protection is essential to prevent concurrent map read/write.
Afterward, a staleness check was performed to ensure the retrieved data was
not outdated.
The entire logic can be simplified as follows: a loadAccount callback is provided
to retrieve account data. If the corresponding state is immutable (e.g., diff layers
in the path database), the staleness check can be skipped, and a single account
data retrieval is sufficient. However, if the corresponding state is mutable (e.g.,
the disk layer in the path database), the callback can operate as follows:
```go
func(hash common.Hash) ([]byte, error) {
dl.lock.RLock()
defer dl.lock.RUnlock()
if dl.stale {
return nil, errSnapshotStale
}
return dl.buffer.states.mustAccount(hash)
}
```
The callback solution can eliminate the complexity for managing
concurrency with the read lock for atomic operation.
This pull request ports some changes from the main state snapshot
integration one, specifically introducing the flat state tracking in
pathdb.
Note, the tracked flat state changes are only held in memory and won't
be persisted in the disk. Meanwhile, the correspoding state retrieval in
persistent state is also not supported yet. The states management in
disk is more complicated and will be implemented in a separate pull
request.
Part 1: https://github.com/ethereum/go-ethereum/pull/30752