Commit graph

24 commits

Author SHA1 Message Date
CPerezz
fcc0587ec3
core/state,triedb/pathdb: fix bintrieFlatReader disk-layer shape via per-offset extraction
Addresses review finding C2 (+ I5, S5, T2, T3, T12).

Before this commit, bintrieFlatCodec.ReadAccount returned the FULL
variable-length stem blob from disk while the in-memory diff-layer
buffer stored per-offset 32-byte values. The consumer,
bintrieFlatReader.Account, enforced len(basicBlob)!=32 → error, so
every disk-layer hit produced "bintrie BasicData leaf invalid length"
in production the moment the write buffer flushed. TestBintrieFlatReaderEndToEnd
did not catch this because it never forced a buffer → disk flush.

Fix: make bintrieFlatCodec.ReadAccount extract the offset from the
stem blob (mirroring ReadStorage), so the disk path and the buffer
path return the same 32-byte per-offset shape. Update
AccountCacheKey/StorageCacheKey to embed the full 32-byte key
(prefix + 31-byte stem + 1-byte offset), since caching under a
stem-only key would collapse BasicData and CodeHash into the same
slot and return the wrong value on the second hit. Update
Flush's cache-update loop to store per-offset entries from the
aggregated write set.

Design note: I considered the alternative of introducing a new
StemBlob(stem) interface method that returns the full blob synthesized
from a stem-level lookup index. Rejected because (a) the index is a
new data structure with its own consistency invariants, (b) the
per-offset approach is strictly local to the codec + reader, and (c)
the "1 Pebble read per Account" locality benefit is preserved at the
OS page cache level — both offsets at the same stem live in the same
Pebble block, so the second read is effectively free.

bintrieFlatReader.Account still does two AccountRLP lookups; the
torn-read hazard is gated by a new load-bearing invariant test,
TestBinaryHasherWritesBothBasicAndCodeHash, which asserts that
binaryHasher.updateAccount always emits both BasicData and CodeHash
leaves together. A future code-only update that broke this invariant
would fail the test.

Tests added:
  * TestBintrieFlatReaderEndToEndAfterFlush — explicitly flushes via
    tdb.Commit(root, false) and re-reads through a fresh StateReader.
    This is the smoking-gun regression for C2.
  * TestBintrieFlatReaderMultipleOffsetsPerStem — multiple offsets at
    the same stem (BasicData, CodeHash, header storage slots) all
    round-trip post-flush.
  * TestBintrieCodecCrossFlushRMW — two Flush calls to the same stem
    from different "blocks" correctly merge on disk, with prior
    offsets preserved.
  * TestBinaryHasherWritesBothBasicAndCodeHash — locks down the hasher
    co-write invariant that bintrieFlatReader.Account relies on.

Existing tests updated to match the new per-offset ReadAccount
semantics:
  * TestBintrieCodecAccountRoundTrip, TestBintrieCodecMultipleWritesSameStem,
    TestBintrieCodecDeleteAccount — now read per-offset rather than
    calling extractStemOffset on the raw blob.
  * TestBintrieCodecCacheKeysDisjoint — additionally verifies two
    offsets at the same stem produce distinct cache keys.

Error messages in bintrieFlatReader now include address and length
context (S5).
2026-04-15 15:00:40 +02:00
CPerezz
bfb77d98f6
core/state,triedb/pathdb: enable bintrie flat state reads end-to-end
Wires the pieces from Commits 1-9 into a running system:

* triedb/pathdb.New: install the bintrieFlatCodec when isVerkle is set,
  backed by the same verkle-namespaced db used for trie nodes.
* triedb/pathdb.database.go: drop isVerkle from the noBuild guard so the
  bintrie generator (Commit 9) runs on startup, and remove it from the
  generateSnapshot call path for the same reason.
* triedb/pathdb.disklayer.revert: hard-fail on bintrie because the
  reorg path would replay merkle-shaped origin records against a
  per-stem layout. Tracked in BINTRIE_FLAT_STATE_REORG_GAP.md.
* triedb/pathdb.journal: add IsBintrie to journalGenerator (rlp:"optional"
  so v3 journals still decode) and make journalProgress a method on
  generator so it stamps the active scheme; loadGenerator discards any
  journal whose scheme does not match the database, forcing a fresh
  regeneration.
* triedb/pathdb.reader: export RawStateReader, a small extension of
  database.StateReader that exposes AccountRLP so callers outside the
  package can reach the raw flat-state bytes without going through the
  slim-RLP decode path that assumes merkle shape.
* core/state.reader: add bintrieFlatReader, the bintrie equivalent of
  flatReader. It derives the EIP-7864 stem keys from (addr, slot),
  performs two AccountRLP lookups per Account call (BasicData +
  CodeHash), and decodes via bintrie.UnpackBasicData. Storage reads go
  through a single AccountRLP lookup at the slot's full bintrie key.
* core/state.database.StateReader: dispatch to bintrieFlatReader when
  the path database is in verkle mode; merkle path unchanged.

Depends on the lookup sentinel fix in the previous commit; without it
missing-account reads on bintrie misreport as "layer stale".
2026-04-15 15:00:40 +02:00
Gary Rong
d57dca07b1
core/state: integrate witness collector 2026-04-15 15:00:39 +02:00
Gary Rong
1ae462f08d
core/state: build hasher skeleton 2026-04-15 15:00:38 +02:00
Gary Rong
e2c00d6c96
core/state: add hasher interface definition 2026-04-15 14:59:05 +02:00
CPerezz
4b915af2c3
core/state: avoid Bytes() allocation in flatReader hash computations (#34025)
## Summary

Replace `addr.Bytes()` and `key.Bytes()` with `addr[:]` and `key[:]` in
`flatReader`'s `Account` and `Storage` methods. The former allocates a
copy while the latter creates a zero-allocation slice header over the
existing backing array.

## Benchmark (AMD EPYC 48-core, 500K entries, screening
`--benchtime=1x`)

| Metric | Baseline | Slice syntax | Delta |
|--------|----------|--------------|-------|
| Approve (Mgas/s) | 4.13 | 4.22 | +2.2% |
| BalanceOf (Mgas/s) | 168.3 | 190.0 | **+12.9%** |
2026-03-17 11:42:42 +01:00
rjl493456442
91cec92bf3
core, miner, tests: introduce codedb and simplify cachingDB (#33816)
Some checks are pending
/ Linux Build (push) Waiting to run
/ Linux Build (arm) (push) Waiting to run
/ Keeper Build (push) Waiting to run
/ Windows Build (push) Waiting to run
/ Docker Image (push) Waiting to run
2026-03-10 08:29:21 +01:00
rjl493456442
c2595381bf
core: extend the code reader statistics (#33659)
Some checks are pending
/ Windows Build (push) Waiting to run
/ Docker Image (push) Waiting to run
/ Keeper Build (push) Waiting to run
/ Linux Build (push) Waiting to run
/ Linux Build (arm) (push) Waiting to run
This PR extends the statistics of contract code read by adding these
fields:

- **CacheHitBytes**: the total number of bytes served by cache
- **CacheMissBytes**: the total number of bytes read on cache miss
- **CodeReadBytes**: the total number of bytes for contract code read
2026-01-26 11:25:53 +01:00
rjl493456442
9623dcbca2
core/state: add cache statistics of contract code reader (#33532) 2026-01-08 11:48:45 +08:00
Guillaume Ballet
3f641dba87
trie, go.mod: remove all references to go-verkle and go-ipa (#33461)
In order to reduce the amount of code that is embedded into the keeper
binary, I am removing all the verkle code that uses go-verkle and
go-ipa. This will be followed by further PRs that are more like stubs to
replace code when the keeper build is detected.

I'm keeping the binary tree of course. This means that you will still
see `isVerkle` variables all over the codebase, but they will be renamed
when code is touched (i.e. this is not an invitation for 30+ AI slop
PRs).

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
2025-12-30 20:44:04 +08:00
Ng Wei Han
9a346873b8
core/state: fix incorrect contract code state metrics (#33376)
## Description
This PR fixes incorrect contract code state metrics by ensuring
duplicate codes are not counted towards the reported results.

## Rationale
The contract code metrics don't consider database deduplication. The
current implementation assumes that the results are only **slightly
inaccurate**, but this is not true, especially for data collection
efforts that started from the genesis block.
2025-12-10 11:33:59 +08:00
rjl493456442
042c47ce1a
core: log detailed statistics for slow block (#32812)
This PR introduces a new debug feature, logging the slow blocks with
detailed performance statistics, such as state read, EVM execution and
so on.

Notably, the detailed performance statistics of slow blocks won't be
logged during the sync to not overwhelm users. Specifically, the statistics
are only logged if there is a single block processed.

Example output

```
########## SLOW BLOCK #########
Block: 23537063 (0xa7f878611c2dd27f245fc41107d12ebcf06b4e289f1d6acf44d49a169554ee09) txs: 248, mgasps: 202.99

EVM execution: 63.295ms
Validation: 1.130ms
Account read: 6.634ms(648)
Storage read: 17.391ms(1434)
State hash: 6.722ms
DB commit: 3.260ms
Block write: 1.954ms
Total: 99.094ms

State read cache: account (hit: 622, miss: 26), storage (hit: 1325, miss: 109)
##############################
```
2025-12-02 14:43:51 +01:00
Guillaume Ballet
2a2f106a01
cmd/evm/internal/t8ntool, trie: support for verkle-at-genesis, use UBT, and move the transition tree to its own package (#32445)
Some checks are pending
/ Linux Build (push) Waiting to run
/ Linux Build (arm) (push) Waiting to run
/ Keeper Build (push) Waiting to run
/ Windows Build (push) Waiting to run
/ Docker Image (push) Waiting to run
This is broken off of #31730 to only focus on testing networks that
start with verkle at genesis.

The PR has seen a lot of work since its creation, and it now targets
creating and re-executing tests for a binary tree testnet without the
transition (so it starts at genesis). The transition tree has been moved
to its own package. It also replaces verkle with the binary tree for
this specific application.

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
2025-11-14 15:25:30 +01:00
Zach Brown
2a795c14f4
all: fix problematic function name in comment (#32513)
Some checks are pending
/ Linux Build (push) Waiting to run
/ Linux Build (arm) (push) Waiting to run
/ Windows Build (push) Waiting to run
/ Docker Image (push) Waiting to run
Fix problematic function name in comment.
Do my best to correct them all with a script to avoid spamming PRs.
2025-08-29 08:54:23 +08:00
pxwanglu
d0602ba45a
core,trie: fix typo in TransitionTrie (#32491)
Change `NewTransitionTree` to the correct `NewTransitionTrie`.

Signed-off-by: pxwanglu <pxwanglu@icloud.com>
2025-08-25 09:29:58 +02:00
Guillaume Ballet
ea3a71792d
trie, core/state: add the transition tree (verkle transition part 2) (#32366)
This add some of the changes that were missing from #31634. It
introduces the `TransitionTrie`, which is a façade pattern between the
current MPT trie and the overlay tree.

---------

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
2025-08-15 14:34:32 +08:00
Guillaume Ballet
cf50026466
core/state: introduce the TransitionState object (verkle transition part 1) (#31634)
Some checks are pending
/ Linux Build (push) Waiting to run
/ Linux Build (arm) (push) Waiting to run
/ Windows Build (push) Waiting to run
/ Docker Image (push) Waiting to run
This is the first part of #31532 

It maintains a series of conversion maker which are to be updated by the
conversion code (in a follow-up PR, this is a breakdown of a larger PR
to make things easier to review). They can be used in this way:

- During the conversion, by storing the conversion markers when the
block has been processed. This is meant to be written in a function that
isn't currently present, hence [this
TODO](https://github.com/ethereum/go-ethereum/pull/31634/files#diff-89272f61e115723833d498a0acbe59fa2286e3dc7276a676a7f7816f21e248b7R384).

Part of  https://github.com/ethereum/go-ethereum/issues/31583

---------

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
2025-08-05 09:34:12 +08:00
rjl493456442
c7b8924fe4
core/state: expose the state reader stats (#31998)
This pull request introduces a mechanism to expose statistics from the
state reader, specifically related to cache utilization during state prefetching.

To improve state access performance, a pair of state readers is constructed 
with a shared local cache. One reader to execute transactions  ahead of time
to warm up the cache. The other reader is used by the actual chain processing 
logic, which can benefit from the prefetched states.

This PR adds visibility into how effective the cache is by exposing relevant 
usage statistics.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-06-21 12:58:04 +08:00
nthumann
cc1293b8f1
all: reuse the global hash buffer (#31839)
Some checks are pending
/ Linux Build (push) Waiting to run
/ Linux Build (arm) (push) Waiting to run
/ Docker Image (push) Waiting to run
As https://github.com/ethereum/go-ethereum/pull/31769 defined a global
hash pool, so we can reuse it, and also remove the unnecessary
KeccakState buffering

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
2025-06-18 15:29:14 +08:00
Marius van der Wijden
0eb2eeea90
all: create global hasher pool (#31769)
This PR creates a global hasher pool that can be used by all packages.
It also removes a bunch of the package local pools.

It also updates a few locations to use available hashers or the global
hashing pool to reduce allocations all over the codebase.
This change should reduce global allocation count by ~1%

---------

Co-authored-by: Gary Rong <garyrong0905@gmail.com>
2025-05-09 13:52:40 +08:00
rjl493456442
485ff4bbff
core: implement in-block prefetcher (#31557)
This pull request enhances the block prefetcher by executing transactions 
in parallel to warm the cache alongside the main block processor.

Unlike the original prefetcher, which only executes the next block and
is limited to chain syncing, the new implementation can be applied to any 
block. This makes it useful not only during chain sync but also for regular 
block insertion after the initial sync.


---------

Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
2025-05-08 22:28:16 +08:00
rjl493456442
03c37cdb2b
core/state: introduce code reader interface (#30816)
This PR introduces a `ContractCodeReader` interface with functions defined:

type ContractCodeReader interface {
	Code(addr common.Address, codeHash common.Hash) ([]byte, error)
	CodeSize(addr common.Address, codeHash common.Hash) (int, error)
}

This interface can be implemented in various ways. Although the codebase
currently includes only one implementation, additional implementations
could be created for different purposes and scenarios, such as a code
reader designed for the Verkle tree approach or one that reads code from
the witness.

*Notably, this interface modifies the function’s semantics. If the
contract code is not found, no error will be returned. An error should
only be returned in the event of an unexpected issue, primarily for
future implementations.*

The original state.Reader interface is extended with ContractCodeReader
methods, it gives us more flexibility to manipulate the reader with additional
logic on top, e.g. Hooks.

type Reader interface {
	ContractCodeReader
	StateReader
}

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-11-29 15:32:45 +01:00
rjl493456442
74ef47462f
core/state, triedb/database: refactor state reader (#30712)
Co-authored-by: Martin HS <martin@swende.se>
2024-11-09 08:08:06 +08:00
rjl493456442
623b17ba20
core/state: state reader abstraction (#29761)
This pull request introduces a state.Reader interface for state
accessing.

The interface could be implemented in various ways. It can be pure trie
only reader, or the combination of trie and state snapshot. What's more,
this interface allows us to have more flexibility in the future, e.g.
the
archive reader (for accessing archive state).

Additionally, this pull request removes the following metrics

- `chain/snapshot/account/reads`
- `chain/snapshot/storage/reads`
2024-09-05 13:10:47 +03:00