Moves the snapshot DTOs into statedb.go directly above SnapshotCounts
and SnapshotReads, so the types and their sole constructors live in the
same file. Avoids a single-purpose state_counts.go.
The previous follow-up note: per-tx + pre-tx + post-tx StateDBs each
have their own stateObjects, so summing CodeLoaded/CodeLoadBytes
over-counts contracts whose code body was fetched by multiple phases.
Fix: snapshot per-StateDB the {address: codeLen} map of contracts whose
s.code is populated, plumb through the existing aggregation pipeline,
dedupe by address in resultHandler/prepareExecResult. The merged map's
size and value-sum become CodeLoaded and CodeLoadBytes respectively,
overriding the per-tx-summed values at the wiring site.
Empirical: a 3-tx block touching the same set of system contracts now
reports code=4, code_bytes=1098 (matches single-tx baseline) instead
of code=12, code_bytes=3294 under the prior over-count.
- Replace reader.go:553 line citation in GetStateStats with the function
name; line numbers rot.
- Note the BAL sum-of-CPU-time semantics on the read-time field group
in ExecuteStats so cross-client consumers don't read total ≤ TotalTime
as an invariant.
Replace the {Account, Storage, Code} time.Duration scalars threaded through
ProcessResultWithMetrics, txExecResult, processBlockPreTx and resultHandler
with a single ReadDurations struct + Add merge primitive. Same shape as
StateCounts. Adds (*StateDB).SnapshotReads() helper at the boundary.
Replace the cached AccountReadTime/StorageReadTime fields (which had a
snapshot-staleness bug fixed in 16e98f5d9 by re-calling Metrics()) with
a live ReadTimes() accessor. Metrics() now only returns commit/hash-phase
timings — it no longer touches atomics. blockchain.go reads atomics
directly via stateTransition.ReadTimes(), eliminating the refresh hack.
Also resolves the I1 fragility: Metrics() returning &s.metrics no longer
involves any writes inside the function, so concurrent callers can't race
on the read-time field updates.
Forward prefetchStateReader.Wait() through *reader.WaitPrefetch and call
it before reading the read-time atomics. Eliminates the edge-case where
prefetcher goroutines outlast block execution + commit. For slow blocks
(the metric's target audience) this is a no-op; for fast blocks it
ensures the metric is complete rather than slightly under.
The struct is 80 bytes (10 ints) — value semantics matches the type's
"snapshot, safe to pass by value" thesis stated in its doc comment, and
removes three unnecessary &-takings at call sites. No behavior change.
- Add a comment at the code-mutation gate explaining the deliberate
len(code) > 0 (vs code != nil) match against non-BAL semantics; in
devnet-3 BAL access lists, an empty []byte is non-nil but encodes
"no code install".
- Remove BALStateTransitionMetrics.OriginStorageLoadTime: declared but
never assigned anywhere in the tree. The actual state-transition
read time is captured by AccountReadTime/StorageReadTime added in
the prior commit.
Without this, the inline interface assertion in processBlockWithAccessList
silently fell through (the prefetchReader returned by ReaderEIP7928 is a
*reader wrapper, not the inner *prefetchStateReader), causing the
prefetcher contribution to state_read_ms to drop to zero in production.
Mirrors the existing GetStateStats forwarding pattern. Adds a regression
test that asserts *reader exposes PrefetchReadTimes via the BAL chain,
plus a fallback test for non-prefetch readers.
Adds the StateCounts type that the BAL slow-block work depends on:
- core/state/state_counts.go: 10-field plain-int snapshot type with
Add merge primitive; isolates the live atomic mutation surface from
the value-typed aggregation pipeline.
- core/state/statedb.go: SnapshotCounts() method that converts the
StateDB's atomic counters to a plain StateCounts at the boundary.
- core/blockchain_stats.go: ExecuteStats embeds state.StateCounts;
adds ExecWall/PostProcess/Prefetch BAL extension fields, the
slowBlockBAL JSON struct + BAL field on slowBlockLog, and extracts
buildSlowBlockLog as a pure helper for direct testing.
Without this commit the bal-devnet-3 branch as committed in subsequent
commits would not build for a fresh clone (state.StateCounts undefined).
The BAL reader tracker captures access list reads at the reader level.
When statedb has an account cached the BAL tracker is not informed of
the access. This is ok during the lifetime of a transaction because you
only need to record the access the first time. It is also ok during the
lifetime of a block because BAL reads are block-level (same as statedb
caches).
Where I think the issue can rise is in the miner. Namely when building a
block, if the miner picks up a tx which fails, it drops it and picks up
another tx to include. There might be some edge case here where the
failed tx which is not included poisons the cache and a future block
which is included omits an account because it wasn't aware of the
access.
* add method on StateReaderTracker to clear the accumulated reads
* don't factor the BAL size into the payload size during construction in the miner
* simplify miner code for constructing payloads-with-BALs via the use of aformentioned StateReaderTracker clear method
* clean up the configuration of the BAL execution mode based on the preset flag specified
## Summary
In binary trie mode, `IntermediateRoot` calls `updateTrie()` once per
dirty account. But with the binary trie there is only one unified trie
(`OpenStorageTrie` returns `self`), so each call redundantly does
per-account trie setup: `getPrefetchedTrie`, `getTrie`, slice
allocations for deletions/used, and `prefetcher.used` — all for the same
trie pointer.
This PR replaces the per-account `updateTrie()` calls with a single flat
loop that applies all storage updates directly to `s.trie`. The MPT path
is unchanged. The prefetcher trie replacement is guarded to avoid
overwriting the binary trie that received updates.
This is the phase-1 counterpart to #34021 (H01). H01 fixes the commit
phase (`trie.Commit()` called N+1 times). This PR fixes the update phase
(`updateTrie()` called N times with redundant setup). Same root cause —
unified binary trie operated on per-account — different phases.
## Benchmark (Apple M4 Pro, 500K entries, `--benchtime=10s --count=3`,
on top of #34021)
| Metric | H01 baseline | H01 + this PR | Delta |
|--------|:------------:|:-------------:|:-----:|
| Approve (Mgas/s) | 368 | **414** | **+12.5%** |
| BalanceOf (Mgas/s) | 870 | 875 | +0.6% |
Should be rebased after #34021 is merged.
## Summary
**Bug fix.** In Verkle mode, all state objects share a single unified
trie (`OpenStorageTrie` returns `self`). During `stateDB.commit()`, the
main account trie is committed via `s.trie.Commit(true)`, which calls
`CollectNodes` to traverse and serialize the entire tree. However, each
dirty account's `obj.commit()` also calls `s.trie.Commit(false)` on the
**same trie object**, redundantly traversing and serializing the full
tree once per dirty account.
With N dirty accounts per block, this causes **N+1 full-tree
traversals** instead of 1. On a write-heavy workload (2250 SSTOREs),
this produces ~131 GB of allocations per block from duplicate NodeSet
creation and serialization. It also causes a latent data race from N+1
goroutines concurrently calling `CollectNodes` on shared `InternalNode`
objects.
This commit adds an `IsVerkle()` early return in `stateObject.commit()`
to skip the redundant `trie.Commit()` call.
## Benchmark (AMD EPYC 48-core, 500K entries, `--benchtime=10s
--count=3`)
| Metric | Baseline | Fixed | Delta |
|--------|----------|-------|-------|
| Approve (Mgas/s) | 4.16 ± 0.37 | **220.2 ± 10.1** | **+5190%** |
| BalanceOf (Mgas/s) | 966.2 ± 8.1 | 971.0 ± 3.0 | +0.5% |
| Allocs/op (approve) | 136.4M | 792K | **-99.4%** |
Resolves the TODO in statedb.go about the account trie commit being
"very heavy" and "something's wonky".
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Pebble maintains a batch pool to recycle the batch object. Unfortunately
batch object must be
explicitly returned via `batch.Close` function. This PR extends the
batch interface by adding
the close function and also invoke batch.Close in some critical code
paths.
Memory allocation must be measured before merging this change. What's
more, it's an open
question that whether we should apply batch.Close as much as possible in
every invocation.
From the https://eips.ethereum.org/EIPS/eip-7928
> SELFDESTRUCT (in-transaction): Accounts destroyed within a transaction
MUST be included in AccountChanges without nonce or code changes.
However, if the account had a positive balance pre-transaction, the
balance change to zero MUST be recorded. Storage keys within the self-destructed
contracts that were modified or read MUST be included as a storage_reads
entry.
The storage read against the empty contract (zero storage) should also
be recorded in the BAL's readlist.
The upstream libray has removed the assembly-based implementation of
keccak. We need to maintain our own library to avoid a peformance
regression.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
I recently went on a longer flight and started profiling the geth block
production pipeline.
This PR contains a bunch of individual fixes split into separate
commits.
I can drop some if necessary.
Benchmarking is not super easy, the benchmark I wrote is a bit
non-deterministic.
I will try to write a better benchmark later
```
goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/miner
cpu: Intel(R) Core(TM) Ultra 7 155U
│ /tmp/old.txt │ /tmp/new.txt │
│ sec/op │ sec/op vs base │
BuildPayload-14 141.5µ ± 3% 146.0µ ± 6% ~ (p=0.346 n=200)
│ /tmp/old.txt │ /tmp/new.txt │
│ B/op │ B/op vs base │
BuildPayload-14 188.2Ki ± 4% 177.4Ki ± 4% -5.71% (p=0.018 n=200)
│ /tmp/old.txt │ /tmp/new.txt │
│ allocs/op │ allocs/op vs base │
BuildPayload-14 2.703k ± 4% 2.453k ± 5% -9.25% (p=0.000 n=200)
```
core/state: add bounds check in heap eviction loop
Add len(h) > 0 check before accessing h[0] to prevent potential panic
and align with existing heap access patterns in txpool, p2p, and mclock
packages.
Implement standardized JSON format for slow block logging to enable
cross-client performance analysis and protocol research.
This change is part of the Cross-Client Execution Metrics initiative
proposed by Gary Rong: https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ
The standardized metrics enabled data-driven analysis like the EIP-7907
research: https://ethresear.ch/t/data-driven-analysis-on-eip-7907/23850
JSON format includes:
- block: number, hash, gas_used, tx_count
- timing: execution_ms, total_ms
- throughput: mgas_per_sec
- state_reads: accounts, storage_slots, bytecodes, code_bytes
- state_writes: accounts, storage_slots, bytecodes
- cache: account/storage/code hits, misses, hit_rate
This should come after merging #33522
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This PR extends the statistics of contract code read by adding these
fields:
- **CacheHitBytes**: the total number of bytes served by cache
- **CacheMissBytes**: the total number of bytes read on cache miss
- **CodeReadBytes**: the total number of bytes for contract code read
Fixes#33630
Sort self-destructed addresses before emitting hooks in Finalise() to
ensure deterministic ordering and fix flaky test
TestHooks_OnCodeChangeV2.
---------
Co-authored-by: jwasinger <j-wasinger@hotmail.com>