I recently went on a longer flight and started profiling the geth block
production pipeline.
This PR contains a bunch of individual fixes split into separate
commits.
I can drop some if necessary.
Benchmarking is not super easy, the benchmark I wrote is a bit
non-deterministic.
I will try to write a better benchmark later
```
goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/miner
cpu: Intel(R) Core(TM) Ultra 7 155U
│ /tmp/old.txt │ /tmp/new.txt │
│ sec/op │ sec/op vs base │
BuildPayload-14 141.5µ ± 3% 146.0µ ± 6% ~ (p=0.346 n=200)
│ /tmp/old.txt │ /tmp/new.txt │
│ B/op │ B/op vs base │
BuildPayload-14 188.2Ki ± 4% 177.4Ki ± 4% -5.71% (p=0.018 n=200)
│ /tmp/old.txt │ /tmp/new.txt │
│ allocs/op │ allocs/op vs base │
BuildPayload-14 2.703k ± 4% 2.453k ± 5% -9.25% (p=0.000 n=200)
```
core/state: add bounds check in heap eviction loop
Add len(h) > 0 check before accessing h[0] to prevent potential panic
and align with existing heap access patterns in txpool, p2p, and mclock
packages.
Heartbeats are used to drop non-executable transactions from the queue.
The timeout mechanism was not clearly documented, and it was updates
also when not necessary.
Implement standardized JSON format for slow block logging to enable
cross-client performance analysis and protocol research.
This change is part of the Cross-Client Execution Metrics initiative
proposed by Gary Rong: https://hackmd.io/dg7rizTyTXuCf2LSa2LsyQ
The standardized metrics enabled data-driven analysis like the EIP-7907
research: https://ethresear.ch/t/data-driven-analysis-on-eip-7907/23850
JSON format includes:
- block: number, hash, gas_used, tx_count
- timing: execution_ms, total_ms
- throughput: mgas_per_sec
- state_reads: accounts, storage_slots, bytecodes, code_bytes
- state_writes: accounts, storage_slots, bytecodes
- cache: account/storage/code hits, misses, hit_rate
This should come after merging #33522
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Based on [EIP-7864](https://eips.ethereum.org/EIPS/eip-7864), the tree
index should be 32 bytes instead of 31 bytes.
```
def get_tree_key(address: Address32, tree_index: int, sub_index: int):
# Assumes STEM_SUBTREE_WIDTH = 256
return tree_hash(address + tree_index.to_bytes(32, "little"))[:31] + bytes(
[sub_index]
)
```
This PR extends the statistics of contract code read by adding these
fields:
- **CacheHitBytes**: the total number of bytes served by cache
- **CacheMissBytes**: the total number of bytes read on cache miss
- **CodeReadBytes**: the total number of bytes for contract code read
Calling `pool.priced.Removed` is needed to keep is sync with
`pool.all.Remove`.
It was called in other occurances, but not here.
The counter is used for internal heap management. It was working even without this, just not calling reheap at the intended frequency.
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
This PR adds metrics that count the number of accounts having transactions
in the txpool. Together with the transaction count this can be used as a
simple indicator of the diversity of transactions in the pool.
Note: as an alternative implementation, we could use a periodic or event
driven update of these Gauges using len.
I've preferred this implementation to match what we have for the pool
sizes.
---------
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Fixes#33630
Sort self-destructed addresses before emitting hooks in Finalise() to
ensure deterministic ordering and fix flaky test
TestHooks_OnCodeChangeV2.
---------
Co-authored-by: jwasinger <j-wasinger@hotmail.com>
This PR reverts a part of changes brought by https://github.com/ethereum/go-ethereum/pull/33281/changes
Specifically, read-only protection should always be enforced at the opcode level,
regardless of whether the check has already been performed during gas metering.
It should act as a gatekeeper, otherwise, it is easy to introduce errors by adding
new gas measurement logic without consistently applying the read-only protection.
The core part of this PR that we need to adopt is to move the code and
nonce change hook invocations to occur at tx finalization, instead of
when the selfdestruct opcode is called.
Additionally:
* remove `SelfDestruct6780` now that it is essentially the same as
`SelfDestruct` just gated by `is new contract`
* don't duplicate `BalanceIncreaseSelfdestruct` (transfer to recipient
of selfdestruct) in the hooked statedb and in the opcode handler for the
selfdestruct opcode.
* balance is burned immediately when the beneficiary of the selfdestruct
is the sender, and the contract was created in the same transaction.
Previously we emit two balance increases to the recipient (see above
point), and a balance decrease from the sender.
---------
Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Co-authored-by: lightclient <lightclient@protonmail.com>
There's no need to perform the subsequent state access on the target if
we already know that we are out of gas.
This aligns the state access behavior of selfdestruct with EIP-7928
This PR causes execution to terminate at the gas handler in the case of
sstore/call if they are invoked in a static execution context.
This aligns the behavior with EIP 7928 by ensuring that we don't record
any state reads in the access list from an SSTORE/CALL in this
circumstance.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
This PR fixes an issue where the tx indexer would repeatedly try to
“unindex” a block with a missing body, causing a spike in CPU usage.
This change skips these blocks and advances the index tail. The fix was
verified both manually on a local development chain and with a new test.
resolves#33371
Allow the blobpool to accept blobs out of nonce order
Previously, we were dropping blobs that arrived out-of-order. However,
since fetch decisions are done on receiver side,
out-of-order delivery can happen, leading to inefficiencies.
This PR:
- adds an in-memory blob tx storage, similar to the queue in the
legacypool
- a limited number of received txs can be added to this per account
- txs waiting in the gapped queue are not processed further and not
propagated further until they are unblocked by adding the previos nonce
to the blobpool
The size of the in-memory storage is currently limited per account,
following a slow-start logic.
An overall size limit, and a TTL is also enforced for DoS protection.
---------
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: MariusVanDerWijden <m.vanderwijden@live.de>
### Description
Add a new `OnStateUpdate` hook which gets invoked after state is
committed.
### Rationale
For our particular use case, we need to obtain the state size metrics at
every single block when fuly syncing from genesis. With the current
state sizer, whenever the node is stopped, the background process must
be freshly initialized. During this re-initialization, it can skip some
blocks while the node continues executing blocks, causing gaps in the
recorded metrics.
Using this state update hook allows us to customize our own data
persistence logic, and we would never skip blocks upon node restart.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Adds missing trienode freezer case to InspectFreezerTable, making it
consistent with InspectFreezer which already supports it.
Co-authored-by: m6xwzzz <maskk.weller@gmail.com>
Fix#33390
`setHeadBeyondRoot` was failing to invalidate finalized blocks because
it compared against the original head instead of the rewound root. This
fix updates the comparison to use the post-rewind block number,
preventing the node from reporting a finalized block that no longer
exists. Also added relevant test cases for it.
In order to reduce the amount of code that is embedded into the keeper
binary, I am removing all the verkle code that uses go-verkle and
go-ipa. This will be followed by further PRs that are more like stubs to
replace code when the keeper build is detected.
I'm keeping the binary tree of course. This means that you will still
see `isVerkle` variables all over the codebase, but they will be renamed
when code is touched (i.e. this is not an invitation for 30+ AI slop
PRs).
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Blobs are stored per transaction in the pool, so we need billy to handle
up to the per-tx limit, not to the per-block limit.
The per-block limit was larger than the per-tx limit, so it not a bug,
we just created and handled a few billy files for no reason.
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
This PR removes the legacy sidecar conversion logic.
After the Osaka fork, the blobpool will accept only blob sidecar version
1.
Any remaining version 0 blob transactions, if they still exist, will no
longer
be eligible for inclusion.
Note that conversion at the RPC layer is still supported, and version 0
blob
transactions will be automatically converted to version 1 there.
When iterating over a map with value types in Go, the loop variable is a
copy. In `markCodeExistence`, assigning to `code.exists` modified only
the local copy, not the actual map entry, causing the existence flag to
always remain false.
This resulted in overcounting contract codes in state size statistics,
as codes that already existed in the database were incorrectly counted
as new.
Fix by changing `codes` from `map[common.Address]contractCode` to
`map[common.Address]*contractCode`, so mutations apply directly to the
struct.
This PR fixes the bug reported in #33365.
The impact of the bug is not catastrophic. After a transaction is
ultimately fetched, validation and propagation will be performed based
on the fetched body, and any response with a mismatched type is treated
as a protocol violation. An attacker could only waste the limited
portion of victim’s bandwidth at most.
However, the reasons for submitting this PR are as follows
1. Fetching a transaction announced with an arbitrary type is a weird
behavior.
2. It aligns with efforts such as EIP-8077 and #33119 to make the
fetcher smarter and reduce bandwidth waste.
Regarding the `FilterType` function, it could potentially be implemented
by modifying the Filter function's parameter itself, but I wasn’t sure
whether changing that function is acceptable, so I left it as is.
## Description
This PR fixes incorrect contract code state metrics by ensuring
duplicate codes are not counted towards the reported results.
## Rationale
The contract code metrics don't consider database deduplication. The
current implementation assumes that the results are only **slightly
inaccurate**, but this is not true, especially for data collection
efforts that started from the genesis block.
The EIP says to increment PC by 2 _instead of_ the standard increment by
1. The opcode handlers added in #33095 result in incrementing PC by 3,
because they ignored the increment already present in `interpreter.go`.
Does this need to be better specified in the EIP? I've added a [new test
case](https://github.com/ethereum/EIPs/pull/10859) for it anyway.
Found by @0xriptide.
The original condition `deleted && !logPrinted || time.Since(...)` was
incorrectly grouping due to operator precedence, causing logs to print
every 10 seconds even when no deletion was happening (deleted=false).
According to SafeDeleteRange documentation, the 'deleted' parameter is
"true if entries have actually been deleted already". The logging should
only happen when deletion is active.
Fixed by adding parentheses: `deleted && (!logPrinted ||
time.Since(...))`Now logs print only when items are being deleted AND
either it's the first log or 10+ seconds have passed since the last one.
This PR exposes the state size statistics to the metrics, making them
easier to demonstrate.
Note that the contract code included in the metrics is not
de-duplicated, so the reported size
will appear larger than the actual storage footprint.
This PR introduces a new debug feature, logging the slow blocks with
detailed performance statistics, such as state read, EVM execution and
so on.
Notably, the detailed performance statistics of slow blocks won't be
logged during the sync to not overwhelm users. Specifically, the statistics
are only logged if there is a single block processed.
Example output
```
########## SLOW BLOCK #########
Block: 23537063 (0xa7f878611c2dd27f245fc41107d12ebcf06b4e289f1d6acf44d49a169554ee09) txs: 248, mgasps: 202.99
EVM execution: 63.295ms
Validation: 1.130ms
Account read: 6.634ms(648)
Storage read: 17.391ms(1434)
State hash: 6.722ms
DB commit: 3.260ms
Block write: 1.954ms
Total: 99.094ms
State read cache: account (hit: 622, miss: 26), storage (hit: 1325, miss: 109)
##############################
```
This change fixes a stall in the legacy blob sidecar conversion pipeline
where tasks that arrived during an active batch could remain unprocessed
indefinitely after that batch completed, unless a new external event
arrived.
The root cause was that the loop did not restart processing in
the case <-done: branch even when txTasks had accumulated work, relying
instead on a future event to retrigger the scheduler. This behavior is
inconsistent with the billy task pipeline, which immediately chains to
the next task via runNextBillyTask() without requiring an external trigger.
The fix adds a symmetric restart path in `case <-done`: that checks
`len(txTasks) > 0`, clones the accumulated tasks, clears the queue, and
launches a new run with a fresh done and interrupt.
This preserves batching semantics, prevents indefinite blocking of callers
of convert(), and remains safe during shutdown since the quit path
still interrupts and awaits the active batch. No public interfaces or logging
were changed.
Looks like (in some very EVM specific tests) we spent a lot of time
resizing memory. If the underlying array is big enough, we can speed it
up a bit by simply slicing the memory.
goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/vm
cpu: Intel(R) Core(TM) Ultra 7 155U
│ /tmp/old.txt │ /tmp/new.txt │
│ sec/op │ sec/op vs base │
Resize-14 6.145n ± 9% 1.854n ± 14% -69.83% (p=0.000 n=10)
│ /tmp/old.txt │ /tmp/new.txt │
│ B/op │ B/op vs base │
Resize-14 5.000 ± 0% 5.000 ± 0% ~ (p=1.000 n=10)
│ /tmp/old.txt │ /tmp/new.txt │
│ allocs/op │ allocs/op vs base │
Resize-14 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹
From the blocktest benchmark:
620ms 10.93s (flat, cum) 9.92% of Total
. . 80:func (m *Memory) Resize(size uint64) {
30ms 60ms 81: if uint64(m.Len()) < size {
590ms 10.87s 82: m.store = append(m.store, make([]byte, size-uint64(m.Len()))...)
. . 83: }
. . 84:}
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
EIP-8024: Backward compatible SWAPN, DUPN, EXCHANGE
Introduces additional instructions for manipulating the stack which
allow accessing the stack at higher depths. This is an initial implementation
of the EIP, which is still in Review stage.