This PR adds a command-line tool, `fetchpayload`, which connects to a
node and gathers all the information needed to create a serialized
payload that can then be passed to the zkvm.
This PR allows users to prune their nodes up to the Prague fork. It
indirectly depends on #32157 and can't really be merged before eraE
files are widely available for download.
The `--history.chain` flag becomes mandatory for the `prune-history`
command. The edge cases that can occur and how we behave in each are
listed below:
## prune-history Behavior
| From | To | Result |
|-------------|--------------|--------------------------|
| full | postmerge | ✅ prunes |
| full | postprague | ✅ prunes |
| postmerge | postprague | ✅ prunes further |
| postprague | postmerge | ❌ can't unprune |
| any | all | ❌ use import-history |
## Node Startup Behavior
| DB State | Flag | Result |
|-------------|--------------|----------------------------------------------------------------|
| fresh | postprague | ✅ syncs from Prague |
| full | postprague | ❌ "run prune-history first" |
| postmerge | postprague | ❌ "run prune-history first" |
| postprague  | postmerge    | ❌ "can't unprune, use import-history or fix flag"             |
| pruned | all | ✅ accepts known prune points |
The `--remove.chain` flag incorrectly described itself as selecting
"state data" for removal, which could mislead operators into removing
the wrong data category. This corrects the description to accurately
reflect that the flag targets chain data (block bodies and receipts).
This PR contains two changes:
First, the finalized header is resolved from the local chain if it was
not recently announced via `engine_newPayload`.
More importantly, the downloader originally had two code paths for
pushing the pivot block forward: one in the beacon header fetcher
(`fetchHeaders`) and another in the snap content processor
(`processSnapSyncContent`).
Usually, when new blocks arrive and the local pivot block becomes stale,
this is first detected by `fetchHeaders`. `processSnapSyncContent`
is fully driven by the beacon headers and only detects the stale pivot
block after synchronizing the corresponding chain segment. I think the
detection there is redundant and can be removed.
We received a report of a bug in the tracing journal, which is
responsible for emitting events for all state that must be reverted.
The edge case is as follows: on CREATE operations the nonce is
incremented. When a create frame reverts, the nonce increment associated
with it does **not** revert. This works fine on master. One step
further, though: if the parent frame also reverts, the nonce **should**
revert, and that is where the bug is.
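The nesting can be sketched with a toy journal (hypothetical types, not geth's actual journal): the nonce bump is recorded in the parent frame before the create frame is entered, so reverting the child leaves it intact, while reverting the parent must undo it.

```go
package main

import "fmt"

// entry records a single reversible state change.
type entry struct {
	addr      string
	prevNonce uint64
}

// journal collects entries; a snapshot is just a length marker.
type journal struct {
	entries []entry
	nonces  map[string]uint64
}

func (j *journal) snapshot() int { return len(j.entries) }

func (j *journal) bumpNonce(addr string) {
	j.entries = append(j.entries, entry{addr, j.nonces[addr]})
	j.nonces[addr]++
}

// revert undoes every change recorded after the snapshot.
func (j *journal) revert(snap int) {
	for i := len(j.entries) - 1; i >= snap; i-- {
		j.nonces[j.entries[i].addr] = j.entries[i].prevNonce
	}
	j.entries = j.entries[:snap]
}

func main() {
	j := &journal{nonces: map[string]uint64{}}

	parent := j.snapshot()
	j.bumpNonce("creator") // CREATE bumps the caller's nonce in the parent frame
	child := j.snapshot()

	j.revert(child) // the create frame reverts: the bump survives
	fmt.Println(j.nonces["creator"]) // 1

	j.revert(parent) // the parent frame reverts: the bump must revert too
	fmt.Println(j.nonces["creator"]) // 0
}
```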
This PR fixes a regression introduced in https://github.com/ethereum/go-ethereum/pull/33836/changes
Before PR 33836, running mainnet would automatically bump the cache size
to 4GB and trigger a cache re-calculation, specifically setting the key-value
database cache to 2GB.
After PR 33836, this logic was removed, and the cache value is no longer
recomputed if no command line flags are specified. The default key-value
database cache is 512MB.
This PR bumps the default key-value database cache size alongside the
default cache size for other components (such as snapshot) accordingly.
I observed failing tests in Hive `engine-withdrawals`:
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=146
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=147
```shell
DEBUG (Withdrawals Fork on Block 2): NextPayloadID before getPayloadV2:
id=0x01487547e54e8abe version=1
>> engine_getPayloadV2("0x01487547e54e8abe")
<< error: {"code":-38005,"message":"Unsupported fork"}
FAIL: Expected no error on EngineGetPayloadV2: error=Unsupported fork
```
The same failure pattern occurred for Block 3.
Per Shanghai engine_getPayloadV2 spec, pre-Shanghai payloads should be
accepted via V2 and returned as ExecutionPayloadV1:
- executionPayload: ExecutionPayloadV1 | ExecutionPayloadV2
- ExecutionPayloadV1 MUST be returned if payload timestamp < Shanghai
timestamp
- ExecutionPayloadV2 MUST be returned if payload timestamp >= Shanghai
timestamp
Reference:
-
https://github.com/ethereum/execution-apis/blob/main/src/engine/shanghai.md#engine_getpayloadv2
The current implementation only allows GetPayloadV2 in the Shanghai fork
window (`[]forks.Fork{forks.Shanghai}`), so pre-Shanghai payloads are
rejected with "Unsupported fork".
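The spec rule above can be sketched as a simple timestamp check (`payloadEnvelopeVersion` and `shanghaiTime` are illustrative names, not geth's actual API):

```go
package main

import "fmt"

// payloadEnvelopeVersion illustrates the Shanghai engine_getPayloadV2
// rule: V2 accepts pre-Shanghai payloads and returns them wrapped as
// ExecutionPayloadV1. shanghaiTime stands in for the chain config's
// Shanghai activation timestamp.
func payloadEnvelopeVersion(payloadTime, shanghaiTime uint64) string {
	if payloadTime < shanghaiTime {
		return "ExecutionPayloadV1"
	}
	return "ExecutionPayloadV2"
}

func main() {
	fmt.Println(payloadEnvelopeVersion(100, 200)) // ExecutionPayloadV1
	fmt.Println(payloadEnvelopeVersion(200, 200)) // ExecutionPayloadV2
}
```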
If my interpretation of the spec is incorrect, please let me know and I
can adjust accordingly.
---------
Co-authored-by: muzry.li <muzry.li1@ambergroup.io>
Updates go-eth-kzg to
https://github.com/crate-crypto/go-eth-kzg/releases/tag/v1.5.0
Significantly reduces the allocations in VerifyCellProofBatch which is
around ~5% of all allocations on my node
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Reduce allocations in the calculation of tx cost.
---------
Co-authored-by: weixie.cui <weixie.cui@okg.com>
Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
`GenerateChain` commits trie nodes asynchronously, and it can happen
that some nodes haven't made it to the db by the time `GenerateChain`
opens it and looks for the data.
This is an optimization that existed for verkle and the MPT, but that
got dropped during the rebase.
Mark the nodes that were modified as needing recomputation, and skip the
hash computation if this is not needed. Otherwise, the whole tree is
hashed, which kills performance.
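The dirty-flag pattern can be sketched with a single node (illustrative types, not the verkle tree's actual code):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// node caches its hash and recomputes it only when marked dirty, so
// committing an unmodified node is O(1) instead of rehashing it.
type node struct {
	data   []byte
	dirty  bool
	hash   [32]byte
	hashes int // counts actual hash computations, for illustration
}

// write modifies the node and marks it as needing recomputation.
func (n *node) write(data []byte) {
	n.data = data
	n.dirty = true
}

// commit returns the cached hash, recomputing it only if dirty.
func (n *node) commit() [32]byte {
	if n.dirty {
		n.hash = sha256.Sum256(n.data)
		n.hashes++
		n.dirty = false
	}
	return n.hash
}

func main() {
	n := &node{}
	n.write([]byte("leaf"))
	n.commit()
	n.commit() // cached: no recomputation
	fmt.Println(n.hashes) // 1
}
```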
The computation of `MAIN_STORAGE_OFFSET` was incorrect, causing the last
byte of the stem to be dropped. This means there would be a collision in
the hash computation (at the preimage level, not a hash collision of
course) if two keys differed only at byte 31.
Eth currently has a flaky test related to the tx fetcher.
The issue seems to happen when `Unsubscribe` is called while `sub` is nil.
It seems that chain.Stop() may be invoked before the loop starts in some
tests, but the exact cause is still under investigation through repeated
runs. I think this change will at least prevent the error.
The BatchSpanProcessor queue size was incorrectly set to
DefaultMaxExportBatchSize (512) instead of DefaultMaxQueueSize (2048).
I noticed the issue on bloatnet when analyzing the block building
traces. During a particular run, the miner was including 1000
transactions in a single block. When telemetry is enabled, the miner
creates a span for each transaction added to the block. With the queue
capped at 512, spans were silently dropped when production outpaced the
span export, resulting in incomplete traces with orphaned spans. While
this doesn't eliminate the possibility of drops under extreme
load, using the correct default restores the 4x buffer between queue
capacity and export batch size that the SDK was designed around.
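The silent-drop behaviour can be sketched with a bounded channel standing in for the processor's queue (illustrative code, not the OpenTelemetry SDK internals):

```go
package main

import "fmt"

// enqueue mimics a BatchSpanProcessor's bounded queue: when the buffer
// is full, the span is silently dropped rather than blocking the caller.
func enqueue(queue chan int, span int) bool {
	select {
	case queue <- span:
		return true
	default:
		return false // dropped
	}
}

// dropped counts how many of n spans are lost in a burst when nothing
// is draining the queue.
func dropped(capacity, n int) int {
	queue := make(chan int, capacity)
	lost := 0
	for i := 0; i < n; i++ {
		if !enqueue(queue, i) {
			lost++
		}
	}
	return lost
}

func main() {
	// A block with 1000 transactions produces 1000 spans in a burst:
	fmt.Println(dropped(512, 1000))  // 488 dropped with the buggy size
	fmt.Println(dropped(2048, 1000)) // 0 dropped with DefaultMaxQueueSize
}
```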
Pebble maintains a batch pool to recycle batch objects. Unfortunately,
a batch object must be explicitly returned via the `batch.Close`
function. This PR extends the batch interface by adding the close
function and also invokes `batch.Close` in some critical code paths.
Memory allocation must be measured before merging this change. What's
more, it's an open question whether we should apply `batch.Close` as
much as possible in every invocation.
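A minimal sketch of the pattern, using a stand-in `batch` type rather than the real pebble API:

```go
package main

import (
	"fmt"
	"sync"
)

// batch stands in for a pebble write batch that must be explicitly
// returned to the pool via Close, mirroring the extended interface.
type batch struct{ ops int }

var batchPool = sync.Pool{New: func() any { return new(batch) }}

func newBatch() *batch { return batchPool.Get().(*batch) }

// Close resets the batch and returns it to the pool for reuse;
// without it, every batch is a fresh allocation.
func (b *batch) Close() {
	b.ops = 0
	batchPool.Put(b)
}

func main() {
	b := newBatch()
	b.ops = 42
	b.Close() // return the object for recycling

	b2 := newBatch() // likely the same object, now reset
	fmt.Println(b2.ops)
}
```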
For bal-devnet-3 we need to update the EIP-8024 implementation to the
latest spec changes: https://github.com/ethereum/EIPs/pull/11306
> Note: I deleted tests not specified in the EIP because maintaining
them through EIP changes is too error prone.
Return the Amsterdam instruction set from `LookupInstructionSet` when
`IsAmsterdam` is true, so Amsterdam rules no longer fall through to the
Osaka jump table.
---------
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
In `buildPayload()`, the background goroutine uses a `select` to wait on
the recommit timer, the stop channel, and the end timer. When both
`timer.C` and `payload.stop` are ready simultaneously, Go's `select`
picks a case non-deterministically. This means the loop can enter the
`timer.C` case and perform an unnecessary `generateWork` call even after
the payload has been resolved.
Add a non-blocking check of `payload.stop` at the top of the `timer.C`
case to exit immediately when the payload has already been delivered.
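The guard can be sketched as follows (illustrative code, not the actual `buildPayload` loop):

```go
package main

import (
	"fmt"
	"time"
)

// stopped performs the non-blocking check added at the top of the
// timer.C case: it reports whether the stop channel is already closed.
func stopped(stop chan struct{}) bool {
	select {
	case <-stop:
		return true
	default:
		return false
	}
}

func main() {
	stop := make(chan struct{})
	timer := time.NewTimer(0)
	close(stop) // the payload has already been resolved

	time.Sleep(10 * time.Millisecond) // both channels are now ready

	select {
	case <-timer.C:
		// Go picks a ready case at random, so guard against entering
		// the rebuild path after the payload was delivered.
		if stopped(stop) {
			fmt.Println("stopped")
			return
		}
		fmt.Println("rebuild")
	case <-stop:
		fmt.Println("stopped")
	}
}
```

Either branch of the outer `select` now ends in "stopped", removing the non-deterministic extra rebuild.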
We got a report that after v1.17.0 a geth-teku node starts to time out
on engine_getBlobsV2 after around 3h of operation. The culprit seems to
be our optional http2 service which Teku attempts first. The exact cause
of the timeout is still unclear.
This PR is more of a workaround than a proper fix until we figure out
the underlying issue. But I don't expect http2 to particularly benefit
engine API throughput and latency, so it should be fine to disable it
for now.
The payload rebuild loop resets the timer with the full Recommit
duration after generateWork returns, making the actual interval
generateWork_elapsed + Recommit instead of Recommit alone.
Since fillTransactions uses Recommit (2s) as its timeout ceiling, the
effective rebuild interval can reach ~4s under heavy blob workloads,
leaving only 1–2 rebuilds in a 6s half-slot window instead of the
intended 3.
Fix by subtracting elapsed time from the timer reset.
### Before this fix
```
t=0s timer fires, generateWork starts
t=2s fillTransactions times out, timer.Reset(2s)
t=4s second rebuild starts
t=6s CL calls getPayload — gets the t=2s result (1 effective rebuild)
```
### After
```
t=0s timer fires, generateWork starts
t=2s fillTransactions times out, timer.Reset(2s - 2s = 0)
t=2s second rebuild starts immediately
t=4s timer.Reset(0), third rebuild starts
t=6s CL calls getPayload — gets the t=4s result (3 effective rebuilds)
```
This PR introduces a threshold (relative to current market base fees),
below which we suppress the diffusion of low fee transactions. Once base
fees go down, and if the transactions were not evicted in the meantime,
we release these transactions.
The PR also updates the bucketing logic to be more sensitive, removing
the extra logarithm. The blobpool description is also updated to reflect
the new behavior.
EIP-7918 changed the maximum blob fee decrease that can happen in a
slot. The PR also updates the fee jump calculation to reflect this.
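A sketch of the suppression rule (the threshold factor and names here are illustrative, not the pool's actual values):

```go
package main

import "fmt"

// diffusible illustrates the suppression rule: a transaction whose blob
// fee cap is far below the current market base fee is kept locally and
// only broadcast once base fees fall back within the threshold.
// The 2x threshold factor is a placeholder, not the pool's constant.
func diffusible(feeCap, baseFee uint64) bool {
	return feeCap*2 >= baseFee
}

func main() {
	fmt.Println(diffusible(10, 15)) // true: within threshold, broadcast
	fmt.Println(diffusible(10, 50)) // false: suppressed for now
	fmt.Println(diffusible(10, 20)) // true: released as base fee falls
}
```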
---------
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
With this, we are dropping support for protocol version eth/68. The only supported
version is eth/69 now. The p2p receipt encoding logic can be simplified a lot, and
processing of receipts during sync gets a little faster because we now transform
the network encoding into the database encoding directly, without decoding the
receipts first.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Fix the flaky test found in
https://ci.appveyor.com/project/ethereum/go-ethereum/builds/53601688/job/af5ccvufpm9usq39
1. increase the timeout from 3+1s to 15s, and use a timer instead of
sleep (in the CI environment, it may need more time to sync the 1024 blocks)
2. add `synced.Load()` to ensure the full async chain sync has finished
Signed-off-by: Delweng <delweng@gmail.com>
We didn't upgrade to 1.25, so this jumps over one version. I want to
upgrade all builds to Go 1.26 soon, but let's start with the Docker
build to get a sense of any possible issues.
Previously, handshake timeouts were recorded as generic peer errors
instead of timeout errors. waitForHandshake passed a raw
p2p.DiscReadTimeout into markError, but markError classified errors only
via errors.Unwrap(err), which returns nil for non-wrapped errors. As a
result, the timeoutError meter was never incremented and all such
failures fell into the peerError bucket.
This change makes markError switch on the base error, using
errors.Unwrap(err) when available and falling back to the original error
otherwise. With this adjustment, p2p.DiscReadTimeout is correctly mapped
to timeoutError, while existing behaviour for the other wrapped sentinel
errors remains unchanged.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>