Live testing on bal-devnet-2 confirmed that computed roots DO diverge from
header roots. Block 75315 computed root 0xe909c7.. vs header root
0x9acbbe.. — untracked contracts' storage roots in the local trie are from
snap sync time and differ from the actual current roots, even when the
storage root resolver successfully queries peers.
This means subsequent blocks must chain off the computed root (via
partialState.Root()), not the header root (via parent.Root()). Restore
the stateRoot field using atomic.Pointer[common.Hash] instead of the
previous sync.RWMutex for lock-free concurrent access.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Apply review fixes: BAL iterator start (Fix 2), fatal root mismatch when
all storage resolved (Fix 3), WriteBlockWithoutState error handling (Fix 4),
contract filter construction order (Fix 5), canonical hash backfill (Fix 6),
underflow guard in gap processing (Fix 8), O(n²) prepend fix (Fix 9),
ReadBALHistory corruption detection (Fix 11), incomplete resolution error
(Fix 13), RLP encode panic (Fix 14), gap processing log level (Fix 16),
TriggerPartialResync message (Fix 18), and comment accuracy fixes.
Remove the stateRoot field and sync.RWMutex from PartialState entirely.
Since partial state maintains the full account trie, the computed root
always matches the header root (assuming storage root resolution succeeds).
ProcessBlockWithBAL now derives parent root from parent.Root() directly,
matching how full nodes derive state root from currentBlock headers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix several interacting issues that prevented partial state nodes from
syncing and following the chain on bal-devnet-2:
1. Stale pivot deadlock: Replace unconditional pivot suppression with
rate-limited advances (2-minute cooldown). This prevents the restart
loop bug while allowing recovery when the initial pivot is too stale
for peers to serve.
2. Storage root resolution: Add snap-based resolver that queries peers
for untracked contracts' storage roots during BAL processing. This
lets the computed state root converge toward the header root.
3. SetCanonical for partial state: When the computed root differs from
the header root (expected when untracked contracts have unresolved
storage roots), check HasState(partialState.Root()) instead of only
HasState(block.Root()). Guard against zero root during snap sync.
4. Canonical hash backfill: AdvancePartialHead now writes canonical
hashes for all blocks between the pivot and snap head, fixing the
"final block not in canonical chain" error caused by
InsertReceiptChain skipping blocks whose bodies already exist.
5. Gap block processing: After snap sync completes, process accumulated
blocks between the sync head and chain tip using their persisted BALs
before entering steady-state chain following.
6. Computed root chaining: Use partialState.Root() (actual computed root)
as parentRoot for subsequent blocks, not the header root. This ensures
correct trie chaining when computed != header root.
Tested end-to-end on bal-devnet-2: snap sync completes, gap blocks
processed, canonical head advances at chain tip (~1 block/12s).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix the post-sync deadlock where blocks validated via BAL in newPayload
were never written to the database, causing ForkchoiceUpdated to fail
finding them and triggering infinite sync cycles.
Changes:
- Export WriteBlockWithoutState and call it after ProcessBlockWithBAL
in newPayload, so FCU can find blocks via GetBlockByHash
- Guard SetCanonical against recoverAncestors for partial state nodes
(they can't re-execute blocks, only apply BAL diffs)
- Auto-disable log indexing when partial state is enabled (no receipts)
- Fix BAL type field accesses to match upstream bal-devnet-2 types
(StorageChanges, CodeChanges, BalanceChanges, Validate signature)
- Update newPayload signature (BAL now comes from ExecutableData params)
- Add partial sync scripts and documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add chain retention for partial state mode: only the most recent N blocks
(default 1024) retain bodies and receipts. During sync, older blocks are
skipped entirely. After sync, the freezer enforces a rolling window.
Add engine API support for Block Access Lists (EIP-7928): NewPayloadV5
accepts BAL data alongside execution payloads, enabling partial state
nodes to receive per-block storage access information from the CL.
Fix beacon backfilling failure caused by dynamic chain cutoff not
clearing the cutoff hash (which remained at the genesis hash).
Add partial state awareness to eth_call/eth_estimateGas to return clear
errors when accessing untracked contract storage.
This PR contains two changes:
Firstly, the finalized header will be resolved from local chain if it's
not recently announced via the `engine_newPayload`.
What's more importantly is, in the downloader, originally there are two
code paths to push forward the pivot point block, one in the beacon
header fetcher (`fetchHeaders`), and another one is in the snap content
processer (`processSnapSyncContent`).
Usually if there are new blocks and local pivot block becomes stale, it
will firstly be detected by the `fetchHeaders`. `processSnapSyncContent`
is fully driven by the beacon headers and will only detect the stale pivot
block after synchronizing the corresponding chain segment. I think the
detection here is redundant and useless.
I observed failing tests in Hive `engine-withdrawals`:
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=146
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=147
```shell
DEBUG (Withdrawals Fork on Block 2): NextPayloadID before getPayloadV2:
id=0x01487547e54e8abe version=1
>> engine_getPayloadV2("0x01487547e54e8abe")
<< error: {"code":-38005,"message":"Unsupported fork"}
FAIL: Expected no error on EngineGetPayloadV2: error=Unsupported fork
```
The same failure pattern occurred for Block 3.
Per Shanghai engine_getPayloadV2 spec, pre-Shanghai payloads should be
accepted via V2 and returned as ExecutionPayloadV1:
- executionPayload: ExecutionPayloadV1 | ExecutionPayloadV2
- ExecutionPayloadV1 MUST be returned if payload timestamp < Shanghai
timestamp
- ExecutionPayloadV2 MUST be returned if payload timestamp >= Shanghai
timestamp
Reference:
-
https://github.com/ethereum/execution-apis/blob/main/src/engine/shanghai.md#engine_getpayloadv2
Current implementation only allows GetPayloadV2 on the Shanghai fork
window (`[]forks.Fork{forks.Shanghai}`), so pre-Shanghai payloads are
rejected with Unsupported fork.
If my interpretation of the spec is incorrect, please let me know and I
can adjust accordingly.
---------
Co-authored-by: muzry.li <muzry.li1@ambergroup.io>
This PR removes the legacy sidecar conversion logic.
After the Osaka fork, the blobpool will accept only blob sidecar version
1.
Any remaining version 0 blob transactions, if they still exist, will no
longer
be eligible for inclusion.
Note that conversion at the RPC layer is still supported, and version 0
blob
transactions will be automatically converted to version 1 there.
This moves the tracking of the current syncmode into the downloader, fixing an
issue where the syncmode being requested through the engine API could go
out-of-sync with the actual mode being performed by downloader.
Fixes#32629
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This adds checks into getPayload to ensure the correct version is called
for the fork which applies to the payload.
---------
Co-authored-by: jsvisa <delweng@gmail.com>
Addresses https://github.com/ethereum/go-ethereum/issues/32630
This pull request enables the stateless engine APIs for Osaka and the
following BPOs. Apart from that, a few more descriptions have been added
in the engine APIs, making it easier to follow the spec change.
Fixes an issue I accidentally introduced in #32579. Essentially, because
we gate the engine methods based on particular forks and I did not add
the BPOs as allowed forks to the method.
Another getBlobs PR on top of
https://github.com/ethereum/go-ethereum/pull/32190 to avoid some minor
regressions.
- bring back more log messages from before
- continue processing also on some internal errors
- ensure v2 complies with spec even if there are internal errors
---------
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
This pull request fixes a regression, introduced in #32190
Specifically, in GetBlobsV1 engine API, if any blob is missing, the null
should be placed in
response, unfortunately a behavioral change was introduced in #32190,
returning an error
instead.
What's more, a more comprehensive test suite is added to cover both
`GetBlobsV1` and
`GetBlobsV2` endpoints.
- If all the `vhashes` are in the same `sidecar`, then it will load the
same blob tx many times. This PR aims to upgrade this.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
The main purpose of this change is to enforce the version setting when
constructing the blobSidecar, avoiding creating sidecar with wrong/default
version tag.
This situation was failing quietly for me recently when I had a partial
data corruption issue. Changing the log level to Error would increase
visibility for me.
This PR contains three refactors:
- refactor the latest fork check that we use quite extensively
- refactor the nil checks in NewPayloads
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
The total difficulty is the sum of all block difficulties from genesis
to a certain block. This value was used in PoW for deciding which chain
is heavier, and thus which chain to select. Since PoS has a different
fork selection algorithm, all blocks since the merge have a difficulty
of 0, and all total difficulties are the same for the past 2 years.
Whilst the TDs are mostly useless nowadays, there was never really a
reason to mess around removing them since they are so tiny. This
reasoning changes when we go down the path of pruned chain history. In
order to reconstruct any TD, we **must** retrieve all the headers from
chain head to genesis and then iterate all the difficulties to compute
the TD.
In a world where we completely prune past chain segments (bodies,
receipts, headers), it is not possible to reconstruct the TD at all. In
a world where we still keep chain headers and prune only the rest,
reconstructing it possible as long as we process (or download) the chain
forward from genesis, but trying to snap sync the head first and
backfill later hits the same issue, the TD becomes impossible to
calculate until genesis is backfilled.
All in all, the TD is a messy out-of-state, out-of-consensus computed
field that is overall useless nowadays, but code relying on it forces
the client into certain modes of operation and prevents other modes or
other optimizations. This PR completely nukes out the TD from the node.
It doesn't compute it, it doesn't operate on it, it's as if it didn't
even exist.
Caveats:
- Whenever we have APIs that return TD (devp2p handshake, tracer, etc.)
we return a TD of 0.
- For era files, we recompute the TD during export time (fairly quick)
to retain the format content.
- It is not possible to "verify" the merge point (i.e. with TD gone, TTD
is useless). Since we're not verifying PoW any more, just blindly trust
it, not verifying but blindly trusting the many year old merge point
seems just the same trust model.
- Our tests still need to be able to generate pre and post merge blocks,
so they need a new way to split the merge without TTD. The PR introduces
a settable ttdBlock field on the consensus object which is used by tests
as the block where originally the TTD happened. This is not needed for
live nodes, we never want to generate old blocks.
- One merge transition consensus test was disabled. With a
non-operational TD, testing how the client reacts to TTD is useless, it
cannot react.
Questions:
- Should we also drop total terminal difficulty from the genesis json?
It's a number we cannot react on any more, so maybe it would be cleaner
to get rid of even more concepts.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Lots of packages depend on eth/downloader just for the SyncMode type.
Since we have a dedicated package for eth protocol configuration, it
makes more sense to define SyncMode there, turning eth/downloader into
more of a leaf package.
* unify `staterunner` and `blockrunner` CLI flags, especially around
tracing
* added support for struct logger or json logging (although having issue
#30658)
* new --cross-check flag to validate the stateless witness collection
/ execution matches stateful
* adds support for tracing the stateless execution when a tracer is set
(to more easily debug differences)
* --human for more readable test summary
* directory or file input, so if you pass tests/spec-tests/fixtures/blockchain_tests it will execute all
blockchain tests