d.partialSyncComplete is consulted by beaconBackfiller.resume() to skip
redundant downloader cycles after the initial partial-state sync has
finished. It was an in-memory atomic.Bool, so every process restart
reset it to false, and the next forkchoiceUpdated from the CL would
re-enter the sync loop.
Persist the flag in leveldb via a new PartialSyncComplete marker:
- Add ReadPartialSyncComplete / WritePartialSyncComplete /
DeletePartialSyncComplete accessors in core/rawdb/accessors_chain.go
backed by a single-byte value under the PartialSyncComplete key.
- Write the marker in the downloader right after AdvancePartialHead
succeeds (same spot we flip the in-memory flag).
- Rehydrate the in-memory flag from leveldb in Downloader.New() so a
freshly-started process with a completed partial-state sync keeps
the resume short-circuit active from the first beacon forkchoice.
Without this, the restart invariant relied on HasState(header.Root)
accidentally returning false to reroute the downloader back to
SnapSync; with this the resume guard is the primary protection
regardless of how header-root convergence evolves.
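The persistence scheme above can be sketched roughly as follows. This is a
minimal illustration, not the actual rawdb code: the kvStore interface, the
memStore type, the key bytes, and newDownloader are hypothetical stand-ins.
Only the accessor names and the single-byte value come from the description
above. atomic.Bool needs Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// kvStore is a hypothetical stand-in for the leveldb-backed key-value store.
type kvStore interface {
	Get(key []byte) ([]byte, bool)
	Put(key, value []byte)
	Delete(key []byte)
}

// memStore is an in-memory kvStore used only for this sketch.
type memStore map[string][]byte

func (m memStore) Get(key []byte) ([]byte, bool) { v, ok := m[string(key)]; return v, ok }
func (m memStore) Put(key, value []byte)         { m[string(key)] = value }
func (m memStore) Delete(key []byte)             { delete(m, string(key)) }

// partialSyncCompleteKey is an assumed key; the real key bytes may differ.
var partialSyncCompleteKey = []byte("PartialSyncComplete")

// ReadPartialSyncComplete reports whether the single-byte marker is set.
func ReadPartialSyncComplete(db kvStore) bool {
	v, ok := db.Get(partialSyncCompleteKey)
	return ok && len(v) == 1 && v[0] == 1
}

// WritePartialSyncComplete stores the marker as a single byte.
func WritePartialSyncComplete(db kvStore) { db.Put(partialSyncCompleteKey, []byte{1}) }

// DeletePartialSyncComplete removes the marker.
func DeletePartialSyncComplete(db kvStore) { db.Delete(partialSyncCompleteKey) }

// downloader models only the flag that resume() consults.
type downloader struct {
	partialSyncComplete atomic.Bool
}

// newDownloader rehydrates the in-memory flag from the store at startup.
func newDownloader(db kvStore) *downloader {
	d := &downloader{}
	d.partialSyncComplete.Store(ReadPartialSyncComplete(db))
	return d
}

func main() {
	db := memStore{}
	fmt.Println(newDownloader(db).partialSyncComplete.Load()) // false: fresh datadir

	WritePartialSyncComplete(db) // partial sync finished
	fmt.Println(newDownloader(db).partialSyncComplete.Load()) // true: survives a "restart"
}
```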
AdvancePartialHead's backfill loop used a strictly-greater condition, so it
wrote canonical-hash keys only for blocks above the pivot. Combined with
the Engine API path persisting the pivot via WriteBlockWithoutState (which
writes header+body but not the canonical-hash key) and InsertReceiptChain.writeLive
skipping the pivot because HasBlock already returned true, the pivot block
ended up without an H<num>n entry in leveldb. After the freezer advanced
past finalized, startup's gap check at rawdb/database.go:279 rejected the
datadir with "gap in the chain between ancients [0 - #N-1] and leveldb
[#N+1 - #head]".
Fix: explicitly write the canonical hash for currentHead at the start of
AdvancePartialHead's backfill, covering the pivot inclusively.
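A toy model of the off-by-one, for illustration only. The map stands in for
the canonical-hash keys in leveldb, and backfill, firstGap, and the loop
direction are assumptions rather than the real AdvancePartialHead code.

```go
package main

import "fmt"

// backfill marks canonical-hash entries for blocks in (currentHead, newHead].
// With includeCurrent set it also writes the entry for currentHead itself,
// which models the explicit write added by the fix.
func backfill(canon map[uint64]bool, currentHead, newHead uint64, includeCurrent bool) {
	if includeCurrent {
		canon[currentHead] = true
	}
	for n := newHead; n > currentHead; n-- { // strictly greater: skips currentHead
		canon[n] = true
	}
}

// firstGap returns the first block in [from, to] without a canonical entry.
func firstGap(canon map[uint64]bool, from, to uint64) (uint64, bool) {
	for n := from; n <= to; n++ {
		if !canon[n] {
			return n, true
		}
	}
	return 0, false
}

func main() {
	const pivot, head = 100, 105

	before := map[uint64]bool{}
	backfill(before, pivot, head, false)
	n, gap := firstGap(before, pivot, head)
	fmt.Println(n, gap) // 100 true: the pivot has no canonical entry

	after := map[uint64]bool{}
	backfill(after, pivot, head, true)
	_, gap = firstGap(after, pivot, head)
	fmt.Println(gap) // false
}
```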
Also add a defensive guard in the chain-retention freezer path so that
TruncateTail never prunes past lastPivotNumber. Partial-state mode relies
on the pivot block as the anchor for state reconstruction; pruning its
body from ancients would make a future reorg spanning the pivot
unrecoverable.
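The guard amounts to clamping the requested tail. A minimal sketch, assuming
the tail is the first retained block number; clampTail is a hypothetical
helper, not the actual freezer code.

```go
package main

import "fmt"

// clampTail keeps the truncation target at or below lastPivot so the pivot
// block itself is never pruned from ancients.
func clampTail(target, lastPivot uint64) uint64 {
	if target > lastPivot {
		return lastPivot
	}
	return target
}

func main() {
	fmt.Println(clampTail(1200, 1000)) // 1000: request clamped to the pivot
	fmt.Println(clampTail(800, 1000))  // 800: request already below the pivot
}
```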
Ship with a regression test that asserts AdvancePartialHead writes the
currentHead's canonical hash (covers the bug precondition directly), plus
an idempotency check and a small post-advance sanity test.
Verified end-to-end on bal-devnet-3:
- Before fix: Fatal on restart
- After fix: restart succeeds, BAL processing resumes within seconds,
verify_partial_sync_devnet3.sh passes 16/16 checks.
Upstream bal-devnet-3 replaced the CodeChange struct with raw []byte
in ConstructionAccountAccesses.CodeChanges (map[uint16][]byte). Update
the test builder accordingly so the package compiles against the
new API.
Match upstream BALStateTransition behavior: only call UpdateAccount for
accounts that were actually modified (balance, nonce, code, or storage
changes). Previously, all accounts in the BAL (including read-only ones)
were written back to the trie, which could cause root mismatches if the
re-encoded RLP differed from the original encoding.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Trim leading zeros from storage values before passing to UpdateStorage,
matching the upstream BALStateTransition behavior. UpdateStorage
RLP-encodes the value internally, so passing untrimmed 32-byte values
(e.g. [0,0,...,5]) produces different trie nodes than trimmed values
([5]), causing systematic state root mismatches on every BAL-processed
block.
BuildStateSet already correctly trimmed values for the pathdb layer;
this fix aligns the trie update path.
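To illustrate why the padding matters, here is a self-contained sketch.
trimLeftZeroes and rlpEncodeShort are local helpers written for this example
(the latter only implements the RLP string rule for payloads up to 55
bytes); they are not the go-ethereum functions.

```go
package main

import "fmt"

// trimLeftZeroes returns b without its leading zero bytes.
func trimLeftZeroes(b []byte) []byte {
	for i, v := range b {
		if v != 0 {
			return b[i:]
		}
	}
	return b[len(b):]
}

// rlpEncodeShort RLP-encodes a byte string of at most 55 bytes:
// a single byte below 0x80 encodes as itself, anything else gets a
// 0x80+len prefix.
func rlpEncodeShort(b []byte) []byte {
	if len(b) == 1 && b[0] < 0x80 {
		return b
	}
	if len(b) > 55 {
		panic("sketch only handles strings up to 55 bytes")
	}
	return append([]byte{0x80 + byte(len(b))}, b...)
}

func main() {
	padded := make([]byte, 32)
	padded[31] = 5

	fmt.Println(len(rlpEncodeShort(padded)))                 // 33: 0xa0 prefix + 32 bytes
	fmt.Println(len(rlpEncodeShort(trimLeftZeroes(padded)))) // 1: just 0x05
}
```

The two encodings differ, so they hash to different trie nodes even though
they represent the same integer.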
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After the second snap sync completes, AdvancePartialHead moved the head
markers forward but never initialized partialState.Root(). This caused
ProcessBlockWithBAL to fall back to the parent's header root, which
doesn't match the computed trie root from BAL processing, resulting in
a state root mismatch on the first block after sync.
Fix: call SetRoot(root) and SetLastProcessedBlock() in AdvancePartialHead
so subsequent BAL processing chains from the correct state root.
Also add diagnostic logging to ProcessBlockWithBAL for easier debugging.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Live testing on bal-devnet-2 confirmed that computed roots DO diverge from
header roots. Block 75315 computed root 0xe909c7.. vs header root
0x9acbbe.. — untracked contracts' storage roots in the local trie are from
snap sync time and differ from the actual current roots, even when the
storage root resolver successfully queries peers.
This means subsequent blocks must chain off the computed root (via
partialState.Root()), not the header root (via parent.Root()). Restore
the stateRoot field using atomic.Pointer[common.Hash] instead of the
previous sync.RWMutex for lock-free concurrent access.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Apply review fixes: BAL iterator start (Fix 2), fatal root mismatch when
all storage resolved (Fix 3), WriteBlockWithoutState error handling (Fix 4),
contract filter construction order (Fix 5), canonical hash backfill (Fix 6),
underflow guard in gap processing (Fix 8), O(n²) prepend fix (Fix 9),
ReadBALHistory corruption detection (Fix 11), incomplete resolution error
(Fix 13), RLP encode panic (Fix 14), gap processing log level (Fix 16),
TriggerPartialResync message (Fix 18), and comment accuracy fixes.
Remove the stateRoot field and sync.RWMutex from PartialState entirely.
Since partial state maintains the full account trie, the computed root
always matches the header root (assuming storage root resolution succeeds).
ProcessBlockWithBAL now derives parent root from parent.Root() directly,
matching how full nodes derive state root from currentBlock headers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix several interacting issues that prevented partial state nodes from
syncing and following the chain on bal-devnet-2:
1. Stale pivot deadlock: Replace unconditional pivot suppression with
rate-limited advances (2-minute cooldown). This prevents the restart
loop bug while allowing recovery when the initial pivot is too stale
for peers to serve.
2. Storage root resolution: Add snap-based resolver that queries peers
for untracked contracts' storage roots during BAL processing. This
lets the computed state root converge toward the header root.
3. SetCanonical for partial state: When the computed root differs from
the header root (expected when untracked contracts have unresolved
storage roots), check HasState(partialState.Root()) instead of only
HasState(block.Root()). Guard against zero root during snap sync.
4. Canonical hash backfill: AdvancePartialHead now writes canonical
hashes for all blocks between the pivot and snap head, fixing the
"final block not in canonical chain" error caused by
InsertReceiptChain skipping blocks whose bodies already exist.
5. Gap block processing: After snap sync completes, process accumulated
blocks between the sync head and chain tip using their persisted BALs
before entering steady-state chain following.
6. Computed root chaining: Use partialState.Root() (actual computed root)
as parentRoot for subsequent blocks, not the header root. This ensures
correct trie chaining when computed != header root.
Tested end-to-end on bal-devnet-2: snap sync completes, gap blocks
processed, canonical head advances at chain tip (~1 block/12s).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix the post-sync deadlock where blocks validated via BAL in newPayload
were never written to the database, so ForkchoiceUpdated could not find
them and triggered infinite sync cycles.
Changes:
- Export WriteBlockWithoutState and call it after ProcessBlockWithBAL
in newPayload, so FCU can find blocks via GetBlockByHash
- Guard SetCanonical against recoverAncestors for partial state nodes
(they can't re-execute blocks, only apply BAL diffs)
- Auto-disable log indexing when partial state is enabled (no receipts)
- Fix BAL type field accesses to match upstream bal-devnet-2 types
(StorageChanges, CodeChanges, BalanceChanges, Validate signature)
- Update newPayload signature (BAL now comes from ExecutableData params)
- Add partial sync scripts and documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Geth has two independent snapshot tiers, each with its own disable
mechanism:
1. In-memory snapshot cache: controlled by SnapshotLimit (derived from
ethconfig.SnapshotCache). Setting SnapshotCache=0 disables it.
2. On-disk snapshot generator: a background goroutine in pathdb that
iterates the entire state trie to build flat key-value snapshots.
Controlled by pathdb.Config.SnapshotNoBuild.
The partial state configuration (cmd/utils/flags.go) already set
SnapshotCache=0 to disable the in-memory cache. However, SnapshotNoBuild
was never set, so pathdb.Enable() — called after snap sync completes —
still launched the background generator goroutine.
This generator immediately hits missing storage tries for untracked
contracts (whose storage was intentionally skipped during partial sync),
logs "Trie missing, snapshotting paused", and blocks forever on its
abort channel — a permanent goroutine leak with no recovery path.
Additionally, BlockChainConfig.SnapshotNoBuild was never propagated to
pathdb.Config.SnapshotNoBuild in the triedbConfig() conversion. The
field only reached the hash-scheme snapshot module (core/blockchain.go
setupSnapshot), which is already skipped for path-scheme databases. This
plumbing gap meant pathdb.Config.SnapshotNoBuild was never set in
production code — only in tests.
Fix both issues:
- Set SnapshotNoBuild=true when partial state is enabled
- Propagate BlockChainConfig.SnapshotNoBuild into pathdb.Config
Freeze the pivot header for partial state nodes to ensure stable state
sync progress:
- Suppress pivot movement in fetchHeaders() (beaconsync.go)
- Suppress pivot movement in processSnapSyncContent() (downloader.go)
- Reuse existing pivot across sync cycle restarts in syncToHead()
After initial snap sync completes, bridge the gap from pivot to HEAD:
- Import post-pivot blocks with receipts (no execution needed since
untracked contracts have empty storage tries)
- Run second state sync to download HEAD state root
- Add AdvancePartialHead to update currentBlock without re-execution
Guard the backfiller for partial state mode:
- suspend() skips Cancel() during active snap sync to prevent
constant cancel/restart cycles from beacon head updates
- resume() skips new sync cycles after partial sync completes
Add chain retention for partial state mode: only the most recent N blocks
(default 1024) retain bodies and receipts. During sync, older blocks are
skipped entirely. After sync, the freezer enforces a rolling window.
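The rolling window can be expressed as a simple tail computation.
retentionTail is a hypothetical helper written for this sketch; the real
freezer logic may differ.

```go
package main

import "fmt"

// retentionTail returns the lowest block number whose body and receipts are
// kept when only the most recent `retain` blocks up to `head` are retained.
func retentionTail(head, retain uint64) uint64 {
	if head < retain {
		return 0 // chain is shorter than the window: keep everything
	}
	return head - retain + 1
}

func main() {
	fmt.Println(retentionTail(5000, 1024)) // 3977: blocks 3977..5000 are kept
	fmt.Println(retentionTail(100, 1024))  // 0
}
```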
Add engine API support for Block Access Lists (EIP-7928): NewPayloadV5
accepts BAL data alongside execution payloads, enabling partial state
nodes to receive per-block storage access information from the CL.
Fix beacon backfilling failure caused by dynamic chain cutoff not
clearing the cutoff hash (which remained at the genesis hash).
Add partial state awareness to eth_call/eth_estimateGas to return clear
errors when accessing untracked contract storage.
Implement Block Access List (BAL) processing for partial statefulness
per EIP-7928. This enables nodes to update state without re-executing
transactions by applying BAL diffs directly to the trie.
Key additions:
- ApplyBALAndComputeRoot: Core BAL processing with correct commit ordering
(storage trie → account Root → account trie)
- ProcessBlockWithBAL: Blockchain-level entry point for BAL processing
- HandlePartialReorg: Chain reorganization support using BAL history
- Comprehensive test coverage (31 tests):
* Unit tests for edge cases (storage deletion, EIP-161, buildStateSet)
* Blockchain integration tests (ProcessBlockWithBAL, HandlePartialReorg)
* Both HashScheme and PathScheme coverage
Devnet Testing (2-node setup):
- Full node: dev mode with --dev.period 2, creates blocks
- Partial node: --partial-state mode, syncs via P2P
- Test results: Block sync verified, balance queries match between nodes,
state roots consistent. Database size reduction observed for partial node.
Extends ContractFilter interface with hash-based methods (ShouldSyncStorageByHash,
ShouldSyncCodeByHash) for efficient filtering during snap sync when only account
hashes are available.
Adds NewPartialStateSync() function that accepts filter callbacks to control which
accounts have their storage/code synced during healing. This prevents the healing
phase from re-syncing storage for accounts that were intentionally skipped during
initial sync.
Part of partial statefulness Phase 2.
Implements EIP-7928 BAL-based partial statefulness infrastructure:
- Add PartialStateConfig to eth/ethconfig with CLI flags
- Add ContractFilter interface in core/state/partial/
- Add BAL history database accessors in core/rawdb/
- Add PartialState and BALHistory managers
This enables nodes to track only configured contracts' storage
while maintaining full account trie integrity.
The BAL reader tracker captures access list reads at the reader level.
When statedb has an account cached the BAL tracker is not informed of
the access. This is ok during the lifetime of a transaction because you
only need to record the access the first time. It is also ok during the
lifetime of a block because BAL reads are block-level (same as statedb
caches).
Where I think the issue can arise is in the miner. Namely, when building a
block, if the miner picks up a tx which fails, it drops it and picks up
another tx to include. There might be some edge case here where the
failed tx which is not included poisons the cache and a future block
which is included omits an account because it wasn't aware of the
access.
To check whether a transaction can be applied, we validate that
`blockGasLimit > txGasLimit + (cumulativeRegularGasUsed +
cumulativeStateGasUsed)`. However, the check should only be applied to
the bottleneck resource, i.e. `blockGasLimit >
max(txRegularGasUsed+cumulativeRegularGasUsed, txStateGasUsed+
cumulativeStateGasUsed)`.
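A small sketch contrasting the two checks. The function names and the
numbers are made up for illustration, and the strict `>` comparison mirrors
the formulas above rather than any confirmed implementation.

```go
package main

import "fmt"

func maxU64(a, b uint64) uint64 {
	if a > b {
		return a
	}
	return b
}

// fitsSummed is the old check: the tx gas limit is added to the sum of both
// cumulative counters.
func fitsSummed(blockGasLimit, txGasLimit, cumRegular, cumState uint64) bool {
	return blockGasLimit > txGasLimit+(cumRegular+cumState)
}

// fitsBottleneck is the proposed check: only the larger of the two
// per-resource totals is compared against the block gas limit.
func fitsBottleneck(blockGasLimit, txRegular, txState, cumRegular, cumState uint64) bool {
	return blockGasLimit > maxU64(txRegular+cumRegular, txState+cumState)
}

func main() {
	const (
		limit      = 30_000_000
		cumRegular = 20_000_000
		cumState   = 8_000_000
		txRegular  = 5_000_000
		txState    = 1_000_000
	)
	fmt.Println(fitsSummed(limit, txRegular+txState, cumRegular, cumState))        // false: 34M exceeds 30M
	fmt.Println(fitsBottleneck(limit, txRegular, txState, cumRegular, cumState)) // true: max(25M, 9M) is below 30M
}
```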
The changes here break multiple tests. I am trying to determine why.
---------
Co-authored-by: qu0b <stefan@starflinger.eu>
* add method on StateReaderTracker to clear the accumulated reads
* don't factor the BAL size into the payload size during construction in the miner
* simplify miner code for constructing payloads-with-BALs via the use of aforementioned StateReaderTracker clear method
* clean up the configuration of the BAL execution mode based on the preset flag specified
CopyHeader copies all pointer-typed header fields (WithdrawalsHash,
RequestsHash, SlotNumber, etc.) but was missing the deep copy for
BlockAccessListHash added by EIP-7928. This caused the BAL hash
to be silently shared between the original and the copy, leading
to potential data races and incorrect nil-checks on copied headers.
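The aliasing problem can be reproduced with a reduced header type. Header
and Hash below are simplified stand-ins for the real types, and shallowCopy
models the behavior before the fix.

```go
package main

import "fmt"

// Hash stands in for common.Hash.
type Hash [32]byte

// Header is reduced to the one pointer-typed field relevant here.
type Header struct {
	Number              uint64
	BlockAccessListHash *Hash
}

// shallowCopy copies the struct but shares the pointed-to hash.
func shallowCopy(h *Header) *Header {
	cpy := *h
	return &cpy
}

// CopyHeader additionally deep-copies BlockAccessListHash when it is set.
func CopyHeader(h *Header) *Header {
	cpy := *h
	if h.BlockAccessListHash != nil {
		hash := *h.BlockAccessListHash
		cpy.BlockAccessListHash = &hash
	}
	return &cpy
}

func main() {
	orig := &Header{Number: 1, BlockAccessListHash: &Hash{0xaa}}

	shallow := shallowCopy(orig)
	fmt.Println(shallow.BlockAccessListHash == orig.BlockAccessListHash) // true: shared pointer

	deep := CopyHeader(orig)
	deep.BlockAccessListHash[0] = 0xbb
	fmt.Printf("%x\n", orig.BlockAccessListHash[0]) // aa: original untouched
}
```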
Add persistent storage for Block Access Lists (BALs) in `core/rawdb/`.
This provides read/write/delete accessors for BALs in the active
key-value store.
---------
Co-authored-by: Jared Wasinger <j-wasinger@hotmail.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
`pool.signer.Sender(tx)` bypasses the sender cache used by types.Sender,
which can force an extra signature recovery for every promotable tx
(promotion runs frequently). Use `types.Sender(pool.signer, tx)` here to
keep sender derivation cached and consistent.
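A simplified model of the difference, not the go-ethereum implementation:
signer, tx, and Sender here are toy types, string addresses replace real
ones, and the cache ignores the signer-equality check a real cache would
need.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// signer counts how often the expensive recovery runs.
type signer struct{ recoveries int }

// Sender stands in for ECDSA signature recovery and always does the work.
func (s *signer) Sender(t *tx) string {
	s.recoveries++
	return "0x" + t.sig
}

// tx carries a per-transaction cache of the derived sender.
type tx struct {
	sig  string
	from atomic.Value // cached sender address (string) once derived
}

// Sender returns the cached address if present, otherwise recovers and
// caches it.
func Sender(s *signer, t *tx) string {
	if cached, ok := t.from.Load().(string); ok {
		return cached
	}
	addr := s.Sender(t)
	t.from.Store(addr)
	return addr
}

func main() {
	direct, cached := &signer{}, &signer{}
	t1, t2 := &tx{sig: "abc"}, &tx{sig: "abc"}

	direct.Sender(t1)
	direct.Sender(t1)
	fmt.Println(direct.recoveries) // 2: every call recovers again

	Sender(cached, t2)
	Sender(cached, t2)
	fmt.Println(cached.recoveries) // 1: second call hits the cache
}
```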