In this PR, two things have been fixed:
---
(a) truncate the stale beacon headers with latest snap block
Originally, b.filled is used as the indicator for deleting stale beacon headers.
This field is set only after synchronization has been scheduled, under the
assumption that the skeleton chain is already linked to the local chain.
However, the local chain can be mutated via `debug_setHead`, which may
cause `b.filled` outdated. For instance, `b.filled` refers to the last head snap block
in the last sync cycle while after `debug_setHead`, the head snap block has been
rewounded to 1.
As a result, Geth can enter an unintended loop: it repeatedly downloads
the missing beacon headers for the skeleton chain and attempts to schedule the
actual synchronization, but in the final step, all recently fetched headers are removed
by `cleanStales` due to the stale `b.filled` value.
This issue is addressed by always using the latest snap block as the indicator,
without relying on any cached value. However, note that before the skeleton
chain is linked to the local chain, the latest snap block will always be below
skeleton.tail, and this condition should not be treated as an error.
---
(b) merge the subchains once the skeleton chain links to local chain
Once the skeleton chain links with local one, it will try to schedule the
synchronization by fetching the missing blocks and import them then.
It's possible the last subchain already overwrites the previous subchain and
results in having two subchains leftover. As a result, an error log will printed
https://github.com/ethereum/go-ethereum/blob/master/eth/downloader/skeleton.go#L1074
This pull request fixes the flay test TestSkeletonSyncRetrievals. In this test, we first
trigger a sync cycle and wait for it to meet certain expectations. We then inject a new
head and potentially also a new peer, then perform another final sync. The test now
performs the newPeer addition before launching the final sync, and waits a bit for that
peer to get registered. This fixes the logic race that made the test fail sometimes.
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
* eth/downloader: remove stale beacon headers as backfilling progresses
* eth/downloader: remove leftover from a previous design
* eth/downloader: do partial beacon cleanups if chain is large
* eth/downloader: linter != heart
* eth/downloader: implement beacon sync
* eth/downloader: fix a crash if the beacon chain is reduced in length
* eth/downloader: fix beacon sync start/stop thrashing data race
* eth/downloader: use a non-nil pivot even in degenerate sync requests
* eth/downloader: don't touch internal state on beacon Head retrieval
* eth/downloader: fix spelling mistakes
* eth/downloader: fix some typos
* eth: integrate legacy/beacon sync switchover and UX
* eth: handle UX wise being stuck on post-merge TTD
* core, eth: integrate the beacon client with the beacon sync
* eth/catalyst: make some warning messages nicer
* eth/downloader: remove Ethereum 1&2 notions in favor of merge
* core/beacon, eth: clean up engine API returns a bit
* eth/downloader: add skeleton extension tests
* eth/catalyst: keep non-kiln spec, handle mining on ttd
* eth/downloader: add beacon header retrieval tests
* eth: fixed spelling, commented failing tests out
* eth/downloader: review fixes
* eth/downloader: drop peers failing to deliver beacon headers
* core/rawdb: track beacon sync data in db inspect
* eth: fix review concerns
* internal/web3ext: nit
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>