handleChainHead resolves the head block via GetBlock(hash, number) so
that a stale head event after a reorg cannot credit transactions from
the wrong block. The existing mockChain ignored the hash argument, so a
regression to GetBlockByNumber would have gone undetected.
Make mockChain hash-aware: store blocks keyed by hash with a separate
canonical-by-number index for the finalization path, and have sendHead
emit the real block's hash. Add TestReorgSafety with two blocks at the
same height to exercise the hash selector directly.
The original assertions read stats["peerA"].Finalized and .RecentIncluded,
both of which return zero for a missing key — so the test would pass even
if NotifyAccepted were a complete no-op, contradicting its stated purpose.
Assert that exactly one peer entry exists, peerA is present, and the
internal txs map and FIFO order slice are populated as NotifyAccepted is
meant to do.
Rename eth/dropper/protected to eth/dropper/skipped and mark it on
every skip (fast-path headroom, all candidates un-droppable, or
protection emptied the list). Add eth/dropper/skipped_protected to
count the subset of skips where at least one otherwise-droppable
peer was kept only because of inclusion protection.
The pair lets operators see both the total churn-miss rate and how
often peer protection specifically is the cause.
When neither the dialed nor inbound peer pool is close to capacity,
every non-trusted/non-static peer is already marked do-not-drop by
the pool-threshold rules in selectDoNotDrop, so the droppable set is
guaranteed empty regardless of inclusion protection.
Return early in that case to avoid the wasted peerStatsFunc call,
per-direction split, and per-category sort in protectedPeers.
Fix goimports violations: align callback field comments in
tx_fetcher.go, sort import of eth/txtracker after eth/protocols
in handler.go. Add missing onAccepted parameter (nil) to the
txfetcher fuzzer.
Tests used time.Sleep(50ms) to wait for async chain head processing,
making the suite slow (~2s) and flaky under CI load.
Add a step channel (buffered 1) to the Tracker, sent after each
event in the loop. Tests wait on the channel with a 1s timeout.
Suite now completes in <10ms.
NotifyPeerDrop deleted t.peers[peer] but left t.txs entries pointing
to that peer. When those txs later finalized, checkFinalization
recreated the peer entry, and the EMA loop decayed it forever.
Fix: create peer entries in NotifyAccepted (when txs are first
accepted), not in handleChainHead or checkFinalization. Both chain
event handlers now skip peers with no entry — disconnected peers
whose entries were deleted by NotifyPeerDrop stay deleted.
order = order[1:] reslices without releasing the backing array.
After N total insertions the array retains N hashes (32 bytes each)
but only the last maxTracked are live. On a long-running node
processing ~100 txs/s this leaks ~275 MB/day.
Compact by copying to a fresh array when capacity exceeds 2×maxTracked.
Remove the custom topN generic function. Use slices.SortedFunc
(creates a sorted copy from an iterator) + slices.DeleteFunc (filters
score <= 0) from the standard library. No custom generics needed.
Replace peerWithStats wrapper, manual slice copying, and protectTopN
closure with a generic topN[T] function that sorts by score and
returns top elements. protectedPeers now works directly with
[]*p2p.Peer slices, building per-category score functions that close
over the stats map.
Compute the protected peer set once in dropRandomPeer via
protectedPeers(), then include protection as a condition in
selectDoNotDrop alongside trusted/static/recent checks. This
eliminates the separate filterProtectedPeers post-pass and the
awkward "all protected → skip" branch.
Rename filterProtectedPeers to protectedPeers, returning
map[*p2p.Peer]bool instead of filtering a slice. The map is
checked directly in selectDoNotDrop via protected[p].
enqueueAndTrack used pool.Has() after Enqueue to determine accepted
txs. Under concurrent delivery of the same tx from two peers, both
could see Has()==true, making attribution non-deterministic.
Add an onAccepted callback to the fetcher, called from Enqueue with
(peer, acceptedHashes) immediately after pool.Add returns for each
batch. Attribution happens atomically inside Enqueue using the per-tx
error from addTxs (nil = accepted), before another goroutine can
race.
Remove the enqueueAndTrack helper from handler_eth.go — the fetcher
now handles notification directly.
protectTopN used maxPeers (configured capacity) to compute the
number of peers to protect. With small droppable sets this could
protect everyone, permanently disabling churn.
Use len(entries) (current droppable count in each category) instead.
With 20 droppable dialed peers and 10% fraction, 2 are protected.
With 3 droppable peers, 0 are protected — churn is never blocked.
Peer stats were never pruned, so the peers map grew with every peer
ever seen. The EMA decay loop and stats copy iterated all historical
peers on every block/query.
Add NotifyPeerDrop(peer) that deletes the peer's stats entry. Called
from handler.unregisterPeer alongside txFetcher.Drop.
handleChainHead fetched the block by number only. If the tracker
goroutine lagged and that height was reorged before processing,
the EMA was computed from the wrong canonical block.
Use GetBlock(hash, number) with the header hash from the event to
fetch the exact block the event refers to, not whatever is currently
canonical at that height.
NotifyReceived was called before pool validation, allowing a peer
to claim deliverer credit by replaying already-included txs or
sending invalid packets.
Rename to NotifyAccepted (takes hashes, not full txs). Call it from
a new enqueueAndTrack helper in handler_eth.go that runs after
Enqueue and checks pool.Has to identify accepted txs. Only accepted
txs are credited to the delivering peer.
lastFinalNum started at 0, so the first checkFinalization after
startup iterated from block 1 to the current finalized head (~20M
blocks on mainnet) under the mutex, stalling the tracker and
potentially awarding bogus credit for ancient txs whose hashes
happened to match recently-received ones.
Seed lastFinalNum from chain.CurrentFinalBlock() in Start() so only
blocks finalized after startup are processed.
txtracker tests (7 tests):
- NotifyReceived: stats empty before chain events
- InclusionEMA: EMA increases on inclusion, decays on empty blocks
- Finalization: Finalized counter credited after finalization
- MultiplePeers: each peer credited for own txs only
- FirstDelivererWins: duplicate delivery ignored
- NoFinalizationCredit: no credit without finalization
- EMADecay: EMA approaches zero after 30 empty blocks
dropper tests (6 tests):
- FilterProtectedNoStats: nil stats → all droppable
- FilterProtectedEmptyStats: empty map → all droppable
- FilterProtectedTopPeer: top-scored peers removed from droppable
- FilterProtectedZeroScore: zero scores → no protection
- FilterProtectedOverlap: peer top in both categories → counted once
- FilterProtectedAllProtected: all droppable protected → empty list
Also fix: create peer entries during EMA update for peers with
inclusions in the current block (previously only created during
finalization, so EMA was not tracked before first finalization).
Expand the txtracker package doc to describe the tracking flow
(NotifyReceived → chain head → finalization → peer credit) and its
role as stats provider for the dropper.
Rewrite the dropper struct comment to document the full behavior
including the inclusion-based peer protection: two scoring categories
(total finalized + recent EMA), top 10% per pool, union of protected
sets.
Change the long-term protection category from total inclusions to
total finalized inclusions. Finalized txs are harder to game (require
actual block finality, not just inclusion) and represent confirmed
on-chain value.
The recent-inclusion EMA stays on chain head inclusions for
responsiveness — a peer delivering txs that appear in the latest
blocks gets quick protection without waiting for finalization.
The tracker now checks CurrentFinalBlock() on each chain head event
and credits delivering peers for all newly finalized blocks since
the last check.
Minimal txtracker that records which peer delivered each transaction
and credits peers when their transactions appear on chain. Provides
the PeerInclusionStats needed by the dropper's protection logic.
Design:
- NotifyReceived(peer, txs): records deliverer per tx hash (called
from handler_eth.go when tx bodies arrive via P2P)
- Subscribes to ChainHeadEvent, fetches block txs, credits the
delivering peer for each included tx
- Per-peer EMA of recent inclusions (alpha=0.05), updated every block
- LRU eviction at 262K entries to bound memory
- Mutex-based (not channel-based) for simplicity — the hot path
(NotifyReceived) is a fast map insert
Wired into the dropper via an adapter callback in backend.go that
converts txtracker.PeerStats to the dropper's PeerInclusionStats.
The dropper periodically disconnects random peers to create churn.
This was blind to peer quality. Add inclusion-based peer protection
using two categories:
1. Total inclusions: protects peers with the highest cumulative
count of delivered txs that were included on chain
2. Recent inclusions (EMA): protects peers with the best recent
inclusion rate, giving newly productive peers faster protection
Each category independently protects the top 10% of inbound and
top 10% of dialed peers. The union of both sets is protected. Only
peers with positive scores qualify.
The dropper defines its own PeerInclusionStats struct and callback
type (getPeerInclusionStatsFunc) so any stats provider (e.g. a
transaction tracker) can plug in without a package dependency. The
callback is nil by default (protection disabled until wired).
The protectionCategories slice is designed for easy extension —
adding a new category requires only appending a struct with a name,
scoring function, and protection fraction.
This PR refactors the encoding rules for `AccessListsPacket` in the wire
protocol. Specifically:
- The response is now encoded as a list of `rlp.RawValue`
- `rlp.EmptyString` is used as a placeholder for unavailable BAL objects
In this PR, the Database interface in `core/state` has been extended
with one more function:
```go
// Iteratee returns a state iteratee associated with the specified state root,
// through which the account iterator and storage iterator can be created.
Iteratee(root common.Hash) (Iteratee, error)
```
With this additional abstraction layer, the implementation details can be hidden
behind the interface. For example, state traversal can now operate directly on
the flat state for Verkle or binary trees, which do not natively support traversal.
Moreover, state dumping will now prefer using the flat state iterator as
the primary option, offering better efficiency.
Edit: this PR also fixes a tiny issue in the state dump, marshalling the
next field in the correct way.
This is a breaking change in the opcode (structLog) tracer. Several fields
will have a slight formatting difference to conform to the newly established
spec at: https://github.com/ethereum/execution-apis/pull/762. The differences
include:
- `memory`: words will have the 0x prefix. Also last word of memory will be padded to 32-bytes.
- `storage`: keys and values will have the 0x prefix.
---------
Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
In this PR, we add support for protocol version eth/70, defined by EIP-7975.
Overall changes:
- Each response is buffered in the peer’s receipt buffer when the
`lastBlockIncomplete` field is true.
- Continued request uses the same request id of its original
request(`RequestPartialReceipts`).
- Partial responses are verified in `validateLastBlockReceipt`.
- Even if all receipts for partial blocks of the request are collected,
those partial results are not sinked to the downloader, to avoid
complexity. This assumes that partial response and buffering occur only
in exceptional cases.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR introduces a new type HistoryPolicy which captures user intent
as opposed to pruning point stored in the blockchain which persists the
actual tail of data in the database.
It is in preparation for the rolling history expiry feature.
It comes with a semantic change: if database was pruned and geth is
running without a history mode flag (or explicit keep all flag) geth
will emit a warning but continue running as opposed to stopping the
world.
`TestSubscribePendingTxHashes` hangs indefinitely because pending tx
events are permanently missed due to a race condition in
`NewPendingTransactions` (and `NewHeads`). Both handlers called their
event subscription functions (`SubscribePendingTxs`,
`SubscribeNewHeads`) inside goroutines, so the RPC handler returned the
subscription ID to the client before the filter was installed in the
event loop. When the client then sent a transaction, the event fired but
no filter existed to catch it — the event was silently lost.
- Move `SubscribePendingTxs` and `SubscribeNewHeads` calls out of
goroutines so filters are installed synchronously before the RPC
response is sent, matching the pattern already used by `Logs` and
`TransactionReceipts`
<!-- START COPILOT CODING AGENT TIPS -->
---
💬 We'd love your input! Share your thoughts on Copilot coding agent in
our [2 minute survey](https://gh.io/copilot-coding-agent-survey).
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: s1na <1591639+s1na@users.noreply.github.com>
This PR contains two changes:
Firstly, the finalized header will be resolved from local chain if it's
not recently announced via the `engine_newPayload`.
What's more importantly is, in the downloader, originally there are two
code paths to push forward the pivot point block, one in the beacon
header fetcher (`fetchHeaders`), and another one is in the snap content
processer (`processSnapSyncContent`).
Usually if there are new blocks and local pivot block becomes stale, it
will firstly be detected by the `fetchHeaders`. `processSnapSyncContent`
is fully driven by the beacon headers and will only detect the stale pivot
block after synchronizing the corresponding chain segment. I think the
detection here is redundant and useless.
This PR fixes a regression introduced in https://github.com/ethereum/go-ethereum/pull/33836/changes
Before PR 33836, running mainnet would automatically bump the cache size
to 4GB and trigger a cache re-calculation, specifically setting the key-value
database cache to 2GB.
After PR 33836, this logic was removed, and the cache value is no longer
recomputed if no command line flags are specified. The default key-value
database cache is 512MB.
This PR bumps the default key-value database cache size alongside the
default cache size for other components (such as snapshot) accordingly.
I observed failing tests in Hive `engine-withdrawals`:
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=146
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=147
```shell
DEBUG (Withdrawals Fork on Block 2): NextPayloadID before getPayloadV2:
id=0x01487547e54e8abe version=1
>> engine_getPayloadV2("0x01487547e54e8abe")
<< error: {"code":-38005,"message":"Unsupported fork"}
FAIL: Expected no error on EngineGetPayloadV2: error=Unsupported fork
```
The same failure pattern occurred for Block 3.
Per Shanghai engine_getPayloadV2 spec, pre-Shanghai payloads should be
accepted via V2 and returned as ExecutionPayloadV1:
- executionPayload: ExecutionPayloadV1 | ExecutionPayloadV2
- ExecutionPayloadV1 MUST be returned if payload timestamp < Shanghai
timestamp
- ExecutionPayloadV2 MUST be returned if payload timestamp >= Shanghai
timestamp
Reference:
-
https://github.com/ethereum/execution-apis/blob/main/src/engine/shanghai.md#engine_getpayloadv2
Current implementation only allows GetPayloadV2 on the Shanghai fork
window (`[]forks.Fork{forks.Shanghai}`), so pre-Shanghai payloads are
rejected with Unsupported fork.
If my interpretation of the spec is incorrect, please let me know and I
can adjust accordingly.
---------
Co-authored-by: muzry.li <muzry.li1@ambergroup.io>
Eth currently has a flaky test, related to the tx fetcher.
The issue seems to happen when Unsubscribe is called while sub is nil.
It seems that chain.Stop() may be invoked before the loop starts in some
tests, but the exact cause is still under investigation through repeated
runs. I think this change will at least prevent the error.
With this, we are dropping support for protocol version eth/68. The only supported
version is eth/69 now. The p2p receipt encoding logic can be simplified a lot, and
processing of receipts during sync gets a little faster because we now transform
the network encoding into the database encoding directly, without decoding the
receipts first.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
fix the flaky test found in
https://ci.appveyor.com/project/ethereum/go-ethereum/builds/53601688/job/af5ccvufpm9usq39
1. increase the timeout from 3+1s to 15s, and use timer instead of
sleep(in the CI env, it may need more time to sync the 1024 blocks)
2. add `synced.Load()` to ensure the full async chain is finished
Signed-off-by: Delweng <delweng@gmail.com>
Previously, handshake timeouts were recorded as generic peer errors
instead of timeout errors. waitForHandshake passed a raw
p2p.DiscReadTimeout into markError, but markError classified errors only
via errors.Unwrap(err), which returns nil for non-wrapped errors. As a
result, the timeoutError meter was never incremented and all such
failures fell into the peerError bucket.
This change makes markError switch on the base error, using
errors.Unwrap(err) when available and falling back to the original error
otherwise. With this adjustment, p2p.DiscReadTimeout is correctly mapped
to timeoutError, while existing behaviour for the other wrapped sentinel
errors remains unchanged
---------
Co-authored-by: lightclient <lightclient@protonmail.com>