Adds NotifyRequestLatency(peer, latency) and a slow per-peer EMA
(alpha=0.01, ~70-sample half-life) that the dropper will use as a
new protection signal. The first sample seeds the EMA directly so
fresh peers don't ramp up from zero. RequestSamples is exposed
alongside the EMA so consumers can apply a minimum-samples bootstrap
guard before trusting the value.
Includes design notes for the broader peerdrop-latency feature.
The total-finalized protection category ranked peers by a monotonic
cumulative count, so a peer that had been productive in the past kept
a high score forever — even if they had since gone silent — and held
a protected slot without contributing.
Replace txtracker.PeerStats.Finalized (int64 cumulative) with
RecentFinalized (float64 EMA). On each chain head, finalization
credits accumulated over the newly-finalized range are folded into a
slow EMA (alpha=0.0001, half-life ~6930 blocks ≈ 23 hours on 12s
mainnet blocks). Peers that continue contributing keep a high score;
peers that stop decay toward zero over roughly a day.
The dropper category renames to "recent-finalized" accordingly. The
type's docstring is rewritten to describe both categories as EMAs
with different time horizons (slow finalized, fast included).
Refactors checkFinalization to return a per-peer credits map rather
than mutating state directly, so both EMAs update in the same loop
over tracked peers.
PeerInclusionStats was declared identically to txtracker.PeerStats as a
decoupling abstraction: any stats provider could implement the dropper's
callback by returning this shape. In practice there's one provider and
the two types were kept in sync by a rote copy adapter in backend.go.
Delete PeerInclusionStats, have the dropper consume txtracker.PeerStats
directly via getPeerStatsFunc. backend.go now passes
txTracker.GetAllPeerStats as the callback with no adapter.
If a second stats provider ever appears, the abstraction can come back;
until then, one fewer type and 8 fewer lines of ceremony.
The skipped_protected metric (added earlier on this branch) counted the
subset of drop skips where inclusion protection was the cause. The
signal can be inferred from rising dropSkipped rate plus the existing
"Protecting high-value peers" debug log, which wasn't worth the second
metric, the causality-check loop over the protected set, and the
baseNotDrop closure extracted solely to share the predicate.
Collapse baseNotDrop back into selectDoNotDrop and remove the metric.
dropSkipped still fires on every skip (fast-path headroom + all-filtered).
The protection feature promises top-N per inbound/dialed pool, but
every existing test constructed peers via p2p.NewPeer (which produces
no-flag peers), so all test peers landed in the dialed pool and the
per-pool split was never validated.
Extract the selection logic from protectedPeers into a pure helper
protectedPeersByPool(inbound, dialed, stats) that accepts pre-split
pools. This sidesteps the unexported p2p.connFlag types and makes the
interesting behavior directly testable. Add three tests covering:
- exact top-N selected independently in each pool
- cross-category union with overlap deduplication
- per-pool independence: top dialed peers stay protected even when
every inbound peer scores higher globally
handleChainHead resolves the head block via GetBlock(hash, number) so
that a stale head event after a reorg cannot credit transactions from
the wrong block. The existing mockChain ignored the hash argument, so a
regression to GetBlockByNumber would have gone undetected.
Make mockChain hash-aware: store blocks keyed by hash with a separate
canonical-by-number index for the finalization path, and have sendHead
emit the real block's hash. Add TestReorgSafety with two blocks at the
same height to exercise the hash selector directly.
The original assertions read stats["peerA"].Finalized and .RecentIncluded,
both of which return zero for a missing key — so the test would pass even
if NotifyAccepted were a complete no-op, contradicting its stated purpose.
Assert that exactly one peer entry exists, peerA is present, and the
internal txs map and FIFO order slice are populated as NotifyAccepted is
meant to do.
Rename eth/dropper/protected to eth/dropper/skipped and mark it on
every skip (fast-path headroom, all candidates un-droppable, or
protection emptied the list). Add eth/dropper/skipped_protected to
count the subset of skips where at least one otherwise-droppable
peer was kept only because of inclusion protection.
The pair lets operators see both the total churn-miss rate and how
often peer protection specifically is the cause.
When neither the dialed nor inbound peer pool is close to capacity,
every non-trusted/non-static peer is already marked do-not-drop by
the pool-threshold rules in selectDoNotDrop, so the droppable set is
guaranteed empty regardless of inclusion protection.
Return early in that case to avoid the wasted peerStatsFunc call,
per-direction split, and per-category sort in protectedPeers.
Fix goimports violations: align callback field comments in
tx_fetcher.go, sort import of eth/txtracker after eth/protocols
in handler.go. Add missing onAccepted parameter (nil) to the
txfetcher fuzzer.
Tests used time.Sleep(50ms) to wait for async chain head processing,
making the suite slow (~2s) and flaky under CI load.
Add a step channel (buffered 1) to the Tracker, sent after each
event in the loop. Tests wait on the channel with a 1s timeout.
Suite now completes in <10ms.
NotifyPeerDrop deleted t.peers[peer] but left t.txs entries pointing
to that peer. When those txs later finalized, checkFinalization
recreated the peer entry, and the EMA loop decayed it forever.
Fix: create peer entries in NotifyAccepted (when txs are first
accepted), not in handleChainHead or checkFinalization. Both chain
event handlers now skip peers with no entry — disconnected peers
whose entries were deleted by NotifyPeerDrop stay deleted.
order = order[1:] reslices without releasing the backing array.
After N total insertions the array retains N hashes (32 bytes each)
but only the last maxTracked are live. On a long-running node
processing ~100 txs/s this leaks ~275 MB/day.
Compact by copying to a fresh array when capacity exceeds 2×maxTracked.
Remove the custom topN generic function. Use slices.SortedFunc
(creates a sorted copy from an iterator) + slices.DeleteFunc (filters
score <= 0) from the standard library. No custom generics needed.
Replace peerWithStats wrapper, manual slice copying, and protectTopN
closure with a generic topN[T] function that sorts by score and
returns top elements. protectedPeers now works directly with
[]*p2p.Peer slices, building per-category score functions that close
over the stats map.
Compute the protected peer set once in dropRandomPeer via
protectedPeers(), then include protection as a condition in
selectDoNotDrop alongside trusted/static/recent checks. This
eliminates the separate filterProtectedPeers post-pass and the
awkward "all protected → skip" branch.
Rename filterProtectedPeers to protectedPeers, returning
map[*p2p.Peer]bool instead of filtering a slice. The map is
checked directly in selectDoNotDrop via protected[p].
enqueueAndTrack used pool.Has() after Enqueue to determine accepted
txs. Under concurrent delivery of the same tx from two peers, both
could see Has()==true, making attribution non-deterministic.
Add an onAccepted callback to the fetcher, called from Enqueue with
(peer, acceptedHashes) immediately after pool.Add returns for each
batch. Attribution happens atomically inside Enqueue using the per-tx
error from addTxs (nil = accepted), before another goroutine can
race.
Remove the enqueueAndTrack helper from handler_eth.go — the fetcher
now handles notification directly.
protectTopN used maxPeers (configured capacity) to compute the
number of peers to protect. With small droppable sets this could
protect everyone, permanently disabling churn.
Use len(entries) (current droppable count in each category) instead.
With 20 droppable dialed peers and 10% fraction, 2 are protected.
With 3 droppable peers, 0 are protected — churn is never blocked.
Peer stats were never pruned, so the peers map grew with every peer
ever seen. The EMA decay loop and stats copy iterated all historical
peers on every block/query.
Add NotifyPeerDrop(peer) that deletes the peer's stats entry. Called
from handler.unregisterPeer alongside txFetcher.Drop.
handleChainHead fetched the block by number only. If the tracker
goroutine lagged and that height was reorged before processing,
the EMA was computed from the wrong canonical block.
Use GetBlock(hash, number) with the header hash from the event to
fetch the exact block the event refers to, not whatever is currently
canonical at that height.
NotifyReceived was called before pool validation, allowing a peer
to claim deliverer credit by replaying already-included txs or
sending invalid packets.
Rename to NotifyAccepted (takes hashes, not full txs). Call it from
a new enqueueAndTrack helper in handler_eth.go that runs after
Enqueue and checks pool.Has to identify accepted txs. Only accepted
txs are credited to the delivering peer.
lastFinalNum started at 0, so the first checkFinalization after
startup iterated from block 1 to the current finalized head (~20M
blocks on mainnet) under the mutex, stalling the tracker and
potentially awarding bogus credit for ancient txs whose hashes
happened to match recently-received ones.
Seed lastFinalNum from chain.CurrentFinalBlock() in Start() so only
blocks finalized after startup are processed.
txtracker tests (7 tests):
- NotifyReceived: stats empty before chain events
- InclusionEMA: EMA increases on inclusion, decays on empty blocks
- Finalization: Finalized counter credited after finalization
- MultiplePeers: each peer credited for own txs only
- FirstDelivererWins: duplicate delivery ignored
- NoFinalizationCredit: no credit without finalization
- EMADecay: EMA approaches zero after 30 empty blocks
dropper tests (6 tests):
- FilterProtectedNoStats: nil stats → all droppable
- FilterProtectedEmptyStats: empty map → all droppable
- FilterProtectedTopPeer: top-scored peers removed from droppable
- FilterProtectedZeroScore: zero scores → no protection
- FilterProtectedOverlap: peer top in both categories → counted once
- FilterProtectedAllProtected: all droppable protected → empty list
Also fix: create peer entries during EMA update for peers with
inclusions in the current block (previously only created during
finalization, so EMA was not tracked before first finalization).
Expand the txtracker package doc to describe the tracking flow
(NotifyReceived → chain head → finalization → peer credit) and its
role as stats provider for the dropper.
Rewrite the dropper struct comment to document the full behavior
including the inclusion-based peer protection: two scoring categories
(total finalized + recent EMA), top 10% per pool, union of protected
sets.
Change the long-term protection category from total inclusions to
total finalized inclusions. Finalized txs are harder to game (require
actual block finality, not just inclusion) and represent confirmed
on-chain value.
The recent-inclusion EMA stays on chain head inclusions for
responsiveness — a peer delivering txs that appear in the latest
blocks gets quick protection without waiting for finalization.
The tracker now checks CurrentFinalBlock() on each chain head event
and credits delivering peers for all newly finalized blocks since
the last check.
Minimal txtracker that records which peer delivered each transaction
and credits peers when their transactions appear on chain. Provides
the PeerInclusionStats needed by the dropper's protection logic.
Design:
- NotifyReceived(peer, txs): records deliverer per tx hash (called
from handler_eth.go when tx bodies arrive via P2P)
- Subscribes to ChainHeadEvent, fetches block txs, credits the
delivering peer for each included tx
- Per-peer EMA of recent inclusions (alpha=0.05), updated every block
- LRU eviction at 262K entries to bound memory
- Mutex-based (not channel-based) for simplicity — the hot path
(NotifyReceived) is a fast map insert
Wired into the dropper via an adapter callback in backend.go that
converts txtracker.PeerStats to the dropper's PeerInclusionStats.
The dropper periodically disconnects random peers to create churn.
This was blind to peer quality. Add inclusion-based peer protection
using two categories:
1. Total inclusions: protects peers with the highest cumulative
count of delivered txs that were included on chain
2. Recent inclusions (EMA): protects peers with the best recent
inclusion rate, giving newly productive peers faster protection
Each category independently protects the top 10% of inbound and
top 10% of dialed peers. The union of both sets is protected. Only
peers with positive scores qualify.
The dropper defines its own PeerInclusionStats struct and callback
type (getPeerInclusionStatsFunc) so any stats provider (e.g. a
transaction tracker) can plug in without a package dependency. The
callback is nil by default (protection disabled until wired).
The protectionCategories slice is designed for easy extension —
adding a new category requires only appending a struct with a name,
scoring function, and protection fraction.
This PR refactors the encoding rules for `AccessListsPacket` in the wire
protocol. Specifically:
- The response is now encoded as a list of `rlp.RawValue`
- `rlp.EmptyString` is used as a placeholder for unavailable BAL objects
In this PR, the Database interface in `core/state` has been extended
with one more function:
```go
// Iteratee returns a state iteratee associated with the specified state root,
// through which the account iterator and storage iterator can be created.
Iteratee(root common.Hash) (Iteratee, error)
```
With this additional abstraction layer, the implementation details can be hidden
behind the interface. For example, state traversal can now operate directly on
the flat state for Verkle or binary trees, which do not natively support traversal.
Moreover, state dumping will now prefer using the flat state iterator as
the primary option, offering better efficiency.
Edit: this PR also fixes a tiny issue in the state dump, marshalling the
next field in the correct way.
This is a breaking change in the opcode (structLog) tracer. Several fields
will have a slight formatting difference to conform to the newly established
spec at: https://github.com/ethereum/execution-apis/pull/762. The differences
include:
- `memory`: words will have the 0x prefix. Also last word of memory will be padded to 32-bytes.
- `storage`: keys and values will have the 0x prefix.
---------
Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
In this PR, we add support for protocol version eth/70, defined by EIP-7975.
Overall changes:
- Each response is buffered in the peer’s receipt buffer when the
`lastBlockIncomplete` field is true.
- Continued request uses the same request id of its original
request(`RequestPartialReceipts`).
- Partial responses are verified in `validateLastBlockReceipt`.
- Even if all receipts for partial blocks of the request are collected,
those partial results are not sinked to the downloader, to avoid
complexity. This assumes that partial response and buffering occur only
in exceptional cases.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR introduces a new type HistoryPolicy which captures user intent
as opposed to pruning point stored in the blockchain which persists the
actual tail of data in the database.
It is in preparation for the rolling history expiry feature.
It comes with a semantic change: if database was pruned and geth is
running without a history mode flag (or explicit keep all flag) geth
will emit a warning but continue running as opposed to stopping the
world.
`TestSubscribePendingTxHashes` hangs indefinitely because pending tx
events are permanently missed due to a race condition in
`NewPendingTransactions` (and `NewHeads`). Both handlers called their
event subscription functions (`SubscribePendingTxs`,
`SubscribeNewHeads`) inside goroutines, so the RPC handler returned the
subscription ID to the client before the filter was installed in the
event loop. When the client then sent a transaction, the event fired but
no filter existed to catch it — the event was silently lost.
- Move `SubscribePendingTxs` and `SubscribeNewHeads` calls out of
goroutines so filters are installed synchronously before the RPC
response is sent, matching the pattern already used by `Logs` and
`TransactionReceipts`
<!-- START COPILOT CODING AGENT TIPS -->
---
💬 We'd love your input! Share your thoughts on Copilot coding agent in
our [2 minute survey](https://gh.io/copilot-coding-agent-survey).
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: s1na <1591639+s1na@users.noreply.github.com>
This PR contains two changes:
Firstly, the finalized header will be resolved from local chain if it's
not recently announced via the `engine_newPayload`.
What's more importantly is, in the downloader, originally there are two
code paths to push forward the pivot point block, one in the beacon
header fetcher (`fetchHeaders`), and another one is in the snap content
processer (`processSnapSyncContent`).
Usually if there are new blocks and local pivot block becomes stale, it
will firstly be detected by the `fetchHeaders`. `processSnapSyncContent`
is fully driven by the beacon headers and will only detect the stale pivot
block after synchronizing the corresponding chain segment. I think the
detection here is redundant and useless.
This PR fixes a regression introduced in https://github.com/ethereum/go-ethereum/pull/33836/changes
Before PR 33836, running mainnet would automatically bump the cache size
to 4GB and trigger a cache re-calculation, specifically setting the key-value
database cache to 2GB.
After PR 33836, this logic was removed, and the cache value is no longer
recomputed if no command line flags are specified. The default key-value
database cache is 512MB.
This PR bumps the default key-value database cache size alongside the
default cache size for other components (such as snapshot) accordingly.
I observed failing tests in Hive `engine-withdrawals`:
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=146
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=147
```shell
DEBUG (Withdrawals Fork on Block 2): NextPayloadID before getPayloadV2:
id=0x01487547e54e8abe version=1
>> engine_getPayloadV2("0x01487547e54e8abe")
<< error: {"code":-38005,"message":"Unsupported fork"}
FAIL: Expected no error on EngineGetPayloadV2: error=Unsupported fork
```
The same failure pattern occurred for Block 3.
Per Shanghai engine_getPayloadV2 spec, pre-Shanghai payloads should be
accepted via V2 and returned as ExecutionPayloadV1:
- executionPayload: ExecutionPayloadV1 | ExecutionPayloadV2
- ExecutionPayloadV1 MUST be returned if payload timestamp < Shanghai
timestamp
- ExecutionPayloadV2 MUST be returned if payload timestamp >= Shanghai
timestamp
Reference:
-
https://github.com/ethereum/execution-apis/blob/main/src/engine/shanghai.md#engine_getpayloadv2
Current implementation only allows GetPayloadV2 on the Shanghai fork
window (`[]forks.Fork{forks.Shanghai}`), so pre-Shanghai payloads are
rejected with Unsupported fork.
If my interpretation of the spec is incorrect, please let me know and I
can adjust accordingly.
---------
Co-authored-by: muzry.li <muzry.li1@ambergroup.io>
Eth currently has a flaky test, related to the tx fetcher.
The issue seems to happen when Unsubscribe is called while sub is nil.
It seems that chain.Stop() may be invoked before the loop starts in some
tests, but the exact cause is still under investigation through repeated
runs. I think this change will at least prevent the error.