The total-finalized protection category ranked peers by a monotonic
cumulative count, so a peer that had been productive in the past kept
a high score forever — even if they had since gone silent — and held
a protected slot without contributing.
Replace txtracker.PeerStats.Finalized (int64 cumulative) with
RecentFinalized (float64 EMA). On each chain head, finalization
credits accumulated over the newly-finalized range are folded into a
slow EMA (alpha=0.0001, half-life ~6930 blocks ≈ 23 hours on 12s
mainnet blocks). Peers that continue contributing keep a high score;
peers that stop decay toward zero over roughly a day.
The dropper category renames to "recent-finalized" accordingly. The
type's docstring is rewritten to describe both categories as EMAs
with different time horizons (slow finalized, fast included).
Refactors checkFinalization to return a per-peer credits map rather
than mutating state directly, so both EMAs update in the same loop
over tracked peers.
PeerInclusionStats was declared identically to txtracker.PeerStats as a
decoupling abstraction: any stats provider could implement the dropper's
callback by returning this shape. In practice there's one provider and
the two types were kept in sync by a rote copy adapter in backend.go.
Delete PeerInclusionStats, have the dropper consume txtracker.PeerStats
directly via getPeerStatsFunc. backend.go now passes
txTracker.GetAllPeerStats as the callback with no adapter.
If a second stats provider ever appears, the abstraction can come back;
until then, one fewer type and 8 fewer lines of ceremony.
The skipped_protected metric (added earlier on this branch) counted the
subset of drop skips where inclusion protection was the cause. The
signal can be inferred from rising dropSkipped rate plus the existing
"Protecting high-value peers" debug log, which wasn't worth the second
metric, the causality-check loop over the protected set, and the
baseNotDrop closure extracted solely to share the predicate.
Collapse baseNotDrop back into selectDoNotDrop and remove the metric.
dropSkipped still fires on every skip (fast-path headroom + all-filtered).
The protection feature promises top-N per inbound/dialed pool, but
every existing test constructed peers via p2p.NewPeer (which produces
no-flag peers), so all test peers landed in the dialed pool and the
per-pool split was never validated.
Extract the selection logic from protectedPeers into a pure helper
protectedPeersByPool(inbound, dialed, stats) that accepts pre-split
pools. This sidesteps the unexported p2p.connFlag types and makes the
interesting behavior directly testable. Add three tests covering:
- exact top-N selected independently in each pool
- cross-category union with overlap deduplication
- per-pool independence: top dialed peers stay protected even when
every inbound peer scores higher globally
Rename eth/dropper/protected to eth/dropper/skipped and mark it on
every skip (fast-path headroom, all candidates un-droppable, or
protection emptied the list). Add eth/dropper/skipped_protected to
count the subset of skips where at least one otherwise-droppable
peer was kept only because of inclusion protection.
The pair lets operators see both the total churn-miss rate and how
often peer protection specifically is the cause.
When neither the dialed nor inbound peer pool is close to capacity,
every non-trusted/non-static peer is already marked do-not-drop by
the pool-threshold rules in selectDoNotDrop, so the droppable set is
guaranteed empty regardless of inclusion protection.
Return early in that case to avoid the wasted peerStatsFunc call,
per-direction split, and per-category sort in protectedPeers.
Remove the custom topN generic function. Use slices.SortedFunc
(creates a sorted copy from an iterator) + slices.DeleteFunc (filters
score <= 0) from the standard library. No custom generics needed.
Replace peerWithStats wrapper, manual slice copying, and protectTopN
closure with a generic topN[T] function that sorts by score and
returns top elements. protectedPeers now works directly with
[]*p2p.Peer slices, building per-category score functions that close
over the stats map.
Compute the protected peer set once in dropRandomPeer via
protectedPeers(), then include protection as a condition in
selectDoNotDrop alongside trusted/static/recent checks. This
eliminates the separate filterProtectedPeers post-pass and the
awkward "all protected → skip" branch.
Rename filterProtectedPeers to protectedPeers, returning
map[*p2p.Peer]bool instead of filtering a slice. The map is
checked directly in selectDoNotDrop via protected[p].
protectTopN used maxPeers (configured capacity) to compute the
number of peers to protect. With small droppable sets this could
protect everyone, permanently disabling churn.
Use len(entries) (current droppable count in each category) instead.
With 20 droppable dialed peers and 10% fraction, 2 are protected.
With 3 droppable peers, 0 are protected — churn is never blocked.
Expand the txtracker package doc to describe the tracking flow
(NotifyReceived → chain head → finalization → peer credit) and its
role as stats provider for the dropper.
Rewrite the dropper struct comment to document the full behavior
including the inclusion-based peer protection: two scoring categories
(total finalized + recent EMA), top 10% per pool, union of protected
sets.
Change the long-term protection category from total inclusions to
total finalized inclusions. Finalized txs are harder to game (require
actual block finality, not just inclusion) and represent confirmed
on-chain value.
The recent-inclusion EMA stays on chain head inclusions for
responsiveness — a peer delivering txs that appear in the latest
blocks gets quick protection without waiting for finalization.
The tracker now checks CurrentFinalBlock() on each chain head event
and credits delivering peers for all newly finalized blocks since
the last check.
The dropper periodically disconnects random peers to create churn.
This was blind to peer quality. Add inclusion-based peer protection
using two categories:
1. Total inclusions: protects peers with the highest cumulative
count of delivered txs that were included on chain
2. Recent inclusions (EMA): protects peers with the best recent
inclusion rate, giving newly productive peers faster protection
Each category independently protects the top 10% of inbound and
top 10% of dialed peers. The union of both sets is protected. Only
peers with positive scores qualify.
The dropper defines its own PeerInclusionStats struct and callback
type (getPeerInclusionStatsFunc) so any stats provider (e.g. a
transaction tracker) can plug in without a package dependency. The
callback is nil by default (protection disabled until wired).
The protectionCategories slice is designed for easy extension —
adding a new category requires only appending a struct with a name,
scoring function, and protection fraction.
As of now, Geth disconnects peers only on protocol error or timeout,
meaning once connection slots are filled, the peerset is largely fixed.
As mentioned in https://github.com/ethereum/go-ethereum/issues/31321,
Geth should occasionally disconnect peers to ensure some churn.
What/when to disconnect could depend on:
- the state of geth (e.g. sync or not)
- current number of peers
- peer level metrics
This PR adds a very slow churn using a random drop.
---------
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>