The original CrawlIterator on the discv4 path generated FINDNODE targets
by grinding random pubkeys until their Keccak256 had a specific top-N-bit
prefix matching a per-call rotation index, then sending them. The aim was
to anchor each peer's response to a different /16 region of the global
keyspace.

Empirically (3 x 5-minute runs against mainnet bootnodes):

    mode            total (mean ± std)    mainnet (mean ± std)
    fast (grind)    5714 ± 117            549 ± 33
    fast-random     5306 ± 366            521 ± 124

The means are within roughly 1σ of each other (totals differ by 408
against a fast-random std of 366; mainnet counts by 28 against 124).
The grind's only measurable benefit is reduced run-to-run variance, not
higher yield. For long-running curated crawls (the production use case
for cmd/devp2p) the variance amortises away, so the simplification is
worth taking.

Replace the grind with a plain crand.Read on the v4 target, and drop
the randomTargetWithPrefix and log2Pow2 helpers along with the v4-side
prefix-bit math from withDefaults. Drange becomes a v5-only knob and
its doc is updated to say so; the power-of-two requirement is gone.
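
For illustration, a minimal sketch of what an un-ground v4 target can
look like. The pubkey type and the randomTarget name are hypothetical,
and the 64-byte size is an assumption about the discv4 target encoding;
this is not the code in the change itself.

```go
package main

import (
	crand "crypto/rand"
	"fmt"
)

// pubkey is a hypothetical stand-in for the discv4 FINDNODE target type,
// assumed here to be a 64-byte encoded public key.
type pubkey [64]byte

// randomTarget fills a target with uniformly random bytes. There is no
// grinding: exactly one read from the system CSPRNG per call.
func randomTarget() (pubkey, error) {
	var t pubkey
	_, err := crand.Read(t[:])
	return t, err
}

func main() {
	t, err := randomTarget()
	if err != nil {
		panic(err)
	}
	fmt.Printf("target prefix: %x\n", t[:8])
}
```

Each call costs a single CSPRNG read, so target generation no longer
depends on Drange at all.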

discv5 is unchanged: it uses native distance rotation, not target
hashes, and was never affected by the grind.

Add an enode.Iterator that drives discovery by issuing a single
FINDNODE per discovered peer, rotating the target through Drange
sub-regions of the keyspace. Compared to RandomNodes (which wraps an
alpha=3 Kademlia lookup that converges on a single target), this
shape is geared for breadth: each peer is asked about a different
slice of the keyspace, so aggregate coverage grows quickly without
per-peer overlap.

The two protocols expose different FINDNODE primitives, so the
iterator threads a per-protocol queryFn:

  * discv5 takes a list of distances natively, so we just pass
    [256-d] for d in 0..Drange-1.
  * discv4 takes a target NodeID and replies with the K closest. To
    get an equivalent rotation, we pick a random pubkey whose
    Keccak256 starts with the desired prefix nibble. With Drange=16
    that's ~16 random draws per call -- negligible compared to the
    network round trip.
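
The discv4 prefix grind described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the code
in the change: grindTarget and topBits are hypothetical names, the key
is assumed to be 64 bytes, and SHA-256 stands in for Keccak256 because
Keccak256 is not in the Go standard library.

```go
package main

import (
	crand "crypto/rand"
	"crypto/sha256"
	"fmt"
)

// topBits returns the top `bits` bits (1..8) of the hash of key.
// SHA-256 is a stand-in here; the text above uses Keccak256.
func topBits(key [64]byte, bits uint) byte {
	h := sha256.Sum256(key[:])
	return h[0] >> (8 - bits)
}

// grindTarget draws random keys until the hash prefix equals `prefix`.
// It also reports how many draws were needed; the expected number is
// 2^bits, i.e. about 16 for a 4-bit (nibble) prefix.
func grindTarget(prefix byte, bits uint) (key [64]byte, draws int, err error) {
	for {
		draws++
		if _, err = crand.Read(key[:]); err != nil {
			return key, draws, err
		}
		if topBits(key, bits) == prefix {
			return key, draws, nil
		}
	}
}

func main() {
	const drange = 16 // one nibble of prefix, as in the Drange=16 case
	total := 0
	for region := 0; region < drange; region++ {
		_, draws, err := grindTarget(byte(region), 4)
		if err != nil {
			panic(err)
		}
		total += draws
	}
	fmt.Printf("mean draws per target: %.1f\n", float64(total)/drange)
}
```

The draw count is geometrically distributed, so individual calls vary
widely around the mean even though the average stays near 16.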

Concurrency is bounded by Workers (default 16). There is intentionally
no rate limit: pacing is RTT-driven, at roughly Workers/RTT requests
per second on the wire.

Termination is implicit: when the work queue is empty AND no FINDNODE
is in flight, the iterator closes its output and Next returns false.
Close() short-circuits this for callers that want to bail early.
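
The termination rule can be sketched with a bounded worker pool as
below. This is a simplified illustration, not the iterator itself: the
crawl function and its query callback are hypothetical, node IDs are
plain strings, results are returned as a slice rather than streamed
through Next, and there is no Close().

```go
package main

import (
	"fmt"
	"sync"
)

// crawl visits every node reachable from seeds. query stands in for one
// FINDNODE round trip and returns the peers learned from a node.
func crawl(seeds []string, workers int, query func(string) []string) []string {
	var (
		mu       sync.Mutex
		cond     = sync.NewCond(&mu)
		queue    []string
		seen     = make(map[string]bool)
		inflight int
		found    []string
	)
	// enqueue adds unseen IDs to the work queue. Caller must hold mu.
	enqueue := func(ids []string) {
		for _, id := range ids {
			if !seen[id] {
				seen[id] = true
				queue = append(queue, id)
				found = append(found, id)
			}
		}
	}
	mu.Lock()
	enqueue(seeds)
	mu.Unlock()

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			for {
				// Idle workers wait while other queries may still add work.
				for len(queue) == 0 && inflight > 0 {
					cond.Wait()
				}
				// Queue empty and nothing in flight: the crawl is done.
				if len(queue) == 0 {
					cond.Broadcast()
					return
				}
				id := queue[0]
				queue = queue[1:]
				inflight++
				mu.Unlock()
				peers := query(id) // the only step run without the lock
				mu.Lock()
				inflight--
				enqueue(peers)
				cond.Broadcast()
			}
		}()
	}
	wg.Wait()
	return found
}

func main() {
	graph := map[string][]string{"a": {"b", "c"}, "b": {"d"}, "c": {"a"}}
	nodes := crawl([]string{"a"}, 16, func(id string) []string { return graph[id] })
	fmt.Println("discovered", len(nodes), "nodes")
}
```

Checking both conditions under one lock is what makes the exit safe: a
worker that sees an empty queue cannot quit while another query might
still enqueue new peers.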

Adapts the algorithm from github.com/cskiraly/fast-ethereum-crawler
(dcrawl.nim) -- the prefix-rotation idea -- but drops its 1000 req/s
rate limit in favour of the bounded worker pool.