Two related changes to CrawlIterator:
(1) Add a file-level commentary block explaining why the iterator uses a
FIFO queue (BFS over the FINDNODE-response graph) and what it is *not*
suitable for (target-directed lookup -- use RandomNodes() / the alpha=3
lookup iterator for that). The choice was inherited from dcrawl.nim
without explicit reasoning; making it visible avoids future readers
re-deriving the survey-vs-lookup distinction.
The BFS rationale is two-fold:
- Coverage: BFS visits peers in order of hop distance from the seeds,
so a time-bounded run produces a representative sample of the
reachable graph rather than a deep tendril through one sub-region.
- Adversarial resilience: peers that return malicious "neighbour"
claims, dead-end peers, and eclipse-style sub-graphs cannot
monopolise the worker pool, because pending work from other branches
sits ahead of the attacker's responses in the queue. DFS would
amplify each of these attacks.
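The queue discipline above can be illustrated with a minimal,
single-threaded sketch. It assumes a toy in-memory map standing in for
FINDNODE responses; the names bfsOrder and graph are hypothetical and
not part of this change.

```go
package main

import "fmt"

// bfsOrder walks a toy neighbour graph breadth-first from the seeds.
// The map stands in for FINDNODE responses (an assumption for this
// sketch); the real iterator queries the network instead.
func bfsOrder(graph map[string][]string, seeds []string) []string {
	seen := make(map[string]bool)
	var queue, order []string
	for _, s := range seeds {
		if !seen[s] {
			seen[s] = true
			queue = append(queue, s)
		}
	}
	for len(queue) > 0 {
		n := queue[0] // pop the FIFO front
		queue = queue[1:]
		order = append(order, n)
		for _, nb := range graph[n] {
			if !seen[nb] {
				seen[nb] = true
				queue = append(queue, nb) // new work goes to the back
			}
		}
	}
	return order
}

func main() {
	graph := map[string][]string{
		"seed": {"a", "evil"},
		"a":    {"b"},
		// "evil" advertises a chain made only of its own nodes.
		"evil": {"e1"},
		"e1":   {"e2"},
		"e2":   {"e3"},
	}
	fmt.Println(bfsOrder(graph, []string{"seed"}))
}
```

In this toy graph the honest node "b" is visited before any of the
"e1", "e2", "e3" chain, because it was queued before the chain was
discovered. A DFS would follow the chain to its end first.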
(2) Add a RandomWorkers field to CrawlOptions. Of the Workers-sized
worker pool, the first (Workers - RandomWorkers) workers pop the FIFO
front (BFS), while RandomWorkers workers pop a uniform-random queue
index via swap-and-pop (O(1)). Total worker count is unchanged.
Default RandomWorkers = Workers / 4 (4 of 16 with the default
parallelism). At this ratio:
- Cold-start cost is negligible: 12 of 16 workers still drain FIFO,
so the first ~1s of a fresh crawl behaves like pure BFS.
- 25% of pops break strict FIFO ordering, providing a mild
anti-fingerprint defence against an attacker who could otherwise
predict our processing order from the contents of their own
FINDNODE responses.
Operators can override per-run via the new --random-workers CLI flag
on `devp2p discv4 crawl` and `devp2p discv5 crawl`. A negative value
forces pure BFS; a positive value selects an explicit count.
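The flag semantics can be sketched as a small resolver. The function
name resolveRandomWorkers is hypothetical. Two behaviours are
assumptions not stated above: that zero means "use the default", and
that a count above Workers is clamped.

```go
package main

import "fmt"

// resolveRandomWorkers is a hypothetical helper that maps the requested
// value to the number of random-pop workers.
func resolveRandomWorkers(workers, requested int) int {
	switch {
	case requested < 0:
		return 0 // negative: pure BFS, every worker pops the FIFO front
	case requested == 0:
		return workers / 4 // assumption: zero selects the default ratio
	case requested > workers:
		return workers // assumption: clamp to the pool size
	default:
		return requested // positive: explicit count
	}
}

func main() {
	const workers = 16
	for _, req := range []int{-1, 0, 8, 64} {
		r := resolveRandomWorkers(workers, req)
		fmt.Printf("requested=%d random=%d fifo=%d\n", req, r, workers-r)
	}
}
```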
The new TestCrawlIteratorRandomWorkers covers four pop-policy
configurations (all-fifo, all-random, half-half, default) and
asserts the iterator still terminates and emits each node exactly
once in each.
The original CrawlIterator on the discv4 path generated FINDNODE targets
by grinding random pubkeys until their Keccak256 had a specific top-N-bit
prefix matching a per-call rotation index, then sending them. The aim was
to anchor each peer's response to a different /16 region of the global
keyspace.
Empirically (3 x 5-minute runs against mainnet bootnodes):

  mode           total (mean ± std)   mainnet (mean ± std)
  fast (grind)   5714 ± 117           549 ± 33
  fast-random    5306 ± 366           521 ± 124

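The means agree to within roughly 1σ of the noisier (fast-random)
runs: the totals differ by 408 against a std of 366, the mainnet
counts by 28 against a std of 124. The grind's only measurable benefit
is reduced run-to-run variance, not higher yield. For long-running
curated crawls (the production use case for cmd/devp2p) the variance
amortises away, so the simplification is worth taking.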
Replace the grind with a plain crand.Read on the v4 target, drop the
randomTargetWithPrefix helper, log2Pow2 helper, and the v4-side
prefix-bit math from withDefaults. Drange becomes a v5-only knob and
its doc is updated to say so; the power-of-two requirement is gone.
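A minimal sketch of the replacement target generation. It assumes the
v4 target is a 64-byte encoded public key, and the helper name
randomTarget is hypothetical.

```go
package main

import (
	crand "crypto/rand"
	"fmt"
)

// randomTarget fills a 64-byte buffer from the system CSPRNG.
// The 64-byte size is an assumption made for this sketch.
func randomTarget() ([64]byte, error) {
	var t [64]byte
	_, err := crand.Read(t[:])
	return t, err
}

func main() {
	t, err := randomTarget()
	if err != nil {
		panic(err)
	}
	fmt.Printf("target prefix: %x\n", t[:8])
}
```

There is no loop and no hashing: one read per FINDNODE, with no
constraint on which keyspace region the target lands in.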
discv5 is unchanged: it uses native distance rotation, not target
hashes, and was never affected by the grind.
Add an enode.Iterator that drives discovery by issuing a single
FINDNODE per discovered peer, rotating the target through Drange
sub-regions of the keyspace. Compared to RandomNodes (which wraps an
alpha=3 Kademlia lookup that converges on a single target), this
shape is geared for breadth: each peer is asked about a different
slice of the keyspace, so aggregate coverage grows quickly without
per-peer overlap.
The two protocols expose different FINDNODE primitives, so the
iterator threads a per-protocol queryFn:
* discv5 takes a list of distances natively, so we just pass
[256-d] for d in 0..Drange-1.
* discv4 takes a target NodeID and replies with the K closest nodes.
To get an equivalent rotation, we pick a random pubkey whose
Keccak256 starts with the desired prefix nibble. With Drange=16
that takes ~16 random draws per call on average -- negligible
compared to the network round trip.
Concurrency is bounded by Workers (default 16). There is intentionally
no rate limit: pacing is RTT-driven, at roughly Workers/RTT requests
per second on the wire (for illustration, about 160 req/s with 16
workers at a 100 ms RTT).
Termination is implicit: when the work queue is empty AND no FINDNODE
is in flight, the iterator closes its output and Next returns false.
Close() short-circuits this for callers that want to bail early.
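A minimal sketch of this termination rule, assuming a mutex-guarded
queue, an in-flight counter, and a condition variable. The crawl
function and its query callback are hypothetical stand-ins, and early
Close() handling is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// crawl emits every reachable node once and closes the channel when the
// queue is empty and no query is in flight. query stands in for FINDNODE.
func crawl(seeds []string, workers int, query func(string) []string) <-chan string {
	out := make(chan string)
	var (
		mu       sync.Mutex
		cond     = sync.NewCond(&mu)
		queue    []string
		seen     = make(map[string]bool)
		inflight int
		wg       sync.WaitGroup
	)
	for _, s := range seeds {
		if !seen[s] {
			seen[s] = true
			queue = append(queue, s)
		}
	}
	worker := func() {
		defer wg.Done()
		for {
			mu.Lock()
			for len(queue) == 0 && inflight > 0 {
				cond.Wait() // a query in flight may still add work
			}
			if len(queue) == 0 { // inflight is 0 too: no work can arrive
				mu.Unlock()
				cond.Broadcast() // let the other idle workers exit
				return
			}
			n := queue[0]
			queue = queue[1:]
			inflight++
			mu.Unlock()

			out <- n
			found := query(n)

			mu.Lock()
			for _, nb := range found {
				if !seen[nb] {
					seen[nb] = true
					queue = append(queue, nb)
				}
			}
			inflight--
			mu.Unlock()
			cond.Broadcast()
		}
	}
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go worker()
	}
	go func() {
		wg.Wait()
		close(out) // implicit termination: all workers found nothing left
	}()
	return out
}

func main() {
	graph := map[string][]string{
		"seed": {"a", "b"},
		"a":    {"b", "c"},
		"b":    {"seed"},
	}
	count := 0
	for range crawl([]string{"seed"}, 4, func(n string) []string { return graph[n] }) {
		count++
	}
	fmt.Println("nodes emitted:", count)
}
```

Checking the queue alone would not be enough: a worker that sees an
empty queue while another query is still pending has to wait, because
that response may enqueue new peers.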
Adapts the algorithm from github.com/cskiraly/fast-ethereum-crawler
(dcrawl.nim) -- the prefix-rotation idea -- but drops its 1000 req/s
rate limit in favour of the bounded worker pool.