Two related changes to CrawlIterator:
(1) Add a file-level commentary block explaining why the iterator uses a
FIFO queue (BFS over the FINDNODE-response graph) and what it is *not*
suitable for (target-directed lookup -- use RandomNodes() / the alpha=3
lookup iterator for that). The choice was inherited from dcrawl.nim
without explicit reasoning; making it visible avoids future readers
re-deriving the survey-vs-lookup distinction.
The BFS rationale is two-fold:
- Coverage: BFS reaches every peer within N hops of the seeds in
order, so a time-bounded run produces a representative sample of the
reachable graph rather than a deep tendril through one sub-region.
- Adversarial resilience: a peer returning malicious "neighbour"
claims, dead-end peers, or eclipse-style sub-graphs cannot
monopolise the worker pool, because pending work from other branches
sits ahead of the attacker's responses in the queue. DFS would
amplify each of these attacks.
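For illustration only (this is not the CrawlIterator code), a minimal sketch of the BFS ordering described above. The `neighbours` map is a hypothetical stand-in for FINDNODE responses, and `bfsOrder` is a name made up for this example.

```go
package main

import "fmt"

// bfsOrder visits every node reachable from seeds in breadth-first order.
// neighbours maps a node to the peers it reports (a toy FINDNODE response).
func bfsOrder(seeds []string, neighbours map[string][]string) []string {
	seen := make(map[string]bool)
	var queue, order []string
	for _, s := range seeds {
		if !seen[s] {
			seen[s] = true
			queue = append(queue, s)
		}
	}
	for len(queue) > 0 {
		n := queue[0] // pop the FIFO front
		queue = queue[1:]
		order = append(order, n)
		for _, p := range neighbours[n] {
			if !seen[p] { // each node is queued, and so emitted, at most once
				seen[p] = true
				queue = append(queue, p)
			}
		}
	}
	return order
}

func main() {
	graph := map[string][]string{
		"seed": {"a", "evil"},
		"a":    {"b"},
		"evil": {"e1", "e2", "e3"}, // a peer advertising its own sub-graph
		"e1":   {"e4"},
	}
	fmt.Println(bfsOrder([]string{"seed"}, graph))
}
```

In this toy graph the honest two-hop peer `b` is processed before the nodes advertised by `evil`, because it entered the queue first. A depth-first traversal that reached `evil` first would walk e1..e4 before returning to the other branch.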
(2) Add a RandomWorkers field to CrawlOptions. Of the Workers-sized
worker pool, the first (Workers - RandomWorkers) workers pop the FIFO
front (BFS), while RandomWorkers workers pop a uniform-random queue
index via swap-and-pop (O(1)). Total worker count is unchanged.
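A sketch of the two pop policies, assuming a plain slice-backed queue. The `queue` type and its method names are hypothetical, and the random source is injected as a function so the example stays deterministic under test.

```go
package main

import (
	"fmt"
	"math/rand"
)

// queue is a hypothetical stand-in for the iterator's pending-node queue.
type queue struct {
	items []string
}

// popFront removes and returns the oldest item (FIFO, i.e. BFS order).
func (q *queue) popFront() (string, bool) {
	if len(q.items) == 0 {
		return "", false
	}
	it := q.items[0]
	q.items = q.items[1:]
	return it, true
}

// popRandom removes and returns the item at a random index in O(1):
// the chosen slot is overwritten with the last item and the slice is
// shortened by one. intn(n) must return an index in [0, n).
func (q *queue) popRandom(intn func(int) int) (string, bool) {
	n := len(q.items)
	if n == 0 {
		return "", false
	}
	i := intn(n)
	it := q.items[i]
	q.items[i] = q.items[n-1] // swap the last item into the hole
	q.items = q.items[:n-1]   // and pop the tail
	return it, true
}

func main() {
	q := &queue{items: []string{"n1", "n2", "n3", "n4", "n5"}}
	front, _ := q.popFront()
	random, _ := q.popRandom(rand.Intn)
	fmt.Println(front, random, len(q.items))
}
```

Swap-and-pop moves the tail element into the vacated slot, so each random pop also perturbs the FIFO position of one other pending item. That is the price of avoiding an O(n) shift.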
Default RandomWorkers = Workers / 4 (4 of 16 with the default
parallelism). At this ratio:
- Cold-start cost is negligible: 12 of 16 workers still drain FIFO,
so the first ~1s of a fresh crawl behaves like pure BFS.
- 25% of pops break strict FIFO ordering, providing a mild
anti-fingerprint defence against an attacker who could otherwise
predict our processing order from the contents of their own
FINDNODE responses.
Operators can override per-run via the new --random-workers CLI flag
on `devp2p discv4 crawl` and `discv5 crawl`. A negative value forces
pure BFS; a positive value selects an explicit count.
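The default and override rules can be sketched as one small function. The function name is hypothetical. Two points are assumptions not stated above: that zero means "use the default", and that a request larger than Workers is clamped to Workers.

```go
package main

import "fmt"

// randomWorkers resolves how many of the workers pop at random.
// Hypothetical helper; not the actual CrawlOptions handling.
func randomWorkers(workers, requested int) int {
	switch {
	case requested < 0:
		return 0 // negative: pure BFS
	case requested == 0:
		return workers / 4 // assumption: zero selects the default ratio
	case requested > workers:
		return workers // assumption: never exceed the pool size
	default:
		return requested // positive: explicit count
	}
}

func main() {
	for _, req := range []int{-1, 0, 8} {
		fmt.Printf("requested=%d random=%d fifo=%d\n",
			req, randomWorkers(16, req), 16-randomWorkers(16, req))
	}
}
```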
The new TestCrawlIteratorRandomWorkers covers four pop-policy
configurations (all-fifo, all-random, half-half, default) and
asserts the iterator still terminates and emits each node exactly
once in each.
Wire the new discover.CrawlIterator into devp2p discv4/discv5 crawl
behind a --mode flag (default 'lookup', i.e. existing behaviour).
    devp2p discv4 crawl --mode=fast --timeout 30s nodes.json
    devp2p discv5 crawl --mode=fast --timeout 30s nodes.json
Smoke test against mainnet bootnodes for 30s on a residential link
yields ~2.4x more nodes under --mode=fast (587 vs 240 in one run),
with the new per-tick LogDist log showing a much more uniform
distribution of query distances. Workers default to the existing
--parallel value (16); pacing is RTT-driven.
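For readers unfamiliar with the distance metric behind the LogDist log: the logarithmic distance between two node IDs is the bit length of their XOR. A self-contained sketch follows; the `ID` type and `logDist` name are local to this example and mirror, but do not import, the enode package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ID is a 256-bit node identifier (local to this sketch).
type ID [32]byte

// logDist returns the bit length of a XOR b: 0 for identical IDs,
// 256 when the most significant bits differ.
func logDist(a, b ID) int {
	lz := 0 // leading zero bits of a XOR b
	for i := range a {
		x := a[i] ^ b[i]
		if x == 0 {
			lz += 8
			continue
		}
		lz += bits.LeadingZeros8(x)
		break
	}
	return len(a)*8 - lz
}

func main() {
	var a, near, far ID
	near[31] = 0x01 // differs from a only in the lowest bit
	far[0] = 0x80   // differs from a in the highest bit
	fmt.Println(logDist(a, a), logDist(a, near), logDist(a, far))
}
```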
The 'lookup' default keeps existing behaviour byte-identical for any
operator who already runs devp2p discv4 crawl from a script.
Fixes #31093
Here we add some API functions on the UDPv5 object for the purpose of implementing
the Portal Network JSON-RPC API in the shisui client.
---------
Signed-off-by: Chen Kai <281165273grape@gmail.com>
This PR makes the tool use the --bootnodes list as the input to devp2p crawl.
The flag will take effect if the input/output.json file is missing or empty.
The new flag allows configuring an explicit endpoint which is to be
announced in the DHT. This feature was originally developed for the
discv5 wormhole experiment (#25798), but it's useful in other contexts
as well.
This change updates our urfave/cli dependency to the v2 branch of the library.
There are some Go API changes in cli v2:
- Flag values can now be accessed using the methods ctx.Bool,
ctx.Int, ctx.String, ... regardless of whether the flag is 'local' or
'global'.
- v2 has built-in support for flag categories. Our home-grown category
system is removed and the categories of flags are assigned as part of
the flag definition.
For users, there is only one observable difference with cli v2: flags must now
strictly appear before regular arguments. For example, the following command is
now invalid:
    geth account import mykey.json --password file.txt
Instead, the command must be invoked as follows:
    geth account import --password file.txt mykey.json
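The ordering rule can be reproduced with Go's standard `flag` package alone, which stops parsing at the first non-flag argument. This sketch assumes cli v2 inherits that behaviour from `flag.FlagSet`. `parseImport` is a hypothetical helper, not geth code.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseImport parses args with a flag set that knows only --password.
// It returns the flag value and the remaining positional arguments.
func parseImport(args []string) (password string, rest []string, err error) {
	fs := flag.NewFlagSet("import", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output out of the example
	fs.StringVar(&password, "password", "", "password file")
	err = fs.Parse(args)
	return password, fs.Args(), err
}

func main() {
	// Flag after the positional argument: parsing stops at mykey.json.
	pw, rest, _ := parseImport([]string{"mykey.json", "--password", "file.txt"})
	fmt.Printf("password=%q args=%q\n", pw, rest)

	// Flag before the positional argument: parsed as intended.
	pw, rest, _ = parseImport([]string{"--password", "file.txt", "mykey.json"})
	fmt.Printf("password=%q args=%q\n", pw, rest)
}
```

In the first call the parser itself reports no error: the password stays empty and `--password file.txt` is left among the positional arguments. How the command then reports the problem is up to the application and is not shown here.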
TAP is a text format for test results. Parsers for it are available in many languages,
making it easy to consume. I want TAP output from our protocol tests because the
Hive wrapper around them needs to know about the test names and their individual
results and logs. It would also be possible to just write this info as JSON, but I don't
want to invent a new format.
This also improves the normal console output for tests (when running without --tap).
It now prints -- RUN lines before any output from the test, and indents the log output
by one space.
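To show what a consumer such as the Hive wrapper would read, here is a minimal sketch of a TAP emitter. It assumes TAP version 13 with the plan line up front. The `result` type and `formatTAP` function are hypothetical and are not the code added here.

```go
package main

import "fmt"

// result is a hypothetical record of one finished test.
type result struct {
	Name   string
	Failed bool
}

// formatTAP renders results as TAP version 13: a version line, a plan
// line "1..N", then one "ok" or "not ok" line per test, numbered from 1.
func formatTAP(results []result) string {
	out := "TAP version 13\n"
	out += fmt.Sprintf("1..%d\n", len(results))
	for i, r := range results {
		status := "ok"
		if r.Failed {
			status = "not ok"
		}
		out += fmt.Sprintf("%s %d - %s\n", status, i+1, r.Name)
	}
	return out
}

func main() {
	fmt.Print(formatTAP([]result{
		{Name: "Ping"},
		{Name: "Findnode", Failed: true},
	}))
}
```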
This adds an implementation of the current discovery v5 spec.
There is full integration with cmd/devp2p and enode.Iterator in this
version. In theory we could enable the new protocol as a replacement of
discovery v4 at any time. In practice, there will likely be a few more
changes to the spec and implementation before this can happen.