Two related changes to CrawlIterator:
(1) Add a file-level commentary block explaining why the iterator uses a
FIFO queue (BFS over the FINDNODE-response graph) and what it is *not*
suitable for (target-directed lookup -- use RandomNodes() / the alpha=3
lookup iterator for that). The choice was inherited from dcrawl.nim
without explicit reasoning; making it visible avoids future readers
re-deriving the survey-vs-lookup distinction.
The BFS rationale is two-fold:
- Coverage: BFS reaches every peer within N hops of the seeds in
order, so a time-bounded run produces a representative sample of the
reachable graph rather than a deep tendril through one sub-region.
- Adversarial resilience: a peer returning malicious "neighbour"
claims, dead-end peers, or eclipse-style sub-graphs cannot
monopolise the worker pool, because pending work from other branches
sits ahead of the attacker's responses in the queue. DFS would
amplify each of these attacks.
(2) Add a RandomWorkers field to CrawlOptions. Of the Workers-sized
worker pool, the first (Workers - RandomWorkers) workers pop the FIFO
front (BFS), while RandomWorkers workers pop a uniform-random queue
index via swap-and-pop (O(1)). Total worker count is unchanged.
Default RandomWorkers = Workers / 4 (4 of 16 with the default
parallelism). At this ratio:
- Cold-start cost is negligible: 12 of 16 workers still drain FIFO,
so the first ~1s of a fresh crawl behaves like pure BFS.
- 25% of pops break strict FIFO ordering, providing a mild
anti-fingerprint defence against an attacker who could otherwise
predict our processing order from the contents of their own
FINDNODE responses.
Operators can override per-run via the new --random-workers CLI flag
on `devp2p discv4 crawl` and `discv5 crawl`. Negative value forces
pure BFS; positive value selects an explicit count.
The new TestCrawlIteratorRandomWorkers covers four pop-policy
configurations (all-fifo, all-random, half-half, default) and
asserts the iterator still terminates and emits each node exactly
once in each.
Wire the new discover.CrawlIterator into devp2p discv4/discv5 crawl
behind a --mode flag (default 'lookup', i.e. existing behaviour).
devp2p discv4 crawl --mode=fast --timeout 30s nodes.json
devp2p discv5 crawl --mode=fast --timeout 30s nodes.json
Smoke test against mainnet bootnodes for 30s on a residential link
yields ~2.4x more nodes under --mode=fast (587 vs 240 in one run),
with the new per-tick LogDist log showing a much more uniform
distribution of query distances. Workers default to the existing
--parallel value (16); pacing is RTT-driven.
The 'lookup' default keeps existing behaviour byte-identical for any
operator running the saved devp2p discv4 crawl from a script.
Fixes#31093
Here we add some API functions on the UDPv5 object for the purpose of implementing
the Portal Network JSON-RPC API in the shisui client.
---------
Signed-off-by: Chen Kai <281165273grape@gmail.com>
Node discovery periodically revalidates the nodes in its table by sending PING, checking
if they are still alive. I recently noticed some issues with the implementation of this
process, which can cause strange results such as nodes dropping unexpectedly, certain
nodes not getting revalidated often enough, and bad results being returned to incoming
FINDNODE queries.
In this change, the revalidation process is improved with the following logic:
- We maintain two 'revalidation lists' containing the table nodes, named 'fast' and 'slow'.
- The process chooses random nodes from each list on a randomized interval, the interval being
faster for the 'fast' list, and performs revalidation for the chosen node.
- Whenever a node is newly inserted into the table, it goes into the 'fast' list.
Once validation passes, it transfers to the 'slow' list. If a request fails, or the
node changes endpoint, it transfers back into 'fast'.
- livenessChecks is incremented by one for successful checks. Unlike the old implementation,
we will not drop the node on the first failing check. We instead quickly decay the
livenessChecks give it another chance.
- Order of nodes in bucket doesn't matter anymore.
I am also adding a debug API endpoint to dump the node table content.
Co-authored-by: Martin HS <martin@swende.se>
This PR makes the tool use the --bootnodes list as the input to devp2p crawl.
The flag will take effect if the input/output.json file is missing or empty.
The new flag allows configuring an explicit endpoint which is to be
announced in the DHT. This feature was originally developed for the
discv5 wormhole experiment (#25798), but it's useful in other contexts
as well.
This change updates our urfave/cli dependency to the v2 branch of the library.
There are some Go API changes in cli v2:
- Flag values can now be accessed using the methods ctx.Bool,
ctx.Int, ctx.String, ... regardless of whether the flag is 'local' or
'global'.
- v2 has built-in support for flag categories. Our home-grown category
system is removed and the categories of flags are assigned as part of
the flag definition.
For users, there is only one observable difference with cli v2: flags must now
strictly appear before regular arguments. For example, the following command is
now invalid:
geth account import mykey.json --password file.txt
Instead, the command must be invoked as follows:
geth account import --password file.txt mykey.json
TAP is a text format for test results. Parsers for it are available in many languages,
making it easy to consume. I want TAP output from our protocol tests because the
Hive wrapper around them needs to know about the test names and their individual
results and logs. It would also be possible to just write this info as JSON, but I don't
want to invent a new format.
This also improves the normal console output for tests (when running without --tap).
It now prints -- RUN lines before any output from the test, and indents the log output
by one space.
This adds a test suite for discovery v4. The test suite is a port of the Hive suite for
discovery, and will replace the current suite on Hive soon-ish. The tests can be
run locally with this command:
devp2p discv4 test -remote enode//...
Co-authored-by: Felix Lange <fjl@twurst.com>
This adds an implementation of the current discovery v5 spec.
There is full integration with cmd/devp2p and enode.Iterator in this
version. In theory we could enable the new protocol as a replacement of
discovery v4 at any time. In practice, there will likely be a few more
changes to the spec and implementation before this can happen.
This adds an implementation of node discovery via DNS TXT records to the
go-ethereum library. The implementation doesn't match EIP-1459 exactly,
the main difference being that this implementation uses separate merkle
trees for tree links and ENRs. The EIP will be updated to match p2p/dnsdisc.
To maintain DNS trees, cmd/devp2p provides a frontend for the p2p/dnsdisc
library. The new 'dns' subcommands can be used to create, sign and deploy DNS
discovery trees.
* p2p/discover: export Ping and RequestENR
These two are useful for checking the status of a node.
* cmd/devp2p: add devp2p debug tool
This is a new tool for debugging p2p issues. It supports a few
basic tasks for now, but many more things can and will be added
in the near future.
devp2p enrdump -- prints ENRs readably
devp2p discv4 ping -- checks if a node is up
devp2p discv4 requestenr -- gets a node's record
devp2p discv4 resolve -- finds a node through the DHT