The refactor from `for el := plist.Front(); ...; el = el.Next()` to the
new `iterList` iterator in #34743 silently dropped two things needed by
resetTimeout:
1. `nextTimeout = el.Value.(*replyMatcher)` at the top of the loop. This
assignment is what gives `nextTimeout` its documented meaning ("head of
plist when timeout was last reset"), and what makes the early-return
optimization at the top of resetTimeout work. Without it, nextTimeout is
only ever written to nil, so `nextTimeout == plist.Front().Value` is
always false and the optimization is dead.
2. `nextTimeout.errc <- errClockWarp` in the clock-warp branch now reads
a stale or nil pointer. Prior to the refactor, the inner assignment kept
nextTimeout pointing at the current matcher so its errc was the right
channel to receive the errClockWarp signal. After the refactor, on first
entry into the clock-warp branch nextTimeout is nil, which panics the
UDPv4 loop goroutine with a nil pointer deref and takes discv4 down.
Re-assign `nextTimeout = p` at the head of the loop (restoring the
documented invariant) and send the clock-warp error on `p.errc` rather
than the now-stale `nextTimeout.errc`.
The clock-warp branch triggers only when the system clock jumps backward
after a deadline is assigned (deadline - time.Now() >= 2*respTimeout,
i.e. at least ~500ms backward jump), which is why this regression
slipped past CI - it is not exercised by any existing unit test, and
writing one would require plumbing a clock through the loop.
Save `el.Next()` before calling `plist.Remove(el)` so iteration
continues correctly. Previously the loop exited after removing the first
expired matcher because `Remove` invalidates the element's links.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Fixes #34108
The UDPv5 test harness (`newUDPV5Test`) uses the default `PingInterval`
of 3 seconds. When tests like `TestUDPv5_findnodeHandling` insert nodes
into the routing table via `fillTable`, the table's revalidation loop
may schedule PING packets for those nodes. Under the race detector or on
slow CI runners, the test runs long enough for revalidation to fire,
causing background pings to be written to the test pipe. The `close()`
method then finds these as unmatched packets and fails.
The fix sets `PingInterval` to a very large value in the test harness so
revalidation never fires during tests.
Verified locally: 100 iterations with `-race -count=100` pass reliably,
where previously the test would fail within ~50 iterations.
This changes the challenge resend logic again to use the existing
`ChallengeData` field of `v5wire.Whoareyou` instead of storing a second
copy of the packet in `Whoareyou.Encoded`. It's more correct this way
since `ChallengeData` is supposed to be the data that is used by the ID
verification procedure.
Also adapts the cross-client test to verify this behavior.
Follow-up to #31543
This fixes a regression introduced in #32518. In that PR, we removed the
slowdown logic that would throttle lookups when the table runs empty.
Said logic was originally added in #20389.
The missing throttle is usually harmless, but there are pathological
cases, such as hive tests, where the node can only discover one other
node, so it can only ever query that node and won't get any results. In
cases like these, we need to throttle the creation of lookups to avoid
excessive CPU usage.
The lookup would add self into the replyBuffer if returned by another node.
Avoid doing that by marking self as seen.
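As a rough illustration (the `lookup` type, `nodeID` and `addNodes` below are hypothetical simplifications), pre-seeding the seen set with the local ID is enough to keep self out of the buffer:

```go
package main

import "fmt"

type nodeID string

// lookup is a simplified stand-in that only tracks seen nodes and a
// buffer of results to hand out.
type lookup struct {
	seen        map[nodeID]bool
	replyBuffer []nodeID
}

// newLookup marks self as seen up front, so it can never be buffered.
func newLookup(self nodeID) *lookup {
	return &lookup{seen: map[nodeID]bool{self: true}}
}

// addNodes buffers every node that has not been seen before.
func (l *lookup) addNodes(nodes []nodeID) {
	for _, n := range nodes {
		if l.seen[n] {
			continue
		}
		l.seen[n] = true
		l.replyBuffer = append(l.replyBuffer, n)
	}
}

func main() {
	l := newLookup("self")
	l.addNodes([]nodeID{"a", "self", "a", "b"})
	fmt.Println(l.replyBuffer)
}
```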
With the changed initialization behavior of lookup, the lookupIterator needs to yield the
buffer right after creation. This fixes the smallNetConvergence test, where all results
are straight out of the local table.
Refresh performs lookups and can therefore block for some time. We do
not want the initializer of an iterator to block. Anything that blocks
should happen when calling `Next`.
Here, `Next` will start a lookup, which will wait if needed (no nodes),
making sure the iterator's `Next` does not create a busy loop.
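The intended split between construction and `Next` could look roughly like this. `lazyIter` and its `fetch` callback are hypothetical names; `fetch` stands in for whatever potentially blocking lookup the real iterator runs.

```go
package main

import "fmt"

// lazyIter defers all potentially blocking work to Next.
type lazyIter struct {
	fetch func() []string // may block; returns no items when exhausted
	buf   []string
	cur   string
}

// newLazyIter only stores the callback and never calls it.
func newLazyIter(fetch func() []string) *lazyIter {
	return &lazyIter{fetch: fetch}
}

// Next refills the buffer when it is empty, then advances by one item.
func (it *lazyIter) Next() bool {
	if len(it.buf) == 0 {
		it.buf = it.fetch() // blocking work happens here
		if len(it.buf) == 0 {
			return false
		}
	}
	it.cur, it.buf = it.buf[0], it.buf[1:]
	return true
}

func (it *lazyIter) Node() string { return it.cur }

func main() {
	calls := 0
	it := newLazyIter(func() []string {
		calls++
		if calls > 1 {
			return nil
		}
		return []string{"n1", "n2"}
	})
	fmt.Println("fetches after construction:", calls)
	for it.Next() {
		fmt.Println(it.Node())
	}
	fmt.Println("fetches after iteration:", calls)
}
```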
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Supersedes #32470.
### What
- snap: shorten stall watchdog in `eth/protocols/snap/sync_test.go` from
1m to 10s.
- discover/v5: consolidate FINDNODE negative tests into a single
table-driven test:
- `TestUDPv5_findnodeCall_InvalidNodes` covers:
- invalid IP (unspecified `0.0.0.0`) → ignored
- low UDP port (`<=1024`) → ignored
### Why
- Addresses TODOs:
- “Make tests smaller” (reduce long 1m timeout).
- “check invalid IPs”; also cover low port per `verifyResponseNode`
rules (UDP must be >1024).
### How it’s validated
- Test-only changes; no production code touched.
- Local runs:
- `go test ./p2p/discover -count=1 -timeout=300s` → ok
- `go test ./eth/protocols/snap -count=1 -timeout=600s` → ok
- Lint:
- `go run build/ci.go lint` → 0 issues on modified files.
### Notes
- The test harness uses `enode.ValidSchemesForTesting` (which includes
the “null” scheme), so records signed with `enode.SignNull` are
signature-valid; failures here are due to IP/port validation in
`verifyResponseNode` and `netutil.CheckRelayAddr`.
- Tests are written as a single table-driven function for clarity; no
helpers or environment switching.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
This is for the implementation of the Portal Network in the Shisui
client. Its TALKREQ handler needs access to the node object in order to
send further calls to the requesting node. This is a breaking API change,
but it should be fine, since there are basically no known users of
TALKREQ outside of the Portal Network.
---------
Signed-off-by: thinkAfCod <q315xia@163.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
When resending the WHOAREYOU packet, a new nonce and random IV should not
be generated. The sent packet needs to match the previously-sent one exactly
in order to make the handshake retry work.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This fixes the handshake in a scenario where the remote end sends two unknown
packets in a row. When this happens, we would previously respond to both with
a WHOAREYOU challenge, but keep only the latest sent challenge. Transmission is
assumed to be unreliable, so any client that sends two request packets simultaneously
has to be prepared to follow up on whichever request leads to a handshake. With
this fix, we force them to do the handshake that we can actually complete.
Fixes #30581
Fixes #31093
Here we add some API functions on the UDPv5 object for the purpose of implementing
the Portal Network JSON-RPC API in the shisui client.
---------
Signed-off-by: Chen Kai <281165273grape@gmail.com>
This PR modifies how the metrics library handles `Enabled`: previously,
the package `init` decided whether to serve real metrics or just
dummy-types.
This has several drawbacks:
- During pkg init, we need to determine whether metrics are enabled or
not. So we first hacked in a check if certain geth-specific
commandline-flags were enabled. Then we added a similar check for
geth-env-vars. Then we almost added a very elaborate check for
toml-config-file, plus toml parsing.
- Using "real" types and dummy types interchangeably means that
everything is hidden behind interfaces. This has a performance penalty,
and also it just adds a lot of code.
This PR removes the interface stuff, uses concrete types, and allows for
the setting of Enabled to happen later. It is still assumed that
`metrics.Enable()` is invoked early on.
The somewhat 'heavy' operations, such as ticking meters and exp-decay,
now check the enable flag to prevent resource leaks.
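The general shape, concrete types plus a flag that is consulted by the expensive path, can be sketched as follows. `Meter`, `meterSet` and `tick` are hypothetical simplifications and do not mirror the package's real API; in particular the flag is scoped to a struct here rather than to the package.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Meter is a concrete type; there is no interface or dummy variant.
type Meter struct {
	uncounted atomic.Int64
	rate      float64 // events per second, updated by tick
}

// Mark records n events.
func (m *Meter) Mark(n int64) { m.uncounted.Add(n) }

// meterSet owns the enable flag and the meters that get ticked.
type meterSet struct {
	enabled atomic.Bool
	meters  []*Meter
}

// Enable may be called after the meters have been created.
func (s *meterSet) Enable() { s.enabled.Store(true) }

// tick is the "heavy" periodic operation; it does nothing while disabled.
func (s *meterSet) tick(seconds float64) {
	if !s.enabled.Load() {
		return
	}
	for _, m := range s.meters {
		m.rate = float64(m.uncounted.Swap(0)) / seconds
	}
}

func main() {
	m := &Meter{}
	s := &meterSet{meters: []*Meter{m}}
	m.Mark(10)
	s.tick(5)
	fmt.Println("rate while disabled:", m.rate)
	s.Enable()
	s.tick(5)
	fmt.Println("rate once enabled:", m.rate)
}
```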
The change may be large, but it's mostly pretty trivial, and from the
last time I gutted the metrics, I ensured that we have fairly good test
coverage.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Changelog: https://golangci-lint.run/product/changelog/#1610
Removes `exportloopref` (no longer needed), replaces it with
`copyloopvar` which is basically the opposite.
Also adds:
- `durationcheck`
- `gocheckcompilerdirectives`
- `reassign`
- `mirror`
- `tenv`
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
This PR fixes two tests that sometimes wrote to the `*testing.T` `log` facility after the test function had completed, which is not allowed. It uses waitgroups to ensure that the handler/logwriter terminates before the test exits.
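The pattern is roughly the following. `startLogger` is a hypothetical helper; in a real test, `logf` would be something like `t.Log`, and the returned wait function would be called before the test function returns.

```go
package main

import (
	"fmt"
	"sync"
)

// startLogger drains msgs on a background goroutine and returns a
// function that blocks until that goroutine has finished.
func startLogger(msgs <-chan string, logf func(string)) (wait func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for m := range msgs {
			logf(m)
		}
	}()
	return wg.Wait
}

func main() {
	msgs := make(chan string)
	wait := startLogger(msgs, func(s string) { fmt.Println("log:", s) })
	msgs <- "first"
	msgs <- "second"
	close(msgs) // tell the logger to stop
	wait()      // nothing is logged after this point
	fmt.Println("done")
}
```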
closes #30505
`WriteToUDP` was never called, since `meteredUdpConn` directly exposed
all the methods from the underlying `UDPConn` interface.
This fixes the `discover/egress` metric never being updated.
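The underlying Go pitfall is method promotion through an embedded interface. The sketch below uses hypothetical, simplified types (string addresses, a plain int counter), and which write method callers actually use is an assumption: if they call a promoted method that the wrapper does not override, the metering code is bypassed.

```go
package main

import "fmt"

// conn is a simplified stand-in for a UDP connection interface.
type conn interface {
	WriteToUDP(b []byte, addr string) (int, error)
	WriteToUDPAddrPort(b []byte, addr string) (int, error)
}

type rawConn struct{}

func (rawConn) WriteToUDP(b []byte, addr string) (int, error)         { return len(b), nil }
func (rawConn) WriteToUDPAddrPort(b []byte, addr string) (int, error) { return len(b), nil }

// leakyMetered embeds conn, so every method it does not override is
// promoted and reaches the raw connection without being counted.
type leakyMetered struct {
	conn
	egress int
}

func (c *leakyMetered) WriteToUDP(b []byte, addr string) (int, error) {
	n, err := c.conn.WriteToUDP(b, addr)
	c.egress += n
	return n, err
}

// strictMetered holds the connection in a named field and exposes only
// the method it meters.
type strictMetered struct {
	inner  conn
	egress int
}

func (c *strictMetered) WriteToUDPAddrPort(b []byte, addr string) (int, error) {
	n, err := c.inner.WriteToUDPAddrPort(b, addr)
	c.egress += n
	return n, err
}

func main() {
	leaky := &leakyMetered{conn: rawConn{}}
	leaky.WriteToUDPAddrPort([]byte("ping"), "127.0.0.1:30303") // promoted, unmetered
	strict := &strictMetered{inner: rawConn{}}
	strict.WriteToUDPAddrPort([]byte("ping"), "127.0.0.1:30303")
	fmt.Println(leaky.egress, strict.egress)
}
```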
## Issue
If `nextTime` has passed, but all nodes are excluded, `get` would return
`nil` and `run` would therefore not invoke `schedule`. Then, we schedule
a timer for the past, as neither `nextTime` value has been updated. This
creates a busy loop, as the timer immediately returns.
## Fix
With this PR, revalidation will also be rescheduled when all nodes are
excluded.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
enode.Node was recently changed to store a cache of endpoint information. The IP address in the cache is a netip.Addr. I chose that type over net.IP because it is just better. netip.Addr is meant to be used as a value type. Copying it does not allocate, it can be compared with ==, and can be used as a map key.
This PR changes most uses of Node.IP() into Node.IPAddr(), which returns the cached value directly without allocating.
While there are still some public APIs left where net.IP is used, I have converted all code used internally by p2p/discover to the new types. So this does change some public Go API, but hopefully not APIs any external code actually uses.
There weren't supposed to be any semantic differences resulting from this refactoring, however it does introduce one: In package p2p/netutil we treated the 0.0.0.0/8 network (addresses 0.x.y.z) as LAN, but netip.Addr.IsPrivate() doesn't. The treatment of this particular IP address range is controversial, with some software supporting it and others not. IANA has long listed it as special-purpose and invalid as a destination, so I don't know why I put it into the LAN list. It has now been marked as special in p2p/netutil as well.