Commit graph

750 commits

Author SHA1 Message Date
Csaba Kiraly
4927e89647
p2p/enode: fix asyncfilter comment (#32823)
Just finish the sentence.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-10-02 17:27:35 +02:00
zzzckck
f0dc47aae3
p2p/enode: fix discovery AsyncFilter deadlock on shutdown (#32572)
Description:
We found an occasional node hang on BSC. I think Geth may also have
the issue, so the fix is ported here.
The fix on BSC repo: https://github.com/bnb-chain/bsc/pull/3347

When the hang occurs, two routines are stuck.
- Routine 1: AsyncFilter(...)
On node start, it runs part of the DiscoveryV4 protocol, which can
take considerable time. Here is its call stack at the hang:
```
goroutine 9711 [chan receive]:  // this routine was stuck on read channel: `<-f.slots`
github.com/ethereum/go-ethereum/p2p/enode.AsyncFilter.func1()
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:206 +0x125
created by github.com/ethereum/go-ethereum/p2p/enode.AsyncFilter in goroutine 1
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:192 +0x205

```

- Routine 2: node stop
This is the main routine that shuts down the process. It gets stuck while
shutting down the discovery components: it tries to drain the `f.slots`
channel, but the one extra slot never gets a chance to be returned to
the channel.
```
goroutine 11796 [chan receive]: 
github.com/ethereum/go-ethereum/p2p/enode.(*asyncFilterIter).Close.func1()
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:248 +0x5c
sync.(*Once).doSlow(0xc032a97cb8?, 0xc032a97d18?)
	sync/once.go:78 +0xab
sync.(*Once).Do(...)
	sync/once.go:69
github.com/ethereum/go-ethereum/p2p/enode.(*asyncFilterIter).Close(0xc092ff8d00?)
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:244 +0x36
github.com/ethereum/go-ethereum/p2p/enode.(*bufferIter).Close.func1()
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:299 +0x24
sync.(*Once).doSlow(0x11a175f?, 0x2bfe63e?)
	sync/once.go:78 +0xab
sync.(*Once).Do(...)
	sync/once.go:69
github.com/ethereum/go-ethereum/p2p/enode.(*bufferIter).Close(0x30?)
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:298 +0x36
github.com/ethereum/go-ethereum/p2p/enode.(*FairMix).Close(0xc0004bfea0)
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:379 +0xb7
github.com/ethereum/go-ethereum/eth.(*Ethereum).Stop(0xc000997b00)
	github.com/ethereum/go-ethereum/eth/backend.go:960 +0x4a
github.com/ethereum/go-ethereum/node.(*Node).stopServices(0xc0001362a0, {0xc012e16330, 0x1, 0xc000111410?})
	github.com/ethereum/go-ethereum/node/node.go:333 +0xb3
github.com/ethereum/go-ethereum/node.(*Node).Close(0xc0001362a0)
	github.com/ethereum/go-ethereum/node/node.go:263 +0x167
created by github.com/ethereum/go-ethereum/cmd/utils.StartNode.func1.1 in goroutine 9729
	github.com/ethereum/go-ethereum/cmd/utils/cmd.go:101 +0x78
```

The root cause of the hang is the extra slot, which was designed to make
sure the routines in `AsyncFilter(...)` can finish. This PR fixes it by
making sure the extra slot can always be released when the node shuts
down.
2025-10-02 12:43:31 +02:00
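The slot mechanism described in this commit can be illustrated with a small, self-contained sketch. It is not the code from `iter.go`: `slotPool`, `startPool`, and `closesInTime` are hypothetical names, and the worker body is a placeholder. It only shows the general pattern, assuming a channel of `workers+1` tokens where the producer goroutine holds the extra one and `Close` drains all of them.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// slotPool is a hypothetical, simplified stand-in for the filter's slot channel.
type slotPool struct {
	slots  chan struct{}
	cancel context.CancelFunc
}

// startPool fills the channel with workers+1 tokens and starts a producer
// goroutine that holds the extra token while it runs.
func startPool(workers int, items <-chan int) *slotPool {
	ctx, cancel := context.WithCancel(context.Background())
	p := &slotPool{slots: make(chan struct{}, workers+1), cancel: cancel}
	for i := 0; i < cap(p.slots); i++ {
		p.slots <- struct{}{}
	}
	go func() {
		select {
		case <-ctx.Done():
			return
		case <-p.slots: // the producer's extra token
		}
		// Returning the token in a defer covers every exit path, including
		// cancellation. Without this line, Close below never finishes.
		defer func() { p.slots <- struct{}{} }()
		for {
			select {
			case <-ctx.Done():
				return
			case _, ok := <-items:
				if !ok {
					return
				}
			}
			select {
			case <-ctx.Done():
				return
			case <-p.slots: // token for one worker
			}
			go func() {
				defer func() { p.slots <- struct{}{} }()
				time.Sleep(time.Millisecond) // placeholder for the real check
			}()
		}
	}()
	return p
}

// Close cancels the producer and waits until every token has been returned.
func (p *slotPool) Close() {
	p.cancel()
	for i := 0; i < cap(p.slots); i++ {
		<-p.slots
	}
}

// closesInTime reports whether Close returns within one second.
func closesInTime() bool {
	items := make(chan int)
	p := startPool(2, items)
	for i := 0; i < 3; i++ {
		items <- i
	}
	done := make(chan struct{})
	go func() { p.Close(); close(done) }()
	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println(closesInTime()) // true
}
```

Removing the deferred send in the producer reproduces the shape of the hang: `Close` receives `workers` tokens and then waits forever for the last one.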
Zach Brown
f9756bb885
p2p: fix error message in test (#32804) 2025-09-30 19:30:47 +08:00
cui
64c6de7747
p2p: using testing.B.Loop (#32664) 2025-09-19 16:38:36 -06:00
Csaba Kiraly
de9fb9722b
revert to using table parameter
using it.lookup.tab inside is unsafe

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-09-17 09:04:41 +02:00
Csaba Kiraly
3589c0d59b
p2p/discover: expose timeout in lookupFailed
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>

# Conflicts:
#	p2p/discover/lookup.go
2025-09-16 14:03:11 +02:00
Felix Lange
0643427965 p2p/discover: continue 2025-09-12 12:50:07 +02:00
Felix Lange
68c18ede06
Update lookup.go 2025-09-12 11:34:44 +02:00
Csaba Kiraly
97afa2815b
Revert "p2p/discover: add test for lookup returning immediately"
This reverts commit 3eab4616a6.
2025-09-12 11:29:43 +02:00
Csaba Kiraly
3eab4616a6
p2p/discover: add test for lookup returning immediately
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-09-12 10:59:29 +02:00
Csaba Kiraly
72d3e881b3
p2p/discover: clarify lookup behavior on empty table
We have changed this behavior, so clarify it in the comment.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-09-12 10:52:53 +02:00
Felix Lange
a9f9e0d589 p2p/discover: add imports in test 2025-09-10 20:10:51 +02:00
Felix Lange
3133fd369a p2p/discover: remove print in test 2025-09-10 20:10:51 +02:00
Felix Lange
3946708935 p2p/discover: fix two bugs in lookup iterator
The lookup would add self into the replyBuffer if returned by another node.
Avoid doing that by marking self as seen.

With the changed initialization behavior of lookup, the lookupIterator needs to yield the
buffer right after creation. This fixes the smallNetConvergence test, where all results
are straight out of the local table.
2025-09-10 20:10:51 +02:00
Felix Lange
cf0503da7c p2p/discover: track missing nodes in test 2025-09-10 20:10:51 +02:00
Felix Lange
721c8de738 p2p/discover: trigger refresh in lookupIterator 2025-09-10 20:10:51 +02:00
Felix Lange
e58e7f7927 p2p/discover: fix bug in lookup 2025-09-10 20:10:51 +02:00
Felix Lange
4ed8f5ee2b p2p/discover: improve iterator 2025-09-10 20:10:51 +02:00
Felix Lange
f4046b0cfb p2p/discover: move wait condition to lookupIterator 2025-09-10 20:10:51 +02:00
Felix Lange
f8e0e8dc55 p2p/discover: add context in waitForNodes 2025-09-10 20:10:51 +02:00
Felix Lange
46e4f0b5c1 p2p/discover: add waitForNodes 2025-09-10 20:10:51 +02:00
Csaba Kiraly
1f7f95d718
p2p/discover: remove delay from discv5 RandomNodes (#32517)
Refresh is doing some lookups and thus it could block for some time. We
do not want the initializer of an iterator to block. If there is
something blocking, it should happen when calling Next.

Here, next will start a lookup, which will wait if needed (no nodes),
making sure the iterator's Next is not creating a busy loop.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-09-10 19:51:04 +02:00
Zach Brown
2a795c14f4
all: fix problematic function name in comment (#32513)
Fix problematic function names in comments.
I did my best to correct them all with a script, to avoid spamming PRs.
2025-08-29 08:54:23 +08:00
cui
9b2e8e7ce3
p2p: use slices.Clone (#32428)
Replaces a helper method with slices.Clone
2025-08-25 11:30:51 +02:00
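As a rough illustration of the replacement above: `copyBytes` below is a hypothetical hand-written helper (the commit does not show the one it removed), placed next to the standard library's `slices.Clone`.

```go
package main

import (
	"fmt"
	"slices"
)

// copyBytes is a hypothetical hand-written helper of the kind slices.Clone replaces.
func copyBytes(b []byte) []byte {
	if b == nil {
		return nil
	}
	c := make([]byte, len(b))
	copy(c, b)
	return c
}

// cloneIsIndependent reports whether mutating a clone leaves the original
// untouched, and whether the manual helper yields an equal copy.
func cloneIsIndependent() bool {
	orig := []byte{1, 2, 3}
	c := slices.Clone(orig)
	c[0] = 9
	return orig[0] == 1 && slices.Equal(copyBytes(orig), orig)
}

func main() {
	fmt.Println(cloneIsIndependent()) // true
}
```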
Ocenka
276ed4848c
p2p/discover: add discv5 invalid findnodes result test cases (#32481)
Supersedes #32470.

### What
- snap: shorten stall watchdog in `eth/protocols/snap/sync_test.go` from
1m to 10s.
- discover/v5: consolidate FINDNODE negative tests into a single
table-driven test:
  - `TestUDPv5_findnodeCall_InvalidNodes` covers:
    - invalid IP (unspecified `0.0.0.0`) → ignored
    - low UDP port (`<=1024`) → ignored

### Why
- Addresses TODOs:
  - “Make tests smaller” (reduce long 1m timeout).
  - “check invalid IPs”; also cover low port per `verifyResponseNode`
    rules (UDP must be >1024).

### How it’s validated
- Test-only changes; no production code touched.
- Local runs:
  - `go test ./p2p/discover -count=1 -timeout=300s` → ok
  - `go test ./eth/protocols/snap -count=1 -timeout=600s` → ok
- Lint:
  - `go run build/ci.go lint` → 0 issues on modified files.

### Notes
- The test harness uses `enode.ValidSchemesForTesting` (which includes
the “null” scheme), so records signed with `enode.SignNull` are
signature-valid; failures here are due to IP/port validation in
`verifyResponseNode` and `netutil.CheckRelayAddr`.
- Tests are written as a single table-driven function for clarity; no
helpers or environment switching.

---------

Co-authored-by: lightclient <lightclient@protonmail.com>
2025-08-22 11:44:11 -06:00
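The two rejection rules named in this commit (unspecified IP, UDP port `<=1024`) can be sketched as a table-driven check. `checkEndpoint` is a hypothetical function written for this illustration; it is not the real `verifyResponseNode` and covers none of its other rules.

```go
package main

import (
	"fmt"
	"net/netip"
)

// checkEndpoint is a hypothetical, simplified validity check. It only mirrors
// the two rules named in the commit message.
func checkEndpoint(ip string, udpPort int) error {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return fmt.Errorf("bad IP %q: %w", ip, err)
	}
	if addr.IsUnspecified() {
		return fmt.Errorf("unspecified IP %s", addr)
	}
	if udpPort <= 1024 {
		return fmt.Errorf("low UDP port %d", udpPort)
	}
	return nil
}

func main() {
	tests := []struct {
		name string
		ip   string
		port int
		ok   bool
	}{
		{"unspecified IP", "0.0.0.0", 30303, false},
		{"low port", "10.0.0.1", 1024, false},
		{"valid", "10.0.0.1", 30303, true},
	}
	for _, tt := range tests {
		err := checkEndpoint(tt.ip, tt.port)
		fmt.Printf("%s: accepted=%v want=%v\n", tt.name, err == nil, tt.ok)
	}
}
```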
cui
f3467d1e63
p2p: remove todo comment, as it's unnecessary (#32397)
As mentioned in https://github.com/ethereum/go-ethereum/pull/32351, I
think this comment is unnecessary.
2025-08-21 15:48:46 -06:00
cui
997dff4fae
p2p: using math.MaxInt32 from go std lib (#32357)
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-08-20 16:22:21 -06:00
Klimov Sergei
62ac0e05b6
p2p: update MaxPeers comment (#32414) 2025-08-19 20:14:11 +08:00
cui
2b38daa48c
p2p: refactor to use time.Now().UnixMilli() in golang std lib (#32402) 2025-08-14 16:28:57 +08:00
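For reference, a small comparison of `UnixMilli` with one common manual conversion. The manual form shown is an assumption for illustration and is not necessarily the expression the commit replaced.

```go
package main

import (
	"fmt"
	"time"
)

// millisManual converts a timestamp to Unix milliseconds by hand.
func millisManual(t time.Time) int64 {
	return t.UnixNano() / int64(time.Millisecond)
}

// millisStd uses the standard library method directly.
func millisStd(t time.Time) int64 {
	return t.UnixMilli()
}

// sample returns both conversions for one fixed timestamp.
func sample() (int64, int64) {
	t := time.Unix(1700000000, 123456789)
	return millisManual(t), millisStd(t)
}

func main() {
	a, b := sample()
	fmt.Println(a, b) // 1700000000123 1700000000123
}
```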
cui
e979438a55
p2p/enode: use atomic.Pointer in LocalNode (#32360) 2025-08-07 15:03:18 +02:00
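A minimal sketch of the `atomic.Pointer` pattern, assuming a single writer that publishes immutable snapshots. `record` and `localNode` are hypothetical types for this example and do not reflect the fields of the real `enode.LocalNode`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// record is a hypothetical immutable snapshot, standing in for a node record.
type record struct {
	seq uint64
}

// localNode is a hypothetical holder; it is not the real enode.LocalNode.
type localNode struct {
	cur atomic.Pointer[record]
}

// update publishes a new snapshot with the next sequence number.
// It assumes one writer at a time; readers may call seq concurrently.
func (ln *localNode) update() {
	var next uint64 = 1
	if old := ln.cur.Load(); old != nil {
		next = old.seq + 1
	}
	ln.cur.Store(&record{seq: next})
}

// seq returns the current sequence number, or 0 if nothing was published.
func (ln *localNode) seq() uint64 {
	if r := ln.cur.Load(); r != nil {
		return r.seq
	}
	return 0
}

// seqAfter applies n updates to a fresh holder and returns the sequence number.
func seqAfter(n int) uint64 {
	var ln localNode
	for i := 0; i < n; i++ {
		ln.update()
	}
	return ln.seq()
}

func main() {
	fmt.Println(seqAfter(3)) // 3
}
```

The typed pointer removes the type assertion that an `atomic.Value` would need on every load.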
Micke
a7efdcbf09
p2p/rlpx: optimize XOR operation using bitutil.XORBytes (#32217)
Replace manual byte-by-byte XOR implementation with the optimized
bitutil.XORBytes function. This improves performance by using word-sized
operations on supported architectures while maintaining the same
functionality. The optimized version processes data in bulk rather than
one byte at a time.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-07-22 23:06:48 +02:00
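To make the change concrete without importing go-ethereum, the sketch below compares a byte-by-byte loop with the standard library's `crypto/subtle.XORBytes`. This swaps in a stdlib function for illustration; it is not `bitutil.XORBytes`, and `xorLoop` is a hypothetical reconstruction of a manual loop.

```go
package main

import (
	"bytes"
	"crypto/subtle"
	"fmt"
)

// xorLoop is the byte-by-byte form: dst[i] = a[i] ^ b[i].
func xorLoop(dst, a, b []byte) int {
	n := min(len(a), len(b))
	for i := 0; i < n; i++ {
		dst[i] = a[i] ^ b[i]
	}
	return n
}

// sameResult reports whether the loop and the bulk call produce identical output.
func sameResult(a, b []byte) bool {
	d1 := make([]byte, len(a))
	d2 := make([]byte, len(a))
	n1 := xorLoop(d1, a, b)
	n2 := subtle.XORBytes(d2, a, b)
	return n1 == n2 && bytes.Equal(d1, d2)
}

func main() {
	a := []byte{0xff, 0x0f, 0x01}
	b := []byte{0x0f, 0xff, 0x01}
	fmt.Println(sameResult(a, b)) // true
}
```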
asamuj
d7db10ddbd
eth/protocols/snap, p2p/discover: improve zero time checks (#32214)
2025-07-15 14:20:45 +08:00
Csaba Kiraly
4bb097b7ff
eth, p2p: improve dial speed by pre-fetching dial candidates (#31944)
This PR improves the speed of Disc/v4 and Disc/v5 based discovery by
adding a prefetch buffer to discovery sources, eliminating slowdowns
due to timeouts and rate mismatch between the two processes.

Since we now want to filter the discv4 nodes iterator, it is being removed
from the default discovery mix in p2p.Server. To keep backwards-compatibility,
the default unfiltered discovery iterator will be utilized by the server when
no protocol-specific discovery is configured.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-06-05 12:14:35 +02:00
Felix Lange
228803c1a2
p2p/enode: add support for naming iterator sources (#31779)
This adds support for naming the source iterators of FairMix, like so:

  mix.AddSource(enode.WithSourceName("mySource", iter))

The source that produced the latest node is returned by the new NodeSource method.
2025-05-15 14:17:58 +02:00
Csaba Kiraly
0d5de826da
p2p: add metrics for inbound connection errors (#31652)
Add metrics detailing the reasons we reject inbound connections and the
reasons these connections fail during the handshake.
2025-05-07 15:34:52 +02:00
Csaba Kiraly
6928ec5d92
p2p: fix dial metrics not picking up the right error (#31621)
Our metrics related to dial errors were off. The original error was not
wrapped, so the caller function had no chance of picking it up.
Therefore the most common error, which is "TooManyPeers", was not
correctly counted.

The metrics were originally introduced in
https://github.com/ethereum/go-ethereum/pull/27621

I was thinking of various possible solutions.
- the one proposed here wraps both the new error and the original error.
It is not a pattern we use in other parts of the code, but it works. This
is maybe the smallest possible change.
- as an alternative, I could write a proper `errProtoHandshakeError` with
its own wrapped error
- finally, I'm not even sure we need `errProtoHandshakeError`; maybe we
could just pass up the original error.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-15 20:40:30 +02:00
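The first option in this commit, wrapping both the new error and the original one, can be sketched with two `%w` verbs in `fmt.Errorf` (supported since Go 1.20). The error variables and `classify` are illustrative names, not the identifiers used in `p2p`.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errProtoHandshake = errors.New("protocol handshake failed") // hypothetical name
	errTooManyPeers   = errors.New("too many peers")            // hypothetical name
)

// handshake simulates a failure and wraps both the category and the cause.
func handshake() error {
	cause := errTooManyPeers
	return fmt.Errorf("%w: %w", errProtoHandshake, cause)
}

// classify shows what a metrics caller can detect once the cause is wrapped.
func classify(err error) string {
	switch {
	case errors.Is(err, errTooManyPeers):
		return "too-many-peers"
	case errors.Is(err, errProtoHandshake):
		return "handshake"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(classify(handshake())) // too-many-peers
}
```

If the cause were formatted with `%v` instead, `errors.Is` would no longer find it, and the caller would fall back to the generic handshake bucket.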
Csaba Kiraly
c5c75977ab
eth: add logic to drop peers randomly when saturated (#31476)
As of now, Geth disconnects peers only on protocol error or timeout,
meaning once connection slots are filled, the peerset is largely fixed.

As mentioned in https://github.com/ethereum/go-ethereum/issues/31321,
Geth should occasionally disconnect peers to ensure some churn.
What/when to disconnect could depend on:
- the state of geth (e.g. sync or not)
- current number of peers
- peer level metrics

This PR adds a very slow churn using a random drop.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-14 12:45:27 +02:00
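A rough sketch of a random drop under saturation. The saturation rule, the function names, and the seeded generator are all assumptions of this example; the commit text does not specify how the real logic selects or times a drop.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickDrop returns the index of a peer to disconnect, or -1 when the peer set
// is not saturated. The saturation rule here is an assumption of this sketch.
func pickDrop(numPeers, maxPeers int, rng *rand.Rand) int {
	if numPeers == 0 || numPeers < maxPeers {
		return -1
	}
	return rng.Intn(numPeers)
}

// dropWithSeed runs pickDrop with a generator seeded for reproducibility.
func dropWithSeed(numPeers, maxPeers int, seed int64) int {
	return pickDrop(numPeers, maxPeers, rand.New(rand.NewSource(seed)))
}

func main() {
	fmt.Println(dropWithSeed(10, 50, 1)) // -1: below the limit, nothing is dropped
	idx := dropWithSeed(50, 50, 1)
	fmt.Println(idx >= 0 && idx < 50) // true
}
```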
Csaba Kiraly
ecd5c18610
p2p: better dial/serve success metrics (#31629)
Our previous success metrics gave success even if a peer disconnected
right after connection. These metrics only count peers that stayed
connected for at least 1 min. The 1 min limit is an arbitrary choice. We do
not use this for decision logic, only statistics.
2025-04-14 10:13:45 +02:00
Csaba Kiraly
a7f24c26c0
p2p/nat: fix UPnP port reset (#31566)
Make UPnP more robust

- Once a random port was mapped, we try to stick to it even if a UPnP
refresh fails. Previously we were immediately moving back to try the
default port, leading to frequent ENR changes.

- We were deleting port mappings before refresh as a possible
workaround. This created issues in some UPnP servers. The UPnP (and PMP)
specification is explicit about the refresh requirements, and delete is
clearly not needed (see
https://github.com/ethereum/go-ethereum/pull/30265#issuecomment-2766987859).
From now on we only delete when closing.

- We were trying to add port mappings only once, and then moved on to
random ports. Now we insist a bit more, so that a simple failed request
won't lead to ENR changes.

Fixes https://github.com/ethereum/go-ethereum/issues/31418

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-09 11:28:29 +02:00
Nathan Jo
ff365afc63
p2p/nat: remove forceful port mapping in upnp (#30265)
Here we are modifying the port mapping logic so that existing port
mappings will only be removed when they were previously created by geth.

The AddAnyPortMapping functionality has been adapted to work consistently
between the IGDv1 and IGDv2 backends.
2025-04-04 10:56:55 +02:00
thinkAfCod
d2176f463b
p2p/discover: pass node instead of node ID to TALKREQ handler (#31075)
This is for the implementation of Portal Network in the Shisui client.
Their handler needs access to the node object in order to send further
calls to the requesting node. This is a breaking API change but it
should be fine, since there are basically no known users of TALKREQ
outside of Portal network.

---------

Signed-off-by: thinkAfCod <q315xia@163.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-02 14:56:21 +02:00
thinkAfCod
3e4fbce034
p2p/discover: repeat exact encoding when resending WHOAREYOU packet (#31543)
When resending the WHOAREYOU packet, a new nonce and random IV should not
be generated. The sent packet needs to match the previously-sent one exactly
in order to make the handshake retry work.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-02 13:47:44 +02:00
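One way to picture the resend rule is to store the encoded bytes of the first challenge and return them unchanged on a retry. Everything in this sketch is hypothetical: the `challenge` type, the map keyed by peer name, and the packet layout, which is not the discv5 wire format.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
)

// challenge is a hypothetical record of a sent challenge packet.
type challenge struct {
	encoded []byte // exact bytes sent the first time
}

// encodeChallenge builds a packet with a fresh random nonce. The layout is
// made up for this sketch.
func encodeChallenge() ([]byte, error) {
	nonce := make([]byte, 12)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return append([]byte("WHOAREYOU:"), nonce...), nil
}

// packetFor returns the bytes to send to a peer. The first call encodes a new
// challenge; later calls return the stored bytes unchanged.
func packetFor(sent map[string]*challenge, peer string) ([]byte, error) {
	if c, ok := sent[peer]; ok {
		return c.encoded, nil
	}
	enc, err := encodeChallenge()
	if err != nil {
		return nil, err
	}
	sent[peer] = &challenge{encoded: enc}
	return enc, nil
}

// resendMatches reports whether a resend is byte-identical to the first send.
func resendMatches() bool {
	sent := make(map[string]*challenge)
	first, err1 := packetFor(sent, "peer-a")
	second, err2 := packetFor(sent, "peer-a")
	return err1 == nil && err2 == nil && bytes.Equal(first, second)
}

func main() {
	fmt.Println(resendMatches()) // true
}
```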
John
1bd70ba57a
p2p/nat: improve AddMapping code (#31486)
It introduces a new variable to store the external port returned by the
addAnyPortMapping function and ensures that the correct external port is
returned even in case of an error.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-01 14:07:47 +02:00
Shude Li
4ff5093df1
all: use fmt.Appendf instead of fmt.Sprintf where possible (#31301) 2025-03-25 14:53:02 +01:00
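A short side-by-side of the two forms named in this commit title. The format string and function names are invented for the example.

```go
package main

import "fmt"

// viaSprintf builds a string first and then converts it to bytes.
func viaSprintf(id int) []byte {
	return []byte(fmt.Sprintf("peer-%d", id))
}

// viaAppendf formats directly into a byte slice.
func viaAppendf(id int) []byte {
	return fmt.Appendf(nil, "peer-%d", id)
}

func main() {
	fmt.Println(string(viaSprintf(7)), string(viaAppendf(7))) // peer-7 peer-7
}
```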
Felix Lange
9eb610f0a9
p2p/discover: repeat WHOAREYOU challenge when handshake in progress (#31356)
This fixes the handshake in a scenario where the remote end sends two unknown
packets in a row. When this happens, we would previously respond to both with
a WHOAREYOU challenge, but keep only the latest sent challenge. Transmission is
assumed to be unreliable, so any client that sends two request packets simultaneously
has to be prepared to follow up on whichever request leads to a handshake. With
this fix, we force them to do the handshake that we can actually complete.

Fixes #30581
2025-03-20 17:11:40 +01:00
Chen Kai
5117f77af9
p2p/discover: expose discv5 functions for portal JSON-RPC interface (#31117)
Fixes #31093

Here we add some API functions on the UDPv5 object for the purpose of implementing
the Portal Network JSON-RPC API in the shisui client.

---------

Signed-off-by: Chen Kai <281165273grape@gmail.com>
2025-03-13 15:16:01 +01:00
Martin HS
767c202e47
all: drop x/exp direct dependency (#30558)
This is a not-particularly-important "cleanliness" PR. It removes the
last remnants of the `x/exp` package, where we used the `maps.Keys`
function.

The original returned the keys in a slice, but when it became 'native'
the signature changed to return an iterator, so the new idiom is
`slices.Collect(maps.Keys(theMap))`, unless of course the raw iterator
can be used instead.

In some cases, where we previously collected into a slice and then
sorted, we can now call `slices.SortXX` on the iterator instead, making
the code a bit more concise.

This PR might be _slightly_ less optimal, because the original `x/exp`
implementation allocated the slice at the correct size off the bat,
which I suppose the new code won't.

Putting it up for discussion.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-02-27 15:53:52 +01:00
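The idiom from this commit message, sketched with the standard library (Go 1.23 or later). `slices.Sorted` is used here as one concrete function that sorts directly from an iterator; the commit only writes `slices.SortXX`, so that choice is an assumption of this example.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
)

// sortedKeys collects the keys of m and sorts them in one step.
func sortedKeys(m map[string]int) []string {
	return slices.Sorted(maps.Keys(m))
}

// collectedKeys uses the idiom named in the commit message; order is unspecified.
func collectedKeys(m map[string]int) []string {
	return slices.Collect(maps.Keys(m))
}

func main() {
	m := map[string]int{"b": 2, "a": 1, "c": 3}
	fmt.Println(sortedKeys(m))         // [a b c]
	fmt.Println(len(collectedKeys(m))) // 3
}
```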
Felix Lange
9e6f924671
eth: report error from setupDiscovery at startup (#31233)
I ran into this while trying to debug a discv5 thing. I tried to disable
DNS discovery using `--discovery.dns=false`, which doesn't work.
Annoyingly, geth started anyway and discarded the error silently. I
eventually found my mistake, but it took way longer than it should have.

Also including a small change to the error message for invalid DNS URLs
here. The user actually needs to see the URL to make sense of the error.
2025-02-23 17:38:32 +01:00
Felix Lange
2a81bbaa4f
p2p/nat: remove test with default servers (#31225)
The test occasionally fails when network connectivity is bad or if it
hits the wrong server. We usually don't add tests with external network
dependency so I'm removing them.

Fixes #31220
2025-02-21 10:42:54 +08:00
Felix Lange
c113e3b5b1
p2p: fix marshaling of NAT in TOML (#31192)
This fixes an issue where a nat.Interface unmarshaled from the TOML
config file could not be re-marshaled to TOML correctly.

Fixes #31183
2025-02-17 09:47:12 +01:00