Commit graph

765 commits

Author SHA1 Message Date
Felix Lange
00cbd2e6f4
p2p/discover/v5wire: use Whoareyou.ChallengeData instead of storing encoded packet (#31547)
This changes the challenge resend logic again to use the existing
`ChallengeData` field of `v5wire.Whoareyou` instead of storing a second
copy of the packet in `Whoareyou.Encoded`. It's more correct this way
since `ChallengeData` is supposed to be the data that is used by the ID
verification procedure.

Also adapts the cross-client test to verify this behavior.

Follow-up to #31543
2026-02-22 21:58:47 +01:00
Felix Lange
0cba803fba
eth/protocols/eth, eth/protocols/snap: delayed p2p message decoding (#33835)
This changes the p2p protocol handlers to delay message decoding. It's
the first part of a larger change that will delay decoding all the way
through message processing. For responses, we delay the decoding until
it is confirmed that the response matches an active request and does not
exceed its limits.

In order to make this work, all messages have been changed to use
rlp.RawList instead of a slice of the decoded item type. For block
bodies specifically, the decoding has been delayed all the way until
after verification of the response hash.

The role of p2p/tracker.Tracker changes significantly in this PR. The
Tracker's original purpose was to maintain metrics about requests and
responses in the peer-to-peer protocols. Each protocol maintained a
single global Tracker instance. As of this change, the Tracker is now
always active (regardless of metrics collection), and there is a
separate instance of it for each peer. Whenever a response arrives, it
is first verified that a request exists for it in the tracker. The
tracker is also the place where limits are kept.
2026-02-15 21:21:16 +08:00
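The per-peer tracking described in #33835 above can be pictured with a minimal sketch. All names here (`peerTracker`, `pending`, `track`, `fulfil`) are hypothetical and do not mirror the actual p2p/tracker API; the sketch also assumes the item count of a response is known before the payload is fully decoded.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values, not taken from go-ethereum.
var (
	errUnsolicited = errors.New("response without matching request")
	errTooLarge    = errors.New("response exceeds request limit")
)

// pending describes one outstanding request.
type pending struct {
	respCode uint64 // message code the response must carry
	maxItems int    // upper bound on the number of items in the response
}

// peerTracker holds the outstanding requests of a single peer.
// It is not safe for concurrent use.
type peerTracker struct {
	reqs map[uint64]pending
}

func newPeerTracker() *peerTracker {
	return &peerTracker{reqs: make(map[uint64]pending)}
}

// track registers a request before it is sent.
func (t *peerTracker) track(id uint64, p pending) {
	t.reqs[id] = p
}

// fulfil checks a response against the tracked request. Only when it
// returns nil should the caller go on to decode the still-raw payload.
func (t *peerTracker) fulfil(id, code uint64, items int) error {
	p, ok := t.reqs[id]
	if !ok || p.respCode != code {
		return errUnsolicited
	}
	delete(t.reqs, id) // each request can be answered at most once
	if items > p.maxItems {
		return errTooLarge
	}
	return nil
}

func main() {
	t := newPeerTracker()
	t.track(1, pending{respCode: 4, maxItems: 2})

	fmt.Println(t.fulfil(7, 4, 1)) // unknown request ID
	fmt.Println(t.fulfil(1, 4, 2)) // matches the tracked request
	fmt.Println(t.fulfil(1, 4, 2)) // request was already fulfilled
}
```

Rejecting a response before decoding means an unsolicited or oversized message costs only a map lookup rather than a full decode.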
Felix Lange
8e1de223ad
crypto/keccak: vendor in golang.org/x/crypto/sha3 (#33323)
The upstream library has removed the assembly-based implementation of
keccak. We need to maintain our own library to avoid a performance
regression.

---------

Co-authored-by: lightclient <lightclient@protonmail.com>
2026-02-03 14:55:27 -07:00
fengjian
c974722dc0
crypto/ecies: fix ECIES invalid-curve handling (#33669)
Fix ECIES invalid-curve handling in RLPx handshake (reject invalid
ephemeral pubkeys early)
- Add curve validation in crypto/ecies.GenerateShared to reject invalid
public keys before ECDH.
- Update RLPx PoC test to assert invalid curve points fail with
ErrInvalidPublicKey.
 
Motivation / Context
RLPx handshake uses ECIES decryption on unauthenticated network input.
Prior to this change, an invalid-curve ephemeral public key would
proceed into ECDH and only fail at MAC verification, returning
ErrInvalidMessage. This allows an oracle on decrypt success/failure and
leaves the code path vulnerable to invalid-curve/small-subgroup attacks.
The fix enforces IsOnCurve validation up front.
2026-01-29 10:56:12 +01:00
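The idea behind #33669 above, validating the remote point before ECDH, can be sketched with the Go standard library. This is not the go-ethereum code: it assumes P-256 from `crypto/ecdh` as a stand-in for secp256k1, and `sharedSecret`, `demo`, and `errInvalidPublicKey` are hypothetical names. `NewPublicKey` rejects encodings that are not valid curve points, so the failure surfaces before any key agreement takes place.

```go
package main

import (
	"bytes"
	"crypto/ecdh"
	"crypto/rand"
	"errors"
	"fmt"
)

// errInvalidPublicKey is a hypothetical error, named after the one in the commit.
var errInvalidPublicKey = errors.New("invalid public key")

// sharedSecret validates the remote key first and only then runs ECDH.
func sharedSecret(priv *ecdh.PrivateKey, remote []byte) ([]byte, error) {
	pub, err := priv.Curve().NewPublicKey(remote) // fails for off-curve points
	if err != nil {
		return nil, errInvalidPublicKey
	}
	return priv.ECDH(pub)
}

// demo reports whether two honest parties agree on a secret and whether
// a tampered public key is rejected up front.
func demo() (agree, rejected bool, err error) {
	alice, err := ecdh.P256().GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	bob, err := ecdh.P256().GenerateKey(rand.Reader)
	if err != nil {
		return false, false, err
	}
	s1, err := sharedSecret(alice, bob.PublicKey().Bytes())
	if err != nil {
		return false, false, err
	}
	s2, err := sharedSecret(bob, alice.PublicKey().Bytes())
	if err != nil {
		return false, false, err
	}
	agree = bytes.Equal(s1, s2)

	// Flip one bit of the Y coordinate so the point leaves the curve.
	bad := bob.PublicKey().Bytes()
	bad[len(bad)-1] ^= 1
	_, badErr := sharedSecret(alice, bad)
	rejected = errors.Is(badErr, errInvalidPublicKey)
	return agree, rejected, nil
}

func main() {
	agree, rejected, err := demo()
	fmt.Println(agree, rejected, err)
}
```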
kurahin
13a8798fa3
p2p/tracker: fix head detection in Fulfil to avoid unnecessary timer reschedules (#33370) 2025-12-10 16:09:07 +08:00
cui
31f9c9ff75
common/bitutil: deprecate XORBytes in favor of stdlib crypto/subtle (#33331)
XORBytes was added to package crypto/subtle in Go 1.20, and it's faster 
than our bitutil.XORBytes. There is only one use of this function
across go-ethereum so we can simply deprecate the custom implementation.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-12-08 17:40:59 +01:00
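For reference, a small sketch of the standard-library call that #33331 above points to. `subtle.XORBytes` is part of `crypto/subtle` since Go 1.20; the helper `xorPair` is a hypothetical wrapper added only for illustration.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// xorPair returns a XOR b over the length of the shorter input.
func xorPair(a, b []byte) []byte {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	dst := make([]byte, n)
	subtle.XORBytes(dst, a, b) // dst must hold at least n bytes
	return dst
}

func main() {
	a := []byte{0x0f, 0xf0, 0xaa}
	b := []byte{0xff, 0xff, 0x55}
	fmt.Printf("%x\n", xorPair(a, b)) // prints f00fff
}
```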
Snezhkko
af47d9b472
p2p/nat: fix err shadowing in UPnP addAnyPortMapping (#33355)
The random-port retry loop in addAnyPortMapping shadowed the err
variable, causing the function to return (0, nil) when all attempts
failed. This change removes the shadowing and preserves the last error
across both the fixed-port and random-port retries, ensuring failures
are reported to callers correctly.
2025-12-08 15:02:24 +01:00
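The class of bug fixed in #33355 above is easy to reproduce in isolation. The sketch below is not the UPnP code: `tryMap`, `buggy`, and `fixed` are hypothetical, and `tryMap` always fails so that the retry path is exercised.

```go
package main

import (
	"errors"
	"fmt"
)

// tryMap is a hypothetical stand-in for a port-mapping request.
func tryMap(port int) (int, error) {
	return 0, fmt.Errorf("mapping port %d refused", port)
}

// buggy declares a new err with := inside the loop. That inner variable
// hides the outer one, so the final return yields a nil error.
func buggy(ports []int) (int, error) {
	var err error
	for _, p := range ports {
		mapped, err := tryMap(p) // shadows the outer err
		if err == nil {
			return mapped, nil
		}
	}
	return 0, err // outer err was never assigned
}

// fixed assigns to the outer err, so the last failure reaches the caller.
func fixed(ports []int) (int, error) {
	err := errors.New("no ports to try")
	for _, p := range ports {
		var mapped int
		mapped, err = tryMap(p)
		if err == nil {
			return mapped, nil
		}
	}
	return 0, err
}

func main() {
	ports := []int{30303, 40404}
	_, err := buggy(ports)
	fmt.Println("buggy:", err)
	_, err = fixed(ports)
	fmt.Println("fixed:", err)
}
```

Running it shows `buggy` reporting a nil error even though every attempt failed, while `fixed` returns the error from the last attempt.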
oxBoni
1468331f9d
p2p/discover/v5wire: remove redundant bytes clone in WHOAREYOU encoding (#33180)
head.AuthData is assigned later in the function, so the earlier assignment
can safely be removed.
2025-11-26 15:34:11 +01:00
Delweng
5dd0fe2f53
p2p: cleanup v4 if v5 failed (#33005)
Clean the previous resource (v4) if the latter (v5) failed.
2025-10-29 10:34:19 +01:00
Delweng
2bb3d9a330
p2p: silence on listener shutdown (#33001)
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-10-23 10:44:54 +02:00
Felix Lange
7c107c2691
p2p/discover: remove hot-spin in table refresh trigger (#32912)
This fixes a regression introduced in #32518. In that PR, we removed the
slowdown logic that would throttle lookups when the table runs empty.
Said logic was originally added in #20389.

Usually it's fine, but there exist pathological cases, such as hive
tests, where the node can only discover one other node, so it can only
ever query that node and won't get any results. In cases like these, we
need to throttle the creation of lookups to avoid crazy CPU usage.
2025-10-15 11:51:33 +02:00
Delweng
6337577434
p2p/discover: wait for bootstrap to be done (#32881)
This ensures the node is ready to accept other nodes into the
table before it is used in a test.

Closes #32863
2025-10-13 19:58:50 +02:00
cui
b87581f297
p2p/enode: optimize DistCmp (#32888)
This speeds up DistCmp by 75% through using 64-bit operations instead of
byte-wise XOR.
2025-10-13 16:16:07 +02:00
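A rough sketch of the technique named in #32888 above: comparing XOR distances eight bytes at a time instead of byte by byte. This is an illustration and not the code from the commit; the 32-byte `ID` type and both function names are assumptions. Reading each word as big-endian keeps the numeric order of the words identical to the byte-wise order.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ID is assumed to be a 32-byte node identifier.
type ID [32]byte

// distCmpBytes compares the distances a-target and b-target byte by byte.
// It returns -1 if a is closer, 1 if b is closer, and 0 if they are equal.
func distCmpBytes(target, a, b ID) int {
	for i := range target {
		da := a[i] ^ target[i]
		db := b[i] ^ target[i]
		if da > db {
			return 1
		} else if da < db {
			return -1
		}
	}
	return 0
}

// distCmpWords does the same comparison on 64-bit big-endian words.
func distCmpWords(target, a, b ID) int {
	for i := 0; i < len(target); i += 8 {
		t := binary.BigEndian.Uint64(target[i:])
		da := binary.BigEndian.Uint64(a[i:]) ^ t
		db := binary.BigEndian.Uint64(b[i:]) ^ t
		if da > db {
			return 1
		} else if da < db {
			return -1
		}
	}
	return 0
}

// idWith returns an ID that is zero except for one byte.
func idWith(pos int, v byte) ID {
	var id ID
	id[pos] = v
	return id
}

func main() {
	var target ID
	a, b := idWith(31, 1), idWith(0, 1)
	fmt.Println(distCmpBytes(target, a, b), distCmpWords(target, a, b))
}
```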
cui
5c6ba6b400
p2p/enode: optimize LogDist (#32887)
This speeds up LogDist by 75% using 64-bit operations instead
of byte-wise XOR.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-10-13 14:00:43 +02:00
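The same word-wise idea applies to the logarithmic distance in #32887 above. The sketch below is an illustration under assumptions (a 32-byte `ID`, hypothetical function names), not the committed code. It counts leading zero bits of the XOR with `bits.LeadingZeros64` and subtracts the count from the bit length of the ID.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// ID is assumed to be a 32-byte node identifier.
type ID [32]byte

// logDistBytes counts leading zero bits of a XOR b one byte at a time.
func logDistBytes(a, b ID) int {
	lz := 0
	for i := range a {
		x := a[i] ^ b[i]
		if x != 0 {
			lz += bits.LeadingZeros8(x)
			break
		}
		lz += 8
	}
	return len(a)*8 - lz
}

// logDistWords does the same on 64-bit big-endian words.
func logDistWords(a, b ID) int {
	lz := 0
	for i := 0; i < len(a); i += 8 {
		x := binary.BigEndian.Uint64(a[i:]) ^ binary.BigEndian.Uint64(b[i:])
		if x != 0 {
			lz += bits.LeadingZeros64(x)
			break
		}
		lz += 64
	}
	return len(a)*8 - lz
}

// idWith returns an ID that is zero except for one byte.
func idWith(pos int, v byte) ID {
	var id ID
	id[pos] = v
	return id
}

func main() {
	var zero ID
	fmt.Println(logDistWords(zero, idWith(31, 1)))    // lowest bit differs
	fmt.Println(logDistWords(zero, idWith(0, 0x80))) // highest bit differs
	fmt.Println(logDistWords(zero, zero))             // identical IDs
}
```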
Delweng
85e9977fae
p2p: rm unused var seedMinTableTime (#32876) 2025-10-13 16:40:08 +08:00
Csaba Kiraly
4927e89647
p2p/enode: fix asyncfilter comment (#32823)
Just finish the sentence.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-10-02 17:27:35 +02:00
zzzckck
f0dc47aae3
p2p/enode: fix discovery AsyncFilter deadlock on shutdown (#32572)
Description:
We found an occasional node hang on BSC. Geth may have the same issue,
so the fix is ported here.
The fix on BSC repo: https://github.com/bnb-chain/bsc/pull/3347

When the hang occurs, two goroutines are stuck.
- Routine 1: AsyncFilter(...)
On node start, it runs part of the DiscoveryV4 protocol, which can
take considerable time. Here is the call stack of the hang:
```
goroutine 9711 [chan receive]:  // this routine was stuck on read channel: `<-f.slots`
github.com/ethereum/go-ethereum/p2p/enode.AsyncFilter.func1()
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:206 +0x125
created by github.com/ethereum/go-ethereum/p2p/enode.AsyncFilter in goroutine 1
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:192 +0x205

```

- Routine 2: Node Stop
This is the main routine that shuts down the process. It gets stuck while
shutting down the discovery components: it tries to drain the channel
with `<-f.slots`, but the one extra slot never has a chance to be
resumed.
```
goroutine 11796 [chan receive]: 
github.com/ethereum/go-ethereum/p2p/enode.(*asyncFilterIter).Close.func1()
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:248 +0x5c
sync.(*Once).doSlow(0xc032a97cb8?, 0xc032a97d18?)
	sync/once.go:78 +0xab
sync.(*Once).Do(...)
	sync/once.go:69
github.com/ethereum/go-ethereum/p2p/enode.(*asyncFilterIter).Close(0xc092ff8d00?)
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:244 +0x36
github.com/ethereum/go-ethereum/p2p/enode.(*bufferIter).Close.func1()
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:299 +0x24
sync.(*Once).doSlow(0x11a175f?, 0x2bfe63e?)
	sync/once.go:78 +0xab
sync.(*Once).Do(...)
	sync/once.go:69
github.com/ethereum/go-ethereum/p2p/enode.(*bufferIter).Close(0x30?)
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:298 +0x36
github.com/ethereum/go-ethereum/p2p/enode.(*FairMix).Close(0xc0004bfea0)
	github.com/ethereum/go-ethereum/p2p/enode/iter.go:379 +0xb7
github.com/ethereum/go-ethereum/eth.(*Ethereum).Stop(0xc000997b00)
	github.com/ethereum/go-ethereum/eth/backend.go:960 +0x4a
github.com/ethereum/go-ethereum/node.(*Node).stopServices(0xc0001362a0, {0xc012e16330, 0x1, 0xc000111410?})
	github.com/ethereum/go-ethereum/node/node.go:333 +0xb3
github.com/ethereum/go-ethereum/node.(*Node).Close(0xc0001362a0)
	github.com/ethereum/go-ethereum/node/node.go:263 +0x167
created by github.com/ethereum/go-ethereum/cmd/utils.StartNode.func1.1 in goroutine 9729
	github.com/ethereum/go-ethereum/cmd/utils/cmd.go:101 +0x78
```

The root cause of the hang is the one extra slot, which was
designed to make sure the routines in `AsyncFilter(...)` can
finish. This PR fixes it by making sure the extra slot can always be
resumed on node shutdown.
2025-10-02 12:43:31 +02:00
Zach Brown
f9756bb885
p2p: fix error message in test (#32804) 2025-09-30 19:30:47 +08:00
cui
64c6de7747
p2p: using testing.B.Loop (#32664) 2025-09-19 16:38:36 -06:00
Csaba Kiraly
de9fb9722b
revert to using table parameter
using it.lookup.tab inside is unsafe

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-09-17 09:04:41 +02:00
Csaba Kiraly
3589c0d59b
p2p/discover: expose timeout in lookupFailed
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>

# Conflicts:
#	p2p/discover/lookup.go
2025-09-16 14:03:11 +02:00
Felix Lange
0643427965 p2p/discover: continue 2025-09-12 12:50:07 +02:00
Felix Lange
68c18ede06
Update lookup.go 2025-09-12 11:34:44 +02:00
Csaba Kiraly
97afa2815b
Revert "p2p/discover: add test for lookup returning immediately"
This reverts commit 3eab4616a6.
2025-09-12 11:29:43 +02:00
Csaba Kiraly
3eab4616a6
p2p/discover: add test for lookup returning immediately
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-09-12 10:59:29 +02:00
Csaba Kiraly
72d3e881b3
p2p/discover: clarify lookup behavior on empty table
We have changed this behavior, better clarify in comment.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-09-12 10:52:53 +02:00
Felix Lange
a9f9e0d589 p2p/discover: add imports in test 2025-09-10 20:10:51 +02:00
Felix Lange
3133fd369a p2p/discover: remove print in test 2025-09-10 20:10:51 +02:00
Felix Lange
3946708935 p2p/discover: fix two bugs in lookup iterator
The lookup would add self into the replyBuffer if returned by another node.
Avoid doing that by marking self as seen.

With the changed initialization behavior of lookup, the lookupIterator needs to yield the
buffer right after creation. This fixes the smallNetConvergence test, where all results
are straight out of the local table.
2025-09-10 20:10:51 +02:00
Felix Lange
cf0503da7c p2p/discover: track missing nodes in test 2025-09-10 20:10:51 +02:00
Felix Lange
721c8de738 p2p/discover: trigger refresh in lookupIterator 2025-09-10 20:10:51 +02:00
Felix Lange
e58e7f7927 p2p/discover: fix bug in lookup 2025-09-10 20:10:51 +02:00
Felix Lange
4ed8f5ee2b p2p/discover: improve iterator 2025-09-10 20:10:51 +02:00
Felix Lange
f4046b0cfb p2p/discover: move wait condition to lookupIterator 2025-09-10 20:10:51 +02:00
Felix Lange
f8e0e8dc55 p2p/discover: add context in waitForNodes 2025-09-10 20:10:51 +02:00
Felix Lange
46e4f0b5c1 p2p/discover: add waitForNodes 2025-09-10 20:10:51 +02:00
Csaba Kiraly
1f7f95d718
p2p/discover: remove delay from discv5 RandomNodes (#32517)
Refresh is doing some lookups and thus it could block for some time. We
do not want the initializer of an iterator to block. If there is
something blocking, it should happen when calling Next.

Here, next will start a lookup, which will wait if needed (no nodes),
making sure the iterator's Next is not creating a busy loop.

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
2025-09-10 19:51:04 +02:00
Zach Brown
2a795c14f4
all: fix problematic function name in comment (#32513)
Fix problematic function name in comment.
Do my best to correct them all with a script to avoid spamming PRs.
2025-08-29 08:54:23 +08:00
cui
9b2e8e7ce3
p2p: use slices.Clone (#32428)
Replaces a helper method with slices.Clone
2025-08-25 11:30:51 +02:00
Ocenka
276ed4848c
p2p/discover: add discv5 invalid findnodes result test cases (#32481)
Supersedes #32470.

### What
- snap: shorten stall watchdog in `eth/protocols/snap/sync_test.go` from
1m to 10s.
- discover/v5: consolidate FINDNODE negative tests into a single
table-driven test:
  - `TestUDPv5_findnodeCall_InvalidNodes` covers:
    - invalid IP (unspecified `0.0.0.0`) → ignored
    - low UDP port (`<=1024`) → ignored

### Why
- Addresses TODOs:
  - “Make tests smaller” (reduce long 1m timeout).
  - “check invalid IPs”; also cover low port per `verifyResponseNode`
    rules (UDP must be >1024).

### How it’s validated
- Test-only changes; no production code touched.
- Local runs:
  - `go test ./p2p/discover -count=1 -timeout=300s` → ok
  - `go test ./eth/protocols/snap -count=1 -timeout=600s` → ok
- Lint:
  - `go run build/ci.go lint` → 0 issues on modified files.

### Notes
- The test harness uses `enode.ValidSchemesForTesting` (which includes
the “null” scheme), so records signed with `enode.SignNull` are
signature-valid; failures here are due to IP/port validation in
`verifyResponseNode` and `netutil.CheckRelayAddr`.
- Tests are written as a single table-driven function for clarity; no
helpers or environment switching.

---------

Co-authored-by: lightclient <lightclient@protonmail.com>
2025-08-22 11:44:11 -06:00
cui
f3467d1e63
p2p: remove todo comment, as it's unnecessary (#32397)
As mentioned in https://github.com/ethereum/go-ethereum/pull/32351, I
think this comment is unnecessary.
2025-08-21 15:48:46 -06:00
cui
997dff4fae
p2p: using math.MaxInt32 from go std lib (#32357)
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-08-20 16:22:21 -06:00
Klimov Sergei
62ac0e05b6
p2p: update MaxPeers comment (#32414) 2025-08-19 20:14:11 +08:00
cui
2b38daa48c
p2p: refactor to use time.Now().UnixMilli() in golang std lib (#32402) 2025-08-14 16:28:57 +08:00
cui
e979438a55
p2p/enode: use atomic.Pointer in LocalNode (#32360) 2025-08-07 15:03:18 +02:00
Micke
a7efdcbf09
p2p/rlpx: optimize XOR operation using bitutil.XORBytes (#32217)
Replace manual byte-by-byte XOR implementation with the optimized
bitutil.XORBytes function. This improves performance by using word-sized
operations on supported architectures while maintaining the same
functionality. The optimized version processes data in bulk rather than
one byte at a time.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-07-22 23:06:48 +02:00
asamuj
d7db10ddbd
eth/protocols/snap, p2p/discover: improve zero time checks (#32214)
2025-07-15 14:20:45 +08:00
Csaba Kiraly
4bb097b7ff
eth, p2p: improve dial speed by pre-fetching dial candidates (#31944)
This PR improves the speed of Disc/v4 and Disc/v5 based discovery by
adding a prefetch buffer to discovery sources, eliminating slowdowns
due to timeouts and rate mismatch between the two processes.

Since we now want to filter the discv4 nodes iterator, it is being removed
from the default discovery mix in p2p.Server. To keep backwards-compatibility,
the default unfiltered discovery iterator will be utilized by the server when
no protocol-specific discovery is configured.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-06-05 12:14:35 +02:00
Felix Lange
228803c1a2
p2p/enode: add support for naming iterator sources (#31779)
This adds support for naming the source iterators of FairMix, like so:

  mix.AddSource(enode.WithSourceName("mySource", iter))

The source that produced the latest node is returned by the new NodeSource method.
2025-05-15 14:17:58 +02:00
Csaba Kiraly
0d5de826da
p2p: add metrics for inbound connection errors (#31652)
Add metrics detailing the reasons we reject inbound connections and the
reasons these connections fail during the handshake.
2025-05-07 15:34:52 +02:00