1
0
Fork 0
forked from forks/go-ethereum
Commit graph

715 commits

Author SHA1 Message Date
Csaba Kiraly
6928ec5d92
p2p: fix dial metrics not picking up the right error (#31621)
Our metrics related to dial errors were off. The original error was not
wrapped, so the caller function had no chance of picking it up.
Therefore the most common error, which is "TooManyPeers", was not
correctly counted.

The metrics were originally introduced in
https://github.com/ethereum/go-ethereum/pull/27621

I was thinking of various possible solutions.
- the one proposed here wraps both the new error and the origial error.
It is not a pattern we use in other parts of the code, but works. This
is maybe the smallest possible change.
- as an alternate, I could write a proper `errProtoHandshakeError` with
it's own wrapped error
- finally, I'm not even sure we need `errProtoHandshakeError`, maybe we
could just pass up the original error.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-15 20:40:30 +02:00
Csaba Kiraly
c5c75977ab
eth: add logic to drop peers randomly when saturated (#31476)
As of now, Geth disconnects peers only on protocol error or timeout,
meaning once connection slots are filled, the peerset is largely fixed.

As mentioned in https://github.com/ethereum/go-ethereum/issues/31321,
Geth should occasionally disconnect peers to ensure some churn.
What/when to disconnect could depend on:
- the state of geth (e.g. sync or not)
- current number of peers
- peer level metrics

This PR adds a very slow churn using a random drop.

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-14 12:45:27 +02:00
Csaba Kiraly
ecd5c18610
p2p: better dial/serve success metrics (#31629)
Our previous success metrics gave success even if a peer disconnected
right after connection. These metrics only count peers that stayed
connected for at least 1 min. The 1 min limit is an arbitrary choice. We do
not use this for decision logic, only statistics.
2025-04-14 10:13:45 +02:00
Csaba Kiraly
a7f24c26c0
p2p/nat: fix UPnP port reset (#31566)
Make UPnP more robust

- Once a random port was mapped, we try to stick to it even if a UPnP
refresh fails. Previously we were immediately moving back to try the
default port, leading to frequent ENR changes.

- We were deleting port mappings before refresh as a possible
workaround. This created issues in some UPnP servers. The UPnP (and PMP)
specification is explicit about the refresh requirements, and delete is
clearly not needed (see
https://github.com/ethereum/go-ethereum/pull/30265#issuecomment-2766987859).
From now on we only delete when closing.

- We were trying to add port mappings only once, and then moved on to
random ports. Now we insist a bit more, so that a simple failed request
won't lead to ENR changes.

Fixes https://github.com/ethereum/go-ethereum/issues/31418

---------

Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-09 11:28:29 +02:00
Nathan Jo
ff365afc63
p2p/nat: remove forceful port mapping in upnp (#30265)
Here we are modifying the port mapping logic so that existing port
mappings will only be removed when they were previously created by geth.

The AddAnyPortMapping functionality has been adapted to work consistently
between the IGDv1 and IGDv2 backends.
2025-04-04 10:56:55 +02:00
thinkAfCod
d2176f463b
p2p/discover: pass node instead of node ID to TALKREQ handler (#31075)
This is for the implementation of Portal Network in the Shisui client.
Their handler needs access to the node object in order to send further
calls to the requesting node. This is a breaking API change but it
should be fine, since there are basically no known users of TALKREQ
outside of Portal network.

---------

Signed-off-by: thinkAfCod <q315xia@163.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-02 14:56:21 +02:00
thinkAfCod
3e4fbce034
p2p/discover: repeat exact encoding when resending WHOAREYOU packet (#31543)
When resending the WHOAREYOU packet, a new nonce and random IV should not
be generated. The sent packet needs to match the previously-sent one exactly
in order to make the handshake retry work.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-02 13:47:44 +02:00
John
1bd70ba57a
p2p/nat: improve AddMapping code (#31486)
It introduces a new variable to store the external port returned by the
addAnyPortMapping function and ensures that the correct external port is
returned even in case of an error.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-04-01 14:07:47 +02:00
Shude Li
4ff5093df1
all: use fmt.Appendf instead of fmt.Sprintf where possible (#31301) 2025-03-25 14:53:02 +01:00
Felix Lange
9eb610f0a9
p2p/discover: repeat WHOAREYOU challenge when handshake in progress (#31356)
This fixes the handshake in a scenario where the remote end sends two unknown
packets in a row. When this happens, we would previously respond to both with
a WHOAREYOU challenge, but keep only the latest sent challenge. Transmission is
assumed to be unreliable, so any client that sends two request packets simultaneously
has to be prepared to follow up on whichever request leads to a handshake. With
this fix, we force them to do the handshake that we can actually complete.

Fixes #30581
2025-03-20 17:11:40 +01:00
Chen Kai
5117f77af9
p2p/discover: expose discv5 functions for portal JSON-RPC interface (#31117)
Fixes #31093

Here we add some API functions on the UDPv5 object for the purpose of implementing
the Portal Network JSON-RPC API in the shisui client.

---------

Signed-off-by: Chen Kai <281165273grape@gmail.com>
2025-03-13 15:16:01 +01:00
Martin HS
767c202e47
all: drop x/exp direct dependency (#30558)
This is a not-particularly-important "cleanliness" PR. It removes the
last remnants of the `x/exp` package, where we used the `maps.Keys`
function.

The original returned the keys in a slice, but when it became 'native'
the signature changed to return an iterator, so the new idiom is
`slices.Collect(maps.Keys(theMap))`, unless of course the raw iterator
can be used instead.

In some cases, where we previously collect into slice and then sort, we
can now instead do `slices.SortXX` on the iterator instead, making the
code a bit more concise.

This PR might be _slighly_ less optimal, because the original `x/exp`
implementation allocated the slice at the correct size off the bat,
which I suppose the new code won't.

Putting it up for discussion.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-02-27 15:53:52 +01:00
Felix Lange
9e6f924671
eth: report error from setupDiscovery at startup (#31233)
I ran into this while trying to debug a discv5 thing. I tried to disable
DNS discovery using `--discovery.dns=false`, which doesn't work.
Annoyingly, geth started anyway and discarded the error silently. I
eventually found my mistake, but it took way longer than it should have.

Also including a small change to the error message for invalid DNS URLs
here. The user actually needs to see the URL to make sense of the error.
2025-02-23 17:38:32 +01:00
Felix Lange
2a81bbaa4f
p2p/nat: remove test with default servers (#31225)
The test occasionally fails when network connectivity is bad or if it
hits the wrong server. We usually don't add tests with external network
dependency so I'm removing them.

Fixes #31220
2025-02-21 10:42:54 +08:00
Felix Lange
c113e3b5b1
p2p: fix marshaling of NAT in TOML (#31192)
This fixes an issue where a nat.Interface unmarshaled from the TOML
config file could not be re-marshaled to TOML correctly.

Fixes #31183
2025-02-17 09:47:12 +01:00
Chen Kai
22b9354494
p2p/discover: make discv5 response timeout configurable (#31119) 2025-02-11 13:52:43 +01:00
Harry Ngo
d2ca7cf9f1
p2p/discover: remove unused parameter in revalidationList.get (#31155) 2025-02-11 13:45:44 +01:00
Felix Lange
5d97db8d03
all: update license comments and AUTHORS (#31133) 2025-02-05 23:01:17 +01:00
zhen peng
75526bb8e0
p2p/nat: add stun protocol (#31064)
This implements a basic mechanism to query the node's external IP using
a STUN server. There is a built-in list of public STUN servers for convenience.
The new detection mechanism must be selected explicitly using `--nat=stun` 
and is not enabled by default in Geth.

Fixes #30881

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2025-01-24 16:16:02 +01:00
Felix Lange
218b697f05
p2p: support configuring NAT in TOML file (#31041)
This is an alternative for #27407 with a solution based on gencodec.
With the PR, one can now configure like this:

```
# config.toml
[Node.P2P]
NAT = "extip:33.33.33.33"
```

```shell
$ geth --config config.toml
...
INFO [01-17|16:37:31.436] Started P2P networking      self=enode://2290...ab@33.33.33.33:30303
```
2025-01-22 09:29:34 +01:00
georgehao
1843f27766
all: fix some typos in comments and names (#31023) 2025-01-14 14:16:15 +01:00
gitglorythegreat
85ffbde427
all: use cmp.Compare (#30958) 2025-01-02 14:06:47 +01:00
Lucas
804d45cc2e
p2p: DNS resolution for static nodes (#30822)
Closes #23210 

# Context 
When deploying Geth in Kubernetes with ReplicaSets, we encountered two
DNS-related issues affecting node connectivity. First, during startup,
Geth tries to resolve DNS names for static nodes too early in the config
unmarshaling phase. If peer nodes aren't ready yet (which is common in
Kubernetes rolling deployments), this causes an immediate failure:


```
INFO [11-26|10:03:42.816] Starting Geth on Ethereum mainnet...
INFO [11-26|10:03:42.817] Bumping default cache on mainnet         provided=1024 updated=4096
Fatal: config.toml, line 81: (p2p.Config.StaticNodes) lookup idontexist.geth.node: no such host
``` 

The second issue comes up when pods get rescheduled to different nodes -
their IPs change but peers keep using the initially resolved IP, never
updating the DNS mapping.

This PR adds proper DNS support for enode:// URLs by deferring resolution
to connection time. It also handles DNS failures gracefully instead of failing
fatally during startup, making it work better in container environments where
IPs are dynamic and peers come and go during rollouts.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-13 12:46:12 +01:00
lorenzo
c1c2507148
p2p: fix DiscReason encoding/decoding (#30855)
This fixes an issue where the disconnect message was not wrapped in a list.
The specification requires it to be a list like any other message.

In order to remain compatible with legacy geth versions, we now accept both
encodings when parsing a disconnect message.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-12 12:33:42 +01:00
Martin HS
9045b79bc2
metrics, cmd/geth: change init-process of metrics (#30814)
This PR modifies how the metrics library handles `Enabled`: previously,
the package `init` decided whether to serve real metrics or just
dummy-types.

This has several drawbacks: 
- During pkg init, we need to determine whether metrics are enabled or
not. So we first hacked in a check if certain geth-specific
commandline-flags were enabled. Then we added a similar check for
geth-env-vars. Then we almost added a very elaborate check for
toml-config-file, plus toml parsing.

- Using "real" types and dummy types interchangeably means that
everything is hidden behind interfaces. This has a performance penalty,
and also it just adds a lot of code.

This PR removes the interface stuff, uses concrete types, and allows for
the setting of Enabled to happen later. It is still assumed that
`metrics.Enable()` is invoked early on.

The somewhat 'heavy' operations, such as ticking meters and exp-decay,
now checks the enable-flag to prevent resource leak.

The change may be large, but it's mostly pretty trivial, and from the
last time I gutted the metrics, I ensured that we have fairly good test
coverage.

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-10 13:27:29 +01:00
tianyeyouyou
ae83912841
p2p/netutil: unittests for addrutil (#30439)
add unit tests for `p2p/addrutil`

---------

Co-authored-by: Martin HS <martin@swende.se>
2024-11-11 11:43:22 +01:00
Martin HS
5adc314817
build: update to golangci-lint 1.61.0 (#30587)
Changelog: https://golangci-lint.run/product/changelog/#1610 

Removes `exportloopref` (no longer needed), replaces it with
`copyloopvar` which is basically the opposite.

Also adds: 
- `durationcheck`
- `gocheckcompilerdirectives`
- `reassign`
- `mirror`
- `tenv`

---------

Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
2024-10-14 19:25:22 +02:00
Felix Lange
6b61b54dc7
p2p/discover: add config option for disabling FINDNODE liveness check (#30512)
This is for fixing Prysm integration tests.
2024-09-30 10:56:14 +02:00
Martin HS
b5a88dafae
p2p/discover: fix flaky tests writing to test.log after completion (#30506)
This PR fixes two tests, which had a tendency to sometimes write to the `*testing.T` `log` facility after the test function had completed, which is not allowed. This PR fixes it by using waitgroups to ensure that the handler/logwriter terminates before the test exits.

closes #30505
2024-09-26 08:12:12 +02:00
Guillaume Michel
f544fc3b46
p2p/enode: add quic ENR entry (#30283)
Add `quic` entry to the ENR as proposed in
https://github.com/ethereum/consensus-specs/pull/3644

---------

Co-authored-by: lightclient <lightclient@protonmail.com>
2024-09-13 23:47:18 +02:00
Nicolas Gotchac
87377c58bc
p2p/discover: fix Write method in metered connection (#30355)
`WriteToUDP` was never called, since `meteredUdpConn` exposed directly
all the methods from the underlying `UDPConn` interface.

This fixes the `discover/egress` metric never being updated.
2024-08-27 14:10:32 +02:00
lightclient
00294e9d28
cmd/utils,p2p: enable discv5 by default (#30327) 2024-08-20 16:02:54 +02:00
lightclient
33a13b6f21
p2p/simulations: remove packages (#30250)
Looking at the history of these packages over the past several years, there
haven't been any meaningful contributions or usages:
https://github.com/ethereum/go-ethereum/commits/master/p2p/simulations?before=de6d5976794a9ed3b626d4eba57bf7f0806fb970+35

Almost all of the commits are part of larger refactors or low-hanging-fruit contributions.
Seems like it's not providing much value and taking up team + contributor time.
2024-08-12 10:36:48 +02:00
Daniel Knopik
de6d597679
p2p/discover: schedule revalidation also when all nodes are excluded (#30239)
## Issue

If `nextTime` has passed, but all nodes are excluded, `get` would return
`nil` and `run` would therefore not invoke `schedule`. Then, we schedule
a timer for the past, as neither `nextTime` value has been updated. This
creates a busy loop, as the timer immediately returns.

## Fix

With this PR, revalidation will be also rescheduled when all nodes are
excluded.

---------

Co-authored-by: lightclient <lightclient@protonmail.com>
2024-07-31 21:38:23 +02:00
Marius G
6e33dbf96a
p2p: fix flaky test TestServerPortMapping (#30241)
The test specifies `ListenAddr: ":0"`, which means a random ephemeral
port will be chosen for the TCP listener by the OS. Additionally, since
no `DiscAddr` was specified, the same port that is chosen automatically
by the OS will also be used for the UDP listener in the discovery UDP
setup. This sometimes leads to test failures if the TCP listener picks a
free TCP port that is already taken for UDP. By specifying `DiscAddr:
":0"`, the UDP port will be chosen independently from the TCP port,
fixing the random failure.

See issue #29830.

Verified using
```
cd p2p
go test -c -race
stress ./p2p.test -test.run=TestServerPortMapping
...
5m0s: 4556 runs so far, 0 failures
```

The issue described above can technically lead to sporadic failures on
systems that specify a listen address via the `--port` flag of 0 while
not setting `--discovery.port`. Since the default is using port `30303`
and using a random ephemeral port is likely not used much to begin with,
not addressing the root cause might be acceptable.
2024-07-30 07:31:27 -06:00
dknopik
b0f66e34ca
p2p/nat: return correct port for ExtIP NAT (#30234)
Return the actually requested external port instead of 0 in the
AddMapping implementation for `--nat extip:<IP>`.
2024-07-27 10:18:05 +02:00
yukionfire
4dfc75deef
beacon/types, cmd/devp2p, p2p/enr: clean up uses of fmt.Errorf (#30182) 2024-07-25 00:32:58 +02:00
Felix Lange
ad49c708f5
p2p/discover: remove type encPubkey (#30172)
The pubkey type was moved to package v4wire a long time ago. Remaining uses of
encPubkey were probably left in due to laziness.
2024-07-18 11:09:02 +02:00
Nathan Jo
4bbe993252
p2p: fix ip change log parameter (#30158) 2024-07-15 10:15:35 +02:00
Halimao
a71f6f91fd
p2p/discover: improve flaky revalidation tests (#30023) 2024-06-21 15:29:07 +02:00
David Theodore
27654d3022
p2p/rlpx: 2KB maximum size for handshake messages (#30029)
Co-authored-by: Felix Lange <fjl@twurst.com>
2024-06-20 14:08:54 +02:00
bugmaker9371
b6f2bbd417
p2p/simulations: update doc of HTTP endpoints (#29894) 2024-06-11 19:41:17 +02:00
Gealber Morales
8bda642963
p2p: use package slices to sort in PeersInfo (#29957) 2024-06-09 22:50:22 +02:00
Gealber Morales
349fcdd22d
p2p/discover: add missing lock when calling tab.handleAddNode (#29960) 2024-06-09 22:47:51 +02:00
Felix Lange
85459e1439
p2p/discover: unwrap 4-in-6 UDP source addresses (#29944)
Fixes an issue where discovery responses were not recognized.
2024-06-06 16:15:22 +03:00
Hteev Oli
0750cb0c8f
p2p/netutil: fix comments (#29942) 2024-06-06 10:56:41 +03:00
Felix Lange
bc6569462d
p2p: use netip.Addr where possible (#29891)
enode.Node was recently changed to store a cache of endpoint information. The IP address in the cache is a netip.Addr. I chose that type over net.IP because it is just better. netip.Addr is meant to be used as a value type. Copying it does not allocate, it can be compared with ==, and can be used as a map key.

This PR changes most uses of Node.IP() into Node.IPAddr(), which returns the cached value directly without allocating.
While there are still some public APIs left where net.IP is used, I have converted all code used internally by p2p/discover to the new types. So this does change some public Go API, but hopefully not APIs any external code actually uses.

There weren't supposed to be any semantic differences resulting from this refactoring, however it does introduce one: In package p2p/netutil we treated the 0.0.0.0/8 network (addresses 0.x.y.z) as LAN, but netip.Addr.IsPrivate() doesn't. The treatment of this particular IP address range is controversial, with some software supporting it and others not. IANA lists it as special-purpose and invalid as a destination for a long time, so I don't know why I put it into the LAN list. It has now been marked as special in p2p/netutil as well.
2024-06-05 19:31:04 +02:00
Felix Lange
94a8b296e4
p2p/discover: refactor node and endpoint representation (#29844)
Here we clean up internal uses of type discover.node, converting most code to use
enode.Node instead. The discover.node type used to be the canonical representation of
network hosts before ENR was introduced. Most code worked with *node to avoid conversions
when interacting with Table methods. Since *node also contains internal state of Table and
is a mutable type, using *node outside of Table code is prone to data races. It's also
cleaner not having to wrap/unwrap *enode.Node all the time.

discover.node has been renamed to tableNode to clarify its purpose.

While here, we also change most uses of net.UDPAddr into netip.AddrPort. While this is
technically a separate refactoring from the *node -> *enode.Node change, it is more
convenient because *enode.Node handles IP addresses as netip.Addr. The switch to package
netip in discovery would've happened very soon anyway.

The change to netip.AddrPort stops at certain interface points. For example, since package
p2p/netutil has not been converted to use netip.Addr yet, we still have to convert to
net.IP/net.UDPAddr in a few places.
2024-05-29 15:02:26 +02:00
lilasxie
153f8da887
p2p/nodestate: remove unused package (#29872) 2024-05-29 12:11:18 +02:00
bugmaker9371
daf4f72077
p2p/simulations: remove stale information about docker adapter (#29874) 2024-05-29 12:09:58 +02:00