This PR improves the speed of Disc/v4 and Disc/v5 based discovery by
adding a prefetch buffer to discovery sources, eliminating slowdowns
due to timeouts and rate mismatch between the two processes.
Since we now want to filter the discv4 nodes iterator, it is being removed
from the default discovery mix in p2p.Server. To keep backwards-compatibility,
the default unfiltered discovery iterator will be utilized by the server when
no protocol-specific discovery is configured.
---------
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
Our metrics related to dial errors were off. The original error was not
wrapped, so the caller function had no chance of picking it up.
Therefore the most common error, which is "TooManyPeers", was not
correctly counted.
The metrics were originally introduced in
https://github.com/ethereum/go-ethereum/pull/27621
I was thinking of various possible solutions.
- the one proposed here wraps both the new error and the origial error.
It is not a pattern we use in other parts of the code, but works. This
is maybe the smallest possible change.
- as an alternate, I could write a proper `errProtoHandshakeError` with
it's own wrapped error
- finally, I'm not even sure we need `errProtoHandshakeError`, maybe we
could just pass up the original error.
---------
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
As of now, Geth disconnects peers only on protocol error or timeout,
meaning once connection slots are filled, the peerset is largely fixed.
As mentioned in https://github.com/ethereum/go-ethereum/issues/31321,
Geth should occasionally disconnect peers to ensure some churn.
What/when to disconnect could depend on:
- the state of geth (e.g. sync or not)
- current number of peers
- peer level metrics
This PR adds a very slow churn using a random drop.
---------
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
This is an alternative for #27407 with a solution based on gencodec.
With the PR, one can now configure like this:
```
# config.toml
[Node.P2P]
NAT = "extip:33.33.33.33"
```
```shell
$ geth --config config.toml
...
INFO [01-17|16:37:31.436] Started P2P networking self=enode://2290...ab@33.33.33.33:30303
```
enode.Node was recently changed to store a cache of endpoint information. The IP address in the cache is a netip.Addr. I chose that type over net.IP because it is just better. netip.Addr is meant to be used as a value type. Copying it does not allocate, it can be compared with ==, and can be used as a map key.
This PR changes most uses of Node.IP() into Node.IPAddr(), which returns the cached value directly without allocating.
While there are still some public APIs left where net.IP is used, I have converted all code used internally by p2p/discover to the new types. So this does change some public Go API, but hopefully not APIs any external code actually uses.
There weren't supposed to be any semantic differences resulting from this refactoring, however it does introduce one: In package p2p/netutil we treated the 0.0.0.0/8 network (addresses 0.x.y.z) as LAN, but netip.Addr.IsPrivate() doesn't. The treatment of this particular IP address range is controversial, with some software supporting it and others not. IANA lists it as special-purpose and invalid as a destination for a long time, so I don't know why I put it into the LAN list. It has now been marked as special in p2p/netutil as well.
Here we clean up internal uses of type discover.node, converting most code to use
enode.Node instead. The discover.node type used to be the canonical representation of
network hosts before ENR was introduced. Most code worked with *node to avoid conversions
when interacting with Table methods. Since *node also contains internal state of Table and
is a mutable type, using *node outside of Table code is prone to data races. It's also
cleaner not having to wrap/unwrap *enode.Node all the time.
discover.node has been renamed to tableNode to clarify its purpose.
While here, we also change most uses of net.UDPAddr into netip.AddrPort. While this is
technically a separate refactoring from the *node -> *enode.Node change, it is more
convenient because *enode.Node handles IP addresses as netip.Addr. The switch to package
netip in discovery would've happened very soon anyway.
The change to netip.AddrPort stops at certain interface points. For example, since package
p2p/netutil has not been converted to use netip.Addr yet, we still have to convert to
net.IP/net.UDPAddr in a few places.
Node discovery periodically revalidates the nodes in its table by sending PING, checking
if they are still alive. I recently noticed some issues with the implementation of this
process, which can cause strange results such as nodes dropping unexpectedly, certain
nodes not getting revalidated often enough, and bad results being returned to incoming
FINDNODE queries.
In this change, the revalidation process is improved with the following logic:
- We maintain two 'revalidation lists' containing the table nodes, named 'fast' and 'slow'.
- The process chooses random nodes from each list on a randomized interval, the interval being
faster for the 'fast' list, and performs revalidation for the chosen node.
- Whenever a node is newly inserted into the table, it goes into the 'fast' list.
Once validation passes, it transfers to the 'slow' list. If a request fails, or the
node changes endpoint, it transfers back into 'fast'.
- livenessChecks is incremented by one for successful checks. Unlike the old implementation,
we will not drop the node on the first failing check. We instead quickly decay the
livenessChecks give it another chance.
- Order of nodes in bucket doesn't matter anymore.
I am also adding a debug API endpoint to dump the node table content.
Co-authored-by: Martin HS <martin@swende.se>
The Go authors updated golang/x/ext to change the function signature of the slices sort method.
It's an entire shitshow now because x/ext is not tagged, so everyone's codebase just
picked a new version that some other dep depends on, causing our code to fail building.
This PR updates the dep on our code too and does all the refactorings to follow upstream...
This changes the port mapping procedure such that, when the requested port is unavailable
an alternative port suggested by the router is used instead.
We now also repeatedly request the external IP from the router in order to catch any IP changes.
Co-authored-by: Felix Lange <fjl@twurst.com>
This simplifies the code that initializes the discovery a bit, and
adds new flags for enabling/disabling discv4 and discv5 separately.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This enables the following linters
- typecheck
- unused
- staticcheck
- bidichk
- durationcheck
- exportloopref
- gosec
WIth a few exceptions.
- We use a deprecated protobuf in trezor. I didn't want to mess with that, since I cannot meaningfully test any changes there.
- The deprecated TypeMux is used in a few places still, so the warning for it is silenced for now.
- Using string type in context.WithValue is apparently wrong, one should use a custom type, to prevent collisions between different places in the hierarchy of callers. That should be fixed at some point, but may require some attention.
- The warnings for using weak random generator are squashed, since we use a lot of random without need for cryptographic guarantees.
This PR enables running the new discv5 protocol in both LES client
and server mode. In client mode it mixes discv5 and dnsdisc iterators
(if both are enabled) and filters incoming ENRs for "les" tag and fork ID.
The old p2p/discv5 package and all references to it are removed.
Co-authored-by: Felix Lange <fjl@twurst.com>
* peer: return localAddr instead of name to prevent spam
We currently use the name (which can be freely set by the peer) in several log messages.
This enables malicious actors to write spam into your geth log.
This commit returns the localAddr instead of the freely settable name.
* p2p: reduce usage of peer.Name in warn messages
* eth, p2p: use truncated names
* Update peer.go
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Co-authored-by: Felix Lange <fjl@twurst.com>
This change moves the RLPx protocol implementation into a separate package,
p2p/rlpx. The new package can be used to establish RLPx connections for
protocol testing purposes.
Co-authored-by: Felix Lange <fjl@twurst.com>
* p2p: new dial scheduler
This change replaces the peer-to-peer dial scheduler with a new and
improved implementation. The new code is better than the previous
implementation in two key aspects:
- The time between discovery of a node and dialing that node is
significantly lower in the new version. The old dialState kept
a buffer of nodes and launched a task to refill it whenever the buffer
became empty. This worked well with the discovery interface we used to
have, but doesn't really work with the new iterator-based discovery
API.
- Selection of static dial candidates (created by Server.AddPeer or
through static-nodes.json) performs much better for large amounts of
static peers. Connections to static nodes are now limited like dynanic
dials and can no longer overstep MaxPeers or the dial ratio.
* p2p/simulations/adapters: adapt to new NodeDialer interface
* p2p: re-add check for self in checkDial
* p2p: remove peersetCh
* p2p: allow static dials when discovery is disabled
* p2p: add test for dialScheduler.removeStatic
* p2p: remove blank line
* p2p: fix documentation of maxDialPeers
* p2p: change "ok" to "added" in static node log
* p2p: improve dialTask docs
Also increase log level for "Can't resolve node"
* p2p: ensure dial resolver is truly nil without discovery
* p2p: add "looking for peers" log message
* p2p: clean up Server.run comments
* p2p: fix maxDialedConns for maxpeers < dialRatio
Always allocate at least one dial slot unless dialing is disabled using
NoDial or MaxPeers == 0. Most importantly, this fixes MaxPeers == 1 to
dedicate the sole slot to dialing instead of listening.
* p2p: fix RemovePeer to disconnect the peer again
Also make RemovePeer synchronous and add a test.
* p2p: remove "Connection set up" log message
* p2p: clean up connection logging
We previously logged outgoing connection failures up to three times.
- in SetupConn() as "Setting up connection failed addr=..."
- in setupConn() with an error-specific message and "id=... addr=..."
- in dial() as "Dial error task=..."
This commit ensures a single log message is emitted per failure and adds
"id=... addr=... conn=..." everywhere (id= omitted when the ID isn't
known yet).
Also avoid printing a log message when a static dial fails but can't be
resolved because discv4 is disabled. The light client hit this case all
the time, increasing the message count to four lines per failed
connection.
* p2p: document that RemovePeer blocks
This is a temporary fix for a problem which started happening when the
dialer was changed to read nodes from an enode.Iterator. Before the
iterator change, discovery queries would always return within a couple
seconds even if there was no Internet access. Since the iterator won't
return unless a node is actually found, discoverTask can take much
longer. This means that the 'emergency connect' logic might not execute
in time, leading to a stuck node.
This adds all dashboard changes from the last couple months.
We're about to remove the dashboard, but decided that we should
get all the recent work in first in case anyone wants to pick up this
project later on.
* cmd, dashboard, eth, p2p: send peer info to the dashboard
* dashboard: update npm packages, improve UI, rebase
* dashboard, p2p: remove println, change doc
* cmd, dashboard, eth, p2p: cleanup after review
* dashboard: send current block to the dashboard client
* eth: chain config (genesis + fork) ENR entry
* core/forkid, eth: protocol independent fork ID, update to CRC32 spec
* core/forkid, eth: make forkid a struct, next uint64, enr struct, RLP
* core/forkid: change forkhash rlp encoding from int to [4]byte
* eth: fixup eth entry a bit and update it every block
* eth: fix lint
* eth: fix crash in ethclient tests
The dialer limits itself to one attempt every 30s. Apply the same limit
in Server and reject peers which try to connect too eagerly. The check
against the limit happens right after accepting the connection.
Further changes in this commit ensure we pass the Server logger
down to Peer instances, discovery and dialState. Unit test logging now
works in all Server tests.
* p2p/enr: add entries for for IPv4/IPv6 separation
This adds entry types for "ip6", "udp6", "tcp6" keys. The IP type stays
around because removing it would break a lot of code and force everyone
to care about the distinction.
* p2p/enode: track IPv4 and IPv6 address separately
LocalNode predicts the local node's UDP endpoint and updates the record.
This change makes it predict IPv4 and IPv6 endpoints separately since
they can now be in the record at the same time.
* p2p/enode: implement base64 text format
* all: switch to enode.Parse(...)
This allows passing base64-encoded node records to all the places that
previously accepted enode:// URLs. The URL format is still supported.
* cmd/bootnode, p2p: log node URL instead of ENR
...and return the base64 record in NodeInfo.
This change extends the peer metrics collection:
- traces the life-cycle of the peers
- meters the peer traffic separately for every peer
- creates event feed for the peer events
- emits the peer events