This fixes an issue where the disconnect message was not wrapped in a list.
The specification requires it to be a list like any other message.
In order to remain compatible with legacy geth versions, we now accept both
encodings when parsing a disconnect message.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR modifies how the metrics library handles `Enabled`: previously,
the package `init` decided whether to serve real metrics or just
dummy-types.
This has several drawbacks:
- During pkg init, we need to determine whether metrics are enabled or
not. So we first hacked in a check if certain geth-specific
commandline-flags were enabled. Then we added a similar check for
geth-env-vars. Then we almost added a very elaborate check for
toml-config-file, plus toml parsing.
- Using "real" types and dummy types interchangeably means that
everything is hidden behind interfaces. This has a performance penalty,
and also it just adds a lot of code.
This PR removes the interface stuff, uses concrete types, and allows for
the setting of Enabled to happen later. It is still assumed that
`metrics.Enable()` is invoked early on.
The somewhat 'heavy' operations, such as ticking meters and exp-decay,
now checks the enable-flag to prevent resource leak.
The change may be large, but it's mostly pretty trivial, and from the
last time I gutted the metrics, I ensured that we have fairly good test
coverage.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Changelog: https://golangci-lint.run/product/changelog/#1610
Removes `exportloopref` (no longer needed), replaces it with
`copyloopvar` which is basically the opposite.
Also adds:
- `durationcheck`
- `gocheckcompilerdirectives`
- `reassign`
- `mirror`
- `tenv`
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
This PR fixes two tests, which had a tendency to sometimes write to the `*testing.T` `log` facility after the test function had completed, which is not allowed. This PR fixes it by using waitgroups to ensure that the handler/logwriter terminates before the test exits.
closes#30505
`WriteToUDP` was never called, since `meteredUdpConn` exposed directly
all the methods from the underlying `UDPConn` interface.
This fixes the `discover/egress` metric never being updated.
## Issue
If `nextTime` has passed, but all nodes are excluded, `get` would return
`nil` and `run` would therefore not invoke `schedule`. Then, we schedule
a timer for the past, as neither `nextTime` value has been updated. This
creates a busy loop, as the timer immediately returns.
## Fix
With this PR, revalidation will be also rescheduled when all nodes are
excluded.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
The test specifies `ListenAddr: ":0"`, which means a random ephemeral
port will be chosen for the TCP listener by the OS. Additionally, since
no `DiscAddr` was specified, the same port that is chosen automatically
by the OS will also be used for the UDP listener in the discovery UDP
setup. This sometimes leads to test failures if the TCP listener picks a
free TCP port that is already taken for UDP. By specifying `DiscAddr:
":0"`, the UDP port will be chosen independently from the TCP port,
fixing the random failure.
See issue #29830.
Verified using
```
cd p2p
go test -c -race
stress ./p2p.test -test.run=TestServerPortMapping
...
5m0s: 4556 runs so far, 0 failures
```
The issue described above can technically lead to sporadic failures on
systems that specify a listen address via the `--port` flag of 0 while
not setting `--discovery.port`. Since the default is using port `30303`
and using a random ephemeral port is likely not used much to begin with,
not addressing the root cause might be acceptable.
enode.Node was recently changed to store a cache of endpoint information. The IP address in the cache is a netip.Addr. I chose that type over net.IP because it is just better. netip.Addr is meant to be used as a value type. Copying it does not allocate, it can be compared with ==, and can be used as a map key.
This PR changes most uses of Node.IP() into Node.IPAddr(), which returns the cached value directly without allocating.
While there are still some public APIs left where net.IP is used, I have converted all code used internally by p2p/discover to the new types. So this does change some public Go API, but hopefully not APIs any external code actually uses.
There weren't supposed to be any semantic differences resulting from this refactoring, however it does introduce one: In package p2p/netutil we treated the 0.0.0.0/8 network (addresses 0.x.y.z) as LAN, but netip.Addr.IsPrivate() doesn't. The treatment of this particular IP address range is controversial, with some software supporting it and others not. IANA lists it as special-purpose and invalid as a destination for a long time, so I don't know why I put it into the LAN list. It has now been marked as special in p2p/netutil as well.
Here we clean up internal uses of type discover.node, converting most code to use
enode.Node instead. The discover.node type used to be the canonical representation of
network hosts before ENR was introduced. Most code worked with *node to avoid conversions
when interacting with Table methods. Since *node also contains internal state of Table and
is a mutable type, using *node outside of Table code is prone to data races. It's also
cleaner not having to wrap/unwrap *enode.Node all the time.
discover.node has been renamed to tableNode to clarify its purpose.
While here, we also change most uses of net.UDPAddr into netip.AddrPort. While this is
technically a separate refactoring from the *node -> *enode.Node change, it is more
convenient because *enode.Node handles IP addresses as netip.Addr. The switch to package
netip in discovery would've happened very soon anyway.
The change to netip.AddrPort stops at certain interface points. For example, since package
p2p/netutil has not been converted to use netip.Addr yet, we still have to convert to
net.IP/net.UDPAddr in a few places.
It seems the semantic differences between addFoundNode and addInboundNode were lost in
#29572. My understanding is addFoundNode is for a node you have not contacted directly
(and are unsure if is available) whereas addInboundNode is for adding nodes that have
contacted the local node and we can verify they are active.
handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in
the bucket (updating it's IP addr) even if the node was not an inbound. This PR fixes
this. It wasn't originally caught in tests like TestTable_addSeenNode because the
manipulation of the node object actually modified the node value used by the test.
New logic is added to reject non-inbound updates unless the sequence number of the
(signed) ENR increases. Inbound updates, which are published by the updated node itself,
are always accepted. If an inbound update changes the endpoint, the node will be
revalidated on an expedited schedule.
Co-authored-by: Felix Lange <fjl@twurst.com>
In #29572, I assumed the revalidation list that the node is contained in could only ever
be changed by the outcome of a revalidation request. But turns out that's not true: if the
node gets removed due to FINDNODE failure, it will also be removed from the list it is in.
This causes a crash.
The invariant is: while node is in table, it is always in exactly one of the two lists. So
it seems best to store a pointer to the current list within the node itself.
enode.Node has separate accessor functions for getting the IP, UDP port and TCP port.
These methods performed separate checks for attributes set in the ENR.
With this PR, the accessor methods will now return cached information, and the endpoint is
determined when the node is created. The logic to determine the preferred endpoint is now
more correct, and considers how 'global' each address is when both IPv4 and IPv6 addresses
are present in the ENR.
Node discovery periodically revalidates the nodes in its table by sending PING, checking
if they are still alive. I recently noticed some issues with the implementation of this
process, which can cause strange results such as nodes dropping unexpectedly, certain
nodes not getting revalidated often enough, and bad results being returned to incoming
FINDNODE queries.
In this change, the revalidation process is improved with the following logic:
- We maintain two 'revalidation lists' containing the table nodes, named 'fast' and 'slow'.
- The process chooses random nodes from each list on a randomized interval, the interval being
faster for the 'fast' list, and performs revalidation for the chosen node.
- Whenever a node is newly inserted into the table, it goes into the 'fast' list.
Once validation passes, it transfers to the 'slow' list. If a request fails, or the
node changes endpoint, it transfers back into 'fast'.
- livenessChecks is incremented by one for successful checks. Unlike the old implementation,
we will not drop the node on the first failing check. We instead quickly decay the
livenessChecks give it another chance.
- Order of nodes in bucket doesn't matter anymore.
I am also adding a debug API endpoint to dump the node table content.
Co-authored-by: Martin HS <martin@swende.se>
time.After is equivalent to NewTimer(d).C, and does not call Stop if the timer is no longer needed. This can cause memory leaks. This change changes many such occations to use NewTimer instead, and calling Stop once the timer is no longer needed.
Since Go 1.22 has deprecated certain elliptic curve operations, this PR removes
references to the affected functions and replaces them with a custom implementation
in package crypto. This causes backwards-incompatible changes in some places.
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Co-authored-by: Felix Lange <fjl@twurst.com>
* p2p/discover: add liveness check in collectTableNodes
* p2p/discover: fix test
* p2p/discover: rename to appendLiveNodes
* p2p/discover: add dedup logic back
* p2p/discover: simplify
* p2p/discover: fix issue found by test
This change
- Removes interface `log.Format`,
- Removes method `log.FormatFunc`,
- unexports `TerminalHandler.TerminalFormat` formatting methods (renamed to `TerminalHandler.format`)
- removes the notion of `log.Lazy` values
The lazy handler was useful in the old log package, since it
could defer the evaluation of costly attributes until later in the
log pipeline: thus, if the logging was done at 'Trace', we could
skip evaluation if logging only was set to 'Info'.
With the move to slog, this way of deferring evaluation is no longer
needed, since slog introduced 'Enabled': the caller can thus do
the evaluate-or-not decision at the callsite, which is much more
straight-forward than dealing with lazy reflect-based evaluation.
Also, lazy evaluation would not work with 'native' slog, as in, these
two statements would be evaluated differently:
```golang
log.Info("foo", "my lazy", lazyObj)
slog.Info("foo", "my lazy", lazyObj)
```
This PR replaces Geth's logger package (a fork of [log15](https://github.com/inconshreveable/log15)) with an implementation using slog, a logging library included as part of the Go standard library as of Go1.21.
Main changes are as follows:
* removes any log handlers that were unused in the Geth codebase.
* Json, logfmt, and terminal formatters are now slog handlers.
* Verbosity level constants are changed to match slog constant values. Internal translation is done to make this opaque to the user and backwards compatible with existing `--verbosity` and `--vmodule` options.
* `--log.backtraceat` and `--log.debug` are removed.
The external-facing API is largely the same as the existing Geth logger. Logger method signatures remain unchanged.
A small semantic difference is that a `Handler` can only be set once per `Logger` and not changed dynamically. This just means that a new logger must be instantiated every time the handler of the root logger is changed.
----
For users of the `go-ethereum/log` module. If you were using this module for your own project, you will need to change the initialization. If you previously did
```golang
log.Root().SetHandler(log.LvlFilterHandler(log.LvlInfo, log.StreamHandler(os.Stderr, log.TerminalFormat(true))))
```
You now instead need to do
```golang
log.SetDefault(log.NewLogger(log.NewTerminalHandlerWithLevel(os.Stderr, log.LevelInfo, true)))
```
See more about reasoning here: https://github.com/ethereum/go-ethereum/issues/28558#issuecomment-1820606613
a little copying is better than a little dependency
-- go proverb
We have this dependency on docker, a.k.a moby: a gigantic library, and we only need ~70 LOC,
so here I tried moving it inline instead.
Co-authored-by: Felix Lange <fjl@twurst.com>
The Go authors updated golang/x/ext to change the function signature of the slices sort method.
It's an entire shitshow now because x/ext is not tagged, so everyone's codebase just
picked a new version that some other dep depends on, causing our code to fail building.
This PR updates the dep on our code too and does all the refactorings to follow upstream...
This changes the port mapping procedure such that, when the requested port is unavailable
an alternative port suggested by the router is used instead.
We now also repeatedly request the external IP from the router in order to catch any IP changes.
Co-authored-by: Felix Lange <fjl@twurst.com>
This simplifies the code that initializes the discovery a bit, and
adds new flags for enabling/disabling discv4 and discv5 separately.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
In all other UDPv4 methods, the deadline is checked first. It seems weird to me that ping is an exception. Deadline comparison is also less resource intensive.
Co-authored-by: Exca-DK <Exca-DK@users.noreply.github.com>
* p2p/discover: remove ReadRandomNodes
Even though it's public, this method is not callable by code outside of
package p2p/discover because one can't get a valid instance of Table.
* p2p/discover: add Table.Nodes
* p2p/discover: make Table settings configurable
In unit tests and externally developed cmd/devp2p test runs, it can be
useful to tune the timer intervals used by Table.
This changes TALKREQ message processing to run the handler on separate goroutine,
instead of running on the main discv5 dispatcher goroutine. It's better this way because
it allows the handler to perform blocking actions.
I'm also adding a new method TalkRequestToID here. The method allows implementing
a request flow where one node A sends TALKREQ to another node B, and node B later
sends a TALKREQ back. With TalkRequestToID, node B does not need the ENR of A to
send its request.
This PR is a (superior) alternative to https://github.com/ethereum/go-ethereum/pull/26708, it handles deprecation, primarily two specific cases.
`rand.Seed` is typically used in two ways
- `rand.Seed(time.Now().UnixNano())` -- we seed it, just to be sure to get some random, and not always get the same thing on every run. This is not needed, with global seeding, so those are just removed.
- `rand.Seed(1)` this is typically done to ensure we have a stable test. If we rely on this, we need to fix up the tests to use a deterministic prng-source. A few occurrences like this has been replaced with a proper custom source.
`rand.Read` has been replaced by `crypto/rand`.`Read` in this PR.
* p2p/discover: add more packet information in logs
This adds more fields to discv5 packet logs. These can be useful when
debugging multi-packet interactions.
The FINDNODE message also gets an additional field, OpID for debugging
purposes. This field is not encoded onto the wire.
I'm also removing topic system related message types in this change.
These will come back in the future, where support for them will be
guarded by a config flag.
* p2p/discover/v5wire: rename 'Total' to 'RespCount'
The new name captures the meaning of this field better.
Alarm is a timer utility that simplifies code where a timer needs to be rescheduled over
and over. Doing this can be tricky with time.Timer or time.AfterFunc because the channel
requires draining in some cases.
Alarm is optimized for use cases where items are tracked in a heap according to their expiry
time, and a goroutine with a for/select loop wants to be woken up whenever the next item expires.
In this application, the timer needs to be rescheduled when an item is added or removed
from the heap. Using a timer naively, these updates will always require synchronization
with the global runtime timer datastructure to update the timer using Reset. Alarm avoids
this by tracking the next expiry time and only modifies the timer if it would need to fire earlier
than already scheduled.
As an example use, I have converted p2p.dialScheduler to use Alarm instead of AfterFunc.
This improves readability of function 'push'.
sort.Search(N, ...) will at most return N when no match, so ix should be compared
with N. The previous version would compare ix with N+1 in case an additional item
was appended. No bug resulted from this comparison, but it's not easy to understand
why.
Co-authored-by: Felix Lange <fjl@twurst.com>
Here we add special handling for sending an error response when the write timeout of the
HTTP server is just about to expire. This is surprisingly difficult to get right, since is
must be ensured that all output is fully flushed in time, which needs support from
multiple levels of the RPC handler stack:
The timeout response can't use chunked transfer-encoding because there is no way to write
the final terminating chunk. net/http writes it when the topmost handler returns, but the
timeout will already be over by the time that happens. We decided to disable chunked
encoding by setting content-length explicitly.
Gzip compression must also be disabled for timeout responses because we don't know the
true content-length before compressing all output, i.e. compression would reintroduce
chunked transfer-encoding.
This changes the Pop method to assign the zero value before
reducing slice size. Doing so ensures the backing array does not
reference removed item values.