Found in
https://github.com/ethereum/go-ethereum/actions/runs/17803828253/job/50611300621?pr=32585
```
--- FAIL: TestClientCancelWebsocket (0.33s)
panic: read tcp 127.0.0.1:36048->127.0.0.1:38643: read: connection reset by peer [recovered, repanicked]
goroutine 15 [running]:
testing.tRunner.func1.2({0x98dd20, 0xc0005b0100})
/opt/actions-runner/_work/_tool/go/1.25.1/x64/src/testing/testing.go:1872 +0x237
testing.tRunner.func1()
/opt/actions-runner/_work/_tool/go/1.25.1/x64/src/testing/testing.go:1875 +0x35b
panic({0x98dd20?, 0xc0005b0100?})
/opt/actions-runner/_work/_tool/go/1.25.1/x64/src/runtime/panic.go:783 +0x132
github.com/ethereum/go-ethereum/rpc.httpTestClient(0xc0001dc1c0?, {0x9d5e40, 0x2}, 0xc0002bc1c0)
/opt/actions-runner/_work/go-ethereum/go-ethereum/rpc/client_test.go:932 +0x2b1
github.com/ethereum/go-ethereum/rpc.testClientCancel({0x9d5e40, 0x2}, 0xc0001dc1c0)
/opt/actions-runner/_work/go-ethereum/go-ethereum/rpc/client_test.go:356 +0x15f
github.com/ethereum/go-ethereum/rpc.TestClientCancelWebsocket(0xc0001dc1c0?)
/opt/actions-runner/_work/go-ethereum/go-ethereum/rpc/client_test.go:319 +0x25
testing.tRunner(0xc0001dc1c0, 0xa07370)
/opt/actions-runner/_work/_tool/go/1.25.1/x64/src/testing/testing.go:1934 +0xea
created by testing.(*T).Run in goroutine 1
/opt/actions-runner/_work/_tool/go/1.25.1/x64/src/testing/testing.go:1997 +0x465
FAIL github.com/ethereum/go-ethereum/rpc 0.371s
```
In `testClientCancel` we wrap the server listener in `flakeyListener`,
which schedules an unconditional close of every accepted connection
after a random delay. If the random delay is zero, the timer fires
immediately and the HTTP client panics with a connection reset by
peer error.
Here we add a minimum 10ms to ensure the timeout won't fire immediately.
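A minimal sketch of the idea, not the actual `flakeyListener` code: the
random delay is offset by a fixed floor so it can never be zero. The
names `closeDelay` and `maxExtra`, and the 50ms upper bound, are
assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	// minDelay is the 10ms floor mentioned above.
	minDelay = 10 * time.Millisecond
	// maxExtra is a hypothetical bound for the random part of the delay.
	maxExtra = 50 * time.Millisecond
)

// closeDelay returns a random delay in [minDelay, minDelay+maxExtra).
// Because of the floor, a timer armed with it cannot fire immediately.
func closeDelay() time.Duration {
	return minDelay + time.Duration(rand.Int63n(int64(maxExtra)))
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(closeDelay())
	}
}
```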
Signed-off-by: jsvisa <delweng@gmail.com>
closes #32240 #32232
The main cause of the timeout is the slow json encoding of large data.
In #32240 they tried to resolve the issue by reducing the size of the
test. However as Felix pointed out, the test is still kind of confusing.
I've refactored the test so it is more understandable and have reduced
the amount of data needed to be json encoded. I think it is still
important to ensure that the default read limit is not active, so I have
retained one large (~32 MB) test case, but it's at least smaller than
the existing ~64 MB test case.
This exposes a public method to setReadLimits for WebSocket RPC to
prevent OOM.
Currently, the Geth server uses a default 32MB max read limit (message
size) for websocket, which leaves it prone to OOM attacks. Anyone
can easily launch a client that sends a bunch of large concurrent
requests and crashes the node with an OOM. One example of a script that
can easily crash a Geth node running a websocket server is:
ec830979ac/poc.go
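To illustrate what a read limit buys, here is a sketch that uses only
the standard library. It is not the websocket code in package rpc: the
names `readLimited`, `checkPayload` and `errMessageTooBig` are
hypothetical, and a plain `io.Reader` stands in for the connection.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errMessageTooBig is a hypothetical error for oversized messages.
var errMessageTooBig = errors.New("message exceeds read limit")

// readLimited reads a whole message but never buffers more than
// limit+1 bytes, so an oversized message cannot exhaust memory.
func readLimited(r io.Reader, limit int64) ([]byte, error) {
	// Read one byte past the limit to tell "at the limit" from "over it".
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errMessageTooBig
	}
	return data, nil
}

// checkPayload is a small helper that applies readLimited to a string.
func checkPayload(payload string, limit int64) error {
	_, err := readLimited(strings.NewReader(payload), limit)
	return err
}

func main() {
	fmt.Println(checkPayload(`{"jsonrpc":"2.0","id":1}`, 64))
	fmt.Println(checkPayload(strings.Repeat("x", 65), 64))
}
```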
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
---
**Description:**
- Replaced outdated GitHub wiki links with current, official
documentation URLs.
- Removed links that redirect or are no longer relevant.
- Ensured all references point to up-to-date and reliable sources.
---
This change adds a limit for RPC method names to prevent potential abuse
where large method names could lead to large response sizes.
The limit is enforced in:
- handleCall for regular RPC method calls
- handleSubscribe for subscription method calls
Added tests in websocket_test.go to verify the length limit
functionality for both regular method calls and subscriptions.
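The check itself can be sketched as below. The constant value and the
names `maxMethodNameLength`, `errMethodNameTooLong` and
`checkMethodName` are assumptions for illustration, not necessarily
what the handler uses.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxMethodNameLength is a hypothetical limit chosen for this sketch.
const maxMethodNameLength = 2048

// errMethodNameTooLong is a hypothetical error value.
var errMethodNameTooLong = errors.New("method name too long")

// checkMethodName rejects method names above the limit before they can
// be echoed back in an error response.
func checkMethodName(method string) error {
	if len(method) > maxMethodNameLength {
		return errMethodNameTooLong
	}
	return nil
}

func main() {
	fmt.Println(checkMethodName("eth_blockNumber"))
	fmt.Println(checkMethodName(strings.Repeat("x", maxMethodNameLength+1)))
}
```

A call site would run this check for both regular calls and
subscription calls before any further dispatch.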
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Changelog: https://golangci-lint.run/product/changelog/#1610
Removes `exportloopref` (no longer needed), replaces it with
`copyloopvar` which is basically the opposite.
Also adds:
- `durationcheck`
- `gocheckcompilerdirectives`
- `reassign`
- `mirror`
- `tenv`
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Here we add distinct error messages for network timeouts and JSON parsing errors.
Note this specifically applies to HTTP connections serving a single RPC request.
Co-authored-by: Felix Lange <fjl@twurst.com>
It turns out that encoding json.RawMessage is slow because
package json basically parses the message again to ensure it is valid.
We can avoid the slowdown by encoding the entire RPC notification once,
which yields a 30% speedup.
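The difference between the two approaches can be sketched as follows.
The struct and function names are hypothetical and only mirror the
shape of a subscription notification. Both functions produce the same
bytes, but the second one runs the encoder over the result only once.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Two-step variant: the result is pre-encoded into a json.RawMessage.
type rawParams struct {
	ID     string          `json:"subscription"`
	Result json.RawMessage `json:"result"`
}

type rawNotification struct {
	Version string    `json:"jsonrpc"`
	Method  string    `json:"method"`
	Params  rawParams `json:"params"`
}

// One-step variant: the result is kept as a Go value until encoding.
type directParams struct {
	ID     string `json:"subscription"`
	Result any    `json:"result"`
}

type directNotification struct {
	Version string       `json:"jsonrpc"`
	Method  string       `json:"method"`
	Params  directParams `json:"params"`
}

// encodeTwoStep encodes the result first, then embeds it. The second
// Marshal has to validate the embedded RawMessage again.
func encodeTwoStep(id string, result any) ([]byte, error) {
	raw, err := json.Marshal(result)
	if err != nil {
		return nil, err
	}
	return json.Marshal(rawNotification{"2.0", "eth_subscription", rawParams{id, raw}})
}

// encodeOnce encodes the entire notification in a single pass.
func encodeOnce(id string, result any) ([]byte, error) {
	return json.Marshal(directNotification{"2.0", "eth_subscription", directParams{id, result}})
}

func main() {
	a, _ := encodeTwoStep("0x1", []int{1, 2, 3})
	b, _ := encodeOnce("0x1", []int{1, 2, 3})
	fmt.Println(string(a))
	fmt.Println(string(a) == string(b))
}
```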
* rpc: make subscription test faster
reduces time for TestClientSubscriptionChannelClose
from 25 sec to < 1 sec.
* trie: cache trie nodes for faster sanity check
This reduces the time spent on TestIncompleteSyncHash
from ~25s to ~16s.
* core/forkid: speed up validation test
This takes the validation test from > 5s to sub 1 sec
* core/state: improve snapshot test run
brings the time for TestSnapshotRandom from 13s down to 6s
* accounts/keystore: improve keyfile test
This removes some unnecessary waits and reduces the
runtime of TestUpdatedKeyfileContents from 5 to 3 seconds
* trie: remove resolver
* trie: only check ~5% of all trie nodes
The String() version of BlockNumberOrHash uses decimal for all block numbers, including negative ones used to indicate labels. Switch to using BlockNumber.String() which encodes it correctly for use in the JSON-RPC API.
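A sketch of the difference, using a local `blockNumber` type rather
than the real `rpc.BlockNumber`. The constant values and the format of
the invalid case are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// blockNumber is a stand-in type for this sketch.
type blockNumber int64

// The negative values below are illustrative label markers.
const (
	finalizedBlock blockNumber = -3
	latestBlock    blockNumber = -2
	pendingBlock   blockNumber = -1
	earliestBlock  blockNumber = 0
)

// decimalString mimics the old behavior: plain decimal for everything,
// which turns a label such as "latest" into a meaningless "-2".
func decimalString(bn blockNumber) string {
	return strconv.FormatInt(int64(bn), 10)
}

// String is label-aware: labels become names, and regular block
// numbers become 0x-prefixed hex as used in the JSON-RPC API.
func (bn blockNumber) String() string {
	switch bn {
	case earliestBlock:
		return "earliest"
	case latestBlock:
		return "latest"
	case pendingBlock:
		return "pending"
	case finalizedBlock:
		return "finalized"
	}
	if bn < 0 {
		return fmt.Sprintf("<invalid %d>", int64(bn))
	}
	return "0x" + strconv.FormatInt(int64(bn), 16)
}

func main() {
	fmt.Println(decimalString(latestBlock), latestBlock.String())
	fmt.Println(decimalString(255), blockNumber(255).String())
}
```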
We're trying a new named pipe library, which should hopefully fix some occasional failures in CI.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This should fix #27726. With enough load, it might happen that the SetPongHandler
callback gets invoked before the call to SetReadDeadline is made in pingLoop. When
this occurs, the socket will end up with a 30s read deadline even though it got the pong,
which will lead to a timeout.
The fix here is to process the pong in pingLoop, synchronizing it with the code that
sends the ping.
Package rpc uses cgo to find the maximum UNIX domain socket path
length. If exceeded, a warning is printed. This is the only use of cgo in this
package. It seems excessive to depend on cgo just for this warning, so
we now hard-code the usual limit for Linux instead.
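The resulting check needs no cgo at all. In this sketch, 108 is the
size of `sun_path` on Linux. The function name and the warning text
are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// maxPathSize is the size of sun_path in sockaddr_un on Linux.
// Other platforms may use a smaller value.
const maxPathSize = 108

// endpointWarning returns a warning message if the IPC endpoint path
// is longer than the hard-coded limit, or "" if the path is fine.
func endpointWarning(endpoint string) string {
	if len(endpoint) > maxPathSize {
		return fmt.Sprintf("The ipc endpoint is longer than %d characters.", maxPathSize)
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", endpointWarning("/tmp/geth.ipc"))
	fmt.Printf("%q\n", endpointWarning("/tmp/"+strings.Repeat("a", 200)+"/geth.ipc"))
}
```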
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This adds two ways to check for subscription support. First, one can now check
whether the transport method (HTTP/WS/etc.) is capable of subscriptions using
the new Client.SupportsSubscriptions method.
Second, the error returned by Subscribe can now reliably be tested using this
pattern:
    sub, err := client.Subscribe(...)
    if errors.Is(err, rpc.ErrNotificationsUnsupported) {
        // no subscription support
    }
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR adds server-side limits for JSON-RPC batch requests. Before this change, batches
were limited only by processing time. The server would pick calls from the batch and
answer them until the response timeout occurred, then stop processing the remaining batch
items.
Here, we are adding two additional limits which can be configured:
- the 'item limit': batches can have at most N items
- the 'response size limit': batches can contain at most X response bytes
These limits are optional in package rpc. In Geth, we set a default limit of 1000 items
and 25MB response size.
When a batch goes over the limit, an error response is returned to the client. However,
doing this correctly isn't always possible. In JSON-RPC, only method calls with a valid
`id` can be responded to. Since batches may also contain non-call messages or
notifications, the best effort thing we can do to report an error with the batch itself is
reporting the limit violation as an error for the first method call in the batch. If a batch is
too large, but contains only notifications and responses, the error will be reported with
a null `id`.
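The rule for picking the `id` of the error response can be sketched as
follows. This is a simplified model, not the server code: the
`message` and `errorResponse` types, the string ids and the error text
are assumptions for illustration.

```go
package main

import "fmt"

// message is a simplified JSON-RPC message for this sketch.
type message struct {
	ID     string // empty means no id (a notification)
	Method string // empty means a response rather than a call
}

// isCall reports whether the message is a method call with a valid id.
func (m message) isCall() bool { return m.ID != "" && m.Method != "" }

// errorResponse is a simplified error reply.
type errorResponse struct {
	ID      string
	Message string
}

// checkItemLimit returns nil if the batch is within the item limit.
// Otherwise it attributes the error to the first method call in the
// batch, or to a null id if the batch contains no calls at all.
// An itemLimit of zero or less disables the check.
func checkItemLimit(batch []message, itemLimit int) *errorResponse {
	if itemLimit <= 0 || len(batch) <= itemLimit {
		return nil
	}
	resp := &errorResponse{ID: "null", Message: "batch too large"}
	for _, m := range batch {
		if m.isCall() {
			resp.ID = m.ID
			break
		}
	}
	return resp
}

func main() {
	batch := []message{
		{Method: "some_notification"},
		{ID: "7", Method: "eth_blockNumber"},
		{ID: "8", Method: "eth_chainId"},
	}
	fmt.Println(checkItemLimit(batch, 2).ID)
	fmt.Println(checkItemLimit(batch, 1000) == nil)
}
```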
The RPC client was also changed so it can deal with errors resulting from too large
batches. An older client connected to the server code in this PR could get stuck
until the request timeout occurred when the batch is too large. **Upgrading to a version
of the RPC client containing this change is strongly recommended to avoid timeout issues.**
For some weird reason, when writing the original client implementation, @fjl worked off of
the assumption that responses could be distributed across batches arbitrarily. So for a
batch request containing requests `[A B C]`, the server could respond with `[A B C]` but
also with `[A B] [C]` or even `[A] [B] [C]` and it wouldn't make a difference to the
client.
So in the implementation of BatchCallContext, the client waited for all requests in the
batch individually. If the server didn't respond to some of the requests in the batch, the
client would eventually just time out (if a context was used).
With the addition of batch limits into the server, we anticipate that people will hit this
kind of error way more often. To handle this properly, the client now waits for a single
response batch and expects it to contain all responses to the requests.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Co-authored-by: Martin Holst Swende <martin@swende.se>
ethclient accepts certain negative block number values as specifiers for the "pending",
"safe" and "finalized" block. In case of "pending", the value accepted by ethclient (-1)
did not match rpc.PendingBlockNumber (-2).
This wasn't really a problem, but other values accepted by ethclient did match the
definitions in package rpc, and it's weird to have this one special case where they don't.
To fix it, we decided to change the values of the constants rather than changing ethclient.
The constant values are not otherwise significant. This is a breaking API change, but we
believe not a dangerous one.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This changes the RPC server to ignore methods using *context.Context as parameter
and *error as return value type. Methods with such types would crash the server when
called.
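A sketch of such a filter using reflection. The function names are
hypothetical and the real registration code checks more than this.
Only the two pointer-type cases described above are covered.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
)

var (
	contextType = reflect.TypeOf((*context.Context)(nil)).Elem()
	errorType   = reflect.TypeOf((*error)(nil)).Elem()
)

// acceptsSignature reports whether fn is a function that uses neither
// *context.Context as a parameter nor *error as a return value.
func acceptsSignature(fn any) bool {
	t := reflect.TypeOf(fn)
	if t == nil || t.Kind() != reflect.Func {
		return false
	}
	for i := 0; i < t.NumIn(); i++ {
		if p := t.In(i); p.Kind() == reflect.Ptr && p.Elem() == contextType {
			return false
		}
	}
	for i := 0; i < t.NumOut(); i++ {
		if r := t.Out(i); r.Kind() == reflect.Ptr && r.Elem() == errorType {
			return false
		}
	}
	return true
}

// Sample methods used only to exercise the check.
func okMethod(ctx context.Context, n int) (int, error) { return n, nil }
func ptrContextMethod(ctx *context.Context) error      { return nil }
func ptrErrorMethod() *error                           { return nil }

func main() {
	fmt.Println(acceptsSignature(okMethod))
	fmt.Println(acceptsSignature(ptrContextMethod))
	fmt.Println(acceptsSignature(ptrErrorMethod))
}
```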