Since go 1.18 reflect has `reflect.Pointer` which replaces
`reflect.Ptr`. Newer versions of `govet` will alert.
See also: https://pkg.go.dev/reflect#pkg-constants
Adds `testing_commitBlockV1`. It is the write companion of `testing_buildBlockV1`:
it builds a block from the provided payload attributes and transactions on
top of the current canonical head, inserts it, and sets it as the new
head, returning the new head hash.
---------
Co-authored-by: MariusVanDerWijden <m.vanderwijden@live.de>
This PR fixes an issue where flat states are continuously persisted
during downloadState, while the sync journal is only persisted at the
end of Sync.
As a result, an unclean shutdown can leave the on-disk flat state ahead
of the journal markers. Some persisted entries may be stale (storage
slots that should have been deleted), and these dangling entries are not
detected or fixed by subsequent state downloads.
To address this, this PR introduces a cleanup step before state
downloading begins. It removes all state entries that are not covered by
the persisted journal markers.
This PR introduces a new condition that if the local node falls behind
too much and the required BAL for catching up is very likely to be
unavailable, the entire snap sync will be restarting from scratch.
As the defined BAL retention window is weak-subjective-period which is
calculated dynamically. A more conservative threshold is used (90K
blocks) for robustness.
Apart from that, the BAL catchup will be divided into several spans and
apply one by one. It's essential to prevent the potential out-of-memory
panic of placing the entire BAL set in memory.
This PR improves the slot reservation logic in the context of snap/2.
Geth has the mechanism to reserve roughly half the peer slots for peers
supporting the snap protocol if snap syncing is needed by local node.
With the context of snap/2, this mechanism should be changed that:
we reserve the slot for the "usable snap peer", not blindly for peer
with snap extension enabled (such as legacy snap/1, which can't serve
the snap/2).
sendInvalidTxs's *eth.TransactionsPacket case iterated `txs` — the
locally-sent invalid transactions, every one of which is in `invalids`
by construction — instead of the transactions actually carried by the
received packet. As a result the loop returned "received bad tx" on the
very first TransactionsPacket the peer sent, regardless of its contents,
and never inspected what was really propagated.
Iterate msg.Items() (the decoded contents of the received packet) so the
"node must not propagate invalid txs" conformance check tests the real
condition instead of producing a false negative.
---------
Co-authored-by: Bosul Mun <bsbs8645@snu.ac.kr>
The stack primitives pop by value: pop() returns the 32-byte value
itself, so every popped operand is copied out of the stack arena before
it is used. The result side was already in place, peek returns a pointer
and binary ops write into the new stack top. This PR fixes the operand
side: pointer-returning primitives (popPtr, popPtrPeek, etc), with the
handlers rewritten to read operands directly from their arena slots.
Every popped operand paid the copy, whatever the op went on to do with
it, so this optimization covers the arithmetic and comparison ops as
much as JUMP, MSTORE, SSTORE and RETURN.
The copy is visible in the assembly. On arm64, master's opLt spends four
instructions moving the popped value through the frame, and the
comparison then reads it back from there:
LDP (R5), (R6, R7) ; load words 0 and 1 of the popped value from the
arena
LDP 16(R5), (R5, R8) ; load words 2 and 3
STP (R6, R7), vm.~r0-64(SP) ; store words 0 and 1 into a frame slot
STP (R5, R8), vm.~r0-48(SP) ; store words 2 and 3
With popPtrPeek those four instructions are gone, the frame shrinks from
locals=0x58 to locals=0x18, and the function from 336 to 288 bytes. The
compiler cannot remove the copy itself: uint256.Int is a four-element
array, and Go's SSA does not promote arrays longer than one element to
registers, so a by-value pop pays this round trip no matter how far
inlining gets, for LT exactly as for ADD.
The CALL and CREATE families are deliberately not converted: a child
frame reuses the same stack arena, so parent pointers into popped slots
die when the child pushes. The rule is recorded on the primitives:
pointers stay valid until the next push or any sub call. Converting the
call family safely means materializing scalars before the child call,
left for later work with a call-heavy benchmark to justify it.
### Benchmarks
Measured with the benchmark suite from #35144 (the evm-bench contract
workloads and the block import benchmark), which is not part of this
PR's diff. Apple M4 Max, fixed iteration counts, n=10, all p=0.000. B/op
and allocs/op are statistically identical on every benchmark:
| benchmark | master | PR | vs master |
|---|---|---|---|
| Snailtracer | 60.0 ms | 54.1 ms | -9.8% |
| TenThousandHashes | 13.2 ms | 12.2 ms | -7.8% |
| ERC20Transfer | 11.7 ms | 11.0 ms | -5.5% |
| ERC20Mint | 7.49 ms | 7.02 ms | -6.2% |
| ERC20ApprovalTransfer | 8.92 ms | 8.44 ms | -5.4% |
This PR is independent of #35144 but plays nicely with it: the generated
dispatch there splices these handler bodies, so the in-place forms land
in its fast path too, where they measure larger.
### Testing
The rewritten handlers run on the interpreter's only execution path, so
correctness rests on references outside the change:
- **Consensus fixtures.** The full tests package passes: state tests,
the execution-spec families, blockchain tests.
- **Opcode testcases.** The JSON testcases compare individual opcode
results against committed expected values.
- **Tracer fixtures.** The tracetest reference files pin exact log and
return data shapes, covering the rewritten LOG and RETURN paths.
- **Cross-build differential.** A goevmlab campaign running this
branch's evm against master's evm over generated state tests across four
forks (Prague, Cancun, London, Osaka) with full trace comparison:
160,566 tests, zero divergences.
---------
Co-authored-by: MariusVanDerWijden <m.vanderwijden@live.de>
`TestTracingHTTPTimeout` still flakes in CI after #35101, failing at the
POST:
--- FAIL: TestTracingHTTPTimeout (0.26s)
tracing_test.go:633: request: Post "http://127.0.0.1:43497": EOF
The test sets a short server `WriteTimeout` and posts a blocking call.
`ContextRequestTimeout` leaves a fixed 100ms for the server to write its
timeout response before the HTTP write deadline cuts the connection.
I can't repro it locally, but my theory is that under load that write
can miss the window, so the connection is dropped and the client POST
returns `EOF`, failing the test before it inspects the span. This is the
only test exposed to it because it is the only one that configures a
`WriteTimeout`.
The EOF is benign: the server sets the timeout error on the SERVER span
before attempting the write, independent of whether the client receives
the response. Since that span status is all the test asserts,
`tryPostJSONRPC` tolerates the transport error instead of failing on it.
This PR fixes an issue that when peers legitimately lack a requested
BAL, empty (0x80) is delivered and this BAL entry will be refetched
over and over again.
A `refused` tracker is added and catchUp will fail if this BAL is
unavailable against the entire peerset.
Implements https://eips.ethereum.org/EIPS/eip-8037
mainly done in order to judge the complexity of the EIP
and to act as a jumping off point, since the eip will likely
change.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Currently geth ignores the docker `--memory` directive and doesn't
adjust its cache size downward when necessary, potentially running into
OOM.
while gopsutil has functions like `docker/CgroupMem()` they are rather
for reading cgroup memory limit of a container from the host.
This PR adds an optimization to the `findNodeByID` function in
`p2p/discover`. There is already an open PR (#33205) for similar
improvements, and I have further optimized the function to get better
performance. I have attached the benchmark results comparing the current
`main` branch with my `optimized version`, and the results show clear
improvements.
---------
Co-authored-by: Csaba Kiraly <csaba.kiraly@gmail.com>
This PR introduces a cache for GetBlobs request.
The main purpose of this PR is to reduce the getBlobs latency by reading and
decoding blobs from the pool in advance of the actual query. This is important
especially in the context of a sparse blobpool, since it may be necessary to
recover blobs from cells on a getBlobs request.
Previously, the Engine API read and decoded blobs from the pool on every call.
Now those calls check the cache and only fall back to the pool on a miss.
The cache has two modes:
- In topK mode (default), it wakes up periodically, picks the most profitable
pending blob transactions up to the current fork's maxBlobsPerBlock, and loads
their blobs. The selection logic is shared with the miner's block-building
logic. The selection size is derived from eip4844.MaxBlobsPerBlock at the
current head.
- When the CL calls HasBlobs, the cache switches to hasBlobs mode and tries to
pin the set it just reported as available. Cache updates (read, decode, and
optionally conversion in the future) run in background goroutines.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
DeriveFields rewrites every canonical inclusion field on each log
(BlockNumber, BlockHash, BlockTimestamp, TxHash, TxIndex, Index) but left
the transient Removed flag untouched. A canonically-included log is by
definition not removed, so derivation should drive Removed back to false
just like the other derived fields.
This is a defensive consistency fix rather than a fix for a presently
reachable bug. The only place Removed is set to true is
collectReceiptsAndLogs, which operates on fresh receipts read via
ReadRawReceipts (Removed is rlp:"-", so always decoded as false), derives
fields before flagging removal, and discards those objects after emitting
the reorg event. They are never cached or re-derived, and the cached
receipts are separate instances, so no Removed==true log currently reaches
DeriveFields. Resetting the flag here hardens the function against a
reused/stale log object regardless of the calling path.
Adds snap/2 (EIP-8189), a block-access-list (BAL) based state sync, and
wires it to run side by side with snap/1. It's opt-in (for now) behind a
new --snap.v2 flag and chosen at startup.
https://eips.ethereum.org/EIPS/eip-8189
---------
Co-authored-by: Toni Wahrstätter <info@toniwahrstaetter.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
In old code, mu is struct not pointer, it caused create new mutex event
with same writer. Change to use pointer to sync.Mutex, so that the mutex
is shared between handler with same writer.
This adds a client option to configure trace context propagation via the
`traceparent` HTTP header.
I'm adding this so that prysm can enable distributed tracing on their
engine API client.
This is a PR that removes all correctly flagged typos, in order to stop
an onslaught of slop PRs in its tracks. It should be followed by #34994
but the latter needs more configuration work and I want to limit the
stem of PRs right now.
Fixes an issue where we would falsely return 400 when we should return
200 because the request was served successfully
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR is trying to stem further slop PRs by going over all incorrect
strings and fixing them.
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
## Overview
This PR fixes a race condition during blockchain shutdown where snapshot
generation could continue accessing the trie database after it has been
closed, leading to iterator errors. We noticed this in one of our nodes
on https://github.com/ava-labs/avalanchego, which relies on an older
version of geth with the same issue (so this behavior does happen!).
During node shutdown, the following sequence occurs:
1. `BlockChain.Stop()` calls `snaps.Release()` to clean up snapshot
resources
2. `Release()` only resets the cache but doesn't stop the generator
goroutine
3. The trie database is then closed via `triedb.Close()`
4. The still-running generator attempts to iterate storage tries
5. Iterator fails because the database is closed (`"Generator failed to
iterate storage trie"`)
## Problem
There are three related bugs:
1. `Release()` doesn't stop generation: The `diskLayer.Release()` method
only resets the cache without stopping ongoing snapshot generation,
leaving the generator goroutine running after database closure.
2. `stopGeneration()` has an incorrect completion check: The
`stopGeneration()` method checks `genMarker != nil` to determine if
generation is running. However, `genMarker` is set to nil when
generation completes successfully, even though the generator goroutine
is still waiting for the abort signal at the end of `generate()`. See
line 705 in `generate.go`:
eaaa5b716d/core/state/snapshot/generate.go (L699-L707)
This means `stopGeneration()` returns early without sending the abort
signal.
3. Node shutdown doesn't stop generation: During shutdown, no code path
calls `stopGeneration()` or sends the abort signal to the generator,
causing the generator to access a closed database and error.
## Fix
- Modified `diskLayer.Release()` to call `stopGeneration()` before
releasing resources
- Added cancelation architecture, removing reliance on someone having to
wait
- Fixed `stopGeneration()` to properly and safely stop snapshot
generation
- Added `TestGenerateGoroutineLeak` to verify the fix and prevent
regression. The test fails without the fix and passes with it.
- The test creates a snapshot with active generation, waits for
completion, then calls `Release()`, and uses `go.uber.org/goleak` to
assert no generator goroutine survives.
- Without the fix, the test fails: `Release()` returns without stopping
the generator, which stays parked at `generate.go:705` waiting for an
abort signal that never comes:
```
--- FAIL: TestGenerateGoroutineLeak (0.88s)
generate_test.go: found unexpected goroutines:
[Goroutine 6 in state chan receive, with
core/state/snapshot.(*diskLayer).generate on top of the stack:
core/state/snapshot.(*diskLayer).generate(...)
core/state/snapshot/generate.go:705
created by core/state/snapshot.generateSnapshot
core/state/snapshot/generate.go:79 ]
```
- With the fix, the test passes: `Release()` -> `stopGeneration()`
blocks until the generator goroutine has fully exited, so nothing leaks
Note that this fix follows the same pattern used in `Tree.Disable()` in
https://github.com/ethereum/go-ethereum/pull/30040, which introduced
`stopGeneration()` for use in `Disable()` and `Rebuild()` but didn't
address the shutdown path.
The test follows the same pattern used in
`TestCheckSimBackendGoroutineLeak`
### Summary
`TestServerWebsocketReadLimit/limit_with_large_request_-_should_fail` is
flaky on `windows/amd64` (see [run
25364841576](https://github.com/ethereum/go-ethereum/actions/runs/25364841576/job/74378334589)
referenced in #34877):
```
--- FAIL: TestServerWebsocketReadLimit/limit_with_large_request_-_should_fail (0.02s)
server_test.go:279: unexpected error for read limit violation: read tcp 127.0.0.1:56703->127.0.0.1:56700:
wsarecv: An existing connection was forcibly closed by the remote host.
```
When the server enforces the read limit and tears the connection down,
the client's read can race the close frame. On Windows the OS surfaces
that race as `wsarecv: An existing connection was forcibly closed by the
remote host` instead of the gorilla `CloseError(1009)`,
`websocket.ErrReadLimit`, or the POSIX `connection reset by peer` the
test already tolerates.
This change adds `"forcibly closed"` to the set of acceptable error
substrings for the failure case, so the Windows reset is recognized as a
valid signal that the server enforced the limit.
### Fixes
#34877
### Test plan
- [x] `go test -count=5 -run TestServerWebsocketReadLimit ./rpc/`
(darwin/arm64) — pass
- [x] `go test ./rpc/...` — pass
- [x] `go vet ./rpc/...` / `gofmt -l rpc/server_test.go` — clean
- [ ] CI on `windows/amd64` confirms the flake no longer trips
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
The checksum count during EraE import is off by one when `checksums.txt`
ends its last line on a newline, as the pandaops file does. The current
code would result in one empty string after the final `\n`, something
like
```
[]string{
"line1",
"line2",
"line3",
"",
}
```
Trim off the final `\n`, if it exists: `return
strings.Split(strings.TrimRight(string(b), "\n"), "\n"), nil`
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
`clef` is a great tool, however:
* It is no longer maintained
* No one else in the team can pronounce it properly
We are however receiving some slop PRs for it, so I think it's time -
with infinite sadness - to say goodbye.
Make the `Block` parameter optional on the six state-reading methods,
defaulting to `latest` when omitted:
- `eth_getBalance`
- `eth_getCode`
- `eth_getStorageAt`
- `eth_getTransactionCount`
- `eth_getProof`
- `eth_getStorageValues`
This implements the behavior proposed in https://github.com/ethereum/execution-apis/pull/812.
---------
Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
Update the EraE (ere) reader and builder to the latest e2store ere spec
(https://github.com/eth-clients/e2store-format-specs/pull/16).
The reader now derives the component layout from the on-disk e2store type
tags via the dynamic block index, rather than assuming fixed slot positions.
This makes the optional components (receipts, td, proof) resolvable in any
supported subset.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
Rewrites triedb.GenerateTrie as a single partitioned pass that
reconciles stale account.Root fields and rebuilds the trie at the same
time, with 16-way parallelism and crash resume baked in.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
The per-call SERVER span ended inside `handleCall()`, so the JSON-RPC
response write happened after the span closed. For large responses like
`engine_getBlobsV*`, that write time was missing from traces.
- Extend the SERVER span past `writeJSON`.
- For batches, add a top-level `jsonrpc.batch` SERVER span (with `rpc.batch.size`) covering the whole batch including `callBuffer.write`.
- Add `rpc.writeJSON` span around the non-batch response write.
- Add `rpc.writeJSONBatch` span around the batch response write.
- Add `rpc.httpWrite` span around the actual HTTP write, separating JSON encoding from network write.
- Add additional telemetry helpers.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR is a prerequisite for landing snap v2, the BAL-healing snap sync
algorithm.
It duplicates much of the snap v1 skeleton, which is expected to be
deprecated once v2 is enabled. The code duplication is acceptable as a
short-term tradeoff, simplifying development and reducing integration
complexity.