This PR introduces a cache for GetBlobs request.
The main purpose of this PR is to reduce the getBlobs latency by reading and
decoding blobs from the pool in advance of the actual query. This is important
especially in the context of a sparse blobpool, since it may be necessary to
recover blobs from cells on a getBlobs request.
Previously, the Engine API read and decoded blobs from the pool on every call.
Now those calls check the cache and only fall back to the pool on a miss.
The cache has two modes:
- In topK mode (default), it wakes up periodically, picks the most profitable
pending blob transactions up to the current fork's maxBlobsPerBlock, and loads
their blobs. The selection logic is shared with the miner's block-building
logic. The selection size is derived from eip4844.MaxBlobsPerBlock at the
current head.
- When the CL calls HasBlobs, the cache switches to hasBlobs mode and tries to
pin the set it just reported as available. Cache updates (read, decode, and
optionally conversion in the future) run in background goroutines.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Adds snap/2 (EIP-8189), a block-access-list (BAL) based state sync, and
wires it to run side by side with snap/1. It's opt-in (for now) behind a
new --snap.v2 flag and chosen at startup.
https://eips.ethereum.org/EIPS/eip-8189
---------
Co-authored-by: Toni Wahrstätter <info@toniwahrstaetter.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
In old code, mu is struct not pointer, it caused create new mutex event
with same writer. Change to use pointer to sync.Mutex, so that the mutex
is shared between handler with same writer.
This adds a client option to configure trace context propagation via the
`traceparent` HTTP header.
I'm adding this so that prysm can enable distributed tracing on their
engine API client.
This is a PR that removes all correctly flagged typos, in order to stop
an onslaught of slop PRs in its tracks. It should be followed by #34994
but the latter needs more configuration work and I want to limit the
stem of PRs right now.
Fixes an issue where we would falsely return 400 when we should return
200 because the request was served successfully
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR is trying to stem further slop PRs by going over all incorrect
strings and fixing them.
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
## Overview
This PR fixes a race condition during blockchain shutdown where snapshot
generation could continue accessing the trie database after it has been
closed, leading to iterator errors. We noticed this in one of our nodes
on https://github.com/ava-labs/avalanchego, which relies on an older
version of geth with the same issue (so this behavior does happen!).
During node shutdown, the following sequence occurs:
1. `BlockChain.Stop()` calls `snaps.Release()` to clean up snapshot
resources
2. `Release()` only resets the cache but doesn't stop the generator
goroutine
3. The trie database is then closed via `triedb.Close()`
4. The still-running generator attempts to iterate storage tries
5. Iterator fails because the database is closed (`"Generator failed to
iterate storage trie"`)
## Problem
There are three related bugs:
1. `Release()` doesn't stop generation: The `diskLayer.Release()` method
only resets the cache without stopping ongoing snapshot generation,
leaving the generator goroutine running after database closure.
2. `stopGeneration()` has an incorrect completion check: The
`stopGeneration()` method checks `genMarker != nil` to determine if
generation is running. However, `genMarker` is set to nil when
generation completes successfully, even though the generator goroutine
is still waiting for the abort signal at the end of `generate()`. See
line 705 in `generate.go`:
eaaa5b716d/core/state/snapshot/generate.go (L699-L707)
This means `stopGeneration()` returns early without sending the abort
signal.
3. Node shutdown doesn't stop generation: During shutdown, no code path
calls `stopGeneration()` or sends the abort signal to the generator,
causing the generator to access a closed database and error.
## Fix
- Modified `diskLayer.Release()` to call `stopGeneration()` before
releasing resources
- Added cancelation architecture, removing reliance on someone having to
wait
- Fixed `stopGeneration()` to properly and safely stop snapshot
generation
- Added `TestGenerateGoroutineLeak` to verify the fix and prevent
regression. The test fails without the fix and passes with it.
- The test creates a snapshot with active generation, waits for
completion, then calls `Release()`, and uses `go.uber.org/goleak` to
assert no generator goroutine survives.
- Without the fix, the test fails: `Release()` returns without stopping
the generator, which stays parked at `generate.go:705` waiting for an
abort signal that never comes:
```
--- FAIL: TestGenerateGoroutineLeak (0.88s)
generate_test.go: found unexpected goroutines:
[Goroutine 6 in state chan receive, with
core/state/snapshot.(*diskLayer).generate on top of the stack:
core/state/snapshot.(*diskLayer).generate(...)
core/state/snapshot/generate.go:705
created by core/state/snapshot.generateSnapshot
core/state/snapshot/generate.go:79 ]
```
- With the fix, the test passes: `Release()` -> `stopGeneration()`
blocks until the generator goroutine has fully exited, so nothing leaks
Note that this fix follows the same pattern used in `Tree.Disable()` in
https://github.com/ethereum/go-ethereum/pull/30040, which introduced
`stopGeneration()` for use in `Disable()` and `Rebuild()` but didn't
address the shutdown path.
The test follows the same pattern used in
`TestCheckSimBackendGoroutineLeak`
### Summary
`TestServerWebsocketReadLimit/limit_with_large_request_-_should_fail` is
flaky on `windows/amd64` (see [run
25364841576](https://github.com/ethereum/go-ethereum/actions/runs/25364841576/job/74378334589)
referenced in #34877):
```
--- FAIL: TestServerWebsocketReadLimit/limit_with_large_request_-_should_fail (0.02s)
server_test.go:279: unexpected error for read limit violation: read tcp 127.0.0.1:56703->127.0.0.1:56700:
wsarecv: An existing connection was forcibly closed by the remote host.
```
When the server enforces the read limit and tears the connection down,
the client's read can race the close frame. On Windows the OS surfaces
that race as `wsarecv: An existing connection was forcibly closed by the
remote host` instead of the gorilla `CloseError(1009)`,
`websocket.ErrReadLimit`, or the POSIX `connection reset by peer` the
test already tolerates.
This change adds `"forcibly closed"` to the set of acceptable error
substrings for the failure case, so the Windows reset is recognized as a
valid signal that the server enforced the limit.
### Fixes
#34877
### Test plan
- [x] `go test -count=5 -run TestServerWebsocketReadLimit ./rpc/`
(darwin/arm64) — pass
- [x] `go test ./rpc/...` — pass
- [x] `go vet ./rpc/...` / `gofmt -l rpc/server_test.go` — clean
- [ ] CI on `windows/amd64` confirms the flake no longer trips
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
The checksum count during EraE import is off by one when `checksums.txt`
ends its last line on a newline, as the pandaops file does. The current
code would result in one empty string after the final `\n`, something
like
```
[]string{
"line1",
"line2",
"line3",
"",
}
```
Trim off the final `\n`, if it exists: `return
strings.Split(strings.TrimRight(string(b), "\n"), "\n"), nil`
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
`clef` is a great tool, however:
* It is no longer maintained
* No one else in the team can pronounce it properly
We are however receiving some slop PRs for it, so I think it's time -
with infinite sadness - to say goodbye.
Make the `Block` parameter optional on the six state-reading methods,
defaulting to `latest` when omitted:
- `eth_getBalance`
- `eth_getCode`
- `eth_getStorageAt`
- `eth_getTransactionCount`
- `eth_getProof`
- `eth_getStorageValues`
This implements the behavior proposed in https://github.com/ethereum/execution-apis/pull/812.
---------
Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
Update the EraE (ere) reader and builder to the latest e2store ere spec
(https://github.com/eth-clients/e2store-format-specs/pull/16).
The reader now derives the component layout from the on-disk e2store type
tags via the dynamic block index, rather than assuming fixed slot positions.
This makes the optional components (receipts, td, proof) resolvable in any
supported subset.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
Rewrites triedb.GenerateTrie as a single partitioned pass that
reconciles stale account.Root fields and rebuilds the trie at the same
time, with 16-way parallelism and crash resume baked in.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
The per-call SERVER span ended inside `handleCall()`, so the JSON-RPC
response write happened after the span closed. For large responses like
`engine_getBlobsV*`, that write time was missing from traces.
- Extend the SERVER span past `writeJSON`.
- For batches, add a top-level `jsonrpc.batch` SERVER span (with `rpc.batch.size`) covering the whole batch including `callBuffer.write`.
- Add `rpc.writeJSON` span around the non-batch response write.
- Add `rpc.writeJSONBatch` span around the batch response write.
- Add `rpc.httpWrite` span around the actual HTTP write, separating JSON encoding from network write.
- Add additional telemetry helpers.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR is a prerequisite for landing snap v2, the BAL-healing snap sync
algorithm.
It duplicates much of the snap v1 skeleton, which is expected to be
deprecated once v2 is enabled. The code duplication is acceptable as a
short-term tradeoff, simplifying development and reducing integration
complexity.
This PR implements flat-file storage for finalized block access lists,
specifically:
* The freezer is extended with the notion of tail groups, allowing
different groups within a single freezer instance to maintain
independent tails, while all tables within the same group remain
tail-aligned.
* The freezer can now dynamically attach new tables to an existing
freezer instance, with both the table head and tail initialized to the
freezer's common head.
* A new freezer table, **bals**, has been added to the chain freezer
with its own dedicated tail group, preserving the flexibility to deploy
a tail-pruning policy different from the main chain data group.
Additionally, the BALs in the key-value store will be migrated to the
freezer instance once they are finalized or there are at least 90K block
confirmations on top acting as a "soft finalization". This freezing
policy is same with all chain segment data.
Checks the Ledger Ethereum app version before sending typed transactions
that require newer app support.
EIP-2930/EIP-1559 transactions now require Ledger app v1.9.0 or newer,
and EIP-7702 transactions require v1.17.0 or newer. Older apps now
return the same kind of local update error already used for earlier
Ledger feature gates instead of sending an unsupported transaction to
the device.
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Because the UBT doesn't differentiate slots from accounts, the content
of the tree can not be exported as a `GenesisAlloc`, which means that
`evm t8n` can not intergrate it. We have tried integrating the new
format into execution-specs, but this is very hard to maintain because
the team doesn't see it as a priority and their own repository is seeing
a lot of churn. This PR adds the ability to capture the structure of
what is being inserted in the tree, so that the information isn't lost
and it can be dumped in the t8n context.
---------
Co-authored-by: felipe <fselmo2@gmail.com>
The server already rejects empty batches with -32600. On the client
side, calling BatchCallContext with a zero-length slice on inproc/WS/IPC
transports registers no request IDs but the server still replies with an
error message whose id is null. The dispatch loop has no requestOp to
match it to, so op.resp is never written and op.wait blocks until ctx
deadline.
Short-circuit on len(b) == 0 with the same invalidRequestError the
server uses, so all transports return immediately with -32600.
Add a disableGzip parameter to NewHTTPHandlerStack and httpConfig.
initAuth sets it true so compression is disabled in the engine api.
Public HTTP RPC behavior is unchanged.
This is another one of my slop-PRs, aimed at reducing the amount of
future slop PRs by doing it all in one go.
All of the deprecated cli flags have been in that state for over a year.
It's time to remove them, especially since they are ineffective.
Note that I kept the code to report and manage deprecated cli flags, as
I assume we will be deprecating more flags in the future.
### Summary
Closes#34621.
`github.com/pion/dtls/v2` is affected by
[CVE-2026-26014](https://nvd.nist.gov/vuln/detail/CVE-2026-26014); the
fix lives in `github.com/pion/dtls/v3`. In this tree, dtls/v2 is pulled
in indirectly via `github.com/pion/stun/v2 v2.0.0` (declared at
`go.mod:53`), which is the only direct consumer — `p2p/nat/stun.go` is
the sole call site.
`github.com/pion/stun/v3` already uses dtls/v3, so bumping `stun`
upgrades the vulnerable dependency without touching `pion/dtls`
directly.
### API check
The v3 surface used by `p2p/nat/stun.go` is byte-identical in shape to
v2:
| Symbol | v2 | v3 |
|---|---|---|
| `Dial` | `func Dial(network, address string) (*Client, error)` | same
|
| `Build` | `func Build(setters ...Setter) (*Message, error)` | same |
| `TransactionID` | `var TransactionID Setter` | same |
| `BindingRequest` | `var BindingRequest = NewType(MethodBinding,
ClassRequest)` | same |
| `Event` | `type Event struct` | same |
| `XORMappedAddress` | `type XORMappedAddress struct { …
GetFrom(*Message) error }` | same |
| `DefaultPort` | `const DefaultPort = 3478` | same |
So the code change is just the import rename plus an alias rename to
keep the local label honest (`stunV2` → `stunV3`).
### Change
`go.mod` / `go.sum`:
- Replace direct `github.com/pion/stun/v2 v2.0.0` with
`github.com/pion/stun/v3 v3.0.1`.
- `go mod tidy` drops every `pion/dtls/v2` and `pion/stun/v2` entry from
`go.sum` and pulls `pion/dtls/v3 v3.0.7`, `pion/stun/v3 v3.0.1`,
`pion/transport/v3 v3.0.8` as the new indirect set.
`p2p/nat/stun.go`:
- Update the import path and rename the alias from `stunV2` to `stunV3`.
### Verification
- `go build ./p2p/nat/` clean.
- `go test ./p2p/nat/ -count=1` passes (26s).
- `grep 'pion/dtls/v2\|pion/stun/v2' go.sum` returns zero matches.
### Notes
- `pion/dtls` is not imported directly anywhere in the tree, so no other
code needs touching.
- `pion/transport/v3` was already in the dependency graph (the `stun/v3`
upgrade just bumps the patch from v3.0.1 → v3.0.8); the v2 transport
drops out cleanly.
fixes#32672
This is kind of a band aid solution since it fixes the issue by
bypassing the snap sync expectations of an empty db and attempting to
import the new payload if we're at block 1. The next FCU will set the
status to synced.
Will continue looking to better understand how the above issue arises
and find a more thorough solution.
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Fixes#35033
## Problem
The GraphQL HTTP handler decoded request bodies directly before
executing the query. Unlike the JSON-RPC HTTP path, `/graphql` did not
have an explicit request body limit before JSON decoding.
A single `Decode` also stops after the first JSON value, so the handler
now requires EOF after the GraphQL request object to ensure oversized
trailing request data is not ignored.
## Changes
- Limit GraphQL request bodies to 5 MiB, matching the existing JSON-RPC
default body limit.
- Return `413 Request Entity Too Large` when the limit is exceeded.
- Require EOF after the request JSON object.
- Add regression coverage for oversized query bodies and oversized
trailing request data.
- Fix an existing GraphQL test fixture that had an unintended trailing
quote after the JSON object.
## Validation
- `gofmt -w graphql/service.go graphql/graphql_test.go`
- `go run golang.org/x/tools/cmd/goimports@latest -w graphql/service.go
graphql/graphql_test.go`
- `go test ./graphql -run TestGraphQLHTTPBodyLimit -count=1`
- `go test ./graphql -count=1`
There is currently no way for JSON-RPC clients to discover which
historical data a node can serve without probing with trial-and-error
calls and interpreting opaque error messages (`pruned history
unavailable`).
This makes it hard to build robust tooling on top of nodes that prune
their history, for example nodes started with `--history.chain
postmerge`
or with reduced `TransactionHistory`, `LogHistory`, or `StateHistory`
windows.
This PR implements `eth_capabilities` as defined in
ethereum/execution-apis#755. The method takes no parameters and returns
the current head plus six per-resource capability records:
- `state`
- `tx`
- `logs`
- `receipts`
- `blocks`
- `stateproofs`
Closes#33828
Archive nodes store the full history of transactions in the index. This PR
fixes a bug for users who provided the NoPruning field in a YAML config file.
Now geth correctly stores full transaction history if archive is configured via
YAML.
---------
Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com>
Each commit on a PR kicks off a CI run. Those CI jobs run to the finish
regardless, even when new commits have been pushed which make them stale
and useless. This change attempts to cancel any previously running job
for the same PR.
Passing `--state.size-tracking=false` currently cannot disable state
size tracking when it was enabled by the config file because the CLI
path only turns the config value on.
---------
Co-authored-by: Jared Wasinger <j-wasinger@hotmail.com>
The first NODES response sets total = min(int(response.RespCount),
totalNodesResponseLimit), With RespCount=0, total=0 but receive become
1; receive == count is never satisfied.
Return empty raw bytes when the GraphQL `Block.raw` resolver cannot load
the block body. This matches the nil handling used by the other
block-body-backed resolvers and avoids exposing RLP empty-list bytes as
raw block data.
The changes here enable us to fill tests with Amsterdam using geth EVM
bin.
This will be useful for block builder tests using `testing_buildBlockV1`
endpoint and for filling benchmarking compute and stateful tests as
Python is too slow for benchmark tests.
Tested in
[ethereum/execution-specs](https://github.com/ethereum/execution-specs)
with:
```
uv run fill --clean --fork=Amsterdam tests/amsterdam/eip7928_block_level_access_lists/test_block_access_lists.py --evm-bin=$GETH_EVM_PATH
```
- Adds tracing to the `GetBlobsV1/V2/V3`
- Adds `blobs.requested` and `blobs.filled` attributes to
`GetBlobsV1/V2/V3` spans.
- Adds tracing to `BlobTxPool().GetBlobs()`
This PR:
- Adds `engine_newPayloadWithWitnessV5`. The codebase already supports
the previous `VX`, so only `V5` was missing.
- Make the consensus witness format use the field [ordering defined in
the
spec](8d7e68f4b7/src/ethereum/forks/amsterdam/stateless_host_exec_witness.py (L175-L176))
to make it canonical.
cc @gballet
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Fixes a regression where nil results from getBlobs were encoded as an empty array instead of null.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Adds a fast path for ExecutionPayloadEnvelope and BlobAndProofListV*
that bypasses encoding/json's reflection and re-validation, which are
expensive for large payloads with many blobs. Also hand-rolls the
jsonrpcMessage wire encoding in the RPC codec to avoid a second
re-validation pass when writing responses to the connection.
Resolves#33814
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Co-authored-by: Felix Lange <fjl@twurst.com>
Updates the static validation logic to cover additional edge cases
(reflecting the state of the latest devnet branch, except cleaned up
slightly).
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This PR implements the serving side of the eth71 BAL exchange messages.
Until commit 4cd7092 also contained the requesting side, but since that
part still needs more work, I'm splitting it out into a separate PR.
The test injects BALs directly into rawdb. This can be removed once BAL
generation is integrated into the chain maker.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR finally lands EIP-7928, collecting the block accessList during
the block execution and verifying against the block header.
---------
Co-authored-by: jwasinger <j-wasinger@hotmail.com>
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
This PR fixes a bug in the current blobpool `Reset` function where it
used the Transaction type instead of blobTxForPool.
Decoding transactions fetched from the pool as Transaction type
caused an error because the blobpool stores blobTxForPool types.
## Summary
The `--rpc.telemetry.sample-ratio` flag declares `Value: 1.0` and `geth
--help` advertises `(default: 1)`. In practice, however, omitting the
flag produces a sample ratio of `0`, causing
`sdktrace.TraceIDRatioBased(0)` to drop 100% of spans. Users who enable
`--rpc.telemetry` see the `OpenTelemetry trace export enabled` log line
and a clean startup, but no traces ever leave the process.
The root cause is the interaction between two pieces of code:
1. `cmd/utils/flags.go:setOpenTelemetry` (added in #34062) only copies
the flag value when `ctx.IsSet(...)` returns true:
```go
if ctx.IsSet(RPCTelemetrySampleRatioFlag.Name) {
tcfg.SampleRatio = ctx.Float64(RPCTelemetrySampleRatioFlag.Name)
}
```
That is the right pattern for "don't clobber a config-file value with
the CLI default," but it implies that something else must initialise the
field when neither source sets it.
2. `node/defaults.go:DefaultConfig` never initialises
`OpenTelemetry.SampleRatio`, leaving it at the float64 zero value.
The result for the common CLI-only user (no TOML config) is `SampleRatio
= 0` → every span is silently dropped, despite the documented default of
1.
## Change
Seed `OpenTelemetry: OpenTelemetryConfig{SampleRatio: 1.0}` in
`node.DefaultConfig` so the documented default matches runtime behavior
and the `ctx.IsSet` guard in `setOpenTelemetry` continues to do what it
was designed to do.
This PR extends the journal to track the pre-transaction values of
mutated balances, nonces, and code.
At the end of the transaction, these values are used to filter out no-op
changes, such as balance transitions from a-> b->a. These changes are
excluded from the block-level access list.
Additionally, there is a dedicated `bal.ConstructionBlockAccessList`
objects for gathering the state reads and writes within the current
transaction. These state writes will be keyed by the block accessList
index.
---------
Co-authored-by: jwasinger <j-wasinger@hotmail.com>
This PR introduces OnGasChangeV2 tracing hook, as the pre-requisite for landing
EIP-8037.
---------
Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
Passing `--v2=false` currently still selects the v2 binding generator
because the command checks whether the flag was set.
This switches generation to use the boolean flag value, so explicit
false continues to generate legacy bindings while `--v2` keeps selecting
v2.
This PR introduces a separate transaction pool type for sparse blobpool.
In sparse blobpool, PooledTransactions message delivers transactions without
blobs, partial or full cells are downloaded by Cells message. Blobpool no longer
stores transactions with complete sidecars, and it stores transactions without
blobs, along with the corresponding cells. Because of this, a dedicated type
distinct from types.Transaction is required.
This PR introduces a type called `BlobTxForPool` and stores each sidecar field
independently, in order to bypass the assumption that a sidecar always exists as
a complete unit.
Reintroducing the conversion queue was considered, but was ultimately omitted
because type conversion should be sufficiently fast. With sparse blobpool, blob
-> cell computation would take about ~13ms per blob. Not sure whether this is
fast enough, but otherwise we can add the conversion queue later on the sparse
blobpool branch.
In b2843a11d, metrics check len(res) == len(hashes) but res is
pre-allocated with make(), so length is always equal. Partial hit metric
never fires. Count non-nil elements instead.
---------
Co-authored-by: Bosul Mun <bsbs8645@snu.ac.kr>
This is a refactoring PR to wrap all pre/post-execution system calls as
the exported functions, eliminating the duplicated system calls across
the codebase.
There are a few things unchanged but worths highlight:
- ChainMaker is left as unchanged, a significant rewrite is required
- BeaconRoot in header should be non-nil if Cancun is enabled
---------
Co-authored-by: jwasinger <j-wasinger@hotmail.com>
In the --create path, execFunc returns gasLeft as the second return
value, but the rest of the code treats this value as "gas used" (printed
as such, and compared in timedExec). This makes gas reporting incorrect
and can cause benchmark consistency checks to fail.
Every tracer that implements Stop/GetResult held a `reason error` field
that is written by Stop (called from the trace-timeout watchdog
goroutine in api.go) and read by GetResult (called by the RPC handler
main goroutine). These accesses were unsynchronized.
Passing `--dev=false` currently still enters the dev-mode startup path
because a couple of branches check whether the flag was set, not its
boolean value.
This switches those branches to use `ctx.Bool`, so explicit false does
not start dev mode or emit a dev genesis, while `--dev` keeps its
existing behavior.
Removes the appveyor.yml since we moved to github runners.
---------
Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
I was tracing a signature verification issue in a nocgo build and found
that `VerifySignature` doesn't validate hash length. #33104 added the
check to `Sign` and `sigToPub` but missed this one. The cgo path in
`secp256k1/secp256.go` already rejects non-32-byte hashes, so the nocgo
path should do the same — otherwise a wrong-length hash gets passed to
decred's `Verify` and silently gives a bogus result.
Passing `--graphql=false` currently still registers the GraphQL handler
because the startup path checks whether the flag was set, not its
boolean value.
This switches the registration condition to use `ctx.Bool`, so explicit
false disables GraphQL while the default behavior remains unchanged.
`GetBlobs` returned early when `CellProofsAt` reported
corrupted/out-of-bounds proofs, dropping every blob already collected
and aborting the remaining hashes — a single bad sidecar killed the
whole Engine API batch for consensus clients. Replaced the `return nil,
nil, nil, err` with `log.Error + continue` so the slot stays `nil` per
the sparse-array contract, matching the store/RLP/nil-sidecar branches a
few lines above.
Fixes#34881
This fixes a hang in `Table.waitForNodes`. It is a replacement for PRs
#34890, #33665 which tried to fix the same issue in a different way.
- #34890 doesn't really fix the issue, just makes it less likely
- #33665 tries to fix it by moving the feed send outside of the lock
I created this PR because I want to keep the synchronous node feed
sending in `Table.nodeAdded`.
---------
Co-authored-by: Csaba Kiraly <csaba.kiraly@gmail.com>
This fixes an issue where packets send to the `Unhandled` channel
configured on discv4 could be corrupted when the packet buffer gets
reused.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This PR makes a small update to the `Pending()` method in the legacy
pool. By changing the lock from exclusive to read-only, it aims to
improve concurrency performance.
---------
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
This is an alternative PR for
https://github.com/ethereum/go-ethereum/pull/34746.
This PR implements the second approach among the two possible solutions
mentioned in the above PR.
Requests for unavailable items are possible when the peer is following a
different fork from us. However this is not expected to happen
frequently. Considering the amount of complexity added to the codebase,
the simpler approach (this PR) can be preferred.
This PR adds `AdoptSyncedState()` alongside `Enable()`. It does the same
pathdb bookkeeping (now factored into a shared `resetForReactivation()`
helper), but skips the regeneration. The wiring/calling code lands in
#34626
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
When `io.Copy` succeeds but the buffered `Close` fails (e.g. disk full
on `Flush`), the error was swallowed and verification reported a
misleading hash mismatch instead of the real I/O failure. Keep the
`Close` error when `io.Copy` didn't already produce one.
---------
Co-authored-by: Jared Wasinger <j-wasinger@hotmail.com>
## Summary
Fixes#31917.
`geth era-download` now only prints `is stale` when an existing
downloaded file fails checksum verification. Missing files are still
downloaded normally, but no longer get mislabeled as stale.
## Why
`DownloadFile` used `verifyHash` for both missing files and checksum
mismatches, then printed `is stale` for any error. This made first-time
downloads look like corrupt or outdated files.
## Validation
- `make all`
- `go run ./build/ci.go test`
- `go run ./build/ci.go lint`
- `go run ./build/ci.go check_generate`
- `go run ./build/ci.go check_baddeps`
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
The refactor from `for el := plist.Front(); ...; el = el.Next()` to the
new `iterList` iterator in #34743 silently dropped two things needed by
resetTimeout:
1. `nextTimeout = el.Value.(*replyMatcher)` at the top of the loop. This
assignment is what gives `nextTimeout` its documented meaning ("head of
plist when timeout was last reset"), and what makes the early-return
optimization at the top of resetTimeout work. Without it, nextTimeout is
only ever written to nil, so `nextTimeout == plist.Front().Value` is
always false and the optimization is dead.
2. `nextTimeout.errc <- errClockWarp` in the clock-warp branch now reads
a stale or nil pointer. Prior to the refactor, the inner assignment kept
nextTimeout pointing at the current matcher so its errc was the right
channel to receive the errClockWarp signal. After the refactor, on first
entry into the clock-warp branch nextTimeout is nil, which panics the
UDPv4 loop goroutine with a nil pointer deref and takes discv4 down.
Re-assign `nextTimeout = p` at the head of the loop (restoring the
documented invariant) and send the clock-warp error on `p.errc` rather
than the now-stale `nextTimeout.errc`.
The clock-warp branch triggers only when the system clock jumps backward
after a deadline is assigned (deadline - time.Now() >= 2*respTimeout,
i.e. at least ~500ms backward jump), which is why this regression
slipped past CI - it is not exercised by any existing unit test, and
writing one would require plumbing a clock through the loop.
The reconstruct callback indexes parallel response slices (bodies,
receipts). Passing the accept counter used the wrong element when an
earlier header in the same batch hit a stale slot.
the gapped queue cap was effectively per-sender rather than total — a
sender pool spread across enough distinct addresses could grow
`p.gapped` well past `maxGapped`, defeating the resource bound.
`maxGapped` was being compared against `len(p.gapped)`, which is a
`map[address][]tx` and counts unique senders, not queued txs. Switched
the check to `len(p.gappedSource)` (keyed by tx hash, so its length is
the real total). Also wired up a `blobpool/gapped/count` gauge plus
`promoted`, `evicted`, and `gappedfull` meters so queue size and churn
are actually observable in prod.
Disables the recently added log indexer from a simulated backend.
In most cases the log indexer is not required and unindexed search
should be fast enough.
Fixes https://github.com/ethereum/go-ethereum/issues/32552.
This updates the typed `ethclient` model for `eth_simulateV1` call
results to include `maxUsedGas`, matching the field already returned by
the server-side RPC response.
Follow-up to #32789.
The mux tracer fanned out every standard hook to its children but never
forwarded OnSystemCall{Start,End}. Tracers that rely on these - like
`logger.jsonLogger`, which uses the start hook to silence its opcode
hook for the duration of a system call - never got the signal when
wrapped behind a mux.
In evm t8n, combining `--trace` with `--opcode-count` (default for geth
with exec specs) produces exactly that wrapping. The first system call
(e.g. `ProcessBeaconBlockRoot`) then fires `OnOpcode` on the json logger
before any `OnTxStart` has run, dereferencing a nil env and crashing
t8n.
Forward both hooks through the mux. The V2 fan-out falls back to V1 for
children that only implement the legacy hook, mirroring the precedence
already used in `core/state_processor.go`.
This PR addresses one of the biggest performance issue with binary
tries: storing each internal node individually bloats the index, the
disk, and triggers a lot of write amplifications. To fix this issue,
this PR serializes groups of nodes together.
Because we are still looking for the ideal group size, the "depth" of
the group tree is made a parameter, but that will be removed in the
future, once the perfect size is known.
This is a rebase of #33658
---------
Co-authored-by: Copilot <copilot@github.com>
Next() function in RawIterator returned true on decompression errors.Now it
returns false on those cases. Redundant error check on cmd/era/main.go is also
removed.
---------
Co-authored-by: Bosul Mun <bsbs8645@snu.ac.kr>
The layer-5 diff condition used `i > 50 || i < 85`, which is true for
almost all keys in the 0..255 loop. Use `i > 50 && i < 85` so layer 5
only covers the intended band (51..84), consistent with the snapshot
iterator test fix.
- Fixes an error shadowing issue in the deliver() function, where a
stale result from GetDeliverySlot caused the original failure to be
overwritten by errStaleDelivery.
- Adds errInvalidBody and errInvalidReceipt to the downloader error
checks to properly drop peers who sent invalid responses.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
EIP-7825 caps the transaction gas limit at `MaxTxGas`, but after
Amsterdam/EIP-8037 the transaction gas limit can include state gas
reservoir in addition to the regular gas dimension. Applying the Osaka
cap to the full `tx.Gas()` rejects otherwise valid Amsterdam
transactions that need more than `MaxTxGas` total gas because of state
gas, while their regular gas use remains within the intended limit.
This changes geth to stop applying the full transaction gas cap once
Amsterdam is active:
- txpool stateless validation no longer rejects `tx.Gas() > MaxTxGas`
under Amsterdam
- legacy pool reorg cleanup does not purge high-total-gas transactions
at the Osaka transition if Amsterdam is also active
- execution precheck mirrors the txpool behavior and does not reject
high-total-gas messages under Amsterdam
The block gas limit check remains in place, so transactions still cannot
request more total gas than the current block gas limit.
Validation run:
```
go test ./core/txpool ./core/txpool/legacypool
go test ./core -run TestStateProcessorErrors
```
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Here, we change the EVM stack implementation to use an 'arena', i.e.
a shared allocation pool for sub-call stacks. The stack is now more
GC-friendly, since it is a slice of uint256 values instead of a slice of pointers.
Code that pushes an item to the stack has been changed to get() the top
item, then overwrite it.
The PR is a rewrite/rebase of #30362.
---------
Co-authored-by: Martin Holst Swende <martin@swende.se>
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Save `el.Next()` before calling `plist.Remove(el)` so iteration
continues correctly. Previously the loop exited after removing the first
expired matcher because `Remove` invalidates the element's links.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
`StateSetWithOrigin.decode()` was missing size computation after
deserializing origin data, causing `size` to remain zero after journal
reload. Added the same calculation logic used in
`NewStateSetWithOrigin()`.
This PR updates the BAL structure definition to the latest the spec,
- Balance has been changed from [16]byte to uint256
- Storage key and value has been changed from [32]byte to uint256
- BlockAccessList has been changed from a struct to a slice of
AccountChanges
- TxIndex has been changed from uint16 to uint32
The stateReadList field introduced by #34776 to track the state access
footprint for EIP-7928 was not propagated by StateDB.Copy. Every other
per-transaction field that lives alongside it (accessList,
transientStorage, journal, witness, accessEvents) is copied explicitly,
so this field was simply missed.
After Copy the copy's stateReadList is nil while the original keeps its
entries, so the nil-safe guards on StateAccessList.AddAccount / AddState
silently drop every access recorded on the copy. For any post-Amsterdam
code path that copies a prepared state and keeps reading from the copy,
the BAL footprint becomes incomplete.
Add a Copy method on bal.StateAccessList and invoke it from
StateDB.Copy, matching the pattern used for accessList and accessEvents.
---------
Co-authored-by: jwasinger <j-wasinger@hotmail.com>
The testPeer request counters (nAccountRequests, nStorageRequests,
nBytecodeRequests, nTrienodeRequests) were plain int fields incremented
with ++. These increments happen in Request* methods that are invoked
concurrently by the Syncer from multiple goroutines
(assignBytecodeTasks, assignStorageTasks, etc.), causing a data race
reliably detected by go test -race.
Change the counters to atomic.Int64 so increments and reads are
synchronized without introducing a mutex.
Fixes races detected in TestMultiSyncManyUseless,
TestMultiSyncManyUselessWithLowTimeout,
TestMultiSyncManyUnresponsive, TestSyncWithStorageAndOneCappedPeer,
TestSyncWithStorageAndCorruptPeer, and
TestSyncWithStorageAndNonProvingPeer.
scheduleFetches.func1 is the biggest allocator in the long-duration
profile of node (11% of total alloc_space).
Each peer-iteration pre-allocated make([]common.Hash, 0, maxTxRetrievals),
even for peers that end up collecting no new hashes (all their announces
were already being fetched by someone else).
Defer the slice allocation to the first append. Peers that collect zero hashes
now pay zero allocation, which is the common case on the timeoutTrigger
path where all peers with any announces are iterated.
When `rpc.Client.Close()` is called, the TCP connection is torn down
without sending a WebSocket Close frame. The server sees `websocket:
close 1006 (abnormal closure): unexpected EOF` instead of a clean 1000
(normal closure).
### Root cause
`websocketCodec.close()` delegates to `jsonCodec.close()` which calls
`c.conn.Close()` — gorilla/websocket's `Conn.Close` explicitly "[closes
the underlying network connection without sending or waiting for a close
message](https://pkg.go.dev/github.com/gorilla/websocket#Conn.Close)"
(per RFC 6455).
### Fix
Send a WebSocket Close control frame (opcode 0x8, status 1000) before
closing the underlying connection. Uses `WriteControl` with the same
`encMu` mutex pattern already used by `pingLoop` for write
serialization, and reuses the existing `wsPingWriteTimeout` (5s)
constant.
`WriteControl` errors are safe to ignore — the connection may already be
broken by the time we attempt the close frame.
Fixes#30482
This PR adds three cell-level kzg functions required for the sparse
blobpool (eth/72).
- VerifyCells: Verifies cells corresponding to proofs. This is used to
verify cells received from eth/72 peers.
- ComputeCells: Computes cells from blobs. This is needed because user
submissions and eth/71 transaction deliveries contain blobs, while
eth/72 peers expect cells.
- RecoverBlobs: Recovers blobs from partial cells. This is needed to
support both eth/71 and eth/72
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
scheduleFetches.func1 is the single biggest allocator in the Pyroscope
profile of a busy node (~13.5 GB/hr, 8% of total alloc_space). Each
peer-iteration pre-allocated 'make([]common.Hash, 0, maxTxRetrievals)'
= 8 KB, even for peers that end up collecting no new hashes (all their
announces were already being fetched by someone else).
Defer the slice allocation to the first append. Peers that collect zero
hashes now pay zero allocation, which is the common case on the
timeoutTrigger path where all peers with any announces are iterated.
New benchmarks BenchmarkScheduleFetches_{100peers_10new,
100peers_allFetching, 500peers_3new} (benchstat, 6 samples):
scenario ns/op B/op allocs/op
100p/10new unchanged unchanged unchanged (fast path)
100p/allFetching -62% -92% -20%
500p/3new -22% -44% -7%
geomean -33% -65% -9%
The rlpx ping command mishandled disconnect responses on two counts:
the error return from rlp.DecodeBytes was ignored, so decode failures
silently produced an "invalid disconnect message" error with no context;
and the decoder assumed the spec-compliant list form exclusively, while
older geth and some other implementations send the reason as a bare
byte.
Accept both wire forms (matching the legacy-tolerant behavior already
in p2p.decodeDisconnectMessage), and on decode failure include the raw
payload so operators can see exactly what the peer sent. Add a unit
test for the decoder covering both forms plus the empty-payload error
path.
This PR reverts the last change to the freebsd build, and it fixes the
_direct_ FreeBSD build.
Here, we change the upstream of github.com/karalabe/hid to its new home,
github.com/ethereum/hid. The new dependency includes a dummy.go file
that makes `go mod vendor` work.
##### Origin of the problem
Enrique is maintaining the FreeBSD ports, and FreeBSD ports only support
vendored go modules. It turns out that `go mod vendor` will not include
C files if there is no `.go` file in the directory. Since the C files
were missing for `karalabe/hid`, the ports maintainer tried to use the
version of `hidapi` that is provided by the ports. To do so, he had to
modify the way things are included. This broke the _out of ports_
FreeBSD build.
Difference to Appveyor:
- Missing 386 build. Hit some issue because user-space memory there is
around 2Gbs. Also seems generally extremely niche.
- Not doing the archive step and NSIS installer and uploads (those are
done on the builder).
This PR removes `FinalizeAndAssemble` from the consensus engine
interface
and relocates block assembly logic outside of the consensus engine.
Block assembly is consensus-agnostic. Most validations can be performed
by the caller. For example:
- Withdrawals must be nil prior to Shanghai
- After Shanghai upgrade, withdrawals must be non-nil, even if empty.
The only notable consensus-specific validation is related to uncles. In
clique,
the concept of uncles does not exist, and any block containing uncles
should
be considered invalid.
Within the block production package, the policy is to produce blocks
according
to the latest chain specification. As a result, Clique-specific block
production
is no longer supported. This tradeoff is considered acceptable.
The nodes were named using the byte representation of the path, instead
of the binary representation. This was confusing to other client devs
trying to achieve interop.
## Summary
- Add `grpc://` and `grpcs://` URL scheme support for OTLP trace export
alongside existing `http://`/`https://`
- The OTLP spec defines two transports: HTTP (port 4318) and gRPC (port
4317). Many observability backends (Jaeger, Tempo, Datadog) prefer gRPC
for lower overhead
- Both `otlptracehttp` and `otlptracegrpc` return `*otlptrace.Exporter`,
so only exporter construction changes — everything downstream (batch
processor, tracer provider, lifecycle) is untouched
- Update flag usage strings to be transport-agnostic
## Example usage
```
geth --rpc.telemetry --rpc.telemetry.endpoint grpc://localhost:4317
geth --rpc.telemetry --rpc.telemetry.endpoint grpcs://tempo-grpc.example.com:443
```
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
clarify that `ReadLastPivotNumber` returns `nil` only when snap sync has
never been attempted, since the marker is written during snap sync and
never cleared.
In the recent refactoring, the state commit logic has been abstracted,
making it more flexible to design state databases for various use cases.
For example, execution-only modes where state mutation is disabled.
As part of this change, the database interface was extended with a
Commit function. However, it currently accepts an unexported struct
`stateUpdate`, which prevents downstream projects from customizing
the state commit behavior.
To address this limitation, the stateUpdate type is now exported.
This PR separates the trie reader to mptTrieReader and ubtTrieReader for
improved readability and extensibility.
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
## Summary
Replace the `BinaryNode` interface with `NodeRef uint32` indices into
typed arena pools, eliminating GC-scanned pointers from binary trie
nodes.
Inspired by [fjl's
observation](https://github.com/ethereum/go-ethereum/pull/34034#issuecomment-4075176446):
> *"if the binary trie produces such a large graph, it should probably
be changed so that the trie node type does not contain pointers. The
runtime does not scan objects that do not contain pointers, so it can
really help with the performance to build it this way."*
### The problem
CPU profiling of the binary trie (EIP-7864) showed **44% of CPU time in
garbage collection**. Each `InternalNode` held two `BinaryNode`
interface values (2 pointer-words each), and the GC scanned every one.
With ~25K `InternalNode`s in memory during block processing, this
created enormous GC pressure.
### The solution
`NodeRef` is a compact `uint32` (2-bit kind tag + 30-bit pool index).
`NodeStore` manages chunked typed pools per node kind:
- **InternalNode pool**: ZERO Go pointers (children are `NodeRef`, hash
is `[32]byte`) → noscan spans
- **HashedNode pool**: ZERO Go pointers → noscan spans
- **StemNode pool**: retains `Values [][]byte` (matching existing
format)
The serialization format is unchanged — flat InternalNode
`[type][leftHash][rightHash]` = 65 bytes.
## Benchmark: Apple M4 Pro (`--benchtime=10s --count=3`, on top of
#34021)
| Metric | Baseline | Arena | Delta |
|--------|----------|-------|-------|
| Approve (Mgas/s) | 374 | 382 | **+2.1%** |
| BalanceOf (Mgas/s) | 885 | 901 | **+1.8%** |
| Approve allocs/op | 775K | **607K** | **-21.7%** |
| BalanceOf allocs/op | 265K | **228K** | **-14.0%** |
## Benchmark: AMD EPYC 48-core (50GB state, execution-specs ERC-20, on
top of #34021 + #34032)
| Benchmark | Baseline | Arena | Delta |
|-----------|----------|-------|-------|
| erc20_approve (write) | 22.4 Mgas/s | **27.0 Mgas/s** | **+20.5%** |
| mixed_sload_sstore | 62.9 Mgas/s | **97.3 Mgas/s** | **+54.7%** |
| erc20_balanceof (read) | 180.8 Mgas/s | 167.6 Mgas/s | -7.3% (cold
cache variance) |
The arena benefit scales with heap size — the EPYC (larger heap, more GC
pressure) shows much larger gains than the M4 Pro (efficient unified
memory). The mixed workload baseline was unstable (62.9 vs 16.3 Mgas/s
between runs due to GC-induced throughput collapse); the arena
eliminates this entirely (95-97 Mgas/s, stable).
## Dependencies
Benchmarked with #34021 (H01 N+1 fix) + #34032 (R14 parallel hashing).
No code dependency — applies independently to master.
All test suites pass (`trie/bintrie` with `-race`, `core/state`,
`triedb/pathdb`, `cmd/geth`).
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
This PR introduces a gasBudget struct to track the available gas for EVM
execution.
With the upcoming EIP-8037, multi-dimensional gas accounting will be
introduced, requiring multiple gas budget counters to be tracked
simultaneously. To support this, the counters are grouped into a gasBudget
structure.
This change is a prerequisite for internal refactoring in preparation
for EIP-8037.
---------
Co-authored-by: MariusVanDerWijden <m.vanderwijden@live.de>
In openFreezerFileForAppend, if Seek fails after the file is
successfully opened, the file handle is not closed, leaking a
descriptor.
Similarly in newTable, if opening the meta file fails, the
already-opened index file is not closed. And if newMetadata fails, both
the index and meta files are leaked.
Under repeated error conditions (e.g., corrupted filesystem), these
leaks accumulate and may exhaust the OS file descriptor limit, causing
cascading failures.
## Problem
`mustCopyTrie` in `core/state/database.go` panics on any trie type not
in its type switch:
```go
func mustCopyTrie(t Trie) Trie {
switch t := t.(type) {
case *trie.StateTrie:
return t.Copy()
case *transitiontrie.TransitionTrie:
return t.Copy()
default:
panic(fmt.Errorf("unknown trie type %T", t))
}
}
```
On UBT-backed databases (`state.NewUBTDatabase(...)`, used by
`blockchain.go:2124` when the triedb is configured for binary trie),
`StateDB.trie` is `*bintrie.BinaryTrie` — so every `StateDB.Copy()` call
(hit from `statedb.go:699` and the `*trie.StateTrie` branch of
`state_object.go:546`) crashes with `unknown trie type
*bintrie.BinaryTrie`.
## Fix
Add the `*bintrie.BinaryTrie` case. `BinaryTrie.Copy()` already exists
at `trie/bintrie/trie.go:372` and produces a correct deep copy — this
just wires it into the switch.
## Problem
`BinaryTrie.Commit` unconditionally walked every resolved in-memory node
and flushed it into the `NodeSet`, producing one Pebble write per
resolved internal + stem node on every block — even when the node's
on-disk blob was bitwise identical to the previous commit. On a warm
400M-state workload this meant tens of thousands of redundant 65-byte
writes per block, compounding Pebble compaction pressure on every
commit.
The existing `mustRecompute` flag tracks *hash* staleness, not
*disk-blob* staleness: after `Hash()` completes, `mustRecompute` is
cleared even though the fresh blob has not been persisted. It is
therefore insufficient for a skip-flush optimization.
## Fix
Mirror the MPT committer pattern (`trie/committer.go:51-56`) by adding a
`dirty` flag on `InternalNode` and `StemNode` with the semantics *the
on-disk blob is stale*. The flag is:
- set to `true` wherever the node is created or structurally modified
(the same call sites that already set `mustRecompute = true`);
- set to `false` only after the node has been passed to the `flushfn`
inside `CollectNodes`;
- left `false` on nodes produced by `DeserializeNodeWithHash`, matching
the *loaded from disk, already persisted* semantics.
`CollectNodes` short-circuits on `!dirty` subtrees. The propagation
invariant (an ancestor of any dirty node is itself dirty) is already
maintained by the existing `InsertValuesAtStem` / `Insert` paths, which
now mirror every `mustRecompute = true` setter with a `dirty = true`
setter.
## Benchmark
New `BenchmarkCollectNodes_SparseWrite` measures commit cost when only
one leaf changes between blocks — the common case for state updates.
10,000-stem trie, one-leaf modification + Commit per iteration, Apple M4
Pro:
| | before | after | delta |
|---|---|---|---|
| time / op | 12,653,000 ns | 7,336 ns | **~1,725×** |
| bytes / op | 107,224,740 B | 37,774 B | **~2,839×** |
| allocs / op | 80,953 | 134 | **~604×** |
End-to-end impact on a real workload depends on the
resolved-footprint-to-dirty-path ratio; the new
`TestBinaryTrieCommitIncremental` provides a structural regression guard
(asserts that a Commit following a single-leaf modification flushes a
root-to-leaf path, not the whole tree).
---
Found all of this stuff while bloating my #34706 DB to make some
benchmarks. And saw we were spending A LOT OF TIME on hashing.
Hope this helps the perf a bit. Will rebase the flat-state PR on top of
this once merged.
`timedExec` compares errors by direct interface inequality (haveErr !=
err). If execFunc returns newly constructed errors with the same message
each run, this will panic even though behavior is equivalent.
Adds a 'code' exporter to 'geth db export' that iterates over all
contract bytecode entries (CodePrefix + code_hash -> bytecode).
Usage: geth --datadir <dir> db export code code.rlp
This enables exporting contract bytecode.
This Pr implements some prerequisite changes for #34004 : split the
`CachingDB` into a `MerkleDB` and a `UBTDB`, so that very different
behaviors don't clash as much.
The transition isn't handled by this PR, but after talking to Gary we
agreed that `UBTDB` should receive another `triedb`, which will only be
loaded if the `Ended` flag is set to false in the conversion contract.
If this is too hard to achieve, it makes sense to load it regardless,
and then loading can be prevented at a later stage by adding a
`UBTTransitionFinalizationTime` in `ChainConfig`.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
`trace.noreturndata` is documented as "enable return data output" but
the flag name/value imply it disables return data. This is confusing for
users and likely inverted wording. Update the Usage string to reflect
the actual behavior (disable return data output).
Changes the log handler to check for vmodule level overrides
even for messages above the current level. This enables the user to selectively
hide messages from certain packages, among other things.
Also fixes a bug where handler instances created by WithAttr would not follow
the level setting anymore. The WithAttrs method is calledd by slog.Logger.With,
which we also use in go-ethereum to create context specific loggers with
pre-filled attributes. Under the previous implementation of WithAttrs, if the
application created a long-lived logger (for example, for a specific peer), then
that logger would not be affected by later level changes done on the top-level
logger, leading to potentially missed events.
Closes: #30717
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Co-authored-by: Felix Lange <fjl@twurst.com>
Auto-enable logic for `StatelessSelfValidation` was reading CLI flag
directly via `ctx.Bool()`, bypassing the merged `cfg.EnableWitnessStats`
value. Now uses `cfg.EnableWitnessStats` so config file settings trigger
the same auto-enable behavior as CLI flags.
StateDB.Commit first commits all storage changes into the storage trie,
then updates the account metadata with the new storage root into the
account trie.
Within StateDB.Commit, the new storage trie root has already been
computed and applied as the storage root. This PR explicitly skips the
redundant storage trie root assignment for readability.
This is a copy of #34721 but against `master` (rather than
`bal-devnet-3`), as requested by @jwasinger, since the slotnum logic now
exists on `master` as well.
This PR simplifies the implementation of EIP-7610 by eliminating the
need to check storage emptiness during contract deployment.
EIP-7610 specifies that contract creation must be rejected if the
destination account has a non-zero nonce, non-empty runtime code, or
**non-empty storage**.
After EIP-161, all newly deployed contracts are initialized with a nonce
of one. As a result, such accounts are no longer eligible as deployment
targets unless they are explicitly cleared.
However, prior to EIP-161, contracts were initialized with a nonce of
zero. This made it possible to end up with accounts that have:
- zero nonce
- empty runtime code
- non-empty storage (created during constructor execution)
- non-zero balance
These edge-case accounts complicate the storage emptiness check.
In practice, contract addresses are derived using one of the following
formulas:
- `Keccak256(rlp({sender, nonce}))[12:]`
- `Keccak256([]byte{0xff}, sender, salt[:], initHash)[12:]`
As such, an existing address is not selected as a deployment target
unless a collision occurs, which is extremely unlikely.
---
Previously, verifying storage emptiness relied on GetStorageRoot.
However, with the transition to the block-based access list (BAL),
the storage root is no longer available, as computing it would require
reconstructing the full storage trie from all mutations of preceding
transactions.
To address this, this PR introduces a simplified approach: it hardcodes
the set of known accounts that have zero nonce, empty runtime code,
but non-empty storage and non-zero balance. During contract deployment,
if the destination address belongs to this set, the deployment is
rejected.
This check is applied retroactively back to genesis. Since no address
collision events have occurred in Ethereum’s history, this change does
not
alter existing behavior. Instead, it serves as a safeguard for future
state
transitions.
This fixes the remaining Hive discv5/FindnodeResults failures in the
cmd/devp2p/internal/v5test fixture.
The issue was in the simulator-side bystander behavior, not in
production discovery logic. The existing fixture could get bystanders
inserted into the remote table, but under current geth behavior they
were not stable enough to remain valid FINDNODE results. In
particular, the fixture still had a few protocol/behavior mismatches:
- incomplete WHOAREYOU recovery
- replies not consistently following the UDP envelope source
- incorrect endpoint echoing in PONG
- fixture-originated PING using the wrong ENR sequence
- bystanders answering background FINDNODE with empty NODES
That last point was important because current lookup accounting can
treat repeatedly unhelpful FINDNODE interactions as failures. As a
result, a bystander could become live via PING/PONG and still later be
dropped from the table before the final FindnodeResults assertion.
This change updates the fixture so that bystanders behave more like
stable discv5 peers:
- perform one explicit initial handshake, then switch to passive response handling
- resend the exact challenged packet when handling WHOAREYOU
- reply to the actual UDP packet source and mirror that source in PONG.ToIP / PONG.ToPort
- use the bystander’s own ENR sequence in fixture-originated PING
- prefill each bystander with the bystander ENR set and answer FINDNODE from that set
The result is that the fixture now forms a small self-consistent lookup
environment instead of a set of peers that are live but systematically
poor lookup participants.
Fixes#34108
The UDPv5 test harness (`newUDPV5Test`) uses the default `PingInterval`
of 3 seconds. When tests like `TestUDPv5_findnodeHandling` insert nodes
into the routing table via `fillTable`, the table's revalidation loop
may schedule PING packets for those nodes. Under the race detector or on
slow CI runners, the test runs long enough for revalidation to fire,
causing background pings to be written to the test pipe. The `close()`
method then finds these as unmatched packets and fails.
The fix sets `PingInterval` to a very large value in the test harness so
revalidation never fires during tests.
Verified locally: 100 iterations with `-race -count=100` pass reliably,
where previously the test would fail within ~50 iterations.
runtime.setDefaults was unconditionally assigning cfg.Random =
&common.Hash{}, which silently overwrote any caller-provided Random
value. This made it impossible to simulate a specific PREVRANDAO and
also forced post-merge rules whenever London was active, regardless of
the intended environment.
This change only initializes cfg.Random when it is nil, matching how
other fields in Config are defaulted. Existing callers that did not set
Random keep the same behavior (a non-nil zero hash still enables
post-merge semantics), while callers that explicitly set Random now get
their value respected.
This fixes a truncation bug that results in an invalid serialization of
empty EIP712.
For example:
```json
{
"method": "eth_signTypedData_v4",
"request": {
"types": {
"EIP712Domain": [
{
"name": "version",
"type": "string"
}
],
"Empty": []
},
"primaryType": "Empty",
"domain": {
"version": "0"
},
"message": {}
}
}
```
When calculating the type-hash for the stuct-hash, it will incorrectly
use `Empty)` instead of `Empty()`
# Summary
Replaces the inline `errors.New("event signature mismatch")` in
generated `UnpackXxxEvent` methods with per-event package-level sentinel
errors (e.g. `ErrTransferSignatureMismatch`,
`ErrApprovalSignatureMismatch`), allowing callers to reliably
distinguish a topic mismatch from a genuine decoding failure via
`errors.Is`.
Each event gets its own sentinel, generated via the abigen template:
```go
var ErrTransferSignatureMismatch = errors.New("event signature mismatch")
```
This scoping is intentional — it allows callers to be precise about
*which* event was mismatched, which is useful when routing logs across
multiple unpackers.
# Motivation
Previously, all errors returned from `UnpackXxxEvent` were
indistinguishable without string matching. This is especially
problematic when processing logs sourced from `eth_getBlockReceipts`,
where a caller receives the full set of logs for a block across all
contracts and event types. In that context, a signature mismatch is
expected and should be skipped, while any other error (malformed data,
topic parsing failure) indicates something is genuinely wrong and should
halt execution:
```go
for _, log := range blockLogs {
event, err := contract.UnpackTransferEvent(log)
if errors.Is(err, gen.ErrTransferSignatureMismatch) {
continue // not our event, expected
}
if err != nil {
return fmt.Errorf("unexpected decode failure: %w", err) // alert
}
// process event
}
```
**Changes:**
- `abigen` template: generates a `ErrXxxSignatureMismatch` sentinel per
event and returns it on topic mismatch instead of an inline error
- Existing generated bindings & testdata: regenerated to reflect the
update
Implements #34075
Return ErrInvalidOpCode with the executing opcode and offending
immediate for forbidden DUPN, SWAPN, and EXCHANGE operands. Extend
TestEIP8024_Execution to assert both opcode and operand for all
invalid-immediate paths.
TestUpdatedKeyfileContents was intermittently failing with:
- Emptying account file failed
- wasn't notified of new accounts
Root cause: waitForAccounts required the account list match and an
immediately readable ks.changes notification in the same instant,
creating a timing race between cache update visibility and channel
delivery.
This change keeps the same timeout window but waits until both
conditions are observed, which preserves test intent while removing the
flaky timing dependency.
Validation:
- go test ./accounts/keystore -run '^TestUpdatedKeyfileContents$'
-count=100
The comment formula showed (i+3) but the code multiplies by 9 (Lsh 3 +
add = 8+1).
This was a error when porting from upstream golang.org/x/crypto/bn256
where ξ=i+3.
Go-ethereum changed the constant to ξ=i+9 but forgot to update the inner
formula.
Two fixes for `testing_buildBlockV1`:
1. Add `omitempty` to `SlotNumber` in `ExecutableData` so it is omitted
for pre-Amsterdam payloads. The spec defines the response as
`ExecutionPayloadV3` which does not include `slotNumber`.
2. Pass `res.fees` instead of `new(big.Int)` in `BuildTestingPayload` so
`blockValue` reflects actual priority fees instead of always being zero.
Corresponding fixture update: ethereum/execution-apis#783
Fix `GetAccount` returning **wrong account data** for non-existent
addresses when the trie root is a `StemNode` (single-account trie) — the
`StemNode` branch returned `r.Values` without verifying the queried
address's stem matches.
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
`BinaryTrie.DeleteAccount` was a no-op, silently ignoring the caller's
deletion request and leaving the old `BasicData` and `CodeHash` in the
trie.
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
This tool is designed for the offline translation of an MPT database to
a binary trie. This is to be used for users who e.g. want to prove
equivalence of a binary tree chain shadowing the MPT chain.
It adds a `bintrie` command, cleanly separating the concerns.
This PR fixes https://github.com/ethereum/go-ethereum/issues/34623 by
changing the `vm.StateDB` interface:
Instead of `EmitLogsForBurnAccounts()` emitting burn logs, `LogsForBurnAccounts()
[]*types.Log` just returns these logs which are then emitted by the caller.
This way when tracing is used, `hookedStateDB.AddLog` will be used
automatically and there is no need to duplicate either the burn log
logic or the `OnLog` tracing hook.
ProcessBeaconBlockRoot (EIP-4788) and processRequestsSystemCall
(EIP-7002/7251) do not merge the EVM access events into the state after
execution. ProcessParentBlockHash (EIP-2935) already does this correctly
at line 290-291.
Without this merge, the Verkle witness will be missing the storage
accesses from the beacon root and request system calls, leading to
incomplete witnesses and potential consensus issues when Verkle
activates.
PathDB keys diff layers by state root, not by block hash. That means a
side-chain block can legitimately collide with an existing canonical diff layer
when both blocks produce the same post-state (for example same parent,
same coinbase, no txs).
Today `layerTree.add` blindly inserts that second layer. If the root
already exists, this overwrites `tree.layers[root]` and appends the same
root to the mutation lookup again. Later account/storage lookups resolve
that root to the wrong diff layer, which can corrupt reads for descendant
canonical states.
At runtime, the corruption is silent: no error is logged and no invariant check
fires. State reads against affected descendants simply return stale data
from the wrong diff layer (for example, an account balance that reflects one
fewer block reward), which can propagate into RPC responses and block
validation.
This change makes duplicate-root inserts idempotent. A second layer with
the same state root does not add any new retrievable state to a tree that is
already keyed by root; keeping the original layer preserves the existing parent
chain and avoids polluting the lookup history with duplicate roots.
The regression test imports a canonical chain of two layers followed by
a fork layer at height 1 with the same state root but a different block hash.
Before the fix, account and storage lookups at the head resolve the fork
layer instead of the canonical one. After the fix, the duplicate insert is
skipped and lookups remain correct.
This PR refactors the encoding rules for `AccessListsPacket` in the wire
protocol. Specifically:
- The response is now encoded as a list of `rlp.RawValue`
- `rlp.EmptyString` is used as a placeholder for unavailable BAL objects
This PR adds `GenerateTrie(db, scheme, root)` to the `triedb` package,
which rebuilds all tries from flat snapshot KV data. This is needed by
snap/2 sync so it can rebuild the trie after downloading the flat state.
The shared trie generation pipeline from `pathdb/verifier.go` was moved
into `triedb/internal/conversion.go` so both `GenerateTrie` and
`VerifyState` reuse the same code.
👋
This PR makes it possible to run "Amsterdam" in statetests. I'm aware
that they'll be failing and not in consensus with other clients, yet,
but it's nice to be able to run tests and see what works and what
doesn't
Before the change:
```
$ go run ./cmd/evm statetest ./amsterdam.json
[
{
"name": "00000019-mixed-1",
"pass": false,
"fork": "Amsterdam",
"error": "unexpected error: unsupported fork \"Amsterdam\""
}
]
```
After
```
$ go run ./cmd/evm statetest ./amsterdam.json
{"stateRoot": "0x25b78260b76493a783c77c513125c8b0c5d24e058b4e87130bbe06f1d8b9419e"}
[
{
"name": "00000019-mixed-1",
"pass": false,
"stateRoot": "0x25b78260b76493a783c77c513125c8b0c5d24e058b4e87130bbe06f1d8b9419e",
"fork": "Amsterdam",
"error": "post state root mismatch: got 25b78260b76493a783c77c513125c8b0c5d24e058b4e87130bbe06f1d8b9419e, want 0000000000000000000000000000000000000000000000000000000000000000"
}
]
```
In this PR, the Database interface in `core/state` has been extended
with one more function:
```go
// Iteratee returns a state iteratee associated with the specified state root,
// through which the account iterator and storage iterator can be created.
Iteratee(root common.Hash) (Iteratee, error)
```
With this additional abstraction layer, the implementation details can be hidden
behind the interface. For example, state traversal can now operate directly on
the flat state for Verkle or binary trees, which do not natively support traversal.
Moreover, state dumping will now prefer using the flat state iterator as
the primary option, offering better efficiency.
Edit: this PR also fixes a tiny issue in the state dump, marshalling the
next field in the correct way.
This PR implements the missing functionality for archive nodes by
pruning stale index data.
The current mechanism is relatively simple but sufficient for now:
it periodically iterates over index entries and deletes outdated data
on a per-block basis.
The pruning process is triggered every 90,000 new blocks (approximately
every 12 days), and the iteration typically takes ~30 minutes on a
mainnet node.
This mechanism is only applied with `gcmode=archive` enabled, having
no impact on normal full node.
This PR changes the blsync checkpoint init logic so that even if the
initialization fails with a certain server and an error log message is
printed, the server goes back to its initial state and is allowed to
retry initialization after the failure delay period. The previous logic
had an `ssDone` server state that did put the server in a permanently
unusable state once the checkpoint init failed for an apparently
permanent reason. This was not the correct behavior because different
servers behave differently in case of overload and sometimes the
response to a permanently missing item is not clearly distinguishable
from an overload response. A safer logic is to never assume anything to
be permanent and always give a chance to retry.
The failure delay formula is also fixed; now it is properly capped at
`maxFailureDelay`. The previous formula did allow the delay to grow
unlimited if a retry was attempted immediately after each delay period.
This is a breaking change in the opcode (structLog) tracer. Several fields
will have a slight formatting difference to conform to the newly established
spec at: https://github.com/ethereum/execution-apis/pull/762. The differences
include:
- `memory`: words will have the 0x prefix. Also last word of memory will be padded to 32-bytes.
- `storage`: keys and values will have the 0x prefix.
---------
Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
Add persistent storage for Block Access Lists (BALs) in `core/rawdb/`.
This provides read/write/delete accessors for BALs in the active
key-value store.
---------
Co-authored-by: Jared Wasinger <j-wasinger@hotmail.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
In this PR, we add support for protocol version eth/70, defined by EIP-7975.
Overall changes:
- Each response is buffered in the peer’s receipt buffer when the
`lastBlockIncomplete` field is true.
- Continued request uses the same request id of its original
request(`RequestPartialReceipts`).
- Partial responses are verified in `validateLastBlockReceipt`.
- Even if all receipts for partial blocks of the request are collected,
those partial results are not sinked to the downloader, to avoid
complexity. This assumes that partial response and buffering occur only
in exceptional cases.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
`pool.signer.Sender(tx)` bypasses the sender cache used by types.Sender,
which can force an extra signature recovery for every promotable tx
(promotion runs frequently). Use `types.Sender(pool.signer, tx)` here to
keep sender derivation cached and consistent.
This PR enables the block validation of keeper in the womir/openvm zkvm.
It also fixes some issues related to building the executables in CI.
Namely, it activates the build which was actually disabled, and also
resolves some resulting build conflicts by fixing the tags.
Co-authored-by: Leo <leo@powdrlabs.com>
Improve speed of import-history command by two orders of magnitude.
Rework ImportHistory to collect up to 2500 blocks per flush instead of
flushing after each block, reducing database commit overhead.
---------
Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com>
conn.read() used the actual UDP packet source address for
codec.Decode(), but conn.write() always used tc.remoteAddr. When the
remote node is reachable via multiple Docker networks, the packet source
IP differs from tc.remoteAddr, causing a session key lookup failure in
the codec.
Use tc.remoteAddr.String() consistently in conn.read() so the session
cache key matches what was used during Encode.
In `setOpenTelemetry`, all other fields (Enabled, Endpoint, AuthUser,
AuthPassword, InstanceID, Tags) are guarded by `ctx.IsSet()` checks, so
they only override the config file when explicitly set via CLI flags.
`SampleRatio` was the only field missing this guard, causing the flag
default (`1.0`) to always overwrite whatever was loaded from the config
file.
- Fix OpenTelemetry `SampleRatio` being unconditionally overwritten by
the CLI flag default value (`1.0`), even when the user did not pass
`--rpc.telemetry.sample-ratio`
- This caused config file values for `SampleRatio` to be silently
ignored
Problem:
The max-initcode sentinel moved from core to vm, but RPC pre-check
mapping still depended on core.ErrMaxInitCodeSizeExceeded. This mismatch
could surface inconsistent error mapping when oversized initcode is
submitted through JSON-RPC.
Solution:
- Remove core.ErrMaxInitCodeSizeExceeded from the core pre-check error
set.
- Map max-initcode validation errors in RPC from
vm.ErrMaxInitCodeSizeExceeded.
- Keep the RPC error code mapping unchanged (-38025).
Impact:
- Restores consistent max-initcode error mapping after the sentinel
move.
- Preserves existing JSON-RPC client expectations for error code -38025.
- No consensus, state, or protocol behavior changes.
Fix incorrect key length calculation for `numHashPairings` in
`InspectDatabase`, introduced in #34000.
The `headerHashKey` format is `headerPrefix + num + headerHashSuffix`
(10 bytes), but the check incorrectly included `common.HashLength`,
expecting 42 bytes.
This caused all number -- hash entries to be misclassified as
unaccounted data.
Fix three issues in the binary trie NodeIterator:
1. Empty nodes now properly backtrack to parent and continue iteration
instead of terminating the entire walk early.
2. `HashedNode` resolver handles `nil` data (all-zeros hash) gracefully
by treating it as Empty rather than panicking.
3. Parent update after node resolution guards against stack underflow
when resolving the root node itself.
---------
Co-authored-by: tellabg <249254436+tellabg@users.noreply.github.com>
## Summary
In binary trie mode, `IntermediateRoot` calls `updateTrie()` once per
dirty account. But with the binary trie there is only one unified trie
(`OpenStorageTrie` returns `self`), so each call redundantly does
per-account trie setup: `getPrefetchedTrie`, `getTrie`, slice
allocations for deletions/used, and `prefetcher.used` — all for the same
trie pointer.
This PR replaces the per-account `updateTrie()` calls with a single flat
loop that applies all storage updates directly to `s.trie`. The MPT path
is unchanged. The prefetcher trie replacement is guarded to avoid
overwriting the binary trie that received updates.
This is the phase-1 counterpart to #34021 (H01). H01 fixes the commit
phase (`trie.Commit()` called N+1 times). This PR fixes the update phase
(`updateTrie()` called N times with redundant setup). Same root cause —
unified binary trie operated on per-account — different phases.
## Benchmark (Apple M4 Pro, 500K entries, `--benchtime=10s --count=3`,
on top of #34021)
| Metric | H01 baseline | H01 + this PR | Delta |
|--------|:------------:|:-------------:|:-----:|
| Approve (Mgas/s) | 368 | **414** | **+12.5%** |
| BalanceOf (Mgas/s) | 870 | 875 | +0.6% |
Should be rebased after #34021 is merged.
EIP-7928 brings state reads into consensus by recording accounts and storage accessed during execution in the block access list. As part of the spec, we need to check that there is enough gas available to cover the cost component which doesn't depend on looking up state. If this component can't be covered by the available gas, we exit immediately.
The portion of the call dynamic cost which doesn't depend on state look ups:
- EIP2929 call costs
- value transfer cost
- memory expansion cost
This PR:
- breaks up the "inner" gas calculation for each call variant into a pair of stateless/stateful cost methods
- modifies the gas calculation logic of calls to check stateless cost component first, and go out of gas immediately if it is not covered.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
\`oss-fuzz.sh\` line 38 writes \`#/bin/sh\` instead of \`#!/bin/sh\` as
the shebang of generated fuzz test runner scripts.
\`\`\`diff
-#/bin/sh
+#!/bin/sh
\`\`\`
Without the \`!\`, the kernel does not recognize the interpreter
directive.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This PR introduces a new type HistoryPolicy which captures user intent
as opposed to pruning point stored in the blockchain which persists the
actual tail of data in the database.
It is in preparation for the rolling history expiry feature.
It comes with a semantic change: if database was pruned and geth is
running without a history mode flag (or explicit keep all flag) geth
will emit a warning but continue running as opposed to stopping the
world.
## Summary
At tree depths below `log2(NumCPU)` (clamped to [2, 8]), hash the left
subtree in a goroutine while hashing the right subtree inline. This
exploits available CPU cores for the top levels of the tree where
subtree hashing is most expensive. On single-core machines, the parallel
path is disabled entirely.
Deeper nodes use sequential hashing with the existing `sync.Pool` hasher
where goroutine overhead would exceed the hash computation cost. The
parallel path uses `sha256.Sum256` with a stack-allocated buffer to
avoid pool contention across goroutines.
**Safety:**
- Left/right subtrees are disjoint — no shared mutable state
- `sync.WaitGroup` provides happens-before guarantee for the result
- `defer wg.Done()` + `recover()` prevents goroutine panics from
crashing the process
- `!bt.mustRecompute` early return means clean nodes never enter the
parallel path
- Hash results are deterministic regardless of computation order — no
consensus risk
## Benchmark (AMD EPYC 48-core, 500K entries, `--benchtime=10s
--count=3`, post-H01 baseline)
| Metric | Baseline | Parallel | Delta |
|--------|----------|----------|-------|
| Approve (Mgas/s) | 224.5 ± 7.1 | **259.6 ± 2.4** | **+15.6%** |
| BalanceOf (Mgas/s) | 982.9 ± 5.1 | 954.3 ± 10.8 | -2.9% (noise, clean
nodes skip parallel path) |
| Allocs/op (approve) | ~810K | ~700K | -13.6% |
Found this bug while implementing the Amsterdam changes t8n changes for
benchmark test filling in EELS.
Prefixes were incorrectly being stripped on requests over t8n and this
was leading to `fill` correctly catching hash mismatches on the EELS
side for some BAL tests. Though this was caught there, I think this
change might as well be cherry-picked there instead and merged to
`master`.
This PR brings this behavior to parity with EELS for Osaka filling.
There are still some quirks with regards to invalid block tests but I
did not investigate this further.
This PR improves the pbss archive mode. Initial sync
of an archive mode which has the --gcmode archive
flag enabled will be significantly sped up.
It achieves that with the following changes:
The indexer now attempts to process histories in batch whenever
possible.
Batch indexing is enforced when the node is still syncing and the local
chain
head is behind the network chain head.
In this scenario, instead of scheduling indexing frequently alongside
block
insertion, the indexer waits until a sufficient amount of history has
accumulated
and then processes it in a batch, which is significantly more efficient.
---------
Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
## Summary
**Bug fix.** In Verkle mode, all state objects share a single unified
trie (`OpenStorageTrie` returns `self`). During `stateDB.commit()`, the
main account trie is committed via `s.trie.Commit(true)`, which calls
`CollectNodes` to traverse and serialize the entire tree. However, each
dirty account's `obj.commit()` also calls `s.trie.Commit(false)` on the
**same trie object**, redundantly traversing and serializing the full
tree once per dirty account.
With N dirty accounts per block, this causes **N+1 full-tree
traversals** instead of 1. On a write-heavy workload (2250 SSTOREs),
this produces ~131 GB of allocations per block from duplicate NodeSet
creation and serialization. It also causes a latent data race from N+1
goroutines concurrently calling `CollectNodes` on shared `InternalNode`
objects.
This commit adds an `IsVerkle()` early return in `stateObject.commit()`
to skip the redundant `trie.Commit()` call.
## Benchmark (AMD EPYC 48-core, 500K entries, `--benchtime=10s
--count=3`)
| Metric | Baseline | Fixed | Delta |
|--------|----------|-------|-------|
| Approve (Mgas/s) | 4.16 ± 0.37 | **220.2 ± 10.1** | **+5190%** |
| BalanceOf (Mgas/s) | 966.2 ± 8.1 | 971.0 ± 3.0 | +0.5% |
| Allocs/op (approve) | 136.4M | 792K | **-99.4%** |
Resolves the TODO in statedb.go about the account trie commit being
"very heavy" and "something's wonky".
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
cgo builds have been broken in FreeBSD ports because of the hid lib.
@enriquefynn has made a temporary patch, but the fix has been merged in
the master branch, so let's reflect that here.
The signify flag in `doWindowsInstaller` was defined as "signify key"
(with a space), making it impossible to pass via CLI (`-signify
<value>`). This meant the Windows installer signify signing was silently
never executed.
Fix by renaming the flag to "signify", consistent with `doArchive` and
`doKeeperArchive`.
The slotNumber field was being passed as a raw *uint64 to the JSON
marshaler, which serializes it as a plain decimal integer (e.g. 159).
All Ethereum JSON-RPC quantity fields must be hex-encoded per spec.
Wrap with hexutil.Uint64 to match the encoding of other numeric header
fields like blobGasUsed and excessBlobGas.
Co-authored-by: qu0b <stefan@starflinger.eu>
This PR improves `db inspect` classification accuracy in
`core/rawdb/database.go` by tightening key-shape checks for:
- `Block number->hash`
- `Difficulties (deprecated)`
Previously, both categories used prefix/suffix heuristics and could
mis-bucket unrelated entries.
Binary tree hashing is quite slow, owing to many factors. One of them is
the GC pressure that is the consequence of allocating many hashers, as a
binary tree has 4x the size of an MPT. This PR introduces an
optimization that already exists for the MPT: keep a pool of hashers, in
order to reduce the amount of allocations.
Implement EIP7954, This PR raises the maximum contract code size
to 32KiB and initcode size to 64KiB , following https://eips.ethereum.org/EIPS/eip-7954
---------
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
`TestSubscribePendingTxHashes` hangs indefinitely because pending tx
events are permanently missed due to a race condition in
`NewPendingTransactions` (and `NewHeads`). Both handlers called their
event subscription functions (`SubscribePendingTxs`,
`SubscribeNewHeads`) inside goroutines, so the RPC handler returned the
subscription ID to the client before the filter was installed in the
event loop. When the client then sent a transaction, the event fired but
no filter existed to catch it — the event was silently lost.
- Move `SubscribePendingTxs` and `SubscribeNewHeads` calls out of
goroutines so filters are installed synchronously before the RPC
response is sent, matching the pattern already used by `Logs` and
`TransactionReceipts`
<!-- START COPILOT CODING AGENT TIPS -->
---
💬 We'd love your input! Share your thoughts on Copilot coding agent in
our [2 minute survey](https://gh.io/copilot-coding-agent-survey).
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: s1na <1591639+s1na@users.noreply.github.com>
Fixes https://github.com/ethereum/go-ethereum/issues/33907
Notably there is a behavioral change:
- Previously Geth will refuse to restart if the existing trienode
history is gapped with the state data
- With this PR, the gapped trienode history will be entirely reset and
being constructed from scratch
This PR adds a cmd tool fetchpayload which connects to a
node and gets all the information in order to create a serialized
payload that can then be passed to the zkvm.
This PR allows users to prune their nodes up to the Prague fork. It
indirectly depends on #32157 and can't really be merged before eraE
files are widely available for download.
The `--history.chain` flag becomes mandatory for `prune-history`
command. Here I've listed all the edge cases that can happen and how we
behave:
## prune-history Behavior
| From | To | Result |
|-------------|--------------|--------------------------|
| full | postmerge | ✅ prunes |
| full | postprague | ✅ prunes |
| postmerge | postprague | ✅ prunes further |
| postprague | postmerge | ❌ can't unprune |
| any | all | ❌ use import-history |
## Node Startup Behavior
| DB State | Flag | Result |
|-------------|--------------|----------------------------------------------------------------|
| fresh | postprague | ✅ syncs from Prague |
| full | postprague | ❌ "run prune-history first" |
| postmerge | postprague | ❌ "run prune-history first" |
| postprague | postmerge | ❌ "can't unprune, use import-history or fix
flag" |
| pruned | all | ✅ accepts known prune points |
The `--remove.chain` flag incorrectly described itself as selecting
"state data" for removal, which could mislead operators into removing
the wrong data category. This corrects the description to accurately
reflect that the flag targets chain data (block bodies and receipts).
This PR contains two changes:
Firstly, the finalized header will be resolved from local chain if it's
not recently announced via the `engine_newPayload`.
What's more importantly is, in the downloader, originally there are two
code paths to push forward the pivot point block, one in the beacon
header fetcher (`fetchHeaders`), and another one is in the snap content
processer (`processSnapSyncContent`).
Usually if there are new blocks and local pivot block becomes stale, it
will firstly be detected by the `fetchHeaders`. `processSnapSyncContent`
is fully driven by the beacon headers and will only detect the stale pivot
block after synchronizing the corresponding chain segment. I think the
detection here is redundant and useless.
We got a report for a bug in the tracing journal which has the
responsibility to emit events for all state that must be reverted.
The edge case is as follows: on CREATE operations the nonce is
incremented. When a create frame reverts, the nonce increment associated
with it does **not** revert. This works fine on master. Now one step
further: if the parent frame reverts tho, the nonce **should** revert
and there is the bug.
This PR fixes a regression introduced in https://github.com/ethereum/go-ethereum/pull/33836/changes
Before PR 33836, running mainnet would automatically bump the cache size
to 4GB and trigger a cache re-calculation, specifically setting the key-value
database cache to 2GB.
After PR 33836, this logic was removed, and the cache value is no longer
recomputed if no command line flags are specified. The default key-value
database cache is 512MB.
This PR bumps the default key-value database cache size alongside the
default cache size for other components (such as snapshot) accordingly.
I observed failing tests in Hive `engine-withdrawals`:
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=146
-
https://hive.ethpandaops.io/#/test/generic/1772351960-ad3e3e460605c670efe1b4f4178eb422?testnumber=147
```shell
DEBUG (Withdrawals Fork on Block 2): NextPayloadID before getPayloadV2:
id=0x01487547e54e8abe version=1
>> engine_getPayloadV2("0x01487547e54e8abe")
<< error: {"code":-38005,"message":"Unsupported fork"}
FAIL: Expected no error on EngineGetPayloadV2: error=Unsupported fork
```
The same failure pattern occurred for Block 3.
Per Shanghai engine_getPayloadV2 spec, pre-Shanghai payloads should be
accepted via V2 and returned as ExecutionPayloadV1:
- executionPayload: ExecutionPayloadV1 | ExecutionPayloadV2
- ExecutionPayloadV1 MUST be returned if payload timestamp < Shanghai
timestamp
- ExecutionPayloadV2 MUST be returned if payload timestamp >= Shanghai
timestamp
Reference:
-
https://github.com/ethereum/execution-apis/blob/main/src/engine/shanghai.md#engine_getpayloadv2
Current implementation only allows GetPayloadV2 on the Shanghai fork
window (`[]forks.Fork{forks.Shanghai}`), so pre-Shanghai payloads are
rejected with Unsupported fork.
If my interpretation of the spec is incorrect, please let me know and I
can adjust accordingly.
---------
Co-authored-by: muzry.li <muzry.li1@ambergroup.io>
Updates go-eth-kzg to
https://github.com/crate-crypto/go-eth-kzg/releases/tag/v1.5.0
Significantly reduces the allocations in VerifyCellProofBatch which is
around ~5% of all allocations on my node
---------
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Reduce allocations in calculation of tx cost.
---------
Co-authored-by: weixie.cui <weixie.cui@okg.com>
Co-authored-by: Sina M <1591639+s1na@users.noreply.github.com>
`GenerateChain` commits trie nodes asynchronously, and it can happen
that some nodes aren't making it to the db in time for `GenerateChain`
to open it and find the data it is looking for.
This is an optimization that existed for verkle and the MPT, but that
got dropped during the rebase.
Mark the nodes that were modified as needing recomputation, and skip the
hash computation if this is not needed. Otherwise, the whole tree is
hashed, which kills performance.
The computation of `MAIN_STORAGE_OFFSET` was incorrect, causing the last
byte of the stem to be dropped. This means that there would be a
collision in the hash computation (at the preimage level, not a hash
collision of course) if two keys were only differing at byte 31.
Eth currently has a flaky test, related to the tx fetcher.
The issue seems to happen when Unsubscribe is called while sub is nil.
It seems that chain.Stop() may be invoked before the loop starts in some
tests, but the exact cause is still under investigation through repeated
runs. I think this change will at least prevent the error.
The BatchSpanProcessor queue size was incorrectly set to
DefaultMaxExportBatchSize (512) instead of DefaultMaxQueueSize (2048).
I noticed the issue on bloatnet when analyzing the block building
traces. During a particular run, the miner was including 1000
transactions in a single block. When telemetry is enabled, the miner
creates a span for each transaction added to the block. With the queue
capped at 512, spans were silently dropped when production outpaced the
span export, resulting in incomplete traces with orphaned spans. While
this doesn't eliminate the possibility of drops under extreme
load, using the correct default restores the 4x buffer between queue
capacity and export batch size that the SDK was designed around.
Pebble maintains a batch pool to recycle the batch object. Unfortunately
batch object must be
explicitly returned via `batch.Close` function. This PR extends the
batch interface by adding
the close function and also invoke batch.Close in some critical code
paths.
Memory allocation must be measured before merging this change. What's
more, it's an open
question that whether we should apply batch.Close as much as possible in
every invocation.
For bal-devnet-3 we need to update the EIP-8024 implementation to the
latest spec changes: https://github.com/ethereum/EIPs/pull/11306
> Note: I deleted tests not specified in the EIP bc maintaining them
through EIP changes is too error prone.
Return the Amsterdam instruction set from `LookupInstructionSet` when
`IsAmsterdam` is true, so Amsterdam rules no longer fall through to the
Osaka jump table.
---------
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
In `buildPayload()`, the background goroutine uses a `select` to wait on
the recommit timer, the stop channel, and the end timer. When both
`timer.C` and `payload.stop` are ready simultaneously, Go's `select`
picks a case non-deterministically. This means the loop can enter the
`timer.C` case and perform an unnecessary `generateWork` call even after
the payload has been resolved.
Add a non-blocking check of `payload.stop` at the top of the `timer.C`
case to exit immediately when the payload has already been delivered.
We got a report that after v1.17.0 a geth-teku node starts to time out
on engine_getBlobsV2 after around 3h of operation. The culprit seems to
be our optional http2 service which Teku attempts first. The exact cause
of the timeout is still unclear.
This PR is more of a workaround than proper fix until we figure out the
underlying issue. But I don't expect http2 to particularly benefit
engine API throughput and latency. Hence it should be fine to disable it
for now.
The payload rebuild loop resets the timer with the full Recommit
duration after generateWork returns, making the actual interval
generateWork_elapsed + Recommit instead of Recommit alone.
Since fillTransactions uses Recommit (2s) as its timeout ceiling, the
effective rebuild interval can reach ~4s under heavy blob workloads —
only 1–2 rebuilds in a 6s half-slot window instead of the intended 3.
Fix by subtracting elapsed time from the timer reset.
### Before this fix
```
t=0s timer fires, generateWork starts
t=2s fillTransactions times out, timer.Reset(2s)
t=4s second rebuild starts
t=6s CL calls getPayload — gets the t=2s result (1 effective rebuild)
```
### After
```
t=0s timer fires, generateWork starts
t=2s fillTransactions times out, timer.Reset(2s - 2s = 0)
t=2s second rebuild starts immediately
t=4s timer.Reset(0), third rebuild starts
t=6s CL calls getPayload — gets the t=4s result (3 effective rebuilds)
```
This PR introduces a threshold (relative to current market base fees),
below which we suppress the diffusion of low fee transactions. Once base
fees go down, and if the transactions were not evicted in the meantime,
we release these transactions.
The PR also updates the bucketing logic to be more sensitive, removing
the extra logarithm. Blobpool description is also
updated to reflect the new behavior.
EIP-7918 changed the maximim blob fee decrease that can happen in a
slot. The PR also updates fee jump calculation to reflect this.
---------
Signed-off-by: Csaba Kiraly <csaba.kiraly@gmail.com>
With this, we are dropping support for protocol version eth/68. The only supported
version is eth/69 now. The p2p receipt encoding logic can be simplified a lot, and
processing of receipts during sync gets a little faster because we now transform
the network encoding into the database encoding directly, without decoding the
receipts first.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
fix the flaky test found in
https://ci.appveyor.com/project/ethereum/go-ethereum/builds/53601688/job/af5ccvufpm9usq39
1. increase the timeout from 3+1s to 15s, and use timer instead of
sleep(in the CI env, it may need more time to sync the 1024 blocks)
2. add `synced.Load()` to ensure the full async chain is finished
Signed-off-by: Delweng <delweng@gmail.com>
We didn't upgrade to 1.25, so this jumps over one version. I want to
upgrade all builds to Go 1.26 soon, but let's start with the Docker
build to get a sense of any possible issues.
Previously, handshake timeouts were recorded as generic peer errors
instead of timeout errors. waitForHandshake passed a raw
p2p.DiscReadTimeout into markError, but markError classified errors only
via errors.Unwrap(err), which returns nil for non-wrapped errors. As a
result, the timeoutError meter was never incremented and all such
failures fell into the peerError bucket.
This change makes markError switch on the base error, using
errors.Unwrap(err) when available and falling back to the original error
otherwise. With this adjustment, p2p.DiscReadTimeout is correctly mapped
to timeoutError, while existing behaviour for the other wrapped sentinel
errors remains unchanged
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
inside tx.GasPrice()/GasFeeCap()/GasTipCap() already new a big.Int.
bench result:
```
goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/core
cpu: Apple M4
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
TransactionToMessage-10 240.1n ± 7% 175.1n ± 7% -27.09% (p=0.000 n=10)
│ old.txt │ new.txt │
│ B/op │ B/op vs base │
TransactionToMessage-10 544.0 ± 0% 424.0 ± 0% -22.06% (p=0.000 n=10)
│ old.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
TransactionToMessage-10 17.00 ± 0% 11.00 ± 0% -35.29% (p=0.000 n=10)
```
benchmark code:
```
// Copyright 2025 The go-ethereum Authors
// This file is part of the go-ethereum library.
//
// The go-ethereum library is free software: you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// The go-ethereum library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public License
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.
package core
import (
"math/big"
"testing"
"github.com/ethereum/go-ethereum/common"
"github.com/ethereum/go-ethereum/core/types"
"github.com/ethereum/go-ethereum/crypto"
"github.com/ethereum/go-ethereum/params"
)
// BenchmarkTransactionToMessage benchmarks the TransactionToMessage function.
func BenchmarkTransactionToMessage(b *testing.B) {
key, _ := crypto.GenerateKey()
signer := types.LatestSigner(params.TestChainConfig)
to := common.HexToAddress("0x000000000000000000000000000000000000dead")
// Create a DynamicFeeTx transaction
txdata := &types.DynamicFeeTx{
ChainID: big.NewInt(1),
Nonce: 42,
GasTipCap: big.NewInt(1000000000), // 1 gwei
GasFeeCap: big.NewInt(2000000000), // 2 gwei
Gas: 21000,
To: &to,
Value: big.NewInt(1000000000000000000), // 1 ether
Data: []byte{0x12, 0x34, 0x56, 0x78},
AccessList: types.AccessList{
types.AccessTuple{
Address: common.HexToAddress("0x0000000000000000000000000000000000000001"),
StorageKeys: []common.Hash{
common.HexToHash("0x0000000000000000000000000000000000000000000000000000000000000001"),
},
},
},
}
tx, _ := types.SignNewTx(key, signer, txdata)
baseFee := big.NewInt(1500000000) // 1.5 gwei
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_, err := TransactionToMessage(tx, signer, baseFee)
if err != nil {
b.Fatal(err)
}
}
}
l
```
From the https://eips.ethereum.org/EIPS/eip-7928
> SELFDESTRUCT (in-transaction): Accounts destroyed within a transaction
MUST be included in AccountChanges without nonce or code changes.
However, if the account had a positive balance pre-transaction, the
balance change to zero MUST be recorded. Storage keys within the self-destructed
contracts that were modified or read MUST be included as a storage_reads
entry.
The storage read against the empty contract (zero storage) should also
be recorded in the BAL's readlist.
https://eips.ethereum.org/EIPS/eip-7928 spec:
> Precompiled contracts: Precompiles MUST be included when accessed.
If a precompile receives value, it is recorded with a balance change.
Otherwise, it is included with empty change lists.
The precompiled contracts are not explicitly touched when they are
invoked since Amsterdam fork.
The fetcher should not fetch transactions that are already on chain.
Until now we were only checking in the txpool, but that does not have
the old transaction. This was leading to extra fetches of transactions
that were announced by a peer but are already on chain.
Here we extend the check to the chain as well.
All five `revert*Request` functions (account, bytecode, storage,
trienode heal, bytecode heal) remove the request from the tracked set
but never restore the peer to its corresponding idle pool. When a
request times out and no response arrives, the peer is permanently lost
from the idle pool, preventing new work from being assigned to it.
In normal operation mode (snap-sync full state) this bug is masked by
pivot movement (which resets idle pools via new Sync() cycles every ~15
minutes) and peer churn (reconnections re-add peers via Register()).
However in scenarios like the one I have running my (partial-stateful
node)[https://github.com/ethereum/go-ethereum/pull/33764] with
long-running sync cycles and few peers, all peers can eventually leak
out of the idle pools, stalling sync entirely.
Fix: after deleting from the request map, restore the peer to its idle
pool if it is still registered (guards against the peer-drop path where
Unregister already removed the peer). This mirrors the pattern used in
all five On* response handlers.
This only seems to manifest in peer-thirstly scenarios as where I find
myself when testing snapsync for the partial-statefull node).
Still, thought was at least good to raise this point. Unsure if required
to discuss or not
Implements the new eth_getStorageValues method. It returns storage
values for a list of contracts.
Spec: https://github.com/ethereum/execution-apis/pull/756
---------
Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com>
The PR exposes the InfuxDB reporting interval as a CLI parameter, which
was previously fixed 10s. Default is still kept at 10s.
Note that decreasing the interval comes with notable extra traffic and
load on InfluxDB.
This changes the challenge resend logic again to use the existing
`ChallengeData` field of `v5wire.Whoareyou` instead of storing a second
copy of the packet in `Whoareyou.Encoded`. It's more correct this way
since `ChallengeData` is supposed to be the data that is used by the ID
verification procedure.
Also adapts the cross-client test to verify this behavior.
Follow-up to #31543
In src/ethereum/forks/amsterdam/vm/interpreter.py:299-304, the caller
address is
only tracked for block level accessList when there's a value transfer:
```python
if message.should_transfer_value and message.value != 0:
# Track value transfer
sender_balance = get_account(state, message.caller).balance
recipient_balance = get_account(state, message.current_target).balance
track_address(message.state_changes, message.caller) # Line 304
```
Since system transactions have should_transfer_value=False and value=0,
this condition is never met, so the caller (SYSTEM_ADDRESS) is not
tracked.
This condition is applied for the syscall in the geth implementation,
aligning with the spec of EIP7928.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Adds `--opcode.count=<file>` flag to `evm t8n` that writes per-opcode
execution frequency counts to a JSON file (relative to
`--output.basedir`).
---------
Co-authored-by: MariusVanDerWijden <m.vanderwijden@live.de>
Co-authored-by: Sina Mahmoodi <itz.s1na@gmail.com>
- **Keep changes minimal and focused.** Only modify code directly related to the task at hand. Do not refactor unrelated code, rename existing variables or functions for style, or bundle unrelated fixes into the same commit or PR.
- **Do not add, remove, or update dependencies** unless the task explicitly requires it.
## Pre-Commit Checklist
Before every commit, run **all** of the following checks and ensure they pass:
### 1. Formatting
Before committing, always run `gofmt` and `goimports` on all modified files:
```sh
gofmt -w <modifiedfiles>
goimports -w <modifiedfiles>
```
### 2. Build All Commands
Verify that all tools compile successfully:
```sh
make all
```
This builds all executables under `cmd/`, including `keeper` which has special build requirements.
### 3. Tests
While iterating during development, use `-short` for faster feedback:
```sh
go run ./build/ci.go test -short
```
Before committing, run the full test suite **without**`-short` to ensure all tests pass, including the Ethereum execution-spec tests and all state/block test permutations:
```sh
go run ./build/ci.go test
```
### 4. Linting
```sh
go run ./build/ci.go lint
```
This runs additional style checks. Fix any issues before committing.
### 5. Generated Code
```sh
go run ./build/ci.go check_generate
```
Ensures that all generated files (e.g., `gen_*.go`) are up to date. If this fails, first install the required code generators by running `make devtools`, then run the appropriate `go generate` commands and include the updated files in your commit.
### 6. Dependency Hygiene
```sh
go run ./build/ci.go check_baddeps
```
Verifies that no forbidden dependencies have been introduced.
## What to include in commits
Do not commit binaries, whether they are produced by the main build or byproducts of investigations.
## Commit Message Format
Commit messages must be prefixed with the package(s) they modify, followed by a short lowercase description:
```
<package(s)>: description
```
Examples:
- `core/vm: fix stack overflow in PUSH instruction`
- `eth, rpc: make trace configs optional`
- `cmd/geth: add new flag for sync mode`
Use comma-separated package names when multiple areas are affected. Keep the description concise.
## Pull Request Title Format
PR titles follow the same convention as commit messages:
```
<listofmodifiedpaths>: description
```
Examples:
- `core/vm: fix stack overflow in PUSH instruction`
- `trie/archiver: streaming subtree archival to fix OOM`
Use the top-level package paths, comma-separated if multiple areas are affected. Only mention the directories with functional changes, interface changes that trickle all over the codebase should not generate an exhaustive list. The description should be a short, lowercase summary of the change.
| **`geth`** | Our main Ethereum CLI client. It is the entry point into the Ethereum network (main-, test- or private net), capable of running as a full node (default), archive node (retaining all historical state) or a light node (retrieving data live). It can be used by other processes as a gateway into the Ethereum network via JSON RPC endpoints exposed on top of HTTP, WebSocket and/or IPC transports. `geth --help` and the [CLI page](https://geth.ethereum.org/docs/fundamentals/command-line-options) for command line options. |
| `clef` | Stand-alone signing tool, which can be used as a backend signer for `geth`. |
| `devp2p` | Utilities to interact with nodes on the networking layer, without running a full blockchain. |
| `abigen` | Source code generator to convert Ethereum contract definitions into easy-to-use, compile-time type-safe Go packages. It operates on plain [Ethereum contract ABIs](https://docs.soliditylang.org/en/develop/abi-spec.html) with expanded functionality if the contract bytecode is also available. However, it also accepts Solidity source files, making development much more streamlined. Please see our [Native DApps](https://geth.ethereum.org/docs/developers/dapp-developer/native-bindings) page for details. |
| `evm` | Developer utility version of the EVM (Ethereum Virtual Machine) that is capable of running bytecode snippets within a configurable environment and execution mode. Its purpose is to allow isolated, fine-grained debugging of EVM opcodes (e.g. `evm --code 60ff60ff --debug run`). |
//lint:ignore ST1005 brand name displayed on the console
returncommon.Address{},nil,fmt.Errorf("Ledger version >= 1.9.0 required for EIP-2930/EIP-1559 signing (found version v%d.%d.%d)",w.version[0],w.version[1],w.version[2])
}
casetypes.SetCodeTxType:
ifledgerVersionLessThan(w.version,1,17,0){
//lint:ignore ST1005 brand name displayed on the console
returncommon.Address{},nil,fmt.Errorf("Ledger version >= 1.17.0 required for EIP-7702 signing (found version v%d.%d.%d)",w.version[0],w.version[1],w.version[2])
//lint:ignore ST1005 brand name displayed on the console
returncommon.Address{},nil,fmt.Errorf("Ledger v%d.%d.%d doesn't support signing this transaction, please update to v1.0.3 at least",w.version[0],w.version[1],w.version[2])
}
@ -184,7 +206,7 @@ func (w *ledgerDriver) SignTypedMessage(path accounts.DerivationPath, domainHash
returnnil,accounts.ErrWalletClosed
}
// Ensure the wallet is capable of signing the given transaction
Clef can be used to sign transactions and data and is meant as a(n eventual) replacement for Geth's account management. This allows DApps to not depend on Geth's account management. When a DApp wants to sign data (or a transaction), it can send the content to Clef, which will then provide the user with context and ask for permission to sign the content. If the user grants the signing request, Clef will send the signature back to the DApp.
This setup allows a DApp to connect to a remote Ethereum node and send transactions that are locally signed. This can help in situations when a DApp is connected to an untrusted remote Ethereum node, because a local one is not available, not synchronized with the chain, or is a node that has no built-in (or limited) account management.
Clef can run as a daemon on the same machine, off a usb-stick like [USB armory](https://inversepath.com/usbarmory), or even a separate VM in a [QubesOS](https://www.qubes-os.org/) type setup.
Check out the
* [CLI tutorial](tutorial.md) for some concrete examples on how Clef works.
* [Setup docs](docs/setup.md) for information on how to configure Clef on QubesOS or USB Armory.
* [Data types](datatypes.md) for details on the communication messages between Clef and an external UI.
## Command line flags
Clef accepts the following command line options:
```
COMMANDS:
init Initialize the signer, generate secret storage
attest Attest that a js-file is to be used
setpw Store a credential for a keystore file
delpw Remove a credential for a keystore file
gendoc Generate documentation about json-rpc format
help Shows a list of commands or help for one command
GLOBAL OPTIONS:
--loglevel value log level to emit to the screen (default: 4)
--keystore value Directory for the keystore (default: "$HOME/.ethereum/keystore")
--configdir value Directory for Clef configuration (default: "$HOME/.clef")
--chainid value Chain id to use for signing (1=mainnet, 17000=Holesky) (default: 1)
--lightkdf Reduce key-derivation RAM & CPU usage at some expense of KDF strength
--nousb Disables monitoring for and managing USB hardware wallets
--pcscdpath value Path to the smartcard daemon (pcscd) socket file (default: "/run/pcscd/pcscd.comm")
--http.addr value HTTP-RPC server listening interface (default: "localhost")
--http.vhosts value Comma separated list of virtual hostnames from which to accept requests (server enforced). Accepts '*' wildcard. (default: "localhost")
--ipcdisable Disable the IPC-RPC server
--ipcpath Filename for IPC socket/pipe within the datadir (explicit paths escape it)
--http Enable the HTTP-RPC server
--http.port value HTTP-RPC server listening port (default: 8550)
--signersecret value A file containing the (encrypted) master seed to encrypt Clef data, e.g. keystore credentials and ruleset hash
--4bytedb-custom value File used for writing new 4byte-identifiers submitted via API (default: "./4byte-custom.json")
--auditlog value File used to emit audit logs. Set to "" to disable (default: "audit.log")
--rules value Path to the rule file to auto-authorize requests with
--stdio-ui Use STDIN/STDOUT as a channel for an external UI. This means that an STDIN/STDOUT is used for RPC-communication with a e.g. a graphical user interface, and can be used when Clef is started by an external process.
--stdio-ui-test Mechanism to test interface between Clef and UI. Requires 'stdio-ui'.
--advanced If enabled, issues warnings instead of rejections for suspicious requests. Default off
--suppress-bootwarn If set, does not show the warning during boot
--help, -h show help
--version, -v print the version
```
Example:
```
$ clef -keystore /my/keystore -chainid 4
```
## Security model
The security model of Clef is as follows:
* One critical component (the Clef binary / daemon) is responsible for handling cryptographic operations: signing, private keys, encryption/decryption of keystore files.
* Clef has a well-defined 'external' API.
* The 'external' API is considered UNTRUSTED.
* Clef also communicates with whatever process that invoked the binary, via stdin/stdout.
* This channel is considered 'trusted'. Over this channel, approvals and passwords are communicated.
The general flow for signing a transaction using e.g. Geth is as follows:

In this case, `geth` would be started with `--signer http://localhost:8550` and would relay requests to `eth.sendTransaction`.
## TODOs
Some snags and todos
* [ ] Clef should take a startup param "--no-change", for UIs that do not contain the capability to perform changes to things, only approve/deny. Such a UI should be able to start the signer in a more secure mode by telling it that it only wants approve/deny capabilities.
* [x] It would be nice if Clef could collect new 4byte-id:s/method selectors, and have a secondary database for those (`4byte_custom.json`). Users could then (optionally) submit their collections for inclusion upstream.
* [ ] It should be possible to configure Clef to check if an account is indeed known to it, before passing on to the UI. The reason it currently does not, is that it would make it possible to enumerate accounts if it immediately returned "unknown account" (side channel attack).
* [x] It should be possible to configure Clef to auto-allow listing (certain) accounts, instead of asking every time.
* [x] Done Upon startup, Clef should spit out some info to the caller (particularly important when executed in `stdio-ui`-mode), invoking methods with the following info:
* [x] Version info about the signer
* [x] Address of API (HTTP/IPC)
* [ ] List of known accounts
* [ ] Have a default timeout on signing operations, so that if the user has not answered within e.g. 60 seconds, the request is rejected.
* [ ] `account_signRawTransaction`
* [ ] `account_bulkSignTransactions([] transactions)` should
* only exist if enabled via config/flag
* only allow non-data-sending transactions
* all txs must use the same `from`-account
* let the user confirm, showing
* the total amount
* the number of unique recipients
* Geth todos
- The signer should pass the `Origin` header as call-info to the UI. As of right now, the way that info about the request is put together is a bit of a hack into the HTTP server. This could probably be greatly improved.
- Relay: Geth should be started in `geth --signer localhost:8550`.
- Currently, the Geth APIs use `common.Address` in the arguments to transaction submission (e.g `to` field). This type is 20 `bytes`, and is incapable of carrying checksum information. The signer uses `common.MixedcaseAddress`, which retains the original input.
- The Geth API should switch to use the same type, and relay `to`-account verbatim to the external API.
* [x] Storage
* [x] An encrypted key-value storage should be implemented.
* See [rules.md](rules.md) for more info about this.
* Another potential thing to introduce is pairing.
* To prevent spurious requests which users just accept, implement a way to "pair" the caller with the signer (external API).
* Thus Geth/cpp would cryptographically handshake and afterwards the caller would be allowed to make signing requests.
* This feature would make the addition of rules less dangerous.
* Wallets / accounts. Add API methods for wallets.
## Communication
### External API
Clef listens to HTTP requests on `http.addr`:`http.port` (or to IPC on `ipcpath`), with the same JSON-RPC standard as Geth. The messages are expected to be [JSON-RPC 2.0 standard](https://www.jsonrpc.org/specification).
Some of these calls can require user interaction. Clients must be aware that responses may be delayed significantly or may never be received if a user decides to ignore the confirmation request.
The External API is **untrusted**: it does not accept credentials, nor does it expect that requests have any authority.
### Internal UI API
Clef has one native console-based UI, for operation without any standalone tools. However, there is also an API to communicate with an external UI. To enable that UI, the signer needs to be executed with the `--stdio-ui` option, which allocates `stdin` / `stdout` for the UI API.
An example (insecure) proof-of-concept has been implemented in `pythonsigner.py`.
The model is as follows:
* The user starts the UI app (`pythonsigner.py`).
* The UI app starts `clef` with `--stdio-ui`, and listens to the
process output for confirmation-requests.
* `clef` opens the external HTTP API.
* When the `signer` receives requests, it sends a JSON-RPC request via `stdout`.
* The UI app prompts the user accordingly, and responds to `clef`.
* `clef` signs (or not), and responds to the original request.
## External API
See the [external API changelog](extapi_changelog.md) for information about changes to this API.
### Encoding
- number: positive integers that are hex encoded
- data: hex encoded data
- string: ASCII string
All hex encoded values must be prefixed with `0x`.
### account_new
#### Create new password protected account
The signer will generate a new private key, encrypt it according to [web3 keystore spec](https://ethereum.org/en/developers/docs/data-structures-and-encoding/web3-secret-storage/) and store it in the keystore directory.
The client is responsible for creating a backup of the keystore. If the keystore is lost there is no method of retrieving lost accounts.
#### Arguments
None
#### Result
- address [string]: account address that is derived from the generated key
List all accounts that this signer currently manages
#### Arguments
None
#### Result
- array with account records:
- account.address [string]: account address that is derived from the generated key
#### Sample call
```json
{
"id": 1,
"jsonrpc": "2.0",
"method": "account_list"
}
```
Response
```json
{
"id": 1,
"jsonrpc": "2.0",
"result": [
"0xafb2f771f58513609765698f65d3f2f0224a956f",
"0xbea9183f8f4f03d427f6bcea17388bdff1cab133"
]
}
```
### account_signTransaction
#### Sign transactions
Signs a transaction and responds with the signed transaction in RLP-encoded and JSON forms.
#### Arguments
1. transaction object:
- `from` [address]: account to send the transaction from
- `to` [address]: receiver account. If omitted or `0x`, will cause contract creation.
- `gas` [number]: maximum amount of gas to burn
- `gasPrice` [number]: gas price
- `value` [number:optional]: amount of Wei to send with the transaction
- `data` [data:optional]: input data
- `nonce` [number]: account nonce
2. method signature [string:optional]
- The method signature, if present, is to aid decoding the calldata. Should consist of `methodname(paramtype,...)`, e.g. `transfer(uint256,address)`. The signer may use this data to parse the supplied calldata, and show the user. The data, however, is considered totally untrusted, and reliability is not expected.
#### Result
- raw [data]: signed transaction in RLP encoded form
Signs a chunk of structured data conformant to [EIP-712](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-712.md) and returns the calculated signature.
"message": "Transaction data did not match ABI-interface: WARNING: Supplied data is stuffed with extra data. \nWant 0000000000000002000000000000000000000000000000000000000000000012\nHave 0000000000000000000000000000000000000000000000000000000000000012\nfor method safeSend(address)"
}
],
"meta": {
"remote": "127.0.0.1:48492",
"local": "localhost:8550",
"scheme": "HTTP/1.1"
}
}
]
}
```
One which has missing `to`, but with no `data`:
```json
{
"jsonrpc": "2.0",
"id": 3,
"method": "ui_approveTx",
"params": [
{
"transaction": {
"from": "",
"to": null,
"gas": "0x0",
"gasPrice": "0x0",
"value": "0x0",
"nonce": "0x0",
"data": null,
"input": null
},
"call_info": [
{
"type": "CRITICAL",
"message": "Tx will create contract with empty code!"
}
],
"meta": {
"remote": "signer binary",
"local": "main",
"scheme": "in-proc"
}
}
]
}
```
### ApproveListing / `ui_approveListing`
Invoked when a request for account listing has been made.
Invoked when a request for creating a new account has been made.
#### Sample call
```json
{
"jsonrpc": "2.0",
"id": 4,
"method": "ui_approveNewAccount",
"params": [
{
"meta": {
"remote": "signer binary",
"local": "main",
"scheme": "in-proc"
}
}
]
}
```
### ShowInfo / `ui_showInfo`
The UI should show the info (a single message) to the user. Does not expect response.
#### Sample call
```json
{
"jsonrpc": "2.0",
"id": 9,
"method": "ui_showInfo",
"params": [
"Tests completed"
]
}
```
### ShowError / `ui_showError`
The UI should show the error (a single message) to the user. Does not expect response.
```json
{
"jsonrpc": "2.0",
"id": 2,
"method": "ui_showError",
"params": [
"Something bad happened!"
]
}
```
### OnApprovedTx / `ui_onApprovedTx`
`OnApprovedTx` is called when a transaction has been approved and signed. The call contains the return value that will be sent to the external caller. The return value from this method is ignored - the reason for having this callback is to allow the ruleset to keep track of approved transactions.
When implementing rate-limited rules, this callback should be used.
TLDR; Use this method to keep track of signed transactions, instead of using the data in `ApproveTx`.
This method provides the UI with information about what API version the signer uses (both internal and external) as well as build-info and external API,
in k/v-form.
Example call:
```json
{
"jsonrpc": "2.0",
"id": 1,
"method": "ui_onSignerStartup",
"params": [
{
"info": {
"extapi_http": "http://localhost:8550",
"extapi_ipc": null,
"extapi_version": "2.0.0",
"intapi_version": "1.2.0"
}
}
]
}
```
### OnInputRequired / `ui_onInputRequired`
Invoked when Clef requires user input (e.g. a password).
Example call:
```json
{
"jsonrpc": "2.0",
"id": 1,
"method": "ui_onInputRequired",
"params": [
{
"title": "Account password",
"prompt": "Please enter the password for account 0x694267f14675d7e1b9494fd8d72fefe1755710fa",
"isPassword": true
}
]
}
```
### Rules for UI apis
A UI should conform to the following rules.
* A UI MUST NOT load any external resources that were not embedded/part of the UI package.
* For example, not load icons, stylesheets from the internet
* Not load files from the filesystem, unless they reside in the same local directory (e.g. config files)
* A Graphical UI MUST show the blocky-identicon for ethereum addresses.
* A UI MUST warn display appropriate warning if the destination-account is formatted with invalid checksum.
* A UI MUST NOT open any ports or services
* The signer opens the public port
* A UI SHOULD verify the permissions on the signer binary, and refuse to execute or warn if permissions allow non-user write.
* A UI SHOULD inform the user about the `SHA256` or `MD5` hash of the binary being executed
* A UI SHOULD NOT maintain a secondary storage of data, e.g. list of accounts
* The signer provides accounts
* A UI SHOULD, to the best extent possible, use static linking / bundling, so that required libraries are bundled
along with the UI.
### UI Implementations
There are a couple of implementation for a UI. We'll try to keep this list up to date.
| Name | Repo | UI type| No external resources| Blocky support| Verifies permissions | Hash information | No secondary storage | Statically linked| Can modify parameters|
These data types are defined in the channel between clef and the UI
### SignDataRequest
SignDataRequest contains information about a pending request to sign some data. The data to be signed can be of various types, defined by content-type. Clef has done most of the work in canonicalizing and making sense of the data, and it's up to the UI to present the user with the contents of the `message`
SignTxRequest contains information about a pending request to sign a transaction. Aside from the transaction itself, there is also a `call_info`-struct. That struct contains messages of various types, that the user should be informed of.
As in any request, it's important to consider that the `meta` info also contains untrusted data.
The `transaction` (on input into clef) can have either `data` or `input` -- if both are set, they must be identical, otherwise an error is generated. However, Clef will always use `data` when passing this struct on (if Clef does otherwise, please file a ticket)
"message": "Something looks odd, show this message as a warning"
},
{
"type": "Info",
"message": "User should see this as well"
}
],
"meta": {
"remote": "localhost:9999",
"local": "localhost:8545",
"scheme": "http",
"User-Agent": "Firefox 3.2",
"Origin": "www.malicious.ru"
}
}
```
### SignTxResponse - approve
Response to request to sign a transaction. This response needs to contain the `transaction`, because the UI is free to make modifications to the transaction.
Response to SignTxRequest. When denying a request, there's no need to provide the transaction in return
Example:
```json
{
"transaction": {
"from": "0x",
"to": null,
"gas": "0x0",
"gasPrice": "0x0",
"value": "0x0",
"nonce": "0x0",
"data": null
},
"approved": false
}
```
### OnApproved - SignTransactionResult
SignTransactionResult is used in the call `clef` -> `OnApprovedTx(result)`
This occurs _after_ successful completion of the entire signing procedure, but right before the signed transaction is passed to the external caller. This method (and data) can be used by the UI to signal to the user that the transaction was signed, but it is primarily useful for ruleset implementations.
A ruleset that implements a rate limitation needs to know what transactions are sent out to the external interface. By hooking into this methods, the ruleset can maintain track of that count.
**OBS:** Note that if an attacker can restore your `clef` data to a previous point in time (e.g through a backup), the attacker can reset such windows, even if he/she is unable to decrypt the content.
The `OnApproved` method cannot be responded to, it's purely informative
Sent when clef needs the user to provide data. If 'password' is true, the input field should be treated accordingly (echo-free)
Example:
```json
{
"prompt": "The question to ask the user",
"title": "The title here",
"isPassword": true
}
```
### UserInputResponse
Response to UserInputRequest
Example:
```json
{
"text": "The textual response from user"
}
```
### ListRequest
Sent when a request has been made to list addresses. The UI is provided with the full `account`s, including local directory names. Note: this information is not passed back to the external caller, who only sees the `address`es.
Response to list request. The response contains a list of all addresses to show to the caller. Note: the UI is free to respond with any address the caller, regardless of whether it exists or not
Not all fields are required, though. This method is really just a UX helper, which massages the
input to conform to the `EIP-712` [specification](https://docs.safe.global/core-api/transaction-service-reference/gnosis)
for the Gnosis Safe, and making the output be directly importable to by a relay service.
### 6.0.0
* `New` was changed to deliver only an address, not the full `Account` data
* `Export` was moved from External API to the UI Server API
#### 5.0.0
* The external `account_EcRecover`-method was reimplemented.
* The external method `account_sign(address, data)` was replaced with `account_signData(contentType, address, data)`.
The addition of `contentType` makes it possible to use the method for different types of objects, such as:
* signing data with an intended validator (not yet implemented)
* signing clique headers,
* signing plain personal messages,
* The external method `account_signTypedData` implements [EIP-712](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-712.md) and makes it possible to sign typed data.
#### 4.0.0
* The external `account_Ecrecover`-method was removed.
* The external `account_Import`-method was removed.
#### 3.0.0
* The external `account_List`-method was changed to not expose `url`, which contained info about the local filesystem. It now returns only a list of addresses.
#### 2.0.0
* Commit `73abaf04b1372fa4c43201fb1b8019fe6b0a6f8d`, move `from` into `transaction` object in `signTransaction`. This
makes the `accounts_signTransaction` identical to the old `eth_signTransaction`.
The API uses [semantic versioning](https://semver.org/).
TL;DR: Given a version number MAJOR.MINOR.PATCH, increment the:
* MAJOR version when you make incompatible API changes,
* MINOR version when you add functionality in a backwards-compatible manner, and
* PATCH version when you make backwards-compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
### 7.0.1
Added `clef_New` to the internal API callable from a UI.
> `New` creates a new password protected Account. The private key is protected with
> the given password. Users are responsible to backup the private key that is stored
> in the keystore location that was specified when this API was created.
> This method is the same as New on the external API, the difference being that
> this implementation does not ask for confirmation, since it's initiated by
> the user
### 7.0.0
- The `message` field was renamed to `messages` in all data signing request methods to better reflect that it's a list, not a value.
- The `storage.Put` and `storage.Get` methods in the rule execution engine were lower-cased to `storage.put` and `storage.get` to be consistent with JavaScript call conventions.
### 6.0.0
Removed `password` from responses to operations which require them. This is for two reasons,
- Consistency between how rulesets operate and how manual processing works. A rule can `Approve` but require the actual password to be stored in the clef storage.
With this change, the same stored password can be used even if rulesets are not enabled, but storage is.
- It also removes the usability-shortcut that a UI might otherwise want to implement; remembering passwords. Since we now will not require the
password on every `Approve`, there's no need for the UI to cache it locally.
- In a future update, we'll likely add `clef_storePassword` to the internal API, so the user can store it via his UI (currently only CLI works).
Affected datatypes:
- `SignTxResponse`
- `SignDataResponse`
- `NewAccountResponse`
If `clef` requires a password, the `OnInputRequired` will be used to collect it.
### 5.0.0
Changed the namespace format to adhere to the legacy ethereum format: `name_methodName`. Changes:
* `ApproveTx` -> `ui_approveTx`
* `ApproveSignData` -> `ui_approveSignData`
* `ApproveExport` -> `removed`
* `ApproveImport` -> `removed`
* `ApproveListing` -> `ui_approveListing`
* `ApproveNewAccount` -> `ui_approveNewAccount`
* `ShowError` -> `ui_showError`
* `ShowInfo` -> `ui_showInfo`
* `OnApprovedTx` -> `ui_onApprovedTx`
* `OnSignerStartup` -> `ui_onSignerStartup`
* `OnInputRequired` -> `ui_onInputRequired`
### 4.0.0
* Bidirectional communication implemented, so the UI can query `clef` via the stdin/stdout RPC channel. Methods implemented are:
- `clef_listWallets`
- `clef_listAccounts`
- `clef_listWallets`
- `clef_deriveAccount`
- `clef_importRawKey`
- `clef_openWallet`
- `clef_chainId`
- `clef_setChainId`
- `clef_export`
- `clef_import`
* The type `Account` was modified (the json-field `type` was removed), to consist of
```go
type Account struct {
Address common.Address `json:"address"` // Ethereum account address derived from the key
URL URL `json:"url"` // Optional resource locator within a backend
}
```
### 3.2.0
* Make `ShowError`, `OnApprovedTx`, `OnSignerStartup` be json-rpc [notifications](https://www.jsonrpc.org/specification#notification):
> A Notification is a Request object without an "id" member. A Request object that is a Notification signifies the Client's lack of interest in the corresponding Response object, and as such no Response object needs to be returned to the client. The Server MUST NOT reply to a Notification, including those that are within a batch request.
>
> Notifications are not confirmable by definition, since they do not have a Response object to be returned. As such, the Client would not be aware of any errors (like e.g. "Invalid params","Internal error"
### 3.1.0
* Add `ContentType``string` to `SignDataRequest` to accommodate the latest [EIP-191](https://eips.ethereum.org/EIPS/eip-191) and [EIP-712](https://eips.ethereum.org/EIPS/eip-712) implementations.
### 3.0.0
* Make use of `OnInputRequired(info UserInputRequest)` for obtaining master password during startup
### 2.1.0
* Add `OnInputRequired(info UserInputRequest)` to internal API. This method is used when Clef needs user input, e.g. passwords.
The following structures are used:
```go
UserInputRequest struct {
Prompt string `json:"prompt"`
Title string `json:"title"`
IsPassword bool `json:"isPassword"`
}
UserInputResponse struct {
Text string `json:"text"`
}
```
### 2.0.0
* Modify how `call_info` on a transaction is conveyed. New format:
"\tDo you want to allow listing the following accounts?\n"
"\t-{addrs}\n"
"\n"
"->Auto-answering No\n"
)
meta=req.get("meta",{})
accounts=req.get("accounts",[])
addrs=[x.get("address")forxinaccounts]
sys.stdout.write(
message.format(
addrs="\n\t-".join(addrs),
meta_string=metaString(meta)
)
)
return{}
@public
defonInputRequired(self,req):
"""
Examplerequest:
{"jsonrpc":"2.0","id":1,"method":"ui_onInputRequired","params":[{"title":"Master Password","prompt":"Please enter the password to decrypt the master seed","isPassword":true}]}
The `signer` binary contains a ruleset engine, implemented with [OttoVM](https://github.com/robertkrimen/otto)
It enables use cases like the following:
* I want to auto-approve transactions with contract `CasinoDapp`, with up to `0.05 ether` in value to maximum `1 ether` per 24h period
* I want to auto-approve transaction to contract `EthAlarmClock` with `data`=`0xdeadbeef`, if `value=0`, `gas < 44k` and `gasPrice < 40Gwei`
The two main features that are required for this to work well are:
1. Rule Implementation: how to create, manage, and interpret rules in a flexible but secure manner
2. Credential management and credentials; how to provide auto-unlock without exposing keys unnecessarily.
The section below deals with both of them
## Rule Implementation
A ruleset file is implemented as a `js` file. Under the hood, the ruleset engine is a `SignerUI`, implementing the same methods as the `json-rpc` methods
defined in the UI protocol. Example:
```js
function asBig(str) {
if (str.slice(0, 2) == "0x") {
return new BigNumber(str.slice(2), 16)
}
return new BigNumber(str)
}
// Approve transactions to a certain contract if the value is below a certain limit
function ApproveTx(req) {
var limit = new BigNumber("0xb1a2bc2ec50000")
var value = asBig(req.transaction.value);
if (req.transaction.to.toLowerCase() == "0xae967917c465db8578ca9024c205720b1a3651a9" && value.lt(limit)) {
return "Approve"
}
// If we return "Reject", it will be rejected.
// By not returning anything, it will be passed to the next UI, for manual processing
}
// Approve listings if request made from IPC
function ApproveListing(req){
if (req.metadata.scheme == "ipc"){ return "Approve"}
}
```
Whenever the external API is called (and the ruleset is enabled), the `signer` calls the UI, which is an instance of a ruleset-engine. The ruleset-engine
invokes the corresponding method. In doing so, there are three possible outcomes:
1. JS returns "Approve"
* Auto-approve request
2. JS returns "Reject"
* Auto-reject request
3. Error occurs, or something else is returned
* Pass on to `next` ui: the regular UI channel.
A more advanced example can be found below, "Example 1: ruleset for a rate-limited window", using `storage` to `Put` and `Get``string`s by key.
* At the time of writing, storage only exists as an ephemeral unencrypted implementation, to be used during testing.
### Things to note
The Otto vm has a few [caveats](https://github.com/robertkrimen/otto):
* "use strict" will parse, but does nothing.
* The regular expression engine (re2/regexp) is not fully compatible with the ECMA5 specification.
* Otto targets ES5. ES6 features (eg: Typed Arrays) are not supported.
Additionally, a few more have been added
* The rule execution cannot load external javascript files.
* The only preloaded library is [`bignumber.js`](https://github.com/MikeMcl/bignumber.js) version `2.0.3`. This one is fairly old, and is not aligned with the documentation at the GitHub repository.
* Each invocation is made in a fresh virtual machine. This means that you cannot store data in global variables between invocations. This is a deliberate choice -- if you want to store data, use the disk-backed `storage`, since rules should not rely on ephemeral data.
* Javascript API parameters are _always_ an object. This is also a design choice, to ensure that parameters are accessed by _key_ and not by order. This is to prevent mistakes due to missing parameters or parameter changes.
* The JS engine has access to `storage` and `console`.
#### Security considerations
##### Security of ruleset
Some security precautions can be made, such as:
* Never load `ruleset.js` unless the file is `readonly` (`r-??-??-?`). If the user wishes to modify the ruleset, he must make it writeable and then set back to readonly.
* This is to prevent attacks where files are dropped on the users disk.
* Since we're going to have to have some form of secure storage (not defined in this section), we could also store the `sha3` of the `ruleset.js` file in there.
* If the user wishes to modify the ruleset, he'd then have to perform e.g. `signer --attest /path/to/ruleset --credential <creds>`
##### Security of implementation
The drawback of this very flexible solution is that the `signer` needs to contain a javascript engine. This is pretty simple to implement since it's already
implemented for `geth`. There are no known security vulnerabilities in it, nor have we had any security problems with it so far.
The javascript engine would be an added attack surface; but if the validation of `rulesets` is made good (with hash-based attestation), the actual javascript cannot be considered
an attack surface -- if an attacker can control the ruleset, a much simpler attack would be to implement an "always-approve" rule instead of exploiting the js vm. The only benefit
to be gained from attacking the actual `signer` process from the `js` side would be if it could somehow extract cryptographic keys from memory.
##### Security in usability
Javascript is flexible, but also easy to get wrong, especially when users assume that `js` can handle large integers natively. Typical errors
include trying to multiply `gasCost` with `gas` without using `bigint`:s.
It's unclear whether any other DSL could be more secure; since there's always the possibility of erroneously implementing a rule.
## Credential management
The ability to auto-approve transactions means that the signer needs to have the necessary credentials to decrypt keyfiles. These passwords are hereafter called `ksp` (keystore pass).
### Example implementation
Upon startup of the signer, the signer is given a switch: `--seed <path/to/masterseed>`
The `seed` contains a blob of bytes, which is the master seed for the `signer`.
The `signer` uses the `seed` to:
* Generate the `path` where the settings are stored.
First things first, Clef needs to store some data itself. Since that data might be sensitive (passwords, signing rules, accounts), Clef's entire storage is encrypted. To support encrypting data, the first step is to initialize Clef with a random master seed, itself too encrypted with your chosen password:
```text
$ clef init
WARNING!
Clef is an account management tool. It may, like any software, contain bugs.
Please take care to
- backup your keystore files,
- verify that the keystore(s) can be opened with your password.
Clef is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE. See the GNU General Public License for more details.
Enter 'ok' to proceed:
> ok
The master seed of clef will be locked with a password.
Please specify a password. Do not forget this password!
Password:
Repeat password:
A master seed has been generated into /home/martin/.clef/masterseed.json
This is required to be able to store credentials, such as:
* Passwords for keystores (used by rule engine)
* Storage for JavaScript auto-signing rules
* Hash of JavaScript rule-file
You should treat 'masterseed.json' with utmost secrecy and make a backup of it!
* The password is necessary but not enough, you need to back up the master seed too!
* The master seed does not contain your accounts, those need to be backed up separately!
```
*For readability purposes, we'll remove the WARNING printout, user confirmation and the unlocking of the master seed in the rest of this document.*
## Remote interactions
Clef is capable of managing both key-file based accounts as well as hardware wallets. To evaluate clef, we're going to point it to our Rinkeby testnet keystore and specify the Rinkeby chain ID for signing (Clef doesn't have a backing chain, so it doesn't know what network it runs on).
INFO [07-01|11:00:46.392] IPC endpoint opened url=$HOME/.clef/clef.ipc
------- Signer info -------
* intapi_version : 7.0.0
* extapi_version : 6.0.0
* extapi_http : n/a
* extapi_ipc : $HOME/.clef/clef.ipc
```
By default, Clef starts up in CLI (Command Line Interface) mode. Arbitrary remote processes may *request* account interactions (e.g. sign a transaction), which the user will need to individually *confirm*.
To test this out, we can *request* Clef to list all account via its *External API endpoint*:
Apart from listing accounts, you can also *request* creating a new account; signing transactions and data; and recovering signatures. You can find the available methods in the Clef [External API Spec](https://github.com/ethereum/go-ethereum/tree/master/cmd/clef#external-api-1) and the [External API Changelog](https://github.com/ethereum/go-ethereum/blob/master/cmd/clef/extapi_changelog.md).
*Note, the number of things you can do from the External API is deliberately small, since we want to limit the power of remote calls by as much as possible! Clef has an [Internal API](https://github.com/ethereum/go-ethereum/tree/master/cmd/clef#ui-api-1) too for the UI (User Interface) which is much richer and can support custom interfaces on top. But that's out of scope here.*
## Automatic rules
For most users, manually confirming every transaction is the way to go. However, there are cases when it makes sense to set up some rules which permit Clef to sign a transaction without prompting the user. One such example would be running a signer on Rinkeby or other PoA networks.
For starters, we can create a rule file that automatically permits anyone to list our available accounts without user confirmation. The rule file is a tiny JavaScript snippet that you can program however you want:
```js
function ApproveListing() {
return "Approve"
}
```
Of course, Clef isn't going to just accept and run arbitrary scripts you give it, that would be dangerous if someone changes your rule file! Instead, you need to explicitly *attest* the rule file, which entails injecting its hash into Clef's secure store.
In `$HOME/.clef`, the `masterseed.json` file was created, containing the master seed. This seed was then used to derive a few other things:
- **Vault location**: in this case `02f90c0603f4f2f60188`.
- If you use a different master seed, a different vault location will be used that does not conflict with each other (e.g. `clef --signersecret /path/to/file`). This allows you to run multiple instances of Clef, each with its own rules (e.g. mainnet + testnet).
- **`config.json`**: the encrypted key/value storage for configuration data, currently only containing the key `ruleset_sha256`, the attested hash of the automatic rules to use.
## Advanced rules
In order to make more useful rules - like signing transactions - the signer needs access to the passwords needed to unlock keys from the keystore. You can inject an unlock password via `clef setpw`.
Please enter a password to store for this address:
Password:
Repeat password:
Decrypt master seed of clef
Password:
INFO [07-01|14:05:56.031] Credential store updated key=0xd9c9cd5f6779558b6e0ed4e6acf6b1947e7fa1f3
```
Now let's update the rules to make use of the new credentials:
```js
function ApproveListing() {
return "Approve"
}
function ApproveSignData(req) {
if (req.address.toLowerCase() == "0xd9c9cd5f6779558b6e0ed4e6acf6b1947e7fa1f3") {
if (req.messages[0].value.indexOf("bazonk") >= 0) {
return "Approve"
}
return "Reject"
}
// Otherwise goes to manual processing
}
```
In this example:
- Any requests to sign data with the account `0xd9c9...` will be:
- Auto-approved if the message contains `bazonk`,
- Auto-rejected if the message does not contain `bazonk`,
- Any other requests will be passed along for manual confirmation.
*Note, to make this example work, please use you own accounts. You can create a new account either via Clef or the traditional account CLI tools. If the latter was chosen, make sure both Clef and Geth use the same keystore by specifying `--keystore path/to/your/keystore` when running Clef.*
Attest the new rule file so that Clef will accept loading it:
For more details on writing automatic rules, please see the [rules spec](https://github.com/ethereum/go-ethereum/blob/master/cmd/clef/rules.md).
## Geth integration
Of course, as awesome as Clef is, it's not feasible to interact with it via JSON RPC by hand. Long term, we're hoping to convince the general Ethereum community to support Clef as a general signer (it's only 3-5 methods), thus allowing your favorite DApp, Metamask, MyCrypto, etc to request signatures directly.
Until then however, we're trying to pave the way via Geth. Geth v1.9.0 has built in support via `--signer <API endpoint>` for using a local or remote Clef instance as an account backend!
We can try this by running Clef with our previous rules on Rinkeby (for now it's a good idea to allow auto-listing accounts, since Geth likes to retrieve them once in a while).
Additional HTTP header data, provided by the external caller:
User-Agent:
Origin:
-------------------------------------------
Approve? [y/N]:
> y
```
:boom:
*Note, if you enable the external signer backend in Geth, all other account management is disabled. This is because long term we want to remove account management from Geth.*