## Overview
This PR fixes a race condition during blockchain shutdown where snapshot
generation could continue accessing the trie database after it has been
closed, leading to iterator errors. We noticed this in one of our nodes
on https://github.com/ava-labs/avalanchego, which relies on an older
version of geth with the same issue (so this behavior does happen!).
During node shutdown, the following sequence occurs:
1. `BlockChain.Stop()` calls `snaps.Release()` to clean up snapshot
resources
2. `Release()` only resets the cache but doesn't stop the generator
goroutine
3. The trie database is then closed via `triedb.Close()`
4. The still-running generator attempts to iterate storage tries
5. Iterator fails because the database is closed (`"Generator failed to
iterate storage trie"`)
## Problem
There are three related bugs:
1. `Release()` doesn't stop generation: The `diskLayer.Release()` method
only resets the cache without stopping ongoing snapshot generation,
leaving the generator goroutine running after database closure.
2. `stopGeneration()` has an incorrect completion check: The
`stopGeneration()` method checks `genMarker != nil` to determine if
generation is running. However, `genMarker` is set to nil when
generation completes successfully, even though the generator goroutine
is still waiting for the abort signal at the end of `generate()`. See
line 705 in `generate.go`:
eaaa5b716d/core/state/snapshot/generate.go (L699-L707)
This means `stopGeneration()` returns early without sending the abort
signal.
3. Node shutdown doesn't stop generation: During shutdown, no code path
calls `stopGeneration()` or sends the abort signal to the generator,
causing the generator to access a closed database and error.
## Fix
- Modified `diskLayer.Release()` to call `stopGeneration()` before
releasing resources
- Added cancelation architecture, removing reliance on someone having to
wait
- Fixed `stopGeneration()` to properly and safely stop snapshot
generation
- Added `TestGenerateGoroutineLeak` to verify the fix and prevent
regression. The test fails without the fix and passes with it.
- The test creates a snapshot with active generation, waits for
completion, then calls `Release()`, and uses `go.uber.org/goleak` to
assert no generator goroutine survives.
- Without the fix, the test fails: `Release()` returns without stopping
the generator, which stays parked at `generate.go:705` waiting for an
abort signal that never comes:
```
--- FAIL: TestGenerateGoroutineLeak (0.88s)
generate_test.go: found unexpected goroutines:
[Goroutine 6 in state chan receive, with
core/state/snapshot.(*diskLayer).generate on top of the stack:
core/state/snapshot.(*diskLayer).generate(...)
core/state/snapshot/generate.go:705
created by core/state/snapshot.generateSnapshot
core/state/snapshot/generate.go:79 ]
```
- With the fix, the test passes: `Release()` -> `stopGeneration()`
blocks until the generator goroutine has fully exited, so nothing leaks
Note that this fix follows the same pattern used in `Tree.Disable()` in
https://github.com/ethereum/go-ethereum/pull/30040, which introduced
`stopGeneration()` for use in `Disable()` and `Rebuild()` but didn't
address the shutdown path.
The test follows the same pattern used in
`TestCheckSimBackendGoroutineLeak`
The upstream libray has removed the assembly-based implementation of
keccak. We need to maintain our own library to avoid a peformance
regression.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
Previously, PathDB used a single buffer to aggregate database writes,
which needed to be flushed atomically. However, flushing large amounts
of data (e.g., 256MB) caused significant overhead, often blocking the
system for around 3 seconds during the flush.
To mitigate this overhead and reduce performance spikes, a double-buffer
mechanism is introduced. When the active buffer fills up, it is marked
as frozen and a background flushing process is triggered. Meanwhile, a
new buffer is allocated for incoming writes, allowing operations to
continue uninterrupted.
This approach reduces system blocking times and provides flexibility in
adjusting buffer parameters for improved performance.
In this pull request, snapshot generation in pathdb has been ported from
the legacy state snapshot implementation. Additionally, when running in
path mode, legacy state snapshot data is now managed by the pathdb
based snapshot logic.
Note: Existing snapshot data will be re-generated, regardless of whether
it was previously fully constructed.
This pull request removes the destruct flag from the state snapshot to
simplify the code.
Previously, this flag indicated that an account was removed during a
state transition, making all associated storage slots inaccessible.
Because storage deletion can involve a large number of slots, the actual
deletion is deferred until the end of the process, where it is handled
in batches.
With the deprecation of self-destruct in the Cancun fork, storage
deletions are no longer expected. Historically, the largest storage
deletion event in Ethereum was around 15 megabytes—manageable in memory.
In this pull request, the single destruct flag is replaced by a set of
deletion markers for individual storage slots. Each deleted storage slot
will now appear in the Storage set with a nil value.
This change will simplify a lot logics, such as storage accessing,
storage flushing, storage iteration and so on.
This change makes use of uin256 to represent balance in state. It touches primarily upon statedb, stateobject and state processing, trying to avoid changes in transaction pools, core types, rpc and tracers.
This PR replaces Geth's logger package (a fork of [log15](https://github.com/inconshreveable/log15)) with an implementation using slog, a logging library included as part of the Go standard library as of Go1.21.
Main changes are as follows:
* removes any log handlers that were unused in the Geth codebase.
* Json, logfmt, and terminal formatters are now slog handlers.
* Verbosity level constants are changed to match slog constant values. Internal translation is done to make this opaque to the user and backwards compatible with existing `--verbosity` and `--vmodule` options.
* `--log.backtraceat` and `--log.debug` are removed.
The external-facing API is largely the same as the existing Geth logger. Logger method signatures remain unchanged.
A small semantic difference is that a `Handler` can only be set once per `Logger` and not changed dynamically. This just means that a new logger must be instantiated every time the handler of the root logger is changed.
----
For users of the `go-ethereum/log` module. If you were using this module for your own project, you will need to change the initialization. If you previously did
```golang
log.Root().SetHandler(log.LvlFilterHandler(log.LvlInfo, log.StreamHandler(os.Stderr, log.TerminalFormat(true))))
```
You now instead need to do
```golang
log.SetDefault(log.NewLogger(log.NewTerminalHandlerWithLevel(os.Stderr, log.LevelInfo, true)))
```
See more about reasoning here: https://github.com/ethereum/go-ethereum/issues/28558#issuecomment-1820606613
This change makes the StateDB track the state key value diff of a block transition.
We already tracked current account and storage values for the purpose of updating
the state snapshot. With this PR, we now also track the original (pre-transition) values
of accounts and storage slots.
In this PR, all TryXXX(e.g. TryGet) APIs of trie are renamed to XXX(e.g. Get) with an error returned.
The original XXX(e.g. Get) APIs are renamed to MustXXX(e.g. MustGet) and does not return any error -- they print a log output. A future PR will change the behaviour to panic on errorrs.
The EmptyRootHash and EmptyCodeHash are defined everywhere in the codebase, this PR replaces all of them with unified one defined in core/types package, and also defines constants for TxRoot, WithdrawalsRoot and UncleRoot
This change ports some changes from the main PBSS PR:
- get rid of callback function in `trie.Database.Commit` which is not required anymore
- rework the `nodeResolver` in `trie.Iterator` to make it compatible with multiple state scheme
- some other shallow changes in tests and typo-fixes
This PR moves some trie-related db accessor methods to a different file, and also removes the schema type. Instead of the schema type, a string is used to distinguish between hashbased/pathbased db accessors.
This also moves some code from trie package to rawdb package.
This PR is intended to be a no-functionality-change prep PR for #25963 .
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This PR introduces a node scheme abstraction. The interface is only implemented by `hashScheme` at the moment, but will be extended by `pathScheme` very soon.
Apart from that, a few changes are also included which is worth mentioning:
- port the changes in the stacktrie, tracking the path prefix of nodes during commit
- use ethdb.Database for constructing trie.Database. This is not necessary right now, but it is required for path-based used to open reverse diff freezer
This changes the CI / release builds to use the latest Go version. It also
upgrades golangci-lint to a newer version compatible with Go 1.19.
In Go 1.19, godoc has gained official support for links and lists. The
syntax for code blocks in doc comments has changed and now requires a
leading tab character. gofmt adapts comments to the new syntax
automatically, so there are a lot of comment re-formatting changes in this
PR. We need to apply the new format in order to pass the CI lint stage with
Go 1.19.
With the linter upgrade, I have decided to disable 'gosec' - it produces
too many false-positive warnings. The 'deadcode' and 'varcheck' linters
have also been removed because golangci-lint warns about them being
unmaintained. 'unused' provides similar coverage and we already have it
enabled, so we don't lose much with this change.
* core: use TryGetAccount to read where TryUpdateAccount has been used to write
* Gary's review feedback
* implement Gary's suggestion
* fix bug + rename NewSecure into NewStateTrie
* trie: add backwards-compatibility aliases for SecureTrie
* Update database.go
* make the linter happy
Co-authored-by: Felix Lange <fjl@twurst.com>
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
* core/state/snapshot: exit Geth if generator hits missing trie nodes
* core/state/snapshot: error instead of hard die on generator fault
* core/state/snapshot: don't enable logging on the tests