go-ethereum

mirror of https://github.com/ethereum/go-ethereum.git synced 2026-06-06 06:58:39 +00:00

Author	SHA1	Message	Date
Jonathan Oppenheimer	bc1967f088	core/state/snapshot: snapshot generation shutdown race condition (#33540) ## Overview This PR fixes a race condition during blockchain shutdown where snapshot generation could continue accessing the trie database after it has been closed, leading to iterator errors. We noticed this in one of our nodes on https://github.com/ava-labs/avalanchego, which relies on an older version of geth with the same issue (so this behavior does happen!). During node shutdown, the following sequence occurs: 1. `BlockChain.Stop()` calls `snaps.Release()` to clean up snapshot resources 2. `Release()` only resets the cache but doesn't stop the generator goroutine 3. The trie database is then closed via `triedb.Close()` 4. The still-running generator attempts to iterate storage tries 5. Iterator fails because the database is closed (`"Generator failed to iterate storage trie"`) ## Problem There are three related bugs: 1. `Release()` doesn't stop generation: The `diskLayer.Release()` method only resets the cache without stopping ongoing snapshot generation, leaving the generator goroutine running after database closure. 2. `stopGeneration()` has an incorrect completion check: The `stopGeneration()` method checks `genMarker != nil` to determine if generation is running. However, `genMarker` is set to nil when generation completes successfully, even though the generator goroutine is still waiting for the abort signal at the end of `generate()`. See line 705 in `generate.go`: `eaaa5b716d/core/state/snapshot/generate.go (L699-L707)` This means `stopGeneration()` returns early without sending the abort signal. 3. Node shutdown doesn't stop generation: During shutdown, no code path calls `stopGeneration()` or sends the abort signal to the generator, causing the generator to access a closed database and error.
## Fix - Modified `diskLayer.Release()` to call `stopGeneration()` before releasing resources - Added cancelation architecture, removing reliance on someone having to wait - Fixed `stopGeneration()` to properly and safely stop snapshot generation - Added `TestGenerateGoroutineLeak` to verify the fix and prevent regression. The test fails without the fix and passes with it. - The test creates a snapshot with active generation, waits for completion, then calls `Release()`, and uses `go.uber.org/goleak` to assert no generator goroutine survives. - Without the fix, the test fails: `Release()` returns without stopping the generator, which stays parked at `generate.go:705` waiting for an abort signal that never comes: ``` --- FAIL: TestGenerateGoroutineLeak (0.88s) generate_test.go: found unexpected goroutines: [Goroutine 6 in state chan receive, with core/state/snapshot.(diskLayer).generate on top of the stack: core/state/snapshot.(diskLayer).generate(...) core/state/snapshot/generate.go:705 created by core/state/snapshot.generateSnapshot core/state/snapshot/generate.go:79 ] ``` - With the fix, the test passes: `Release()` -> `stopGeneration()` blocks until the generator goroutine has fully exited, so nothing leaks Note that this fix follows the same pattern used in `Tree.Disable()` in https://github.com/ethereum/go-ethereum/pull/30040, which introduced `stopGeneration()` for use in `Disable()` and `Rebuild()` but didn't address the shutdown path. The test follows the same pattern used in `TestCheckSimBackendGoroutineLeak`	2026-06-04 21:22:58 -05:00
rjl493456442	6485d5e3ff	core, triedb: remove destruct flag in state snapshot (#30752) This pull request removes the destruct flag from the state snapshot to simplify the code. Previously, this flag indicated that an account was removed during a state transition, making all associated storage slots inaccessible. Because storage deletion can involve a large number of slots, the actual deletion is deferred until the end of the process, where it is handled in batches. With the deprecation of self-destruct in the Cancun fork, storage deletions are no longer expected. Historically, the largest storage deletion event in Ethereum was around 15 megabytes, which is manageable in memory. In this pull request, the single destruct flag is replaced by a set of deletion markers for individual storage slots. Each deleted storage slot will now appear in the Storage set with a nil value. This change simplifies much of the logic, such as storage access, storage flushing, and storage iteration.	2024-11-22 16:55:43 +08:00
rjl493456442	d71831255d	core/state/snapshot: port changes from 29995 (#30040) #29995 has been reverted due to an unexpected flaw in the state snapshot process. Specifically, it attempts to stop the state snapshot generation, which could potentially cause the system to halt if the generation is not currently running. This pull request ports the changes made in #29995 and fixes the flaw.	2024-09-06 18:02:34 +03:00
rjl493456442	fe91d476ba	all: remove the dependency from trie to triedb (#28824) This change removes the dependency from trie package to triedb package.	2024-02-13 14:49:53 +01:00
Martin Holst Swende	c1d5a012ea	core/state, tests: fix memory leak via fastcache (#28387) This change fixes a memory leak: when running either state-tests or blockchain-tests, we allocate a `1MB` fastcache during snapshot generation. `fastcache` is a bit special and requires a `Reset()` (it has its own memory allocator). The `1MB` was hidden [here](https://github.com/ethereum/go-ethereum/blob/master/tests/state_test_util.go#L333) and [here](https://github.com/ethereum/go-ethereum/blob/master/tests/block_test_util.go#L146) respectively.	2023-10-20 13:35:49 +02:00
rjl493456442	0e5d2c7c53	core/state/snapshot, core/types, eth: move account definition to type (#27323) * core/state/snapshot, core/types, eth: move account definition to type * core, eth: revert snapshot Account API change	2023-06-06 11:17:39 +03:00
Melvin Junhee Woo	d2e1b17f18	snapshot, trie: fixed typos, mostly in snapshot pkg (#22133)	2021-01-07 08:36:21 +02:00
Péter Szilágyi	a4cf279494	core/state: extend snapshotter to handle account resurrections	2020-03-03 15:52:00 +02:00
Péter Szilágyi	6e05ccd845	core/state/snapshot, tests: sync snap gen + snaps in consensus tests	2020-03-03 09:17:13 +02:00
Péter Szilágyi	6ddb92a089	core/state/snapshot: full featured account iteration	2020-02-25 12:51:14 +02:00
Péter Szilágyi	22c494d399	core/state/snapshot: bloom, metrics and prefetcher fixes	2020-02-25 12:51:11 +02:00
Péter Szilágyi	351a5903b0	core/rawdb, core/state/snapshot: runtime snapshot generation	2020-02-25 12:51:08 +02:00
Martin Holst Swende	f300c0df01	core/state/snapshot: replace bigcache with fastcache	2020-02-25 12:51:08 +02:00
Péter Szilágyi	d754091a87	core/state/snapshot: unlink snapshots from blocks, quad->linear cleanup	2020-02-25 12:51:07 +02:00
Péter Szilágyi	d7d81d7c12	core/state/snapshot: extract and split cap method, cover corners	2020-02-25 12:51:05 +02:00
Martin Holst Swende	e146fbe4e7	core/state: lazy sorting, snapshot invalidation	2020-02-25 12:51:05 +02:00
Péter Szilágyi	542df8898e	core: initial version of state snapshots	2020-02-25 12:51:04 +02:00

17 commits
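
The shutdown race described in bc1967f088 can be illustrated with a minimal Go sketch. This is not the go-ethereum implementation: the `generator` and `layer` types and the `startGenerator` helper are hypothetical names chosen for illustration. It shows one possible cancelation pattern in which stopping does not depend on a progress marker (such as `genMarker`) and in which `Release()` blocks until the background goroutine has exited, so nothing can touch the database after it is closed.

```go
package main

import "sync"

// generator is a hypothetical stand-in for background snapshot generation.
// It is NOT go-ethereum's actual type; it only illustrates the pattern.
type generator struct {
	cancel   chan struct{} // closed to ask the goroutine to stop
	done     chan struct{} // closed by the goroutine when it has exited
	stopOnce sync.Once     // makes stop() safe to call more than once
}

// startGenerator runs work in a goroutine. The goroutine exits as soon as
// work returns, so nobody has to send it an abort signal after completion.
func startGenerator(work func(cancel <-chan struct{})) *generator {
	g := &generator{
		cancel: make(chan struct{}),
		done:   make(chan struct{}),
	}
	go func() {
		defer close(g.done)
		work(g.cancel)
	}()
	return g
}

// stop requests cancelation and waits for the goroutine to exit. It does not
// consult any "is generation finished" marker, so it behaves the same whether
// the work is still running or has already completed.
func (g *generator) stop() {
	g.stopOnce.Do(func() { close(g.cancel) })
	<-g.done
}

// layer is a hypothetical owner of a generator, loosely analogous to a disk
// layer that must stop generation before its resources are released.
type layer struct {
	gen      *generator
	released bool
}

// Release stops any generator first, then marks resources as released.
func (l *layer) Release() {
	if l.gen != nil {
		l.gen.stop()
	}
	l.released = true
}
```

Closing a channel rather than sending on it is a deliberate choice in this sketch: a close never blocks, so `stop()` cannot hang when the worker has already finished, which is the failure mode #30040 describes for the reverted #29995.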