Commit graph

17 commits

Author SHA1 Message Date
Jonathan Oppenheimer
bc1967f088
core/state/snapshot: snapshot generation shutdown race condition (#33540)
## Overview

This PR fixes a race condition during blockchain shutdown where snapshot
generation could continue accessing the trie database after it has been
closed, leading to iterator errors. We noticed this in one of our nodes
on https://github.com/ava-labs/avalanchego, which relies on an older
version of geth with the same issue (so this behavior does happen!).

During node shutdown, the following sequence occurs:

1. `BlockChain.Stop()` calls `snaps.Release()` to clean up snapshot
resources
2. `Release()` only resets the cache but doesn't stop the generator
goroutine
3. The trie database is then closed via `triedb.Close()`
4. The still-running generator attempts to iterate storage tries
5. Iterator fails because the database is closed (`"Generator failed to
iterate storage trie"`)

## Problem

There are three related bugs:

1. `Release()` doesn't stop generation: The `diskLayer.Release()` method
only resets the cache without stopping ongoing snapshot generation,
leaving the generator goroutine running after database closure.
2. `stopGeneration()` has an incorrect completion check: The
`stopGeneration()` method checks `genMarker != nil` to determine if
generation is running. However, `genMarker` is set to nil when
generation completes successfully, even though the generator goroutine
is still waiting for the abort signal at the end of `generate()`. See
line 705 in `generate.go`:
eaaa5b716d/core/state/snapshot/generate.go (L699-L707)
This means `stopGeneration()` returns early without sending the abort
signal.
3. Node shutdown doesn't stop generation: During shutdown, no code path
calls `stopGeneration()` or sends the abort signal to the generator,
causing the generator to access a closed database and error.

## Fix

- Modified `diskLayer.Release()` to call `stopGeneration()` before
releasing resources
- Added cancelation architecture, removing reliance on someone having to
wait
- Fixed `stopGeneration()` to properly and safely stop snapshot
generation
- Added `TestGenerateGoroutineLeak` to verify the fix and prevent
regression. The test fails without the fix and passes with it.
- The test creates a snapshot with active generation, waits for
completion, then calls `Release()`, and uses `go.uber.org/goleak` to
assert no generator goroutine survives.
- Without the fix, the test fails: `Release()` returns without stopping
the generator, which stays parked at `generate.go:705` waiting for an
abort signal that never comes:

    ```
    --- FAIL: TestGenerateGoroutineLeak (0.88s)
        generate_test.go: found unexpected goroutines:
        [Goroutine 6 in state chan receive, with
         core/state/snapshot.(*diskLayer).generate on top of the stack:
         core/state/snapshot.(*diskLayer).generate(...)
            core/state/snapshot/generate.go:705
         created by core/state/snapshot.generateSnapshot
            core/state/snapshot/generate.go:79 ]
    ```
- With the fix, the test passes: `Release()` -> `stopGeneration()`
blocks until the generator goroutine has fully exited, so nothing leaks

Note that this fix follows the same pattern used in `Tree.Disable()` in
https://github.com/ethereum/go-ethereum/pull/30040, which introduced
`stopGeneration()` for use in `Disable()` and `Rebuild()` but didn't
address the shutdown path.

The test follows the same pattern used in
`TestCheckSimBackendGoroutineLeak`
2026-06-04 21:22:58 -05:00
rjl493456442
6485d5e3ff
core, triedb: remove destruct flag in state snapshot (#30752)
This pull request removes the destruct flag from the state snapshot to
simplify the code.

Previously, this flag indicated that an account was removed during a
state transition, making all associated storage slots inaccessible.
Because storage deletion can involve a large number of slots, the actual
deletion is deferred until the end of the process, where it is handled
in batches.

With the deprecation of self-destruct in the Cancun fork, storage
deletions are no longer expected. Historically, the largest storage
deletion event in Ethereum was around 15 megabytes—manageable in memory.

In this pull request, the single destruct flag is replaced by a set of
deletion markers for individual storage slots. Each deleted storage slot
will now appear in the Storage set with a nil value.

This change will simplify a lot logics, such as storage accessing,
storage flushing, storage iteration and so on.
2024-11-22 16:55:43 +08:00
rjl493456442
d71831255d
core/state/snapshot: port changes from 29995 (#30040)
#29995 has been reverted due to an unexpected flaw in the state snapshot
process.

Specifically, it attempts to stop the state snapshot generation, which
could potentially
cause the system to halt if the generation is not currently running.

This pull request ports the changes made in #29995 and fixes the flaw.
2024-09-06 18:02:34 +03:00
rjl493456442
fe91d476ba
all: remove the dependency from trie to triedb (#28824)
This change removes the dependency from trie package to triedb package.
2024-02-13 14:49:53 +01:00
Martin Holst Swende
c1d5a012ea
core/state, tests: fix memory leak via fastcache (#28387)
This change fixes a memory leak, when running either state-tests or blockchain-tests, we allocate a `1MB` fastcache during snapshot generation. `fastcache` is a bit special, and requires a `Reset()` (it has it's own memory allocator). 

The `1MB` was hidden [here](https://github.com/ethereum/go-ethereum/blob/master/tests/state_test_util.go#L333) and [here](https://github.com/ethereum/go-ethereum/blob/master/tests/block_test_util.go#L146) respectively.
2023-10-20 13:35:49 +02:00
rjl493456442
0e5d2c7c53
core/state/snapshot, core/types, eth: move account definition to type (#27323)
* core/state/snapshot, core/types, eth: move account definition to type

* core, eth: revert snapshot Account API change
2023-06-06 11:17:39 +03:00
Melvin Junhee Woo
d2e1b17f18
snapshot, trie: fixed typos, mostly in snapshot pkg (#22133) 2021-01-07 08:36:21 +02:00
Péter Szilágyi
a4cf279494
core/state: extend snapshotter to handle account resurrections 2020-03-03 15:52:00 +02:00
Péter Szilágyi
6e05ccd845
core/state/snapshot, tests: sync snap gen + snaps in consensus tests 2020-03-03 09:17:13 +02:00
Péter Szilágyi
6ddb92a089
core/state/snapshot: full featured account iteration 2020-02-25 12:51:14 +02:00
Péter Szilágyi
22c494d399
core/state/snapshot: bloom, metrics and prefetcher fixes 2020-02-25 12:51:11 +02:00
Péter Szilágyi
351a5903b0
core/rawdb, core/state/snapshot: runtime snapshot generation 2020-02-25 12:51:08 +02:00
Martin Holst Swende
f300c0df01
core/state/snapshot: replace bigcache with fastcache 2020-02-25 12:51:08 +02:00
Péter Szilágyi
d754091a87
core/state/snapshot: unlink snapshots from blocks, quad->linear cleanup 2020-02-25 12:51:07 +02:00
Péter Szilágyi
d7d81d7c12
core/state/snapshot: extract and split cap method, cover corners 2020-02-25 12:51:05 +02:00
Martin Holst Swende
e146fbe4e7
core/state: lazy sorting, snapshot invalidation 2020-02-25 12:51:05 +02:00
Péter Szilágyi
542df8898e
core: initial version of state snapshots 2020-02-25 12:51:04 +02:00