go-ethereum/core
Jonathan Oppenheimer bc1967f088
core/state/snapshot: snapshot generation shutdown race condition (#33540)
## Overview

This PR fixes a race condition during blockchain shutdown where snapshot
generation could continue accessing the trie database after it has been
closed, leading to iterator errors. We noticed this in one of our nodes
on https://github.com/ava-labs/avalanchego, which relies on an older
version of geth with the same issue (so this behavior does happen!).

During node shutdown, the following sequence occurs:

1. `BlockChain.Stop()` calls `snaps.Release()` to clean up snapshot
resources
2. `Release()` only resets the cache but doesn't stop the generator
goroutine
3. The trie database is then closed via `triedb.Close()`
4. The still-running generator attempts to iterate storage tries
5. The iterator fails because the database is closed (`"Generator failed to
iterate storage trie"`)
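
The sequence above can be replayed in a small, self-contained model. This is only a sketch: `fakeDB`, `runShutdown`, and the step-driven generator loop are hypothetical stand-ins, not geth's `triedb` or snapshot generator. It shows why closing the database before stopping the generator produces an iteration error, and why stopping first avoids it.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// fakeDB is a hypothetical stand-in for the trie database.
type fakeDB struct{ closed atomic.Bool }

func (db *fakeDB) Close() { db.closed.Store(true) }

// iterate fails once the database has been closed.
func (db *fakeDB) iterate() error {
	if db.closed.Load() {
		return errors.New("database closed")
	}
	return nil
}

// runShutdown replays the shutdown sequence. With stopFirst=false the
// database is closed while the generator is still running; with
// stopFirst=true the generator is aborted and awaited before the close.
func runShutdown(stopFirst bool) error {
	db := &fakeDB{}
	step := make(chan struct{}) // drives one generator iteration at a time
	results := make(chan error) // outcome of each iteration
	abort := make(chan struct{})
	exited := make(chan struct{})

	go func() {
		defer close(exited)
		for {
			select {
			case <-abort:
				return
			case <-step:
				err := db.iterate()
				results <- err
				if err != nil {
					return
				}
			}
		}
	}()

	// One iteration while the database is still open.
	step <- struct{}{}
	if err := <-results; err != nil {
		return err
	}
	if stopFirst {
		close(abort)
		<-exited // wait until the generator goroutine has returned
		db.Close()
		return nil
	}
	// Steps 3-5: close the database, then let the generator iterate again.
	db.Close()
	step <- struct{}{}
	return <-results
}

func main() {
	fmt.Println("close without stopping:", runShutdown(false))
	fmt.Println("stop, then close:", runShutdown(true))
}
```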

## Problem

There are three related bugs:

1. `Release()` doesn't stop generation: The `diskLayer.Release()` method
only resets the cache without stopping ongoing snapshot generation,
leaving the generator goroutine running after database closure.
2. `stopGeneration()` has an incorrect completion check: The
`stopGeneration()` method checks `genMarker != nil` to determine if
generation is running. However, `genMarker` is set to nil when
generation completes successfully, even though the generator goroutine
is still waiting for the abort signal at the end of `generate()`. See
line 705 in `generate.go`:
eaaa5b716d/core/state/snapshot/generate.go (L699-L707)
This means `stopGeneration()` returns early without sending the abort
signal.
3. Node shutdown doesn't stop generation: During shutdown, no code path
calls `stopGeneration()` or sends the abort signal to the generator,
causing the generator to access a closed database and error.
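
The second bug is easy to miss, so here is a simplified model of it. The type `genLayer` and the methods `stopBuggy` and `stopAlways` are hypothetical and only mirror the shape described above (a marker that becomes nil on completion, and a goroutine parked on an abort channel); they are not the code in `generate.go`. `stopAlways` is one possible correction for illustration, not necessarily what this PR does.

```go
package main

import (
	"fmt"
	"sync"
)

// genLayer is a hypothetical, stripped-down model of a disk layer.
type genLayer struct {
	mu        sync.RWMutex
	genMarker []byte             // nil once generation has completed
	genAbort  chan chan struct{} // the generator waits here before exiting
	exited    chan struct{}      // closed when the generator goroutine returns
}

// newGenLayer starts a generator and returns once it has "completed",
// i.e. once genMarker is nil and the goroutine is about to park.
func newGenLayer() *genLayer {
	l := &genLayer{
		genMarker: []byte{},
		genAbort:  make(chan chan struct{}),
		exited:    make(chan struct{}),
	}
	completed := make(chan struct{})
	go l.generate(completed)
	<-completed
	return l
}

func (l *genLayer) generate(completed chan struct{}) {
	defer close(l.exited)
	l.mu.Lock()
	l.genMarker = nil // generation finished successfully
	l.mu.Unlock()
	close(completed)

	ack := <-l.genAbort // still parked, waiting for an abort signal
	close(ack)
}

// stopBuggy treats "marker is nil" as "nothing is running" and returns
// early, so the parked goroutine never receives its abort signal.
func (l *genLayer) stopBuggy() {
	l.mu.RLock()
	generating := l.genMarker != nil
	l.mu.RUnlock()
	if !generating {
		return
	}
	ack := make(chan struct{})
	l.genAbort <- ack
	<-ack
}

// stopAlways sends the abort regardless of the marker and waits for the
// goroutine to return. It must be called at most once in this sketch.
func (l *genLayer) stopAlways() {
	ack := make(chan struct{})
	l.genAbort <- ack
	<-ack
	<-l.exited
}

// exitedNow reports whether the generator goroutine has returned.
func exitedNow(l *genLayer) bool {
	select {
	case <-l.exited:
		return true
	default:
		return false
	}
}

func main() {
	l := newGenLayer()
	l.stopBuggy()
	fmt.Println("exited after stopBuggy:", exitedNow(l))
	l.stopAlways()
	fmt.Println("exited after stopAlways:", exitedNow(l))
}
```

In this model the goroutine cannot get past the receive on `genAbort` unless someone sends on it, which is why the early return leaves it parked.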

## Fix

- Modified `diskLayer.Release()` to call `stopGeneration()` before
releasing resources
- Added a cancellation mechanism, removing the reliance on someone having
to wait
- Fixed `stopGeneration()` to properly and safely stop snapshot
generation
- Added `TestGenerateGoroutineLeak` to verify the fix and prevent
regression. The test fails without the fix and passes with it.
  - The test creates a snapshot with active generation, waits for
    completion, then calls `Release()`, and uses `go.uber.org/goleak` to
    assert no generator goroutine survives.
  - Without the fix, the test fails: `Release()` returns without stopping
    the generator, which stays parked at `generate.go:705` waiting for an
    abort signal that never comes:

    ```
    --- FAIL: TestGenerateGoroutineLeak (0.88s)
        generate_test.go: found unexpected goroutines:
        [Goroutine 6 in state chan receive, with
         core/state/snapshot.(*diskLayer).generate on top of the stack:
         core/state/snapshot.(*diskLayer).generate(...)
            core/state/snapshot/generate.go:705
         created by core/state/snapshot.generateSnapshot
            core/state/snapshot/generate.go:79 ]
    ```
  - With the fix, the test passes: `Release()` -> `stopGeneration()`
    blocks until the generator goroutine has fully exited, so nothing leaks.
Note that this fix follows the same pattern used in `Tree.Disable()` in
https://github.com/ethereum/go-ethereum/pull/30040, which introduced
`stopGeneration()` for use in `Disable()` and `Rebuild()` but didn't
address the shutdown path.
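
The `Release()` -> `stopGeneration()` ordering from the Fix list could look roughly like the sketch below. It assumes a close-once cancel channel plus a done channel; the names `cancellableGen`, `startGen`, `stop`, and `disk` are hypothetical and the actual change in this PR may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
)

// cancellableGen is a hypothetical generator handle.
type cancellableGen struct {
	cancel   chan struct{} // closed to request cancellation
	done     chan struct{} // closed when the goroutine has returned
	stopOnce sync.Once
}

// startGen runs up to `steps` units of work, checking for cancellation
// between units. When the work is finished the goroutine simply returns;
// it does not park waiting for someone to collect it.
func startGen(steps int) *cancellableGen {
	g := &cancellableGen{
		cancel: make(chan struct{}),
		done:   make(chan struct{}),
	}
	go func() {
		defer close(g.done)
		for i := 0; i < steps; i++ {
			select {
			case <-g.cancel:
				return
			default:
				// one unit of generation work would go here
			}
		}
	}()
	return g
}

// stop requests cancellation and blocks until the goroutine has exited.
// It is safe to call more than once and after natural completion.
func (g *cancellableGen) stop() {
	g.stopOnce.Do(func() { close(g.cancel) })
	<-g.done
}

// disk is a hypothetical layer that owns a generator and a cache.
type disk struct {
	gen        *cancellableGen
	cacheReset bool
}

// Release stops generation first, then releases resources.
func (d *disk) Release() {
	if d.gen != nil {
		d.gen.stop()
	}
	d.cacheReset = true
}

func main() {
	d := &disk{gen: startGen(1000)}
	d.Release()
	d.Release() // a repeated call does not block or panic
	fmt.Println("cache reset:", d.cacheReset)
}
```

Closing a channel instead of sending on it means the stop request never blocks on the generator being at a particular receive, which is the property the shutdown path needs.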

The test follows the same pattern used in
`TestCheckSimBackendGoroutineLeak`.
2026-06-04 21:22:58 -05:00

| Name | Last commit | Date |
| --- | --- | --- |
| filtermaps | | |
| forkid | | |
| history | | |
| overlay | | |
| rawdb | triedb: reconcile stale storage roots in GenerateTrie, add cancel support (#34807) | 2026-06-03 15:08:09 +08:00 |
| state | core/state/snapshot: snapshot generation shutdown race condition (#33540) | 2026-06-04 21:22:58 -05:00 |
| stateless | eth/catalyst: implement engine_newPayloadWithWitnessV5 and use witness field spec ordering (#35009) | 2026-05-21 21:00:57 +02:00 |
| tracing | core: introduce GasChangeHook v2 (#34946) | 2026-05-13 10:53:47 +02:00 |
| txpool | core/txpool: drop reorged v0 blob sidecars (#35099) | 2026-06-03 21:26:18 +08:00 |
| types | core/types: BlobHashes should iterate Commitments (#35109) | 2026-06-04 11:17:46 -06:00 |
| vm | core, consensus, internal, eth, miner: construct block accessList (#34957) | 2026-05-19 21:51:53 +08:00 |
| .gitignore | | |
| bal_test.go | core, cmd, internal: rework BAL json marshalling to adhere EELS (#34972) | 2026-05-20 09:12:13 -04:00 |
| bench_test.go | | |
| bintrie_witness_test.go | core, consensus, internal, eth, miner: construct block accessList (#34957) | 2026-05-19 21:51:53 +08:00 |
| block_validator.go | core/types/bal: add additional static validation for access lists (#34967) | 2026-05-20 09:35:28 +08:00 |
| block_validator_test.go | | |
| blockchain.go | core/rawdb, ethdb, cmd, triedb: manage finalized block-accessList in freezer (#34977) | 2026-06-01 11:01:42 +08:00 |
| blockchain_insert.go | | |
| blockchain_reader.go | eth/protocols/eth: implement eth71 bal response (#34879) | 2026-05-19 20:25:13 +02:00 |
| blockchain_repair_test.go | | |
| blockchain_sethead_test.go | | |
| blockchain_snapshot_test.go | | |
| blockchain_stats.go | core: add code cache hit/miss meters (#34821) | 2026-05-22 11:33:21 +08:00 |
| blockchain_test.go | core/rawdb, ethdb, cmd, triedb: manage finalized block-accessList in freezer (#34977) | 2026-06-01 11:01:42 +08:00 |
| chain_makers.go | core: add slot number (#35036) | 2026-05-26 12:27:07 +02:00 |
| chain_makers_test.go | | |
| dao_test.go | | |
| error.go | | |
| eth_transfer_logs_test.go | | |
| events.go | | |
| evm.go | core: use uint256 in core.Message (#34934) | 2026-05-11 22:25:57 +08:00 |
| gaspool.go | | |
| gen_genesis.go | | |
| genesis.go | core, consensus, internal, eth, miner: construct block accessList (#34957) | 2026-05-19 21:51:53 +08:00 |
| genesis_alloc.go | | |
| genesis_test.go | | |
| headerchain.go | | |
| headerchain_test.go | | |
| jumpdest.go | core/vm: global cache for jumpdest bitmaps (#34850) | 2026-05-27 09:01:05 +02:00 |
| mkalloc.go | | |
| rlp_test.go | | |
| sender_cacher.go | | |
| state_prefetcher.go | core/vm: global cache for jumpdest bitmaps (#34850) | 2026-05-27 09:01:05 +02:00 |
| state_processor.go | rpc, internal/telemetry: trace JSON-RPC response writes (#35049) | 2026-06-02 14:13:06 +02:00 |
| state_processor_test.go | | |
| state_transition.go | core, consensus, internal, eth, miner: construct block accessList (#34957) | 2026-05-19 21:51:53 +08:00 |
| state_transition_test.go | | |
| stateless.go | core/vm: global cache for jumpdest bitmaps (#34850) | 2026-05-27 09:01:05 +02:00 |
| txindexer.go | core, core/txpool, eth: move subscriptions to constructor (#35048) | 2026-06-01 08:13:59 +08:00 |
| txindexer_test.go | | |
| types.go | core/vm: global cache for jumpdest bitmaps (#34850) | 2026-05-27 09:01:05 +02:00 |