go-ethereum/trie/bintrie
CPerezz fad11d5795
trie/bintrie: skip clean nodes in CollectNodes to reduce commit write amplification

BinaryTrie.Commit unconditionally walked every resolved in-memory node
and flushed it into the NodeSet, producing one Pebble write per resolved
internal or stem node on every block, even when the node's on-disk blob
was bitwise identical to the one written by the previous commit. On a
warm 400M-state workload this meant tens of thousands of redundant
65-byte writes per block, adding to Pebble compaction pressure on every
commit.

The existing mustRecompute flag tracks hash staleness, not disk-blob
staleness: after Hash() completes, mustRecompute is cleared even though
the fresh blob has not been persisted. It is therefore insufficient for
a skip-flush optimization.

This change mirrors MPT's committer pattern (trie/committer.go:51-56)
by adding a dirty flag on InternalNode and StemNode with the semantics
"the on-disk blob is stale". The flag is:

  - set to true wherever the node is created or structurally modified
    (the same call sites that already set mustRecompute = true),
  - set to false only after the node has been passed to the flushfn
    inside CollectNodes,
  - left false on nodes produced by DeserializeNodeWithHash, matching
    the "loaded from disk, already persisted" semantics.

CollectNodes short-circuits on !dirty subtrees; the propagation
invariant (an ancestor of any dirty node is itself dirty) is already
maintained by the existing InsertValuesAtStem / Insert paths, which now
mirror every mustRecompute = true setter with a dirty = true setter.
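
The flag lifecycle and the short-circuit can be illustrated with a minimal,
self-contained sketch. It is not the bintrie code: the node type, newNode,
touch, commit and buildExample are hypothetical stand-ins, and the real
CollectNodes takes a path and a flush function with a different signature.
The sketch only assumes the semantics described above: mustRecompute is
cleared by hashing, dirty is cleared only after the node has been flushed.

```go
package main

import "fmt"

// node is a hypothetical stand-in for InternalNode/StemNode. Hashes, stems
// and values are omitted; only the two staleness flags are modelled.
type node struct {
	left, right   *node
	mustRecompute bool // the cached hash is stale
	dirty         bool // the on-disk blob is stale
}

// newNode creates a node that has never been hashed or persisted.
func newNode(left, right *node) *node {
	return &node{left: left, right: right, mustRecompute: true, dirty: true}
}

// touch marks every node on a modification path, mirroring each
// mustRecompute = true setter with a dirty = true setter.
func touch(path ...*node) {
	for _, n := range path {
		n.mustRecompute = true
		n.dirty = true
	}
}

// hash clears mustRecompute but leaves dirty alone: a fresh hash does not
// mean the blob has been written.
func (n *node) hash() {
	if n == nil || !n.mustRecompute {
		return
	}
	n.left.hash()
	n.right.hash()
	n.mustRecompute = false
}

// collect flushes dirty nodes only and skips clean subtrees entirely. This
// is safe because every ancestor of a dirty node is itself dirty.
func (n *node) collect(flush func(*node)) {
	if n == nil || !n.dirty {
		return
	}
	n.left.collect(flush)
	n.right.collect(flush)
	flush(n)
	n.dirty = false // cleared only after the node reached the flush function
}

// commit hashes the tree, collects it, and reports how many nodes were flushed.
func commit(root *node) int {
	root.hash()
	flushed := 0
	root.collect(func(*node) { flushed++ })
	return flushed
}

// buildExample returns a five-node tree plus the root-to-leaf path that the
// example later modifies.
func buildExample() (root, mid, leaf *node) {
	leaf = newNode(nil, nil)
	mid = newNode(leaf, newNode(nil, nil))
	root = newNode(mid, newNode(nil, nil))
	return root, mid, leaf
}

func main() {
	root, mid, leaf := buildExample()
	fmt.Println(commit(root)) // 5: every freshly created node is flushed
	touch(root, mid, leaf)    // modify one leaf and mark its ancestors
	fmt.Println(commit(root)) // 3: only the modified path is flushed
	fmt.Println(commit(root)) // 0: nothing changed, empty node set
}
```

In this sketch, skipping on !mustRecompute instead of !dirty would drop
every node, because hash() has already cleared mustRecompute by the time
collect runs.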

Serialization format, hash computation, state root, and the pathdb
write path are untouched. Empty NodeSets are already tolerated by
triedb/pathdb.writeNodes.

BenchmarkCollectNodes_SparseWrite (10,000-stem trie, one-leaf
modification + Commit per iteration, Apple M4 Pro):

  before   12,653,000 ns/op   107,224,740 B/op   80,953 allocs/op
  after         7,336 ns/op        37,774 B/op      134 allocs/op

  speedup: ~1,725x   memory: ~2,839x less   allocs: ~604x fewer

End-to-end impact on a full geth build depends on the workload; the
new TestBinaryTrieCommitIncremental provides a structural regression
guard.
2026-04-17 15:54:16 +02:00
binary_node.go trie/bintrie: skip clean nodes in CollectNodes to reduce commit write amplification 2026-04-17 15:54:16 +02:00
binary_node_test.go cmd/evm/internal/t8ntool, trie: support for verkle-at-genesis, use UBT, and move the transition tree to its own package (#32445) 2025-11-14 15:25:30 +01:00
empty.go trie/bintrie: cache hashes of clean nodes so as not to rehash the whole tree (#33961) 2026-03-06 18:06:24 +01:00
empty_test.go trie/bintrie: add eip7864 binary trees and run its tests (#32365) 2025-09-01 21:06:51 +08:00
hashed_node.go trie/bintrie: cache hashes of clean nodes so as not to rehash the whole tree (#33961) 2026-03-06 18:06:24 +01:00
hashed_node_test.go cmd/evm/internal/t8ntool, trie: support for verkle-at-genesis, use UBT, and move the transition tree to its own package (#32445) 2025-11-14 15:25:30 +01:00
hasher.go trie/bintrie: use a sync.Pool when hashing binary tree nodes (#33989) 2026-03-12 10:20:12 +01:00
internal_node.go trie/bintrie: skip clean nodes in CollectNodes to reduce commit write amplification 2026-04-17 15:54:16 +02:00
internal_node_test.go trie/bintrie: skip clean nodes in CollectNodes to reduce commit write amplification 2026-04-17 15:54:16 +02:00
iterator.go trie/bintrie: fix NodeIterator Empty node handling and expose tree accessors (#34056) 2026-03-20 13:53:14 -04:00
iterator_test.go trie/bintrie: fix NodeIterator Empty node handling and expose tree accessors (#34056) 2026-03-20 13:53:14 -04:00
key_encoding.go trie/bintrie: spec change, big endian hashing of slot key (#34670) 2026-04-13 09:42:37 +02:00
stem_node.go trie/bintrie: skip clean nodes in CollectNodes to reduce commit write amplification 2026-04-17 15:54:16 +02:00
stem_node_test.go trie/bintrie: skip clean nodes in CollectNodes to reduce commit write amplification 2026-04-17 15:54:16 +02:00
trie.go cmd, core, trie, triedb: split CachingDB into merkle + binary dbs. (#34700) 2026-04-17 08:55:54 +08:00
trie_test.go trie/bintrie: skip clean nodes in CollectNodes to reduce commit write amplification 2026-04-17 15:54:16 +02:00