BinaryTrie.Commit unconditionally walked every resolved in-memory node
and flushed it into the NodeSet, producing one Pebble write per resolved
internal + stem node on every block — even when the node's on-disk blob
was bitwise identical to the previous commit. On a warm 400M-state
workload this meant tens of thousands of redundant 65-byte writes per
block, compounding Pebble compaction pressure on every commit.
The existing mustRecompute flag tracks hash staleness, not disk-blob
staleness: after Hash() completes, mustRecompute is cleared even though
the fresh blob has not been persisted. It is therefore insufficient for
a skip-flush optimization.
This change mirrors MPT's committer pattern (trie/committer.go:51-56)
by adding a dirty flag on InternalNode and StemNode with the semantics
"the on-disk blob is stale". The flag is:
- set to true wherever the node is created or structurally modified
(the same call sites that already set mustRecompute = true),
- set to false only after the node has been passed to the flushfn
inside CollectNodes,
- left false on nodes produced by DeserializeNodeWithHash, matching
the "loaded from disk, already persisted" semantics.
CollectNodes short-circuits on !dirty subtrees; the propagation
invariant (an ancestor of any dirty node is itself dirty) is already
maintained by the existing InsertValuesAtStem / Insert paths, which now
mirror every mustRecompute = true setter with a dirty = true setter.
Serialization format, hash computation, state root, and the pathdb
write path are untouched. Empty NodeSets are already tolerated by
triedb/pathdb.writeNodes.
BenchmarkCollectNodes_SparseWrite (10,000-stem trie, one-leaf
modification + Commit per iteration, Apple M4 Pro):
before 12,653,000 ns/op 107,224,740 B/op 80,953 allocs/op
after 7,336 ns/op 37,774 B/op 134 allocs/op
speedup: ~1,725x memory: ~2,839x less allocs: ~604x fewer
End-to-end impact on a benchmarked geth build depends on workload;
the new TestBinaryTrieCommitIncremental provides a structural
regression guard.
This is an optimization that existed for verkle and the MPT, but that
got dropped during the rebase.
Mark the nodes that were modified as needing recomputation, and skip the
hash computation if this is not needed. Otherwise, the whole tree is
hashed, which kills performance.
This is broken off of #31730 to only focus on testing networks that
start with verkle at genesis.
The PR has seen a lot of work since its creation, and it now targets
creating and re-executing tests for a binary tree testnet without the
transition (so it starts at genesis). The transition tree has been moved
to its own package. It also replaces verkle with the binary tree for
this specific application.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Implement the binary tree as specified in [eip-7864](https://eips.ethereum.org/EIPS/eip-7864).
This will gradually replace verkle trees in the codebase. This is only
running the tests and will not be executed in production, but will help
me rebase some of my work, so that it doesn't bitrot as much.
---------
Signed-off-by: Guillaume Ballet
Co-authored-by: Parithosh Jayanthi <parithosh.jayanthi@ethereum.org>
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>