Gballet asked on PR #34055 (comments 3100043116, 3100050542, and the
bit-check dedup at 3100114416 / 3100878310) to revert StemNode from the
packed-bytes representation to the straightforward array-of-slices.
Before: StemNode carried a bitmap, a concatenated valueData []byte, a
count, and a shared COW flag. Every read/write went through a bit-count
posInData lookup; every mutation through ensureWritable COW.
After: values [StemNodeWidth][]byte — 256 slots, nil == absent. No
bitmap lookup, no COW. Direct sn.values[suffix] access.
Supporting changes:
- Drop posInData, ensureWritable; rewrite getValue/hasValue/allValues/
setValue as trivial slice access.
- Hash() iterates sn.values directly, matching master's shape.
- SerializeNode emits the bitmap + concatenated bytes on the wire from
the array-of-slices at serialize time; wire format unchanged.
- decodeNode populates sn.values[i] slots by aliasing the serialized
buffer (zero-copy).
- NodeStore.Copy deep-copies each slot.
- splitStemValuesInsert + the insertSingleInternal paths write directly
to sn.values[i].
Trade-off: stems now carry 256 []byte headers (6144 B) instead of 1
concatenated slice (~32 B) + bitmap. Stem-pool scan cost returns to
parity with master (the existing valueData pointer already made the
pool non-noscan; rollback adds 255 more pointers per stem). The primary
arena win — pointer-free InternalNode pool — is preserved.
Gballet asked (comment 3099953085) to leave the sha256Sum256 / constant
parallelHashDepth optimisation out of this PR: it's an orthogonal
microbenchmark concern that should be revisited post-group-depth under
Go 1.26.
- Delete the sha256Sum256 helper from hasher.go.
- Delete the const parallelHashDepth = 4 from hasher.go.
- Restore master's dynamic parallelDepth() helper in store_commit.go
(copy verbatim — min(bits.Len(NumCPU), 8)).
- In hashInternal's shallow-parallel branch, call sha256.Sum256 directly
(std-lib, stack-allocated [32]byte; common.Hash is a type alias for
[32]byte so no conversion needed).
- In hashInternal's deep-sequential branch, use the pooled newSha256 /
returnSha256 hasher (matches master's internal_node.go:170-185).
Intentional trade-off: the deep branch now re-introduces per-hash
sync.Pool Get/Put plus a 32-byte h.Sum(nil) allocation. Zero regression
vs master; foregoes the arena's proposed stack-based hashing until
Go 1.26 + post-group-depth benchmarks.
Gballet asked on PR #34055 to unexport nodeRef, nodeKind, and makeRef
(comments 3099846639, 3099847640, 3100717855) — none are used outside
trie/bintrie. Cascade to the internal-only support symbols and methods:
NodeKind → nodeKind
KindEmpty/... → kindEmpty/...
NodeRef → nodeRef
EmptyRef → emptyRef
MakeRef → makeRef
NodeStore.Root → deleted; inlined to s.root field access (same pkg)
NodeStore.SetRoot → deleted; inlined to s.root = ref
NodeStore.ComputeHash/SerializeNode/DeserializeNode(WithHash)/
CollectNodes/ToDot/GetHeight → lowercased
All 9 method signatures took or returned nodeRef so their export would
have tripped revive:unexported-return after the type rename. Zero
external callers means no API break. The private deserializeNode helper
was renamed to decodeNode to free the name for the newly-private
deserializeNode public function.
Pure rename; no behaviour change.
Master added (via PR #34754) a dirty bool to InternalNode/StemNode plus a
CollectNodes short-circuit that skips clean subtrees — the arena branch
diverged before that landed. Port the semantics onto the arena shape:
- Add dirty bool to InternalNode and StemNode.
- Wire dirty=true alongside every existing mustRecompute=true setter in
node_store.go (newInternalRef, newStemRef) and store_ops.go (8 mutation
sites across InsertSingle/insertSingleInternal/InsertValuesAtStem/
insertValuesAtStem/splitStemInsert/splitStemValuesInsert).
- Add 'if !node.dirty { return nil }' gate at the top of CollectNodes for
both KindInternal and KindStem; clear dirty after flushfn runs.
- Plumb a dirty parameter through deserializeNode; DeserializeNode passes
dirty=true (safe default), DeserializeNodeWithHash passes dirty=false
(loaded from disk, blob matches).
The arena test in trie_test.go that was auto-merged from master used
master-shape struct literals (tr.root, NewBinaryNode) that don't exist on
arena; delete those and replace with TestCommitSkipCleanSubtrees, an
arena-native version that asserts first-Commit flushes all nodes, no-op
Commit flushes none, and single-leaf Commit flushes only the root-to-leaf
path.
A prior commit aggressively trimmed comments. This restores the
ones that carried real information — ownership contracts on
returned slices, the index-tracking semantics inside Next(),
the Parent() grandparent note, and the "at a leaf" stem rule —
so future readers aren't left guessing. Also elaborates the
parallel-hashing rationale in hashInternal.
Replace the single-field struct with a type alias on common.Hash.
Both have identical layout (32 bytes, no pointers) and noscan span
placement, but the alias matches master's style and reads more
naturally. A zero-arg Hash() method keeps call sites terse.
Replace the BinaryNode interface (which uses Go interface pointers that
the GC must scan) with NodeRef uint32 indices into typed arena pools.
NodeRef packs a 2-bit kind tag and 30-bit pool index into a single
uint32, making it invisible to the garbage collector.
NodeStore manages chunked typed pools per node kind:
- InternalNode pool: ZERO Go pointers (children are NodeRef, hash is
[32]byte) → allocated in noscan spans, GC skips entirely
- HashedNode pool: ZERO Go pointers → noscan spans
- StemNode pool: ONE pointer per node (valueData []byte) → minimal GC
For a trie with 25K InternalNodes, this reduces GC-scanned pointer-words
from ~125K to ~10K (85% reduction). CPU profiling showed 44% of time
in GC; this refactor directly addresses that bottleneck.
Serialization format is unchanged — the on-disk representation is
fully compatible. All existing tests pass.