Commit graph

13 commits

Author SHA1 Message Date
CPerezz
33227e7e6d
trie/bintrie: merge *Single paths into *ValuesAtStem
Per gballet's comment 3101751325 on PR #34055: the *Single functions
are essentially the same thing as *ValuesAtStem with one slot set. The
original design dispatched through *ValuesAtStem for dedup; this commit
restores that shape on the arena side.

- GetValue now delegates to GetValuesAtStem and indexes the returned
  256-slot array header (no allocation — the stem node returns its own
  inline values array as a slice).
- InsertSingle now builds a stack-allocated [StemNodeWidth][]byte with
  only the target slot set and delegates to InsertValuesAtStem.
- Delete the insertSingleInternal tree walker (~90 LOC) and the whole
  splitStemInsert (~60 LOC) — the *ValuesAtStem / splitStemValuesInsert
  pair already handles every case.

Addresses gballet comments 3101751325, 3101739001, 3101724199, 3101721238
(the last three subsumed by the consolidation — the duplicated helper
bodies no longer exist).

Net: ~150 LOC removed from store_ops.go. Allocation cost for InsertSingle
is bounded by the stack-allocated 256-slot array (one stack frame, no
heap allocation on the hot path).
2026-04-18 18:57:23 +02:00
CPerezz
bbf062c746
trie/bintrie: inline getSingle into GetValue
Gballet asked (comment 3101679920) to fold the unexported getSingle helper
into its single caller, and (comment 3101677731) to rename GetSingle ('bad
name: get single what?') with a top-level docstring.

- Inline getSingle into GetValue (one function instead of two).
- Rename GetSingle → GetValue and add a docstring.
- Drop the hasParent tracker that was only used for the 'hashed at root'
  guard — that case is now handled by kindEmpty / the top-level
  NewBinaryTrie-time root resolution, so remove the check rather than
  keep dead state.

CE2 will later fold this into GetValuesAtStem; this commit closes the
naming + inline asks independently.
2026-04-18 18:54:26 +02:00
CPerezz
e1859ea864
trie/bintrie: simplify StemNode to array-of-slices representation
Gballet asked on PR #34055 (comments 3100043116, 3100050542, and the
bit-check dedup at 3100114416 / 3100878310) to revert StemNode from the
packed-bytes representation to the straightforward array-of-slices.

Before: StemNode carried a bitmap, a concatenated valueData []byte, a
count, and a shared COW flag. Every read/write went through a bit-count
posInData lookup; every mutation through ensureWritable COW.

After: values [StemNodeWidth][]byte — 256 slots, nil == absent. No
bitmap lookup, no COW. Direct sn.values[suffix] access.

Supporting changes:
- Drop posInData, ensureWritable; rewrite getValue/hasValue/allValues/
  setValue as trivial slice access.
- Hash() iterates sn.values directly, matching master's shape.
- SerializeNode emits the bitmap + concatenated bytes on the wire from
  the array-of-slices at serialize time; wire format unchanged.
- decodeNode populates sn.values[i] slots by aliasing the serialized
  buffer (zero-copy).
- NodeStore.Copy deep-copies each slot.
- splitStemValuesInsert + the insertSingleInternal paths write directly
  to sn.values[i].

Trade-off: stems now carry 256 []byte headers (6144 B) instead of 1
concatenated slice (~32 B) + bitmap. Stem-pool scan cost returns to
parity with master (the existing valueData pointer already made the
pool non-noscan; rollback adds 255 more pointers per stem). The primary
arena win — pointer-free InternalNode pool — is preserved.
2026-04-18 18:53:07 +02:00
CPerezz
2d44d8a4b6
trie/bintrie: unexport package-internal arena identifiers
Gballet asked on PR #34055 to unexport nodeRef, nodeKind, and makeRef
(comments 3099846639, 3099847640, 3100717855) — none are used outside
trie/bintrie. Cascade to the internal-only support symbols and methods:

  NodeKind          → nodeKind
  KindEmpty/...     → kindEmpty/...
  NodeRef           → nodeRef
  EmptyRef          → emptyRef
  MakeRef           → makeRef
  NodeStore.Root    → deleted; inlined to s.root field access (same pkg)
  NodeStore.SetRoot → deleted; inlined to s.root = ref
  NodeStore.ComputeHash/SerializeNode/DeserializeNode(WithHash)/
  CollectNodes/ToDot/GetHeight → lowercased

All 9 method signatures took or returned nodeRef so their export would
have tripped revive:unexported-return after the type rename. Zero
external callers means no API break. The private deserializeNode helper
was renamed to decodeNode to free the name for the newly-private
deserializeNode public function.

Pure rename; no behaviour change.
2026-04-18 18:49:04 +02:00
CPerezz
939b36345f
trie/bintrie: port dirty flag + CollectNodes skip-clean from master
Master added (via PR #34754) a dirty bool to InternalNode/StemNode plus a
CollectNodes short-circuit that skips clean subtrees — the arena branch
diverged before that landed. Port the semantics onto the arena shape:

- Add dirty bool to InternalNode and StemNode.
- Wire dirty=true alongside every existing mustRecompute=true setter in
  node_store.go (newInternalRef, newStemRef) and store_ops.go (8 mutation
  sites across InsertSingle/insertSingleInternal/InsertValuesAtStem/
  insertValuesAtStem/splitStemInsert/splitStemValuesInsert).
- Add 'if !node.dirty { return nil }' gate at the top of CollectNodes for
  both KindInternal and KindStem; clear dirty after flushfn runs.
- Plumb a dirty parameter through deserializeNode; DeserializeNode passes
  dirty=true (safe default), DeserializeNodeWithHash passes dirty=false
  (loaded from disk, blob matches).

The arena test in trie_test.go that was auto-merged from master used
master-shape struct literals (tr.root, NewBinaryNode) that don't exist on
arena; delete those and replace with TestCommitSkipCleanSubtrees, an
arena-native version that asserts first-Commit flushes all nodes, no-op
Commit flushes none, and single-leaf Commit flushes only the root-to-leaf
path.
2026-04-18 18:45:12 +02:00
CPerezz
b3b86a873a
trie/bintrie: merge makeKeyPath into keyToPath
Drop the panic-on-error variant. All callers are inside methods
that already propagate errors, so the error-returning form is the
right default.
2026-04-18 18:38:37 +02:00
CPerezz
84c61897b3
trie/bintrie: use type alias for HashedNode
Replace the single-field struct with a type alias on common.Hash.
Both have identical layout (32 bytes, no pointers) and noscan span
placement, but the alias matches master's style and reads more
naturally. A zero-arg Hash() method keeps call sites terse.
2026-04-18 18:38:37 +02:00
CPerezz
5f94d26db8
trie/bintrie: update copyright year on newly added files
These four files were introduced in this PR and should carry the
current year.
2026-04-18 18:38:37 +02:00
CPerezz
b4a7118d06
trie/bintrie: trim verbose doc comments to essentials 2026-04-18 18:38:37 +02:00
CPerezz
efc164b850
trie/bintrie: move ensureWritable into setValue 2026-04-18 18:38:37 +02:00
CPerezz
0f6b066156
trie/bintrie: add nil-resolver guards for hashed node resolution 2026-04-18 18:38:36 +02:00
CPerezz
b6d385e702
trie/bintrie: add depth bound to splitStemInsert 2026-04-18 18:38:36 +02:00
CPerezz
8a5e777fde
trie/bintrie: replace BinaryNode interface with GC-free NodeRef arena
Replace the BinaryNode interface (which uses Go interface pointers that
the GC must scan) with NodeRef uint32 indices into typed arena pools.
NodeRef packs a 2-bit kind tag and 30-bit pool index into a single
uint32, making it invisible to the garbage collector.

NodeStore manages chunked typed pools per node kind:
- InternalNode pool: ZERO Go pointers (children are NodeRef, hash is
  [32]byte) → allocated in noscan spans, GC skips entirely
- HashedNode pool: ZERO Go pointers → noscan spans
- StemNode pool: ONE pointer per node (valueData []byte) → minimal GC

For a trie with 25K InternalNodes, this reduces GC-scanned pointer-words
from ~125K to ~10K (85% reduction). CPU profiling showed 44% of time
in GC; this refactor directly addresses that bottleneck.

Serialization format is unchanged — the on-disk representation is
fully compatible. All existing tests pass.
2026-04-18 18:38:15 +02:00