This pull request optimizes trie hashing by reducing memory allocation
overhead. Specifically:
- define a fullNodeEncoder pool to reuse encoders and avoid memory
allocations.
- simplify the encoding logic for shortNode and fullNode by getting rid
of the Go interfaces.
This pull request removes the node copy operation to reduce memory
allocation. Key Changes as below:
**(a) Use `decodeNodeUnsafe` for decoding nodes retrieved from the trie
node reader**
In the current implementation of the MPT, once a trie node blob is
retrieved, it is passed to `decodeNode` for decoding. However,
`decodeNode` assumes the supplied byte slice might be mutated later, so
it performs a deep copy internally before parsing the node.
Given that the node reader is implemented by the path database and the
hash database, both of which guarantee the immutability of the returned
byte slice. By restricting the node reader interface to explicitly
guarantee that the returned byte slice will not be modified, we can
safely replace `decodeNode` with `decodeNodeUnsafe`. This eliminates the
need for a redundant byte copy during each node resolution.
**(b) Modify the trie in place**
In the current implementation of the MPT, a copy of a trie node is
created before any modifications are made. These modifications include:
- Node resolution: Converting the value from a hash to the actual node.
- Node hashing: Tagging the hash into its cache.
- Node commit: Replacing the children with its hash.
- Structural changes: For example, adding a new child to a fullNode or
replacing a child of a shortNode.
This mechanism ensures that modifications only affect the live tree,
leaving all previously created copies unaffected.
Unfortunately, this property leads to a huge memory allocation
requirement. For example, if we want to modify the fullNode for n times,
the node will be copied for n times.
In this pull request, all the trie modifications are made in place. In
order to make sure all previously created copies are unaffected, the
`Copy` function now will deep-copy all the live nodes rather than the
root node itself.
With this change, while the `Copy` function becomes more expensive, it's
totally acceptable as it's not a frequently used one. For the normal
trie operations (Get, GetNode, Hash, Commit, Insert, Delete), the node
copy is not required anymore.
This avoids copying the input []byte while decoding trie nodes. In most
cases, particularly when the input slice is provided by the underlying
database, this optimization is safe to use.
For cases where the origin of the input slice is unclear, the copying version
is retained. The new code performs better even when the input must be
copied, because it is now only copied once in decodeNode.
This change speeds up trie hashing and all other activities that require
RLP encoding of trie nodes by approximately 20%. The speedup is achieved by
avoiding reflection overhead during node encoding.
The interface type trie.node now contains a method 'encode' that works with
rlp.EncoderBuffer. Management of EncoderBuffers is left to calling code.
trie.hasher, which is pooled to avoid allocations, now maintains an
EncoderBuffer. This means memory resources related to trie node encoding
are tied to the hasher pool.
Co-authored-by: Felix Lange <fjl@twurst.com>
The current trie memory database/cache that we do pruning on stores
trie nodes as binary rlp encoded blobs, and also stores the node
relationships/references for GC purposes. However, most of the trie
nodes (everything apart from a value node) is in essence just a
collection of references.
This PR switches out the RLP encoded trie blobs with the
collapsed-but-not-serialized trie nodes. This permits most of the
references to be recovered from within the node data structure,
avoiding the need to track them a second time (expensive memory wise).
Commit 40cdcf1183 broke the optimisation which kept nodes resolved
during Get in the trie. The decoder assigned cache generation 0
unconditionally, causing resolved nodes to get flushed on Commit.
This commit fixes it and adds two tests.
* trie: store nodes as pointers
This avoids memory copies when unwrapping node interface values.
name old time/op new time/op delta
Get 388ns ± 8% 215ns ± 2% -44.56% (p=0.000 n=15+15)
GetDB 363ns ± 3% 202ns ± 2% -44.21% (p=0.000 n=15+15)
UpdateBE 1.57µs ± 2% 1.29µs ± 3% -17.80% (p=0.000 n=13+15)
UpdateLE 1.92µs ± 2% 1.61µs ± 2% -16.25% (p=0.000 n=14+14)
HashBE 2.16µs ± 6% 2.18µs ± 6% ~ (p=0.436 n=15+15)
HashLE 7.43µs ± 3% 7.21µs ± 3% -2.96% (p=0.000 n=15+13)
* trie: close temporary databases in GetDB benchmark
* trie: don't keep []byte from DB load around
Nodes decoded from a DB load kept hashes and values as sub-slices of
the DB value. This can be a problem because loading from leveldb often
returns []byte with a cap that's larger than necessary, increasing
memory usage.
* trie: unload old cached nodes
* trie, core/state: use cache unloading for account trie
* trie: use explicit private flags (fixes Go 1.5 reflection issue).
* trie: fixup cachegen overflow at request of nick
* core/state: rename journal size constant