I removed `Iterator.Count` in #33840, because it appeared to be unused
and did not provide the documented invariant: the returned count should
always be an upper bound on the number of iterations allowed by `Next`.
In order to make `Count` work, the semantics of `CountValues` has to
change to return the number of items up and including the invalid one. I
have reviewed all callsites of `CountValues` to assess if changing this
is safe. There aren't that many, and the only call that doesn't check
the error and return is in the trie node parser,
`trie.decodeNodeUnsafe`. There, we distinguish the node type based on
the number of items, and it previously returned an error for item count
zero. In order to avoid any potential issue that could result from this
change, I'm adding an error check in that function, though it isn't
necessary.
This changes `RawList` to ensure the count of items is always valid.
Lists with invalid structure, i.e. ones where an element exceeds the
size of the container, are now detected during decoding of the `RawList`
and thus cannot exist.
Also remove `RawList.Empty` since it is now fully redundant, and
`Iterator.Count` since it returns incorrect results in the presence of
invalid input. There are no callers of these methods (yet).
Most uses of the iterator are like this:
it, _ := rlp.NewListIterator(data)
for it.Next() {
do(it.Value())
}
This doesn't require the iterator to be a pointer and it's better to
have it stack-allocated. AFAIK the compiler cannot prove it is OK to
stack-allocate when it is returned as a pointer because the methods of
`Iterator` use pointer receiver and also mutate the object.
The iterator type was not exported until very recently, so I think it is
still OK to change this API.
This adds a new type wrapper that decodes as a list, but does not
actually decode the contents of the list. The type parameter exists as a
marker, and enables decoding the elements lazily. RawList can also be
used for building a list incrementally.
This pull request introduces a mechanism to compress trienode history by
storing only the node diffs between consecutive versions.
- For full nodes, only the modified children are recorded in the history;
- For short nodes, only the modified value is stored;
If the node type has changed, or if the node is newly created or
deleted, the entire node value is stored instead.
To mitigate the overhead of reassembling nodes from diffs during history
reads, checkpoints are introduced by periodically storing full node values.
The current checkpoint interval is set to every 16 mutations, though
this parameter may be made configurable in the future.
Instead of using a limit of three nodes per message, we can pack more nodes
into each message based on ENR size. In my testing, this halves the number
of sent NODES messages, because ENR size is usually < 300 bytes.
This also adds RLP helper functions that compute the encoded size of
[]byte and string.
Co-authored-by: Martin Holst Swende <martin@swende.se>
This change significantly improves the performance of RLPx message reads
and writes. In the previous implementation, reading and writing of
message frames performed multiple reads and writes on the underlying
network connection, and allocated a new []byte buffer for every read.
In the new implementation, reads and writes re-use buffers, and perform
much fewer system calls on the underlying connection. This doubles the
theoretically achievable throughput on a single connection, as shown by
the benchmark result:
name old speed new speed delta
Throughput-8 70.3MB/s ± 0% 155.4MB/s ± 0% +121.11% (p=0.000 n=9+8)
The change also removes support for the legacy, pre-EIP-8 handshake encoding.
As of May 2021, no actively maintained client sends this format.
This PR contains a minor optimization in derivesha, by exposing the RLP
int-encoding and making use of it to write integers directly to a
buffer (an RLP integer is known to never require more than 9 bytes
total). rlp.AppendUint64 might be useful in other places too.
The code assumes, just as before, that the hasher (a trie) will copy the
key internally, which it does when doing keybytesToHex(key).
Co-authored-by: Felix Lange <fjl@twurst.com>