This is something interesting I came across during my benchmarks, we
spent ~3.8% of all allocations allocating the header number on the heap.
```
(pprof) list GetHeaderByHash
Total: 38197204475
ROUTINE ======================== github.com/ethereum/go-ethereum/core.(*BlockChain).GetHeaderByHash in github.com/ethereum/go-ethereum/core/blockchain_reader.go
0 5786566117 (flat, cum) 15.15% of Total
. . 79:func (bc *BlockChain) GetHeaderByHash(hash common.Hash) *types.Header {
. 5786566117 80: return bc.hc.GetHeaderByHash(hash)
. . 81:}
. . 82:
. . 83:// GetHeaderByNumber retrieves a block header from the database by number,
. . 84:// caching it (associated with its hash) if found.
. . 85:func (bc *BlockChain) GetHeaderByNumber(number uint64) *types.Header {
ROUTINE ======================== github.com/ethereum/go-ethereum/core.(*HeaderChain).GetHeaderByHash in github.com/ethereum/go-ethereum/core/headerchain.go
0 5786566117 (flat, cum) 15.15% of Total
. . 404:func (hc *HeaderChain) GetHeaderByHash(hash common.Hash) *types.Header {
. 1471264309 405: number := hc.GetBlockNumber(hash)
. . 406: if number == nil {
. . 407: return nil
. . 408: }
. 4315301808 409: return hc.GetHeader(hash, *number)
. . 410:}
. . 411:
. . 412:// HasHeader checks if a block header is present in the database or not.
. . 413:// In theory, if header is present in the database, all relative components
. . 414:// like td and hash->number should be present too.
(pprof) list GetBlockNumber
Total: 38197204475
ROUTINE ======================== github.com/ethereum/go-ethereum/core.(*HeaderChain).GetBlockNumber in github.com/ethereum/go-ethereum/core/headerchain.go
94438817 1471264309 (flat, cum) 3.85% of Total
. . 100:func (hc *HeaderChain) GetBlockNumber(hash common.Hash) *uint64 {
94438817 94438817 101: if cached, ok := hc.numberCache.Get(hash); ok {
. . 102: return &cached
. . 103: }
. 1376270828 104: number := rawdb.ReadHeaderNumber(hc.chainDb, hash)
. . 105: if number != nil {
. 554664 106: hc.numberCache.Add(hash, *number)
. . 107: }
. . 108: return number
. . 109:}
. . 110:
. . 111:type headerWriteResult struct {
(pprof) list ReadHeaderNumber
Total: 38197204475
ROUTINE ======================== github.com/ethereum/go-ethereum/core/rawdb.ReadHeaderNumber in github.com/ethereum/go-ethereum/core/rawdb/accessors_chain.go
204606513 1376270828 (flat, cum) 3.60% of Total
. . 146:func ReadHeaderNumber(db ethdb.KeyValueReader, hash common.Hash) *uint64 {
109577863 1281242178 147: data, _ := db.Get(headerNumberKey(hash))
. . 148: if len(data) != 8 {
. . 149: return nil
. . 150: }
95028650 95028650 151: number := binary.BigEndian.Uint64(data)
. . 152: return &number
. . 153:}
. . 154:
. . 155:// WriteHeaderNumber stores the hash->number mapping.
. . 156:func WriteHeaderNumber(db ethdb.KeyValueWriter, hash common.Hash, number uint64) {
```
Opening this to discuss the idea, I know that rawdb.EmptyNumber is not a
great name for the variable, open to suggestions
This pull request refines the filtermap implementation, defining key
APIs for map and
epoch calculations to improve readability.
This pull request doesn't change any logic, it's a pure cleanup.
---------
Co-authored-by: zsfelfoldi <zsfelfoldi@gmail.com>
Reading a single transaction out of a block shouldn't need decoding the
entire body
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This PR fixes a bug in the map renderer that sometimes used an obsolete
block log value pointer to initialize the iterator for rendering from a
snapshot. This bug was triggered by chain reorgs and sometimes caused
indexing errors and invalid search results. A few other conditions are
also made safer that were not reported to cause issues yet but could
potentially be unsafe in some corner cases. A new unit test is also
added that reproduced the bug but passes with the new fixes.
Fixes https://github.com/ethereum/go-ethereum/issues/31593
Might also fix https://github.com/ethereum/go-ethereum/issues/31589
though this issue has not been reproduced yet, but it appears to be
related to a log index database corruption around a specific block,
similarly to the other issue.
Note that running this branch resets and regenerates the log index
database. For this purpose a `Version` field has been added to
`rawdb.FilterMapsRange` which will also make this easier in the future
if a breaking database change is needed or the existing one is
considered potentially broken due to a bug, like in this case.
This PR adds `rawdb.SafeDeleteRange` and uses it for range deletion in
`core/filtermaps`. This includes deleting the old bloombits database,
resetting the log index database and removing index data for unindexed
tail epochs (which previously weren't properly implemented for the
fallback case).
`SafeDeleteRange` either calls `ethdb.DeleteRange` if the node uses the
new path based state scheme or uses an iterator based fallback method
that safely skips trie nodes in the range if the old hash based state
scheme is used. Note that `ethdb.DeleteRange` also has its own iterator
based fallback implementation in `ethdb/leveldb`. If a path based state
scheme is used and the backing db is pebble (as it is on the majority of
new nodes) then `rawdb.SafeDeleteRange` uses the fast native range
delete.
Also note that `rawdb.SafeDeleteRange` has different semantics from
`ethdb.DeleteRange`, it does not automatically return if the operation
takes a long time. Instead it receives a `stopCallback` that can
interrupt the process if necessary. This is because in the safe mode
potentially a lot of entries are iterated without being deleted (this is
definitely the case when deleting the old bloombits database which has a
single byte prefix) and therefore restarting the process every time a
fixed number of entries have been iterated would result in a quadratic
run time in the number of skipped entries.
When running in safe mode, unindexing an epoch takes about a second,
removing bloombits takes around 10s while resetting a full log index
might take a few minutes. If a range delete operation takes a
significant amount of time then log messages are printed. Also, any
range delete operation can be interrupted by shutdown (tail uinindexing
can also be interrupted by head indexing, similarly to how tail indexing
works). If the last unindexed epoch might have "dirty" index data left
then the indexed map range points to the first valid epoch and
`cleanedEpochsBefore` points to the previous, potentially dirty one. At
startup it is always assumed that the epoch before the first fully
indexed one might be dirty. New tail maps are never rendered and also no
further maps are unindexed before the previous unindexing is properly
cleaned up.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
Instead of reporting all filtermaps stuff in one line, I'm breaking it
down into the three separate kinds of entries here.
```
+-----------------------+-----------------------------+------------+------------+
| DATABASE | CATEGORY | SIZE | ITEMS |
+-----------------------+-----------------------------+------------+------------+
| Key-Value store | Log index filter-map rows | 59.21 GiB | 616077345 |
| Key-Value store | Log index last-block-of-map | 12.35 MiB | 269755 |
| Key-Value store | Log index block-lv | 421.70 MiB | 22109169 |
```
Also added some other changes to make it easier to debug:
- restored bloombits into the inspect output, so we notice if it doesn't
get deleted for some reason
- tracking of unaccounted key examples
This adds a new subcommand 'geth prune-history' that removes the pre-merge history
on supported networks. Geth is not fully ready to work in this mode, please do not run
this command on your production node.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
In #31384 we unindex TXes prior to the merge block. However when the
node starts up it will try to re-index those back if the config is to index the
whole chain. This change makes the indexer aware of the history cutoff block,
avoiding reindexing in that segment.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
This change introduces garbage collection for the light client. Historical
chain data is deleted periodically. If you want to disable the GC, use
the --light.nopruning flag.
* cmd, core, eth: init tx lookup in background
* core/rawdb: tiny log fixes to make it clearer what's happening
* core, eth: fix rebase errors
* core/rawdb: make reindexing less generic, but more optimal
* rlp: implement rlp list iterator
* core/rawdb: new implementation of tx indexing/unindex using generic tx iterator and hashing rlp-data
* core/rawdb, cmd/utils: fix review concerns
* cmd/utils: fix merge issue
* core/rawdb: add some log formatting polishes
Co-authored-by: rjl493456442 <garyrong0905@gmail.com>
Co-authored-by: Péter Szilágyi <peterke@gmail.com>
* core: reinit chain from freezer in batches
* core/rawdb: concurrent database reinit from freezer dump
* core/rawdb: reinit from freezer in sequential order
* all: freezer style syncing
core, eth, les, light: clean up freezer relative APIs
core, eth, les, trie, ethdb, light: clean a bit
core, eth, les, light: add unit tests
core, light: rewrite setHead function
core, eth: fix downloader unit tests
core: add receipt chain insertion test
core: use constant instead of hardcoding table name
core: fix rollback
core: fix setHead
core/rawdb: remove canonical block first and then iterate side chain
core/rawdb, ethdb: add hasAncient interface
eth/downloader: calculate ancient limit via cht first
core, eth, ethdb: lots of fixes
* eth/downloader: print ancient disable log only for fast sync
* core: lookup txs by block number instead of block hash
Transaction hashes now store a reference to their corresponding
block number as opposed to their hash. In benchmarks this was
shown to reduce storage by over 12 GB.
The main limitation of this approach is that transactions on
non-canonical blocks could never be looked up, however that is
currently not supported.
The database version has been upgraded to version 5 and the
transaction lookup process is backwards-compatible with the
prior two transaction lookup formats prexisting in the
database instance. Tests have been added to ensure this.
* core/rawdb: tiny review nit fixes
This PR is a more advanced form of the dirty-to-clean cacher (#18995),
where we reuse previous database write batches as datasets to uncache,
saving a dirty-trie-iteration and a dirty-trie-rlp-reencoding per block.