This is something interesting I came across during my benchmarks, we
spent ~3.8% of all allocations allocating the header number on the heap.
```
(pprof) list GetHeaderByHash
Total: 38197204475
ROUTINE ======================== github.com/ethereum/go-ethereum/core.(*BlockChain).GetHeaderByHash in github.com/ethereum/go-ethereum/core/blockchain_reader.go
0 5786566117 (flat, cum) 15.15% of Total
. . 79:func (bc *BlockChain) GetHeaderByHash(hash common.Hash) *types.Header {
. 5786566117 80: return bc.hc.GetHeaderByHash(hash)
. . 81:}
. . 82:
. . 83:// GetHeaderByNumber retrieves a block header from the database by number,
. . 84:// caching it (associated with its hash) if found.
. . 85:func (bc *BlockChain) GetHeaderByNumber(number uint64) *types.Header {
ROUTINE ======================== github.com/ethereum/go-ethereum/core.(*HeaderChain).GetHeaderByHash in github.com/ethereum/go-ethereum/core/headerchain.go
0 5786566117 (flat, cum) 15.15% of Total
. . 404:func (hc *HeaderChain) GetHeaderByHash(hash common.Hash) *types.Header {
. 1471264309 405: number := hc.GetBlockNumber(hash)
. . 406: if number == nil {
. . 407: return nil
. . 408: }
. 4315301808 409: return hc.GetHeader(hash, *number)
. . 410:}
. . 411:
. . 412:// HasHeader checks if a block header is present in the database or not.
. . 413:// In theory, if header is present in the database, all relative components
. . 414:// like td and hash->number should be present too.
(pprof) list GetBlockNumber
Total: 38197204475
ROUTINE ======================== github.com/ethereum/go-ethereum/core.(*HeaderChain).GetBlockNumber in github.com/ethereum/go-ethereum/core/headerchain.go
94438817 1471264309 (flat, cum) 3.85% of Total
. . 100:func (hc *HeaderChain) GetBlockNumber(hash common.Hash) *uint64 {
94438817 94438817 101: if cached, ok := hc.numberCache.Get(hash); ok {
. . 102: return &cached
. . 103: }
. 1376270828 104: number := rawdb.ReadHeaderNumber(hc.chainDb, hash)
. . 105: if number != nil {
. 554664 106: hc.numberCache.Add(hash, *number)
. . 107: }
. . 108: return number
. . 109:}
. . 110:
. . 111:type headerWriteResult struct {
(pprof) list ReadHeaderNumber
Total: 38197204475
ROUTINE ======================== github.com/ethereum/go-ethereum/core/rawdb.ReadHeaderNumber in github.com/ethereum/go-ethereum/core/rawdb/accessors_chain.go
204606513 1376270828 (flat, cum) 3.60% of Total
. . 146:func ReadHeaderNumber(db ethdb.KeyValueReader, hash common.Hash) *uint64 {
109577863 1281242178 147: data, _ := db.Get(headerNumberKey(hash))
. . 148: if len(data) != 8 {
. . 149: return nil
. . 150: }
95028650 95028650 151: number := binary.BigEndian.Uint64(data)
. . 152: return &number
. . 153:}
. . 154:
. . 155:// WriteHeaderNumber stores the hash->number mapping.
. . 156:func WriteHeaderNumber(db ethdb.KeyValueWriter, hash common.Hash, number uint64) {
```
Opening this to discuss the idea, I know that rawdb.EmptyNumber is not a
great name for the variable, open to suggestions
Downloading from a range was failing because it would return and error
early with an error misinterpreting "start-end".
---------
Co-authored-by: shantichanal <158101918+shantichanal@users.noreply.github.com>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
This changes the era1 downloader to place the files into the correct
location where geth will actually use them. Also adds integration with
the new --datadir.era flag.
This implements a backing store for chain history based on era1 files.
The new store is integrated with the freezer. Queries for blocks and receipts
below the current freezer tail are handled by the era store.
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
Co-authored-by: lightclient <lightclient@protonmail.com>
This adds a geth subcommand for downloading era1 files and placing them into
the correct location. The tool can be used even while geth is already running
on the datadir. Downloads are checked against a hard-coded list of checksums
for mainnet and sepolia.
```
./geth download-era --server $SERVER --block 333333
./geth download-era --server $SERVER --block 333333-444444
./geth download-era --server $SERVER --epoch 0-10
./geth download-era --server $SERVER --all
```
The implementation reuses the file downloader we already had for
fetching build tools. I've done some refactoring on it to make sure it
can support the new use case, and there are some changes to the build
here as well.
I added the history mode configuration in eth/ethconfig initially, since
it seemed like the logical place. But it turns out we need access to the
intended pruning setting at a deeper level, and it actually needs to be
integrated with the blockchain startup procedure.
With this change applied, if a node previously had its history pruned,
and is subsequently restarted **without** the `--history.chain
postmerge` flag, the `BlockChain` initialization code will now verify
the freezer tail against the known pruning point of the predefined
network and will restore pruning status. Note that this logic is quite
restrictive, we allow non-zero tail only for known networks, and only
for the specific pruning point that is defined.
This pull request introduces new sync logic for pruning mode. The downloader will now skip
insertion of block bodies and receipts before the configured history cutoff point.
Originally, in snap sync, the header chain and other components (bodies and receipts) were
inserted separately. However, in Proof-of-Stake, this separation is unnecessary since the
sync target is already verified by the CL.
To simplify the process, this pull request modifies `InsertReceiptChain` to insert headers
along with block bodies and receipts together. Besides, `InsertReceiptChain` doesn't have
the notion of reorg, as the common ancestor is always be found before the sync and extra
side chain is truncated at the beginning if they fall in the ancient store. The stale
canonical chain flags will always be rewritten by the new chain. Explicit reorg logic is
no longer required in `InsertReceiptChain`.
This adds a new subcommand 'geth prune-history' that removes the pre-merge history
on supported networks. Geth is not fully ready to work in this mode, please do not run
this command on your production node.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
When I press Ctrl-C during the import of multiple files, the import
process will still attempt to import the subsequent files. However, in
normal circumstances, users would expect the import to stop immediately
upon pressing Ctrl-C.
And because the current file was not finished importing, subsequent
import tasks often fail due to an `unknown ancestor` error.
---------
Signed-off-by: jsvisa <delweng@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
This pull request refactors the genesis setup function, the major
changes are highlighted here:
**(a) Triedb is opened in verkle mode if `EnableVerkleAtGenesis` is
configured in chainConfig or the database has been initialized previously with
`EnableVerkleAtGenesis` configured**.
A new config field `EnableVerkleAtGenesis` has been added in the
chainConfig. This field must be configured with True if Geth wants to initialize
the genesis in Verkle mode.
In the verkle devnet-7, the verkle transition is activated at genesis.
Therefore, the verkle rules should be used since the genesis. In production
networks (mainnet and public testnets), verkle activation always occurs after
the genesis block. Therefore, this flag is only made for devnet and should be
deprecated later. Besides, verkle transition at non-genesis block hasn't been
implemented yet, it should be done in the following PRs.
**(b) The genesis initialization condition has been simplified**
There is a special mode supported by the Geth is that: Geth can be
initialized with an existing chain segment, which can fasten the node sync
process by retaining the chain freezer folder.
Originally, if the triedb is regarded as uninitialized and the genesis block can
be found in the chain freezer, the genesis block along with genesis state will be
committed. This condition has been simplified to checking the presence of chain
config in key-value store. The existence of chain config can represent the genesis
has been committed.
This PR modifies how the metrics library handles `Enabled`: previously,
the package `init` decided whether to serve real metrics or just
dummy-types.
This has several drawbacks:
- During pkg init, we need to determine whether metrics are enabled or
not. So we first hacked in a check if certain geth-specific
commandline-flags were enabled. Then we added a similar check for
geth-env-vars. Then we almost added a very elaborate check for
toml-config-file, plus toml parsing.
- Using "real" types and dummy types interchangeably means that
everything is hidden behind interfaces. This has a performance penalty,
and also it just adds a lot of code.
This PR removes the interface stuff, uses concrete types, and allows for
the setting of Enabled to happen later. It is still assumed that
`metrics.Enable()` is invoked early on.
The somewhat 'heavy' operations, such as ticking meters and exp-decay,
now checks the enable-flag to prevent resource leak.
The change may be large, but it's mostly pretty trivial, and from the
last time I gutted the metrics, I ensured that we have fairly good test
coverage.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
This pull request introduces a state.Reader interface for state
accessing.
The interface could be implemented in various ways. It can be pure trie
only reader, or the combination of trie and state snapshot. What's more,
this interface allows us to have more flexibility in the future, e.g.
the
archive reader (for accessing archive state).
Additionally, this pull request removes the following metrics
- `chain/snapshot/account/reads`
- `chain/snapshot/storage/reads`
* cmd/geth, ethdb/pebble: polish method naming and code comment
* implement db stat for pebble
* cmd, core, ethdb, internal, trie: remove db property selector
* cmd, core, ethdb: fix function description
---------
Co-authored-by: prpeh <prpeh@proton.me>
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Here we add a Go API for running tracing plugins within the main block import process.
As an advanced user of geth, you can now create a Go file in eth/tracers/live/, and within
that file register your custom tracer implementation. Then recompile geth and select your tracer
on the command line. Hooks defined in the tracer will run whenever a block is processed.
The hook system is defined in package core/tracing. It uses a struct with callbacks, instead of
requiring an interface, for several reasons:
- We plan to keep this API stable long-term. The core/tracing hook API does not depend on
on deep geth internals.
- There are a lot of hooks, and tracers will only need some of them. Using a struct allows you
to implement only the hooks you want to actually use.
All existing tracers in eth/tracers/native have been rewritten to use the new hook system.
This change breaks compatibility with the vm.EVMLogger interface that we used to have.
If you are a user of vm.EVMLogger, please migrate to core/tracing, and sorry for breaking
your stuff. But we just couldn't have both the old and new tracing APIs coexist in the EVM.
---------
Co-authored-by: Matthieu Vachon <matthieu.o.vachon@gmail.com>
Co-authored-by: Delweng <delweng@gmail.com>
Co-authored-by: Martin HS <martin@swende.se>
* all: implement era format, add history importer/export
* internal/era/e2store: refactor e2store to provide ReadAt interface
* internal/era/e2store: export HeaderSize
* internal/era: refactor era to use ReadAt interface
* internal/era: elevate anonymous func to named
* cmd/utils: don't store entire era file in-memory during import / export
* internal/era: better abstraction between era and e2store
* cmd/era: properly close era files
* cmd/era: don't let defers stack
* cmd/geth: add description for import-history
* cmd/utils: better bytes buffer
* internal/era: error if accumulator has more records than max allowed
* internal/era: better doc comment
* internal/era/e2store: rm superfluous reader, rm superfluous testcases, add fuzzer
* internal/era: avoid some repetition
* internal/era: simplify clauses
* internal/era: unexport things
* internal/era,cmd/utils,cmd/era: change to iterator interface for reading era entries
* cmd/utils: better defer handling in history test
* internal/era,cmd: add number method to era iterator to get the current block number
* internal/era/e2store: avoid double allocation during write
* internal/era,cmd/utils: fix lint issues
* internal/era: add ReaderAt func so entry value can be read lazily
Co-authored-by: lightclient <lightclient@protonmail.com>
Co-authored-by: Martin Holst Swende <martin@swende.se>
* internal/era: improve iterator interface
* internal/era: fix rlp decode of header and correctly read total difficulty
* cmd/era: fix rebase errors
* cmd/era: clearer comments
* cmd,internal: fix comment typos
---------
Co-authored-by: Martin Holst Swende <martin@swende.se>
There were several problems related to dumping state.
- If a preimage was missing, even if we had set the `OnlyWithAddresses` to `false`, to export them anyway, the way the mapping was constructed (using `common.Address` as key) made the entries get lost anyway. Concerns both state- and blockchain tests.
- Blockchain test execution was not configured to store preimages.
This changes makes it so that the block test executor takes a callback, just like the state test executor already does. This callback can be used to examine the post-execution state, e.g. to aid debugging of test failures.
Adds a subcommand: `geth snapshot export-preimages`, to export preimages of every hash found during a snapshot enumeration: that is, it exports _only the active state_, and not _all_ preimages that have been used but are no longer part of the state.
This tool is needed for the verkle transition, in order to distribute the preimages needed for the conversion. Since only the 'active' preimages are exported, the output is shrunk from ~70GB to ~4GB.
The order of the output is the order used by the snapshot enumeration, which avoids database thrashing. However, it also means that storage-slot preimages are not deduplicated.
This change allows the creation of a genesis block for verkle testnets. This makes for a chunk of code that is easier to review and still touches many discussion points.
* cmd, core: resolve scheme from a read-write database
* cmd, core, eth: move the scheme check in the ethereum constructor
* cmd/geth: dump should in ro mode
* cmd: reverts
This PR introduces a node scheme abstraction. The interface is only implemented by `hashScheme` at the moment, but will be extended by `pathScheme` very soon.
Apart from that, a few changes are also included which is worth mentioning:
- port the changes in the stacktrie, tracking the path prefix of nodes during commit
- use ethdb.Database for constructing trie.Database. This is not necessary right now, but it is required for path-based used to open reverse diff freezer
This PR fixes a regression causing snapshots not to be generated in "geth --import" mode. It also fixes the geth export command to be truly readonly, and adds a new test for geth export.
`geth dumpgenesis` currently does not respect the content of the data directory. Instead, it outputs the genesis block created by command-line flags. This PR fixes it to read the genesis from the database, if the database already exists.
Co-authored-by: Martin Holst Swende <martin@swende.se>
This change updates our urfave/cli dependency to the v2 branch of the library.
There are some Go API changes in cli v2:
- Flag values can now be accessed using the methods ctx.Bool,
ctx.Int, ctx.String, ... regardless of whether the flag is 'local' or
'global'.
- v2 has built-in support for flag categories. Our home-grown category
system is removed and the categories of flags are assigned as part of
the flag definition.
For users, there is only one observable difference with cli v2: flags must now
strictly appear before regular arguments. For example, the following command is
now invalid:
geth account import mykey.json --password file.txt
Instead, the command must be invoked as follows:
geth account import --password file.txt mykey.json
This PR groups all built-in network flags together and list them in the command as a whole.
And all database path flags(datadir, ancient) are also grouped, since usually these two are
used together.