go-ethereum

mirror of https://github.com/ethereum/go-ethereum.git synced 2026-02-26 15:47:21 +00:00

Author	SHA1	Message	Date
rjl493456442	902ec5baae	cmd, core, eth, triedb/pathdb: track node origins in the path database (#32418 ) This PR is the first step in the trienode history series. It introduces the `nodeWithOrigin` struct in the path database, which tracks the original values of dirty nodes to support trienode history construction. Note, the original value is always empty in this PR, so it won't break the existing journal for encoding and decoding. The compatibility of journal should be handled in the following PR.	2025-09-05 10:37:05 +08:00
rjl493456442	7f78fa6912	triedb/pathdb, core: keep root->id mappings after truncation (#32502 ) This pull request preserves the root->ID mappings in the path database even after the associated state histories are truncated, regardless of whether the truncation occurs at the head or the tail. The motivation is to support an additional history type, trienode history. Since the root->ID mappings are shared between two history instances, they must not be removed by either one. As a consequence, the root->ID mappings remain in the database even after the corresponding histories are pruned. While these mappings may become dangling, it is safe and cheap to keep them. Additionally, this pull request enhances validation during historical reader construction, ensuring that only canonical historical state will be served.	2025-08-29 15:43:58 +08:00
rjl493456442	95ab643bb8	triedb/pathdb: refactor state history write (#32497 ) This pull request refactors the internal implementation in path database a bit, specifically: - purge the state index data in batch - simplify the logic of state history construction and index, make it more readable	2025-08-26 21:53:55 +08:00
rjl493456442	8c58f4920d	triedb/pathdb: rename history to state history (#32498 ) This is a internal refactoring PR, renaming the history to stateHistory. It's a pre-requisite PR for merging trienode history, avoid the name conflict.	2025-08-26 08:52:39 +02:00
rjl493456442	9c5c0e37bf	core/rawdb, triedb/pathdb: implement history indexer (#31156 ) This pull request is part-1 for shipping the core part of archive node in PBSS mode.	2025-06-24 14:36:12 +02:00
rjl493456442	21920207e4	triedb/pathdb, eth: use double-buffer mechanism in pathdb (#30464 ) Some checks are pending / Linux Build (push) Waiting to run Details / Linux Build (arm) (push) Waiting to run Details / Docker Image (push) Waiting to run Details Previously, PathDB used a single buffer to aggregate database writes, which needed to be flushed atomically. However, flushing large amounts of data (e.g., 256MB) caused significant overhead, often blocking the system for around 3 seconds during the flush. To mitigate this overhead and reduce performance spikes, a double-buffer mechanism is introduced. When the active buffer fills up, it is marked as frozen and a background flushing process is triggered. Meanwhile, a new buffer is allocated for incoming writes, allowing operations to continue uninterrupted. This approach reduces system blocking times and provides flexibility in adjusting buffer parameters for improved performance.	2025-06-22 20:40:54 +08:00
rjl493456442	8b9f2d4e36	triedb/pathdb: introduce lookup structure to optimize state access (#30971 ) This pull request introduces a mechanism to improve state lookup efficiency in pathdb by maintaining a lookup structure that eliminates unnecessary iteration over diff layers. The core idea is to track a mutation history for each dirty state entry residing in the diff layers. This history records the state roots of all layers in which the entry was modified, sorted from oldest to newest. During state lookup, this mutation history is queried to find the most recent layer whose state root either matches the target root or is a descendant of it. This allows us to quickly identify the layer containing the relevant data, avoiding the need to iterate through all diff layers from top to bottom. Besides, the overhead for state lookup is constant, no matter how many diff layers are retained in the pathdb, which unlocks the potential to hold more diff layers. Of course, maintaining this lookup structure introduces some overhead. For each state transition, we need to: (a) update the mutation records for the modified state entries, and (b) remove stale mutation records associated with outdated layers. On our benchmark machine, it will introduce around 1ms overhead which is acceptable.	2025-05-28 13:31:42 +02:00
rjl493456442	892a661ee2	core, triedb/pathdb: final integration (snapshot integration pt 5) (#30661 ) In this pull request, snapshot generation in pathdb has been ported from the legacy state snapshot implementation. Additionally, when running in path mode, legacy state snapshot data is now managed by the pathdb based snapshot logic. Note: Existing snapshot data will be re-generated, regardless of whether it was previously fully constructed.	2025-05-16 18:29:38 +08:00
Marius van der Wijden	0eb2eeea90	all: create global hasher pool (#31769 ) This PR creates a global hasher pool that can be used by all packages. It also removes a bunch of the package local pools. It also updates a few locations to use available hashers or the global hashing pool to reduce allocations all over the codebase. This change should reduce global allocation count by ~1% --------- Co-authored-by: Gary Rong <garyrong0905@gmail.com>	2025-05-09 13:52:40 +08:00
rjl493456442	0f48cbf017	core, triedb/pathdb: bail out error if write state history fails (#31781 ) This PR fixes an issue that could lead to data corruption. Writing the state history may fail due to insufficient disk space or other potential errors. With this change, the entire state insertion will be aborted instead of silently ignoring the error. Without this fix, state transitions would continue while the associated state history is lost. After a restart, the resulting gap would be detected, making recovery impossible.	2025-05-08 22:27:01 +08:00
rjl493456442	a840e9b59f	triedb/pathdb: fix state revert on v2 history (#31060 ) State history v2 has been shipped and will take effect after the Cancun fork. However, the state revert function does not differentiate between v1 and v2, instead blindly using the storage map key for state reversion. This mismatch between the keys of the live state set and the state history can trigger a panic: `non-existent storage slot for reverting`. This flaw has been fixed in this PR.	2025-01-22 14:06:36 +01:00
rjl493456442	a7f9523ae1	all: implement state history v2 (#30107 ) This pull request delivers the new version of the state history, where the raw storage key is used instead of the hash. Before the cancun fork, it's supported by protocol to destruct a specific account and therefore, all the storage slot owned by it should be wiped in the same transition. Technically, storage wiping should be performed through storage iteration, and only the storage key hash will be available for traversal if the state snapshot is not available. Therefore, the storage key hash is chosen as the identifier in the old version state history. Fortunately, account self-destruction has been deprecated by the protocol since the Cancun fork, and there are no empty accounts eligible for deletion under EIP-158. Therefore, we can conclude that no storage wiping should occur after the Cancun fork. In this case, it makes no sense to keep using hash. Besides, another big reason for making this change is the current format state history is unusable if verkle is activated. Verkle tree has a different key derivation scheme (merkle uses keccak256), the preimage of key hash must be provided in order to make verkle rollback functional. This pull request is a prerequisite for landing verkle. Additionally, the raw storage key is more human-friendly for those who want to manually check the history, even though Solidity already performs some hashing to derive the storage location. --- This pull request doesn't bump the database version, as I believe the database should still be compatible if users degrade from the new geth version to old one, the only side effect is the persistent new version state history will be unusable. --------- Co-authored-by: Zsolt Felfoldi <zsfelfoldi@gmail.com>	2025-01-17 02:59:02 +01:00
rjl493456442	05148d972c	triedb/pathdb: track flat state changes in pathdb (snapshot integration pt 2) (#30643 ) This pull request ports some changes from the main state snapshot integration one, specifically introducing the flat state tracking in pathdb. Note, the tracked flat state changes are only held in memory and won't be persisted in the disk. Meanwhile, the correspoding state retrieval in persistent state is also not supported yet. The states management in disk is more complicated and will be implemented in a separate pull request. Part 1: https://github.com/ethereum/go-ethereum/pull/30752	2024-11-29 19:30:45 +08:00
rjl493456442	b6c62d5887	core, trie, triedb: minor changes from snapshot integration (#30599 ) This change ports some non-important changes from https://github.com/ethereum/go-ethereum/pull/30159, including interface renaming and some trivial refactorings.	2024-10-18 17:06:31 +02:00
rjl493456442	eff0bed91b	core/rawdb: freezer index repair (#29792 ) This pull request removes the `fsync` of index files in freezer.ModifyAncients function for performance gain. Originally, fsync is added after each freezer write operation to ensure the written data is truly transferred into disk. Unfortunately, it turns out `fsync` can be relatively slow, especially on macOS (see https://github.com/ethereum/go-ethereum/issues/28754 for more information). In this pull request, fsync for index file is removed as it turns out index file can be recovered even after a unclean shutdown. But fsync for data file is still kept, as we have no meaningful way to validate the data correctness after unclean shutdown. --- But why do we need the `fsync` in the first place? As it's necessary for freezer to survive/recover after the machine crash (e.g. power failure). In linux, whenever the file write is performed, the file metadata update and data update are not necessarily performed at the same time. Typically, the metadata will be flushed/journalled ahead of the file data. Therefore, we make the pessimistic assumption that the file is first extended with invalid "garbage" data (normally zero bytes) and that afterwards the correct data replaces the garbage. We have observed that the index file of the freezer often contain garbage entry with zero value (filenumber = 0, offset = 0) after a machine power failure. It proves that the index file is extended without the data being flushed. And this corruption can destroy the whole freezer data eventually. Performing fsync after each write operation can reduce the time window for data to be transferred to the disk and ensure the correctness of the data in the disk to the greatest extent. --- How can we maintain this guarantee without relying on fsync? Because the items in the index file are strictly in order, we can leverage this characteristic to detect the corruption and truncate them when freezer is opened. Specifically these validation rules are performed for each index file: For two consecutive index items: - If their file numbers are the same, then the offset of the latter one MUST not be less than that of the former. - If the file number of the latter one is equal to that of the former plus one, then the offset of the latter one MUST not be 0. - If their file numbers are not equal, and the latter's file number is not equal to the former plus 1, the latter one is valid And also, for the first non-head item, it must refer to the earliest data file, or the next file if the earliest file is not sufficient to place the first item(very special case, only theoretical possible in tests) With these validation rules, we can detect the invalid item in index file with greatest possibility. --- But unfortunately, these scenarios are not covered and could still lead to a freezer corruption if it occurs: All items in index file are in zero value It's impossible to distinguish if they are truly zero (e.g. all the data entries maintained in freezer are zero size) or just the garbage left by OS. In this case, these index items will be kept by truncating the entire data file, namely the freezer is corrupted. However, we can consider that the probability of this situation occurring is quite low, and even if it occurs, the freezer can be considered to be close to an empty state. Rerun the state sync should be acceptable. Index file is integral while relative data file is corrupted It might be possible the data file is corrupted whose file size is extended correctly with garbage filled (e.g. zero bytes). In this case, it's impossible to detect the corruption by index validation. We can either choose to `fsync` the data file, or blindly believe that if index file is integral then the data file could be integral with very high chance. In this pull request, the first option is taken.	2024-10-01 18:16:16 +02:00
rjl493456442	bfda8ae0c6	core: add metrics for state access (#30353 ) This pull request adds a few more performance metrics, specifically: - The average time cost of an account read - The average time cost of a storage read - The rate of account reads - The rate of storage reads	2024-08-26 20:02:10 +08:00
rjl493456442	045b9718d5	trie: relocate state execution logic into pathdb package (#29861 )	2024-06-27 20:30:39 +08:00
rjl493456442	9f96e07c1c	core/rawdb, trie: improve db APIs for accessing trie nodes (#29362 ) * core/rawdb, trie: improve db APIs for accessing trie nodes * triedb/pathdb: fix	2024-04-30 16:25:35 +02:00
Guillaume Ballet	da7469e5c4	core: add an end-to-end verkle test (#29262 ) core: add a simple verkle test triedb, core: skip hash comparison in verkle core: remove legacy daoFork logic in verkle chain maker fix: nil pointer in tests triedb/pathdb: add blob hex core: less defensive Co-authored-by: Ignacio Hagopian <jsign.uy@gmail.com> Co-authored-by: Martin HS <martin@swende.se> Co-authored-by: Gary Rong <garyrong0905@gmail.com>	2024-03-26 21:25:41 +01:00
Aaron Chen	723b1e36ad	all: fix mismatched names in comments (#29348 ) * all: fix mismatched names in comments * metrics: fix mismatched name in UpdateIfGt	2024-03-26 21:01:28 +01:00
rjl493456442	7b81cf6362	core/state, trie/triedb/pathdb: remove storage incomplete flag (#28940 ) As SELF-DESTRUCT opcode is disabled in the cancun fork(unless the account is created within the same transaction, nothing to delete in this case). The account will only be deleted in the following cases: - The account is created within the same transaction. In this case the original storage was empty. - The account is empty(zero nonce, zero balance, zero code) and is touched within the transaction. Fortunately this kind of accounts are not-existent on ethereum-mainnet. All in all, after cancun, we are pretty sure there is no large contract deletion and we don't need this mechanism for oom protection.	2024-03-05 14:31:55 +01:00
rjl493456442	fe91d476ba	all: remove the dependency from trie to triedb (#28824 ) This change removes the dependency from trie package to triedb package.	2024-02-13 14:49:53 +01:00

22 commits