Martin HS | 057667151b
core/types, trie: reduce allocations in derivesha (#30747)
Alternative to #30746, potential follow-up to #30743. This PR makes the
stacktrie always copy incoming value buffers and reuse them internally.
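The copy-and-reuse idea can be sketched in a few lines of Go. This is a minimal illustration assuming a simple free list of byte slices; `bufferPool`, `copyValue`, `release` and `steadyStateAllocs` are hypothetical names, not the StackTrie API.

```go
package main

import (
	"fmt"
	"testing"
)

// bufferPool is a hypothetical free list illustrating "copy incoming values,
// recycle the copies internally". It is a sketch, not go-ethereum's StackTrie.
type bufferPool struct {
	free [][]byte
}

// copyValue copies v into a recycled buffer when one is available. Because the
// bytes are copied, the caller may overwrite v as soon as this returns.
func (p *bufferPool) copyValue(v []byte) []byte {
	var buf []byte
	if n := len(p.free); n > 0 {
		buf = p.free[n-1]
		p.free = p.free[:n-1]
	}
	// append reuses buf's backing array when its capacity is large enough,
	// and allocates a new one only when it is not (or buf is nil).
	return append(buf[:0], v...)
}

// release hands a buffer back once its owner (e.g. a hashed node) is done with it.
func (p *bufferPool) release(buf []byte) {
	p.free = append(p.free, buf)
}

// steadyStateAllocs reports the average allocations per copy/release cycle.
// testing.AllocsPerRun does one warm-up call first, which fills the pool.
func steadyStateAllocs() float64 {
	p := &bufferPool{}
	value := []byte("rlp-encoded value")
	return testing.AllocsPerRun(100, func() {
		p.release(p.copyValue(value))
	})
}

func main() {
	p := &bufferPool{}
	scratch := []byte{1, 2, 3} // caller-owned buffer, reused between inserts
	held := p.copyValue(scratch)
	scratch[0] = 9                   // caller overwrites its buffer for the next value
	fmt.Println(held[0])             // 1: the stored copy is unaffected
	fmt.Println(steadyStateAllocs()) // 0 expected once the pool is warm
}
```

Copying is what lets a caller overwrite its own buffer between inserts; recycling the copies is what keeps the steady-state allocation count at zero in this sketch.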
Improvement in #30743:
```
goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
                          │ derivesha.1 │            derivesha.2             │
                          │   sec/op    │    sec/op     vs base              │
DeriveSha200/stack_trie-8   477.8µ ± 2%   430.0µ ± 12%  -10.00% (p=0.000 n=10)

                          │ derivesha.1  │            derivesha.2             │
                          │     B/op     │     B/op      vs base              │
DeriveSha200/stack_trie-8   45.17Ki ± 0%   25.65Ki ± 0%  -43.21% (p=0.000 n=10)

                          │ derivesha.1 │            derivesha.2            │
                          │  allocs/op  │ allocs/op   vs base               │
DeriveSha200/stack_trie-8   1259.0 ± 0%   232.0 ± 0%  -81.57% (p=0.000 n=10)
```
This PR further enhances that:
```
goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
                          │ derivesha.2  │            derivesha.3             │
                          │    sec/op    │    sec/op     vs base              │
DeriveSha200/stack_trie-8   430.0µ ± 12%   423.6µ ± 13%  ~ (p=0.739 n=10)

                          │  derivesha.2  │            derivesha.3             │
                          │     B/op      │     B/op      vs base              │
DeriveSha200/stack_trie-8   25.654Ki ± 0%   4.960Ki ± 0%  -80.67% (p=0.000 n=10)

                          │ derivesha.2 │            derivesha.3            │
                          │  allocs/op  │ allocs/op   vs base               │
DeriveSha200/stack_trie-8   232.00 ± 0%   37.00 ± 0%  -84.05% (p=0.000 n=10)
```
So the total DeriveSha improvement over *both PRs* is:
```
goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
                          │ derivesha.1 │            derivesha.3             │
                          │   sec/op    │    sec/op     vs base              │
DeriveSha200/stack_trie-8   477.8µ ± 2%   423.6µ ± 13%  -11.33% (p=0.015 n=10)

                          │  derivesha.1  │            derivesha.3             │
                          │     B/op      │     B/op      vs base              │
DeriveSha200/stack_trie-8   45.171Ki ± 0%   4.960Ki ± 0%  -89.02% (p=0.000 n=10)

                          │ derivesha.1  │            derivesha.3            │
                          │  allocs/op   │ allocs/op   vs base               │
DeriveSha200/stack_trie-8   1259.00 ± 0%   37.00 ± 0%  -97.06% (p=0.000 n=10)
```
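As a cross-check of the combined table, assuming the `vs base` column is the relative change between medians, the end-to-end B/op and allocs/op figures follow from multiplying the per-PR factors. The helpers `pctChange`, `compound` and `round2` are hypothetical; the inputs are the numbers printed in the tables above.

```go
package main

import (
	"fmt"
	"math"
)

// pctChange returns the relative change from base to cur, in percent.
func pctChange(base, cur float64) float64 {
	return (cur - base) / base * 100
}

// compound chains two relative changes (in percent): the second change is
// applied to the result of the first, so the factors (1+a) and (1+b) multiply.
func compound(a, b float64) float64 {
	return ((1+a/100)*(1+b/100) - 1) * 100
}

// round2 rounds to two decimals, the precision shown in the tables.
func round2(x float64) float64 { return math.Round(x*100) / 100 }

func main() {
	// B/op: direct change vs. chaining the two per-PR changes.
	fmt.Println(round2(pctChange(45.171, 4.960))) // -89.02
	fmt.Println(round2(compound(-43.21, -80.67))) // -89.02

	// allocs/op: same check.
	fmt.Println(round2(pctChange(1259, 37)))      // -97.06
	fmt.Println(round2(compound(-81.57, -84.05))) // -97.06
}
```

The sec/op row is left out: its second step is not statistically significant (`~`), so there is no per-PR delta to chain.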
Since this PR always copies the incoming value, it adds a small penalty to
the existing insert benchmark, which previously copied nothing (it always
passed the same empty slice as input):
```
goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/trie
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
             │ stacktrie.7  │            stacktrie.10            │
             │    sec/op    │    sec/op     vs base              │
Insert100K-8   88.21m ± 34%   92.37m ± 31%  ~ (p=0.280 n=10)

             │ stacktrie.7  │            stacktrie.10            │
             │     B/op     │     B/op      vs base              │
Insert100K-8   3.424Ki ± 3%   4.581Ki ± 3%  +33.80% (p=0.000 n=10)

             │ stacktrie.7 │           stacktrie.10            │
             │  allocs/op  │ allocs/op   vs base               │
Insert100K-8   22.00 ± 5%   26.00 ± 4%  +18.18% (p=0.000 n=10)
```
---------
Co-authored-by: Gary Rong <garyrong0905@gmail.com>
Co-authored-by: Felix Lange <fjl@twurst.com>
2025-10-01 10:05:49 +02:00