go-ethereum/core/vm/stack.go
Jonny Rhea 6e62cc5aa8
core/vm: compute stack operations in place (#35156)
The stack primitives pop by value: pop() returns the 32-byte value
itself, so every popped operand is copied out of the stack arena before
it is used. The result side was already in place: peek returns a pointer
and binary ops write into the new stack top. This PR fixes the operand
side by adding pointer-returning primitives (popPtr, popPtrPeek, etc.)
and rewriting the handlers to read operands directly from their arena slots.
Every popped operand paid the copy, whatever the op went on to do with
it, so this optimization covers the arithmetic and comparison ops as
much as JUMP, MSTORE, SSTORE and RETURN.
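
As a rough illustration of the difference between the two operand styles, here is a minimal, self-contained sketch. It does not use the real types: `word` is a hypothetical stand-in for uint256.Int, `stack` is a toy single-frame stack, and `ltByValue` / `ltInPlace` are invented names for an LT-like handler. Only `pop1Peek1` mirrors the shape of the primitive of that name in stack.go.

```go
package main

import "fmt"

// word is a hypothetical stand-in for uint256.Int: four 64-bit limbs,
// least significant limb first.
type word [4]uint64

// lt reports whether a < b, comparing from the most significant limb down.
func (a *word) lt(b *word) bool {
	for i := 3; i >= 0; i-- {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// stack is a toy single-frame stack; top is the first free slot.
type stack struct {
	data [16]word
	top  int
}

func (s *stack) push(w word) { s.data[s.top] = w; s.top++ }

// pop returns the popped value itself, so all 32 bytes are copied out.
func (s *stack) pop() word {
	s.top--
	return s.data[s.top]
}

// peek returns a pointer to the current top element.
func (s *stack) peek() *word { return &s.data[s.top-1] }

// pop1Peek1 pops one element and returns pointers to it and to the new
// top. The first pointer is only valid until the next push.
func (s *stack) pop1Peek1() (top, rest *word) {
	s.top--
	return &s.data[s.top], &s.data[s.top-1]
}

// ltByValue copies the operand out of the stack; only the result is in place.
func ltByValue(s *stack) {
	x, y := s.pop(), s.peek()
	if x.lt(y) {
		*y = word{1}
	} else {
		*y = word{}
	}
}

// ltInPlace reads the operand directly from its slot and writes the
// result into the new top.
func ltInPlace(s *stack) {
	x, y := s.pop1Peek1()
	if x.lt(y) {
		*y = word{1}
	} else {
		*y = word{}
	}
}

func main() {
	for _, op := range []func(*stack){ltByValue, ltInPlace} {
		s := &stack{}
		s.push(word{7}) // second operand
		s.push(word{3}) // top operand
		op(s)
		fmt.Println(s.top, s.data[0][0])
	}
}
```

Both variants leave the same result on the stack; the only difference in this sketch is whether the popped operand is copied into a local before the comparison.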

The copy is visible in the assembly. On arm64, master's opLt spends four
instructions moving the popped value through the frame, and the
comparison then reads it back from there:

    LDP  (R5), (R6, R7)              ; load words 0 and 1 of the popped value from the arena
    LDP  16(R5), (R5, R8)            ; load words 2 and 3
    STP  (R6, R7), vm.~r0-64(SP)     ; store words 0 and 1 into a frame slot
    STP  (R5, R8), vm.~r0-48(SP)     ; store words 2 and 3

With popPtrPeek those four instructions are gone, the frame shrinks from
locals=0x58 to locals=0x18, and the function from 336 to 288 bytes. The
compiler cannot remove the copy itself: uint256.Int is a four-element
array, and Go's SSA does not promote arrays longer than one element to
registers, so a by-value pop pays this round trip no matter how far
inlining gets, for LT exactly as for ADD.

The CALL and CREATE families are deliberately not converted: a child
frame reuses the same stack arena, so parent pointers into popped slots
die when the child pushes. The rule is recorded on the primitives:
pointers stay valid until the next push or any sub call. Converting the
call family safely means materializing scalars before the child call;
that is left for later work, with a call-heavy benchmark to justify it.
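
The hazard can be reproduced with a toy model. The sketch below is an assumption-laden illustration, not the interpreter's code: `word`, `arena`, `frame`, `callUnsafe` and `callSafe` are hypothetical names, and the "child call" is reduced to a child frame pushing one value onto the shared arena.

```go
package main

import "fmt"

// word is a hypothetical stand-in for uint256.Int.
type word [4]uint64

// arena is shared by all frames; top is the first free slot.
type arena struct {
	data [32]word
	top  int
}

// frame is a toy call frame whose stack starts at bottom inside the arena.
type frame struct {
	a      *arena
	bottom int
}

func newFrame(a *arena) *frame { return &frame{a: a, bottom: a.top} }

func (f *frame) push(w word) { f.a.data[f.a.top] = w; f.a.top++ }

// pop1 returns a pointer to the popped slot; the slot is free for reuse.
func (f *frame) pop1() *word {
	f.a.top--
	return &f.a.data[f.a.top]
}

// release hands the frame's region back to the arena.
func (f *frame) release() { f.a.top = f.bottom }

// callUnsafe keeps a pointer to a popped slot across the child frame.
func callUnsafe(parent *frame) uint64 {
	value := parent.pop1()
	child := newFrame(parent.a)
	child.push(word{0xdead}) // lands in the slot that value points at
	child.release()
	return value[0] // reads the child's data, not the parent's operand
}

// callSafe materializes the scalar before the child frame runs.
func callSafe(parent *frame) uint64 {
	value := parent.pop1()[0] // copy the limb out of the arena first
	child := newFrame(parent.a)
	child.push(word{0xdead})
	child.release()
	return value
}

func main() {
	a := &arena{}
	parent := newFrame(a)

	parent.push(word{42})
	fmt.Printf("unsafe: %#x\n", callUnsafe(parent))

	parent.push(word{42})
	fmt.Printf("safe:   %d\n", callSafe(parent))
}
```

In this model the unsafe variant returns the child's value because the child frame starts exactly at the slot the parent just popped, which is the situation the "valid until the next push or any sub call" rule guards against.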

### Benchmarks

Measured with the benchmark suite from #35144 (the evm-bench contract
workloads and the block import benchmark), which is not part of this
PR's diff. Apple M4 Max, fixed iteration counts, n=10, all p=0.000. B/op
and allocs/op are statistically identical on every benchmark:

| benchmark | master | PR | vs master |
|---|---|---|---|
| Snailtracer | 60.0 ms | 54.1 ms | -9.8% |
| TenThousandHashes | 13.2 ms | 12.2 ms | -7.8% |
| ERC20Transfer | 11.7 ms | 11.0 ms | -5.5% |
| ERC20Mint | 7.49 ms | 7.02 ms | -6.2% |
| ERC20ApprovalTransfer | 8.92 ms | 8.44 ms | -5.4% |

This PR is independent of #35144 but plays nicely with it: the generated
dispatch there splices these handler bodies, so the in-place forms land
in its fast path too, where they measure larger.

### Testing

The rewritten handlers run on the interpreter's only execution path, so
correctness rests on references outside the change:

- **Consensus fixtures.** The full tests package passes: state tests,
the execution-spec families, blockchain tests.
- **Opcode testcases.** The JSON testcases compare individual opcode
results against committed expected values.
- **Tracer fixtures.** The tracetest reference files pin exact log and
return data shapes, covering the rewritten LOG and RETURN paths.
- **Cross-build differential.** A goevmlab campaign running this
branch's evm against master's evm over generated state tests across four
forks (Prague, Cancun, London, Osaka) with full trace comparison:
160,566 tests, zero divergences.

---------

Co-authored-by: MariusVanDerWijden <m.vanderwijden@live.de>
2026-06-16 07:47:05 -05:00


// Copyright 2014 The go-ethereum Authors
// This file is part of the go-ethereum library.
//
// The go-ethereum library is free software: you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// The go-ethereum library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public License
// along with the go-ethereum library. If not, see <http://www.gnu.org/licenses/>.

package vm

import (
	"slices"
	"sync"

	"github.com/holiman/uint256"
)

// stackArena is an arena which actual evm stacks use for data storage
type stackArena struct {
	data []uint256.Int
	top  int // first free slot
}

func newArena() *stackArena {
	return stackPool.Get().(*stackArena)
}

// 1025, because in stack() there is a condition check
// for the stack size that would fail if it was set to
// 1024.
const initialStackSize = 1025

var stackPool = sync.Pool{
	New: func() any {
		return &stackArena{
			data: make([]uint256.Int, initialStackSize),
		}
	},
}

func returnStack(arena *stackArena) {
	arena.top = 0 // defensive, not strictly needed as s.inner.top = s.bottom in release()
	stackPool.Put(arena)
}

// stack returns an instance of a stack which uses the underlying arena. The instance
// must be released by invoking the (*Stack).release() method
func (sa *stackArena) stack() *Stack {
	// make sure every substack has at least 1024 elements
	if len(sa.data) <= sa.top+1024 {
		// we need to grow the arena
		sa.data = slices.Grow(sa.data, 1024)
		sa.data = sa.data[:cap(sa.data)]
	}
	return &Stack{
		bottom: sa.top,
		size:   0,
		inner:  sa,
	}
}

// newStackForTesting is meant to be used solely for testing. It creates a stack
// backed by a newly allocated arena.
func newStackForTesting() *Stack {
	arena := &stackArena{
		data: make([]uint256.Int, 1025),
	}
	return arena.stack()
}

// Stack is an object for basic stack operations. Items popped to the stack are
// expected to be changed and modified. stack does not take care of adding newly
// initialized objects.
type Stack struct {
	bottom int // bottom is the index of the first element of this stack
	size   int // size is the number of elements in this stack
	inner  *stackArena
}

// release un-claims the area of the arena which was claimed by the stack.
func (s *Stack) release() {
	// When the stack is returned, need to notify the arena that the new 'top' is
	// the returned stack's bottom.
	s.inner.top = s.bottom
}

// Data returns the elements of this stack, bottom first, as a slice of the
// underlying arena.
func (s *Stack) Data() []uint256.Int {
	return s.inner.data[s.bottom : s.bottom+s.size]
}

// push copies d onto the top of the stack.
func (s *Stack) push(d *uint256.Int) {
	elem := s.get()
	*elem = *d
}

// get returns a pointer to a newly created element
// on top of the stack
func (s *Stack) get() *uint256.Int {
	elem := &s.inner.data[s.inner.top]
	s.inner.top++
	s.size++
	return elem
}

// pop removes the top element and returns it by value.
func (s *Stack) pop() uint256.Int {
	s.inner.top--
	s.size--
	return s.inner.data[s.inner.top]
}

// len returns the number of elements in this stack.
func (s *Stack) len() int {
	return s.size
}

// drop removes the top element without reading it.
func (s *Stack) drop() {
	s.inner.top--
	s.size--
}

// pop1 removes the top element and returns a pointer to it. The pointer
// stays valid only until the next push or sub call.
func (s *Stack) pop1() *uint256.Int {
	s.inner.top--
	s.size--
	return &s.inner.data[s.inner.top]
}

// pop2 removes the top two elements and returns pointers to them. The
// pointers stay valid only until the next push or sub call.
func (s *Stack) pop2() (top, second *uint256.Int) {
	s.inner.top -= 2
	s.size -= 2
	return &s.inner.data[s.inner.top+1], &s.inner.data[s.inner.top]
}

// pop3 removes the top three elements and returns pointers to them. The
// pointers stay valid only until the next push or sub call.
func (s *Stack) pop3() (top, second, third *uint256.Int) {
	s.inner.top -= 3
	s.size -= 3
	return &s.inner.data[s.inner.top+2], &s.inner.data[s.inner.top+1], &s.inner.data[s.inner.top]
}

// pop4 removes the top four elements and returns pointers to them. The
// pointers stay valid only until the next push or sub call.
func (s *Stack) pop4() (top, second, third, fourth *uint256.Int) {
	s.inner.top -= 4
	s.size -= 4
	return &s.inner.data[s.inner.top+3], &s.inner.data[s.inner.top+2], &s.inner.data[s.inner.top+1], &s.inner.data[s.inner.top]
}

// pop1Peek1 removes the top element and returns pointers to it and to the new
// top, the usual operand and write target of a binary operation. The first
// pointer stays valid only until the next push or sub call.
func (s *Stack) pop1Peek1() (top, rest *uint256.Int) {
	s.inner.top--
	s.size--
	return &s.inner.data[s.inner.top], &s.inner.data[s.inner.top-1]
}

// pop2Peek1 removes the top two elements and returns pointers to them and to
// the new top, for three operand operations. The first two pointers stay
// valid only until the next push or sub call.
func (s *Stack) pop2Peek1() (top, second, rest *uint256.Int) {
	s.inner.top -= 2
	s.size -= 2
	return &s.inner.data[s.inner.top+1], &s.inner.data[s.inner.top], &s.inner.data[s.inner.top-1]
}

// swap1 exchanges the top element with the one directly below it. swap2
// through swap16 do the same with elements at increasing depth.
func (s *Stack) swap1() {
	s.inner.data[s.bottom+s.size-2], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-2]
}

func (s *Stack) swap2() {
	s.inner.data[s.bottom+s.size-3], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-3]
}

func (s *Stack) swap3() {
	s.inner.data[s.bottom+s.size-4], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-4]
}

func (s *Stack) swap4() {
	s.inner.data[s.bottom+s.size-5], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-5]
}

func (s *Stack) swap5() {
	s.inner.data[s.bottom+s.size-6], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-6]
}

func (s *Stack) swap6() {
	s.inner.data[s.bottom+s.size-7], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-7]
}

func (s *Stack) swap7() {
	s.inner.data[s.bottom+s.size-8], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-8]
}

func (s *Stack) swap8() {
	s.inner.data[s.bottom+s.size-9], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-9]
}

func (s *Stack) swap9() {
	s.inner.data[s.bottom+s.size-10], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-10]
}

func (s *Stack) swap10() {
	s.inner.data[s.bottom+s.size-11], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-11]
}

func (s *Stack) swap11() {
	s.inner.data[s.bottom+s.size-12], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-12]
}

func (s *Stack) swap12() {
	s.inner.data[s.bottom+s.size-13], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-13]
}

func (s *Stack) swap13() {
	s.inner.data[s.bottom+s.size-14], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-14]
}

func (s *Stack) swap14() {
	s.inner.data[s.bottom+s.size-15], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-15]
}

func (s *Stack) swap15() {
	s.inner.data[s.bottom+s.size-16], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-16]
}

func (s *Stack) swap16() {
	s.inner.data[s.bottom+s.size-17], s.inner.data[s.bottom+s.size-1] = s.inner.data[s.bottom+s.size-1], s.inner.data[s.bottom+s.size-17]
}
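
// dup pushes a copy of the n'th item from the top of the stack, where
// n = 1 is the current top.
func (s *Stack) dup(n int) {
	s.inner.data[s.bottom+s.size] = s.inner.data[s.bottom+s.size-n]
	s.size++
	s.inner.top++
}

// peek returns a pointer to the top element without removing it.
func (s *Stack) peek() *uint256.Int {
	return &s.inner.data[s.bottom+s.size-1]
}

// back returns a pointer to the n'th item from the top of the stack,
// where n = 0 is the top.
func (s *Stack) back(n int) *uint256.Int {
	return &s.inner.data[s.bottom+s.size-n-1]
}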