The stack primitives pop by value: pop() returns the 32-byte value
itself, so every popped operand is copied out of the stack arena before
it is used. The result side was already in place: peek returns a pointer
and binary ops write into the new stack top. This PR fixes the operand
side by adding pointer-returning primitives (popPtr, popPtrPeek, etc.)
and rewriting the handlers to read operands directly from their arena
slots.
Every popped operand paid the copy, whatever the op went on to do with
it, so this optimization covers the arithmetic and comparison ops as
much as JUMP, MSTORE, SSTORE and RETURN.
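
The difference between the two pop styles can be sketched in isolation. This is a minimal illustration, not the PR's code: `word` stands in for `uint256.Int`, the `stack` layout is simplified to a plain slice, and the signature given to `popPtrPeek` is an assumption based on its name.

```go
package main

import "fmt"

// word stands in for uint256.Int: four 64-bit limbs, least significant first.
type word [4]uint64

// lt reports whether a < b, comparing from the most significant limb down.
func (a *word) lt(b *word) bool {
	for i := 3; i >= 0; i-- {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// stack is a simplified value-slice stack (hypothetical layout).
type stack struct{ data []word }

func (s *stack) push(w word) { s.data = append(s.data, w) }

// peek returns a pointer to the current top slot.
func (s *stack) peek() *word { return &s.data[len(s.data)-1] }

// pop returns the top value itself, copying all four limbs out of the slot.
func (s *stack) pop() word {
	w := s.data[len(s.data)-1]
	s.data = s.data[:len(s.data)-1]
	return w
}

// popPtrPeek pops one operand and returns pointers to the popped slot and
// to the new top. The popped pointer is valid only until the next push.
func (s *stack) popPtrPeek() (popped, top *word) {
	n := len(s.data)
	popped, top = &s.data[n-1], &s.data[n-2]
	s.data = s.data[:n-1]
	return popped, top
}

// setBool overwrites dst with 1 or 0.
func setBool(dst *word, b bool) {
	*dst = word{}
	if b {
		dst[0] = 1
	}
}

// opLtByValue copies x out of the stack before comparing.
func opLtByValue(s *stack) {
	x, y := s.pop(), s.peek()
	setBool(y, x.lt(y))
}

// opLtInPlace compares x where it already sits in the backing array.
func opLtInPlace(s *stack) {
	x, y := s.popPtrPeek()
	setBool(y, x.lt(y))
}

func main() {
	for _, op := range []func(*stack){opLtByValue, opLtInPlace} {
		s := &stack{}
		s.push(word{7}) // y
		s.push(word{3}) // x, on top
		op(s)
		fmt.Println(s.data[0][0], len(s.data))
	}
}
```

Both handlers leave the same result on the stack; only the by-value form moves the operand through a local before reading it.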
The copy is visible in the assembly. On arm64, master's opLt spends four
instructions moving the popped value through the frame, and the
comparison then reads it back from there:

    LDP (R5), (R6, R7) ; load words 0 and 1 of the popped value from the arena
    LDP 16(R5), (R5, R8) ; load words 2 and 3
    STP (R6, R7), vm.~r0-64(SP) ; store words 0 and 1 into a frame slot
    STP (R5, R8), vm.~r0-48(SP) ; store words 2 and 3

With popPtrPeek those four instructions are gone, the frame shrinks from
locals=0x58 to locals=0x18, and the function from 336 to 288 bytes. The
compiler cannot remove the copy itself: uint256.Int is a four-element
array, and Go's SSA does not promote arrays longer than one element to
registers, so a by-value pop pays this round trip no matter how far
inlining gets, for LT exactly as for ADD.
The CALL and CREATE families are deliberately not converted: a child
frame reuses the same stack arena, so parent pointers into popped slots
die when the child pushes. The rule is recorded on the primitives:
pointers stay valid until the next push or any sub call. Converting the
call family safely means materializing scalars before the child call;
that is left for later work, with a call-heavy benchmark to justify it.
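
The hazard behind this rule can be reproduced with a toy arena. The sketch below is an illustration under assumptions: the `arena` type, its fields, and this `popPtr` are hypothetical stand-ins, and the child frame is modelled simply as another push into the same backing slice.

```go
package main

import "fmt"

// word stands in for uint256.Int.
type word [4]uint64

// arena is a hypothetical shared backing store; top is the next free slot.
type arena struct {
	slots []word
	top   int
}

func (a *arena) push(w word) {
	a.slots[a.top] = w
	a.top++
}

// popPtr returns a pointer to the popped slot. The slot is free again, so
// the pointer is valid only until the next push or sub call.
func (a *arena) popPtr() *word {
	a.top--
	return &a.slots[a.top]
}

// demo returns the value materialized before the child ran, and the value
// seen through the stale pointer afterwards.
func demo() (materialized, stale uint64) {
	a := &arena{slots: make([]word, 16)}
	a.push(word{21000}) // parent pushes a gas operand

	gasPtr := a.popPtr()     // parent pops it by pointer
	materialized = gasPtr[0] // copy the scalar out before the child runs

	a.push(word{0xdead}) // child frame pushes into the same arena slot

	stale = gasPtr[0] // the pointer now sees the child's value
	return materialized, stale
}

func main() {
	m, s := demo()
	fmt.Println("materialized before child:", m)
	fmt.Println("read through stale pointer:", s)
}
```

Copying the needed scalars out before the child call is the "materializing" step a converted call family would have to perform.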
### Benchmarks
Measured with the benchmark suite from #35144 (the evm-bench contract
workloads and the block import benchmark), which is not part of this
PR's diff. Apple M4 Max, fixed iteration counts, n=10, all p=0.000. B/op
and allocs/op are statistically identical on every benchmark:
| benchmark | master | PR | vs master |
|---|---|---|---|
| Snailtracer | 60.0 ms | 54.1 ms | -9.8% |
| TenThousandHashes | 13.2 ms | 12.2 ms | -7.8% |
| ERC20Transfer | 11.7 ms | 11.0 ms | -5.5% |
| ERC20Mint | 7.49 ms | 7.02 ms | -6.2% |
| ERC20ApprovalTransfer | 8.92 ms | 8.44 ms | -5.4% |
This PR is independent of #35144 but plays nicely with it: the generated
dispatch there splices these handler bodies, so the in-place forms land
in its fast path too, where they measure larger.
### Testing
The rewritten handlers run on the interpreter's only execution path, so
correctness rests on references outside the change:
- **Consensus fixtures.** The full tests package passes: state tests,
the execution-spec families, blockchain tests.
- **Opcode testcases.** The JSON testcases compare individual opcode
results against committed expected values.
- **Tracer fixtures.** The tracetest reference files pin exact log and
return data shapes, covering the rewritten LOG and RETURN paths.
- **Cross-build differential.** A goevmlab campaign running this
branch's evm against master's evm over generated state tests across four
forks (Prague, Cancun, London, Osaka) with full trace comparison:
160,566 tests, zero divergences.
---------
Co-authored-by: MariusVanDerWijden <m.vanderwijden@live.de>
Here, we change the EVM stack implementation to use an 'arena', i.e.
a shared allocation pool for sub-call stacks. The stack is now more
GC-friendly, since it is a slice of uint256 values instead of a slice of pointers.
Code that pushes an item to the stack has been changed to get() the top
item, then overwrite it.
The PR is a rewrite/rebase of #30362.
---------
Co-authored-by: Martin Holst Swende <martin@swende.se>
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
This change introduces two new optional methods, `enter()` and `exit()`, for JS tracers, and makes `step()` optional. The two new methods are invoked when entering and exiting a call frame (but not for the outermost scope, which has its own methods). Currently these are the data fields passed to each of them:
enter: type (opcode), from, to, input, gas, value
exit: output, gasUsed, error
The PR also comes with a rewrite of the callTracer. As a backup we keep the previous tracing script under the name `callTracerLegacy`. The behaviour of the two tracers is equivalent for the most part, although there are some small differences (improvements) where the new tracer is more correct or has more information.
* core/vm: use fixed uint256 library instead of big
* core/vm: remove intpools
* core/vm: upgrade uint256, fixes uint256.NewFromBig
* core/vm: use uint256.Int by value in Stack
* core/vm: upgrade uint256 to v1.0.0
* core/vm: don't preallocate space for 1024 stack items (only 16)
Co-authored-by: Martin Holst Swende <martin@swende.se>
The custom opcode execution code that the run loop previously contained
has been removed, and the loop has been simplified to a few checks.
Each operation consists of 4 elements: execution function, gas cost function,
stack validation function and memory size function. The execution function
implements the operation's runtime behaviour. The gas cost function computes
the operation's gas cost, which depends greatly on the memory and stack. The
stack validation function validates the stack and makes sure that enough
items can be popped off and pushed on. The memory size function calculates
the memory required for the operation and returns it.
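
A rough sketch of how the four elements might fit together follows. Every name here (`machine`, `operation`, `step`, `stackCheck`, the toy opcodes and their gas numbers) is hypothetical and chosen for illustration; the commit's actual types are not shown in this message.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errStack    = errors.New("stack validation failed")
	errOutOfGas = errors.New("out of gas")
)

// machine is a toy VM state; every name in this sketch is hypothetical.
type machine struct {
	stack  []uint64
	memory []byte
	gas    uint64
}

// operation bundles the four per-opcode functions.
type operation struct {
	execute       func(m *machine)
	gasCost       func(m *machine, memSize uint64) uint64
	validateStack func(m *machine) error
	memorySize    func(m *machine) uint64 // nil if the op does not touch memory
}

// stackCheck builds a validator for an op that pops `pop` and pushes `push` items.
func stackCheck(pop, push int) func(*machine) error {
	return func(m *machine) error {
		if len(m.stack) < pop || len(m.stack)-pop+push > 1024 {
			return errStack
		}
		return nil
	}
}

var opAdd = operation{
	execute: func(m *machine) {
		n := len(m.stack)
		m.stack[n-2] += m.stack[n-1]
		m.stack = m.stack[:n-1]
	},
	gasCost:       func(*machine, uint64) uint64 { return 3 },
	validateStack: stackCheck(2, 1),
}

// opStore8 writes the low byte of a value at an offset; the offset is on top.
var opStore8 = operation{
	execute: func(m *machine) {
		n := len(m.stack)
		m.memory[m.stack[n-1]] = byte(m.stack[n-2])
		m.stack = m.stack[:n-2]
	},
	gasCost: func(m *machine, memSize uint64) uint64 {
		cost := uint64(3)
		if have := uint64(len(m.memory)); memSize > have {
			cost += memSize - have // toy charge of one gas per new byte
		}
		return cost
	},
	validateStack: stackCheck(2, 0),
	memorySize:    func(m *machine) uint64 { return m.stack[len(m.stack)-1] + 1 },
}

// step is what remains of the run loop: a few checks, then the execution function.
func step(m *machine, op operation) error {
	if err := op.validateStack(m); err != nil {
		return err
	}
	var memSize uint64
	if op.memorySize != nil {
		memSize = op.memorySize(m)
	}
	cost := op.gasCost(m, memSize)
	if m.gas < cost {
		return errOutOfGas
	}
	m.gas -= cost
	if have := uint64(len(m.memory)); memSize > have {
		m.memory = append(m.memory, make([]byte, memSize-have)...)
	}
	op.execute(m)
	return nil
}

func main() {
	m := &machine{stack: []uint64{0xab, 2, 2}, gas: 100}
	fmt.Println(step(m, opAdd), m.stack, m.gas)
	fmt.Println(step(m, opStore8), m.memory, m.gas)
	fmt.Println(step(m, opAdd)) // fails stack validation on an empty stack
}
```

Keeping validation, memory sizing and gas accounting outside the execution function means the execution function can assume its preconditions already hold.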
This commit also allows the EVM to go unmetered. This is helpful for offline
operations such as contract calls.
This CL makes several refactors:
- Define a `Tracer` interface with a `CaptureState` method
- Add the VM environment as the first argument of
`Tracer.CaptureState`
- Rename the existing functionality to `StructLogger` and make it an
implementation of `Tracer`
- Delete `StructLogCollector` and make `StructLogger` collect the logs
directly
- Change all callers to use the new `StructLogger` where necessary and
extract logs from that.
- Delete the apparently obsolete and likely nonfunctional 'TraceCall'
from the eth API.
Callers that only want accumulated logs can use the `StructLogger`
implementation straightforwardly. Callers that wish to efficiently
capture VM traces and operate on them without excessive copying can now
implement the `Tracer` interface to receive VM state at each step and
do with it as they wish.
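
The two kinds of caller can be sketched as follows. This is an illustration under assumptions: only the `Tracer`, `CaptureState` and `StructLogger` names and the leading environment argument come from the description above; the remaining parameters, the `env` type, the `Logs` accessor, `opCounter` and the toy `run` loop are hypothetical.

```go
package main

import "fmt"

// env is a stand-in for the VM environment passed as the first argument.
type env struct{ depth int }

// Tracer receives the VM state at each step. The parameters after the
// environment are assumptions made for this sketch.
type Tracer interface {
	CaptureState(e *env, pc uint64, op byte, gas uint64, stack []uint64)
}

// StructLog is one recorded step.
type StructLog struct {
	Pc    uint64
	Op    byte
	Gas   uint64
	Depth int
	Stack []uint64
}

// StructLogger is the accumulate-everything implementation of Tracer.
type StructLogger struct{ logs []StructLog }

func (l *StructLogger) CaptureState(e *env, pc uint64, op byte, gas uint64, stack []uint64) {
	// Copy the stack so later VM mutations do not alter the recorded step.
	snapshot := append([]uint64(nil), stack...)
	l.logs = append(l.logs, StructLog{Pc: pc, Op: op, Gas: gas, Depth: e.depth, Stack: snapshot})
}

// Logs returns the accumulated steps (hypothetical accessor name).
func (l *StructLogger) Logs() []StructLog { return l.logs }

// opCounter is a custom Tracer that keeps no per-step copies at all.
type opCounter map[byte]int

func (c opCounter) CaptureState(_ *env, _ uint64, op byte, _ uint64, _ []uint64) { c[op]++ }

// run is a toy interpreter loop that reports each step to the tracer.
func run(code []byte, t Tracer) {
	e := &env{depth: 1}
	stack := []uint64{}
	gas := uint64(100)
	for pc, op := range code {
		if t != nil {
			t.CaptureState(e, uint64(pc), op, gas, stack)
		}
		stack = append(stack, uint64(op)) // pretend every op pushes one item
		gas--
	}
}

func main() {
	logger := &StructLogger{}
	run([]byte{1, 2, 1}, logger)
	fmt.Println(len(logger.Logs()), logger.Logs()[2].Gas)

	counts := opCounter{}
	run([]byte{1, 2, 1}, counts)
	fmt.Println(counts[1], counts[2])
}
```

The logger pays for a stack copy on every step, while the counter touches nothing but its own map, which is the kind of saving a custom `Tracer` allows.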
This CL also removes the accumulation of logs from the vm.Environment;
this was necessary as part of the refactor, but also simplifies it by
removing a responsibility that doesn't directly belong to the
Environment.
Reduced big int allocation by making stack items modifiable. Instead of
pushing shared values such as `common.Big0` onto the stack, a fresh
`new(big.Int)` is pushed. One must expect that any item that is added to
the stack might change.
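
Why a shared constant must not be pushed once items are modifiable can be shown with `math/big` alone. In this sketch `big0` plays the role of a shared value such as `common.Big0`, and `addInPlace` is a hypothetical stand-in for an opcode that reuses its operand as the result.

```go
package main

import (
	"fmt"
	"math/big"
)

// big0 plays the role of a shared constant such as common.Big0.
var big0 = big.NewInt(0)

// addInPlace models an opcode that reuses its operand as the result. It
// saves an allocation but mutates whatever x points to.
func addInPlace(x, y *big.Int) *big.Int { return x.Add(x, y) }

// demo returns the value of big0 after pushing the shared constant, and
// after pushing a fresh new(big.Int) instead.
func demo() (afterShared, afterFresh int64) {
	// Pushing the shared constant: the in-place add corrupts it.
	stack := []*big.Int{big0}
	addInPlace(stack[0], big.NewInt(5))
	afterShared = big0.Int64()

	big0.SetInt64(0) // restore the constant for the second run

	// Pushing a fresh zero: only the stack item changes.
	stack = []*big.Int{new(big.Int)}
	addInPlace(stack[0], big.NewInt(5))
	afterFresh = big0.Int64()
	return afterShared, afterFresh
}

func main() {
	shared, fresh := demo()
	fmt.Println("big0 after pushing the shared constant:", shared)
	fmt.Println("big0 after pushing new(big.Int):", fresh)
}
```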
* Add params package with exported variables generated from
github.com/ethereum/common/blob/master/params.json
* Use params package variables in applicable places
* Add check for minimum gas limit in validation of block's gas limit
* Remove common/params.json from go-ethereum to avoid carrying an
outdated version of it