The BatchSpanProcessor queue size was incorrectly set to
DefaultMaxExportBatchSize (512) instead of DefaultMaxQueueSize (2048).
I noticed the issue on bloatnet when analyzing the block building
traces. During a particular run, the miner was including 1000
transactions in a single block. When telemetry is enabled, the miner
creates a span for each transaction added to the block. With the queue
capped at 512, spans were silently dropped when production outpaced the
span export, resulting in incomplete traces with orphaned spans. While
this doesn't eliminate the possibility of drops under extreme
load, using the correct default restores the 4x buffer between queue
capacity and export batch size that the SDK was designed around.
This PR adds OpenTelemetry tracing configuration to geth via
command-line flags. When enabled, geth initializes the global
OpenTelemetry TracerProvider and installs standard trace context
propagation. When disabled (the default), tracing remains a no-op and
behavior is unchanged.
Co-authored-by: Felix Lange <fjl@twurst.com>
The endSpan closure accepted error by value, meaning deferred calls like
defer spanEnd(err) captured the error at defer-time (always nil), not at
function-return time. This meant errors were never recorded on spans.
- Changed endSpan to accept *error
- Updated all call sites in rpc/handler.go to pass error pointers, and
adjusted handleCall to avoid propagating child-span errors to the parent
- Added TestTracingHTTPErrorRecording to verify that errors from RPC
methods are properly recorded on the rpc.runMethod span
Add Open Telemetry tracing inside the RPC server to help attribute runtime costs within `handler.handleCall()`. In particular, it allows us to distinguish time spent decoding arguments, invoking methods via reflection, and actually executing the method and constructing/encoding JSON responses.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>