`TestTracingHTTPTimeout` still flakes in CI after #35101, failing at the
POST:
--- FAIL: TestTracingHTTPTimeout (0.26s)
tracing_test.go:633: request: Post "http://127.0.0.1:43497": EOF
The test sets a short server `WriteTimeout` and posts a blocking call.
`ContextRequestTimeout` leaves a fixed 100ms for the server to write its
timeout response before the HTTP write deadline cuts the connection.
I can't repro it locally, but my theory is that under load that write
can miss the window, so the connection is dropped and the client POST
returns `EOF`, failing the test before it inspects the span. This is the
only test exposed to it because it is the only one that configures a
`WriteTimeout`.
The EOF is benign: the server sets the timeout error on the SERVER span
before attempting the write, independent of whether the client receives
the response. Since that span status is all the test asserts,
`tryPostJSONRPC` tolerates the transport error instead of failing on it.
This adds a client option to configure trace context propagation via the
`traceparent` HTTP header.
I'm adding this so that prysm can enable distributed tracing on their
engine API client.
The per-call SERVER span ended inside `handleCall()`, so the JSON-RPC
response write happened after the span closed. For large responses like
`engine_getBlobsV*`, that write time was missing from traces.
- Extend the SERVER span past `writeJSON`.
- For batches, add a top-level `jsonrpc.batch` SERVER span (with `rpc.batch.size`) covering the whole batch including `callBuffer.write`.
- Add `rpc.writeJSON` span around the non-batch response write.
- Add `rpc.writeJSONBatch` span around the batch response write.
- Add `rpc.httpWrite` span around the actual HTTP write, separating JSON encoding from network write.
- Add additional telemetry helpers.
---------
Co-authored-by: Felix Lange <fjl@twurst.com>
The endSpan closure accepted error by value, meaning deferred calls like
defer spanEnd(err) captured the error at defer-time (always nil), not at
function-return time. This meant errors were never recorded on spans.
- Changed endSpan to accept *error
- Updated all call sites in rpc/handler.go to pass error pointers, and
adjusted handleCall to avoid propagating child-span errors to the parent
- Added TestTracingHTTPErrorRecording to verify that errors from RPC
methods are properly recorded on the rpc.runMethod span
This PR adds support for the extraction of OpenTelemetry trace context
from incoming JSON-RPC request headers, allowing geth spans to be linked
to upstream traces when present.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>
Add Open Telemetry tracing inside the RPC server to help attribute runtime costs within `handler.handleCall()`. In particular, it allows us to distinguish time spent decoding arguments, invoking methods via reflection, and actually executing the method and constructing/encoding JSON responses.
---------
Co-authored-by: lightclient <lightclient@protonmail.com>