internal/telemetry: fix undersized span queue causing dropped spans (#33927)

The BatchSpanProcessor queue size was incorrectly set to
DefaultMaxExportBatchSize (512) instead of DefaultMaxQueueSize (2048).

I noticed the issue on bloatnet when analyzing the block building
traces. During a particular run, the miner was including 1000
transactions in a single block. When telemetry is enabled, the miner
creates a span for each transaction added to the block. With the queue
capped at 512, spans were silently dropped whenever span production
outpaced export, resulting in incomplete traces with orphaned spans. While
this doesn't eliminate the possibility of drops under extreme
load, using the correct default restores the 4x buffer between queue
capacity and export batch size that the SDK was designed around.
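To illustrate the failure mode, here is a minimal Go sketch of the worst case described above: a burst of spans arrives before the exporter has drained the queue, so anything beyond queue capacity is dropped. The constants mirror the SDK defaults named in the commit (DefaultMaxExportBatchSize = 512, DefaultMaxQueueSize = 2048); the `droppedInBurst` helper and the burst model are illustrative assumptions, not SDK code.

```go
package main

import "fmt"

// Illustrative constants mirroring the OpenTelemetry SDK defaults
// referenced in the commit.
const (
	exportBatchSize  = 512  // DefaultMaxExportBatchSize
	correctQueueSize = 2048 // DefaultMaxQueueSize, 4x the batch size
)

// droppedInBurst models the worst case: a burst of spans arrives while
// the exporter has not yet drained the queue, so spans beyond the queue
// capacity are silently dropped.
func droppedInBurst(burst, queueCap int) int {
	if burst <= queueCap {
		return 0
	}
	return burst - queueCap
}

func main() {
	burst := 1000 // e.g. one span per transaction in a 1000-tx block
	fmt.Println(droppedInBurst(burst, exportBatchSize))  // queue mis-set to batch size: 488 dropped
	fmt.Println(droppedInBurst(burst, correctQueueSize)) // correct default queue: 0 dropped
}
```

With the queue mis-set to 512, a 1000-span burst loses 488 spans; at the correct 2048 default, the same burst fits entirely.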
Jonny Rhea 2026-03-04 04:47:10 -06:00 committed by GitHub
parent 28dad943f6
commit 402c71f2e2


@@ -113,7 +113,7 @@ func SetupTelemetry(cfg node.OpenTelemetryConfig, stack *node.Node) error {
 	// Define batch span processor options
 	batchOpts := []sdktrace.BatchSpanProcessorOption{
 		// The maximum number of spans that can be queued before dropping
-		sdktrace.WithMaxQueueSize(sdktrace.DefaultMaxExportBatchSize),
+		sdktrace.WithMaxQueueSize(sdktrace.DefaultMaxQueueSize),
 		// The maximum number of spans to export in a single batch
 		sdktrace.WithMaxExportBatchSize(sdktrace.DefaultMaxExportBatchSize),
 		// How long an export operation can take before timing out