Commit graph

12344 commits

Author SHA1 Message Date
Daniel Liu
1fe46b8d3f metrics/librato: rename receiver re to rep 2024-12-13 14:00:13 +08:00
Daniel Liu
35380508fc metrics, cmd/geth: informational metrics (prometheus, influxdb, opentsb) (#24877)
This change creates a GaugeInfo metrics type for registering informational (textual) metrics, e.g. the geth version number. It also improves the testing for backend exporters, and uses a shared subpackage in 'internal' to provide sample datasets and an ordered registry.

Implements #21783

---------

Co-authored-by: Martin Holst Swende <martin@swende.se>
2024-12-13 14:00:13 +08:00
Daniel Liu
d43f6c0cb0 metrics: NilResettingTimer.Time should execute the timed function (#27724) 2024-12-13 14:00:13 +08:00
Daniel Liu
51e49a869d metrics: NilTimer should still run the function to be timed (#27723) 2024-12-13 14:00:13 +08:00
Daniel Liu
e524043a3b metrics: use slices package for sorting (#27493 #27909)
Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-13 14:00:13 +08:00
Daniel Liu
7d40ca6f35 metrics: use sync.map in registry (#27159) 2024-12-13 14:00:13 +08:00
Daniel Liu
7015a9e1cc metrics: use atomic type (#27121) 2024-12-13 14:00:13 +08:00
Daniel Liu
aa5e3e2c08 metrics: make gauge_float64 and counter_float64 lock free (#27025)
Makes the float-gauges lock-free

name                      old time/op  new time/op  delta
CounterFloat64Parallel-8  1.45µs ±10%  0.85µs ± 6%  -41.65%  (p=0.008 n=5+5)

---------

Co-authored-by: Exca-DK <dev@DESKTOP-RI45P4J.localdomain>
Co-authored-by: Martin Holst Swende <martin@swende.se>
2024-12-13 14:00:13 +08:00
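
A lock-free float64 counter of the kind described in the commit above can be sketched with a compare-and-swap loop over the value's bit pattern. This is only an illustration: `floatCounter` and `parallelSum` are hypothetical names, not the types from the metrics package, and the actual implementation may differ.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// floatCounter is a hypothetical lock-free float64 counter.
// It stores the IEEE 754 bit pattern of the value in an atomic uint64.
type floatCounter struct {
	bits atomic.Uint64
}

// Add adds delta using a compare-and-swap retry loop instead of a mutex.
func (c *floatCounter) Add(delta float64) {
	for {
		old := c.bits.Load()
		next := math.Float64bits(math.Float64frombits(old) + delta)
		if c.bits.CompareAndSwap(old, next) {
			return
		}
		// Another goroutine changed the value first; retry with the new value.
	}
}

// Value returns the current counter value.
func (c *floatCounter) Value() float64 {
	return math.Float64frombits(c.bits.Load())
}

// parallelSum adds 1.0 to a fresh counter n times from each of workers goroutines.
func parallelSum(workers, n int) float64 {
	var c floatCounter
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < n; i++ {
				c.Add(1.0)
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(parallelSum(8, 1000))
}
```

The retry loop never blocks, so contended updates cost a few extra iterations rather than a lock handoff. That is one plausible reason for the speedup in the benchmark quoted above.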
Daniel Liu
54570300cc all: ensure resp.body closed (#26969) 2024-12-13 14:00:13 +08:00
Daniel Liu
6e055a601d metrics/influxdb: reuse code between v1 and v2 reporters (#26963) 2024-12-13 14:00:13 +08:00
Daniel Liu
9eae1243cd metrics: add cpu counters (#26796)
This PR adds counter metrics for the CPU system and the Geth process.
Currently the only metrics available for these items are gauges. Gauges are
fine when the consumer scrapes metrics data at the same interval as Geth
produces new values (every 3 seconds), but it is likely that most consumers
will not scrape that often. Intervals of 10, 15, or maybe even 30 seconds
are probably more common.

So the problem is: how does the consumer estimate what the CPU was doing in
between scrapes? With a counter, it's easy: you just subtract two
successive values and divide by the time to get a nice, accurate average.
But with a gauge, you can't do that. A gauge reading is an instantaneous
picture of what was happening at that moment, but it gives you no idea
about what was going on between scrapes. Taking an average of values is
meaningless.
2024-12-13 14:00:13 +08:00
Daniel Liu
c616077fb5 metrics: improve accuracy of CPU gauges (#26793)
This PR changes metrics collection to actually measure the time interval between collections, rather
than assume 3 seconds. I did some ad hoc profiling, and on slower hardware (e.g., my Raspberry Pi 4)
I routinely saw intervals between 3.3 and 3.5 seconds, with some being as high as 4.5 seconds. Assuming
3 seconds will generally cause the CPU gauge readings to be too high, and in some cases can cause impossibly
large values for the CPU load metrics (e.g., greater than 400 for a 4-core CPU).

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2024-12-13 14:00:13 +08:00
Daniel Liu
6199c84050 metrics: remove deprecated uses of math.rand (#26710) 2024-12-13 14:00:13 +08:00
Daniel Liu
76320b4b98 metrics/librato: use http package to replace http method names (#26535) 2024-12-13 14:00:13 +08:00
Daniel Liu
40f47a641b metrics/influxdb: fix time ticker leaks (#26507) 2024-12-13 14:00:13 +08:00
Daniel Liu
dddd6c57cd metrics: improve reading Go runtime metrics (#25886)
This changes how we read performance metrics from the Go runtime. Instead
of using runtime.ReadMemStats, we now rely on the API provided by package
runtime/metrics.

runtime/metrics provides more accurate information. For example, the new
interface has better reporting of memory use. In my testing, the reported
value of held memory more accurately reflects the usage reported by the OS.

The semantics of metrics system/memory/allocs and system/memory/frees have
changed to report amounts in bytes. ReadMemStats only reported the count of
allocations in number-of-objects. This is imprecise: 'tiny objects' are not
counted because the runtime allocates them in batches; and certain
improvements in allocation behavior, such as struct size optimizations,
will be less visible when the number of allocs doesn't change.

Changing allocation reports to be in bytes makes it appear in graphs that
lots more is being allocated. I don't think that's a problem because this
metric is primarily interesting for geth developers.

The metric system/memory/pauses has been changed to report statistical
values from the histogram provided by the runtime. Its name in influxdb has
changed from geth.system/memory/pauses.meter to
geth.system/memory/pauses.histogram.

We also have a new histogram metric, system/cpu/schedlatency, reporting the
Go scheduler latency.
2024-12-13 14:00:13 +08:00
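
Reading values through package runtime/metrics looks roughly like the sketch below. The three metric names are real runtime/metrics keys, but choosing them here is an assumption: the commit above does not say which keys back system/memory/allocs or system/cpu/schedlatency. `readRuntime` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readRuntime samples a few values from runtime/metrics: cumulative heap
// allocations in bytes, total memory mapped by the runtime in bytes, and
// the number of buckets in the scheduler latency histogram.
func readRuntime() (allocBytes, totalBytes uint64, latencyBuckets int) {
	samples := []metrics.Sample{
		{Name: "/gc/heap/allocs:bytes"},
		{Name: "/memory/classes/total:bytes"},
		{Name: "/sched/latencies:seconds"},
	}
	metrics.Read(samples)

	// A metric unsupported by the running Go version reports KindBad,
	// so check the kind before reading each value.
	if samples[0].Value.Kind() == metrics.KindUint64 {
		allocBytes = samples[0].Value.Uint64()
	}
	if samples[1].Value.Kind() == metrics.KindUint64 {
		totalBytes = samples[1].Value.Uint64()
	}
	if samples[2].Value.Kind() == metrics.KindFloat64Histogram {
		latencyBuckets = len(samples[2].Value.Float64Histogram().Counts)
	}
	return allocBytes, totalBytes, latencyBuckets
}

func main() {
	allocs, total, buckets := readRuntime()
	fmt.Println("allocated bytes:", allocs)
	fmt.Println("total runtime memory bytes:", total)
	fmt.Println("scheduler latency buckets:", buckets)
}
```

Unlike runtime.ReadMemStats, this API lets the caller request only the metrics it needs, and it exposes histograms directly.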
Daniel Liu
e8452c94a7 metrics: fix some typos (#25551) 2024-12-13 14:00:12 +08:00
Daniel Liu
392b44c948 metrics/influxdb: replace time.Tick with time.NewTicker (#24783) 2024-12-13 14:00:12 +08:00
Daniel Liu
2e34afe400 rpc: swap out timer metrics to histograms (#25044) 2024-12-13 14:00:12 +08:00
Daniel Liu
7b0a7e4593 metrics/influxdb: temp solution to present counter meaningfully (#24811) 2024-12-13 14:00:12 +08:00
Daniel Liu
d77c1e5ea3 metrics: replace strings.Replace with strings.ReplaceAll (#24835) 2024-12-13 14:00:12 +08:00
Daniel Liu
2e5b342826 metrics: add go:build lines (#23468) 2024-12-13 14:00:12 +08:00
Daniel Liu
98079104e4 metrics: fix compilation for GOOS=js (#23449) 2024-12-13 14:00:12 +08:00
Daniel Liu
29b72dbba6 metrics/influxdb: support V2 (#23194) 2024-12-13 14:00:12 +08:00
Daniel Liu
73b81dde78 metrics: use golang.org/x/sys/unix to support Solaris (#22584)
Fixes #11113

Co-authored-by: rene <41963722+renaynay@users.noreply.github.com>
2024-12-13 14:00:12 +08:00
Daniel Liu
d16c72edbe metrics/influxdb: don't push empty histograms, no measurement != 0 (#22590) 2024-12-13 14:00:12 +08:00
Daniel Liu
ebbcd608cc metrics: use resetting histograms for rare packets (#22586) 2024-12-13 14:00:12 +08:00
Daniel Liu
5e9cb5d758 metrics: add handler performance metrics (#22581) 2024-12-13 14:00:12 +08:00
Daniel Liu
ec0ae4965d metrics: fix cast omission in cpu_syscall.go (#22262)
Fixes a regression that caused build failures on certain platforms.
2024-12-13 14:00:12 +08:00
Daniel Liu
1a844e4578 metrics: remove unneeded syntax (#21921) 2024-12-13 14:00:12 +08:00
Daniel Liu
9d082aa38c cmd/XDC: dump config for metrics (#22083) 2024-12-13 14:00:12 +08:00
Daniel Liu
d4f1b8a6dd metrics: fix the panic for reading empty cpu stats (#21864) 2024-12-13 14:00:12 +08:00
Daniel Liu
bf4b42a551 metrics: zero temp variable in updateMeter (#21470)
* metrics: zero temp variable in updateMeter

Previously the temp variable was not reset after being added to count.
This produced astronomically high metrics. Now we zero out temp whenever we
add it to the snapshot count.

* metrics: move temp variable to be aligned, unit tests

Moves the temp variable in MeterSnapshot to be 64-bit aligned because of the atomic bug.
Adds a unit test that catches the previous bug.
2024-12-13 14:00:12 +08:00
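
The bug class described in the commit above can be reproduced in a few lines. This is a simplified sketch: the `meter` type, its methods, and `totals` are hypothetical and are not the package's MeterSnapshot code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// meter is a hypothetical, simplified meter. Mark accumulates events in
// temp; a periodic tick folds temp into count.
type meter struct {
	temp  atomic.Int64 // events marked since the last tick
	count int64        // running total, touched only by the ticker
}

// Mark records n events without taking a lock.
func (m *meter) Mark(n int64) { m.temp.Add(n) }

// buggyTick adds temp to count but never resets temp, so the same
// events are counted again on every tick.
func (m *meter) buggyTick() { m.count += m.temp.Load() }

// tick reads temp and zeros it in one atomic step, so each event is
// counted exactly once.
func (m *meter) tick() { m.count += m.temp.Swap(0) }

// totals marks 10 events once, then runs the given number of ticks
// through both variants and returns the resulting counts.
func totals(ticks int) (buggy, fixed int64) {
	var b, f meter
	b.Mark(10)
	f.Mark(10)
	for i := 0; i < ticks; i++ {
		b.buggyTick()
		f.tick()
	}
	return b.count, f.count
}

func main() {
	buggy, fixed := totals(3)
	fmt.Println(buggy, fixed) // prints 30 10
}
```

Using `atomic.Int64` also sidesteps the alignment concern mentioned in the commit, since that type is guaranteed to be 64-bit aligned even on 32-bit platforms.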
Daniel Liu
730960ff06 metrics: make meter updates lock-free (#21446) 2024-12-13 14:00:12 +08:00
Daniel Liu
3dee6675d2 metrics/exp: allow configuring metrics HTTP server on separate endpoint (#21290) 2024-12-13 14:00:12 +08:00
Daniel Liu
d7d54b00f7 metrics: replace gosigar with gopsutil (#21041) 2024-12-13 14:00:12 +08:00
Daniel Liu
32f974cc7b metrics/prometheus: define TYPE once, add tests (#21068)
* metrics/prometheus: define type once for histograms

* metrics/prometheus: test collector
2024-12-13 14:00:12 +08:00
Daniel Liu
c65d0cd947 cmd/XDC: enable metrics for geth import command (#20738) 2024-12-13 14:00:12 +08:00
Daniel Liu
1415bb6369 metrics: add missing calls to Ticker.Stop in tests (#20866) 2024-12-13 14:00:12 +08:00
Daniel Liu
47ce406a4a metrics: make flawed test less flawed (#20818) 2024-12-13 14:00:12 +08:00
Daniel Liu
9fee8a72eb metrics: disable CPU stats (gosigar) on iOS (#20816) 2024-12-13 14:00:12 +08:00
Daniel Liu
462999b381 metrics: fix issues reported by staticcheck (#20365) 2024-12-13 14:00:12 +08:00
Daniel Liu
332ac32bc5 metrics: not compare float numbers directly (#20219) 2024-12-13 14:00:12 +08:00
Daniel Liu
745640795a metrics: change links in README.md to https (#20182) 2024-12-13 14:00:11 +08:00
Daniel Liu
a4e113ca11 metrics: gather and export threads and goroutines (#19725) 2024-12-13 14:00:11 +08:00
Daniel Liu
1eb2ed8293 core, metrics, p2p: expose various counter metrics for grafana (#19692) 2024-12-13 14:00:11 +08:00
Daniel Liu
a577c71944 metrics/prometheus: added prometheus metrics (#17077) 2024-12-13 14:00:11 +08:00
Daniel Liu
ed427a9426 metrics: fix expensive metrics flag processing (#19327) 2024-12-13 14:00:11 +08:00
Daniel Liu
db9487f1e8 core: split out detailed trie access metrics from insertion time (#19316) 2024-12-13 14:00:11 +08:00
Daniel Liu
1557746bcd metrics/influxdb: add a timeout to the InfluxDB HTTP client (#19250) 2024-12-13 14:00:11 +08:00