Page MenuHomePhabricator

Investigate making TimingMetric unit-aware
Closed, ResolvedPublic

Description

Traditionally, all metrics needed to be provided the value in milliseconds to be calculated and rendered correctly by Graphite via StatsD. StatsLib continued this tradition, but did nothing to indicate or enforce it besides a docstring on the parameter.

Because of the above, the interface was made somewhat confusing:

  1. Prometheus naming convention recommends supplying the metric name with a unit
  2. Prometheus encourages sending values in human terms - this is usually seconds in the smallest unit
  3. Because of the above, the metric unit was set as "seconds" but we gave the verb the duration in milliseconds (unit mismatch)

Given our ownership of StatsLib, we have the opportunity to make TimingMetric unit-aware and thus allow the developer to give us the metric value in whatever unit they want. Given that, StatsLib could then do the appropriate recalculation to milliseconds behind the scenes.

This could manifest as either/both:

  1. Perform the appropriate recalculation to milliseconds based on a developer-supplied unit
  2. Append the developer-supplied unit to the metrics, foregoing the need to suffix the metric name with the unit

Thanks to @Tarrow for this idea!

Event Timeline

colewhite triaged this task as High priority.

We have tripped over this issue a few times now - let's make this happen.

Change #1056249 had a related patch set uploaded (by Cwhite; author: Cwhite):

[mediawiki/core@master] Stats: make TimingMetric unit-aware

https://gerrit.wikimedia.org/r/1056249

Change #1056250 had a related patch set uploaded (by Cwhite; author: Cwhite):

[mediawiki/core@master] Stats: ensure _seconds suffix on TimingMetric instances

https://gerrit.wikimedia.org/r/1056250

colewhite changed the task status from Open to In Progress.Aug 1 2024, 8:20 PM

Change #1056249 merged by jenkins-bot:

[mediawiki/core@master] Stats: add unit helpers functions to TimingMetric

https://gerrit.wikimedia.org/r/1056249

Change #1109102 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/core@master] TimingMetric: clarify "observe" to "observeMilliseconds"

https://gerrit.wikimedia.org/r/1109102

Change #1114444 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] Stats: Add support for chaining TimingMetric->start()

https://gerrit.wikimedia.org/r/1114444

Change #1109102 merged by jenkins-bot:

[mediawiki/core@master] Stats: clarify TimingMetric::observe() units and discourage in new code

https://gerrit.wikimedia.org/r/1109102

Change #1114444 merged by jenkins-bot:

[mediawiki/core@master] Stats: Add support for chaining TimingMetric->start()

https://gerrit.wikimedia.org/r/1114444

Change #1136815 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] ParserOutputAccess: Improve parseroutputaccess_cache stat

https://gerrit.wikimedia.org/r/1136815

Change #1136815 merged by jenkins-bot:

[mediawiki/core@master] ParserOutputAccess: Improve parseroutputaccess_cache stat

https://gerrit.wikimedia.org/r/1136815

I'm going to call this resolved as we have a few methods available now.

For the last associated patch, it was proposed to check the condition statically in CI, although that may be easier said than done.