Percentile Timers

A Percentile Timer is a Timer that buckets the recorded values so that percentiles can be estimated. It tracks the data distribution for the timer by maintaining a set of Counters, one per bucket. The distribution can then be used on the server side to estimate percentiles, while still allowing for arbitrary slicing and dicing based on dimensions.

Percentile Timers are expensive compared to basic Timers from the Registry. In order to maintain the data distribution, they have a higher storage cost, with a worst case of up to 300X that of a standard Timer. Be diligent about any additional dimensions added to Percentile Timers and ensure that they have a small, bounded cardinality. In addition, it is highly recommended to set a range whenever possible to restrict the worst-case overhead.

When using the builder, the range defaults to 10 ms to 1 minute. Based on data at Netflix, this is the most common range for request latencies, and restricting to this window reduces the worst-case multiple from 276X to 58X.

Range Recommendations

The range should be the SLA boundary or failure point for the activity. Explicitly setting the range allows us to optimize for the important range of values and reduce the overhead associated with tracking the data distribution.

For example, suppose you are making a client call that times out after 10 seconds. Setting the upper bound of the range to 10 seconds restricts the possible set of buckets to those approaching that boundary. We can still detect when the call is nearing failure, but percentiles that fall far outside the range may be inflated compared to the actual value.
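
To give a feel for why a narrower range lowers the overhead, the sketch below counts how many bucket boundaries fall inside a candidate range, using the scheme described under Bucket Distribution. It is an illustration under stated assumptions, not the library's implementation: the helper names `percentile_buckets` and `buckets_in_range` are hypothetical, and durations are assumed to be bucketed in nanoseconds, which the text above does not state.

```python
NANOS_PER_MILLI = 1_000_000
NANOS_PER_SECOND = 1_000_000_000


def percentile_buckets(limit):
    """Bucket boundaries up to and including `limit` (hypothetical helper)."""
    buckets = [v for v in (1, 2, 3) if v <= limit]
    power = 4
    while power <= limit:
        delta = power // 3            # one-third of the power, integer division
        stop = power * 4 - delta      # next power of 4 minus the delta
        value = power
        while value < stop and value <= limit:
            buckets.append(value)
            value += delta
        power *= 4
    return buckets


def buckets_in_range(lo, hi):
    """Boundaries that fall inside [lo, hi]; assumes nanosecond values."""
    return [b for b in percentile_buckets(hi) if b >= lo]


# The builder default (10 ms to 1 minute) versus a range capped at a 10 second timeout.
default_range = buckets_in_range(10 * NANOS_PER_MILLI, 60 * NANOS_PER_SECOND)
timeout_range = buckets_in_range(10 * NANOS_PER_MILLI, 10 * NANOS_PER_SECOND)

print(len(default_range), len(timeout_range))
```

The tighter the range, the fewer boundaries it contains, and so the fewer Counters may need to be maintained in the worst case.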

Bucket Distribution

The set of buckets is generated from powers of 4. Starting at each power of 4, values increase by a delta of one-third of that power of 4 (using integer division) for as long as the value is less than the next power of 4 minus the delta.

Base: 1, 2, 3

4 (4^1), delta = 1
    5, 6, 7, ..., 14,

16 (4^2), delta = 5
   21, 26, 31, ..., 56,

64 (4^3), delta = 21
...
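
As a rough illustration, the rule above can be transcribed into a few lines of Python. This is a sketch rather than the library's own code: the function name `percentile_buckets` and the `limit` parameter are made up for this example, and integer division is assumed for the delta because it matches the listed values (1, 5, 21).

```python
def percentile_buckets(limit):
    """Return the bucket boundaries up to and including `limit`."""
    # Base values that precede the first power of 4.
    buckets = [v for v in (1, 2, 3) if v <= limit]
    power = 4
    while power <= limit:
        delta = power // 3            # 4 -> 1, 16 -> 5, 64 -> 21
        stop = power * 4 - delta      # next power of 4 minus the delta
        value = power
        while value < stop and value <= limit:
            buckets.append(value)
            value += delta
        power *= 4
    return buckets


print(percentile_buckets(64))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
#  16, 21, 26, 31, 36, 41, 46, 51, 56, 64]
```

Each power of 4 after the first contributes about the same number of boundaries, so the bucket count grows with the logarithm of the largest value tracked rather than with the value itself.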