Naming
Summary
- Names
  - Describe the measurement being collected
  - Use short prefixes for categorization (max 2 levels)
  - Use camelCase
  - Static - no dynamic content
  - Succinct - avoid long names
- Tags
  - Should be used for dimensional filtering
  - Be careful about combinatorial explosion and cardinality
  - Tag combinations should be stable over time
  - Tag keys should be static
  - Use id to distinguish between instances
- Query Design
  - Avoid the need for regex and expensive pattern matching
  - Design for simple queries with incremental drill-down
  - Support exact matches and simple filters
- Use Base Units
Names
Describe the Measurement
Names should clearly describe what is being measured. A good name allows someone to understand the metric without needing additional context.
Use Short Prefixes for Categorization
Common names should use short prefixes to broadly categorize metrics, for example ipc.server.call or jvm.gc.pause. The prefix should generally have no more than 2 levels to keep names succinct.
This is not a package hierarchy like in Java - it's simply a way to group related metrics.
Examples of good prefixes:
* ipc.* for inter-process communication metrics
* jvm.* for Java Virtual Machine metrics
* db.* for database metrics
The prefix provides just enough context to understand the broad category and perhaps a sub-category,
while the rest of the name specifies the actual measurement. Remember that metrics will already be
scoped by other dimensions like application name, instance, etc., so the name itself should focus
on describing the measurement rather than providing extensive context. Avoid unnecessary boilerplate
such as com.netflix.*.
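The two-level guideline can be checked mechanically. The following is a minimal sketch, assuming the prefix is every dot-separated segment except the last one (so ipc.server.call has the prefix ipc.server); PrefixCheck and its methods are hypothetical and not part of Spectator or Atlas.

```java
import java.util.List;

// Hypothetical helper, not part of Spectator or Atlas. It assumes the prefix
// is every dot-separated segment except the last one.
final class PrefixCheck {
    static final int MAX_PREFIX_LEVELS = 2;

    // Counts prefix levels: "ipc.server.call" has the prefix "ipc.server" (2 levels).
    static int prefixLevels(String name) {
        return name.split("\\.").length - 1;
    }

    // True if the name follows the "no more than 2 prefix levels" guideline.
    static boolean hasShortPrefix(String name) {
        return prefixLevels(name) <= MAX_PREFIX_LEVELS;
    }

    public static void main(String[] args) {
        for (String name : List.of("ipc.server.call", "jvm.gc.pause", "com.netflix.ipc.server.call")) {
            System.out.println(name + " -> " + hasShortPrefix(name));
        }
    }
}
```

A name carrying boilerplate such as com.netflix.ipc.server.call fails the check because the extra segments push the prefix to four levels.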
Use camelCase
For segments within a name, use camel case to distinguish words if needed, for example
jvm.gc.concurrentPhaseTime.
The exception to this rule is where an established convention already exists. For example, with Amazon
regions, us-east-1 is preferred over usEast1 because it is the more common form.
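When a segment is assembled from several words, a small helper can apply the camelCase rule consistently. This is a hypothetical sketch: CamelCase.segment is an illustrative name, and it lowercases every word first, so it would not preserve acronyms or established forms such as us-east-1.

```java
import java.util.Locale;

// Hypothetical helper: joins words into one camelCase name segment.
final class CamelCase {
    static String segment(String... words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            String w = word.toLowerCase(Locale.ROOT);
            if (w.isEmpty()) {
                continue; // skip empty words rather than emitting stray capitals
            }
            if (sb.length() == 0) {
                sb.append(w); // the first word stays lowercase
            } else {
                sb.append(Character.toUpperCase(w.charAt(0))).append(w.substring(1));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("jvm.gc." + segment("concurrent", "phase", "time"));
    }
}
```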
Static
There should not be any dynamic content in a metric name, such as requests.$APP_NAME. Metric names
and tag keys are how users interact with the data, and dynamic values make them difficult to use.
Dynamic information is better suited for tag values.
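The difference between a dynamic name and a static name with a tag value can be sketched with a minimal identifier type. MetricId here is a hypothetical stand-in (it is not the Spectator Id type), and the tag key app is an illustrative choice.

```java
import java.util.Map;

// Hypothetical stand-in for a metric identifier; this is not the Spectator Id type.
record MetricId(String name, Map<String, String> tags) {}

final class StaticNames {
    // Discouraged: the application name is baked into the metric name, so each
    // application produces a different name that users must discover or match.
    static MetricId dynamicName(String appName) {
        return new MetricId("requests." + appName, Map.of());
    }

    // Preferred: the name is static and the dynamic value moves into a tag value.
    // The tag key "app" is an illustrative choice.
    static MetricId staticName(String appName) {
        return new MetricId("requests", Map.of("app", appName));
    }

    public static void main(String[] args) {
        for (String app : new String[] {"api", "www"}) {
            MetricId preferred = staticName(app);
            System.out.println(dynamicName(app).name() + " vs " + preferred.name() + " " + preferred.tags());
        }
    }
}
```

With the static form, every application shares the name requests, so a single exact match finds all of them and the app tag narrows the result.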
Succinct
Long names should be avoided. In many cases, long names are the result of combining many pieces of information together into a single string. In this case, consider either discarding information that is not useful or encoding the information in tag values. Shorter names are easier to read, type, and view when working with the data.
Tags
Tags should be used for dimensional filtering - they allow data to be filtered into subsets by values of interest. Using tags as a namespace mechanism is discouraged.
As a general rule, it should be possible to use the name as a pivot. If only the name is selected, then the user should be able to use other dimensions to filter the data and successfully reason about the aggregate value being shown.
Cardinality Considerations
Keep combinatorial complexity in mind. The full combination of tags creates unique time series, and each combination consumes storage and processing resources. Tag combinations should be stable over time to avoid constantly creating new time series.
Consider the cardinality impact:
* A metric with 3 tag keys, each with 10 possible values = 1,000 potential time series
* A metric with 5 tag keys, each with 10 possible values = 100,000 potential time series
Guidelines for managing cardinality:
* Limit high-cardinality dimensions. Avoid tags with unbounded or very large value sets.
* Use stable identifiers. Tag values should remain consistent over time.
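The figures above are the product of the number of possible values of each tag key. A minimal sketch of that arithmetic, where Cardinality.potentialSeries is a hypothetical helper:

```java
import java.util.List;

final class Cardinality {
    // Upper bound on distinct time series for one metric name: the product of
    // the number of possible values of each tag key.
    static long potentialSeries(List<Integer> valuesPerTagKey) {
        long total = 1;
        for (int values : valuesPerTagKey) {
            total = Math.multiplyExact(total, values); // throws on long overflow
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(potentialSeries(List.of(10, 10, 10)));         // 1000
        System.out.println(potentialSeries(List.of(10, 10, 10, 10, 10))); // 100000
    }
}
```

Because the growth is multiplicative, one unbounded tag dominates: adding a single tag with 1,000 values to the 3-key metric raises the bound from 1,000 to 1,000,000.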
Design for Simple Queries
Avoid regex and expensive pattern matching. Design metric names and tag structures so they can be queried simply and allow users to incrementally drill into the data. This improves both query performance and user experience.
Good query patterns:
* name,threadpool.size,:eq - exact match on name
* name,threadpool.size,:eq,id,server-requests,:eq,:and - add exact tag filter
* name,threadpool.*,:re - simple prefix pattern (use sparingly)
Avoid patterns that require expensive operations:
* Complex regex patterns that must scan many metric names
* Queries that require examining all tag combinations to find matches
* Dynamic name construction that makes direct queries impossible
Design principle: Users should be able to start with a broad query and progressively add filters to narrow down to the specific data they need.
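Why exact matches are cheaper than regex can be illustrated with a simple in-memory index keyed by name. This is a hypothetical simplification, not how Atlas stores or evaluates queries: an exact match is a single hash lookup, while a regex (using Java's Pattern.matches semantics here, which may differ from Atlas's :re) must be tested against every indexed name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical in-memory index from metric name to series; a simplification,
// not how Atlas actually stores or evaluates queries.
final class NameIndex {
    private final Map<String, List<String>> seriesByName = new HashMap<>();

    void add(String name, String series) {
        seriesByName.computeIfAbsent(name, k -> new ArrayList<>()).add(series);
    }

    // Exact match: one hash lookup, regardless of how many names are indexed.
    List<String> eq(String name) {
        return seriesByName.getOrDefault(name, List.of());
    }

    // Regex match: every indexed name must be tested against the pattern.
    List<String> re(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : seriesByName.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                result.addAll(entry.getValue());
            }
        }
        return result;
    }

    // Illustrative contents only.
    static NameIndex sample() {
        NameIndex index = new NameIndex();
        index.add("threadpool.size", "id=server-requests");
        index.add("threadpool.size", "id=client-requests");
        index.add("db.size", "table=users");
        return index;
    }

    public static void main(String[] args) {
        NameIndex index = sample();
        System.out.println(index.eq("threadpool.size"));
        System.out.println(index.re("threadpool.*"));
    }
}
```

Both calls return the same two thread pool series here, but the cost of re grows with the number of distinct names, while eq does not.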
As a concrete example, suppose we have two metrics:
- The number of threads currently in a thread pool.
- The number of rows in a database table.
Discouraged Approach
Id poolSize = registry.createId("size")
    .withTag("class", "ThreadPool")
    .withTag("id", "server-requests");
Id tableSize = registry.createId("size")
    .withTag("class", "Database")
    .withTag("table", "users");
In this approach, if you select the name size, then it will match both the ThreadPool and
Database classes. This results in a value that is an aggregate of the number of threads and the
number of rows in a database table, which has no meaning.
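The problem can be made concrete with a small in-memory model. Series and MixedAggregate are hypothetical types, and the values are illustrative: selecting only the name size adds a thread count and a row count together.

```java
import java.util.List;
import java.util.Map;

// Hypothetical data point: a metric name, its tags, and a current value.
record Series(String name, Map<String, String> tags, double value) {}

final class MixedAggregate {
    // Rough analogue of selecting only by name: sum every series with that name.
    static double sumByName(List<Series> data, String name) {
        return data.stream()
            .filter(s -> s.name().equals(name))
            .mapToDouble(Series::value)
            .sum();
    }

    // Illustrative values only.
    static List<Series> sample() {
        return List.of(
            new Series("size", Map.of("class", "ThreadPool", "id", "server-requests"), 50),
            new Series("size", Map.of("class", "Database", "table", "users"), 1_000_000));
    }

    public static void main(String[] args) {
        // Threads and table rows are added together: the result has no meaning.
        System.out.println(sumByName(sample(), "size"));
    }
}
```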
Recommended Approach
Id poolSize = registry.createId("threadpool.size")
    .withTag("id", "server-requests");
Id tableSize = registry.createId("db.size")
    .withTag("table", "users");
This variation provides enough context in the name so that the meaning is more apparent and you can
successfully reason about the values. For example, if you select threadpool.size, then you can
see the total number of threads in all pools. You can then group by or select an id to further
filter the data to a subset in which you have an interest.
This approach also supports simple queries without regex patterns:
* name,threadpool.size,:eq gives you all thread pool sizes
* name,db.size,:eq gives you all database sizes
* name,threadpool.size,:eq,id,server-requests,:eq,:and drills down to a specific pool
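The drill-down described above can be modeled with predicates over an in-memory list. This is a minimal sketch under stated assumptions: Series, DrillDown, and the sample values are hypothetical, and nameEq, tagEq, and Predicate.and only loosely mirror :eq and :and; they do not reproduce Atlas query evaluation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical data point: a metric name, its tags, and a current value.
record Series(String name, Map<String, String> tags, double value) {}

final class DrillDown {
    // Loosely mirrors name,<name>,:eq
    static Predicate<Series> nameEq(String name) {
        return s -> s.name().equals(name);
    }

    // Loosely mirrors <key>,<value>,:eq; combine with Predicate.and for :and
    static Predicate<Series> tagEq(String key, String value) {
        return s -> value.equals(s.tags().get(key));
    }

    static double sum(List<Series> data, Predicate<Series> query) {
        return data.stream().filter(query).mapToDouble(Series::value).sum();
    }

    // Sums matching series per value of a tag key (series lacking the key are skipped).
    static Map<String, Double> sumBy(List<Series> data, Predicate<Series> query, String key) {
        return data.stream()
            .filter(query)
            .filter(s -> s.tags().containsKey(key))
            .collect(Collectors.groupingBy(
                s -> s.tags().get(key), TreeMap::new, Collectors.summingDouble(Series::value)));
    }

    // Illustrative values only.
    static List<Series> sample() {
        return List.of(
            new Series("threadpool.size", Map.of("id", "server-requests"), 50),
            new Series("threadpool.size", Map.of("id", "client-requests"), 20),
            new Series("db.size", Map.of("table", "users"), 1_000_000));
    }

    public static void main(String[] args) {
        List<Series> data = sample();
        Predicate<Series> pools = nameEq("threadpool.size");
        System.out.println(sum(data, pools));                                     // all pools
        System.out.println(sumBy(data, pools, "id"));                             // group by id
        System.out.println(sum(data, pools.and(tagEq("id", "server-requests")))); // one pool
    }
}
```

Each step only narrows the previous selection, so the aggregate at every stage is a meaningful thread count; the db.size series never leaks into it.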
Use Base Units
Keep measurements in base units, whenever and wherever possible.
It is best to have timers in seconds, disk sizes in bytes, and network rates in bytes/second.
This allows any International System of Units (SI) prefixes applied to tick labels on a graph to
have an obvious meaning, such as:
* 1 m meaning 1 millisecond, as opposed to 1 milli-millisecond, for timers.
* 1 k meaning 1 kilobyte, as opposed to 1 kilo-megabyte, for disk sizes.
* 1 M meaning 1 megabyte/second, as opposed to 1 mega-kilobyte/second, for network rates.
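The effect of SI prefixes on tick labels can be sketched with a simplified formatter. SiLabel is hypothetical and only illustrates the idea; Atlas's real label formatting may differ in precision and range. It scales a value by powers of 1000 and attaches the matching prefix.

```java
import java.util.Locale;

// Hypothetical, simplified tick-label formatter for illustration; Atlas's real
// label formatting may differ in precision and range.
final class SiLabel {
    // Prefixes from 10^-9 to 10^12; index 3 is the unprefixed base unit.
    private static final String[] PREFIXES = {"n", "u", "m", "", "k", "M", "G", "T"};

    static String format(double value) {
        if (value == 0) {
            return "0";
        }
        double abs = Math.abs(value);
        int exp = 0; // power of 1000 divided out of the value
        while (abs >= 1000 && exp < 4) {
            abs /= 1000;
            exp++;
        }
        while (abs < 1 && exp > -3) {
            abs *= 1000;
            exp--;
        }
        double scaled = Math.copySign(abs, value);
        String number = scaled == Math.rint(scaled)
            ? Long.toString((long) scaled)
            : String.format(Locale.ROOT, "%.2f", scaled);
        String prefix = PREFIXES[exp + 3];
        return prefix.isEmpty() ? number : number + " " + prefix;
    }

    public static void main(String[] args) {
        System.out.println(format(0.001));       // a timer value in seconds
        System.out.println(format(1_000.0));     // a disk size in bytes
        System.out.println(format(1_000_000.0)); // a rate in bytes/second
    }
}
```

If a timer were recorded in milliseconds instead, one second would be stored as 1000 and format(1000.0) would produce "1 k", which reads as a kilo-millisecond.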
Atlas automatically applies tick labels to the Y-axis of the graph to accurately report the magnitude of values while keeping them within the view window.
Some meters in some clients, such as Java Timers, will automatically constrain values to base units in their implementations.
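Where a client does not convert automatically, values can be converted to base units before they are recorded. The following is a minimal sketch using only java.time.Duration and java.util.concurrent.TimeUnit; BaseUnits and its methods are hypothetical helpers, not part of any metrics client.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical helpers for recording in base units when a client does not
// convert automatically.
final class BaseUnits {
    // Timers: convert a Duration to seconds.
    static double seconds(Duration duration) {
        return duration.toNanos() / 1e9;
    }

    // Timers: convert an amount in any TimeUnit to seconds.
    static double seconds(long amount, TimeUnit unit) {
        return unit.toNanos(amount) / 1e9;
    }

    // Network rates: bytes transferred over a window, as bytes/second.
    static double bytesPerSecond(long bytes, Duration window) {
        return bytes / seconds(window);
    }

    public static void main(String[] args) {
        System.out.println(seconds(Duration.ofMillis(250)));              // 0.25
        System.out.println(seconds(1500, TimeUnit.MILLISECONDS));         // 1.5
        System.out.println(bytesPerSecond(5_000, Duration.ofSeconds(2))); // 2500.0
    }
}
```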