Adding instrumentation for tracing and logging¶
It can be extremely valuable to add tracing, metrics and logging to your GraphQL API. At Netflix we publish tracing spans and metrics for each datafetcher to our distributed tracing/metrics backends, and log queries and query results to our logging backend. The implementations we use at Netflix are highly specific for our infrastructure, but it's easy to add your own to the framework!
Internally the DGS framework uses GraphQL Java.
GraphQL Java supports the concept of instrumentation
.
In the DGS framework we can easily add one or more instrumentation classes by implementing the graphql.execution.instrumentation.Instrumentation
interface and register the class as @Component
.
The easiest way to implement the Instrumentation
interface is to extend graphql.execution.instrumentation.SimpleInstrumentation
.
The following is an example of an implementation that outputs the execution time for each data fetcher, and the total query execution time, to the logs. Most likely you would want to replace the log output with writing to your tracing/metrics backend. Note that the code example accounts for async data fetchers. If we wouldn't do this, the result for an async data fetcher would always be 0, because the actual processing happens later.
@Component
public class ExampleTracingInstrumentation extends SimpleInstrumentation {
private final static Logger LOGGER = LoggerFactory.getLogger(ExampleTracingInstrumentation.class);
@Override
public InstrumentationState createState() {
return new TracingState();
}
@Override
public InstrumentationContext<ExecutionResult> beginExecution(InstrumentationExecutionParameters parameters) {
TracingState tracingState = parameters.getInstrumentationState();
tracingState.startTime = System.currentTimeMillis();
return super.beginExecution(parameters);
}
@Override
public DataFetcher<?> instrumentDataFetcher(DataFetcher<?> dataFetcher, InstrumentationFieldFetchParameters parameters) {
// We only care about user code
if(parameters.isTrivialDataFetcher()) {
return dataFetcher;
}
return environment -> {
long startTime = System.currentTimeMillis();
Object result = dataFetcher.get(environment);
if(result instanceof CompletableFuture) {
((CompletableFuture<?>) result).whenComplete((r, ex) -> {
long totalTime = System.currentTimeMillis() - startTime;
LOGGER.info("Async datafetcher {} took {}ms", findDatafetcherTag(parameters), totalTime);
});
} else {
long totalTime = System.currentTimeMillis() - startTime;
LOGGER.info("Datafetcher {} took {}ms", findDatafetcherTag(parameters), totalTime);
}
return result;
};
}
@Override
public CompletableFuture<ExecutionResult> instrumentExecutionResult(ExecutionResult executionResult, InstrumentationExecutionParameters parameters) {
TracingState tracingState = parameters.getInstrumentationState();
long totalTime = System.currentTimeMillis() - tracingState.startTime;
LOGGER.info("Total execution time: {}ms", totalTime);
return super.instrumentExecutionResult(executionResult, parameters);
}
private String findDatafetcherTag(InstrumentationFieldFetchParameters parameters) {
GraphQLOutputType type = parameters.getExecutionStepInfo().getParent().getType();
GraphQLObjectType parent;
if (type instanceof GraphQLNonNull) {
parent = (GraphQLObjectType) ((GraphQLNonNull) type).getWrappedType();
} else {
parent = (GraphQLObjectType) type;
}
return parent.getName() + "." + parameters.getExecutionStepInfo().getPath().getSegmentName();
}
static class TracingState implements InstrumentationState {
long startTime;
}
}
@Component
class ExampleTracingInstrumentation: SimpleInstrumentation() {
val logger : Logger = LoggerFactory.getLogger(ExampleTracingInstrumentation::class.java)
override fun createState(): InstrumentationState {
return TraceState()
}
override fun beginExecution(parameters: InstrumentationExecutionParameters): InstrumentationContext<ExecutionResult> {
val state: TraceState = parameters.getInstrumentationState()
state.traceStartTime = System.currentTimeMillis()
return super.beginExecution(parameters)
}
override fun instrumentDataFetcher(dataFetcher: DataFetcher<*>, parameters: InstrumentationFieldFetchParameters): DataFetcher<*> {
// We only care about user code
if(parameters.isTrivialDataFetcher) {
return dataFetcher
}
val dataFetcherName = findDatafetcherTag(parameters)
return DataFetcher { environment ->
val startTime = System.currentTimeMillis()
val result = dataFetcher.get(environment)
if(result is CompletableFuture<*>) {
result.whenComplete { _,_ ->
val totalTime = System.currentTimeMillis() - startTime
logger.info("Async datafetcher '$dataFetcherName' took ${totalTime}ms")
}
} else {
val totalTime = System.currentTimeMillis() - startTime
logger.info("Datafetcher '$dataFetcherName': ${totalTime}ms")
}
result
}
}
override fun instrumentExecutionResult(executionResult: ExecutionResult, parameters: InstrumentationExecutionParameters): CompletableFuture<ExecutionResult> {
val state: TraceState = parameters.getInstrumentationState()
val totalTime = System.currentTimeMillis() - state.traceStartTime
logger.info("Total execution time: ${totalTime}ms")
return super.instrumentExecutionResult(executionResult, parameters)
}
private fun findDatafetcherTag(parameters: InstrumentationFieldFetchParameters): String {
val type = parameters.executionStepInfo.parent.type
val parentType = if (type is GraphQLNonNull) {
type.wrappedType as GraphQLObjectType
} else {
type as GraphQLObjectType
}
return "${parentType.name}.${parameters.executionStepInfo.path.segmentName}"
}
data class TraceState(var traceStartTime: Long = 0): InstrumentationState
}
Datafetcher 'Query.shows': 0ms
Total execution time: 3ms
Enabling Apollo Tracing¶
If you want to leverage Apollo Tracing, as supported by graphql-java
, you can create a bean of type TracingInstrumentation
.
In this example, we added a conditional property on the bean to enable/disable the Apollo Tracing.
This property is enabled by default, but you can turn it off by setting graphql.tracing.enabled=false
in your application properties.
import graphql.execution.instrumentation.tracing.TracingInstrumentation;
@SpringBootApplication
public class ReviewsDgs {
@Bean
@ConditionalOnProperty( prefix = "graphql.tracing", name = "enabled", matchIfMissing = true)
public Instrumentation tracingInstrumentation(){
return new TracingInstrumentation();
}
}
For federated tracing, you will need to use the instrumentation provided by Apollo's jvm federation library
import com.apollographql.federation.graphqljava.tracing.FederatedTracingInstrumentation;
@SpringBootApplication
public class ReviewsDgs {
@Bean
@ConditionalOnProperty( prefix = "graphql.tracing", name = "enabled", matchIfMissing = true)
public Instrumentation tracingInstrumentation(){
return new FederatedTracingInstrumentation();
}
}
It's important to note that the default behavior in Apollo's JVM federation library is to trace requests, even if the federated gateway does not request it (i.e. the gateway does not add FEDERATED_TRACING_HEADER_NAME
when forwarding the request). Including the following component in your DGS project will explicitly ask Apollo's JVM federation library to not trace a request in the event that the gateway does not request it.
@Component
public class ApolloFederatedTracingHeaderForwarder implements GraphQLContextContributor {
@Override
public void contribute(@NotNull GraphQLContext.Builder builder, @Nullable Map<String, ?> extensions, @Nullable DgsRequestData dgsRequestData) {
if (dgsRequestData == null || dgsRequestData.getHeaders() == null) {
return;
}
final HttpHeaders headers = dgsRequestData.getHeaders();
// if the header exists, we should just forward it.
if (headers.containsKey(FederatedTracingInstrumentation.FEDERATED_TRACING_HEADER_NAME)) {
builder.put(
FederatedTracingInstrumentation.FEDERATED_TRACING_HEADER_NAME,
headers
.get(FederatedTracingInstrumentation.FEDERATED_TRACING_HEADER_NAME)
.stream()
.findFirst()
.get()
);
} else {
//otherwise, place a value != "ftv1" so when it gets checked for == ftv1 it fails
// and trace does not happen.
builder.put(FederatedTracingInstrumentation.FEDERATED_TRACING_HEADER_NAME, "DO_NOT_TRACE");
}
}
}
Metrics Out of The Box¶
tl;dr
- Supported via the opt-in
graphql-dgs-spring-boot-micrometer
module. - Provides specific GraphQL metrics such as
gql.query
,gql.error
, andgql.dataLoader
. - Supports several backend implementations since it's implemented via Micrometer.
dependencies {
implementation 'com.netflix.graphql.dgs:graphql-dgs-spring-boot-micrometer'
}
dependencies {
implementation("com.netflix.graphql.dgs:graphql-dgs-spring-boot-micrometer")
}
<dependencies>
<dependency>
<groupId>com.netflix.graphql.dgs</groupId>
<artifactId>graphql-dgs-spring-boot-micrometer</artifactId>
</dependency>
</dependencies>
Hint
Note that the version is missing since we assume you are using the latest BOM. We recommend you use the DGS Platform BOM to handle such versions.
Shared Tags¶
The following are tags shared across most of the meters.
Tags:
tag name | values | description |
---|---|---|
gql.operation |
QUERY, MUTATION, SUBSCRIPTION are the possible values. These represent the GraphQL operation that is executed. | |
gql.operation.name |
GraphQL operation name if any, else anonymous . Since the cardinality of the value is high it will be limited. |
|
gql.query.complexity |
one in [5, 10, 20, 50, 100, 200, 500, 1000] | The total number of nodes in the query. Refer to Query Complexity section. for additional information. |
gql.query.sig.hash |
Query Signature Hash of the query that was executed. Since the cardinality of the value is high it will be limited. |
GraphQL Query Complexity¶
The gql.query.complexity
is typically calculated as 1 + Child's Complexity. The query complexity is valuable to calculate
the cost of a query as this can vary based on input arguments to the query.
The computed value is represented as one of the bucketed values to reduce the cardinality of the metric.
Example Query:
query {
viewer {
repositories(first: 50) {
edges {
repository:node {
name
issues(first: 10) {
totalCount
edges {
node {
title
bodyHTML
}
}
}
}
}
}
}
}
Example Calculation:
50 = 50 repositories
+
50 x 10 = 500 repository issues
= 550 total nodes
GraphQL Query Signature Hash¶
The Query Signature is defined as the tuple of the GraphQL AST Signature of the GraphQL Document and the GraphQL AST Signature Hash. The GraphQL AST Signature of a GraphQL Document is defined as follows:
A canonical AST which removes excess operations, removes any field aliases, hides literal values and sorts the result into a canonical query Ref graphql-java
The GraphQL AST Signature Hash is the Hex 256 SHA string produced by encoding the AST Signature.
While we can't tag a metric by its signature, due its length, we can use the hash, as now expressed by the
gql.query.sig.hash
tag.
There are a few configuration parameters that can change the behavior of the gql.query.sig.hash
tag.
management.metrics.dgs-graphql.query-signature.enabled
: Defaulting totrue
, it enables the calculation of the GQL Query Signature. Thegql.query.sig.hash
will express the GQL Query Signature Hash.management.metrics.dgs-graphql.query-signature.caching.enabled
: Defaulting totrue
, it will cache the GQL Query Signature. If set tofalse
it will just disable the cache but will not turn the calculation of the signature off. If you want to turn such calculation off use themanagement.metrics.dgs-graphql.query-signature.enabled
property.
Cardinality Limiter¶
The cardinality of a given tag, the number of different values that a tag can express, can be problematic to
servers supporting metrics. In order to prevent the cardinality of some of the tags supported out of the box there
are some limiters by default. The limited tag values will only see the first 100 different values by default,
from there new values will be expressed as --others--
.
You can change the limiter via the following configuration:
management.metrics.dgs-graphql.tags.limiter.limit
: Defaults to100
, sets the number of different values expressed per limited tag.
Not all tags are limited, currently, only following are:
gql.operation.name
gql.query.sig.hash
Query Timer: gql.query¶
Captures the elapsed time that a given GraphQL query, or mutation, takes.
Name: gql.query
Tags:
tag name | values | description |
---|---|---|
outcome |
success or failure |
Result of the operation, as defined by the ExecutionResult. |
Error Counter: gql.error¶
Captures the number of GraphQL errors encountered during query execution of a query or mutation. Remember that one graphql request can have multiple errors.
Name: gql.error
Tags:
tag name | description |
---|---|
gql.errorCode |
The GraphQL error code, such as VALIDATION , INTERNAL , etc. |
gql.path |
The sanitized query path that resulted in the error. |
gql.errorDetail |
Optional flag containing additional details, if present. |
Data Loader Timer: gql.dataLoader¶
Captures the elapsed time for a data loader invocation for a batch of queries. This is useful if you want to find data loaders that might be responsible for poor query performance.
Name: gql.dataLoader
Tags:
tag name | description |
---|---|
gql.loaderName |
The name of the data loader, may or may not be the same as the type of entity. |
gql.loaderBatchSize |
The number of queries executed in the batch. |
Data Fetcher Timer: gql.resolver¶
Captures the elapsed time of each data fetcher invocation.
This is useful if you want to find data fetchers that might be responsible for poor query performance.
That said, there might be times where you want to remove a data fetcher from being measured/included in this meter.
You can do so by annotating the method with @DgsEnableDataFetcherInstrumentation(false)
.
Info
This metric is not available if:
- The data is resolved via a Batch Loader.
- The method is annotated with
@DgsEnableDataFetcherInstrumentation(false)
. - The DataFetcher is TrivialDataFetcher. A trivial DataFetcher is one that simply maps data from an object to a field.
This is defined directly in
graphql-java
.
Name: gql.resolver
Tags:
tag name | description |
---|---|
gql.field |
Name of the data fetcher. This has the ${parentType}.${field} format as specified in the @DgsData annotation. |
Data Fetcher Timer as a Counter¶
The data fetcher, or resolver, timer can be used as a counter as well. If used in this manner it will reflect the number of invocations of each data fetcher. This is useful if you want to find out which data fetchers are used often.
Further Tag Customization¶
You can customize the tags applied to the metrics above by providing beans that implement the following functional interfaces.
Interface | Description |
---|---|
DgsContextualTagCustomizer |
Used to add common, contextual tags. Example of these could be used to describe the deployment environment, application profile, application version, etc |
DgsExecutionTagCustomizer |
Used to add tags specific to the ExecutionResult of the query. The SimpleGqlOutcomeTagCustomizer is an example of this. |
DgsFieldFetchTagCustomizer |
Used to add tags specific to the execution of data fetchers. The SimpleGqlOutcomeTagCustomizer is an example of this as well. |
Additional Metrics Configuration¶
management.metrics.dgs-graphql.enabled
: Enables the metrics provided out of the box; defaults totrue
.management.metrics.dgs-graphql.tag-customizers.outcome.enabled
: Enables the tag customizer that will label thegql.query
andgql.resolver
timers with anoutcome
reflecting the result of the GraphQL outcome, eithersuccess
orfailure
; defaults totrue
.management.metrics.dgs-graphql.data-loader-instrumentation.enabled
: Enables instrumentation of data loaders; defaults totrue
.management.metrics.dgs-graphql.resolver.enabled
: Enables instrumentation of graphql resolvers; defaults totrue
.management.metrics.dgs-graphql.query.enabled
: Enables instrumentation of graphql query; defaults totrue
.