Mastering Kotlin Coroutines with OpenTelemetry Tracing: A Comprehensive Guide

Nicolas Fränkel

In modern software development, observability is key to maintaining robust and performant applications. As systems become increasingly complex, tools like OpenTelemetry offer invaluable insights into how your code performs in real-world scenarios. This is especially true for Kotlin coroutines, a feature that makes asynchronous programming more straightforward and less error-prone than traditional threading models. In this post, we’ll look at how Kotlin coroutines interact with OpenTelemetry tracing and how to instrument coroutine-based Kotlin code so that its traces give accurate visibility into your application’s behavior.

Introduction to OpenTelemetry and Kotlin Coroutines

OpenTelemetry is an open-source observability framework that provides APIs, libraries, agents, and instrumentation to generate and collect telemetry data such as traces, metrics, and logs. For JVM applications, OpenTelemetry is particularly useful because its Java agent automatically instruments many popular frameworks and libraries without any code changes.

Kotlin coroutines, on the other hand, offer a powerful way to write asynchronous and non-blocking code. By abstracting away thread management, coroutines simplify concurrent programming and make your code cleaner and more maintainable. However, a single coroutine can execute on several threads over its lifetime. That makes it challenging to integrate coroutines with telemetry tools that keep their context per thread, as OpenTelemetry does by default.

The Role of the `@WithSpan` Annotation in OpenTelemetry

The @WithSpan annotation, from the opentelemetry-instrumentation-annotations library, marks a method whose executions should be recorded as spans. Spans are logical units of work that help you track the performance and flow of your code, and they are essential for tracing requests across the services and components of a distributed system. When the OpenTelemetry Java agent is attached, it starts a span each time an annotated function is called and ends it when the function returns or throws, capturing valuable telemetry data in the process. Without the agent, the annotation has no effect.
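As a minimal sketch, annotating a Kotlin function looks like this. It assumes the opentelemetry-instrumentation-annotations dependency is on the classpath. OrderService and placeOrder are hypothetical names used only for illustration.

```kotlin
import io.opentelemetry.instrumentation.annotations.SpanAttribute
import io.opentelemetry.instrumentation.annotations.WithSpan

// Hypothetical service used only for illustration
class OrderService {
    // With the Java agent attached, each call runs inside a span named "place-order",
    // and the orderId argument is recorded as the "order.id" span attribute
    @WithSpan("place-order")
    fun placeOrder(@SpanAttribute("order.id") orderId: String): String {
        return "placed $orderId"
    }
}
```

Without the agent, the annotations are inert and placeOrder simply runs as a normal method.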

How `@WithSpan` Works with Coroutines

The @WithSpan annotation is straightforward to use with synchronous code, but things get more complex when coroutines are involved. A coroutine can suspend on one thread and resume on another. As a result, spans are hard to track accurately if you rely on traditional thread-based context storage.
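You can observe this thread switching directly. The sketch below assumes only kotlinx-coroutines; threadsSeenByOneCoroutine is a hypothetical name. It records which thread runs a single coroutine at several points.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Records the name of the thread executing one coroutine at several points
fun threadsSeenByOneCoroutine(): List<String> = runBlocking {
    val seen = mutableListOf<String>()
    seen += Thread.currentThread().name          // the thread that called runBlocking
    withContext(Dispatchers.Default) {
        seen += Thread.currentThread().name      // a worker of the Default pool
        delay(10)                                // suspend; resumption may use another worker
        seen += Thread.currentThread().name
    }
    seen += Thread.currentThread().name          // back on runBlocking's thread
    seen
}
```

The first two entries always differ, because the block passed to withContext runs on a pool worker. The last entry matches the first, because the coroutine comes back to the calling thread once withContext returns.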

Understanding the `@WithSpan` Instrumentation

At the heart of the @WithSpan functionality lies the Java agent’s WithSpanInstrumentation class. It is not a compile-time annotation processor. Instead, it is bytecode instrumentation that the agent applies when classes are loaded. It detects methods annotated with @WithSpan and ensures that the necessary spans are created and managed correctly.

The WithSpanInstrumentation class delegates its tasks to the WithSpanSingleton class, which in turn interacts with the Instrumenter class. The Instrumenter is a core component of the OpenTelemetry instrumentation API and handles everything from starting and ending spans to recording metrics. Exporting the resulting data, for example to an OpenTelemetry Collector, is left to the exporters configured in the SDK.

Deep Dive into the `Instrumenter` Class

The Instrumenter class is the workhorse of OpenTelemetry’s tracing functionality. Here’s how it operates:

Initialization: The Instrumenter is typically created with an InstrumenterBuilder, which configures how spans are produced: the span name extractor, attribute extractors, span status and span kind, plus optional operation listeners for metrics. Span processors and exporters are configured separately, in the SDK.
Starting a Span: When a method annotated with @WithSpan is invoked, the agent first calls Instrumenter#shouldStart to decide whether a span should be started. If so, Instrumenter#start creates a new span, parented to the span in the current Context, and returns a new Context that contains it.
Ending a Span: Once the method completes, Instrumenter#end ends the span and records any exception thrown during the method execution.
Context Management: The Instrumenter works closely with the Context class, which holds the current span along with other propagated values. By default, the current Context is stored in a ThreadLocal variable, which ties the span to a specific thread.
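The start/end sequence listed above can be sketched directly against the instrumentation API. This is an illustration under stated assumptions, not the agent’s own WithSpan advice. It assumes opentelemetry-instrumentation-api, opentelemetry-sdk and opentelemetry-sdk-testing on the classpath. The traced helper, the "example" instrumentation name, and the use of plain strings as requests are all hypothetical.

```kotlin
import io.opentelemetry.context.Context
import io.opentelemetry.instrumentation.api.instrumenter.Instrumenter
import io.opentelemetry.instrumentation.api.instrumenter.SpanNameExtractor
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor

// In-memory exporter so finished spans can be inspected
val exporter: InMemorySpanExporter = InMemorySpanExporter.create()

// Span processors and exporters live in the SDK, not in the Instrumenter
val openTelemetry: OpenTelemetrySdk = OpenTelemetrySdk.builder()
    .setTracerProvider(
        SdkTracerProvider.builder()
            .addSpanProcessor(SimpleSpanProcessor.create(exporter))
            .build()
    )
    .build()

// Requests are plain strings here, and the request doubles as the span name
val instrumenter: Instrumenter<String, Unit> =
    Instrumenter.builder<String, Unit>(openTelemetry, "example", SpanNameExtractor<String> { request -> request })
        .buildInstrumenter()

// Hypothetical helper following the shouldStart / start / end sequence
fun <T> traced(request: String, block: () -> T): T {
    val parent = Context.current()
    if (!instrumenter.shouldStart(parent, request)) return block()
    val context = instrumenter.start(parent, request)
    var error: Throwable? = null
    try {
        // makeCurrent() puts the new Context in thread-local storage until the scope closes
        return context.makeCurrent().use { block() }
    } catch (t: Throwable) {
        error = t
        throw t
    } finally {
        instrumenter.end(context, request, null, error)
    }
}
```

Note that makeCurrent() binds the Context to the calling thread only. That thread binding is what the following section is about.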

The Challenge of Instrumenting Kotlin Coroutines

While the ThreadLocal approach works well in traditional multi-threaded applications, it poses significant challenges for Kotlin coroutines. Coroutines are lightweight: many of them share a small pool of threads, and a coroutine that suspends may resume on a different thread from the one it started on. A ThreadLocal value therefore cannot be relied on for context propagation, because after resumption the coroutine may no longer see the context it started with.
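The failure mode can be reproduced with a plain ThreadLocal, without any OpenTelemetry code. The sketch assumes kotlinx-coroutines; requestId and readAfterDispatch are hypothetical names. A value set before switching to Dispatchers.Default is not visible after the switch, because the block runs on a worker thread whose ThreadLocal slot was never set.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// A plain ThreadLocal standing in for thread-bound context storage
val requestId = ThreadLocal<String?>()

// Sets the value on the calling thread, then reads it from a Default worker
fun readAfterDispatch(): String? = runBlocking {
    requestId.set("r-42")
    try {
        withContext(Dispatchers.Default) { requestId.get() }   // runs on a different thread
    } finally {
        requestId.remove()   // clean up the calling thread's slot
    }
}
```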

Kotlin’s Solution: Coroutine Context

To address this, Kotlin coroutines introduce the concept of a coroutine context, which is a collection of data associated with a coroutine. The coroutine context can include information such as the coroutine’s job, dispatcher, and, crucially for our purposes, the OpenTelemetry context.
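kotlinx-coroutines itself ships a generic version of this idea. ThreadLocal.asContextElement() turns a thread-local value into a coroutine context element. That element installs the value on whichever thread runs the coroutine and restores the previous value when the coroutine leaves that thread. The sketch below uses hypothetical names (requestId, readWithContextElement) and combines such an element with a dispatcher and a CoroutineName in a single coroutine context.

```kotlin
import kotlinx.coroutines.CoroutineName
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.asContextElement
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

val requestId = ThreadLocal<String?>()

// Returns the coroutine's name and the ThreadLocal value seen on the worker thread
fun readWithContextElement(): Pair<String?, String?> = runBlocking {
    // One coroutine context holding a dispatcher, a name, and a ThreadLocal element
    withContext(Dispatchers.Default + CoroutineName("checkout") + requestId.asContextElement("r-42")) {
        // The element has set requestId on the worker thread running this block
        coroutineContext[CoroutineName]?.name to requestId.get()
    }
}
```

The OpenTelemetry context element discussed next plays the same role for an OpenTelemetry Context.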

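The opentelemetry-extension-kotlin library provides the necessary tools to integrate OpenTelemetry with Kotlin coroutines. Its asContextElement() extension wraps an OpenTelemetry Context into a coroutine context element, and getOpenTelemetryContext() reads that Context back from a coroutine context. Whenever the coroutine runs, on whatever thread, the element makes its Context current in the ThreadLocal storage, and it restores the previous one when the coroutine suspends. As a result, spans are tracked accurately regardless of how coroutines are scheduled and executed.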
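Here is a minimal sketch of the library’s asContextElement() and getOpenTelemetryContext() extensions in use. It assumes opentelemetry-extension-kotlin and kotlinx-coroutines on the classpath. The REQUEST_ID key and readOtelContextAcrossDispatch are hypothetical and exist only to make the propagated Context observable.

```kotlin
import io.opentelemetry.context.Context
import io.opentelemetry.context.ContextKey
import io.opentelemetry.extension.kotlin.asContextElement
import io.opentelemetry.extension.kotlin.getOpenTelemetryContext
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical key used only to make the propagated Context observable
val REQUEST_ID: ContextKey<String> = ContextKey.named("request-id")

fun readOtelContextAcrossDispatch(): Pair<String?, String?> = runBlocking {
    val otelContext = Context.root().with(REQUEST_ID, "r-42")
    withContext(Dispatchers.Default + otelContext.asContextElement()) {
        // Thread-local view: the element made otelContext current on this worker thread
        val viaThreadLocal = Context.current().get(REQUEST_ID)
        // Coroutine-context view: read straight from the element
        val viaCoroutineContext = coroutineContext.getOpenTelemetryContext().get(REQUEST_ID)
        viaThreadLocal to viaCoroutineContext
    }
}
```

Both views see the same value on the worker thread, while the calling thread’s Context is left untouched.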

Integrating OpenTelemetry with Kotlin Coroutines

Let’s explore how this works in practice. The OpenTelemetry Java agent builds on opentelemetry-extension-kotlin in its kotlinx-coroutines instrumentation. The key player there is the KotlinCoroutinesInstrumentationHelper class, which ensures that the OpenTelemetry context is correctly managed when coroutines are involved.

Key Functionality of `KotlinCoroutinesInstrumentationHelper`

The KotlinCoroutinesInstrumentationHelper class, which is written in Java, provides a method called addOpenTelemetryContext. This method copies the current ThreadLocal context into the coroutine context when the coroutine does not already carry one. Here’s a closer look at how it works:

```java
package io.opentelemetry.javaagent.instrumentation.kotlinxcoroutines;

import io.opentelemetry.context.Context;
import io.opentelemetry.extension.kotlin.ContextExtensionsKt;
import kotlin.coroutines.CoroutineContext;

public final class KotlinCoroutinesInstrumentationHelper {

  public static CoroutineContext addOpenTelemetryContext(CoroutineContext coroutineContext) {
    // Context currently stored in thread-local storage
    Context current = Context.current();
    // Context carried by the coroutine, or Context.root() if it carries none
    Context inCoroutine = ContextExtensionsKt.getOpenTelemetryContext(coroutineContext);
    // Nothing to do if they already match, or if the coroutine already has its own context
    if (current == inCoroutine || inCoroutine != Context.root()) {
      return coroutineContext;
    }
    // Otherwise attach the thread-local context as a coroutine context element
    return coroutineContext.plus(ContextExtensionsKt.asContextElement(current));
  }

  private KotlinCoroutinesInstrumentationHelper() {}
}
```

How It Works:

Context Retrieval: The method starts by retrieving the current OpenTelemetry context from the ThreadLocal storage using Context.current(). It then reads the context carried by the coroutine with getOpenTelemetryContext(), which returns Context.root() when the coroutine carries none.
Context Comparison: If the two contexts are the same, or if the coroutine already carries a non-root context of its own, the coroutine context is returned unchanged.
Context Propagation: Otherwise, when the coroutine carries only the root context and the thread holds a different one, the method adds the current ThreadLocal context to the coroutine context using the asContextElement method from the ContextExtensionsKt class.
This process ensures that the OpenTelemetry context is preserved across coroutine suspensions and resumptions, allowing you to trace coroutine-based code accurately.
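The same decision rule can be transcribed into Kotlin to make it easy to exercise. This is an illustrative transcription, not the agent’s code. addOpenTelemetryContextSketch is a hypothetical name. The `current` context is a parameter defaulting to Context.current(), so the rule can be tested without touching thread-local state. The sketch uses === to mirror Java’s reference comparison.

```kotlin
import io.opentelemetry.context.Context
import io.opentelemetry.extension.kotlin.asContextElement
import io.opentelemetry.extension.kotlin.getOpenTelemetryContext
import kotlin.coroutines.CoroutineContext

// Kotlin transcription of the helper's rule (hypothetical name, not the agent's own code)
fun addOpenTelemetryContextSketch(
    coroutineContext: CoroutineContext,
    current: Context = Context.current(),
): CoroutineContext {
    // Context.root() when the coroutine context holds no OpenTelemetry element
    val inCoroutine = coroutineContext.getOpenTelemetryContext()
    // Keep the coroutine's context if it already matches, or already carries a non-root context
    if (current === inCoroutine || inCoroutine !== Context.root()) return coroutineContext
    // Otherwise attach the caller's context as a coroutine context element
    return coroutineContext + current.asContextElement()
}
```

With an empty coroutine context and a non-root caller context, the caller’s context is attached. A coroutine context that already carries its own context is returned as-is.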

Practical Example: Tracing with Kotlin Coroutines
To illustrate how these concepts work together, let’s consider a practical example. Suppose you have a Kotlin function that performs several asynchronous operations using coroutines, and you want to trace these operations using OpenTelemetry.

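```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.trace.Tracer
import io.opentelemetry.context.Context
import io.opentelemetry.extension.kotlin.asContextElement
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Tracer from the globally registered OpenTelemetry instance (registered by the agent or your SDK setup)
val tracer: Tracer = GlobalOpenTelemetry.getTracer("traced-example")

fun tracedFunction() = runBlocking {
    // Capture the caller's OpenTelemetry context and carry it into the new coroutine
    launch(Dispatchers.Default + Context.current().asContextElement()) {
        // The parent comes from Context.current(), which the element restored on this worker thread
        val span = tracer.spanBuilder("tracedFunction").startSpan()
        try {
            // Make the span current for this coroutine, not just for one thread
            withContext(Context.current().with(span).asContextElement()) {
                doAsyncWork()
            }
        } finally {
            span.end()
        }
    }
}

suspend fun doAsyncWork() {
    // Simulate async work; after the delay the coroutine may resume on another thread
    delay(100)
}
```

In this example:
• Coroutine builders: runBlocking bridges regular blocking code and coroutines, and launch starts the traced work on Dispatchers.Default, so it runs on a different thread from the caller.
• Context propagation: Context.current().asContextElement(), from opentelemetry-extension-kotlin, carries the caller’s OpenTelemetry context into the new coroutine, so the span picks up the right parent. The addOpenTelemetryContext helper shown earlier is internal to the Java agent rather than a public API, which is why application code uses asContextElement() directly.
• Current span: withContext(Context.current().with(span).asContextElement()) makes the span current for everything doAsyncWork() does, even after it resumes on another thread. Calling span.makeCurrent() instead would bind the span to a single thread, which is exactly what breaks across suspensions.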
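However the context reaches the coroutine, one way to check that propagation works is to record spans in memory and compare parent and child IDs. This is a minimal sketch, assuming opentelemetry-sdk, opentelemetry-sdk-testing, opentelemetry-extension-kotlin, and kotlinx-coroutines on the classpath. runParentAndChild and the span names are hypothetical.

```kotlin
import io.opentelemetry.api.trace.Tracer
import io.opentelemetry.context.Context
import io.opentelemetry.extension.kotlin.asContextElement
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Spans are exported synchronously into memory so the test can inspect them
val exporter: InMemorySpanExporter = InMemorySpanExporter.create()
val tracer: Tracer = SdkTracerProvider.builder()
    .addSpanProcessor(SimpleSpanProcessor.create(exporter))
    .build()
    .get("propagation-check")

fun runParentAndChild() = runBlocking {
    val parent = tracer.spanBuilder("parent").startSpan()
    try {
        withContext(Dispatchers.Default + Context.current().with(parent).asContextElement()) {
            delay(10)   // suspend, so the child may start on a different worker thread
            // No explicit parent: the builder takes it from Context.current()
            val child = tracer.spanBuilder("child").startSpan()
            child.end()
        }
    } finally {
        parent.end()
    }
}
```

If propagation works, the child’s parent span ID equals the parent’s span ID, and both share a trace ID.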
Credit and Original Source:
This article was originally written by Nicolas Fränkel and published on their blog. For more insights from Nicolas, including in-depth discussions on OpenTelemetry and other Java-related topics, you can visit the original post here

Conclusion
Tracing Kotlin coroutines with OpenTelemetry requires a deep understanding of both coroutines and how telemetry data is managed. By leveraging the @WithSpan annotation and the opentelemetry-extension-kotlin library, you can gain comprehensive insights into the performance and behavior of your asynchronous Kotlin code. The ability to accurately trace coroutines across multiple threads is essential for debugging, performance tuning, and ensuring the reliability of your applications in production.