Introduction
In today’s technology landscape, whether you’re implementing microservices or working with a multi-component system, monitoring and observability are paramount. A typical system might consist of a reverse proxy, multiple applications, and databases. As the number of components increases, so does the complexity of monitoring. When issues arise, having an aggregated view across all components becomes essential. This aggregated view is achieved through tracing, one of the three pillars of observability alongside metrics and logs.
In this post, we’ll delve into end-to-end tracing using OpenTelemetry, guiding you through setting up tracing in a microservices architecture with practical examples and code snippets.
Understanding End-to-End Tracing
End-to-end tracing involves tracking the journey of a request as it traverses through various components of a distributed system. This visibility is crucial for identifying bottlenecks, latency issues, and failures within your architecture.
Why Tracing Matters
In a microservices architecture, a single user request can initiate interactions across multiple services. Without tracing, pinpointing the root cause of an issue becomes challenging. Tracing provides a comprehensive view of how requests propagate through services, making it easier to diagnose and resolve problems efficiently.
OpenTelemetry: The Gold Standard for Tracing
What is OpenTelemetry?
OpenTelemetry is a collection of tools, APIs, and SDKs designed to instrument, generate, collect, and export telemetry data (metrics, logs, and traces). Managed by the Cloud Native Computing Foundation (CNCF), OpenTelemetry has emerged as the de facto standard for observability in modern software systems. It supports multiple programming languages, making it versatile for diverse technology stacks.
For more information, visit the OpenTelemetry website at https://opentelemetry.io.
The Evolution of OpenTelemetry
Before OpenTelemetry, two primary projects dominated the observability landscape:
• OpenTracing: Focused solely on tracing, providing a standard API for instrumentation.
• OpenCensus: Aimed at capturing both metrics and traces, offering built-in libraries for various languages.
These two projects merged to form OpenTelemetry, which added logs to the mix and now offers comprehensive observability capabilities through:
• Instrumentation APIs: Available in multiple languages.
• Canonical Implementations: Standardized across different languages.
• Infrastructure Components: Such as collectors.
• Interoperability Formats: Including the W3C’s Trace Context.
While Trace Context is limited to HTTP, OpenTelemetry extends tracing to non-web components like Kafka, broadening its applicability.
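To make the propagation mechanism concrete, here is a minimal Python sketch using the OpenTelemetry API's default W3C Trace Context propagator. It shows the traceparent header being injected into outgoing headers on the client side and extracted on the server side; the span names are illustrative:

    from opentelemetry import trace
    from opentelemetry.propagate import extract, inject
    from opentelemetry.sdk.trace import TracerProvider

    # The global propagator defaults to W3C Trace Context (plus Baggage)
    trace.set_tracer_provider(TracerProvider())
    tracer = trace.get_tracer(__name__)

    # Client side: serialize the current span context into outgoing headers
    with tracer.start_as_current_span("client-call"):
        headers = {}
        inject(headers)  # adds 'traceparent', e.g. 00-<trace-id>-<span-id>-01
        print(headers)

    # Server side: restore the context so new spans join the caller's trace
    server_context = extract(headers)
    with tracer.start_as_current_span("server-work", context=server_context):
        pass

Because both sides agree on the traceparent format, any two components that speak it, whatever their language, end up in the same trace.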
The Use Case: E-Commerce Microservices
Consider an e-commerce platform designed with microservices, each accessible via REST APIs and secured behind an API Gateway. For simplicity, we’ll focus on two microservices:
• Catalog Service: Manages product information.
• Pricing Service: Handles product pricing.
When a user accesses the application, the home page fetches all products and their respective prices to display them.
Request Flow Example:
• User request flows through the API Gateway.
• API Gateway interacts with the Catalog and Pricing services.
• Each service communicates with its respective database.
Technologies Used:
• Catalog Service: Spring Boot application written in Kotlin.
• Pricing Service: Python Flask application.
Implementing Tracing with Apache APISIX and OpenTelemetry
Setting Up the Gateway with Apache APISIX
The gateway is the entry point of the system, responsible for generating the trace ID. We’ll use Apache APISIX as the API Gateway due to its rich feature set and plugin architecture.
Apache APISIX offers an OpenTelemetry plugin that reports tracing data according to the OpenTelemetry specification.
For more details, see the Apache APISIX OpenTelemetry plugin documentation at https://apisix.apache.org/docs/apisix/plugins/opentelemetry/.
Configuring the OpenTelemetry Plugin
config.yaml
    apisix:
      enable_admin: false
      config_center: yaml
    plugins:
      - opentelemetry
    plugin_attr:
      opentelemetry:
        resource:
          service.name: APISIX
        collector:
          address: jaeger:4318
This configuration loads the OpenTelemetry plugin and tells it where to send trace data. To ensure every route is traced, apply the plugin through a global rule:
apisix.yaml
    global_rules:
      - id: 1
        plugins:
          opentelemetry:
            sampler:
              name: always_on
Key Points:
• Service Name: Identifies the service in trace displays.
• Collector Address: Points to the Jaeger service for trace data collection.
• Sampler: Set to always_on for comprehensive tracing in the demo.
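The same three settings have direct counterparts in the OpenTelemetry SDKs. As a purely illustrative sketch (the gateway does not need this; the plugin handles it natively), here is how a Python service could configure an equivalent resource name, collector endpoint, and sampler; the service name my-service is a placeholder:

    from opentelemetry import trace
    from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
    from opentelemetry.sdk.resources import Resource
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import BatchSpanProcessor
    from opentelemetry.sdk.trace.sampling import ALWAYS_ON

    provider = TracerProvider(
        resource=Resource.create({"service.name": "my-service"}),  # shown in trace displays
        sampler=ALWAYS_ON,                                         # trace every request
    )
    # Jaeger's OTLP/HTTP endpoint, matching the collector address above
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://jaeger:4318/v1/traces"))
    )
    trace.set_tracer_provider(provider)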
Collecting and Storing Traces with Jaeger
Jaeger is a popular tool for collecting, storing, and visualizing trace data. It offers an all-in-one Docker image, simplifying setup and configuration.
Docker Compose Setup for Jaeger
docker-compose.yaml
    services:
      jaeger:
        image: jaegertracing/all-in-one:1.37
        environment:
          - COLLECTOR_OTLP_ENABLED=true
        ports:
          - "16686:16686"   # Jaeger UI
          - "4317:4317"     # OpenTelemetry gRPC
          - "4318:4318"     # OpenTelemetry HTTP
Port Breakdown:
• 16686: Jaeger UI for trace visualization.
• 4317: Accepts OpenTelemetry Protocol (OTLP) over gRPC.
• 4318: Accepts OpenTelemetry Protocol (OTLP) over HTTP.
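Once the stack is running, you can verify that traces are arriving without opening the UI. The sketch below queries the HTTP API that backs the Jaeger UI; note this API is internal rather than a stable public contract, so treat it as a convenience for smoke testing:

    import requests

    # Query the UI-backing API for the latest traces of a given service
    resp = requests.get(
        "http://localhost:16686/api/traces",
        params={"service": "APISIX", "limit": 5},
    )
    resp.raise_for_status()
    traces = resp.json()["data"]
    print(f"Found {len(traces)} trace(s)")
    for t in traces:
        print(t["traceID"], len(t["spans"]), "spans")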
Instrumenting the Microservices
Tracing in the Pricing Service (Python Flask)
The Pricing Service offers an endpoint to fetch product prices. We’ll instrument it using OpenTelemetry for automatic and manual tracing.
Endpoint Example:
    from random import uniform

    from flask import Flask, jsonify
    from opentelemetry import trace
    from opentelemetry.instrumentation.flask import FlaskInstrumentor

    app = Flask(__name__)
    FlaskInstrumentor().instrument_app(app)
    tracer = trace.get_tracer(__name__)

    @app.route('/price/<product_str>')
    def price(product_str: str):
        product_id = int(product_str)
        # Manual child span wrapping the (simulated) database call
        with tracer.start_as_current_span("SELECT * FROM PRICE WHERE ID=:id",
                                          attributes={":id": product_id}):
            price = {"product_id": product_id, "price": round(uniform(19.0, 21.0), 2)}
        return jsonify(price)
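To try the endpoint locally, Flask's built-in test client works well. This assumes the snippet above is saved as app.py:

    from app import app  # assumes the endpoint code above lives in app.py

    client = app.test_client()
    response = client.get("/price/42")
    print(response.get_json())  # e.g. {'price': 20.13, 'product_id': 42}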
Docker Configuration:
    pricing:
      build: ./pricing
      environment:
        OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4317
        OTEL_RESOURCE_ATTRIBUTES: service.name=pricing
        OTEL_METRICS_EXPORTER: none
        OTEL_LOGS_EXPORTER: none
Key Steps:
- Install OpenTelemetry Packages:
    pip install "opentelemetry-distro[otlp]" opentelemetry-instrumentation opentelemetry-instrumentation-flask
- Instrument the Flask Application: The FlaskInstrumentor automatically creates spans for each incoming request.
- Run the Application with OpenTelemetry:
    opentelemetry-instrument flask run
- Manual Instrumentation (Optional): Add custom spans to capture specific operations; the endpoint above already does this with tracer.start_as_current_span around the simulated database call.
Tracing in the Catalog Service (Spring Boot Kotlin)
The Catalog Service is a Reactive Spring Boot application developed in Kotlin. We’ll use the OpenTelemetry Java agent for automatic instrumentation and add manual spans where necessary.
Running the Application with OpenTelemetry Agent:
    java -javaagent:opentelemetry-javaagent.jar -jar catalog.jar
Docker Configuration:
    catalog:
      build: ./catalog
      environment:
        APP_PRICING_ENDPOINT: http://pricing:5000/price
        OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4317
        OTEL_RESOURCE_ATTRIBUTES: service.name=catalog
        OTEL_METRICS_EXPORTER: none
        OTEL_LOGS_EXPORTER: none
Manual Instrumentation with Annotations:
- Add Dependency:
    <dependency>
        <groupId>io.opentelemetry.instrumentation</groupId>
        <artifactId>opentelemetry-instrumentation-annotations</artifactId>
        <version>1.17.0-alpha</version>
    </dependency>
- Annotate Methods:
    import io.opentelemetry.instrumentation.annotations.SpanAttribute
    import io.opentelemetry.instrumentation.annotations.WithSpan
    import org.springframework.web.bind.annotation.GetMapping
    import org.springframework.web.bind.annotation.PathVariable
    import org.springframework.web.bind.annotation.RestController

    @RestController
    class ProductController(val repository: ProductRepository) {

        // Route mapping assumed for illustration; the custom span comes from @WithSpan
        @GetMapping("/products/{id}")
        @WithSpan("ProductHandler.fetch")
        suspend fun fetch(@PathVariable @SpanAttribute("id") id: Long): Result<Product> {
            val product = repository.findById(id)
            return if (product != null) Result.success(product)
            else Result.failure(Exception("Product not found"))
        }
    }
Analyzing Traces in Jaeger
With tracing implemented, Jaeger provides a comprehensive UI to visualize and analyze traces.
Key Insights from Jaeger:
• Sequence Flow: Visualize the path of requests across services.
• Performance Bottlenecks: Identify spans with high latency.
• Error Diagnosis: Locate spans where errors occur.
Best Practices for Effective Tracing
- Start with Automatic Instrumentation: It provides immediate insights with minimal effort. Gradually add manual instrumentation for critical paths.
- Implement Sampling Strategically: While tracing every request offers complete visibility, it can impact performance and storage costs. Use sampling to balance observability and efficiency (see the sketch after this list).
- Ensure Context Propagation: Properly propagate trace context across service boundaries to maintain trace integrity.
- Monitor Trace Volume: Keep an eye on the volume of trace data to manage storage and processing costs effectively.
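To illustrate the sampling trade-off from the second practice, the Python SDK ships a parent-based, ratio-based sampler: it records a fixed fraction of new root traces while always honoring the caller's sampling decision, so sampled traces stay complete across service boundaries. A minimal sketch:

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

    # Sample 10% of new root traces; child spans follow their parent's decision,
    # so a trace that is sampled stays complete across all services
    sampler = ParentBased(root=TraceIdRatioBased(0.10))
    trace.set_tracer_provider(TracerProvider(sampler=sampler))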
Credit and Original Source:
This article was originally written by Nicolas Fränkel and published on his blog (https://blog.frankel.ch). For more insights from Nicolas, including in-depth discussions on OpenTelemetry and other Java-related topics, visit the original post there.
Conclusion
Implementing end-to-end tracing with OpenTelemetry significantly enhances the observability of your distributed systems. By integrating tracing into your microservices architecture, you gain valuable insights into request flows, performance metrics, and error points, enabling you to optimize and maintain your applications effectively.
Whether you’re working with HTTP-based services or extending tracing to non-web components like Kafka, OpenTelemetry provides the tools and flexibility needed to achieve comprehensive observability.
Key Takeaways:
• OpenTelemetry is the leading standard for implementing observability in distributed systems.
• End-to-End Tracing provides deep insights into request flows across multiple services.
• Jaeger offers a robust solution for collecting, storing, and visualizing trace data.
• Effective Tracing requires a balance between comprehensive data collection and system performance.
Embark on your observability journey with OpenTelemetry and transform how you monitor and optimize your microservices architecture.