Distributed Tracing Observability in Microservices
Have you ever tried to find a bug in a multi-layered architecture? Although this might sound like a simple enough task, it can quickly become a…
Whether you are just starting your observability journey or already are an expert, our courses will help advance your knowledge and practical skills.
Expert insight, best practices and information on everything related to Observability issues, trends and solutions.
Explore our guides on a broad range of observability related topics.
Distributed tracing is the ability to follow a request through a software system from beginning to end. While that may sound trivial, a single request can easily spawn multiple child requests to different microservices with modern distributed architectures. These, in turn, trigger further sub-requests, resulting in a complex web of transactions to service a single originating request.
While each microservice can generate logs for the specific transactions they handle, those logs don’t describe the entire flow of a request. Piecing transactions together manually is a labor-intensive process.
This is where distributed tracing comes in: by propagating identifiers to each child request (or “span”), tracing allows you to join the dots between transactions and map the entire chain of events. When you’re debugging a complex issue or looking for the source of a performance bottleneck in a distributed microservice-based architecture, distributed tracing provides the insights that logs and the metrics on their own cannot.
In response to the growth in popularity of microservice architectures, several distributed tracing tools have been developed, of which Jaeger is one. Jaeger distributed tracing is an open-source distributed tracing platform that allows you to collect, aggregate, and analyze trace data from software systems.
Initially developed in 2015 by ride-share giant, Uber, Jaeger was adopted by the Cloud Native Computing Foundation (CNCF) in 2017. Two years later, the project was promoted from incubation to graduated status, reflecting its maturity as an established, widely used, and well-documented platform.
As you might expect from a CNCF project, Jaeger is designed for cloud-hosted, containerized, microservice-based systems. It consists of the following elements:
When implementing jaeger distributed tracing, there are various considerations to bear in mind.
The first step towards distributed tracing is to instrument your application code. While this involves some initial effort, it’s an investment that renders your system more observable. The result is that you can later answer questions that you didn’t know you would want to ask. To facilitate the adoption of distributed tracing and avoid vendor lock-in, the industry has centered on an open standard for tracing instrumentation: OpenTelemetry.
Jaeger added native support for OpenTelemetry in 2022, meaning that if you’ve instrumented your application code using the OpenTelemetry Protocol (OTLP) API or SDKs, you can now send traces directly to the Jaeger collector. The Jaeger client libraries have been deprecated, so for new implementations, it’s best to use OpenTelemetry for instrumentation. Using this open standard also allows you to move to other tracing solutions without having to re-instrument your application code first.
Jaeger ships with an all-in-one deployment option, with the agent, collector, and query service in a single container image. However, as this design offers no resilience in the event of the node failing, it’s only suitable for proof-of-concept and demo implementations.
You’ll need to implement multiple collectors to provide resilience and scale for production deployments. This is where it’s beneficial to use the agent for service discovery. You can then send data directly to the storage backend or stream it via Kafka.
If you’re using Kubernetes to orchestrate a containerized deployment, it’s relatively straightforward to add distributed tracing to your K8s cluster using the Jaeger operator. The Jaeger agent is deployed as a sidecar in each pod. You can specify whether to write traces directly to the database from the collector (production strategy) or stream them via Kafka (streaming strategy).
Jaeger distributed tracing can add considerable overhead to your application, as trace identifiers are propagated to each sub-request, and the data from each span is then processed and written to storage. Sampling rates reduce processing and storage costs while still collecting a representative sub-set of trace data.
With Jaeger, sampling can either be configured on the client as part of the instrumentation logic or defined centrally and propagated to clients via the agent. The advantage of remote sampling is that you can apply sampling rates consistently across the system and update them easily.
Jaeger distributed tracing supports two forms of remote sampling: file-based and adaptive. With the former, you define sampling rates for each service or operation explicitly using either probability or rate-limiting. With adaptive sampling, Jaeger adjusts the sampling rate dynamically to meet a pre-determined target tracing rate, meaning it can adjust to changes in traffic.
Jaeger is a cloud-native distributed tracing platform designed to address the challenges of building observability into microservice-based systems. It offers native Kubernetes support via the Kubernetes operator, while support for OpenTelemetry ensures the flexibility to move to other tracing solutions without having to re-instrument your application code.
Have you ever tried to find a bug in a multi-layered architecture? Although this might sound like a simple enough task, it can quickly become a…
Tracing is often the last thought in any observability strategy. While engineers prioritize logs and metrics, tracing is truly the hallmark of a mature observability platform,…
Write enough programs, and you’ll agree that it’s impossible to write an exception-free program, at least in the first go. Java debugging is a major part…