Introducing Log Observability for Microservices

Joanna Wallace
November 2, 2021

Two popular deployment architectures exist in software: the out-of-favor monolithic architecture and the newly popular microservices architecture. Monolithic architectures were quite popular in the past, with almost all companies adopting them. As time went on, the drawbacks of these systems drove companies to rework entire systems to use microservices instead.


Monolithic architectures build applications using a single unit. They may contain a database, a client-facing application, and a server-side executable. Monoliths are simple to develop and deploy when small but quickly get large and unmanageable as a product grows. When upgrades or fixes are required, teams must deploy a new version of the entire executable.

By contrast, microservice architectures use loosely coupled services that communicate through various interfaces, often using cloud computing. Each service can be upgraded or fixed individually without requiring the deployment of the entire system.

The microservice architecture builds a product as a series of small services that work together to become a product offering. Microservices are ideal for producing maintainable, scalable, and deployable code. They also enable development and quality control teams to work efficiently and independently on different features. However, with microservices comes increased complexity in monitoring system health. Monolithic architectures did not have the same observability issues as microservice architectures do. Microservice observability tools need to track data both within a single service and across different services.

As microservices have gained popularity, so have tools that provide microservice observability. Observability can help limit or prevent system failures, track security issues, monitor user-system interactions, and provide operational insights to reduce cost. This article discusses observability techniques and some of the microservices observability problems that can arise.

Logging for microservices

Logging is conceptually the same in both monolithic and microservice architectures. Logs tell your teams what is happening in your system. Microservices, however, require more consideration than what was sufficient for monolithic systems. Below are some tips on how to configure logging in microservices to be used with external tools. Tools such as Coralogix’s log analytics system can help you quickly troubleshoot your system when failures occur. Coralogix also provides examples of how to configure logging in apps.

Centralize log storage

Since microservices are ephemeral and distributed, services must send logs to a central location. When teams can view all logs together, it becomes possible to get an entire picture of what the system is doing rather than just a single service. This single location may be on a local server, a cloud service provider, or a third-party service specializing in observability tools.
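
As a rough illustration of forwarding logs to a central location, the sketch below buffers log lines in a handler and hands them to a transport in batches. The class name BatchForwardingHandler and the send callable are hypothetical; send stands in for whatever ships a batch to your central store (an HTTP client, a message queue producer, and so on).

```python
import logging
from typing import Callable, List


class BatchForwardingHandler(logging.Handler):
    """Buffer formatted log lines and forward them in batches.

    `send` is a hypothetical transport: any callable that accepts a list
    of log lines and delivers it to the central log store.
    """

    def __init__(self, send: Callable[[List[str]], None], batch_size: int = 100) -> None:
        super().__init__()
        self._send = send
        self._batch_size = batch_size
        self._buffer: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._buffer.append(self.format(record))
            if len(self._buffer) >= self._batch_size:
                self.flush()
        except Exception:
            self.handleError(record)

    def flush(self) -> None:
        # The handler lock is re-entrant, so flush() is safe to call from emit().
        self.acquire()
        try:
            if self._buffer:
                batch, self._buffer = self._buffer, []
                self._send(batch)
        finally:
            self.release()


if __name__ == "__main__":
    shipped: List[List[str]] = []  # stands in for the central store
    handler = BatchForwardingHandler(shipped.append, batch_size=2)
    log = logging.getLogger("cart-service")  # hypothetical service name
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    log.info("item added")
    log.info("item removed")
    print(shipped)
```

Batching keeps the number of network calls low, at the cost of losing buffered lines if the process dies before a flush. A short-lived service may prefer a small batch size for that reason.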

Structure log formatting

Logs tell developers and quality assurance teams what is happening in software. While making them human-readable is essential, having a consistent and structured format is also crucial.

Human-readable logs are essential for finding specific issues in a microservice. However, logs alone are less helpful for troubleshooting microservices because issues can take a significant amount of time to find. Without help from other tools, the scale and diversity of a distributed system can make finding specific events difficult. Using a consistent structure allows logs to be proactively analyzed. It makes logs more easily searchable and can even provide insights that prevent issues.
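
One common way to get a consistent structure is to emit each log entry as a single JSON object. The sketch below uses only Python's standard logging module; the field names (timestamp, level, logger, message) are an assumption for illustration, not a required schema.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes structured JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("checkout-service")  # hypothetical service name
    log.info("payment accepted for order %s", "A-1001")
```

Each line stays readable on its own, while an analysis tool can parse every field without per-service regular expressions.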

Label logs

Microservice logs come from a large number of sources. Aggregation and log analysis tools need to have a method for discriminating between logs for better analysis. Add labels to these logs for fast and efficient searching. These labels will allow tools to differentiate between logs and only include those necessary in an analysis. Developers can add labels into the log text itself, but keep in mind that you need a consistent log structure for searching.
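
A minimal sketch of labeling, assuming JSON-formatted log lines: a logging filter stamps every record with a fixed set of labels, and a small helper selects lines by label. The names LabelFilter, LabeledJsonFormatter, and matches, as well as the label keys, are hypothetical.

```python
import json
import logging
from typing import Mapping


class LabelFilter(logging.Filter):
    """Attach a fixed set of labels to every record passing through."""

    def __init__(self, labels: Mapping[str, str]) -> None:
        super().__init__()
        self.labels = dict(labels)

    def filter(self, record: logging.LogRecord) -> bool:
        record.labels = self.labels
        return True  # never drop the record, only annotate it


class LabeledJsonFormatter(logging.Formatter):
    """Emit the message together with its labels as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "message": record.getMessage(),
                "labels": getattr(record, "labels", {}),
            },
            sort_keys=True,
        )


def matches(line: str, **wanted: str) -> bool:
    """Return True when a JSON log line carries all the wanted labels."""
    labels = json.loads(line).get("labels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(LabelFilter({"service": "cart", "env": "staging"}))
    handler.setFormatter(LabeledJsonFormatter())
    log = logging.getLogger("cart")
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    log.info("cart updated")
```

Keeping labels in a dedicated field, rather than inside the message text, means a search by label does not depend on how each developer worded the message.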


Measuring microservice metrics

Metrics track numerical values associated with specific data measurements over time. Because they use numerical values, the data set tends to be smaller than logs. Their size allows them to be stored for longer and to scale more efficiently. As with logs, it is crucial to keep a consistent metadata format and associate descriptive labels with metrics to ensure fast and effective analysis.
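
To make the idea of a labeled metric concrete, here is a small in-process counter sketch. It is not the API of any particular metrics library; the class name LabeledCounter and the metric and label names are made up for illustration.

```python
import threading
from collections import defaultdict
from typing import Dict, Mapping, Tuple


class LabeledCounter:
    """A counter that only increases, keyed by a fixed set of label names."""

    def __init__(self, name: str, label_names: Tuple[str, ...]) -> None:
        self.name = name
        self.label_names = tuple(label_names)
        self._values: Dict[Tuple[str, ...], float] = defaultdict(float)
        self._lock = threading.Lock()

    def _key(self, labels: Mapping[str, str]) -> Tuple[str, ...]:
        # Enforce a consistent label set so every sample can be queried the same way.
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}, got {tuple(labels)}")
        return tuple(labels[name] for name in self.label_names)

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        key = self._key(labels)
        with self._lock:
            self._values[key] += amount

    def value(self, **labels: str) -> float:
        key = self._key(labels)
        with self._lock:
            return self._values.get(key, 0.0)


if __name__ == "__main__":
    requests = LabeledCounter("http_requests_total", ("service", "status"))
    requests.inc(service="cart", status="200")
    requests.inc(service="cart", status="500")
    print(requests.value(service="cart", status="200"))
```

Only one number is stored per label combination, which is why metrics usually take far less space than the logs describing the same events.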

System metrics

System metrics are those values associated with your deployment’s infrastructure. Measurements could include performance, availability, reliability, duration, and memory use.

Tracking these metrics can help your team understand where infrastructure designs may need to be changed or where additional infrastructure is needed to scale the system.

Network metrics

Network metrics are values related to the quality of service your system gives the end-user. Measurements could include bandwidth, throughput, latency, and connectivity. Tracking these metrics will show how the end-user is experiencing your app in terms of speed and quality of experience. Detecting and fixing poor experiences due to network issues can significantly improve the conversion rate of app users.

Business intelligence metrics

Business intelligence (BI) metrics are values related to sales and marketing successes and failures. Values tracked could include app traffic, bounce rates, open rates, and conversion rates. Teams often track these metrics in separate tools from technical metrics. However, keeping BI metrics with these technical measurements can help teams understand how technical issues affect customer interactions so you can best focus energies on fixing issues.

Metric cardinality

Metrics tend to require less storage space than logs and are generally faster to query as well. The query speed and efficiency greatly depend on the cardinality of the data. A high-cardinality metric has a label that takes many distinct values. For example, if you use a timestamp as a label, very few data points will share the same label value. Since metric tools use labels for searching, high-cardinality data may slow down your search. When designing your metric system, choose a tool that scales regardless of the labels chosen, or be sure to choose labels wisely and keep cardinality low.
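
The effect of label choice on cardinality can be checked with a few lines of code. The sketch below counts distinct label combinations per metric; the function name count_series and the sample metric and label names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple

Sample = Tuple[str, Mapping[str, str]]  # (metric name, labels)


def count_series(samples: Iterable[Sample]) -> Dict[str, int]:
    """Count the distinct label combinations recorded for each metric name."""
    series: Dict[str, Set[Tuple[Tuple[str, str], ...]]] = defaultdict(set)
    for name, labels in samples:
        series[name].add(tuple(sorted(labels.items())))
    return {name: len(combos) for name, combos in series.items()}


if __name__ == "__main__":
    # Low cardinality: 2 services x 2 statuses = 4 combinations, however many requests arrive.
    low = [
        ("http_requests", {"service": service, "status": status})
        for _ in range(100)
        for service in ("cart", "checkout")
        for status in ("200", "500")
    ]
    # High cardinality: a timestamp label makes every sample its own combination.
    high = [
        ("http_requests", {"service": "cart", "timestamp": str(i)})
        for i in range(100)
    ]
    print(count_series(low))   # {'http_requests': 4}
    print(count_series(high))  # {'http_requests': 100}
```

With bounded labels the count stays flat as traffic grows. With a timestamp label it grows with every new sample.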

Correlating metrics

Developers should correlate the metrics discussed above in some centralized location. Doing this allows users to have a holistic view of what is going on in the system. If your system spans different metric collectors, you can use a third-party system that integrates metrics from multiple sources. Understanding how much capacity a system has or how many users tend to use services at a given time is invaluable to scaling, troubleshooting, and building your system.


Tracing requests

Logs and metrics show what has happened to data in a single microservice. On the other hand, traces allow developers to understand the lifecycle of any given request in a system. Traces are pieces of metadata that flow with a request through the entire distributed system. They show you where data moves from one service to another, where bottlenecks are in the system, and where errors have occurred.

Traces can point to microservices or infrastructure that are causing errors in your system. Once the erroring service is known, troubleshooters can use logs and metrics to zero in on the problem and fix it. Without traces to point to the error in the first place, troubleshooting can take significantly longer. Unlike monolithic deployments, microservice issues often cannot be found by isolating and recreating issues. Traces provide an alternative tracking method for teams.
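
As a sketch of how trace metadata can travel with a request, the code below builds and propagates a header in the layout of the W3C Trace Context traceparent field (version, trace ID, parent span ID, flags). Choosing that format is an assumption made here for illustration, and the function names are hypothetical. A tracing library would normally handle this for you.

```python
import re
import secrets
from typing import Tuple

# traceparent layout: version "00", 32 hex trace ID, 16 hex span ID, 2 hex flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace at the edge of the system (flags 01 = sampled)."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def parse_traceparent(header: str) -> Tuple[str, str, str]:
    """Split a traceparent header into (trace_id, span_id, flags)."""
    match = _TRACEPARENT.match(header)
    if match is None:
        raise ValueError(f"malformed traceparent header: {header!r}")
    return match.group(1), match.group(2), match.group(3)


def child_traceparent(header: str) -> str:
    """Build the header for a downstream call: same trace ID, new span ID."""
    trace_id, _parent_span, flags = parse_traceparent(header)
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    incoming = new_traceparent()            # created by the first service
    outgoing = child_traceparent(incoming)  # sent to the next service
    print(incoming)
    print(outgoing)
```

Because every hop keeps the same trace ID, the spans recorded by different services can later be stitched into one request timeline.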

Service mesh for distributed tracing

Service meshes work as an intermediary for data between microservices. They operate outside of the microservice itself as a separate infrastructure layer. Meshes themselves can include logs or metrics to help users understand how microservices are interacting. Since all data flows through them to get to a service, they can inject traces as metadata onto microservice calls.

Service meshes provide a clean way to implement traces in microservices deployed in containers on platforms like Kubernetes. Service meshes may already be in place for load balancing or security. Using these also means developers do not need to alter container code to implement tracing.
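
The injection step can be pictured as a wrapper that sits in front of a service and stamps a request ID onto any call that lacks one. This is only a toy model of what a mesh proxy does, not the configuration or API of any real service mesh. The header name x-request-id and the function names are assumptions.

```python
import uuid
from typing import Callable, Dict

Headers = Dict[str, str]
Handler = Callable[[Headers], Headers]

REQUEST_ID_HEADER = "x-request-id"  # assumed header name for this sketch


def with_request_id(handler: Handler) -> Handler:
    """Wrap a service handler the way a sidecar proxy might.

    The wrapped service code is unchanged; the proxy adds the ID if the
    caller did not send one and preserves it otherwise.
    """

    def proxy(headers: Headers) -> Headers:
        forwarded = {key.lower(): value for key, value in headers.items()}
        forwarded.setdefault(REQUEST_ID_HEADER, str(uuid.uuid4()))
        return handler(forwarded)

    return proxy


def inventory_service(headers: Headers) -> Headers:
    """Hypothetical service that simply echoes the headers it received."""
    return headers


if __name__ == "__main__":
    proxied = with_request_id(inventory_service)
    print(proxied({}))                             # ID generated by the proxy
    print(proxied({"X-Request-Id": "abc-123"}))    # existing ID preserved
```

The service function never mentions the header, which mirrors the point above: the mesh layer adds the metadata without changes to the code running in the container.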

Wrapping up

Microservice architecture gives development teams more flexibility than ever before when implementing various tools in a single product offering. Developers can work on a unique feature within the product using different development tools and even languages. Features can be deployed as they are made ready without disrupting other parts of the product. Deployments no longer require product downtime and can be done one microservice at a time.

Products must attain observability differently for microservices than they would for monoliths. While monoliths required developers to zero in on issues using logs and test inputs, microservices require more involved tools. The benefits of microservices still outweigh the new issues they have brought forth.

Designs must include logging, metrics, and traces in their distributed system to obtain microservice observability. Together, these give developers the tools needed to find and fix issues efficiently and effectively. Using a centralized tool to analyze each of these data sets can help your team find errors before clients see them. Coralogix offers a platform that uses machine learning to predict where errors will occur.