If you build them right and don't just use them as a buzzword, I believe we can all agree that microservices are great and that they make your code and production environment simpler to handle.
Here are some of the clear benefits of writing a microservice architecture:
Despite all of the nice things written above, microservices have a very dark side to them which, if not checked and managed, can wreak havoc on your production system, not only crashing it but also exhausting your cloud resources. Here is an example of such a scenario:
Let's assume that one of your microservices is responsible for pulling messages from your Kafka queue and, in turn, processing each message and inserting it into your Elasticsearch cluster. Your service runs on Docker, is fully scaled, can work in parallel with many other instances of the same kind, and is managed by an auto-discovery service.
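To make the scenario concrete, here is a minimal sketch of such a consumer, assuming the kafka-python library and the Elasticsearch 8.x Python client; the topic, index, and host names are hypothetical placeholders, not anything prescribed by the story above.

```python
import json

from kafka import KafkaConsumer          # assumes the kafka-python package
from elasticsearch import Elasticsearch  # assumes the elasticsearch 8.x client

# Hypothetical topic, consumer group, and endpoints.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="kafka:9092",
    group_id="orders-indexer",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)
es = Elasticsearch("http://elasticsearch:9200")

for message in consumer:
    # Each pulled message is processed and written to Elasticsearch.
    es.index(index="orders-index", document=message.value)
```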
Now let's assume that you have a bug in your service that only manifests when a certain type of data appears in the pulled messages. That bug causes the service to work slower and slower as time goes by (not because of a memory leak, but because the unexpected data requires heavier calculations), so your service pulls fewer messages each second. And if that's not enough, your auto-discovery service cannot communicate with this instance because the CPU is so highly utilized that the handshake with the machine takes too long and fails after a 5-second timeout.
What will happen next is not pleasant: your monitoring system will probably detect the backlog in your Kafka queue and alert your autoscaling system, which will start spinning up more and more microservice instances on machines whose IPs nobody knows. This will keep going until you run out of allocated machines or (if you're smart) hit the pre-defined limit you've set in your autoscaling service. By the time you figure out what's wrong and run to shut down the zombie machines from the cloud console, it will probably be too late.
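That pre-defined limit is worth setting explicitly. As a hedged sketch, assuming the fleet happens to run in an AWS Auto Scaling group managed via boto3 (the group name and sizes below are hypothetical), capping MaxSize is what keeps a runaway backlog from exhausting the account:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hard-cap the group so a misbehaving service cannot scale without bound.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="orders-indexer-asg",  # hypothetical group name
    MinSize=2,
    MaxSize=20,  # the pre-defined limit mentioned above
)
```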
If you've seen The Walking Dead, this case is not far from it: you have dozens of zombie machines running hundreds of microservices. The only pieces of information you have left are the log records you inserted into your service, and if you did not insert solid logs, you are probably in a world of pain.
The moral of the story is that it is vital to log your microservices well and to make sure that your logs provide you with all the data you need to control them. Here are a few rules of thumb.
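As one illustration of logs that carry enough context to trace a misbehaving instance back to its machine and its data, here is a minimal sketch using Python's standard logging module; the field names are illustrative assumptions, not a prescribed schema.

```python
import logging
import socket
import time

logger = logging.getLogger("orders-indexer")
logging.basicConfig(level=logging.INFO)

def process(message):
    start = time.monotonic()
    # ... actual processing and indexing would happen here ...
    elapsed = time.monotonic() - start
    # Attach host, partition, offset, and timing so a slow or zombie
    # instance can be traced back to a machine and the data it pulled.
    # A JSON formatter on the handler would ship these extra fields
    # to your log pipeline; the default formatter does not print them.
    logger.info(
        "processed message",
        extra={
            "host": socket.gethostname(),
            "partition": message.partition,
            "offset": message.offset,
            "processing_seconds": round(elapsed, 3),
        },
    )
```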
To summarize, microservices are an excellent piece of architecture, and if you build it right, this design will give your code flexibility, durability, and modularity. BUT you need to take into account that a microservice can go rogue, and plan for that day.