Instantly Diagnose a Database Outage with Flow Alerts

Chris Cooney
September 6, 2022

Databases are stateful, commonly monolithic, and absolutely fundamental to system design, so the quality of your database administration and operation is a key determinant of your overall success. They are the cornerstone of modern architecture, and getting the most out of them requires constant effort, investigation, and iteration. This makes it all the more terrifying when an outage occurs.

When a database crashes, it crashes hard

When a database outage occurs, it typically comes with a few serious risks:

The outage causes some of your features to fail because they can no longer read from or write to their primary data store
The missing database brings down other applications that depend on it, cascading the failure through your architecture
The storage is corrupted, or otherwise lost, leading to either partial or, in some cases, total data loss

These additional complications can obscure the initial database outage and make it difficult to understand the chain of events that led up to the present situation. Databases require constant changes to keep them performing well, and herein lies our challenge.

How do we constantly change something that carries the highest risk of breaking, and often leads to the most complex outages?

Where alerting hasn’t yet succeeded

The big problem with database changes is that the impact on the database itself often isn’t the first indicator that something is wrong. For example, if we migrate the schema of a SQL database, the database may report that everything is fine while the downstream applications report an issue.

Likewise, if a migration fails, an engineer may perform a rollback and assume all is well, and only later on will a downstream application fail. This means that the relationship between cause and effect isn’t clear, and relying only on siloed alerting isn’t enough.

What are siloed alerts?

A siloed alert is an alert that focuses on one metric from a single part of a system, for example, the CPU usage of a database. These alerts are a necessary part of your observability, but they aren’t the whole story. They are building bricks, and like real bricks, they need to be assembled and joined together to become something more.
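To make the limitation concrete, here is a minimal Python sketch of a siloed alert. The SiloedAlert class, its fields, and the 90% CPU threshold are hypothetical illustrations, not a Coralogix API. The alert watches exactly one metric on one component, so a spike in errors on a downstream service never reaches it.

```python
from dataclasses import dataclass


@dataclass
class SiloedAlert:
    """Hypothetical single-metric, single-component threshold alert."""
    component: str    # e.g. "orders-db" (illustrative name)
    metric: str       # e.g. "cpu_percent" (illustrative name)
    threshold: float

    def evaluate(self, component: str, metric: str, value: float) -> bool:
        # Fires only for its own metric on its own component; everything else is ignored
        return (
            component == self.component
            and metric == self.metric
            and value > self.threshold
        )


cpu_alert = SiloedAlert(component="orders-db", metric="cpu_percent", threshold=90.0)
print(cpu_alert.evaluate("orders-db", "cpu_percent", 95.0))   # True
print(cpu_alert.evaluate("checkout-api", "error_rate", 0.4))  # False: outside its silo
```

The second call shows the problem: the checkout service’s errors may well be caused by the database, but this alert has no way to see or connect them.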

So what can be done about complex database outages?

We need a solution that ties the siloed events into a single, continuous alert that can span multiple data types, such as logs, metrics, traces, or security data, and multiple components, such as databases and applications.

This is where Flow Alerts come in. With Flow Alerts, you can link your individual alerts into a single, coherent story that captures the full context of an event through your system, breaking the data silo in your observability platform.

How can Flow Alerts help with database alerting?

Flow Alerts allow you to link disparate events across many applications and multiple data sources. For example, a flow alert can detect error logs from a database migration and then look for any one of the following to happen within 1 hour:

An increase in the average error rate
100 unique cart failures on purchase
More than 400 “could not retrieve” errors

This means that when the alert triggers, you won’t receive a series of disparate alarms suggesting that several applications with nothing directly to do with one another are broken. Instead, there will be a clear message: the database is broken, and here are the downstream impacts.
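The flow described above can be sketched in Python as a two-stage check. This is a minimal illustration under stated assumptions, not how Coralogix implements Flow Alerts: the Event type, the "db-migration" source name, the event kinds, and the fixed error-rate baseline are all hypothetical. Stage one finds a migration error log; stage two checks whether any of the three downstream conditions is met within the following hour, and a match produces one alert that carries both the cause and its impacts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Event:
    """Hypothetical normalized event built from logs or metrics."""
    timestamp: datetime
    source: str          # emitting component, e.g. "db-migration" (illustrative)
    kind: str            # e.g. "error_log", "error_rate", "cart_failure"
    key: str = ""        # identifier for unique counts, e.g. a cart ID
    value: float = 0.0   # numeric payload for metric samples


Condition = Callable[[List[Event]], bool]


def error_rate_above(baseline: float) -> Condition:
    """Fires when the mean error-rate sample exceeds an assumed fixed baseline."""
    def check(events: List[Event]) -> bool:
        samples = [e.value for e in events if e.kind == "error_rate"]
        return bool(samples) and sum(samples) / len(samples) > baseline
    return check


def unique_cart_failures(events: List[Event]) -> bool:
    """Fires on 100 or more distinct carts failing at purchase."""
    return len({e.key for e in events if e.kind == "cart_failure"}) >= 100


def could_not_retrieve_errors(events: List[Event]) -> bool:
    """Fires on more than 400 "could not retrieve" errors."""
    return sum(1 for e in events if e.kind == "could_not_retrieve") > 400


def evaluate_flow(
    events: List[Event],
    conditions: Dict[str, Condition],
    window: timedelta = timedelta(hours=1),
) -> Optional[Tuple[Event, List[str]]]:
    # Stage 1: migration error logs, oldest first
    triggers = sorted(
        (e for e in events if e.source == "db-migration" and e.kind == "error_log"),
        key=lambda e: e.timestamp,
    )
    for trigger in triggers:
        # Stage 2: only events after the trigger and within the window count
        follow_ups = [
            e for e in events
            if trigger.timestamp < e.timestamp <= trigger.timestamp + window
        ]
        fired = [name for name, cond in conditions.items() if cond(follow_ups)]
        if fired:
            return trigger, fired  # one alert: the cause plus its downstream impacts
    return None


start = datetime(2022, 9, 6, 12, 0)
events = [Event(start, "db-migration", "error_log")]
events += [
    Event(start + timedelta(minutes=10), "checkout-api", "cart_failure", key=f"cart-{i}")
    for i in range(120)
]
conditions = {
    "error_rate_increase": error_rate_above(baseline=0.05),
    "unique_cart_failures": unique_cart_failures,
    "could_not_retrieve": could_not_retrieve_errors,
}
result = evaluate_flow(events, conditions)
if result:
    trigger, fired = result
    print(trigger.source, fired)  # db-migration ['unique_cart_failures']
```

Because the downstream conditions are evaluated only relative to a trigger, the 120 cart failures surface as an impact of the migration error rather than as an unrelated alarm from the checkout service.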

Break the data silo with Flow Alerts

Flow Alerts are a unique feature of the Coralogix platform that allows you to reach across all of your observability data and define alerts around any metric you want. This allows you to accurately describe the conditions of an outage, whether it’s a single event or a complex chain of logically dependent events.
