A Log’s Life Cycle in Coralogix

Chris Cooney
May 18, 2023

Illustrative log life cycle at Coralogix

Coralogix is a full-stack observability platform that processes logs, metrics, traces, and security data. Logs in particular are processed at larger volumes than at almost any other observability provider, which makes a log’s life cycle in Coralogix unique.

This article examines each stage of that life cycle to help you better understand one of the most sophisticated telemetry processing architectures on the market.

1. Open-source ingestion

The first stage begins once an application writes to a log file, which a log collector picks up. Coralogix offers a completely open-source integration via OpenTelemetry, as well as a large collection of well-known open-source tools. This means that your system won’t be polluted with proprietary code, and you avoid the risk of vendor lock-in when integrating your platform.

2. Parsing and extraction

Logs are captured and converted into JSON documents using regex. Alternatively, a rule can extract only a subset of the log instead of converting the entire log. There are various options for optimized log ingestion, so none of your raw log data is left behind. Read more about log parsing rules.
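As a rough illustration of the two options above, the following Python sketch converts a whole log line into a JSON document with a regex, and separately extracts a single field from it. The log format, the field names, and the functions parse_log and extract_field are assumptions made for this example; they are not Coralogix parsing rules.

```python
import json
import re
from typing import Optional

# Hypothetical log format for this sketch: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)"
)


def parse_log(line: str) -> dict:
    """Convert a whole raw log line into a JSON-ready dict."""
    text = line.strip()
    match = LOG_PATTERN.fullmatch(text)
    if match is None:
        # Keep unparsed lines as plain text so no raw data is dropped.
        return {"text": text}
    return match.groupdict()


def extract_field(line: str, pattern: str) -> Optional[str]:
    """Extract only a subset of the log: the first capture group of pattern."""
    match = re.search(pattern, line)
    return match.group(1) if match else None


raw = "2023-05-18T10:15:00Z ERROR payment failed for order 1234"
parsed = parse_log(raw)
order_id = extract_field(raw, r"order (\d+)")

print(json.dumps(parsed))
print(order_id)
```

Lines that do not match the pattern are kept as a plain text field rather than discarded, which mirrors the point that no raw log data should be left behind.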

3. Enrichment

Often, the data in our logs isn’t useless, but rather incomplete. An IP address on its own has some excellent use cases, but when coupled with Geo enrichment, the logs take on a whole new meaning.

Pair Geo enrichment with Security enrichment, which instantly tells you if an IP address is suspicious, and with our AWS enrichment, which populates your log with metadata about a chosen AWS resource. As a result, your logs are transformed into self-contained documents that have everything they need to tell an engineer the state of their system at the moment the log was written. Learn more about data enrichment with Coralogix.
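To make the idea concrete, here is a minimal Python sketch of Geo and Security enrichment applied to a parsed log. The lookup tables, the field names (client_ip, client_ip_geo, client_ip_suspicious), and the enrich function are all hypothetical, and the IP addresses come from ranges reserved for documentation. A real enrichment service would consult maintained geolocation and threat-intelligence data.

```python
import ipaddress

# Hypothetical lookup data, using IP ranges reserved for documentation.
GEO_BY_NETWORK = {
    ipaddress.ip_network("203.0.113.0/24"): {"country": "NL", "city": "Amsterdam"},
}
SUSPICIOUS_IPS = {"198.51.100.7"}


def enrich(log: dict) -> dict:
    """Return a copy of the log with geo and security fields added."""
    enriched = dict(log)
    ip_text = log.get("client_ip")
    if ip_text is None:
        return enriched  # nothing to enrich

    ip = ipaddress.ip_address(ip_text)
    for network, geo in GEO_BY_NETWORK.items():
        if ip in network:
            enriched["client_ip_geo"] = geo
            break
    enriched["client_ip_suspicious"] = ip_text in SUSPICIOUS_IPS
    return enriched


enriched_log = enrich({"message": "login ok", "client_ip": "203.0.113.9"})
flagged_log = enrich({"message": "login failed", "client_ip": "198.51.100.7"})
print(enriched_log)
print(flagged_log)
```

The enriched document carries its own context, so an engineer reading it later does not need to look the IP address up again.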

4. The TCO Optimizer

The TCO Optimizer is a unique Coralogix feature that allows customers to select the use case for their logs. Users assign each log to one of the following priorities:

High Priority: For data that they will need to access constantly.
Medium Priority: For data that will drive their dashboards and metrics.
Low Priority: For data they need to retain but will only query sometimes.

By taking an entirely different approach to how logs are classified and stored, the TCO Optimizer alone regularly saves customers between 40% and 70% of their observability costs.
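One way to picture the effect of a priority is as a routing table: the priority assigned to a log decides which of the later stages it passes through. The Python sketch below encodes the priority labels shown under each stage heading in this article. The names Priority, STAGE_PRIORITIES, and stages_for are illustrative only and do not describe how Coralogix implements routing.

```python
from enum import Enum


class Priority(Enum):
    HIGH = "High Priority"
    MEDIUM = "Medium Priority"
    LOW = "Low Priority"


# Stage name -> priorities that pass through it, copied from the labels
# under each stage heading in this article.
STAGE_PRIORITIES = {
    "Events2Metrics": {Priority.HIGH, Priority.MEDIUM},
    "In-Stream alerting": {Priority.HIGH, Priority.MEDIUM},
    "Loggregation": {Priority.HIGH, Priority.MEDIUM},
    "Archiving and long-term storage": {Priority.HIGH, Priority.MEDIUM, Priority.LOW},
    "Indexing and storage": {Priority.HIGH},
}


def stages_for(priority: Priority) -> list:
    """List the stages a log with this priority passes through, in order."""
    return [stage for stage, allowed in STAGE_PRIORITIES.items() if priority in allowed]


for priority in Priority:
    print(priority.value, "->", stages_for(priority))
```

Read this way, the saving comes from the stages a log skips: only High Priority data reaches indexing, and Low Priority data goes to the archive alone.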

5. Events2Metrics

High Priority and Medium Priority

Events2Metrics allows Coralogix customers to ingest logs and automatically parse metric values out of them. For example, if a given log contains the page load time, that time can be extracted and converted into a Prometheus metric.

The metric can then be queried and processed like any other, so the essential data in a log can be held very cheaply and for much longer, while the original log is removed. Learn about how logs are converted into metrics here.
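The page load time example can be sketched in a few lines of Python. The function below takes a parsed log and emits one sample in the Prometheus text exposition format. The field names (page_load_time_ms, page), the metric name, and the function log_to_metric_line are assumptions for this example, and label-value escaping is omitted for brevity. This is not the Events2Metrics implementation.

```python
from typing import Optional


def log_to_metric_line(log: dict) -> Optional[str]:
    """Turn a log carrying a page load time into one Prometheus sample line."""
    load_ms = log.get("page_load_time_ms")
    if load_ms is None:
        return None  # this log carries no metric value

    seconds = float(load_ms) / 1000.0  # Prometheus convention: base units
    page = log.get("page", "unknown")
    return f'page_load_time_seconds{{page="{page}"}} {seconds}'


sample = log_to_metric_line({"page": "/checkout", "page_load_time_ms": 1250})
print(sample)
```

Once the value lives in a metric, the original log line is no longer needed to answer questions about load time.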

6. In-Stream alerting

High Priority and Medium Priority

Logs are evaluated against the alerts that have been defined in the customer account. This happens before the optional indexing and storage stage, which means Coralogix alerts trigger much faster than those of index-based competitors.
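The difference from an index-based approach can be sketched as follows: rules are checked while each log flows through the pipeline, and only afterwards is the log handed on to whatever comes next. This is a simplified Python illustration. AlertRule, evaluate_in_stream, and the simple count threshold are hypothetical and much simpler than real alert definitions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class AlertRule:
    name: str
    matches: Callable[[dict], bool]
    threshold: int  # fire once this many matching logs have been seen


def evaluate_in_stream(
    logs: Iterable[dict], rules: List[AlertRule], fired: List[str]
) -> Iterator[dict]:
    """Check every log against the rules, then pass it downstream unchanged."""
    counts = {rule.name: 0 for rule in rules}
    for log in logs:
        for rule in rules:
            if rule.matches(log):
                counts[rule.name] += 1
                if counts[rule.name] == rule.threshold:
                    fired.append(rule.name)  # alert fires before any indexing
        yield log  # later stages (archive, index) receive the log afterwards


rules = [AlertRule("too_many_errors", lambda log: log.get("level") == "ERROR", 2)]
stream = [{"level": "ERROR"}, {"level": "INFO"}, {"level": "ERROR"}, {"level": "ERROR"}]

fired_alerts: List[str] = []
passed_on = list(evaluate_in_stream(stream, rules, fired_alerts))
print(fired_alerts)
```

Because the check happens inside the stream, the alert does not have to wait for the log to be written to an index and then queried.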

7. Loggregation – Machine Learning Log Clustering

High Priority and Medium Priority

Coralogix supports the most powerful log clustering algorithm on the market. Millions of logs are converted into a handful of templates, which enables engineers to rapidly detect the noisiest errors, understand the normal flow of certain logs, and explore the different values for every key in a log document. Read more about how logs are queried.
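The general idea of collapsing many logs into a few templates can be shown with a deliberately naive Python sketch: numeric tokens are replaced with a placeholder, and logs that share the resulting template are counted together. This is not the Loggregation algorithm, which the article describes as machine learning based. The names to_template and cluster, and the regex used, are assumptions for illustration.

```python
import re
from collections import Counter

# Naive assumption: numbers and dotted numbers (such as IPs) are the variable parts.
VARIABLE_TOKEN = re.compile(r"\b\d+(?:\.\d+)*\b")


def to_template(message: str) -> str:
    """Replace variable-looking tokens with a placeholder."""
    return VARIABLE_TOKEN.sub("<*>", message)


def cluster(messages: list) -> Counter:
    """Count how many messages fall under each template."""
    return Counter(to_template(message) for message in messages)


messages = [
    "User 42 logged in from 10.0.0.1",
    "User 7 logged in from 10.0.0.2",
    "Disk usage at 91 percent",
]
templates = cluster(messages)
for template, count in templates.most_common():
    print(count, template)
```

Sorting templates by count is what surfaces the noisiest messages first.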

8. Archiving and long-term storage

High Priority, Medium Priority and Low Priority

Any log, whether it’s classified as High Priority, Medium Priority, or Low Priority, can eventually be archived. The beauty of the Low Priority use case is that if you have logs that you only need to retain and will rarely query, there is no need to pay for indexing.

In addition, Coralogix is the only provider that will allow you to archive enriched logs (remember enrichment above!) without ever indexing them or storing them in high-performance storage.

Archived telemetry data is stored in the customer’s own cloud account, where they pay absolutely no fee to Coralogix. The only cost is the cost of the customer’s own cloud storage, which is a fraction of what any SaaS observability provider will charge for the same data.

Query from remote, the secret sauce of the Coralogix archive

Coralogix distinguishes itself in archiving from every other SaaS provider by offering a unique capability. With any other SaaS observability provider, users who wish to explore their archived data must first reindex those logs from the archive back into high-performance storage.

This represents a triple cost: the cost of initial ingestion, subsequent storage, and, finally, reingestion. The extra step of log reindexing undermines much of the cost optimization of an archive because rather than avoiding cost entirely, users are merely delaying it.

How does Remote Query differ?

Coralogix Remote Query requires no reindexing. Users can issue queries using Lucene, SQL, or our own DataPrime query language. Moreover, Coralogix does not charge per query. The only costs a Coralogix customer ever pays are for ingestion and storage, which is quite unique. While Coralogix supports reindexing, it is the only platform where it is not necessary in order to query archived data.

No other provider can offer this capability, so Coralogix customers can hold less in High Priority, driving further cost reductions. Read about our unique and revolutionary remote archive.

9. Indexing and storage

High Priority

If data is part of the High Priority use case, then it is indexed and stored in a highly optimized OpenSearch cluster that responds in under a second to queries spanning millions of log documents. This happens at the end of the log analysis process, but logs are still indexed in seconds, meaning that they are available with very little ingestion lag.

Note that all previous processing happens completely independently of this stage, which means Coralogix is the only platform that provides in-stream analytics without depending on indexing and storage.

Retention policy and deletion

This is the end of the line for our log. In the Coralogix platform, data can be tagged with values from a Retention Policy. This means that all of the data in the archive will be automatically labeled with a value indicating for how long it should be retained.

Customers can then set deletion policies on their S3 bucket, which decide at which point their data should be deleted.
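As an illustration of such a deletion policy, the Python sketch below builds an S3 lifecycle configuration with one expiration rule per retention value, in the structure S3 expects for a bucket lifecycle configuration (Rules, Filter, Tag, Expiration). It assumes the retention value is available as an S3 object tag. The tag key "retention", the values "short" and "long", and the day counts are hypothetical and not taken from Coralogix documentation.

```python
import json

# Hypothetical mapping from retention tag value to days before deletion.
RETENTION_DAYS = {"short": 30, "long": 365}


def lifecycle_configuration(tag_key: str, retention_days: dict) -> dict:
    """Build one S3 expiration rule per retention tag value."""
    return {
        "Rules": [
            {
                "ID": f"expire-{value}",
                "Filter": {"Tag": {"Key": tag_key, "Value": value}},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
            for value, days in retention_days.items()
        ]
    }


config = lifecycle_configuration("retention", RETENTION_DAYS)
print(json.dumps(config, indent=2))
```

A dictionary of this form could be applied to the bucket through the AWS console, CLI, or an SDK. The sketch only prints it so that it runs without AWS credentials.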

Coralogix life cycle policies represent the last link in a long chain of cutting-edge technologies that ingest raw, unstructured data, convert it into actionable insights, and store it in the most efficient and cost-effective solution on the market.

Unparalleled log management, and that’s just the start

Coralogix doesn’t just stop at logs. As a full-stack observability platform, Coralogix can easily ingest logs, traces, metrics, and security data, with hundreds of integrations to open source and SaaS tools. Here, we have taken a deep dive into the logging capabilities in the Coralogix platform, but much more awaits you. Sign up for a free trial now.

