Generate Span Metrics
Note: Span Metrics is currently in beta for early adopters.
For customers transitioning from Events2Metrics (E2M) to Span Metrics, the default is to maintain both pipelines. This allows E2M-based metrics to be generated alongside Span Metrics, enabling a fallback if necessary. Costs will be incurred for both pipelines. You can switch to a single pipeline at any stage.
What is Span Metrics?
Span Metrics offers an automated method of transforming and aggregating trace data into metrics outside Coralogix using the OpenTelemetry Span Metrics Connector. By sending the metrics to Coralogix, you can utilize our cutting-edge APM features, save on costs, and gain comprehensive insights into your data.
Benefits
Use Span Metrics for any of the following:
Cost savings. Sending Span Metrics reduces the volume and frequency of the data sent, helping you cut costs dramatically when compared to sending us 100% of your spans. Compare and contrast our data pipelines here.
No additional setup. Span Metrics is particularly valuable when your system lacks traditional metrics but implements distributed tracing. It allows you to obtain metrics from your tracing pipeline without additional setup.
Comprehensive insights. Even if your system is already equipped with metrics, leveraging Span Metrics can offer a deeper level of monitoring. The metrics generated provide insights at the application level, showcasing how tracing information propagates throughout your applications.
Seamless migration. Easily migrate from the Events2Metrics to the Span Metrics data pipeline, while retaining E2M data during a defined retention period.
Setup overview
This section demonstrates how to generate metrics from spans, based on your OpenTelemetry installation.
For each setup, you will be required to:
Enable Span Metrics
Validate your metrics
Configure the collector buckets, while keeping the two default buckets used for the APM Apdex score (1ms and 4ms)
You may also:
Enable tail sampling
Configure the Database Catalog
Configure different buckets for different services using the spanMetricsMulti preset (Kubernetes extension users only)
Create Span Metrics with the Kubernetes extension
Enable Span Metrics
STEP 1. If you have not yet done so, deploy the Coralogix Kubernetes extension package. Navigate to Data Flow > Extensions > Kubernetes from your Coralogix toolbar.
STEP 2. Manually upgrade the Helm chart used with your Kubernetes integration to its latest version to enable the creation of Span Metrics. Span Metrics is disabled by default and can be enabled by setting the spanMetrics.enabled value to true in the values.yaml file.
spanMetrics:
  enabled: true
  collectionInterval: "{{.Values.global.collectionInterval}}"
  metricsExpiration: 5m
  histogramBuckets:
    [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
  extraDimensions:
    - name: http.method
    - name: cgx.transaction
    - name: cgx.transaction.root
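After updating values.yaml, apply the change by upgrading the release. The repository and release names below mirror the tail-sampling example later in this guide and may differ in your environment:

helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
  --render-subchart-notes -f values.yaml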
Notes:
Enabling the feature creates additional metrics, whose number can increase significantly depending on how you instrument your applications. This is especially true when the span name includes specific values, such as user IDs or UUIDs. Such instrumentation practice is strongly discouraged.
In such cases, we recommend correcting your instrumentation or using the spanMetrics.spanNameReplacePattern parameter to replace the problematic values with a generic placeholder. For example, if your span names follow the template user-1234, you can use a pattern that replaces the user ID with a generic placeholder, so that the spans share the generalized name user-{id}. See the following configuration:
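The snippet below is a minimal sketch of such a pattern, assuming the parameter accepts a list of regex/replacement pairs; check your chart version's values.yaml for the exact schema of spanMetrics.spanNameReplacePattern:

spanMetrics:
  spanNameReplacePattern:
    - regex: "user-[0-9]+"       # assumption: span names look like user-1234
      replacement: "user-{id}"   # generic placeholder used in the resulting span name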
Validate your metrics
The following metrics are generated by OpenTelemetry and sent by default when Span Metrics is enabled. The metrics and their labels should not be removed.
Metric | Labels |
---|---|
duration_ms_sum | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root |
duration_ms_bucket | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root, le |
calls_total | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root |
duration_ms_count | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root |
When using multiple OpenTelemetry (OTel) collector agents, each performs span metrics aggregation separately. Without a unique label value, Coralogix receives the metrics individually and cannot effectively aggregate them. For example, Kubernetes users who implement a collector on each node may experience metrics from the same service on different nodes overwriting each other. Adding a pod name as a label resolves this issue by providing a unique identifier, which differentiates the metrics and enables accurate querying and aggregation.
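For example, with the Kubernetes extension you could add the pod name as an extra dimension. This is a sketch that assumes the k8s.pod.name resource attribute is present on your spans (for instance, set by the k8sattributes processor):

spanMetrics:
  extraDimensions:
    - name: k8s.pod.name   # assumption: resource attribute populated by k8sattributes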
Note that cgx_transaction and cgx_transaction_root will only appear for users using Service Flows.
Configure collector buckets
SLO, Apdex, and latency percentile calculations require histogram bucket definitions to produce correct thresholds. The default Apdex buckets, set at 1ms and 4ms, should remain unchanged unless you want to adjust all Apdex thresholds for your services. Other buckets may be modified. Note that defining too many buckets may affect performance.
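For example, with the Kubernetes extension the buckets are controlled by spanMetrics.histogramBuckets; the values beyond the 1ms and 4ms Apdex buckets below are illustrative only:

spanMetrics:
  histogramBuckets:
    [1ms, 4ms, 25ms, 100ms, 500ms, 2s, 10s]   # keep 1ms and 4ms for Apdex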
Configure different collector buckets per application
If you want to use a Span Metrics connector with different buckets per application, you need to use the spanMetricsMulti
preset. For example:
presets:
  spanMetricsMulti:
    enabled: true
    defaultHistogramBuckets: [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
    configs:
      - selector: route() where attributes["service.name"] == "one"
        histogramBuckets: [1s, 2s]
      - selector: route() where attributes["service.name"] == "two"
        histogramBuckets: [5s, 10s]
For every selector, you must write an OTTL statement. Find out more here.
Enable tail sampling (recommended)
We recommend following your setup with tail sampling. Enabling this feature grants you additional APM capabilities while optimizing costs. Tail sampling lets users view traces, service connections, and maps in the Coralogix platform. Find out more here.
The following example demonstrates how to employ tail sampling for trace reduction using the tail sampling processor. Install the otel-integration with the tail-sampling-values.yaml configuration. For instance:
helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual
helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
  --render-subchart-notes -f tail-sampling-values.yaml
This adjustment will set up the otel-agent pods to transmit span data to the coralogix-opentelemetry-gateway deployment through the load-balancing exporter. Ensure adequate replica configuration and resource allocation to handle the anticipated load. Subsequently, you must configure tail-sampling processor policies according to your specific tail sampling requirements.
When operating in an OpenShift environment, ensure the distribution: "openshift" parameter is set in your values.yaml. In Windows environments, use the values-windows-tailsampling.yaml configuration file. Find out more here.
Create Span Metrics using your own OpenTelemetry
Enable Span Metrics
Customers using their own OpenTelemetry or Prometheus should add the following to their configuration file:
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - name: http.method
      - name: cgx.transaction
      - name: cgx.transaction.root
    exemplars:
      enabled: true
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
    metrics_flush_interval: 15s
    events:
      enabled: true
      dimensions:
        - name: exception.type
        - name: exception.message
Notes:
dimensions may be modified, but should not be removed.
Adjust buckets to best fit your usage. Read more below.
Enabling this feature creates additional metrics, whose number can increase significantly depending on how you instrument your applications. This is especially true when the span name includes specific values, such as user IDs or UUIDs. Such instrumentation practice is strongly discouraged. In such cases, we recommend adding the following snippet to the configuration file:
transform/span_name:
  trace_statements:
    - context: span
      statements:
        - replace_pattern(name, "user-[0-9]+", "user-{id}")  # replaces the user ID with a generic placeholder
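For the renamed spans to be reflected in the generated metrics, the transform processor must run in the traces pipeline that feeds the spanmetrics connector. A minimal sketch (pipeline names are illustrative and mirror the full setup below):

service:
  pipelines:
    traces/spanmetrics:
      receivers: [otlp]
      processors: [transform/span_name]   # normalize span names before metrics generation
      exporters: [spanmetrics]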
Validate your metrics
The following metrics are generated by OpenTelemetry and sent by default when Span Metrics is enabled. The metrics and their labels should not be removed.
Metric | Labels |
---|---|
duration_ms_sum | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root |
duration_ms_bucket | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root, le |
calls_total | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root |
duration_ms_count | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root |
Note that cgx_transaction and cgx_transaction_root will only appear for users using Service Flows.
Configure collector buckets
SLO, Apdex, and latency percentile calculations require histogram bucket definitions to produce correct thresholds. The default Apdex buckets, set at 1ms and 4ms, should remain unchanged unless you specifically want to adjust all Apdex thresholds for your services. Other buckets may be modified. Note that defining too many buckets may affect performance.
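For example, buckets are adjusted in the spanmetrics connector configuration; the values other than the 1ms and 4ms Apdex buckets below are illustrative only:

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [1ms, 4ms, 50ms, 250ms, 1s, 5s]   # keep 1ms and 4ms for Apdex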
Enable tail sampling (recommended)
We recommend following your setup with tail sampling. Enabling this feature grants you additional APM capabilities while optimizing costs. Tail sampling lets users view traces, service connections, and maps in the Coralogix platform. Find out more here.
This section demonstrates how to use tail sampling to send only traces that contain errors. We recommend creating a separate tracing pipeline for each type of filtering.
STEP 1. Add the tail_sampling processor definition under processors. In this example, it is named errors, but you can choose any name you prefer.
STEP 2. Include this processor in the pipeline, ensuring it follows state-using processors such as k8sattributes.
processors:
  tail_sampling/errors:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: only-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
service:
  pipelines:
    traces/errors:
      exporters:
        - coralogix
      processors:
        # - state-based processors such as k8sattributes
        - tail_sampling/errors
        - batch
      receivers:
        - otlp
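If you also want to keep a share of non-error traffic, the tail sampling processor supports further policy types that can be defined the same way in an additional processor and pipeline. The name and percentage below are illustrative only:

processors:
  tail_sampling/probabilistic:
    decision_wait: 10s
    policies:
      - name: keep-a-sample
        type: probabilistic
        probabilistic: {sampling_percentage: 10}   # keep roughly 10% of remaining traces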
Configure the Database Catalog (recommended)
This section demonstrates how to send spans carrying the db.system attribute so that you can use the Database Catalog.
STEP 1. Define the filter for the Database Catalog under processors.
STEP 2. Add this processor to the pipeline to filter the spans.
processors:
  filter/dbcatalog:
    error_mode: ignore
    traces:
      span:
        - 'attributes["db.system"] == nil'
service:
  pipelines:
    traces/dbcatalog:
      exporters:
        - coralogix
      processors:
        - filter/dbcatalog
      receivers:
        - otlp
Full OpenTelemetry setup
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
  tail_sampling/errors:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: only-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
  filter/dbcatalog:
    error_mode: ignore
    traces:
      span:
        - 'attributes["db.system"] == nil'
exporters:
  coralogix:
    domain: # your domain
    private_key: # your private key
    application_name: # your application name
    subsystem_name: # your subsystem name
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - name: http.method
      - name: cgx.transaction
      - name: cgx.transaction.root
    exemplars:
      enabled: true
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
    metrics_flush_interval: 15s
    events:
      enabled: true
      dimensions:
        - name: exception.type
        - name: exception.message
service:
  pipelines:
    traces/errors:
      receivers: [otlp]
      processors: [tail_sampling/errors, batch]
      exporters: [coralogix]
    traces/dbcatalog:
      receivers: [otlp]
      processors: [filter/dbcatalog]
      exporters: [coralogix]
    traces/spanmetrics:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [coralogix]
  telemetry:
    logs:
      level: debug
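Assuming you run the OpenTelemetry Collector Contrib distribution, the collector can then be started with this configuration, for example:

otelcol-contrib --config /path/to/config.yaml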
Migrating from Events2Metrics to Span Metrics
This section is for existing customers migrating from Events2Metrics to Span Metrics.
Overview
You may select a retention period for your E2M data as part of the migration. During this period, E2M-based metrics will be created alongside Span Metrics as long as you continue to send spans to Coralogix, allowing you to revert to E2M and view metrics generated by E2M rules during the migration. Once the retention period ends, E2M-based metrics will no longer be created.
You can retain your E2M data indefinitely or specify a custom retention period (recommended).
After the retention period ends, historical metric data ingested before that period will remain accessible to all users in Custom Dashboards or Grafana. However, if you migrate back to E2M in the future, SLO and Apdex settings will need to be redefined.
Retention Period | Description
---|---
Indefinite | View metrics based on Span Metrics in the Service Catalog UI while retaining Events2Metrics (including all metrics, SLO, and Apdex data). Current and historical E2M data is excluded from the UI. Opting for indefinite data retention entails ongoing charges for both ingested metrics and traces. If a retention period is not defined, it defaults to indefinite.
Custom | View metrics based on Span Metrics in the Service Catalog UI during the custom migration period. Current and historical E2M data is excluded from the UI.
Migration steps
STEP 1. Contact Customer Support via our in-app chat or by emailing [email protected] and notify them that you want to migrate.
STEP 2. Select a retention period for your E2M data.
STEP 3. Set up Span Metrics with the Coralogix Kubernetes extension or use your own OpenTelemetry or Prometheus.
Notes:
SLO and Apdex settings are not automatically migrated when transitioning from E2M to Span Metrics.
Define them during the Span Metrics setup. Then, in the Service Catalog UI, create the actual SLO and Apdex per service based on the options you configured in your setup.
Additional resources
Documentation | APM Onboarding Tutorial |
Support
Need help?
Feel free to contact us via our in-app chat or by emailing [email protected].