Generate Span Metrics

Note: Span Metrics is currently in beta for early adopters.

For customers transitioning from Events2Metrics (E2M) to Span Metrics, the default is to maintain both pipelines. This allows E2M-based metrics to be generated alongside Span Metrics, providing a fallback if necessary. Costs are incurred for both pipelines. You can switch to a single pipeline at any stage.

What is Span Metrics?

Span Metrics offers an automated method of transforming and aggregating trace data into metrics outside Coralogix using the OpenTelemetry Span Metrics Connector. By sending the metrics to Coralogix, you can utilize our cutting-edge APM features, save on costs, and gain comprehensive insights into your data.

Benefits

Use Span Metrics for any of the following:

  • Cost savings. Sending Span Metrics reduces the volume and frequency of the data sent, helping you cut costs dramatically when compared to sending us 100% of your spans. Compare and contrast our data pipelines here.

  • No additional setup. Span Metrics is particularly valuable when your system lacks traditional metrics but implements distributed tracing. It allows you to obtain metrics from your tracing pipeline without additional setup.

  • Comprehensive insights. Even if your system is already equipped with metrics, leveraging Span Metrics offers a deeper level of monitoring. The generated metrics provide insights at the application level, showing how tracing information propagates throughout your applications.

  • Seamless migration. Easily migrate from the Events2Metrics to the Span Metrics data pipeline, while retaining E2M data during a defined retention period.

Setup overview

This section demonstrates how to generate metrics from spans, based on your OpenTelemetry installation.

For each setup, you will need to:

  • Enable Span Metrics

  • Validate your metrics

  • Configure the collector buckets, while keeping the two default buckets used for the APM Apdex score (1ms and 4ms)

You may also:

  • Enable tail sampling

  • Configure the Database Catalog

  • Configure different buckets for different services using the spanMetricsMulti preset (Kubernetes extension users only)

Create Span Metrics with the Kubernetes extension

Enable Span Metrics

STEP 1. If you have not yet done so, deploy the Coralogix Kubernetes extension package. Navigate to Data Flow > Extensions > Kubernetes from your Coralogix toolbar.

STEP 2. Manually upgrade the Helm chart used with your Kubernetes integration to its latest version to enable the creation of Span Metrics. Span Metrics is disabled by default and can be enabled by setting the spanMetrics.enabled value to true in the values.yaml file.

    spanMetrics:
      enabled: true
      collectionInterval: "{{.Values.global.collectionInterval}}"
      metricsExpiration: 5m
      histogramBuckets:
        [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
      extraDimensions:
        - name: http.method
        - name: cgx.transaction
        - name: cgx.transaction.root

Notes:

  • Enabling the feature creates additional metrics, whose volume can increase significantly depending on how you instrument your applications. This is especially true when span names include specific values, such as user IDs or UUIDs. This instrumentation practice is strongly discouraged.

  • In such cases, we recommend correcting your instrumentation or using the spanMetrics.spanNameReplacePattern parameter to replace the problematic values with a generic placeholder. For example, if your span names follow the template user-1234, you can use the following pattern to replace the user ID with a generic placeholder, resulting in the generalized span name user-{id}:

    spanMetrics:
      spanNameReplacePattern:
        - regex: "user-[0-9]+"
          replacement: "user-{id}"

Validate your metrics

The following metrics are generated by OpenTelemetry and are sent by default to enable Span Metrics. The metrics and their labels should not be removed.

Metric             | Labels
duration_ms_sum    | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root
duration_ms_bucket | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root, le
calls_total        | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root
duration_ms_count  | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root

When using multiple OpenTelemetry (OTel) collector agents, each performs span metrics aggregation separately. Without a unique label value, Coralogix receives the metrics individually and cannot effectively aggregate them. For example, Kubernetes users who implement a collector on each node may experience metrics from the same service on different nodes overwriting each other. Adding a pod name as a label resolves this issue by providing a unique identifier, which differentiates the metrics and enables accurate querying and aggregation.
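
For example, you can add the pod name as an extra dimension. The following is a minimal sketch; it assumes the pod name is available on spans as the k8s.pod.name resource attribute (for instance, via the k8sattributes processor), which may differ in your setup:

    spanMetrics:
      enabled: true
      extraDimensions:
        - name: http.method
        - name: cgx.transaction
        - name: cgx.transaction.root
        # Assumption: k8s.pod.name is present as a resource attribute, giving each
        # collector's metric series a unique identity.
        - name: k8s.pod.name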

Note that cgx_transaction and cgx_transaction_root will only appear for users using Service Flows.

Configure collector buckets

SLOs, Apdex latency, and latency percentiles rely on the histogram bucket definitions to be calculated correctly. The default Apdex buckets, set at 1ms and 4ms, should remain unchanged unless you want to adjust all Apdex thresholds for your services. Other buckets may be modified. Note that defining too many buckets may affect performance.
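
For example, the following sketch (illustrative bucket values only) widens the upper buckets for slower services while keeping the 1ms and 4ms Apdex buckets in place:

    spanMetrics:
      enabled: true
      # Keep the 1ms and 4ms Apdex buckets; the remaining values are examples.
      histogramBuckets:
        [1ms, 4ms, 50ms, 250ms, 1s, 5s, 10s, 30s]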

Configure different collector buckets per application

If you want to use a Span Metrics connector with different buckets per application, you need to use the spanMetricsMulti preset. For example:

  presets:
    spanMetricsMulti:
      enabled: true
      defaultHistogramBuckets: [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
      configs:
        - selector: route() where attributes["service.name"] == "one"
          histogramBuckets: [1s, 2s]
        - selector: route() where attributes["service.name"] == "two"
          histogramBuckets: [5s, 10s]

For every selector, you must write an OTTL statement. Find out more here.

Enable tail sampling

We recommend following your setup with tail sampling. Enabling this feature grants you additional APM capabilities while optimizing costs. Tail sampling lets users view traces, service connections, and maps in the Coralogix platform. Find out more here.

The following example demonstrates how to employ tail sampling for trace reduction using the tail sampling processor. Install the otel-integration chart with the tail-sampling-values.yaml configuration. For instance:

helm repo add coralogix-charts-virtual https://cgx.jfrog.io/artifactory/coralogix-charts-virtual

helm upgrade --install otel-coralogix-integration coralogix-charts-virtual/otel-integration \
  --render-subchart-notes -f tail-sampling-values.yaml

This adjustment will set up the otel-agent pods to transmit span data to the coralogix-opentelemetry-gateway deployment through the load balancing exporter. Ensure adequate replica configuration and resource allocation to handle the anticipated load. Subsequently, you must configure tail-sampling processor policies according to your specific tail sampling requirements.
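
As an illustration only (the exact placement of this block inside tail-sampling-values.yaml depends on your otel-integration chart version, and the sampling percentage is an example value), a policy set that keeps all error traces plus a share of the remaining traffic could look like this:

  processors:
    tail_sampling:
      decision_wait: 10s
      policies:
        # Keep every trace that contains an error.
        - name: errors
          type: status_code
          status_code: {status_codes: [ERROR]}
        # Additionally keep 10% of all other traces (example value).
        - name: sample-10-percent
          type: probabilistic
          probabilistic: {sampling_percentage: 10}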

When operating in an OpenShift environment, ensure the distribution: "openshift" parameter is set in your values.yaml. In Windows environments, use the values-windows-tailsampling.yaml configuration file. Find out more here.

Create Span Metrics using your own OpenTelemetry

Enable Span Metrics

Customers using their own OpenTelemetry or Prometheus should add the following to their configuration file:

  connectors:
    spanmetrics:
      histogram:
        explicit:
          buckets: [100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms]
      dimensions:
        - name: http.method
        - name: cgx.transaction
        - name: cgx.transaction.root
      exemplars:
        enabled: true
      dimensions_cache_size: 1000
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
      metrics_flush_interval: 15s
      events:
        enabled: true
        dimensions:
          - name: exception.type
          - name: exception.message

Notes:

  • dimensions may be modified, but should not be removed.

  • Adjust buckets to fit your usage best. Read more below.

Enabling this feature creates additional metrics, whose volume can increase significantly depending on how you instrument your applications. This is especially true when span names include specific values, such as user IDs or UUIDs. This instrumentation practice is strongly discouraged. In such cases, we recommend correcting your instrumentation or adding a transform processor to normalize span names. For example, to replace user IDs in span names with a generic placeholder:

  transform/span_name:
    trace_statements:
      - context: span
        statements:
          - replace_pattern(name, "user-[0-9]+", "user-{id}")
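
Note that the spanmetrics connector only generates metrics once it is wired into your pipelines: it must appear as an exporter of a traces pipeline and as a receiver of a metrics pipeline. A minimal sketch, consistent with the full setup shown further below:

  service:
    pipelines:
      traces/spanmetrics:
        receivers: [otlp]
        exporters: [spanmetrics]   # the connector consumes spans here
      metrics:
        receivers: [spanmetrics]   # and emits the generated metrics here
        exporters: [coralogix]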

Validate your metrics

The following metrics are generated by OpenTelemetry and are sent by default to enable Span Metrics. The metrics and their labels should not be removed.

Metric             | Labels
duration_ms_sum    | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root
duration_ms_bucket | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root, le
calls_total        | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root
duration_ms_count  | span_name, service_name, span_kind, status_code, http_method, cgx_transaction, cgx_transaction_root

Note that cgx_transaction and cgx_transaction_root will only appear for users using Service Flows.

Configure collector buckets

SLOs, Apdex latency, and latency percentiles rely on the histogram bucket definitions to be calculated correctly. The default Apdex buckets, set at 1ms and 4ms, should remain unchanged unless you specifically want to adjust all Apdex thresholds for your services. Other buckets may be modified. Note that defining too many buckets may affect performance.
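
For example, the following sketch (illustrative bucket values only) widens the upper buckets while keeping the 1ms and 4ms Apdex buckets in place:

  connectors:
    spanmetrics:
      histogram:
        explicit:
          # Keep the 1ms and 4ms Apdex buckets; the remaining values are examples.
          buckets: [1ms, 4ms, 50ms, 250ms, 1s, 5s, 10s, 30s]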

Enable tail sampling

We recommend following your setup with tail sampling. Enabling this feature grants you additional APM capabilities while optimizing costs. Tail sampling lets users view traces, service connections, and maps in the Coralogix platform. Find out more here.

This example demonstrates how to send only traces containing errors using tail sampling. We recommend creating a separate tracing pipeline for each type of filtering.

STEP 1. Add the tail_sampling processor definition under processors. In this example, it is named errors, but you can choose any name you prefer.

STEP 2. Include this processor in the pipeline, ensuring it comes after stateful processors such as k8sattributes.

processors:
  tail_sampling/errors:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: only-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
service:        
  pipelines:
    traces/errors:
      exporters:
        - coralogix
      processors:
      # - state based processors like k8sattributes
        - tail_sampling/errors
        - batch 
      receivers:
        - otlp

Configure the Database Catalog

This example demonstrates how to send spans that include the db.system attribute so that you can use the Database Catalog.

STEP 1. Define the filter for the DB Catalog under processors.

STEP 2. Add this processor to the pipeline to filter the spans.

processors:
  filter/dbcatalog:
    error_mode: ignore
    traces:
      span:
        - 'attributes["db.system"] == nil'
service:        
  pipelines:        
    traces/dbcatalog:
      exporters:
        - coralogix
      processors:
        - filter/dbcatalog
      receivers:
        - otlp

Full OpenTelemetry setup

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:
  tail_sampling/errors:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - name: only-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
  filter/dbcatalog:
    error_mode: ignore
    traces:
      span:
        - 'attributes["db.system"] == nil'

exporters:
  coralogix:
    domain: # your domain
    private_key: # your private key
    application_name: # your application name
    subsystem_name:  # your subsystem name

connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [ 100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms ]
    dimensions:
      - name: http.method
      - name: cgx.transaction
      - name: cgx.transaction.root
    exemplars:
      enabled: true
    dimensions_cache_size: 1000
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
    metrics_flush_interval: 15s
    events:
      enabled: true
      dimensions:
        - name: exception.type
        - name: exception.message

service:
  pipelines:
    traces/errors:
      receivers: [ otlp ]
      processors: [ tail_sampling/errors, batch ]
      exporters: [ coralogix ]
    traces/dbcatalog:
      receivers: [ otlp ]
      processors: [ filter/dbcatalog ]
      exporters: [ coralogix ]
    traces/spanmetrics:
      receivers: [ otlp ]
      exporters: [ spanmetrics ]
    metrics:
      receivers: [ spanmetrics ]
      exporters: [ coralogix ]
  telemetry:
    logs:
      level: debug

Migrating from Events2Metrics to Span Metrics

This section is for existing customers migrating from Events2Metrics to Span Metrics.

Overview

You may select a retention period for your E2M data as part of the migration. During this period, E2M-based metrics will be created alongside Span Metrics as long as you continue to send spans to Coralogix. This allows you to revert to E2M and view metrics generated by E2M rules during the migration. Once the retention period ends, E2M-based metrics will no longer be created.

You can retain your E2M data indefinitely or specify a custom time period for retention (recommended). This flexibility allows you to revert to E2M during the migration as long as you continue sending spans to Coralogix.

After the retention period ends, historical metric data ingested before the period will remain accessible for all users in Custom Dashboards or Grafana. However, if you migrate back to E2M in the future, SLO and Apdex settings will need to be redefined.

Retention Period | Description
Indefinite       | View metrics based on Span Metrics in the Service Catalog UI while retaining Events2Metrics (including all metrics, SLO, and Apdex data). Current and historical E2M data is excluded from the UI. Opting for indefinite data retention entails ongoing charges for both ingested metrics and traces. If a retention period is not defined, it will default to indefinite.
Custom           | View metrics based on Span Metrics in the Service Catalog UI during the custom migration period. Current and historical E2M data is excluded from the UI.

Migration steps

STEP 1. Contact Customer Support via our in-app chat or by emailing [email protected] and notify them that you want to migrate.

STEP 2. Select a retention period for your E2M data.

STEP 3. Set up Span Metrics with the Coralogix Kubernetes extension or use your own OpenTelemetry or Prometheus.

Notes:

  • SLO and Apdex settings are not automatically migrated when transitioning from E2M to Span Metrics.

  • Define them during the Span Metrics setup. Then, in the Service Catalog UI, create the actual SLO and Apdex per service based on the options you configured in your setup.

Additional resources

Documentation | APM Onboarding Tutorial

Support

Need help?

Feel free to contact us via our in-app chat or by emailing [email protected].