AWS Centralized Logging Guide
The key challenge with modern visibility on clouds like AWS is that data originates from various sources across every layer of the application stack, is varied…
Amazon ELB (Elastic Load Balancing) allows you to make your applications highly available by using health checks and intelligently distributing traffic across a number of instances. It distributes incoming application traffic across multiple targets, such as Amazon EC2 instances, containers, IP addresses, and Lambda functions. You might have heard the terms CLB, ALB, and NLB. All of them are types of load balancers under the ELB umbrella.
This article focuses on ELB logs. You can get more in-depth information about ELB itself in this post.
Elastic Load Balancing provides access logs that capture detailed information about requests sent to your load balancer. Each ELB log contains information such as the time the request was received, the client’s IP address, latencies, request paths, and server responses.
Because of the way ELB has evolved, the documentation can be a bit confusing. There are three variations of the AWS ELB access logs: ALB, NLB, and CLB. We need to rely on the document header to understand which log variant it describes (the URL and body will usually reference ELB generically).
The ELB access logging monitoring capability is integrated with Coralogix. The logs can be easily collected and sent straight to the Coralogix log management solution.
This is an example of a parsed ALB HTTP entry log:
{
  "type":"http",
  "timestamp":"2018-07-02T22:23:00.186641Z",
  "elb":"app/my-loadbalancer/50dc6c495c0c9188",
  "client_addr":"192.168.131.39",
  "client_port":"2817",
  "target_addr":"110.8.13.9",
  "target_port":"80",
  "request_processing_time":"0.000",
  "target_processing_time":"0.001",
  "response_processing_time":"0.000",
  "elb_status_code":"200",
  "target_status_code":"200",
  "received_bytes":"34",
  "sent_bytes":"366",
  "request":"GET http://www.example.com:80/ HTTP/1.1",
  "user_agent":"curl/7.46.0",
  "ssl_cipher":"-",
  "ssl_protocol":"-",
  "target_group_arn":"arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067",
  "trace_id":"Root=1-58337262-36d228ad5d99923122bbe354",
  "domain_name":"-",
  "chosen_cert_arn":"-",
  "matched_rule_priority":"0",
  "request_creation_time":"2018-07-02T22:22:48.364000Z",
  "actions_executed":"forward",
  "redirect_url":"-",
  "error_reason":"-",
  "target_port_list":"80",
  "target_status_code_list":"200"
}
Note that, compared with the AWS log syntax table, we split the client address and port and the target address and port into four separate fields to make them easier to query.
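As a rough illustration of that split, here is a minimal Python sketch. The helper name `split_endpoint` is hypothetical and is not part of any Coralogix or AWS API. It assumes the raw field has the form address:port, or is a bare '-' when no target was involved.

```python
def split_endpoint(endpoint: str) -> tuple[str, str]:
    """Split a raw 'address:port' value into (address, port).

    Hypothetical helper for illustration only. A bare '-' (no target)
    yields ('-', '-'); a value without a port yields (value, '-').
    """
    if endpoint == "-":
        return "-", "-"
    # rpartition splits on the LAST colon, so the port is always the tail.
    addr, sep, port = endpoint.rpartition(":")
    if not sep:
        return endpoint, "-"
    return addr, port


client_addr, client_port = split_endpoint("192.168.131.39:2817")
target_addr, target_port = split_endpoint("-")
```

Keeping the address and port as separate keys means a filter can target one without a wildcard or regex over the combined value.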
This is an example of a parsed HTTPS CLB log:
{
  "timestamp":"2018-07-02T22:23:00.186641Z",
  "elb":"app/my-loadbalancer/50dc6c495c0c9188",
  "client_addr":"192.168.131.39",
  "client_port":"2817",
  "target_addr":"10.0.0.1",
  "target_port":"80",
  "request_processing_time":"0.001",
  "backend_processing_time":"0.021",
  "response_processing_time":"0.003",
  "elb_status_code":"200",
  "backend_status_code":"200",
  "received_bytes":"0",
  "sent_bytes":"366",
  "request":"GET http://www.example.com:80/ HTTP/1.1",
  "user_agent":"curl/7.46.0",
  "ssl_cipher":"DHE-RSA-AES128-SHA",
  "ssl_protocol":"TLSv1.2"
}
CLB logs contain a subset of the ALB fields. What ALB calls a target, CLB calls a backend, so the corresponding fields are named backend_processing_time and backend_status_code.
This is an example of an NLB log:
{
  "type":"tls",
  "version":"1.0",
  "timestamp":"2018-07-02T22:23:00.186641Z",
  "elb":"net/my-network-loadbalancer/c6e77e28c25b2234",
  "listener":"g3d4b5e8bb8464cd",
  "client_addr":"192.168.131.39",
  "client_port":"51341",
  "target_addr":"10.0.0.1",
  "target_port":"443",
  "connection_time":"5",
  "tls_handshake_time":"2",
  "received_bytes":"29",
  "sent_bytes":"366",
  "incoming_tls_alert":"-",
  "chosen_cert_arn":"arn:aws:elasticloadbalancing:us-east-2:123456789012:certificate/2a108f19-aded-46b0-8493-c63eb1ef4a99",
  "chosen_cert_serial":"-",
  "tls_cipher":"ECDHE-RSA-AES128-SHA",
  "tls_protocol_version":"TLSv12",
  "tls_named_group":"-",
  "domain_name":"my-network-loadbalancer-c6e77e28c25b2234.elb.us-east-2.amazonaws.com"
}
ELB logs contain unstructured data. Using Coralogix parsing rules, you can easily transform the unstructured ELB logs into JSON format to get the full power of Coralogix and the Elastic stack working for you. Parsing rules use regular expressions, and I created expressions for NLB, ALB-1, ALB-2, and CLB logs. The two ALB regexes cover “normal” ALB logs and the special cases of WAF, Lambda, or failed or partially fulfilled requests. In these cases, AWS assigns the value ‘-’ to the target_addr field, with no port.
You will also see some time measurements assigned the value -1. Make sure you account for this in your visualization filters; otherwise, averages and other aggregations could be skewed. Amazon may add fields and change the log structure from time to time, so always check the expressions against your own logs and make changes if needed. The examples should provide a solid foundation.
The following section requires some familiarity with regular expressions, but feel free to skip directly to the examples if you prefer.
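To see how much a -1 placeholder can distort an aggregation, consider this small Python sketch. The sample values and the helper name `average_ignoring_sentinel` are made up for illustration. In Kibana you would get the same effect with a filter on the time field rather than with code.

```python
from statistics import mean


def average_ignoring_sentinel(values, sentinel=-1.0):
    """Average the values, skipping the -1 placeholder.

    Returns None when no valid measurement remains.
    """
    valid = [v for v in values if v != sentinel]
    return mean(valid) if valid else None


# Hypothetical processing times in seconds; one request has no measurement.
samples = [0.001, 0.003, -1.0, 0.002]

naive = mean(samples)                       # dragged below zero by the -1
clean = average_ignoring_sentinel(samples)  # average of the 3 real samples
```

Here a single placeholder is enough to turn a millisecond-scale average negative, which is why the filter matters.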
A bit about why the regexes were created the way they were. Naturally, we always want to use a regex that is simple and efficient. At the same time, we should make sure that each rule captures the correct logs in the correct way (correct values matched with the correct fields). Think about a regex that starts with the following expression:
^(?P<timestamp>[^\s]+)\s*
It will work as long as the first field in the unstructured log is a timestamp, as is the case with CLB logs. However, in the case of NLB and ALB logs, the expression will capture the “type” field instead. Since the regex and rule have no context, it will just place the wrong value in the wrong JSON key. There are other differences that can cause problems, such as a different number of fields or a different field order. To avoid this problem, we use the fact that NLB logs always start with ‘tls 1.0’, standing for the fields ‘type’ and ‘version’, and that ALB logs start with a ‘type’ field with 6 possible values (http, https, h2, grpcs, ws, wss).
Note: As explained in the Coralogix rule tutorial, rules are organized by groups and executed by the order they appear within the group. When a rule matches a log, the log will move to the next group without processing the remaining rules within the same group.
Taking all this into account, we should:
This approach will guarantee that each rule matches with the correct log. Now we are ready for the main part of this post.
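The idea of anchoring each rule to the start of its own log type can be sketched in Python. This is an illustration only: the rule names, the shortened sample lines, and the `classify` helper are hypothetical, the patterns capture just the leading fields, and real Coralogix rules are configured in the product rather than in code.

```python
import re

# Ordered list: the first matching rule wins, mirroring a rule group.
# The generic CLB rule goes last because its pattern matches any first token.
RULES = [
    ("nlb", re.compile(r"^(?P<type>tls)\s+(?P<version>1\.0)\s+(?P<timestamp>\S+)\s")),
    # ALB type values as listed in the AWS documentation.
    ("alb", re.compile(r"^(?P<type>http|https|h2|grpcs|ws|wss)\s+(?P<timestamp>\S+)\s")),
    ("clb", re.compile(r"^(?P<timestamp>\S+)\s")),
]


def classify(line):
    """Return (rule_name, captured_fields) for the first rule that matches."""
    for name, pattern in RULES:
        match = pattern.match(line)
        if match:
            return name, match.groupdict()
    return None, {}


# Shortened, illustrative log prefixes (not complete log lines).
alb_line = "http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 ..."
clb_line = "2018-07-02T22:23:00.186641Z my-loadbalancer 192.168.131.39:2817 ..."
nlb_line = "tls 1.0 2018-07-02T22:23:00.186641Z net/my-network-loadbalancer/c6e77e28c25b2234 ..."

# The unanchored pattern from the text puts the ALB type into 'timestamp'.
naive = re.match(r"^(?P<timestamp>[^\s]+)\s*", alb_line).group("timestamp")
```

With the anchored prefixes, `classify(alb_line)` selects the ALB rule and captures the real timestamp, while the naive pattern stores the string "http" under the timestamp key.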
In the following examples, we’ll describe how different ELB log fields can be used to indicate operational status. In the examples, we assume that the logs were parsed into JSON format. The examples rely on the Coralogix alerts engine and on Kibana visualizations. They also provide additional insights into the different keys and values within the logs. As always, we give you ideas and guidance on how to get more value out of your logs. However, every business environment is different, and you are encouraged to take these ideas and build on them based on the best implementation for your infrastructure and goals.
Last but not least, Elastic Load Balancing logs requests on a best-effort basis. The logs should be used to understand the nature of the requests, not as a complete accounting of all requests. In some cases, we will use ‘notify immediately’ alerts, but you should use ELB logs as a backup and not as the main vehicle for these types of alerts.
Tip: To learn the details of how to create Coralogix alerts you can read this guide.
This alert identifies whether a specific ELB generates more 403 errors than usual. A 403 error results from a request that is blocked by AWS WAF (Web Application Firewall). The alert uses the ‘more than usual’ option. With this option, Coralogix’s ML algorithms will identify normal behavior for every time period. It will trigger an alert if the number of errors is more than normal and is above the optional threshold supplied by the user.
Alert Filter:
elb:"app/my-loadbalancer/50dc6c495c0c9188" AND elb_status_code:"403"
Alert Condition: ‘More than usual’. The field elb_status_code can be found in both ALB and CLB logs.
In this example, we use the client_addr field. It contains the IP of the requesting client. The alert will trigger if a request is coming from a restricted address. For the purpose of this example, we assume that permitted addresses are all under the subnet 172.xxx.xxx.xxx, so the filter matches any address outside it.
Alert Filter:
NOT client_addr:/172\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/
Note: client_addr is found across NLB, ALB, and CLB.
Alert Condition: ‘Notify immediately’.
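Before saving a filter like this, it can help to sanity-check the pattern against a few addresses. The Python sketch below is one way to do that. The helper name `is_restricted` and the sample addresses are hypothetical. It uses `fullmatch` because Lucene regular expressions are anchored to the whole term, and it separates the octets with escaped dots.

```python
import re

# Permitted range assumed in this example: anything under 172.x.x.x.
PERMITTED = re.compile(r"172\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")


def is_restricted(client_addr):
    """True when the address falls outside the permitted 172.x.x.x range."""
    # fullmatch mirrors Lucene, where a regex must match the entire term.
    return PERMITTED.fullmatch(client_addr) is None


checks = {addr: is_restricted(addr) for addr in ("172.16.0.5", "192.168.131.39")}
```

An address such as 192.168.131.39 is flagged, while 172.16.0.5 is not. Note that the pattern only checks the shape of the address, so it does not reject octets above 255.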
This alert identifies an inactive ELB. It uses the ‘less than’ alert condition. The threshold is set to no logs in 5 minutes. This should be adapted to your specific environment.
Alert Filter:
elb:"app/my-loadbalancer/50dc6c495c0c9188"
Alert Condition: ‘less than 1 in 5 minutes’
This alert works across NLB, ALB, and CLB.
Knowing the type of transactions running on a specific ELB, ops would like to be alerted if connection times are unusually long. Here again, the Coralogix ‘more than usual’ alert option will be very handy.
Alert Filter:
connection_time:[2 TO *]
Note: connection_time is specific to NLB logs. You can create similar alerts on any of the time-related fields in any of the logs.
Alert Condition: ‘more than usual’
The field ‘matched_rule_priority’ indicates the priority value of the rule that matched the request. The value 0 indicates that no rule was applied and the load balancer resorted to the default. Applying rules to requests is especially important in highly regulated or secured environments. For such environments, it will be important to identify rule patterns and abnormal behavior. Coralogix has powerful ML algorithms focused on identifying deviation from a normal flow of logs. This alert will notify users if the number of requests not matched with a rule is higher than usual.
Alert Filter:
matched_rule_priority:0
Note: This is an ALB field.
Alert Condition: ‘more than usual’
In this example, we assume a regulated environment. One of the requirements is that for every ELB request the load balancer should validate the session, authenticate the user, and add the user information to the request header, as specified by the rule configuration. This sequence of actions will be indicated by having the value ‘authenticate’ in the ‘actions_executed’ field. The field can include a few actions separated by ‘,’. Though ELB doesn’t guarantee that every request will be recorded, it is important to be notified of the existence of such a problem, so we will use the ‘notify immediately’ condition.
Alert Filter:
NOT actions_executed:authenticate
Note: This is an ALB field.
Alert Condition: ‘notify immediately’
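Because actions_executed can hold several comma-separated actions, the check is a membership test rather than an equality test. A minimal Python sketch of that logic follows. The helper name `has_action` and the sample values are hypothetical.

```python
def has_action(actions_executed, action="authenticate"):
    """True when the comma-separated field contains the given action."""
    actions = [item.strip() for item in actions_executed.split(",")]
    return action in actions


# Hypothetical field values as they might appear in parsed ALB logs.
samples = ["authenticate,forward", "forward", "-"]
missing_auth = [value for value in samples if not has_action(value)]
```

In this sample, the entries "forward" and "-" would be the ones that trigger the alert.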
Using the ‘type’ field this visualization shows the distribution of the different requests and connection types.
Summation of the number of bytes sent.
Request processing time is the total time elapsed from the time the load balancer received the request until the time it sent it to a target. Response processing time is the total time elapsed from the time the load balancer received the response header from the target until it started to send the response to the client. In this visualization, we are using Timelion to track the average over time and generate a trend line.
Timelion expression:
.es(q=*,metric=avg:destination.request_processing_time.numeric).label("Avg request processing time").lines(width=3).color(green),
.es(q=*,metric=avg:destination.request_processing_time.numeric).trend().lines(width=3).color(green).label("Avg request processing time trend"),
.es(q=*,metric=avg:destination.response_processing_time.numeric).label("Avg response processing time").lines(width=3).color(red),
.es(q=*,metric=avg:destination.response_processing_time.numeric).trend().lines(width=3).color(red).label("Avg response processing time trend")
In this visualization, we show the average response processing time by ELB. We used the horizontal option. See the definition screens.
This table lists the top IP addresses generating requests to specific ELBs. The ELBs are separated by the metadata field applicationName. This metadata field is assigned to the load balancer when you configure the integration. We created a Kibana filter that looks only at these two devices. You can read about filtering and querying in our tutorial.
This is an example showing the status code distribution for the last 24 hours.
You can also create a more dynamic representation showing how the distribution behaves over time.
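The aggregation behind a status code distribution is a simple count per code. The following Python sketch shows that logic on a handful of made-up parsed logs. The function name `status_distribution` and the sample data are hypothetical, and Kibana performs the equivalent aggregation for you.

```python
from collections import Counter


def status_distribution(logs):
    """Return the share of each elb_status_code among the parsed logs."""
    counts = Counter(
        log["elb_status_code"] for log in logs if "elb_status_code" in log
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: count / total for code, count in counts.items()}


# Hypothetical parsed logs: three successful requests and one blocked one.
parsed_logs = [
    {"elb_status_code": "200"},
    {"elb_status_code": "200"},
    {"elb_status_code": "403"},
    {"elb_status_code": "200"},
]
distribution = status_distribution(parsed_logs)
```

Running the same count per time bucket instead of over the whole range gives the over-time view.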
This blog post covered the different types of services that AWS provides under the ELB umbrella: NLB, ALB, and CLB. We focused on the logs these services generate and their structure, and showed some examples of alerts and visualizations that can help you unlock the value of these logs. Remember that every user is unique and has their own use case and data. Your logs might be customized and configured differently, and you will most likely have your own requirements. So, you are encouraged to take the methods and concepts shown here and adapt them to your own needs. If you need help or have any questions, don’t hesitate to reach out to [email protected]. You can learn more about unlocking the value embedded in AWS ALB logs and other logs in some of our other blog posts.