A customer was looking through our ELB access logs and noticed they are not being parsed correctly: some events come through missing important data, as if the field extractions are not being applied.
Currently, a good 99% of their ELB access log data is incorrectly formatted when ingested into Splunk.
Update
We now have two new Splunk Lambda blueprints for ELB access logs that you can use directly from the AWS Lambda console instead of creating and deploying your own custom function per step 2 below. There's one blueprint each for Classic and Application Load Balancers. Use them to automatically retrieve the access logs, unarchive when applicable, extract the timestamp, and forward to HEC as described below.
Also, skip step 1 from the original answer below, as Splunk Add-on for AWS (4.3+) now includes the necessary field extractions for Application Load Balancer access logs, in addition to Classic Load Balancer access logs, under the updated sourcetype aws:elb:accesslogs. The Splunk Lambda blueprints set events to that sourcetype automatically. Make sure you have the Add-on for AWS installed in Splunk Enterprise.
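For reference, the blueprints forward each log line as a standard HEC event with the sourcetype already set; each event payload looks roughly like this (timestamp and log line are illustrative):

{"time": 1530570180, "sourcetype": "aws:elb:accesslogs", "event": "https 2018-07-02T22:23:00.186641Z app/my-alb/50dc6c495c0c9188 ..."}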
======
You can pull ALB access logs via the AWS Add-on (as of 4.3).
Alternatively, you can push these logs using Lambda, having AWS stream them to Splunk HTTP Event Collector (HEC).
1) [No longer required as of AWS Add-on 4.3 - just use aws:elb:accesslogs as noted above]
Add a new sourcetype for ALB access logs, say aws:alb:accesslogs. It's very similar to the classic ELB equivalent sourcetype but has additional field extractions. Assign this sourcetype to the HEC token you create to receive these logs.
[aws:alb:accesslogs]
EXTRACT-alb = ^\s*(?P<type>[^\s]+)\s+(?P<timestamp>[^\s]+)\s+(?P<elb>[^\s]+)\s+(?P<client_ip>[0-9.]+):(?P<client_port>\d+)\s+(?P<target>[^\s]+)\s+(?P<request_processing_time>[^\s]+)\s+(?P<target_processing_time>[^\s]+)\s+(?P<response_processing_time>[^\s]+)\s+(?P<elb_status_code>\d+)\s+(?P<target_status_code>\d+)\s+(?P<received_bytes>\d+)\s+(?P<sent_bytes>\d+)\s+"(?P<request>.+)"\s+"(?P<user_agent>.+)"\s+(?P<ssl_cipher>[-\w]+)\s*(?P<ssl_protocol>[-\w\.]+)\s+(?P<target_group_arn>[^\s]+)\s+"(?P<trace_id>.+)"
EVAL-rtt = request_processing_time + target_processing_time + response_processing_time
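Once events arrive with that sourcetype, a quick sanity check along these lines (field names per the extraction above) should show the parsed fields:

sourcetype=aws:alb:accesslogs | table _time, client_ip, request, elb_status_code, rtt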
2) Create a Lambda function with the S3 'ObjectCreated' event as trigger: it will fire as soon as access logs are written to the S3 bucket. The code itself is fairly simple: it retrieves the log.gz file, uncompresses it, and forwards the log entries to Splunk HEC. I have a working Lambda deployment package that you can use, but it seems I cannot attach it here. Reach out to me at rarsan @ splunk.com if interested.
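For orientation, here is a minimal sketch of the S3 retrieval side of such a function, assuming the Node.js aws-sdk v2 that the Lambda runtime provides (bucket and key come from the trigger event; the gunzip-and-forward part appears in a snippet later in this thread):

// Minimal sketch: fetch the compressed access log object that triggered the function
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
  const record = event.Records[0];
  const params = {
    Bucket: record.s3.bucket.name,
    // S3 keys arrive URL-encoded in the notification
    Key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
  };
  s3.getObject(params, (err, data) => {
    if (err) return callback(err);
    const payload = data.Body; // gzipped log file, fed to zlib.gunzip as shown later in this thread
    // ... uncompress and forward to HEC ...
  });
};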
For what it's worth, the current version (4.6.0) also does not parse the ALB access logs correctly. It breaks on overly greedy fields (user_agent/request) and is missing the newest fields (https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html).
For now this seems to work (I have contacted Splunk to validate and include it):
[source::...(/|\\)\d+_elasticloadbalancing_*.log.gz]
EXTRACT-elb = ^\s*(?P<type>[^\s]+)\s+(?P<timestamp>[^\s]+)\s+(?P<elb>[^\s]+)\s+(?P<client_ip>[0-9.]+):(?P<client_port>\d+)\s+(?P<target>[^\s]+)\s+(?P<request_processing_time>[^\s]+)\s+(?P<target_processing_time>[^\s]+)\s+(?P<response_processing_time>[^\s]+)\s+(?P<elb_status_code>[\d-]+)\s+(?P<target_status_code>[\d-]+)\s+(?P<received_bytes>\d+)\s+(?P<sent_bytes>\d+)\s+"(?P<request>[^"]+)"\s+"(?P<user_agent>[^"]+)"\s+(?P<ssl_cipher>[-\w]+)\s*(?P<ssl_protocol>[-\w.]+)\s+(?P<target_group_arn>[^\s]+)\s+"(?P<trace_id>[^\s]+)"\s+"(?P<domain_name>[^\s]+)"\s+"(?P<chosen_cert_arn>[^\s]+)"\s+(?P<matched_rule_priority>[^\s]+)\s+(?P<request_creation_time>[^\s]+)\s+"(?P<actions_executed>[^\s]+)"\s+"(?P<redirect_url>[^\s]+)"\s+"(?P<error_reason>[^\s]+)"\s+"(?P<target_port_list>[^\s]+)"\s+"(?P<target_status_code_list>[^\s]+)"
EVAL-rtt = request_processing_time + target_processing_time + response_processing_time
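To see why the stock extraction breaks, compare the quoted-field captures: a greedy .+ inside quotes runs past the field's closing quote to the last quote on the line, while the [^"]+ used above stops at the field boundary:

"(?P<request>.+)"\s+"(?P<user_agent>.+)"          (greedy: request can swallow user_agent and later fields)
"(?P<request>[^"]+)"\s+"(?P<user_agent>[^"]+)"    (stops at each field's closing quote)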
Hi, I know this is quite an old thread. However, I'm now using the Add-on for Amazon Web Services version 5.0.0. I have ingested ELB logs as described in https://docs.splunk.com/Documentation/AddOns/released/AWS/IncrementalS3.
I can see the logs are being ingested; however, those events are still not parsed, and I still see only the raw logs. I have added the props.conf as you show above, but the issue is the same.
Am I missing something else?
@rarsan_splunk I am curious whether you are sending the ALB log unzipped to HEC as raw data and letting the events in the file get indexed that way, or using the Lambda to send each line as a separate log event to HEC.
Yes, the Lambda function takes care of uncompressing the ALB logs, parsing each line, and sending the raw line events in batches to HEC. Here's a snippet of working code, where logger is a simple HEC client library you can find in the Splunk Lambda blueprints in the AWS Lambda console, and payload is the log.gz file retrieved from S3.
const zlib = require('zlib'); // Node.js built-in, used to uncompress the log.gz payload

zlib.gunzip(payload, (err, result) => {
  if (err) {
    console.log(err);
    callback(err);
  } else {
    const parsed = result.toString('ascii');
    const logEvents = parsed.split("\n");
    let count = 0, time;
    if (logEvents) {
      logEvents.forEach((event) => {
        if (event) {
          // Extract timestamp as 2nd field in log entry
          // For more details: http://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html#ac...
          time = event.split(' ')[1];
          // Forward with source-specified timestamp
          // (optional 'context' arg used to add Lambda metadata e.g. awsRequestId, functionName)
          logger.logWithTime(time, event, context);
          count += 1;
        }
      });
      console.log(`Processed ${count} log entries`);
    }
    // Send the buffered events to HEC in one batch
    logger.flushAsync((err, response) => {
      if (err) {
        callback(err);
      } else {
        console.log(`Response from Splunk:\n${response}`);
        callback(null, count); // Echo number of events forwarded
      }
    });
  }
});
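For anyone without access to the blueprint library, a minimal sketch of an HEC client with the same logWithTime/flushAsync shape might look like this. The URL and token are placeholders, and the real blueprint library adds batching limits and retries; this is just to show the mechanics:

// Hypothetical stand-in for the blueprint's logger, not the actual library
const https = require('https');

const HEC_URL = new URL('https://splunk.example.com:8088/services/collector/event'); // placeholder
const HEC_TOKEN = '00000000-0000-0000-0000-000000000000'; // placeholder token

let batch = [];

// ALB log timestamps are ISO 8601; HEC expects epoch seconds
function logWithTime(time, event, context) {
  batch.push(JSON.stringify({
    time: Date.parse(time) / 1000,
    sourcetype: 'aws:elb:accesslogs',
    event: event,
    // Optional Lambda metadata, as mentioned in the snippet above
    fields: context ? { awsRequestId: context.awsRequestId, functionName: context.functionName } : undefined,
  }));
}

// POST all buffered events in one request (HEC accepts concatenated event JSON)
function flushAsync(callback) {
  const body = batch.join('\n');
  batch = [];
  const req = https.request({
    hostname: HEC_URL.hostname,
    port: HEC_URL.port,
    path: HEC_URL.pathname,
    method: 'POST',
    headers: { Authorization: `Splunk ${HEC_TOKEN}` },
  }, (res) => {
    let data = '';
    res.on('data', (chunk) => { data += chunk; });
    res.on('end', () => callback(null, data));
  });
  req.on('error', callback);
  req.write(body);
  req.end();
}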
Hi, I am on Splunk-TA-AWS 4.4, but I can't see any option for [aws:alb:accesslogs] when using the Web UI to onboard new ALB logs, and I can't find a stanza in props.conf either. How do we onboard ALB access logs?
I have asked engineering, and in add-on version 4.4 ALB is indeed supported. The sourcetype keeps being called "aws:elb:accesslogs", however, so CLB, ALB & ELB are all under aws:elb:accesslogs.
Thanks ytenenbaum,
Yes, I did use sourcetype 'aws:elb:accesslogs' for our ALB logs. Field extraction works fine.
I checked the props.conf in default and found the field extraction is actually keyed on source rather than sourcetype. That explains why... See below for the configuration:
[source::...(/|\\)\d+_elasticloadbalancing_*.log.gz]
EXTRACT-elb = ^\s*(?P<type>[^\s]+)\s+(?P<timestamp>[^\s]+)\s+(?P<elb>[^\s]+)\s+(?P<client_ip>[0-9.]+):(?P<client_port>\d+)\s+(?P<target>[^\s]+)\s+(?P<request_processing_time>[^\s]+)\s+(?P<target_processing_time>[^\s]+)\s+(?P<response_processing_time>[^\s]+)\s+(?P<elb_status_code>[\d-]+)\s+(?P<target_status_code>[\d-]+)\s+(?P<received_bytes>\d+)\s+(?P<sent_bytes>\d+)\s+"(?P<request>.+)"\s+"(?P<user_agent>.+)"\s+(?P<ssl_cipher>[-\w]+)\s*(?P<ssl_protocol>[-\w.]+)\s+(?P<target_group_arn>[^\s]+)\s+(?P<trace_id>[^\s]+)
EVAL-rtt = request_processing_time + target_processing_time + response_processing_time
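That source pattern lines up with the object keys AWS writes for access logs, for example (account ID, region, and load balancer name are illustrative):

123456789012_elasticloadbalancing_us-east-1_app.my-alb.1a2b3c4d5e6f_20180702T2225Z_10.0.1.2_5upg9xyz.log.gz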
The issue is that the customer is using ALB and not ELB. You can tell it is ALB because the log files end with .log.gz instead of just .log.
AWS released Application Load Balancer in H2 2016. It's not supported yet by the Splunk AWS App and Add-on; it's in scope for one of the future releases.
Workaround for the moment: when you create the S3 input, point it to aws:alb:accesslogs. The sourcetype can't be elb:accesslogs; otherwise, the .gz files will be filtered out.
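If you configure the input in .conf files rather than the web wizard, the sourcetype override would look roughly like this in the add-on's inputs.conf (stanza name, account, and bucket are placeholders; exact keys vary by add-on version):

[aws_s3://alb-access-logs]
aws_account = my-aws-account
bucket_name = my-alb-log-bucket
sourcetype = aws:alb:accesslogs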
I found this helpful, I think. Point added. We are also looking to start ingesting ALB logs. I know that with ELB access logs we had to set the "Incremental Log" type in the config web wizard, and I don't think the incremental log option shows up unless you set the sourcetype to elb:accesslogs. Will I just have to set whatever the incremental log option is in one of the configs on my collection node?
May I ask whether we should use Incremental S3 or Generic S3 for ELB log collection? Please try Incremental S3.