
Splunk App and Add-on for Amazon Web Services: How to index CloudFront .gz log files?

ryandonn85
New Member

I'm trying to ingest CloudFront logs from AWS. Currently they are dumped into an S3 bucket as .gz files.

The only data being returned is:

4/4/16 9:05:23.000 AM
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type

    host = **Redacted**
    source = s3://mybucketname/EL90IIKCKI7FS.2016-04-04-12.9f3c0869.gz
    sourcetype = aws:cloudfront:accesslog

This is the result of the input being pointed at the root of the bucket.

However, if I point the input directly at the .gz file within the bucket, it ingests it and I can see my access logs. This won't work long term, though, because the log file rolls regularly and a new one is created.

Is there something I'm missing? Splunk seems to be aware that the .gz file exists when pointed at the root of the bucket, but it doesn't seem to be ingesting the file.
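
For reference, here's roughly what the two attempts look like in inputs.conf terms (the stanza names are made up; the only real difference is whether a full key is given):

# pointed at the root of the bucket - only the #Version/#Fields header event shows up
[aws_s3://cloudfront_bucket_root]
aws_account = myawsaccount
bucket_name = mybucketname
sourcetype = aws:cloudfront:accesslogs

# pointed directly at one .gz file - access logs ingest fine, until the file rolls
[aws_s3://cloudfront_single_file]
aws_account = myawsaccount
bucket_name = mybucketname
key_name = EL90IIKCKI7FS.2016-04-04-12.9f3c0869.gz
sourcetype = aws:cloudfront:accesslogs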

Thanks!


ivan_mirosav
Explorer

The documentation is not clear about the parameters for aws_key.

Can we get more information? Does it accept wildcards? Should the path include a filename?

My understanding is that S3 does not use a Linux-style file system, so "directories" behave differently. But I can't find any information on configuring this in the TA...
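
For example, my mental model (which may be wrong) is that an object key like

cdn_logs/EL90IIKCKI7FS.2016-04-04-12.9f3c0869.gz

is a single flat string, and "cdn_logs/" is just a prefix, not a real directory. So I'd expect aws_key to be matched as a plain prefix (cdn_logs/ would match the key above), but I can't tell from the docs whether that's actually how it works, or whether wildcards like cdn_logs/*.gz are accepted.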


mounavignesh
New Member

All,

I'm not able to search CloudFront logs; there are no results. Below is my inputs.conf:

[splunk_ta_aws_logs://Cloudfront_logs]
aws_account = splunk_DEV
bucket_name = Mybucketname
bucket_region = us-east-1
host_name = s3.amazonaws.com
interval = 1800
log_file_prefix = cdn_logs
log_name_format = ABCDEFGH.%Y-%m-%d-
log_start_date = 2020-01-01
log_type = cloudfront:accesslogs
max_fails = 10000
max_retries = -1
sourcetype = aws:cloudfront:accesslogs
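
For context, the objects land in the bucket with keys like this (a hypothetical example, following the CloudFront naming pattern seen earlier in the thread):

cdn_logs/ABCDEFGH.2020-01-10-12.1a2b3c4d.gz

My assumption is that log_file_prefix, log_name_format, and log_start_date together have to line up with that key: the prefix, then the distribution ID and date, with only files newer than the start date being collected.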


kchen_splunk
Splunk Employee

Sharing the stanza from your inputs.conf will help.
1. Please make sure the "Prefix" is configured correctly.
2. Please make sure the datetime is configured correctly.

ryandonn85
New Member

Thanks for your reply.

What should the prefix be?

Here is the stanza from inputs.conf:

[aws_s3://svc-aws-splunk-app_20160406092105]
aws_account = myawsaccount
bucket_name = mybucketname
character_set = auto
ct_blacklist = ^$
host_name = s3.amazonaws.com
index = aws
initial_scan_datetime = 1971-01-01T00:00:00Z
max_items = 100000
max_retries = 3
polling_interval = 60
recursion_depth = -1
sourcetype = aws:cloudfront:accesslogs


phadnett_splunk
Splunk Employee

I think kchen is referring to the "S3 key prefix", which is the key_name parameter in the S3 input. Looking at your inputs.conf, it does not appear you have this configured:

http://docs.splunk.com/Documentation/AddOns/latest/AWS/S3
"The key prefix or full key name to scan for available files. The add-on searches all the objects in all directories under this key name. Leave blank to scan all files in the bucket."
