Splunk App for AWS: Why is Splunk not line breaking at the beginning of log files from s3?

I have the Splunk App for AWS ingesting ELB access log files from an S3 bucket. These are very simple log files with one entry per line, so I have the following in props.conf:

EXTRACT-ELB access log = (?P<timestamp>[\S]+) (?P<elb>[\S]+) (?P<client>[\S]+) (?P<backend>[\S]+) (?P<request_processing_time>[\S]+) (?P<backend_processing_time>[\S]+) (?P<response_processing_time>[\S]+) (?P<elb_status_code>[\S]+) (?P<backend_status_code>[\S]+) (?P<received_bytes>[\S]+) (?P<sent_bytes>[\S]+) "(?P<http_method>[\S]+) (?P<request_url>[\S]+) (?P<http_version>[\S]+)" "(?P<user_agent>[^"]+)" (?P<ssl_cipher>[\S]+) (?P<ssl_protocol>[\S]+)
SHOULD_LINEMERGE = false

However, at the beginning of each file, Splunk is merging lines, so the first 1025 lines are treated as a single entry.

I've tried adding persistentQueueSize = 0 to inputs.conf, which the release notes list as a workaround for a similar issue. I've also tried making LINE_BREAKER more specific:

LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{6}Z

But it still merges the lines at the beginning of each file.
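
For reference, here is a sketch of how that stricter breaker might sit in a props.conf stanza. The stanza name is an assumption (the sourcetype the app assigns to this data), and the dot before the fractional seconds is escaped so that it matches only a literal period. In LINE_BREAKER, the text matched by the first capture group is discarded as the delimiter, and the rest of the match begins the next event.

```ini
# props.conf (sketch; the stanza name is an assumption)
[aws:elb:accesslog]
# Treat every broken line as its own event
SHOULD_LINEMERGE = false
# Break on newlines that are followed by an ISO 8601 timestamp with microseconds
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z
```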

Answering my own question here: the issue was that Splunk was not recognising the timestamp format of the log entries. I found this from the following warnings:

06-29-2016 03:47:33.686 +0000 WARN  AggregatorMiningProcessor - Breaking event because limit of 1024 has been exceeded - data_source="s3://logs/access_logs/AWSLogs/...", data_host="...", data_sourcetype="aws:elb:accesslog"
06-29-2016 03:47:33.686 +0000 WARN  AggregatorMiningProcessor - Changing breaking behavior for event stream because MAX_EVENTS (1024) was exceeded without a single event break. Will set BREAK_ONLY_BEFORE_DATE to False, and unset any MUST_NOT_BREAK_BEFORE or MUST_NOT_BREAK_AFTER rules. Typically this will amount to treating this data as single-line only. - data_source="...", data_host="...", data_sourcetype="aws:elb:accesslog"

I fixed this by setting the following in props.conf:

BREAK_ONLY_BEFORE_DATE = false
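
Since the underlying problem was timestamp recognition, it may also help to describe the timestamp explicitly. The following is a minimal, untested sketch. It assumes the stanza is named after the sourcetype shown in the warnings (aws:elb:accesslog) and that every entry begins with a UTC timestamp in the shape matched by the LINE_BREAKER pattern in the question. The TIME_PREFIX, TIME_FORMAT, TZ, and MAX_TIMESTAMP_LOOKAHEAD values are assumptions, not settings confirmed in this thread.

```ini
# props.conf (sketch; stanza name taken from data_sourcetype in the warnings)
[aws:elb:accesslog]
# One event per line, no line merging
SHOULD_LINEMERGE = false
# The workaround from this answer
BREAK_ONLY_BEFORE_DATE = false
# Assumption: the timestamp is the first field on each line
TIME_PREFIX = ^
# Assumption: ISO 8601 with six fractional digits and a literal Z
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6NZ
# Assumption: the trailing Z means the timestamps are UTC
TZ = UTC
# Assumption: the timestamp is 27 characters long (illustrative shape: 2016-06-29T03:47:33.686000Z)
MAX_TIMESTAMP_LOOKAHEAD = 27
```

If the explicit format matches the data, timestamp extraction no longer depends on automatic detection, which is where the warnings above suggest things went wrong.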
