I have the Splunk AWS app ingesting ELB access log files from an S3 bucket. These are very simple log files with only one entry per line so I have the following in props.conf:
EXTRACT-ELB access log = (?P<timestamp>[\S]+) (?P<elb>[\S]+) (?P<client>[\S]+) (?P<backend>[\S]+) (?P<request_processing_time>[\S]+) (?P<backend_processing_time>[\S]+) (?P<response_processing_time>[\S]+) (?P<elb_status_code>[\S]+) (?P<backend_status_code>[\S]+) (?P<received_bytes>[\S]+) (?P<sent_bytes>[\S]+) "(?P<http_method>[\S]+) (?P<request_url>[\S]+) (?P<http_version>[\S]+)" "(?P<user_agent>[^"]+)" (?P<ssl_cipher>[\S]+) (?P<ssl_protocol>[\S]+)
SHOULD_LINEMERGE = false
However, at the beginning of each file, Splunk is merging lines, so the first 1025 lines are treated as a single entry.
I've tried adding persistentQueueSize = 0
to inputs.conf, which is listed in the release notes as a workaround for a similar issue. I've also tried making the LINE_BREAKER more specific:
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z
But it still merges the lines at the beginning of each file.
Answering my own question here: the issue was that Splunk was not recognising the timestamp format of the log entries. I found this from the following warnings in Splunk's own log:
06-29-2016 03:47:33.686 +0000 WARN AggregatorMiningProcessor - Breaking event because limit of 1024 has been exceeded - data_source="s3://logs/access_logs/AWSLogs/...", data_host="...", data_sourcetype="aws:elb:accesslog"
06-29-2016 03:47:33.686 +0000 WARN AggregatorMiningProcessor - Changing breaking behavior for event stream because MAX_EVENTS (1024) was exceeded without a single event break. Will set BREAK_ONLY_BEFORE_DATE to False, and unset any MUST_NOT_BREAK_BEFORE or MUST_NOT_BREAK_AFTER rules. Typically this will amount to treating this data as single-line only. - data_source="...", data_host="...", data_sourcetype="aws:elb:accesslog"
I fixed this by setting the following in props.conf:
BREAK_ONLY_BEFORE_DATE = false
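For anyone hitting the same thing, here is a rough sketch of how the settings above might sit together in one props.conf stanza. The stanza name is taken from the data_sourcetype in the warnings. The TIME_PREFIX, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD and TZ lines were not part of my fix: they are an assumption about how to address the unrecognised timestamp directly, based on the ISO 8601 UTC timestamp with microseconds at the start of each ELB line. Treat them as untested.

```ini
# Sketch only: stanza name taken from data_sourcetype in the warnings above.
[aws:elb:accesslog]
# One event per line, no line merging.
SHOULD_LINEMERGE = false
# The setting that fixed the merging for me.
BREAK_ONLY_BEFORE_DATE = false
# Assumption, not part of the original fix: describe the timestamp explicitly,
# e.g. 2016-06-29T03:47:33.686123Z at the very start of each line.
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6NZ
MAX_TIMESTAMP_LOOKAHEAD = 27
TZ = UTC
```

Note that props.conf comments need to be on their own lines; a # after a value is read as part of the value.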