All Apps and Add-ons

Splunk App for AWS: Why is Splunk not line breaking at the beginning of log files from s3?

nemski
Explorer

I have the Splunk AWS app ingesting ELB access log files from an S3 bucket. These are very simple log files with only one entry per line, so I have the following in props.conf:

EXTRACT-elb_access_log = (?P<timestamp>[\S]+) (?P<elb>[\S]+) (?P<client>[\S]+) (?P<backend>[\S]+) (?P<request_processing_time>[\S]+) (?P<backend_processing_time>[\S]+) (?P<response_processing_time>[\S]+) (?P<elb_status_code>[\S]+) (?P<backend_status_code>[\S]+) (?P<received_bytes>[\S]+) (?P<sent_bytes>[\S]+) "(?P<http_method>[\S]+) (?P<request_url>[\S]+) (?P<http_version>[\S]+)" "(?P<user_agent>[^"]+)" (?P<ssl_cipher>[\S]+) (?P<ssl_protocol>[\S]+)
SHOULD_LINEMERGE = false
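As a side note, the extraction regex above can be sanity-checked offline before touching props.conf. Here is a minimal Python sketch; the sample line is a hypothetical one in the classic ELB access-log layout, not taken from my actual data:

```python
import re

# The EXTRACT regex from props.conf, reproduced verbatim (split across raw strings).
ELB_RE = re.compile(
    r'(?P<timestamp>[\S]+) (?P<elb>[\S]+) (?P<client>[\S]+) (?P<backend>[\S]+) '
    r'(?P<request_processing_time>[\S]+) (?P<backend_processing_time>[\S]+) '
    r'(?P<response_processing_time>[\S]+) (?P<elb_status_code>[\S]+) '
    r'(?P<backend_status_code>[\S]+) (?P<received_bytes>[\S]+) (?P<sent_bytes>[\S]+) '
    r'"(?P<http_method>[\S]+) (?P<request_url>[\S]+) (?P<http_version>[\S]+)" '
    r'"(?P<user_agent>[^"]+)" (?P<ssl_cipher>[\S]+) (?P<ssl_protocol>[\S]+)'
)

# Hypothetical sample event for testing -- values are made up.
sample = ('2016-06-29T03:47:33.686058Z my-elb 192.168.131.39:2817 10.0.0.1:80 '
          '0.000073 0.001048 0.000057 200 200 0 29 '
          '"GET http://example.com:80/ HTTP/1.1" "Mozilla/5.0" '
          'ECDHE-RSA-AES128-SHA TLSv1.2')

m = ELB_RE.match(sample)
print(m.group('elb_status_code'))  # 200
print(m.group('request_url'))      # http://example.com:80/
```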

However, at the beginning of each file, Splunk is merging lines, so the first 1025 lines are treated as a single entry.

I've tried adding persistentQueueSize = 0 to inputs.conf, which the release notes list as a workaround for a similar issue. I've also tried making LINE_BREAKER more specific:

LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z

But it still merges just the beginning of the file.
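For reference, here is roughly how those attempted settings sit together in a single sourcetype stanza. This is a sketch only; the stanza name is assumed from the aws:elb:accesslog sourcetype that appears in the warnings further down, so adjust it for your deployment:

```ini
# props.conf -- sketch; stanza name assumed from the sourcetype in splunkd.log
[aws:elb:accesslog]
SHOULD_LINEMERGE = false
# Break before an ISO-8601 timestamp like 2016-06-29T03:47:33.686058Z
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}Z
```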

1 Solution

nemski
Explorer

Answering my own question here: the issue was that Splunk was not recognising the timestamp format of the log entries. I found these warnings in Splunk's internal log:

06-29-2016 03:47:33.686 +0000 WARN  AggregatorMiningProcessor - Breaking event because limit of 1024 has been exceeded - data_source="s3://logs/access_logs/AWSLogs/...", data_host="...", data_sourcetype="aws:elb:accesslog"
06-29-2016 03:47:33.686 +0000 WARN  AggregatorMiningProcessor - Changing breaking behavior for event stream because MAX_EVENTS (1024) was exceeded without a single event break. Will set BREAK_ONLY_BEFORE_DATE to False, and unset any MUST_NOT_BREAK_BEFORE or MUST_NOT_BREAK_AFTER rules. Typically this will amount to treating this data as single-line only. - data_source="...", data_host="...", data_sourcetype="aws:elb:accesslog"

I fixed this by setting the following in props.conf:

BREAK_ONLY_BEFORE_DATE = false
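For anyone hitting the same thing, here is a sketch of the full stanza (stanza name assumed from the aws:elb:accesslog sourcetype in the warnings above). Since the root cause is failed timestamp recognition, another option is to tell Splunk the ELB timestamp format explicitly; the TIME_PREFIX/TIME_FORMAT lines below are my assumption about the right directives for a timestamp like 2016-06-29T03:47:33.686058Z, so verify them against your data:

```ini
# props.conf -- sketch; stanza name assumed from the sourcetype in splunkd.log
[aws:elb:accesslog]
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE_DATE = false
# Alternative root-cause fix (assumed, not from the original post):
# declare the timestamp format so date recognition no longer fails.
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6NZ
MAX_TIMESTAMP_LOOKAHEAD = 28
```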
