Multi-line events breaking at 257 lines despite MAX_EVENTS=40000

dbourke
Engager

I have some files I'm trying to parse into Splunk, and I'm having trouble getting large multi-line events to work properly.
The file format looks like this:

15-05-07 01:03:24.481936 
url=http://something something
content="""lots
and lots
of multiline
content
in completely
random
formats
"""
--- END PASTE RECORD ---

This works fine for most events, but some (long) events get split up into 257-line chunks and everything goes to hell.
The setup I'm using is universal forwarder -> indexers -> search head.

On the forwarder, there's a props.conf in etc/system/local, with this in it:

[source::///opt/path/*.log]
TRUNCATE = 0
MAX_EVENTS = 40000
LINE_BREAKER = (--- END PASTE RECORD ---)
EXTRACT-paste_content = content="""(?<paste_content>.*)[\n\r]"""[\n\r]

On the indexers, I have a stanza in props.conf (in a deployment app) like this:

[pastedata]
TRUNCATE = 0
MAX_EVENTS = 40000
LINE_BREAKER = (--- END PASTE RECORD ---)
EXTRACT-paste_content = content="""(?<paste_content>.*)[\n\r]"""[\n\r]

What am I missing? When events are short, everything works fine, but a long event can break in such a way that it gets turned into hundreds of individual events (if, for instance, the event data has multiple lines which start with timestamps). This is weird, and it sometimes produces events timestamped in the future.

(Side note: did you know that if you're running a real-time all-time search in Splunk on a data source that is not currently being populated, and you get to a timestamp that already existed in the data, it shows up as if it were an event that just happened?)

My specific questions are:
1. Why are my events being broken up early?
2. When my events are broken up, why do they sometimes get broken into chunks that don't match the line breaker settings?

I am more concerned about question 1, because if that stops happening the other one will stop too.

Thanks, and let me know if you need anything else.

(Edit: the regexes are actually fine, but the lt/gt characters aren't displaying properly here. I do not actually have HTML escapes in my regexes.)


dbourke
Engager

Of course, once I posted the question, I managed to make it work.

The answer is:
If you're using LINE_BREAKER and nothing else, you need to set SHOULD_LINEMERGE = false.

Also, make sure that your deployment app is actually deploying its configuration before you restart your indexers, but that is an entirely separate issue.
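
For reference, here is a minimal sketch of the indexer-side stanza with that setting added, reusing the [pastedata] stanza from the question. The comments reflect my reading of the documented props.conf defaults (SHOULD_LINEMERGE = true, BREAK_ONLY_BEFORE_DATE = true, MAX_EVENTS = 256), not anything verified on this particular deployment:

```
[pastedata]
# LINE_BREAKER alone decides where one event ends and the next begins.
# The text matched by the first capture group is dropped from the events.
LINE_BREAKER = (--- END PASTE RECORD ---)
# Defaults to true. When left on, Splunk runs a second line-merging pass
# that, by default, starts a new event at any line that looks like it
# begins with a timestamp.
SHOULD_LINEMERGE = false
# Do not truncate long events.
TRUNCATE = 0
# MAX_EVENTS is a line-merging setting, so it should have no effect once
# SHOULD_LINEMERGE is false; it is kept here only to mirror the question.
MAX_EVENTS = 40000
EXTRACT-paste_content = content="""(?<paste_content>.*)[\n\r]"""[\n\r]
```

With line merging left on, both symptoms from the question would be plausible: events splitting at content lines that start with timestamps, and 257-line chunks if the default MAX_EVENTS of 256 was still in effect (for example, because the deployment app had not yet delivered the stanza).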
