Multi-line events breaking at 257 lines despite MAX_EVENTS=40000

I have some files I'm trying to index into Splunk, and I'm having trouble getting large multi-line events to break correctly.
The file format looks like this:

15-05-07 01:03:24.481936 
url=http://something something
content="""lots
and lots
of multiline
content
in completely
random
formats
"""
--- END PASTE RECORD ---

This works fine for most events, but some (long) events get split up into 257-line chunks and everything goes to hell.
The setup I'm using is universal forwarder -> indexers -> search head.

On the forwarder, there's a props.conf in etc/system/local, with this in it:

[source::///opt/path/*.log]
TRUNCATE = 0
MAX_EVENTS = 40000
LINE_BREAKER = (--- END PASTE RECORD ---)
EXTRACT-paste_content = content="""(?<paste_content>.*)[\n\r]"""[\n\r]

On the indexers, I have a stanza in props.conf (in a deployment app) like this:

[pastedata]
TRUNCATE = 0
MAX_EVENTS = 40000
LINE_BREAKER = (--- END PASTE RECORD ---)
EXTRACT-paste_content = content="""(?<paste_content>.*)[\n\r]"""[\n\r]

What am I missing? When events are short, everything works fine, but any long event can break in such a way that it gets turned into hundreds of individual events (if, for instance, the event data has multiple lines that start with timestamps). This is weird, and sometimes results in events timestamped in the future.

(Side note: did you know that if you're running a real-time, all-time search in Splunk on a data source that is not currently being populated, and an event arrives with a timestamp that already existed in the data, it shows up as if it had just happened?)

My specific questions are:
1. Why are my events being broken up early?
2. When my events are broken up, why do they sometimes get broken into chunks that don't match the line breaker settings?

I am more concerned about question 1, because if that stops happening the other one will stop too.

Thanks, and let me know if you need anything else.

(edit: the regexes are actually fine, but the lt/gt characters aren't displaying properly here. I do not actually have html escapes in my regexes at this time)

Re: Multi-line events breaking at 257 lines despite MAX_EVENTS=40000

Of course, once I posted the question, I managed to make it work.

The answer is:
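If you're using LINE_BREAKER and nothing else, you need to set SHOULD_LINEMERGE = false. SHOULD_LINEMERGE defaults to true, so after LINE_BREAKER splits the data, Splunk still applies its own line-merging rules (including breaking before lines that look like timestamps, and the MAX_EVENTS line limit, which defaults to 256 if your setting isn't being picked up).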

Also, make sure that your deployment app is actually deploying things before you restart your indexers, but that is an entirely separate issue.
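
For anyone landing here later, this is a minimal sketch of what the indexer-side stanza might look like with the fix applied. It assumes the [pastedata] sourcetype and the record delimiter from the question; the comments are my own reading of the settings, not a tested configuration, so verify it against your own data before relying on it.

```
# props.conf on the indexers (stanza name taken from the question)
[pastedata]
# Turn off line merging so LINE_BREAKER alone decides event boundaries.
SHOULD_LINEMERGE = false
# The text matched by the first capture group is discarded and marks
# the boundary between two events.
LINE_BREAKER = (--- END PASTE RECORD ---)
# 0 disables truncation of long events.
TRUNCATE = 0
```

MAX_EVENTS is left out here on purpose: it only limits how many lines get merged into one event, so it should have no effect once SHOULD_LINEMERGE is false. The EXTRACT-paste_content line from the question is a search-time extraction and is unrelated to the event-breaking problem, so it is also omitted from the sketch.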
