Splunk Search
Highlighted

How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

Explorer

Hi everyone,

I have a monitored file that is appended to by a cron job. Sometimes splunk checks the file in the middle of >> appended output , and cuts an event before it's ready. This results in a split event.

In order to get around this I'm trying to get a regex expression that will only "fire" an event break when i know the last line of the multi-line event has been written to the file.

Here's an example with 2 raw entries:

Start: Mon, 11 Apr 2016 13:26:01 -0400
Starting up poller script.  
Database connection established.
Garbage line found, likely the header of the csv.  I'm not going to insert.
Affected Rows: 5 
Parsed total lines: 1009
End: Mon, 11 Apr 2016 13:26:02 -0400
Start: Mon, 11 Apr 2016 13:31:01 -0400
Starting up poller script.  
Database connection established.
Garbage line found, likely the header of the csv.  I'm not going to insert.
Affected Rows: 5 
Parsed total lines: 1009
End: Mon, 11 Apr 2016 13:31:01 -0400

If i take that data and use the regex of /(?s)Start:.*?End:/gm at regex101 , i get what I want. I am unable to effect an ungreedy multiline search in the splunk event-break regex pattern.

No, i won't take answers like " output your cronjob to a temp location, then append it all at once to the file you are monitoring".

0 Karma
Highlighted

Re: How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

SplunkTrust
SplunkTrust

Hi, not sure if this is what you are trying to achieve but here we go.

| stats count
| eval _raw = "
Start: Mon, 11 Apr 2016 13:26:01 -0400
Starting up poller script.  
Database connection established.
Garbage line found, likely the header of the csv.  I'm not going to insert.
Affected Rows: 5 
Parsed total lines: 1009
End: Mon, 11 Apr 2016 13:26:02 -0400
Start: Mon, 11 Apr 2016 13:31:01 -0400
Starting up poller script.  
Database connection established.
Garbage line found, likely the header of the csv.  I'm not going to insert.
Affected Rows: 5 
Parsed total lines: 1009
End: Mon, 11 Apr 2016 13:31:01 -0400
"
| rex max_match=0 "(?msi)Start:(?<my_event>.*?)End:"
| table my_event

Output:

my_event
--------------------------------------------------------------------------------------------------------
Mon, 11 Apr 2016 13:26:01 -0400 Starting up poller script. Database connection established. Garbage line found, likely the header of the csv. I'm not going to insert. Affected Rows: 5 Parsed total lines: 1009

Mon, 11 Apr 2016 13:31:01 -0400 Starting up poller script. Database connection established. Garbage line found, likely the header of the csv. I'm not going to insert. Affected Rows: 5 Parsed total lines: 1009 

Let me know if that helps.

Thanks,
J

0 Karma
Highlighted

Re: How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

Explorer

Thanks for getting into this with me J

Your answer was in the context of a search-app search, when providing a raw string. I tried

(?msi)Start:(?.*?)End:

however,

I'm stuck at the data-input section of set source type where you can specify a line of regex in order to direct splunk on where the event-break resides.

I tried your variation of the regular expression there, however, the result is the same. The preview shows a hit for the line that begins "Match", but doesn't stop when it hits the first "End:". I get about 257 lines in that event (each should contain only 7 lines. Presumably this is because it wants to continue searching to the bottom of my growing file , but gets capped by MAX_EVENTS.

I played with setting the greediness mode, thus switching the question mark's function,
(?ms)Start:.?End:
(?Ums)Start:.
End:
but it seems to be greedy either way.

I can visually see what i'm after in regex101 website with my originally posted regex, but it doesn't seem to be processing the same while in splunk's data-source regex field.

For the record of where i am entering the regex...

add data > monitor > source Files and Directory > index once (file.log) > set source type, submenu event-breaks > regex... > field to fill in the regex here and preview on the right.

0 Karma
Highlighted

Re: How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

SplunkTrust
SplunkTrust

Ah see.
In that case, go to the Timestamp submenu instead and do the following:

Extraction: Advanced
Timestamp prefix (without the quotes but including the space after the colon): "Start: "

Leave everything else as default.
See picture below.

Hope that helps

alt text

0 Karma
Highlighted

Re: How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

Explorer

I understand that this is the best way to do it on a static file, but the problem is in the context of a monitored file.

When cron-jobbed output is being appended to the monitored file (the cron can take 1, 2 or up to 10 seconds to finish dumping those 7 lines of output), splunk can "look" at the file mid-way through the event output.

If it determines a match at that perfect time, then it will cut my event short.

Doing a regex match for the total output of the event (to ensure the beginning and end has been finalized and written in total), i wanted to use the event break regex to prevent splunk from detrmining an event has occured by determining that no event has occured until the regex comes back "true".

That is why i am trying to tell it to only consider an event break occured when it sees "Start:" at the top and "End:" at the bottom.

Here's an example of how splunk chops up my event using the normal timestamp method:

I can't seem to post pics yet, here's the listed output of indexed events: See how event was broken up for 2) and 3) ? My script had not yet fully output all there was to output before splunk picked up the "new event". When it next found a match for an event break, it did so (nicely), but it lumped the chunk of ascii between the last match and this match into an event between 2) and 4). Ergo event 2) and event 3) were split. Event 5 and 6 are perfect because I lucked out as to when splunk monitored the file and discovered the new (full) events.

2   4/13/16
12:51:01.000 PM 
Start: Wed, 13 Apr 2016 12:51:01 -0400
Starting up poller script.  
Database connection established.
Garbage line found, likely the header of the csv.  I'm not going to insert.
host = server1 source = /data/poller/poller_v1.log sourcetype = custom_source_type_poller
3   4/13/16
12:46:01.000 PM 
Affected Rows: 5 
Parsed total lines: 1009
End: Wed, 13 Apr 2016 12:46:01 -0400
host = server1 source = /data/poller/poller_v1.log sourcetype = custom_source_type_poller
4   4/13/16
12:46:01.000 PM 
Start: Wed, 13 Apr 2016 12:46:01 -0400
Starting up poller script.  
Database connection established.
Garbage line found, likely the header of the csv.  I'm not going to insert.
host = server1 source = /data/poller/poller_v1.log sourcetype = custom_source_type_poller
5   4/13/16
12:41:01.000 PM 
Start: Wed, 13 Apr 2016 12:41:01 -0400
Starting up poller script.  
Database connection established.
Garbage line found, likely the header of the csv.  I'm not going to insert.
Affected Rows: 5 
Parsed total lines: 1009
End: Wed, 13 Apr 2016 12:41:02 -0400
host = server1 source = /data/poller/poller_v1.log sourcetype = custom_source_type_poller
6   4/13/16
12:36:01.000 PM 
Start: Wed, 13 Apr 2016 12:36:01 -0400
Starting up poller script.  
Database connection established.
Garbage line found, likely the header of the csv.  I'm not going to insert.
Affected Rows: 5 
Parsed total lines: 1009
End: Wed, 13 Apr 2016 12:36:02 -0400
host = server1 source = /data/poller/poller_v1.log sourcetype = custom_source_type_poller
0 Karma
Highlighted

Re: How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

Explorer

Found some other attributes I can play with to hopefully solve the mystery...

http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Indexmulti-lineevents

MUSTBREAKAFTER = * When set and the regular expression matches the current line, Splunk Enterprise always creates a new event for the next input line.
Splunk Enterprise might still break before the current line if another rule matches.
empty string
MUSTNOTBREAKAFTER = * When set and the current line matches the regular expression, Splunk Enterprise does not break on any subsequent lines until the MUSTBREAKAFTER expression matches. empty string
MUST
NOTBREAKBEFORE = * When set and the current line matches the regular expression, Splunk Enterprise does not break the last event before the current line. empty string

0 Karma
Highlighted

Re: How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

SplunkTrust
SplunkTrust

That's good.
Please post your answer here if you find it in order to help other users.

If you are still having problems let us know.

Thanks,
J

0 Karma
Highlighted

Re: How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

Explorer

Yup, i've been trying various combinations of data source settings, but with no success. The difficulty in playing with the settings is that what may appear to work when getting a preview with the Add Data wizard, won't work in the wild when splunk intrudes midway through a cron jobby's output.

There has to be a zillion people with my problem. I will continue to try new things and look for other articles related.

I stumbled upon another article that died without resolution as well. I tried his method of using "timebeforeclose" attribute of 5, but still no luck.

https://answers.splunk.com/answers/188776/events-are-not-properly-split.html

0 Karma
Highlighted

Re: How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

SplunkTrust
SplunkTrust

Try something like this

inputs.conf

[monitor://.......]
...other settings...
sourcetype=yoursourcetype
time_before_close= 15                  ***This is time in seconds

props.conf

[yoursourcetype]
SHOULD_LINEMERGE=false
LINE_BREAKER = ([\r\n]+)(?=Start: \w+, \d+)
TIME_PREFIX = ^Start:\s
TIME_FORMAT=%a, %d %b %Y %H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD  = 31
0 Karma
Highlighted

Re: How can I set a custom linemerge sourcetype event break regex to ensure that an entire event is ready to be captured

Explorer

No dice. My events are still getting split.

props.conf

root@server1:/opt/splunk/etc/apps/search/local# more props.conf 
[custom_source_type_poller_v1]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=Start: \w+, \d+)
TIME_PREFIX = ^Start:\s
TIME_FORMAT = %a, %d %b %Y %H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 31
NO_BINARY_CHECK = true
category = Custom
pulldown_type = 1
disabled = false

input.conf

root@server1:/opt/splunk/etc/apps/search/local# more inputs.conf 

[monitor:///data/poller/poller_v1.log]
disabled = false
index = poller_index
sourcetype = custom_source_type_poller_v1
time_before_close = 15
0 Karma