Getting Data In

Why is BREAK_ONLY_BEFORE not working as expected for all my events?

New Member

I am using the following configuration in props.conf. It splits most of the events correctly, but 2 or 3 events are collapsed together. Do I need to set SHOULD_LINEMERGE = false?

[my_source_type]
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ^\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?)
MAX_TIMESTAMP_LOOKAHEAD = 100
TIME_FORMAT = %Y%m%d%H%M%S%6N
TIME_PREFIX = ^(?:[^,\n]*,){7}
disabled = false
pulldown_type = true

This is a .dat file with more than 8000 events in a single file.

Sample data

Actual events

07986376244,Mrs,xxxx,40369036,29.06.2016,14:00,21:00,20160628070106529271,/ablive/data/xx/serial/yy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,38,4c7ca670-eddf-4362-8f4b-20ea99007a0b,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1b5a,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1bca,2016-06-28T07:02:23.224Z,2016-06-28T07:02:26.890Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS

07941158158,Mr,yyyyy,40360516,29.06.2016,14:00,21:00,20160628070106516893,/ablive/data/xx/serial/yy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,36,4a140e0f-69e4-44d3-a5ce-dfb186c9a081,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-19c6,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1a2f,2016-06-28T07:02:17.050Z,2016-06-28T07:02:19.816Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
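
The TIME_PREFIX and TIME_FORMAT settings above can be sanity-checked against these sample events outside Splunk. The following is only a rough Python sketch, not Splunk's own timestamp processor: it assumes Python's `re` treats the TIME_PREFIX pattern the same way Splunk's regex engine does, and it uses `%f` as a stand-in for `%6N`. The function name `extract_timestamp` and the `lookahead` parameter are hypothetical names made up for this illustration.

```python
import re
from datetime import datetime

# TIME_PREFIX from the props.conf above: skip the first seven comma-separated fields.
TIME_PREFIX = re.compile(r"^(?:[^,\n]*,){7}")
# Assumed Python stand-in for TIME_FORMAT = %Y%m%d%H%M%S%6N
# (%f reads up to six subsecond digits).
PY_TIME_FORMAT = "%Y%m%d%H%M%S%f"


def extract_timestamp(event, lookahead=100):
    """Return the datetime found right after TIME_PREFIX, or None."""
    prefix = TIME_PREFIX.match(event)
    if prefix is None:
        return None
    # Only inspect `lookahead` characters past the prefix,
    # roughly what MAX_TIMESTAMP_LOOKAHEAD does.
    window = event[prefix.end():prefix.end() + lookahead]
    digits = re.match(r"\d{20}", window)
    if digits is None:
        return None
    return datetime.strptime(digits.group(0), PY_TIME_FORMAT)


# First eight fields of the first sample event (the remaining fields are left out).
sample = "07986376244,Mrs,xxxx,40369036,29.06.2016,14:00,21:00,20160628070106529271,"
print(extract_timestamp(sample))
```

The timestamp field is exactly 20 characters long, so a lookahead of 20 is the smallest window in which this sketch still finds it.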

Indexed events

ELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,38,4c7ca670-eddf-4362-8f4b-20ea99007a0b,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1b5a,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1bca,2016-06-28T07:02:23.224Z,2016-06-28T07:02:26.890Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
xx/serial/JL/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,36,4a140e0f-69e4-44d3-a5ce-dfb186c9a081,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-19c6,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1a2f,2016-06-28T07:02:17.050Z,2016-06-28T07:02:19.816Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
07986356244,Mrs,Mason,40369036,29.06.2016,14:00,21:00,20160628070106529271,/ablive/data/xx/serial/yy/DISTRIBUTION/D
07941156158,Mr,Hurley,40360516,29.06.2016,14:00,21:00,20160628070106516893,/ablive/data/

Thanks in advance


SplunkTrust

Give this a try

[my_source_type]
NO_BINARY_CHECK = true
LINE_BREAKER = ([\r\n]+)(\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?))
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %Y%m%d%H%M%S%6N
TIME_PREFIX = ^(?:[^,\n]*,){7}
SHOULD_LINEMERGE = false
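
To see what this LINE_BREAKER pattern does to a chunk of raw text, here is a small Python sketch. It is an approximation under stated assumptions, not Splunk's implementation: Python's `re` stands in for Splunk's regex engine, and the helper name `split_events` is hypothetical. It follows the documented LINE_BREAKER rule that the text matched by the first capture group is discarded, the previous event ends before it, and the next event starts after it.

```python
import re

# LINE_BREAKER from the stanza above. Group 1 is the run of newlines that gets
# thrown away; the rest of the pattern only confirms that a new record follows.
LINE_BREAKER = re.compile(
    r"([\r\n]+)(\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?))"
)


def split_events(raw):
    """Split raw text into events at every LINE_BREAKER match."""
    events = []
    start = 0
    pos = 0
    while True:
        match = LINE_BREAKER.search(raw, pos)
        if match is None:
            break
        # The previous event ends where group 1 begins.
        events.append(raw[start:match.start(1)])
        # The next event begins right after group 1.
        start = match.end(1)
        pos = match.end(1)
    events.append(raw[start:])
    return events


# Shortened, made-up records in the same shape as the sample data.
raw = "1,a,b,c\nnot a new record\n2,d,e,f"
events = split_events(raw)
for event in events:
    print(repr(event))
```

A newline that is not followed by digits and a comma does not break the event, so the middle line stays attached to the first record.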

SplunkTrust

You should specify SHOULD_LINEMERGE = true if you want to use BREAK_ONLY_BEFORE and the related settings. It's not strictly required, though, since true is the default.

Sorry for so many versions of this answer... I get confused on this one all the time 😉

Here's the relevant section of the props.conf documentation:

http://docs.splunk.com/Documentation/Splunk/6.4.1/Admin/Propsconf#Line_breaking

See if this works:

BREAK_ONLY_BEFORE = \d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?)

Note about BREAK_ONLY_BEFORE:
* When set, Splunk creates a new event only if it encounters a new line that matches the regular expression.
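
As a rough illustration of that rule, here is a minimal Python sketch of line merging. It is not how Splunk is implemented: Python's `re` is assumed to behave like Splunk's regex engine for this pattern, the helper name `merge_lines` is hypothetical, and the pattern used is the anchored one from the original question.

```python
import re

# The anchored pattern from the question's props.conf.
BREAK_ONLY_BEFORE = re.compile(
    r"^\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?)"
)


def merge_lines(lines):
    """Start a new event only at a line that matches BREAK_ONLY_BEFORE."""
    events = []
    for line in lines:
        if not events or BREAK_ONLY_BEFORE.search(line):
            # A matching line (or the very first line) opens a new event.
            events.append(line)
        else:
            # Any other line is merged into the current event.
            events[-1] += "\n" + line
    return events


# Shortened, made-up records in the same shape as the sample data.
lines = ["1,a,b,c", "tail of the previous record", "2,d,e,f"]
merged = merge_lines(lines)
for event in merged:
    print(repr(event))
```

Any line that does not start with up to 11 digits followed by a comma is glued onto the event before it, which is how several lines can end up in one event.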

I like the idea of using INDEXED_EXTRACTIONS = CSV instead.


Builder

Just a note: it's recommended to avoid SHOULD_LINEMERGE = true if you can help it. Setting it to false and breaking events with LINE_BREAKER alone usually gives a significant performance gain, because Splunk can then skip the line-merging stage of the data pipeline.


Legend

This appears to be a CSV file. Have you tried INDEXED_EXTRACTIONS?

http://docs.splunk.com/Documentation/Splunk/6.4.1/Data/Extractfieldsfromfileswithstructureddata


New Member

I tried INDEXED_EXTRACTIONS = csv. All the events get merged into one single event.


Builder

Can you let us know exactly what your props.conf looks like for this sourcetype now?
