Getting Data In

Why is BREAK_ONLY_BEFORE not working as expected for all my events?

arunloganathan
New Member

I am using the following configuration in props.conf. It splits most of the events correctly, but 2 or 3 events are collapsed together. Do I need to include SHOULD_LINEMERGE = false?

[my_source_type]
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ^\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?)
MAX_TIMESTAMP_LOOKAHEAD = 100
TIME_FORMAT = %Y%m%d%H%M%S%6N
TIME_PREFIX = ^(?:[^,\n]*,){7}
disabled = false
pulldown_type = true

This is a .dat file, and it has more than 8000 events in a single file.

Sample data

Actual events

07986376244,Mrs,xxxx,40369036,29.06.2016,14:00,21:00,20160628070106529271,/ablive/data/xx/serial/yy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,38,4c7ca670-eddf-4362-8f4b-20ea99007a0b,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1b5a,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1bca,2016-06-28T07:02:23.224Z,2016-06-28T07:02:26.890Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS

07941158158,Mr,yyyyy,40360516,29.06.2016,14:00,21:00,20160628070106516893,/ablive/data/xx/serial/yy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,36,4a140e0f-69e4-44d3-a5ce-dfb186c9a081,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-19c6,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1a2f,2016-06-28T07:02:17.050Z,2016-06-28T07:02:19.816Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
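
One way to sanity-check the two regexes from the stanza above is to run them against the sample events outside Splunk. The Python sketch below is only an approximation: Python's re module stands in for Splunk's regex engine, %f stands in for %6N, the 20-character timestamp width is read off the sample data, and the function name check is made up for illustration. The input lines are truncated copies of the two events above.

```python
import re
from datetime import datetime

# Leading fields of the two sample events (the rest of each line is omitted here).
SAMPLE_EVENTS = [
    "07986376244,Mrs,xxxx,40369036,29.06.2016,14:00,21:00,20160628070106529271,/ablive/data/",
    "07941158158,Mr,yyyyy,40360516,29.06.2016,14:00,21:00,20160628070106516893,/ablive/data/",
]

# Patterns copied from the props.conf stanza in the question.
BREAK_ONLY_BEFORE = re.compile(r"^\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?)")
TIME_PREFIX = re.compile(r"^(?:[^,\n]*,){7}")

TIMESTAMP_WIDTH = 20  # assumption: YYYYmmddHHMMSS plus 6 fractional digits


def check(line):
    """Return (does the line look like an event start?, parsed timestamp)."""
    starts_event = BREAK_ONLY_BEFORE.search(line) is not None
    prefix = TIME_PREFIX.match(line)
    raw = line[prefix.end():prefix.end() + TIMESTAMP_WIDTH]
    # Python's %f (microseconds) is used here in place of Splunk's %6N.
    timestamp = datetime.strptime(raw, "%Y%m%d%H%M%S%f")
    return starts_event, timestamp


for event in SAMPLE_EVENTS:
    print(check(event))
```

For these two lines, at least, the break pattern matches at the start of the line and the text after the seventh comma parses as a timestamp.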

indexed events

ELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,38,4c7ca670-eddf-4362-8f4b-20ea99007a0b,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1b5a,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1bca,2016-06-28T07:02:23.224Z,2016-06-28T07:02:26.890Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
xx/serial/JL/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,36,4a140e0f-69e4-44d3-a5ce-dfb186c9a081,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-19c6,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1a2f,2016-06-28T07:02:17.050Z,2016-06-28T07:02:19.816Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
07986356244,Mrs,Mason,40369036,29.06.2016,14:00,21:00,20160628070106529271,/ablive/data/xx/serial/yy/DISTRIBUTION/D
07941156158,Mr,Hurley,40360516,29.06.2016,14:00,21:00,20160628070106516893,/ablive/data/

Thanks in advance


somesoni2
Revered Legend

Give this a try

 [my_source_type]
 NO_BINARY_CHECK = true
 LINE_BREAKER= ([\r\n]+)(\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?))
 MAX_TIMESTAMP_LOOKAHEAD = 20
 TIME_FORMAT = %Y%m%d%H%M%S%6N
 TIME_PREFIX = ^(?:[^,\n]*,){7}
 SHOULD_LINEMERGE = false
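
For anyone wondering how LINE_BREAKER behaves: per the props.conf documentation, the text matched by the first capturing group is discarded, the previous event ends where that group starts, and the next event begins where it ends. Here is a rough Python emulation of that rule using the regex above. The function name split_events and the shortened input are made up for illustration, and Python's re is only a stand-in for Splunk's regex engine.

```python
import re

# Regex copied from the LINE_BREAKER setting above.
LINE_BREAKER = re.compile(
    r"([\r\n]+)(\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?))"
)


def split_events(raw):
    """Split raw text into events at the first capturing group of each match."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        # The previous event ends where group 1 starts...
        events.append(raw[start:match.start(1)])
        # ...and the next event begins where group 1 ends (group 1 is dropped).
        start = match.end(1)
    events.append(raw[start:])
    return events


# Made-up input: one record with a stray newline inside it, then a second record.
RAW = "07986376244,Mrs,xxxx\n/ablive/data\n07941158158,Mr,yyyyy"

for event in split_events(RAW):
    print(repr(event))
```

In this sketch the stray newline before /ablive/data does not start a new event, because the text after it does not look like the start of a record. Only the newline followed by digits and a comma does.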

jkat54
SplunkTrust

BREAK_ONLY_BEFORE only takes effect when SHOULD_LINEMERGE = true. That is the default, so setting it explicitly is not required.

Sorry for so many versions of this answer... I get confused on this one all the time 😉

Here's the relevant section of the props.conf documentation:

http://docs.splunk.com/Documentation/Splunk/6.4.1/Admin/Propsconf#Line_breaking

See if this works:

BREAK_ONLY_BEFORE = \d{1,11}\s?,(([^\,]+)?\,?.?),(([^\,]+)?\,?.?)

Note about BREAK_ONLY_BEFORE
* When set, Splunk creates a new event only if it encounters a new line that
matches the regular expression
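
To make that note concrete, here is a small Python sketch of the merge rule: text is first cut into lines, and a line starts a new event only if the pattern matches it. It also compares the anchored pattern from the question with the unanchored one suggested above, assuming Splunk searches for the pattern anywhere in the line. The function name merge_lines and the shortened input lines are made up for illustration, and this is not Splunk's actual implementation.

```python
import re

# Anchored pattern from the question, and the unanchored one from this answer.
ANCHORED = re.compile(r"^\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?)")
UNANCHORED = re.compile(r"\d{1,11}\s?,(([^\,]+)?\,?.?),(([^\,]+)?\,?.?)")


def merge_lines(lines, pattern):
    """Start a new event only when a line matches; otherwise append to the last one."""
    events = []
    for line in lines:
        if pattern.search(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


# Made-up input: a record start, a continuation fragment, another record start.
LINES = [
    "07986376244,Mrs,xxxx,40369036",
    "pending/MessageReminderPM201606280700120000.csv,38,4c7ca670,225b00fe",
    "07941158158,Mr,yyyyy,40360516",
]

print(len(merge_lines(LINES, ANCHORED)))
print(len(merge_lines(LINES, UNANCHORED)))
```

With the anchored pattern the fragment is merged into the first event. Without the ^, the "38,4c7ca670,225b00fe" part of the fragment also matches, so the fragment becomes an event of its own. If the file contains line breaks inside records, keeping the ^ anchor is probably the safer choice.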

I like the idea of using INDEXED_EXTRACTIONS = CSV instead.


ryanoconnor
Builder

Just to note, it's recommended not to use SHOULD_LINEMERGE = true if you can avoid it. With SHOULD_LINEMERGE = false, Splunk skips the line-merging step of the parsing pipeline, and you'll usually notice a significant performance gain.


sundareshr
Legend

This appears to be a CSV file. Have you tried INDEXED_EXTRACTIONS?

http://docs.splunk.com/Documentation/Splunk/6.4.1/Data/Extractfieldsfromfileswithstructureddata
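
Before relying on structured extraction, it may help to confirm that every physical line in the file is a complete record. The sketch below uses Python's csv module to flag lines whose field count differs from the 22 comma-separated fields seen in the sample events. The function name find_broken_records is hypothetical, and the test input uses placeholder fields rather than real data.

```python
import csv
import io

EXPECTED_FIELDS = 22  # field count of the sample events in the question


def find_broken_records(text, expected=EXPECTED_FIELDS):
    """Return (line number, field count) for every non-empty line that is off."""
    reader = csv.reader(io.StringIO(text, newline=""))
    broken = []
    for row in reader:
        if row and len(row) != expected:
            broken.append((reader.line_num, len(row)))
    return broken


# Placeholder input: one complete record, then one record cut in two
# in the middle of its ninth field (9 fields on one line, 14 on the next).
complete = ",".join(["x"] * 22)
head = ",".join(["x"] * 9)
tail = ",".join(["x"] * 14)
SAMPLE_TEXT = "\n".join([complete, head, tail]) + "\n"

print(find_broken_records(SAMPLE_TEXT))
```

If a check like this reports short lines in the real file, the records contain embedded line breaks, and no line-breaking regex based on the first few fields alone will be fully reliable.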


arunloganathan
New Member

I tried INDEXED_EXTRACTIONS = csv. All events get merged into one single event.


ryanoconnor
Builder

Can you let us know exactly what your props.conf looks like for this sourcetype now?
