I am using the following configuration in props.conf. It is splitting most of the events correctly, but two or three events are collapsed. Do I need to include SHOULD_LINEMERGE = false?
[my_source_type]
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ^\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?)
MAX_TIMESTAMP_LOOKAHEAD = 100
TIME_FORMAT = %Y%m%d%H%M%S%6N
TIME_PREFIX = ^(?:[^,\n]*,){7}
disabled = false
pulldown_type = true
This is a .dat file with more than 8,000 events in a single file.
Sample data
Actual events
07986376244,Mrs,xxxx,40369036,29.06.2016,14:00,21:00,20160628070106529271,/ablive/data/xx/serial/yy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,38,4c7ca670-eddf-4362-8f4b-20ea99007a0b,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1b5a,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1bca,2016-06-28T07:02:23.224Z,2016-06-28T07:02:26.890Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
07941158158,Mr,yyyyy,40360516,29.06.2016,14:00,21:00,20160628070106516893,/ablive/data/xx/serial/yy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,36,4a140e0f-69e4-44d3-a5ce-dfb186c9a081,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-19c6,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1a2f,2016-06-28T07:02:17.050Z,2016-06-28T07:02:19.816Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
indexed events
ELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,38,4c7ca670-eddf-4362-8f4b-20ea99007a0b,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1b5a,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1bca,2016-06-28T07:02:23.224Z,2016-06-28T07:02:26.890Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
xx/serial/JL/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./MessageReminderPM201606280700120000.csv,MessageReminderPM201606280700120000.csv,36,4a140e0f-69e4-44d3-a5ce-dfb186c9a081,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-19c6,225b00fe-26-a633-5e21f14e2-ac168f26_5772129e_37501fc-1a2f,2016-06-28T07:02:17.050Z,2016-06-28T07:02:19.816Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
07986356244,Mrs,Mason,40369036,29.06.2016,14:00,21:00,20160628070106529271,/ablive/data/xx/serial/yy/DISTRIBUTION/D
07941156158,Mr,Hurley,40360516,29.06.2016,14:00,21:00,20160628070106516893,/ablive/data/
Thanks in advance
Give this a try:
[my_source_type]
NO_BINARY_CHECK = true
LINE_BREAKER = ([\r\n]+)(\d{1,11}\s?,(([^\,]+)?\,?\.?),(([^\,]+)?\,?\.?))
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %Y%m%d%H%M%S%6N
TIME_PREFIX = ^(?:[^,\n]*,){7}
SHOULD_LINEMERGE = false
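For context on why this works: with SHOULD_LINEMERGE = false, Splunk breaks events wherever LINE_BREAKER matches, and the text captured by the first capturing group is discarded as the delimiter. A simpler variant (a sketch, not tested against your data; it uses a lookahead so each new event just has to start with the leading digits-and-comma field) would be:

```ini
[my_source_type]
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
# First capture group ([\r\n]+) is consumed as the event delimiter;
# the lookahead only checks that the next line starts like a new record
LINE_BREAKER = ([\r\n]+)(?=\d{1,11}\s?,)
MAX_TIMESTAMP_LOOKAHEAD = 20
TIME_FORMAT = %Y%m%d%H%M%S%6N
TIME_PREFIX = ^(?:[^,\n]*,){7}
```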
You need SHOULD_LINEMERGE = true if you want to use BREAK_ONLY_BEFORE and related settings. (It defaults to true, so setting it explicitly isn't strictly required.)
Sorry for so many versions of this answer... I get confused on this one all the time 😉
Here's the section in props.conf:
http://docs.splunk.com/Documentation/Splunk/6.4.1/Admin/Propsconf#Line_breaking
See if this works:
BREAK_ONLY_BEFORE = \d{1,11}\s?,(([^\,]+)?\,?.?),(([^\,]+)?\,?.?)
Note about BREAK_ONLY_BEFORE from the docs:
* When set, Splunk creates a new event only if it encounters a new line that matches the regular expression.
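If you go the line-merging route instead, the stanza would look something like this (a sketch based on the original config; the regex is trimmed to the leading phone-number field, which should be enough to anchor each event):

```ini
[my_source_type]
NO_BINARY_CHECK = true
# BREAK_ONLY_BEFORE only takes effect when line merging is on (true is the default)
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\d{1,11}\s?,
MAX_TIMESTAMP_LOOKAHEAD = 100
TIME_FORMAT = %Y%m%d%H%M%S%6N
TIME_PREFIX = ^(?:[^,\n]*,){7}
```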
I like the idea of using INDEXED_EXTRACTIONS = CSV instead.
Just to note, it's recommended not to use SHOULD_LINEMERGE = true if you can help it. You'll see significant performance gains without that setting, since it lets events skip an entire portion of the parsing pipeline.
This appears to be a CSV file. Have you tried INDEXED_EXTRACTIONS?
http://docs.splunk.com/Documentation/Splunk/6.4.1/Data/Extractfieldsfromfileswithstructureddata
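A starting-point stanza might look like the following (a sketch, not tested against your data; since the sample has no header row, FIELD_NAMES has to name the columns explicitly, and the names here are invented for illustration — substitute your own). TIMESTAMP_FIELDS points at the eighth column, which holds the %Y%m%d%H%M%S%6N timestamp that your TIME_PREFIX was skipping to:

```ini
[my_source_type]
INDEXED_EXTRACTIONS = csv
NO_BINARY_CHECK = true
# No header row in the file, so name all 22 columns explicitly (names are illustrative)
FIELD_NAMES = phone,title,surname,account_id,delivery_date,window_start,window_end,batch_ts,file_path,file_name,seq,msg_guid,correlation_id_1,correlation_id_2,sent_time,ack_time,country_code,country,schedule_type,status,result_1,result_2
TIMESTAMP_FIELDS = batch_ts
TIME_FORMAT = %Y%m%d%H%M%S%6N
```

One caveat: if the file is read by a universal forwarder, this stanza needs to be in props.conf on the forwarder itself, since structured-data parsing happens there rather than on the indexer.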
I tried INDEXED_EXTRACTIONS = csv. All events get merged into one single event.
Can you let us know exactly what your props.conf looks like for this sourcetype now?