I am indexing a .dat file that contains more than 5000 events.
Somewhere in the middle of the file, one or two events are broken incorrectly.
This is the config I used:
props.conf
NO_BINARY_CHECK = true
BREAK_ONLY_BEFORE = ^\d{1,11}\s?,(([^\,]+)?\,?.?),(([^\,]+)?\,?.?)
MAX_TIMESTAMP_LOOKAHEAD = 100
TIME_FORMAT = %Y%m%d%H%M%S%6N
TIME_PREFIX = ^(?:[^,\n]*,){7}
disabled = false
pulldown_type = true
inputs.conf
[monitor:///xxxx]
disabled = false
whitelist=*.dat
time_before_close = 120
multiline_event_extra_waittime = true
index = xxxx
sourcetype = yyyy
Actual Events
00000000000,,xxxx,40673673,19.08.2016,14:00,21:00,20160818070100184759,/ablive/data/yyyy/serial/yyyy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./xxxx201608180700060000.csv,xxxx201608180700060000.csv,26,c2038af5-5b95-4532-bfa2-e2fa54d8a29e,22a301ea-26-a666-5e1b87780-ac168f26_57b54f17_2dc00d6-11b7,22a301ea-26-a666-5e1b87780-ac168f26_57b54f17_2dc00d6-1232,2016-08-18T07:01:50.679Z,2016-08-18T07:01:52.994Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
00000000000,,xxxx,40667760,19.08.2016,17:00,21:00,20160818070100167747,/ablive/data/yyyy/serial/yyyy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./xxxx201608180700060000.csv,xxxx201608180700060000.csv,24,854f6e61-bf00-4914-9799-c539eb30be81,22a301ea-26-a666-5e1b87780-ac168f26_57b54f17_2dc00d6-1023,22a301ea-26-a666-5e1b87780-ac168f26_57b54f17_2dc00d6-1066,2016-08-18T07:01:46.089Z,2016-08-18T07:01:49.160Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
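As a quick sanity check outside Splunk, the BREAK_ONLY_BEFORE, TIME_PREFIX and TIME_FORMAT settings above can be tried against the two sample events. This is a minimal Python sketch. It assumes that Python's `re` module behaves like Splunk's PCRE for these particular patterns, which use only basic features. It also assumes that `%6N` corresponds to Python's `%f` (six digits of sub-second precision). It does not reproduce Splunk's actual parsing pipeline.

```python
import re
from datetime import datetime

# Patterns copied from the props.conf above.
BREAK_ONLY_BEFORE = re.compile(r"^\d{1,11}\s?,(([^\,]+)?\,?.?),(([^\,]+)?\,?.?)")
TIME_PREFIX = re.compile(r"^(?:[^,\n]*,){7}")

# The two sample events, truncated after the timestamp field for brevity.
SAMPLES = [
    "00000000000,,xxxx,40673673,19.08.2016,14:00,21:00,20160818070100184759,",
    "00000000000,,xxxx,40667760,19.08.2016,17:00,21:00,20160818070100167747,",
]

def parse_timestamp(line):
    """Return the timestamp found after TIME_PREFIX, or None if the prefix is missing."""
    m = TIME_PREFIX.match(line)
    if not m:
        return None
    # %Y%m%d%H%M%S%6N = 14 digits of date/time plus 6 sub-second digits.
    raw = line[m.end():m.end() + 20]
    return datetime.strptime(raw, "%Y%m%d%H%M%S%f")

for line in SAMPLES:
    starts_event = bool(BREAK_ONLY_BEFORE.match(line))
    print(starts_event, parse_timestamp(line))
```

Both samples match the break pattern, and both timestamps parse. This suggests that the regex and timestamp settings are not what splits these particular events.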
Indexed Events
e,22a301ea-26-a666-5e1b87780-ac168f26_57b54f17_2dc00d6-11b7,22a301ea-26-a666-5e1b87780-ac168f26_57b54f17_2dc00d6-1232,2016-08-18T07:01:50.679Z,2016-08-18T07:01:52.994Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
60,19.08.2016,17:00,21:00,20160818070100167747,/ablive/data/yyyy/serial/yyyy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./xxxx201608180700060000.csv,xxxx201608180700060000.csv,24,854f6e61-bf00-4914-9799-c539eb30be81,22a301ea-26-a666-5e1b87780-ac168f26_57b54f17_2dc00d6-1023,22a301ea-26-a666-5e1b87780-ac168f26_57b54f17_2dc00d6-1066,2016-08-18T07:01:46.089Z,2016-08-18T07:01:49.160Z,44,GB,Scheduled,Success,SUCCESS,SUCCESS
00000000000,,xxxx,40673673,19.08.2016,14:00,21:00,20160818070100184759,/ablive/data/yyyy/serial/yyyy/DISTRIBUTION/DELIVERY/delivery_messages_inbound/pending/./xxxx201608180700060000.csv,xxxx201608180700060000.csv,26,c2038af5-5b95-4532-bfa2-e2fa54d8a29
00000000000,,xxxx,406677
Index times:
indextime            source  count
2016-08-18 07:01:49  xxxx    2162
2016-08-18 07:01:52  xxxx    2
2016-08-18 07:01:53  xxxx    2137
2016-08-18 07:01:56  xxxx    2
2016-08-18 07:01:58  xxxx    1266
The same file was indexed at the times listed above, and the two batches with a count of 2 contain the split events.
I set time_before_close and multiline_event_extra_waittime = true, but one or two events still get split.
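The index-time table can be tallied to show where the small batches fall. This short Python sketch copies the counts from the table. The threshold of 2 used to flag a "fragment" batch is an arbitrary choice for illustration.

```python
# Index-time batches for the same file, copied from the table above.
batches = [
    ("2016-08-18 07:01:49", 2162),
    ("2016-08-18 07:01:52", 2),
    ("2016-08-18 07:01:53", 2137),
    ("2016-08-18 07:01:56", 2),
    ("2016-08-18 07:01:58", 1266),
]

total = sum(count for _, count in batches)
# Hypothetical threshold: treat very small batches as likely fragments.
fragments = [when for when, count in batches if count <= 2]
print(total, fragments)
```

The total of 5569 is consistent with a file of "more than 5000 events". Each small batch sits between two large ones. This fits the idea that the file was read in several passes while it was still being written.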
Thanks in advance.
That actually all looks good. I was going to suggest that an EOF might be causing Splunk to split the event; I've had something similar happen before. A good test would be to take that log file (the one with 5000 events) and upload it directly to your indexer through the GUI with the "Add Data" feature. Configure everything the same and see whether the event still breaks oddly in the middle. I sometimes use this method when it seems like it should be working based on the config. If it works there, then the problem is something else.
I tried indexing the data using the GUI, and there was no issue with line breaking. This line-breaking issue does not happen every day. It happens randomly: one day one event gets split, another day two events, but never more than two. Some days there is no issue at all.
Have you checked splunkd.log for any error messages relating to the LineBreakingProcessor?
Also, if your events always start with 00000000000, why don't you simplify your props.conf setting to the following?
BREAK_ONLY_BEFORE=^00000000000
Events do not always start with 00000000000. They can start with other numbers, such as 07548521430.
OK, so what is the pattern then? 1-11 digits, followed by a comma?
If so, you could still simplify it by using BREAK_ONLY_BEFORE=^\d{1,11},
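To compare the candidate patterns, each one can be run against a few line starts. This Python sketch uses Python's `re` as a stand-in for Splunk's PCRE. The sample lines are hypothetical, built from values mentioned in this thread. The last line is a made-up line that should not start an event.

```python
import re

PATTERNS = {
    "original": re.compile(r"^\d{1,11}\s?,(([^\,]+)?\,?.?),(([^\,]+)?\,?.?)"),
    "zeros_only": re.compile(r"^00000000000"),
    "simplified": re.compile(r"^\d{1,11},"),
}

# Hypothetical line starts built from values in this thread.
LINES = [
    "00000000000,,xxxx,40673673,19.08.2016",
    "07548521430,,xxxx,40667760,19.08.2016",
    "GB,Scheduled,Success,SUCCESS,SUCCESS",  # hypothetical non-event line
]

for name, pattern in PATTERNS.items():
    print(name, [bool(pattern.match(line)) for line in LINES])
```

The simplified pattern accepts the same event starts as the original one here, including 07548521430. The zeros-only pattern misses that second start.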
I suspect your line breaking issues stem from an overly complex RegEx, so I would try to use the simplest expression that matches the beginning of your events.
Did you check splunkd.log for any warning/error messages that may provide a hint as to what may be going on? You may also run into default limits as to total event length and/or maximum number of lines per multi-line event.
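To rule out those limits, you could scan the .dat file for very long lines and for long runs of lines that do not start an event. This is a rough sketch under stated assumptions. It assumes the props.conf defaults TRUNCATE = 10000 (maximum line length in bytes) and MAX_EVENTS = 256 (maximum lines merged into one event). It treats the simplified `^\d{1,11},` pattern as the event-start rule. The script name in the usage comment is hypothetical.

```python
import re
import sys

# Assumed props.conf defaults: TRUNCATE is the maximum line length in bytes,
# MAX_EVENTS is the maximum number of lines merged into one event.
TRUNCATE = 10000
MAX_EVENTS = 256
EVENT_START = re.compile(rb"^\d{1,11},")

def scan(lines):
    """Return (longest line in bytes, most lines that would fall into one event)."""
    longest_line = 0
    lines_in_event = 0
    longest_event = 0
    for line in lines:
        longest_line = max(longest_line, len(line.rstrip(b"\r\n")))
        # A matching line (or the very first line) starts a new event;
        # any other line is counted as a continuation of the current one.
        if EVENT_START.match(line) or lines_in_event == 0:
            lines_in_event = 1
        else:
            lines_in_event += 1
        longest_event = max(longest_event, lines_in_event)
    return longest_line, longest_event

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python scan_dat.py /path/to/file.dat
    with open(sys.argv[1], "rb") as f:
        longest_line, longest_event = scan(f)
    print(f"longest line: {longest_line} bytes (TRUNCATE default {TRUNCATE})")
    print(f"most lines in one event: {longest_event} (MAX_EVENTS default {MAX_EVENTS})")
```

The sample events here are a few hundred bytes each and fit on one line, so neither limit is expected to be hit. The scan just confirms that across the whole file.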
I checked splunkd.log; there are no error or warning events.