I discovered this quite by accident while testing with a sample of Tomcat logs.
If I index them without customizing any datetime settings, there are 78,909 events.
If I index them with DATETIME_CONFIG = NONE then there are only 78,403 events.
Any ideas why?
See the following note from the spec file:
Set DATETIME_CONFIG = NONE to prevent the timestamp processor from running. When timestamp processing is off, Splunk Enterprise does not look at the text of the event for the timestamp--it instead uses the event's "time of receipt"; in other words, the time the event is received via its input. For file-based inputs, this means that Splunk Enterprise derives the event timestamp from the modification time of the input file.
The events are being combined, hence the reduced count. With timestamp processing off, Splunk is probably using the file modification time, so certain lines end up merged into a single event. If the data is all multi-line, it could be hard to figure out which events get put together. I would check the props settings for your sourcetype to make sure they are correct for the events themselves.
This seems like an issue with your event breaking. I'm guessing you have not configured any explicit event breaking and that some events in your logs are multiline. Can you post your full props.conf entries (from the indexer/heavy forwarder) for this sourcetype?
For reference, by default Splunk breaks events based on the timestamp (BREAK_ONLY_BEFORE_DATE = true). For better efficiency, you should configure both the event parsing (LINE_BREAKER, BREAK_ONLY_BEFORE, etc.) and the timestamp recognition (TIME_FORMAT, TIME_PREFIX, etc.).
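As a rough illustration of what explicit settings could look like, here is a minimal props.conf sketch. It rests on assumptions that have not been confirmed in this thread: that every event is a single line, that each line starts with a timestamp in the form YYYY-MM-DD HH:MM:SS, and that the sourcetype is called tomcat_access (a made-up name, substitute your own). Adjust it to your actual data.

```ini
# props.conf on the indexer or heavy forwarder.
# [tomcat_access] is a hypothetical sourcetype name; use your own.
[tomcat_access]

# Event breaking: treat every line as its own event and skip line merging,
# so event boundaries no longer depend on a timestamp being detected.
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)

# Timestamp recognition: assumes each line begins with a timestamp
# such as 2015-09-01 06:02:35 (19 characters).
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
```

With line merging turned off, the event count should match the number of lines in the file whether or not timestamps are extracted. If some events really do span several lines, SHOULD_LINEMERGE = false is the wrong choice and LINE_BREAKER would need a pattern that matches only the start of a new event.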
Ok, I do not have anything configured other than DATETIME_CONFIG = NONE.
If you can provide some sample logs, we can help you set up a proper props.conf so that you won't have any count mismatch.
Here are two lines from the log file. One is indexed, but the other is not. (I had to sanitize the identifiable text for security.) Only the time is different, so why was only one indexed?
2015-09-01 06:02:35 10.xx.xx.xx 10.xx.xx.xx POST /uri/Cmd.be - 200 96 0.015 'Jakarta Commons-HttpClient/3.0.1' '$Version=0; FE=1111635210.20480.0000; $Path=/' - someplace.com'
2015-09-01 06:08:50 10.xx.xx.xx 10.xx.xx.xx POST /uri/Cmd.be - 200 96 0.015 'Jakarta Commons-HttpClient/3.0.1' '$Version=0; FE=1111635210.20480.0000; $Path=/' - someplace.com'
One other thing. Tomcat is a widely adopted and used application, so are you trying to tell me that Splunk cannot handle Tomcat logs without custom props and transforms? That seems like a waste.