Sadly, that doesn't seem to be the cause of my problem. I cleaned all indexes, deleted the data partition entirely, and recreated the instance from scratch. After that, both eventcount and tstats returned 0 events for my indexes, as expected.

When I move my files into the input folder monitored by Splunk, both counts start to go up, but they diverge over time, and after one hour of ingestion I'm back to a huge difference: 142 million events seen by eventcount versus 68 million seen by tstats. I'm the only user of this test instance, so no deletions were made. I also checked the monitoring console's index details for a particular index, and the numbers shown there are consistent with the numbers returned by eventcount for that index.

So there seems to be a discrepancy between my input files and the events retrieved by tstats. For a specific sourcetype, I have 54 input files totaling 68 million events (files full of JSON events, nothing special in them, no custom line breaking or anything). If I index only those files, the server logs show that the TailReader did see the 54 files and counted 68 million events, and "| eventcount index=XXX" returns 68 million events. But "| tstats count where index=XXX by source" returns only 35 source files for 28 million events.

The instance logs contain no errors pointing to anything wrong with my sourcetype, or to any reading or parsing problems. The only error messages come from a component called "STMgr", which reports an unexpected return code of -9 for some buckets. Could that mean that indexing is consuming too much memory, and if those processes are being killed by the OS, could that explain the inconsistent numbers?
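For reference, here is a sketch of the cross-checks I'm running (XXX is a placeholder for my real index name). A raw-event count by source bypasses the tsidx files that tstats reads, so comparing the two per source should show which sources are missing from the lexicon data, and dbinspect summarizes event counts per bucket state:

```
| tstats count where index=XXX by source
| rename count AS tstats_count
| join type=outer source
    [ search index=XXX | stats count AS raw_count by source ]
| eval delta=coalesce(raw_count,0)-coalesce(tstats_count,0)

| dbinspect index=XXX
| stats sum(eventCount) AS events, dc(bucketId) AS buckets by state
```

In my case the first search shows 19 sources with a tstats_count of zero, which matches the 54 minus 35 gap described above.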
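On the -9 return code: my working assumption is that -9 corresponds to signal 9 (SIGKILL), which is what the Linux OOM killer sends. A minimal way to check that assumption on the host (the grep pattern is just an illustration, and dmesg may require root):

```shell
# Signal 9 is SIGKILL -- the signal the kernel OOM killer uses.
kill -l 9   # prints: KILL

# If splunkd helper processes were OOM-killed, the kernel log should say so
# (output depends on the host; empty output means no recorded OOM kills):
dmesg -T 2>/dev/null | grep -iE 'killed process|out of memory' || true
```

If the kernel log shows splunk-optimize or splunkd being killed around the times of the STMgr errors, that would support the memory-pressure theory.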