I am able to get data without multi-indexing in QA. When it comes to PROD, I am seeing multi-indexing, which is creating duplicate data in the index. Sometimes logs are indexed more than 200 times. How can I avoid multi-indexing? Below are my inputs.conf configs.
You shouldn't need to clear the fishbucket when you update inputs.conf.
When you clear the fishbucket, you delete everything that Splunk 'remembers' about what files it has already read. So when you next start Splunk, it re-reads all the files again.
Otherwise, are your files being rolled in that directory? crcSalt=&lt;SOURCE&gt; means Splunk treats each filename as unique, regardless of the contents. So if you have a file called log/file.log which gets rolled to log/file.log.1, the rolled copy will get re-read. I don't think this is applicable here, but it's something to double-check.
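For reference, a monitor stanza with this setting might look like the sketch below. The path and sourcetype here are placeholders for illustration only, not the poster's actual configs:

```
[monitor:///path/to/LOG/type1_log*]
sourcetype = type1_log
crcSalt = <SOURCE>
```

Because the source path is mixed into the CRC, a rolled file such as type1_log.1 gets a different fishbucket key than type1_log even though its head bytes are identical, so it is read again from the beginning.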
We noticed that without crcSalt we got duplicates in QA, so we kept crcSalt in PROD.
The files are rolled in that directory from type1_log to type1_log.20181910, in the same directory. PROD is a HAMPC cluster that switches nodes every month or so, and we do not have an environment to test how ingestion behaves on such a cluster. We installed the Splunk UF on the cluster successfully, and we know ingestion is otherwise working because type2_log is ingesting normally with no duplication.
We do notice this in the internal logs:
10-18-2018 20:10:06.900 -0700 INFO WatchedFile - Logfile truncated while open, original pathname file='data/server1/.../LOG/type1_log', will begin reading from start.
We are not sure why this INFO message is appearing, or why the file is being read from the start over and over again. We see individual events indexed hundreds of times.
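That WatchedFile message is consistent with the duplication: when a monitored file is truncated and rewritten in place, the bytes at the start of the file change, so the checksum Splunk keeps in the fishbucket no longer matches and the whole file is re-read from offset 0, re-indexing every event. A simplified sketch of that idea, assuming the default behavior of hashing the first 256 bytes (initCrcLength) and using plain CRC32 as a stand-in for Splunk's internal hash:

```python
import zlib

INIT_CRC_LENGTH = 256  # Splunk's default initCrcLength

def head_crc(data: bytes) -> int:
    """CRC32 of the first INIT_CRC_LENGTH bytes -- a stand-in for the
    fishbucket key Splunk uses to recognize a file it has seen before."""
    return zlib.crc32(data[:INIT_CRC_LENGTH])

# File contents before the application truncates and rewrites the log.
original = b"2018-10-18 20:10:06 event A\n" * 20
crc_before = head_crc(original)

# After truncation the file is rewritten from scratch: the head bytes
# differ, the stored checksum no longer matches, and Splunk starts
# reading from the beginning again -- duplicating everything it kept.
rewritten = b"2018-10-18 21:00:00 event B\n" * 20
crc_after = head_crc(rewritten)

print(crc_before != crc_after)  # the file now looks brand new
```

If the application on the active node truncates type1_log on each rotation or failover, this would explain repeated full re-reads even with an otherwise correct inputs.conf.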