I am able to get data without multi-indexing in QA. In PROD, however, I am seeing multi-indexing, which creates duplicate events in the index; some logs are indexed more than 200 times. How can I avoid multi-indexing? Below are my inputs.conf configs:
[monitor:///data/server1/.../LOG/type1_log]
index = ABC
host = host1
sourcetype = type1_log
crcSalt = <SOURCE>

[monitor:///data/server1/.../LOG/type2_log]
index = ABC
host = host1
sourcetype = type2_log
crcSalt = <SOURCE>
NOTE: I am using the same configs in QA, so I am not sure why I am seeing this issue only in PROD.
I am following the steps below to update inputs.conf.
Please let me know where I am making a mistake.
Thanks in advance
You shouldn't need to clear the fishbucket when you update inputs.conf.
When you clear the fishbucket, you delete everything that Splunk 'remembers' about the files it has already read, so the next time Splunk starts it re-reads all of those files.
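As an aside, if you ever do need Splunk to forget a single file rather than the whole fishbucket, btprobe can reset just that one entry. A sketch, assuming a default SPLUNK_HOME layout on the forwarder; the file path is hypothetical and should be replaced with the actual path to your log:

```
# Stop the UF first, then reset the fishbucket entry for one file
$SPLUNK_HOME/bin/splunk cmd btprobe \
    -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db \
    --file /data/server1/app1/LOG/type1_log --reset
```

Restart the UF afterwards; only that file is re-read, not everything under the monitor stanzas.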
Otherwise, are your files being rolled in that directory?
crcSalt = <SOURCE> makes Splunk treat each filename as unique, regardless of the file's contents. So if you have a file called log/file.log which gets rolled to log/file.log.1, the rolled copy is treated as a new file and re-read in full. I don't think this is applicable here, but it is something to double-check.
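If rolled copies do turn out to be the source of the duplicates, one option is to blacklist them so the monitor never picks them up. A minimal sketch, assuming a hypothetical directory monitor (the path, index, and sourcetype here are placeholders, not your config) and rolled files ending in a numeric suffix like .1 or .20181910:

```
[monitor:///var/log/myapp]
index = main
sourcetype = myapp_log
# Skip rotated copies such as file.log.1 or file.log.20181910;
# blacklist is a regex matched against the full file path
blacklist = \.\d+$
```

With the rolled files excluded, you may not need crcSalt at all, which removes one source of re-reads.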
Thanks for responding.
We noticed that without crcSalt we get duplicates in QA, so we kept crcSalt in PROD.
The files are rolled in that directory, from type1_log to type1_log.20181910. PROD is a HAMPC cluster that switches nodes every month or so, and we do not have an environment to test the effects of ingesting from such a cluster. We installed the Splunk UF on the cluster successfully, and we know ingestion is otherwise working because type2_log is ingested normally with no duplication.
We do notice this in the internal logs:
10-18-2018 20:10:06.900 -0700 INFO WatchedFile - Logfile truncated while open, original pathname file='data/server1/.../LOG/type1_log', will begin reading from start.
We are not sure why this INFO message appears, or why the file is read from the start over and over again. We notice events getting indexed hundreds of times.
Thanks for your help!