
Why am I seeing multi-indexing in Splunk? What are the reasons to create multi-indexing?

chandana204
Communicator

Hi folks,

In QA, data is indexed once, with no multi-indexing. In PROD, the same events are indexed multiple times, which creates duplicate data in the index. Some log events are indexed more than 200 times. How can I avoid multi-indexing? My inputs.conf configs are below.

[monitor:///data/server1/.../LOG/type1_log]
index=ABC
host=host1
sourcetype=type1_log
crcSALT=[SOURCE]

[monitor:///data/server1/.../LOG/type2_log]
index=ABC
host=host1
sourcetype=type2_log
crcSALT=[SOURCE]

NOTE: I am using the same configs for QA. I am not sure why I am seeing this issue in PROD.

I am following the below steps to update inputs.conf

  1. Stop UF
  2. Update inputs.conf
  3. Clear Fishbucket
  4. Restart UF

Please let me know where I am making a mistake.

Thanks in advance


sduff_splunk
Splunk Employee

You shouldn't need to clear the fishbucket when you update inputs.conf.

When you clear the fishbucket, you delete everything that Splunk 'remembers' about which files it has already read. The next time Splunk starts, it re-reads all of the files from the beginning.

Otherwise, are your files being rolled in that directory? crcSalt=<SOURCE> adds the full source path to the CRC check, so Splunk treats each filename as unique, regardless of its contents. If you have a file called log/file.log that gets rolled to log/file.log.1, it will be re-read under the new name. I don't think this applies here, but it is worth double-checking.
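
For comparison, here is a rough sketch of the first stanza from the question with the setting spelled the way inputs.conf.spec documents it, crcSalt = <SOURCE>, with literal angle brackets. Splunk's documentation describes setting names as case-sensitive, so it may be worth checking whether crcSALT=[SOURCE] as posted is being recognized at all (the forum may also have altered the brackets). The path, index, host, and sourcetype are copied unchanged from the question. Whether the salt should stay is an assumption to test, not a recommendation.

```
# Sketch only. All values are copied from the question.
[monitor:///data/server1/.../LOG/type1_log]
index = ABC
host = host1
sourcetype = type1_log
# Documented spelling of the setting; the angle brackets are literal.
# Assumption: keep this line only if different files really do start with
# identical content. With the salt in place, a file that is renamed and
# still matched by a monitor stanza is treated as a new file.
crcSalt = <SOURCE>
```

Comments are on their own lines because a trailing comment on a setting line would be read as part of the value.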


chandana204
Communicator

Thanks for responding.

We noticed that without crcSALT= we get duplicates in QA, so we kept crcSALT in PROD.

The files are rolled from type1_log to type1_log.20181910 in the same directory. PROD is a HACMP cluster that switches nodes every month or so. We do not have an environment where we can test the effects of ingesting from such a cluster. We installed the Splunk UF on the cluster successfully, and we know ingestion itself is working because type2_log is ingested normally, with no duplication.

We do notice this in the internal logs:
10-18-2018 20:10:06.900 -0700 INFO WatchedFile - Logfile truncated while open, original pathname file='data/server1/.../LOG/type1_log', will begin reading from start.

We are not sure why this INFO message appears or why the file is read from the start over and over again. We see events indexed hundreds of times.

Thanks for your help!
