I'm using the Splunk Add-on for Unix and Linux, a 6.4.x Universal Forwarder as the forwarder, and Splunk Enterprise 6.5 as the indexer.
I found that search results for index=os contain many duplicates, so I investigated in detail:
- The "os" rawdata file (journal.gz) contains the duplicated events, with all fields identical.
- The number of duplicates increases over time.
- If I change the index name from "os" to something else, the data is not duplicated.
- If I forward to a stand-alone Splunk Enterprise 6.4 instance configured the same way as above, the data is not duplicated.
This issue occurs only for the "os" index, so I am guessing that the cause lies in the indexing process with the *nix add-on. However, I have no idea how to solve it, and I would prefer not to work around it at search time (for example with the dedup command).
Could you suggest how to solve this?
Sorry, my question contained some misunderstandings, and I have new findings:
This issue occurs for all indexes except the internal indexes (such as _internal, _introspection and _audit).
The duplication is triggered by stopping an indexer in the indexer cluster. (The forwarder uses Indexer Discovery and useACK.)
The number of extra copies equals the number of stopped indexers.
If splunkd on the forwarder is restarted, data forwarded after the restart is normal (no duplication of new data).
If the indexer is stopped after the cluster master node has been stopped, the data is not duplicated.
My guess is as follows:
If the forwarder fails to connect to an indexer, it tries to send again the data that was on the failed connection, as below.
step 0. [indexer A] is stopped
step 1. [forwarder] gets the peer list from the cluster master
step 2. [forwarder] duplicates data X that was destined for [indexer A]
step 3. [forwarder] sends data X and the duplicate X' to indexer B
So the indexer ends up with two copies of the same data.
After that, the indexer writes the raw data and the index (idx) files.
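To make the guess above concrete, here is a toy Python model of it. This is only a sketch of my hypothesis, not Splunk's actual implementation: the function name forward_with_ack, the indexer names, and especially the assumption that a copy of the event survives even when the indexer stops before acknowledging it are all made up for illustration.

```python
from collections import Counter


def forward_with_ack(events, indexers, stopped):
    """Toy model: count how many copies of each event end up indexed."""
    copies = Counter()
    for event in events:
        for indexer in indexers:  # the forwarder tries the peers in order
            # Assumption (hypothetical): the copy is kept even if the
            # peer stops before it can acknowledge the event.
            copies[event] += 1
            if indexer not in stopped:
                break  # ACK received, so the event is not sent again
    return copies


# One stopped indexer: the event is sent to A, never ACKed, then sent to B.
print(forward_with_ack(["X"], ["A", "B"], stopped={"A"}))
# No stopped indexer: the first peer ACKs and there is a single copy.
print(forward_with_ack(["X"], ["A", "B"], stopped=set()))
```

In this model each stopped indexer that is tried before a healthy one adds one extra copy, which would match the observation that the number of extra copies equals the number of stopped indexers.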
If my understanding is correct, I should configure outputs.conf on the forwarder so that it does not send duplicated data while an indexer is stopped. However, I don't understand why the "_internal" index is not duplicated.
Could you validate my understanding and tell me how to configure Splunk?