
Splunk Add-on for Unix and Linux: Why am I seeing duplicate raw data increase over time for index=os?

tom8h
Explorer

Hello,

I'm using the Splunk Add-on for Unix and Linux, a 6.4.x Universal Forwarder as the forwarder, and Splunk Enterprise 6.5 as the indexer.

I found that search results for index=os contain many duplicates, so I investigated in detail:
- The raw data for the "os" index (journal.gz) contains duplicated events, with all fields identical.
- The number of duplicates increases over time.
- If I change the index name to something other than "os", the data is not duplicated.
- If I forward to a stand-alone Splunk Enterprise 6.4 instance configured the same as above, the data is not duplicated.

This issue occurs only for the "os" index, so I suspect the cause lies in the indexing process when using the *nix add-on. I have no idea how to solve it, and I would prefer not to work around it at search time (for example with the dedup command).

Could you suggest how to solve this?
Thank you,


tom8h
Explorer

I solved this problem.
I changed the useACK setting to false, and the duplication stopped.

I know that useACK can cause duplicates,
but in this case the index files are not doubled; instead, a single rawdata file contains the same data on multiple lines.

I still do not understand why the duplication occurred, and I had hoped it could be stopped while keeping useACK=true...
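
For reference, the change described above could look roughly like the following in the forwarder's outputs.conf. This is only a sketch: the group name and server address are placeholders, not values from this environment. Splunk's documentation notes that indexer acknowledgment can produce duplicate events when the forwarder resends data for which it received no acknowledgment, so turning it off removes that resend path at the cost of the delivery guarantee.

```
# outputs.conf on the forwarder (sketch; group name and server are placeholders)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = indexer1.example.com:9997
# Indexer acknowledgment off: unacknowledged data is not resent,
# so resend-induced duplicates stop, but in-flight data can be lost.
useACK = false
```

The forwarder needs a restart for the change to take effect.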


tom8h
Explorer

Sorry, my question contained some misunderstandings, and I have new findings:

  1. This issue occurs for all indexes except the internal indexes (such as _internal, _introspection and _audit).
  2. The duplication is triggered by stopping an indexer in the indexer cluster. (The forwarder uses Indexer Discovery and useACK.)
  3. The number of additional copies matches the number of stopped indexers.
  4. If splunkd on the forwarder is restarted, data forwarded after the restart is normal (no duplication for new data).
  5. If an indexer is stopped after the cluster master node has been stopped, the data is not duplicated.
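
For readers trying to reproduce this, a forwarder that combines Indexer Discovery with useACK might be configured along these lines. This is a sketch under assumptions: the stanza names, master URI, and key are hypothetical placeholders, not the actual values from this environment.

```
# outputs.conf on the forwarder (sketch; all names and values are placeholders)
[indexer_discovery:cluster1]
# The forwarder polls the cluster master for the current peer list.
master_uri = https://master.example.com:8089
pass4SymmKey = <your_key>

[tcpout:cluster1_indexers]
indexerDiscovery = cluster1
# Indexer acknowledgment on: data without an acknowledgment is resent.
useACK = true

[tcpout]
defaultGroup = cluster1_indexers
```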

My guess is as follows:

  • If the forwarder fails to connect to an indexer, it tries to resend the data that was pending on the failed connection, as follows:
      step 0. [indexer A] is stopped.
      step 1. [forwarder] gets the peer list from the cluster master.
      step 2. [forwarder] duplicates data X that was destined for [indexer A].
      step 3. [forwarder] sends data X and the duplicate X' to indexer B.
    So the indexer ends up with two copies of the same data.
  • After that, the indexer writes the raw data and index files.

If my understanding is correct, I should configure outputs.conf on the forwarder so that it does not send duplicated data while an indexer is stopped. But I don't understand why the "_internal" index is not affected.

Could you validate my understanding and tell me how to configure Splunk?

Thank you,
