Splunk Add-on for Unix and Linux: Why am I seeing duplicate raw data increase over time for index=os?

tom8h
Explorer

Hello,

I'm using the Splunk Add-on for Unix and Linux, a 6.4.x Universal Forwarder as the forwarder, and Splunk Enterprise 6.5 as the indexer.

I found that search results for index=os contain many duplicates, so I investigated in detail:
- The raw data for the "os" index (the journal.gz file) contains events with all fields duplicated.
- The number of duplicates increases over time.
- If I change the index name from "os" to something else, the data is not duplicated.
- If I forward to a stand-alone Splunk Enterprise 6.4 instance configured the same way as above, the data is not duplicated.

This issue occurs only for the "os" index, so I'm guessing the duplication happens during indexing with the *nix add-on. I have no idea how to solve it, and I would rather not work around it at search time (for example, with the dedup command).

Could you suggest how to solve this?
Thank you,

1 Solution

tom8h
Explorer

I solved this problem.
I changed the useACK setting to false, and the duplication stopped.

I know that useACK can cause duplicates,
but in this case the index files are not doubled; instead, a single piece of raw data contains the same data repeated over multiple lines.

I still don't understand why the duplication occurred, and I had hoped it could be stopped while keeping useACK=true...
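
For readers who want to try the same change, here is a minimal sketch of the relevant outputs.conf fragment on the forwarder. The group name my_indexers is hypothetical and not taken from this thread; only the useACK setting itself is what the post describes.

```
# outputs.conf on the forwarder
# "my_indexers" is a hypothetical target group name; use your own.
[tcpout:my_indexers]
# Turn off indexer acknowledgment. The forwarder no longer waits for
# acks, so it no longer re-sends data it treats as unacknowledged.
useACK = false
```

The trade-off is that with acknowledgment disabled, data in flight when an indexer goes down may be lost instead of re-sent, so this trades possible duplicates for possible gaps.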


tom8h
Explorer

Sorry, my question contained some misunderstandings, and I have some new findings:

  1. This issue occurs for all indexes except the internal indexes (such as _internal, _introspection, and _audit).
  2. The duplication is triggered by stopping an indexer in the indexer cluster. (The forwarder uses Indexer Discovery and useACK.)
  3. The number of duplicates grows by the number of stopped indexers.
  4. If splunkd on the forwarder is restarted, data forwarded after the restart is normal (no duplication of new data).
  5. If an indexer is stopped after the cluster master node is stopped, the data is not duplicated.
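
For reference, a forwarder set up as described in finding 2 (Indexer Discovery plus useACK) would typically have an outputs.conf along these lines. This is only a sketch: the stanza names, host name, and key placeholder are hypothetical and not taken from the poster's environment.

```
# outputs.conf on the forwarder (all names below are hypothetical)
[indexer_discovery:cluster1]
# Cluster master that the forwarder polls for the current peer list
master_uri = https://master.example.com:8089
pass4SymmKey = <your_key>

[tcpout:cluster1_indexers]
# Take the list of indexers from the discovery stanza above
indexerDiscovery = cluster1
# Wait for indexer acknowledgment before discarding sent data
useACK = true

[tcpout]
defaultGroup = cluster1_indexers
```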

My guess is as follows:

  • If the forwarder fails to connect to an indexer, it tries to re-send the data that was on the failed connection, like this:
      step 0. [indexer A] is stopped
      step 1. [forwarder] gets the peer list from the cluster master
      step 2. [forwarder] duplicates data X that was meant for [indexer A]
      step 3. [forwarder] sends data X and the duplicate X' to indexer B
    So the indexer ends up with two copies of the same data.
  • After that, the indexer creates the raw data and idx files.

If my understanding is correct, I should configure outputs.conf on the forwarder so that it does not send duplicated data while an indexer is stopped. However, I don't understand why the "_internal" index is not duplicated.

Could you confirm whether my understanding is correct and tell me how to configure Splunk?

Thank you,
