Solved: Splunk Add-on for Unix and Linux: Why am I seeing duplicate raw data increase over time for index=os?

tom8h · 11-30-2016

Hello,

I'm using the Splunk Add-on for Unix and Linux, with a 6.4.x Universal Forwarder as the forwarder and Splunk Enterprise 6.5 as the indexer.

I found that the search results for index=os contain many duplicates, so I investigated further:
- the raw data for the "os" index (the journal.gz file) contains duplicates of all the field data.
- the number of duplicates increases over time.
- if I change the index name from "os" to something else, the data is not duplicated.
- if I forward to a stand-alone Splunk Enterprise 6.4 instance configured the same way, the data is not duplicated.
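To measure the duplication outside Splunk instead of hiding it with dedup, a small script can count events that occur more than once and lines that repeat inside a single raw event. This is a minimal sketch that assumes the index=os results were exported to a CSV file with a `_raw` column; the CSV layout and all function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def duplicate_events(raw_events: Iterable[str]) -> Dict[str, int]:
    """Return each raw event that occurs more than once, with its count."""
    counts = Counter(raw_events)
    return {raw: n for raw, n in counts.items() if n > 1}


def repeated_lines(raw_event: str) -> Dict[str, int]:
    """Return non-blank lines that repeat inside a single multi-line raw event."""
    counts = Counter(line for line in raw_event.splitlines() if line.strip())
    return {line: n for line, n in counts.items() if n > 1}


def load_raw_column(path: str) -> List[str]:
    """Read the _raw column from a CSV export (assumed layout)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["_raw"] for row in csv.DictReader(f)]


if __name__ == "__main__":
    sample = [
        "cpu 10\ncpu 10",  # one event holding the same line twice
        "mem 42",
        "mem 42",          # the same event indexed twice
    ]
    print(duplicate_events(sample))   # {'mem 42': 2}
    print(repeated_lines(sample[0]))  # {'cpu 10': 2}
```

On real data you would call something like `duplicate_events(load_raw_column("os_export.csv"))`, where the file name is only an example.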

This issue occurs only for the "os" index, so I suspect the duplication happens during indexing with the *nix add-on. However, I have no idea how to solve it, and I would prefer not to work around it in the search (for example, with the dedup command).

Could you please suggest a way to solve this?
Thank you,

tom8h · 12-02-2016

I solved this problem.
I changed the useACK setting to false, and the duplication stopped.

I know that useACK can potentially cause duplication,
but in this case the index files were not doubled; instead, a single raw event contained the same data repeated over multiple lines.

I still don't understand why the duplication occurred, and I had hoped to stop it while keeping useACK=true...
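For reference, useACK is set in outputs.conf on the forwarder. The sketch below uses Python's configparser to render a minimal outputs.conf for a forwarder that uses indexer discovery, as in this thread, so you can see which stanza the setting belongs to. The group name, discovery name, master URI, and key are hypothetical placeholders; treat this as an illustration, not a verified production configuration.

```python
import configparser
import io


def render_outputs_conf(group: str, discovery: str, master_uri: str, use_ack: bool) -> str:
    """Render a minimal forwarder outputs.conf (names and URI are placeholders)."""
    conf = configparser.ConfigParser(interpolation=None)
    conf.optionxform = str  # keep Splunk's case-sensitive keys such as useACK
    conf["tcpout"] = {"defaultGroup": group}
    conf[f"tcpout:{group}"] = {
        "indexerDiscovery": discovery,
        # false stopped the duplication in this thread, at the cost of
        # losing the acknowledgment-based protection for in-flight data
        "useACK": "true" if use_ack else "false",
    }
    conf[f"indexer_discovery:{discovery}"] = {
        "master_uri": master_uri,
        "pass4SymmKey": "<shared-secret>",  # placeholder, not a real key
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_outputs_conf("my_cluster", "my_discovery",
                              "https://cluster-master.example.com:8089", use_ack=False))
```

Passing `use_ack=False` reproduces the workaround in this post. Disabling indexer acknowledgment also removes the forwarder's protection against losing in-flight data when an indexer goes down, so it is a trade-off rather than a pure fix.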


tom8h · 11-30-2016

Sorry, my question contained some mistakes, and I have new findings:

- This issue occurs in all indexes except the internal indexes (such as _internal, _introspection, and _audit).
- The trigger for the duplication is stopping an indexer in the indexer cluster. (The forwarder uses indexer discovery and useACK.)
- The number of duplicates increases by the number of stopped indexers.
- If splunkd on the forwarder is restarted, new data forwarded after the restart is normal (no duplication for new data).
- If an indexer is stopped after the cluster master node has been stopped, the data is not duplicated.

My guess is as follows:

If the forwarder fails to connect to an indexer, it resends the data from the failed connection:
- step 0. [indexer A] is stopped.
- step 1. [forwarder] gets the peer list from the cluster master.
- step 2. [forwarder] duplicates data X that was intended for [indexer A].
- step 3. [forwarder] sends data X and the duplicated data X' to indexer B.

So the indexer has two copies of the same data. After that, the indexer writes both into the raw data and the index (idx) files.
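The resend behaviour guessed above is consistent with the general at-least-once pattern of acknowledged delivery: the forwarder keeps a block until some indexer acknowledges it, so a block that was written but not acknowledged before the indexer stopped is sent again to another peer. The toy model below illustrates only that pattern. It is not Splunk's forwarder implementation, and the `Indexer` class, the `stop_after_writing` fault injection, and the helper functions are hypothetical. It also does not reproduce the repeated lines inside a single raw event reported in this thread.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Indexer:
    """Toy indexer: stores blocks and may stop before sending the ack."""
    name: str
    up: bool = True
    stored: List[str] = field(default_factory=list)
    stop_after_writing: Optional[str] = None  # hypothetical fault injection

    def receive(self, block: str) -> bool:
        """Write the block; return True only if the ack reaches the forwarder."""
        if not self.up:
            return False
        self.stored.append(block)           # data reaches disk...
        if block == self.stop_after_writing:
            self.up = False                 # ...but the indexer stops before acking
            return False
        return True


def forward_with_ack(blocks: List[str], indexers: List[Indexer]) -> None:
    """Keep each block until an indexer acks it (toy: gives up if none does)."""
    for block in blocks:
        for idx in indexers:                # fail over to the next peer
            if idx.receive(block):
                break                       # acked: drop it from the wait queue


def copies_in_cluster(indexers: List[Indexer]) -> Counter:
    """Count how many copies of each block exist across all indexers."""
    return Counter(b for idx in indexers for b in idx.stored)


if __name__ == "__main__":
    a = Indexer("indexer A", stop_after_writing="X")
    b = Indexer("indexer B")
    forward_with_ack(["X", "Y"], [a, b])
    print(copies_in_cluster([a, b]))        # Counter({'X': 2, 'Y': 1})
```

Block X ends up on both indexers because indexer A wrote it but never acknowledged it, while Y, sent after A stopped, exists only once.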

If my understanding is correct, I should configure the forwarder's outputs.conf so that it does not send duplicated data while an indexer is stopped. But I don't understand why the "_internal" index is not duplicated.

Could you check whether my understanding is correct and tell me how to configure Splunk?

Thank you,
