Splunk Search

How to avoid duplicate events when using distributed search across cloned indexers


I have a scenario where A and B are indexers, with one being a clone of the other. The idea is that A is in one data center and B is in a DR data center.

C is a search head, and if it is configured with both A and B as search peers, searches return duplicates.

Is there a way to configure C to pull data only from A and automatically start using B if A fails?



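You don't need to do this manually. You can let Splunk search both indexers and dedup the data, assuming you don't have many legitimately distinct events with the same content, timestamp, host and source. Leave splunk_server out of the dedup fields: the two copies of a cloned event differ in that field, so including it would keep both.

`your_search` | dedup host, source, _time, _raw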

This way, if an event is on both servers, you only get one copy.
If either server is down, you get events from the other.
If data was indexed on only one server, it is not missed.
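
To see how the choice of dedup fields affects the result, here is a small Python simulation. It is only a sketch of the "keep the first event per distinct field combination" idea, not Splunk's implementation; the `dedup` and `make_event` helpers and the sample events are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Event = Dict[str, str]


def dedup(events: Iterable[Event], fields: Tuple[str, ...]) -> List[Event]:
    """Keep the first event seen for each distinct combination of `fields`."""
    seen = set()
    kept = []
    for event in events:
        key = tuple(event[f] for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


def make_event(server: str, raw: str, time: str = "1700000000") -> Event:
    """Hypothetical event with the fields used in the search above."""
    return {
        "host": "web01",
        "source": "/var/log/app.log",
        "_time": time,
        "_raw": raw,
        "splunk_server": server,
    }


events = [
    make_event("A", "login ok"),
    make_event("B", "login ok"),                      # clone of the event above
    make_event("B", "disk full", time="1700000060"),  # indexed only on B
]

content_key = ("host", "source", "_time", "_raw")

# Cloned copies collapse to one; the B-only event is kept.
print(len(dedup(events, content_key)))

# With splunk_server in the key, the two copies count as distinct and both stay.
print(len(dedup(events, content_key + ("splunk_server",))))
```

With the content-only key the three sample events reduce to two; with `splunk_server` added to the key all three survive, because the cloned copies differ in that one field.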



If I understand Splunk data cloning correctly, you might not want to just point to one indexer for results and then fail over. Say A goes down and now you're using B (your DR). When A comes back online, it does not have the data that B received in the meantime, nor will it ever. So you will be missing data if you only query A.

The exception is if you're doing some tricky failover of an NFS file system, or some file system replication for failover, and then starting Splunk up using that 'A' dataset, which has a bunch of issues in itself.


Path Finder

I would like to see this feature as well. It seems simple enough to implement: just the reverse of auto-LB from forwarders to the indexers.


Thanks for the responses. Including the name or IP of the indexer in the search works only for ad hoc searches, not for scheduled searches and reports. The LB option works, but it increases cost and becomes a little more challenging if the indexers are geographically dispersed.

Seems like this feature may be a candidate RFE.


Splunk Employee

The LB play is doable, but in lieu of that you can prepend all of your searches with `splunk_server=A`. If Splunk complains that A isn't available, you can then search B.
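
For anyone scripting this by hand, the manual fallback could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `scoped_search` and the `is_available` callback are hypothetical helpers, not part of Splunk or its SDK, and how you detect that a peer is down is left entirely to you. It also assumes the base search starts with plain search terms, so that a `splunk_server` term can be placed in front of it.

```python
from typing import Callable, Sequence


def scoped_search(
    base_search: str,
    peers: Sequence[str],
    is_available: Callable[[str], bool],
) -> str:
    """Prefix the search with splunk_server=<peer> for the first available peer.

    `peers` is ordered by preference, e.g. ["A", "B"].
    `is_available` is a caller-supplied check (hypothetical).
    """
    for peer in peers:
        if is_available(peer):
            return f'splunk_server="{peer}" {base_search}'
    raise RuntimeError("no search peer is available")


# Example: A is down, so the search is scoped to B instead.
status = {"A": False, "B": True}
print(scoped_search("index=main error", ["A", "B"], lambda p: status[p]))
```

With A marked as down, the example prints `splunk_server="B" index=main error`.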

Splunk Employee

There currently is not, short of using an external TCP load balancer to alias and fail over between A and B. Note that you will probably have to modify A and B to have the same Splunk search node name (serverName in server.conf).