<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to avoid duplicate events when using distributed search across cloned indexers in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12097#M914</link>
<description>&lt;P&gt;Thanks for the responses. Including the name or IP of the indexer in a search works for ad-hoc searches, but not for scheduled searches and reports. The LB option works, but it increases cost and becomes a little more challenging if the indexers are geographically dispersed. &lt;/P&gt;

&lt;P&gt;Seems like this feature may be a candidate for an RFE. &lt;/P&gt;</description>
    <pubDate>Sun, 25 Apr 2010 18:06:41 GMT</pubDate>
    <dc:creator>Phil_T_</dc:creator>
    <dc:date>2010-04-25T18:06:41Z</dc:date>
    <item>
      <title>How to avoid duplicate events when using distributed search across cloned indexers</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12094#M911</link>
      <description>&lt;P&gt;I have a scenario where A and B are indexers, with one being a clone of the other. The idea is that A is in one data center and B is in a DR data center. &lt;/P&gt;

&lt;P&gt;C is a search head and if it is configured with both A and B as search peers, the searches return duplicates. &lt;/P&gt;

&lt;P&gt;Is there a way to configure C to pull data only from A and automatically start using B if A fails? &lt;/P&gt;

&lt;P&gt;Phil&lt;/P&gt;</description>
      <pubDate>Fri, 23 Apr 2010 01:30:52 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12094#M911</guid>
      <dc:creator>Phil_T_</dc:creator>
      <dc:date>2010-04-23T01:30:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to avoid duplicate events when using distributed search across cloned indexers</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12095#M912</link>
      <description>&lt;P&gt;There currently is not, short of using an external TCP load balancer to alias A and B and fail over between them. Note that you will probably have to modify A and B to have the same Splunk search node name (&lt;CODE&gt;serverName&lt;/CODE&gt; in &lt;CODE&gt;server.conf&lt;/CODE&gt;).&lt;/P&gt;</description>
      <pubDate>Fri, 23 Apr 2010 10:12:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12095#M912</guid>
      <dc:creator>gkanapathy</dc:creator>
      <dc:date>2010-04-23T10:12:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to avoid duplicate events when using distributed search across cloned indexers</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12096#M913</link>
      <description>&lt;P&gt;The LB play is doable, but short of that you can prepend all of your searches with &lt;CODE&gt;splunk_server=A&lt;/CODE&gt;. If Splunk complains that A isn't available, you can then search with &lt;CODE&gt;splunk_server=B&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Apr 2010 12:24:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12096#M913</guid>
      <dc:creator>dskillman</dc:creator>
      <dc:date>2010-04-23T12:24:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to avoid duplicate events when using distributed search across cloned indexers</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12097#M914</link>
      <description>&lt;P&gt;Thanks for the responses. Including the name or IP of the indexer in a search works for ad-hoc searches, but not for scheduled searches and reports. The LB option works, but it increases cost and becomes a little more challenging if the indexers are geographically dispersed. &lt;/P&gt;

&lt;P&gt;Seems like this feature may be a candidate for an RFE. &lt;/P&gt;</description>
      <pubDate>Sun, 25 Apr 2010 18:06:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12097#M914</guid>
      <dc:creator>Phil_T_</dc:creator>
      <dc:date>2010-04-25T18:06:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to avoid duplicate events when using distributed search across cloned indexers</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12098#M915</link>
      <description>&lt;P&gt;I would like to see this feature as well; it seems simple enough to implement - just the reverse of auto-LB from forwarders to the indexers.&lt;/P&gt;</description>
      <pubDate>Thu, 20 May 2010 22:26:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12098#M915</guid>
      <dc:creator>cudgel</dc:creator>
      <dc:date>2010-05-20T22:26:46Z</dc:date>
    </item>
    <item>
      <title>Re: How to avoid duplicate events when using distributed search across cloned indexers</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12099#M916</link>
      <description>&lt;P&gt;If I understand Splunk data cloning correctly, you might not want to just point at one indexer for results and then fail over. Say A goes down and you're now using B (your DR). When A comes back online, it will not have the data that B indexed in the meantime, nor will it ever, so you will be missing data if you only query A. &lt;/P&gt;

&lt;P&gt;Unless, that is, you're doing some tricky failover of the NFS file system, or some file system replication for failover, and then starting Splunk up using that 'A' dataset - which has a bunch of issues in itself.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Aug 2010 04:39:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12099#M916</guid>
      <dc:creator>chicodeme</dc:creator>
      <dc:date>2010-08-10T04:39:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to avoid duplicate events when using distributed search across cloned indexers</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12100#M917</link>
      <description>&lt;P&gt;You don't need to do this manually. You can let Splunk search both indexers and dedup the data, assuming you don't have many legitimate duplicate events with the same content, timestamp, host, and source.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;`your_search` | dedup host, source, _time, _raw
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;This way, if an event is on both servers, you only get one copy.&lt;BR /&gt;
If either server is down, you get events from the other.&lt;BR /&gt;
If data only got indexed on one server, it doesn't get missed.&lt;/P&gt;
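
&lt;P&gt;As a related sanity check (just a sketch, reusing the same &lt;CODE&gt;`your_search`&lt;/CODE&gt; placeholder), you can count how many indexers each event landed on and keep only the events that exist on a single clone:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;`your_search` | stats dc(splunk_server) AS servers BY _time, host, source, _raw | search servers=1
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If that returns anything, one of the clones missed some data.&lt;/P&gt;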

&lt;P&gt;Bob&lt;/P&gt;</description>
      <pubDate>Wed, 27 Apr 2011 13:41:38 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-to-avoid-duplicate-events-when-using-distributed-search/m-p/12100#M917</guid>
      <dc:creator>BobM</dc:creator>
      <dc:date>2011-04-27T13:41:38Z</dc:date>
    </item>
  </channel>
</rss>

