Splunk Search

How to avoid duplicate events when using distributed search across cloned indexers

Phil_T_
Engager

I have a scenario where A and B are indexers, with one being a clone of the other. The idea is that A is in one data center and B is in a DR data center.

C is a search head and if it is configured with both A and B as search peers, the searches return duplicates.

Is there a way to configure C to pull data only from A and automatically start using B if A fails?

Phil

BobM
Builder

You don't need to do this manually. You can let Splunk search both indexers and dedup the data. This works as long as you don't have many genuinely distinct events with the same content, timestamp, host and source.

`your_search` | dedup host, source, _time, _raw

This way if an event is on both servers, you only get one.
If either server is down, you get events from the other.
If data only got indexed on one it doesn't get missed.
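
To make the three cases above concrete, here is a minimal Python sketch of the idea. It is a simplified model, not Splunk's implementation. It assumes events are plain dicts and that two events count as "the same" when host, source, _time and _raw all match. The `dedup` and `make_event` helpers and the sample values are made up for illustration. The key in this sketch leaves out splunk_server, since that field differs between the two clones.

```python
# Fields that identify "the same event" (assumption for this sketch).
# splunk_server is deliberately excluded: it differs between clones A and B.
KEY_FIELDS = ("host", "source", "_time", "_raw")


def dedup(events, key_fields=KEY_FIELDS):
    """Keep the first event seen for each combination of key_fields."""
    seen = set()
    kept = []
    for event in events:
        key = tuple(event.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


def make_event(server, raw, time=100):
    """Build a hypothetical event as it might come back from one indexer."""
    return {
        "host": "web01",
        "source": "/var/log/app.log",
        "_time": time,
        "_raw": raw,
        "splunk_server": server,
    }


events = [
    make_event("A", "login ok"),
    make_event("B", "login ok"),             # cloned copy of the event above
    make_event("B", "disk full", time=101),  # only got indexed on B
]

result = dedup(events)
for event in result:
    print(event["splunk_server"], event["_raw"])
```

The cloned event is returned once, and the event that only reached B is still returned. If splunk_server is added to the key, the two copies look distinct and all three events are kept.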

Bob

chicodeme
Communicator

If I understand Splunk data cloning correctly, you might not want to just point to one indexer for results and then fail over. Say A goes down and now you're using B (your DR). When A comes back online, it does not have the data that B received in the meantime, nor will it ever. So you will be missing data if you only query A.

Unless you're doing some tricky failover of the NFS file system, or some file system replication for failover, and then starting Splunk up using that 'A' dataset, which has a bunch of issues in itself.

cudgel
Path Finder

I would like to see this feature as well, seems simple enough to implement - just the reverse of auto-lb from forwarders to the indexers.

Phil_T_
Engager

Thanks for the responses. Including the name or IP of the indexer in the searches works for ad-hoc searches, but not for scheduled searches and reports. The LB option works, but it increases cost and becomes a little more challenging if the indexers are geographically dispersed.

Seems like this feature may be a candidate RFE.

dskillman
Splunk Employee

The LB play is doable but in lieu of that you can prepend all of your searches with splunk_server=A. If it yells at you that A isn't available you can then search B.
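
The manual fallback described above could be scripted roughly as follows. This is only a sketch in Python under stated assumptions: `run_search` is a hypothetical callable standing in for however you submit searches, and it is assumed to raise ConnectionError when the named peer is unreachable. How a real client signals an unavailable peer will differ. Only the splunk_server=<name> filter comes from the post.

```python
def search_with_failover(query, run_search, peers=("A", "B")):
    """Try each peer in order, restricting the search to that peer.

    run_search is a hypothetical callable that takes a search string and
    is assumed to raise ConnectionError if the peer cannot be reached.
    Returns (peer_used, results).
    """
    last_error = None
    for peer in peers:
        restricted = f"splunk_server={peer} {query}"
        try:
            return peer, run_search(restricted)
        except ConnectionError as err:
            last_error = err  # remember the failure and try the next peer
    raise ConnectionError(f"no peer available: {peers}") from last_error


def fake_run_search(search_string):
    """Stand-in for a real search call: pretends indexer A is down."""
    if search_string.startswith("splunk_server=A "):
        raise ConnectionError("peer A unavailable")
    return [search_string]


peer_used, results = search_with_failover("error", fake_run_search)
print(peer_used, results)
```

A wrapper like this only helps for searches you launch yourself; it does not change what scheduled searches and reports on the search head do.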

gkanapathy
Splunk Employee

There currently is not, short of using an external TCP load balancer to alias and fail over A and B. Note that you will probably have to modify A and B to have the same Splunk search node name (serverName in server.conf).
