Hi Experts and Splunkers,
We have an existing Splunk environment which consists of:
- 3 x clustered Search Heads
- 3 x clustered Indexers
- 1 x heavy forwarder which has several add-ons (like DB Connect and the AWS Add-on) and also exposes an HEC endpoint
- Other servers for other functions (like deployer, cluster master, license master etc)
Our client has asked us to add redundancy for the heavy forwarder, as it is currently a single point of failure.
More specifically, we would like to have 2 HF servers for high availability, ideally Active-Active like the IDX and SH tiers.
From our research and from reading the Splunk docs and answers, we understand that we can set up multiple HF servers without having to worry about data duplication for inbound data (such as data from UFs with autoLB, or data sent via HEC through a load balancer).
How can we manage the data that the add-ons on the HF servers pull from source systems, such as DB Connect and the AWS Add-on? We suspect we will end up with duplicated data if we set up 2 HF servers (active-active) with the same set of add-ons installed on both.
Thanks for your input in advance!
DB Connect doesn't do HA. Even in an SHC of HFs, the inputs would require shared checkpointing between the HFs running the queries, so you will duplicate your data (unless all your queries are checkpointless, but that is unlikely).
Perhaps look into Splunk's Data Stream Processor. Its primary purpose is working with data in motion before it gets to Splunk, but it also has an HA data-pull mechanism called "large scale data collector", which is basically exactly what you are looking for: shared checkpointing and multiple redundant, cluster-aware nodes available to execute tasks. I don't think the current version does database querying, though I have heard whispers of it being in the pipeline.
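To illustrate the checkpointing point, here is a toy Python sketch (not DB Connect's actual implementation). It models two forwarders polling the same table by a rising `id` column. The names `ROWS`, `poll`, and `run` are made up for the illustration. With independent checkpoints every row is fetched twice; with one shared checkpoint each row is fetched once.

```python
from typing import Dict, List

# Toy source table with a monotonically rising "id" column (hypothetical data).
ROWS = [{"id": i, "msg": f"row-{i}"} for i in range(1, 7)]


def poll(rows: List[dict], checkpoint: Dict[str, int], batch: int = 3) -> List[dict]:
    """Fetch up to `batch` rows whose id is above the checkpoint, then advance it."""
    new_rows = [r for r in rows if r["id"] > checkpoint["last_id"]][:batch]
    if new_rows:
        checkpoint["last_id"] = new_rows[-1]["id"]
    return new_rows


def run(checkpoint_a: Dict[str, int], checkpoint_b: Dict[str, int]) -> List[int]:
    """Let forwarder A and forwarder B poll in turn; return the ids that got indexed."""
    indexed: List[dict] = []
    for _ in range(2):
        indexed += poll(ROWS, checkpoint_a)  # forwarder A
        indexed += poll(ROWS, checkpoint_b)  # forwarder B
    return [r["id"] for r in indexed]


# Each forwarder keeps its own checkpoint: every row is indexed twice.
independent = run({"last_id": 0}, {"last_id": 0})

# Both forwarders read and write the same checkpoint: every row is indexed once.
shared_checkpoint = {"last_id": 0}
shared = run(shared_checkpoint, shared_checkpoint)

print(len(independent), len(set(independent)))
print(shared)
```

In a real deployment a shared checkpoint would also need locking or some other coordination between the nodes, which this sketch leaves out.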
Maybe look into Apache NiFi for your database connections?
Hi @darrenfuller - Thank you for your quick response and insight, much appreciated.
I suppose it is the same case for add-ons (like the AWS Add-on): we can't do HA with Splunk Enterprise only.
HA is not supported by most Splunk add-ons, because the checkpoint directory cannot easily be replicated to another HF, and the other HF cannot know when the first has stopped.
DB connect on search head clusters has this idea open:
Please read the Splunk Ideas documentation on voting and contributing, or open a more general idea about high availability of add-ons on heavy forwarders.
I might be a bit late on this one.
For AWS :
HA is available if you are using SQS, more on this here : https://docs.splunk.com/Documentation/AddOns/released/AWS/ConfigureInputs
For DBconnect :
It is possible to have HA in active-passive mode if you make sure your checkpoint information is rsynced between your two HFs. Active-active is not an option, though, as it will duplicate data. Alternatively, you could have each HF index into a different index and use one index as a backup, if license volume is not an issue.
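As a rough sketch of the rsync idea, the checkpoint copy from the active HF to the standby could be wrapped in a small Python helper like the one below. The directory path, the host name, and the function names are placeholders I made up, not values from DB Connect; check where your add-on version actually stores its checkpoints before using anything like this.

```python
import subprocess
from typing import List


def build_rsync_command(src_dir: str, standby_host: str, dest_dir: str) -> List[str]:
    """Build an rsync command that mirrors the checkpoint directory to the standby HF."""
    # A trailing slash on the source makes rsync copy the directory's contents
    # into dest_dir instead of creating a nested directory.
    src = src_dir.rstrip("/") + "/"
    # -a preserves permissions and timestamps; --delete removes files on the
    # standby that no longer exist on the active node.
    return ["rsync", "-a", "--delete", src, f"{standby_host}:{dest_dir}"]


def sync_checkpoints(src_dir: str, standby_host: str, dest_dir: str) -> None:
    """Run the sync and raise if rsync exits with a non-zero status."""
    subprocess.run(build_rsync_command(src_dir, standby_host, dest_dir), check=True)


# Example call, e.g. from cron on the active HF (placeholder values):
# sync_checkpoints("/path/to/checkpoints", "hf2.example.com", "/path/to/checkpoints")
```

The inputs on the standby HF would have to stay disabled while this runs, and be enabled only after the active node is confirmed down. Otherwise you are back to active-active and duplicated data.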
Hope that helps.
Thank you so much for your inputs - much appreciated.
In fact, the last idea of having the two active-active add-ons write into different indexes sounds like a good one (provided I can convince the owner to accept the additional data volume). I could then set up an app to query both indexes and dedup on a unique identifier, or something like that.
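To make the dedup idea concrete, here is a minimal Python sketch of the logic, assuming every event carries a unique field. The field name `event_id` and the index names `db_hf1` and `db_hf2` are hypothetical. Inside Splunk the same effect would presumably come from searching both indexes and piping to the SPL `dedup` command on that field.

```python
from typing import Iterable, List


def dedup_events(events: Iterable[dict], key: str = "event_id") -> List[dict]:
    """Keep the first event seen for each value of `key`, preserving input order."""
    seen = set()
    unique: List[dict] = []
    for event in events:
        identifier = event[key]
        if identifier in seen:
            continue  # same record already collected via the other HF
        seen.add(identifier)
        unique.append(event)
    return unique


# Hypothetical events collected by the two HFs into separate indexes.
primary = [
    {"event_id": "a1", "index": "db_hf1"},
    {"event_id": "a2", "index": "db_hf1"},
]
backup = [
    {"event_id": "a2", "index": "db_hf2"},
    {"event_id": "a3", "index": "db_hf2"},
]

# Listing the primary index first means its copy wins when both HFs collected a row.
merged = dedup_events(primary + backup)
print([(e["event_id"], e["index"]) for e in merged])
```

This only works if the source data really has a stable unique identifier; without one, two copies of the same row cannot be told apart from two genuinely identical rows.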
Thank you for your input!!!
You're most welcome !
For SQS on AWS there will be no duplication whatsoever; it works like a charm. As for DB Connect, yes, you have many options for de-duplication, so you can choose whichever you like ^^
Feel free to up-vote and accept if this was helpful !