Heavy Forwarder Redundancy (with DB Connect, AWS Add-on)

takashi6
Explorer

Hi Experts and Splunkers,

We have an existing Splunk environment which consists of:
- 3 x clustered Search Heads
- 3 x clustered Indexers
- 1 x heavy forwarder which has several add-ons (such as DB Connect and the AWS Add-on) and also exposes an HEC endpoint
- Other servers for other functions (deployer, cluster master, license master, etc.)

Our client has asked us to add redundancy to the heavy forwarder as well, since it is currently a single point of failure.
More specifically, we would like to have 2 HF servers for high availability, ideally active-active like the indexer and search head clusters.

Through our research and reading the Splunk docs and Answers, we understand that we can set up multiple HF servers without having to worry about duplicating inbound data (such as data from UFs using autoLB, or data arriving via HEC behind a load balancer).

How should we handle the data that the add-ons on the HF servers pull from source systems, such as DB Connect and the AWS Add-on? We suspect we would end up with duplicated data if we set up 2 HF servers (active-active) with the same set of add-ons installed on both.
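
To make the duplication concern concrete, here is a minimal Python sketch. It is not DB Connect code: it models a rising-column style input against a hypothetical `orders` table (an in-memory SQLite database), where each HF keeps its own local checkpoint. With two active HFs and independent checkpoints, every source row is ingested twice.

```python
import sqlite3


def make_source_db():
    """Build an in-memory stand-in for the source database (hypothetical `orders` table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany("INSERT INTO orders (id, item) VALUES (?, ?)",
                     [(i, f"item-{i}") for i in range(1, 6)])
    conn.commit()
    return conn


class HeavyForwarder:
    """Hypothetical HF running a rising-column input with its own local checkpoint."""

    def __init__(self, name):
        self.name = name
        self.checkpoint = 0  # last rising-column value seen by THIS forwarder only

    def poll(self, conn, index):
        rows = conn.execute(
            "SELECT id, item FROM orders WHERE id > ? ORDER BY id",
            (self.checkpoint,),
        ).fetchall()
        for row_id, item in rows:
            index.append((self.name, row_id, item))  # "send" the event to the indexers
            self.checkpoint = row_id                 # advance the local checkpoint
        return len(rows)


source = make_source_db()
index = []  # stands in for the Splunk index both HFs write to
hf1, hf2 = HeavyForwarder("hf1"), HeavyForwarder("hf2")
hf1.poll(source, index)
hf2.poll(source, index)

unique_ids = {row_id for _, row_id, _ in index}
print(len(index), len(unique_ids))  # 10 5: ten events indexed for five source rows
```

Because neither HF can see the other's checkpoint, both start from zero and both forward all five rows.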

Thanks for your input in advance!

DavidHourani
Super Champion

Hi @takashi6,

I might be a bit late on this one.

For AWS:
HA is available if you use SQS-based inputs; more on this here: https://docs.splunk.com/Documentation/AddOns/released/AWS/ConfigureInputs
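
The reason SQS-based inputs can run on several HFs at once is that SQS hands each message to one consumer at a time: a received message is hidden for a visibility timeout and is removed only when the consumer deletes it. The sketch below is an in-memory model of that behaviour, not the boto3 API and not the add-on's code; `FakeQueue` and its methods are hypothetical. It shows a message that one HF received but never finished being redelivered to the other HF once the timeout expires. Note that standard SQS queues are at-least-once: if hf1 had already indexed `msg-0` before crashing, the redelivery would produce a duplicate.

```python
class FakeQueue:
    """Hypothetical in-memory model of an SQS-style queue (not the boto3 API)."""

    def __init__(self, messages, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        # message body -> time at which it becomes visible to receivers again
        self.visible_at = {m: 0 for m in messages}

    def receive(self, now):
        """Hand out one visible message and hide it for visibility_timeout seconds."""
        for msg, t in self.visible_at.items():
            if t <= now:
                self.visible_at[msg] = now + self.visibility_timeout
                return msg
        return None

    def delete(self, msg):
        """Acknowledge a message after it has been processed."""
        self.visible_at.pop(msg, None)


queue = FakeQueue(["msg-0", "msg-1", "msg-2"], visibility_timeout=30)
indexed = []

# t=0: hf1 receives msg-0 and then crashes before processing or deleting it.
lost_in_flight = queue.receive(now=0)

# t=0..2: hf2 keeps polling; msg-0 is hidden, so it only sees the others.
for now in (0, 1, 2):
    msg = queue.receive(now)
    if msg is not None:
        indexed.append(("hf2", msg))
        queue.delete(msg)

# t=30: the visibility timeout has expired, so msg-0 is redelivered to hf2.
msg = queue.receive(now=30)
indexed.append(("hf2", msg))
queue.delete(msg)

print(lost_in_flight, [m for _, m in indexed])  # msg-0 ['msg-1', 'msg-2', 'msg-0']
```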

For DB Connect:
It is possible to have HA in active-passive mode if you make sure your checkpoint information is rsynced between your two HFs. Active-active is not possible, though, as it will duplicate data. You could index into different indexes and use one of them as a backup if license volume is not an issue.
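
A minimal Python sketch of the active-passive idea, under stated assumptions: the checkpoint is modelled as a hypothetical JSON file (DB Connect's real checkpoint format is not assumed here), the periodic rsync is modelled by `shutil.copyfile`, and the source is an in-memory SQLite `orders` table. It also shows the catch: rows ingested between the last sync and the failure are read again by the standby.

```python
import json
import os
import shutil
import sqlite3
import tempfile


def poll(conn, checkpoint_path, index, hf_name):
    """Hypothetical rising-column poll that persists its checkpoint to a JSON file."""
    with open(checkpoint_path) as f:
        last_id = json.load(f)["last_id"]
    rows = conn.execute(
        "SELECT id FROM orders WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    for (row_id,) in rows:
        index.append((hf_name, row_id))
        last_id = row_id
    with open(checkpoint_path, "w") as f:
        json.dump({"last_id": last_id}, f)


workdir = tempfile.mkdtemp()
active_ckpt = os.path.join(workdir, "hf1_checkpoint.json")
standby_ckpt = os.path.join(workdir, "hf2_checkpoint.json")
with open(active_ckpt, "w") as f:
    json.dump({"last_id": 0}, f)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
index = []


def insert_rows(ids):
    conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in ids])
    conn.commit()


insert_rows(range(1, 4))
poll(conn, active_ckpt, index, "hf1")       # hf1 (active) ingests rows 1-3
shutil.copyfile(active_ckpt, standby_ckpt)  # stands in for the periodic rsync

insert_rows(range(4, 6))
poll(conn, active_ckpt, index, "hf1")       # hf1 ingests rows 4-5, then fails before the next sync

insert_rows(range(6, 8))
poll(conn, standby_ckpt, index, "hf2")      # hf2 takes over from the last synced checkpoint

ids = [row_id for _, row_id in index]
print(ids)  # [1, 2, 3, 4, 5, 4, 5, 6, 7]
shutil.rmtree(workdir)
```

Nothing is lost, but rows 4 and 5 are indexed twice; the more often the checkpoint is synced, the smaller that window. Both HFs must also never run the input at the same time.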

Hope that helps.

Cheers,
David

takashi6
Explorer

Hello @DavidHourani,

Thank you so much for your inputs - much appreciated.

In fact, your last idea of having the two active-active add-ons write into different indexes sounds good (provided I can convince the owner to accept the additional data volume), and I could set up an app that queries both indexes and dedups on a unique identifier, or something like that.
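
In Splunk this would normally be done at search time in SPL, for example by searching both indexes and piping the results to the `dedup` command on the ID field. The Python sketch below only illustrates the logic of keeping the first event seen per unique identifier; the field name `txn_id` and the two result lists are hypothetical.

```python
def dedup_events(*index_results, key="txn_id"):
    """Merge events from several indexes, keeping the first event seen per unique key.

    `txn_id` is a hypothetical unique identifier column pulled by both DB inputs.
    """
    seen = set()
    merged = []
    for events in index_results:
        for event in events:
            if event[key] not in seen:
                seen.add(event[key])
                merged.append(event)
    return merged


# Hypothetical results from the two indexes written by hf1 and hf2.
db_primary = [{"txn_id": 1, "amount": 10}, {"txn_id": 2, "amount": 20}]
db_backup = [{"txn_id": 2, "amount": 20}, {"txn_id": 3, "amount": 30}]

result = dedup_events(db_primary, db_backup)
print([e["txn_id"] for e in result])  # [1, 2, 3]
```

Passing the primary index first means its copy of each event wins, and the backup only fills gaps (here `txn_id` 3). The license cost of indexing everything twice remains.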

Thank you for your input!!!

DavidHourani
Super Champion

You're most welcome!

With SQS on AWS there will be no duplication whatsoever; it works like a charm. As for DB Connect, yes, you have many options for de-duplication, so you can choose whatever you like ^^

Feel free to up-vote and accept if this was helpful!

Cheers,
David

gjanders
SplunkTrust

HA is not supported by most Splunk add-ons, because the checkpoint directory cannot easily be replicated to another HF, and the other HF has no way of knowing when the first one has stopped.
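
The second half of that problem is a liveness signal. A minimal sketch, assuming a hypothetical heartbeat record that both HFs can read (here an in-process object; in practice it would have to live in shared storage): the active HF updates it on every poll, and the standby promotes itself only after a period of silence. This is not a Splunk feature. If the active HF is merely slow rather than dead, both could end up running the input, which brings back the duplication, so a real setup also needs some form of fencing.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Hypothetical shared record the active HF updates on every successful poll."""
    owner: str
    last_beat: float


def should_take_over(hb: Heartbeat, now: float, timeout: float) -> bool:
    """Standby promotes itself only if the active HF has been silent longer than `timeout`."""
    return now - hb.last_beat > timeout


def beat(hb: Heartbeat, owner: str, now: float) -> None:
    """Record that `owner` is alive and is the active input runner."""
    hb.owner = owner
    hb.last_beat = now


hb = Heartbeat(owner="hf1", last_beat=0.0)
TIMEOUT = 60.0  # hypothetical: seconds of silence before failover

beat(hb, "hf1", now=100.0)                               # hf1 is healthy
print(should_take_over(hb, now=130.0, timeout=TIMEOUT))  # False: hf2 stays passive
print(should_take_over(hb, now=200.0, timeout=TIMEOUT))  # True: hf1 silent for 100 s

if should_take_over(hb, now=200.0, timeout=TIMEOUT):
    beat(hb, "hf2", now=200.0)                           # hf2 becomes the active HF
print(hb.owner)  # hf2
```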

There is an open idea for DB Connect on search head clusters:
https://ideas.splunk.com/ideas/EID-I-85

Please read the Splunk Ideas documentation on voting and contributing, or open a more general idea about high availability of add-ons on heavy forwarders.

takashi6
Explorer

Thank you, @gjanders - much appreciated. I'm now accepting your answer to close the question.

darrenfuller
Contributor

DB Connect doesn't do HA. Even in a SHC of HFs, inputs would require shared checkpointing between the HFs running the queries, so you will duplicate your data (unless all your queries are checkpointless, which is unlikely).

Perhaps look into Splunk's Data Stream Processor. Its primary purpose is working with data in motion before it gets to Splunk, but it also has an HA data-pull mechanism called the "large scale data collector", which is basically exactly what you are looking for: shared checkpointing and multiple redundant, cluster-aware nodes available to execute tasks. I don't think the current version does database querying, though I have heard whispers of it being in the pipeline.
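
To illustrate what "shared checkpointing" buys you, here is a minimal Python sketch of the general idea. It is not the Data Stream Processor's API or implementation; `SharedCheckpoint` and the batch size are hypothetical, and a `threading.Lock` stands in for whatever coordination a real cluster would use. Because each node atomically claims the next range from one shared checkpoint, two active nodes never pull the same rows.

```python
import threading


class SharedCheckpoint:
    """Hypothetical checkpoint store shared by all collector nodes (here: one process)."""

    def __init__(self, max_id):
        self._lock = threading.Lock()
        self._next_id = 1
        self._max_id = max_id

    def claim(self, batch_size):
        """Atomically reserve the next batch of rising-column values, or None when done."""
        with self._lock:
            if self._next_id > self._max_id:
                return None
            start = self._next_id
            end = min(start + batch_size - 1, self._max_id)
            self._next_id = end + 1  # advance before releasing the lock
            return range(start, end + 1)


def worker(name, store, index, index_lock):
    """Keep claiming and ingesting batches until the shared checkpoint is exhausted."""
    while (batch := store.claim(batch_size=3)) is not None:
        for row_id in batch:
            with index_lock:
                index.append((name, row_id))


store = SharedCheckpoint(max_id=20)
index, index_lock = [], threading.Lock()
threads = [threading.Thread(target=worker, args=(n, store, index, index_lock))
           for n in ("node1", "node2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

ids = sorted(row_id for _, row_id in index)
print(ids == list(range(1, 21)))  # True: every row ingested exactly once
```

In this sketch the checkpoint advances when a batch is claimed, not when it is ingested, so a node crashing mid-batch would lose that batch; a real collector would need leases or acknowledgements to cover that case.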

Maybe look into Apache NiFi for your database connections?

takashi6
Explorer

Hi @darrenfuller, thank you for your quick response and insight, much appreciated.
I suppose the same applies to other add-ons (like the AWS Add-on): we can't do HA with Splunk Enterprise alone.