All Apps and Add-ons

Splunk multiple heavy forwarders and Splunk Add Ons

Nicolas2203
Explorer

Hi Splunk community,

I have a quick question about an app, such as the Microsoft Cloud Services app, in a multiple Heavy Forwarder environment.

The app is installed on one Heavy Forwarder and makes API calls to Azure to retrieve data from an Event Hub and store it in an indexer cluster.

If the Heavy Forwarder where the add-on is installed goes down, no logs are retrieved from the event hub. So, what are the best practices for this kind of app, which retrieves logs through API calls, to be more resilient? The same applies to some Cisco add-ons that collect logs from Cisco devices via an API.

For now, I will configure the app on another Heavy Forwarder without enabling data collection, but in case of failure, human intervention will be needed.

I would be curious to know what solutions you implement for this kind of issue.

Thanks

Nicolas


1 Solution

gcusello
SplunkTrust
SplunkTrust

Hi @Nicolas2203 ,

it's a gap in the Splunk architecture: there is no HA solution for Heavy Forwarders.

You have two solutions:

The first is to install the Add-On on a Search Head Cluster, so the cluster manages add-ons and HA is guaranteed; however, many users don't like having ingestion systems in the user front-end.

The second solution is to configure more HFs and manually enable one at a time, but this isn't an automatic recovery solution and you have to manage checkpoints between HFs.

I suggest adding a request about this on Splunk Ideas.

Ciao.

Giuseppe


PickleRick
SplunkTrust
SplunkTrust

OK. The short story is that Splunk has no native, built-in HA when we're talking about inputs (regardless of whether we're talking UFs or HFs). Period.

So the only thing you can do is use external means to replicate config and state between nodes and make sure that only one node is actually active.

That's not a trivial issue. While replicating config is usually relatively easy (maybe except for some border cases when you - for example - need to authenticate with a private key and don't want the key to leave the box), the other two points are tricky.

Different inputs keep their state using different methods. Some store checkpoints as simple text files, some use kvstore, some (monitor input) use fishbucket. So you have to find where the state is being stored and replicate it to the passive node. You also need to have a way to make sure only one input is active at a time.

It's not a trivial task and there are several different approaches to this. From some rsync-based handcrafted scripts to simply migrating whole VMs with a forwarder between separate hypervisors (with several other possible solutions "in between"). I think there was a .conf presentation about this topic but I can never find it 😕
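The replicate-state-to-a-passive-node idea above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not a production failover tool: the marker file, checkpoint path, and replica path are all placeholders you'd replace with wherever your specific input actually keeps its state, and an rsync-over-SSH transport would be more typical than a plain directory copy.

```python
import shutil
from pathlib import Path

# All three paths are assumptions for this sketch -- replace them with
# wherever your specific input actually stores its state (it varies:
# flat files, KV store, fishbucket, ...).
ACTIVE_MARKER = Path("/opt/splunk/etc/hf_active")              # exists only on the active node
CHECKPOINT_DIR = Path("/opt/splunk/var/lib/splunk/modinputs")  # common location, but verify per input
REPLICA_DIR = Path("/mnt/standby_share/modinputs")             # e.g. a share the standby can read

def replicate_checkpoints(src: Path, dst: Path, active_marker: Path) -> bool:
    """Copy checkpoint state toward the standby, but only when this node
    is the active one; returns True if a copy was actually performed."""
    if not active_marker.exists():
        # Passive node: never push state, or we would clobber the active
        # node's newer checkpoints with our stale ones.
        return False
    if dst.exists():
        shutil.rmtree(dst)   # naive full refresh; rsync would be incremental
    shutil.copytree(src, dst)
    return True
```

Run from cron (or a systemd timer) on both nodes; the marker-file check is what keeps only one side writing, which is the "make sure only one node is active" half of the problem.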

Nicolas2203
Explorer

Hello @PickleRick 

That's an interesting topic, I will dig up more information about it.

I'll let you know here if I find something interesting.

Thanks !

Nicolas


Nicolas2203
Explorer

Hello @gcusello 

Thanks for the answer. 

OK, I understand. I will install the app on both HFs and just activate it on one.

When you say "you have to manage checkpoints between HFs", how is that possible in Splunk?

Assuming that logs are stored on the source for two weeks in case of an outage: when I activate log collection on the second HF, it will start collecting logs from the day it is activated and won't be aware of the logs already ingested into Splunk, right?


gcusello
SplunkTrust
SplunkTrust

Hi @Nicolas2203,

checkpoints are managed in different ways (e.g. DB Connect uses a KV store table), so you have to find out where your checkpoints are stored and align them between HFs using a scheduled script that copies configurations and checkpoints; the HFs will then be aligned as of the last run of the script.

Ciao.

Giuseppe
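For the KV store case mentioned above, the scheduled script could use Splunk's REST endpoint for collection data. A hedged sketch: host names, the app, and the collection name are placeholders (look up the real collection your add-on uses, e.g. in its collections.conf), and token auth plus an unverified TLS context are assumptions for a lab setup.

```python
import json
import ssl
import urllib.request

def kvstore_url(host: str, app: str, collection: str, port: int = 8089) -> str:
    """Splunk's REST endpoint for the contents of a KV store collection."""
    return (f"https://{host}:{port}/servicesNS/nobody/{app}"
            f"/storage/collections/data/{collection}")

def copy_collection(src_host: str, dst_host: str, app: str,
                    collection: str, auth_header: dict) -> None:
    """Read all records from the active HF's collection and batch-save
    them onto the standby. auth_header is e.g.
    {"Authorization": "Bearer <token>"} -- token auth is an assumption."""
    ctx = ssl._create_unverified_context()  # mgmt port is often self-signed; tighten in prod
    req = urllib.request.Request(
        kvstore_url(src_host, app, collection) + "?output_mode=json",
        headers=auth_header)
    with urllib.request.urlopen(req, context=ctx) as resp:
        records = json.load(resp)
    save = urllib.request.Request(
        kvstore_url(dst_host, app, collection) + "/batch_save",
        data=json.dumps(records).encode(),
        headers={**auth_header, "Content-Type": "application/json"},
        method="POST")
    urllib.request.urlopen(save, context=ctx).close()
```

Scheduling this (cron on the active node, pushing to the standby) gives exactly the "aligned to the last run of the script" behavior described above.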


Nicolas2203
Explorer

Hello,

I just checked, and the Microsoft Cloud Services add-on manages checkpoints locally on the heavy forwarders. However, there is a configuration in the app that allows you to store checkpoints in a container within an Azure storage account. That way, when you need to start log collection on another heavy forwarder, it should make the process easier.

I will configure that and test it, and let you know!

Thanks

Nico
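As a rough sanity check for that shared-checkpoint setup, the standby HF could list the checkpoint container via Azure Blob Storage's "List Blobs" REST operation, to confirm it sees what the active node wrote. A sketch assuming a pre-generated SAS token; the storage account and container names are placeholders.

```python
import urllib.request
import xml.etree.ElementTree as ET

def container_list_url(account: str, container: str, sas_token: str) -> str:
    """URL for Azure Blob Storage's "List Blobs" operation, authenticated
    with a pre-generated SAS token (an assumption for this sketch)."""
    return (f"https://{account}.blob.core.windows.net/{container}"
            f"?restype=container&comp=list&{sas_token.lstrip('?')}")

def list_checkpoint_blobs(account: str, container: str, sas_token: str) -> list:
    """Return the blob names in the shared checkpoint container, so a
    standby HF can verify the checkpoints are reachable before failover."""
    with urllib.request.urlopen(container_list_url(account, container, sas_token)) as resp:
        tree = ET.parse(resp)  # List Blobs returns XML with <Name> per blob
    return [el.text for el in tree.iter("Name")]
```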


gcusello
SplunkTrust
SplunkTrust

Hi @Nicolas2203 ,

OK, good for you, let me know, see you next time!

Ciao and happy splunking

Giuseppe

P.S.: Karma Points are appreciated by all the contributors 😉


