
Should I activate Indexer Acknowledgement and Persistent Queuing on all forwarders to prevent data loss when upgrading an indexer cluster?

pinVie
Path Finder

Hello all - hope someone can tell me if the following is a good idea.

I have to upgrade an Indexer cluster and search heads from 6.1.2 to 6.2.4 without losing data sent from any forwarder during the indexer downtime.

So what I want to do is have the forwarders cache all data while the indexers are down. As far as I know, universal forwarders do this by default, but only with a 500 KB in-memory queue. I am quite sure that 500 KB is not enough, so I want to activate Indexer Acknowledgement and Persistent Queuing on all forwarders. I'd activate them before upgrading the indexers and deactivate them as soon as the indexers are running fine and the queues have drained.

Two questions:
- In general, is this a good idea for preventing the loss of incoming logs? My forwarders serve multiple functions (reading Windows event logs, receiving syslog, reading custom log files, ...).
- Can persistent queuing be deactivated without problems?

Thanks a lot!


maciep
Champion

Not an answer, but I did want to mention that I submitted an enhancement request to allow rolling upgrades of the indexer cluster, regardless of whether it is a major, minor, or maintenance upgrade. I find it ridiculous that I have to bring down my entire HA cluster to do an upgrade.

Not sure if that will be implemented soon or ever, but the more people request it, the more priority they may give it.

Good luck not losing events during your upgrade; hopefully this approach will work.


atari1050
Path Finder

Hello,
That really depends on the velocity of your incoming data and what type of input it is: persistent queues work for network inputs (TCP/UDP) but not for file-based inputs.

Following this guide: http://docs.splunk.com/Documentation/Splunk/6.2.4/Data/Usepersistentqueues

It looks to be configurable, but you are also going to have to wade through a bunch of error messages, and your indexers are going to be seriously overtaxed for however long you have the cluster master down.
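As a rough sketch of what the forwarder side could look like, going by the guide above: the output group name, indexer addresses, UDP port, and queue sizes here are made-up placeholders, not values from the original question, so adjust and verify them against the spec files for your version before relying on this.

```ini
# outputs.conf on the forwarder
# (group name and indexer addresses are hypothetical)
[tcpout:my_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
# Indexer acknowledgement: the forwarder holds on to sent data
# until the indexer confirms it has been received
useACK = true

# inputs.conf on the forwarder
# (a network input; the port is hypothetical)
[udp://514]
# In-memory input queue (500KB is the default size)
queueSize = 500KB
# On-disk persistent queue that takes over once the
# in-memory queue is full (size is an assumption; it must
# cover your data rate times the expected downtime)
persistentQueueSize = 10GB
```

Note that the persistent queue setting only applies to the network-type inputs; the file monitoring stanzas would not take it.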

You should test this in a sandbox environment before seriously considering it in production.

Hypothetically, it looks like it could work, but you also need to consider any possible spikes in data and the impact on your users.

Sincerely,
Mike
