Deployment Architecture

Upgrade Heavy Forwarder without data loss

blanky
Explorer

I'm planning to upgrade my Splunk environment now.

3-member search head cluster - 3-member indexer cluster - 2 heavy forwarders - 1 master node.

 

I want to upgrade the HFs without data loss, but I have to stop the Splunk server during the upgrade.

 

Is there any other way to upgrade the HFs without data loss?

1 Solution

PickleRick
SplunkTrust

Depending on your particular setup there might or might not be a way to upgrade the forwarder without data loss. It depends on what inputs you have there and what data you're receiving with them.

For example - if you have a scripted or modular input which must periodically query some API endpoint for values, then when you bring your HF down those API calls won't get spawned and you won't get data for those particular scheduled points in time. And short of having a relatively complicated "quasi-HA" setup on HFs, there is no way around it (see the illustrative sketch at the end of this reply).

If you're receiving UDP syslogs on that HF - there is also not much you can do unless you can do some network-level reconfiguration to pass that data into another instance.

There might, however, be some inputs (or sources generating data for those inputs) which can deal with a situation where they do not run continuously - like buffering data on the sending side or, in the case of a pull-mode input, reading an accumulated backlog.

So there is no general answer. It all depends on your particular setup and data flow.
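
For illustration - a minimal sketch only, with a hypothetical script path, interval, sourcetype, and index - this is roughly what such a scheduled scripted input looks like in inputs.conf on the HF; any runs that would have fallen inside the downtime window are simply never executed:

# inputs.conf on the HF - hypothetical scripted input polling an API endpoint
[script://$SPLUNK_HOME/etc/apps/my_api_app/bin/poll_api.py]
# Run every 300 seconds; intervals that elapse while splunkd is stopped are skipped
interval = 300
sourcetype = my_api:events
index = main
disabled = false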


livehybrid
SplunkTrust

Hi @blanky

tl;dr - If you are sending from the source to both HFs, then upgrading one at a time would be fine.

Do your client servers all send to both of your HFs? If so, they should automatically load balance between the two of them, and therefore you will not lose data if you gracefully shut down one, upgrade it, and then ensure it has started successfully before doing the other.

If you are unsure, check the outputs.conf on the servers sending to the HFs, which should have a comma-delimited list under the server key in your tcpout group stanza, similar to the below:

 

[tcpout]
defaultGroup = My_Cluster_1

[tcpout:My_Cluster_1]
disabled=false
server = 10.1.4.32:9997,10.1.4.33:9997

 

If you are outputting to a single HF, then consider adding the second one as an output if possible; this will give redundancy for when one of the HFs is offline.

Either way, if you are sending data from a Splunk UF/HF to the HF and the HF goes offline, the client server should queue the data so that it sends when the HF connection is restored. The size of the queue will depend on your configuration, and whether the queue can withstand the downtime will depend on the amount of data the client is sending. For more about queues see https://docs.splunk.com/Documentation/Splunk/latest/Data/Usepersistentqueues
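
If the queue in question is the forwarder's in-memory output queue, one relevant knob (an assumption on my part that this is the queue meant here, reusing the placeholder HF addresses from the example above) is maxQueueSize in the tcpout group stanza - a rough sketch, sizes are illustrative only:

# outputs.conf on the sending UF/HF - illustrative sizes only
[tcpout]
defaultGroup = My_Cluster_1

[tcpout:My_Cluster_1]
disabled = false
server = 10.1.4.32:9997,10.1.4.33:9997
# Larger in-memory output queue to help absorb short HF outages (default is auto)
maxQueueSize = 512MB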

Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards

Will


kiran_panchavat
SplunkTrust

@blanky 

Use Persistent Queues

Configure persistent queues on your HFs to store data on disk while the Splunk service is stopped.

Edit inputs.conf on each HF to enable persistent queues for your inputs (e.g., set persistentQueueSize to an appropriate value like 1GB or more, depending on your data volume).

Stop the Splunk service, perform the upgrade, and restart.

The HF will process the queued data after restarting.

Data is preserved on disk during the outage and forwarded once the HF is back online.
Requires sufficient disk space and pre-configuration. Not all input types support persistent queues (e.g., HTTP Event Collector doesn’t).

https://docs.splunk.com/Documentation/Splunk/9.4.1/Data/Usepersistentqueues 
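
A minimal sketch of what this could look like for a raw TCP syslog input on the HF (the port, sizes, sourcetype, and index below are placeholder assumptions - check the linked docs for which input types support persistent queues):

# inputs.conf on the HF - hypothetical raw TCP syslog input with a persistent queue
[tcp://5514]
sourcetype = syslog
index = network
# In-memory queue for this input
queueSize = 10MB
# On-disk queue used once the in-memory queue is full
persistentQueueSize = 5GB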

Did this help? If yes, please consider giving kudos, marking it as the solution, or commenting for clarification — your feedback keeps the community going!

kiran_panchavat
SplunkTrust

@blanky 

Options to Upgrade HFs Without Data Loss

If you have two HFs, configure them as a redundant pair with a load balancer or configure your data sources to send data to both HFs (e.g., Syslog can send to multiple destinations).

Steps:

  • Ensure both HFs are forwarding identical data to the indexers.
  • Stop Splunk on HF1, upgrade it, and restart it.
  • Validate HF1 is working, then repeat the process for HF2.

HF2 continues processing data while HF1 is down, and vice versa, ensuring no data loss.

Your data sources must support sending to multiple endpoints, or you need a load balancer in front of the HFs.

Did this help? If yes, please consider giving kudos, marking it as the solution, or commenting for clarification — your feedback keeps the community going!

kiran_panchavat
SplunkTrust

@blanky 

When you stop the Splunk service on an HF for an upgrade, it stops accepting new data from inputs and forwarding data to indexers. Any data generated by your sources during this downtime could be lost unless mitigated.
 
Are your HFs collecting data from files (e.g., log files), network inputs (e.g., Syslog, HTTP Event Collector), or scripts?  HFs have in-memory queues and can use persistent queues (if configured) to buffer data during brief interruptions.
 
A typical Splunk HF upgrade is relatively quick (minutes), but preparation and validation can extend the outage window.
Did this help? If yes, please consider giving kudos, marking it as the solution, or commenting for clarification — your feedback keeps the community going!