I'm planning to upgrade our Splunk environment now.
3-member search head cluster, 3-member indexer cluster, 2 heavy forwarders, 1 master node.
I want to upgrade the HFs without data loss, but I have to stop the Splunk server during the upgrade.
Is there any other way to upgrade the HFs without data loss?
Depending on your particular setup, there may or may not be a way to upgrade the forwarder without data loss. It depends on what inputs you have there and what data you're receiving with them.
For example, if you have a scripted or modular input which must periodically query some API endpoint for values, then when you bring your HF down those API calls won't get spawned and you won't get data for those scheduled points in time. Short of building a relatively complicated "quasi-HA" setup across HFs, there is no way around that.
If you're receiving UDP syslog on that HF, there is also not much you can do unless you can do some network-level reconfiguration to pass that data to another instance.
There may, however, be some inputs (or sources generating data for those inputs) which can cope with not running continuously, for example by buffering data on the sending side or, in the case of a pull-mode input, by reading the accumulated backlog on restart.
So there is no general answer. It all depends on your particular setup and data flow.
Hi @blanky
tl;dr - If your sources send to both HFs, then upgrading one at a time will be fine.
Do your client servers all send to both of your HFs? If so, they should automatically load-balance between the two, and you will not lose data if you gracefully shut one down, upgrade it, and ensure it has started successfully before doing the other.
If you are unsure, check outputs.conf on the servers sending to the HFs; it should have a comma-delimited list under the server key in your tcpout group stanza, similar to the example below:
[tcpout]
defaultGroup = My_Cluster_1
[tcpout:My_Cluster_1]
disabled = false
server = 10.1.4.32:9997,10.1.4.33:9997
If you are outputting to a single HF, consider adding the second one if possible; this gives you redundancy for when one of the HFs is offline.
Either way, if you are sending data from a Splunk UF/HF to the HF and the HF goes offline, the client should queue the data and send it when the connection is restored. The size of the queue depends on your configuration, and whether it can withstand the downtime depends on how much data the client is sending. For more about queues see https://docs.splunk.com/Documentation/Splunk/latest/Data/Usepersistentqueues
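As a sketch, the forwarder-side in-memory output queue can be enlarged in outputs.conf so the client buffers longer while an HF is down (the stanza name matches the example above; the size is illustrative, not a sizing recommendation):

```
[tcpout:My_Cluster_1]
server = 10.1.4.32:9997,10.1.4.33:9997
# Larger in-memory output queue so events accumulate on the client
# while the HF is offline; tune to your data rate and available memory.
maxQueueSize = 512MB
```

How long this buys you depends entirely on the client's throughput, so measure your ingest rate before relying on it.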
Please let me know how you get on and consider adding karma to this or any other answer if it has helped.
Regards
Will
Use Persistent Queues
Configure persistent queues on your HFs to store data on disk while the Splunk service is stopped.
Edit inputs.conf on each HF to enable persistent queues for your inputs (e.g., set persistentQueueSize to an appropriate value like 1GB or more, depending on your data volume).
Stop the Splunk service, perform the upgrade, and restart.
The HF will process the queued data after restarting.
Data is preserved on disk during the outage and forwarded once the HF is back online.
This requires sufficient disk space and pre-configuration. Note that not all input types support persistent queues (e.g., monitored file inputs don't).
https://docs.splunk.com/Documentation/Splunk/9.4.1/Data/Usepersistentqueues
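A minimal sketch of a persistent queue on a TCP input in inputs.conf (the port and sizes are illustrative; pick values that match your data volume and expected downtime):

```
[tcp://:5140]
index = main
# Persistent queues require the input's in-memory queue size to be set too.
queueSize = 10MB
# Disk-backed queue that survives a stopped splunkd; drained on restart.
persistentQueueSize = 5GB
```

The queue files live under the Splunk run directory on that host, so make sure the filesystem actually has the headroom you configure.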
Options to Upgrade HFs Without Data Loss
If you have two HFs, configure them as a redundant pair with a load balancer or configure your data sources to send data to both HFs (e.g., Syslog can send to multiple destinations).
Steps:
Upgrade one HF at a time: HF2 continues processing data while HF1 is down, and vice versa, so no data is lost.
Prerequisite: your data sources must support sending to multiple endpoints, or you need a load balancer in front of the HFs.
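For syslog senders, forwarding to both HFs can be sketched with rsyslog (addresses and ports are illustrative). Be aware that sending to both targets duplicates every event; for a failover-style setup, rsyslog can instead suspend the second action until the first target fails:

```
# /etc/rsyslog.conf fragment -- illustrative IPs/ports, not your real HFs
# Duplicate all events to both HFs over TCP (@@ = TCP, @ = UDP):
*.* @@10.1.4.32:514
*.* @@10.1.4.33:514

# Failover variant: only use the second HF when the first is suspended.
# *.* @@10.1.4.32:514
# $ActionExecOnlyWhenPreviousIsSuspended on
# *.* @@10.1.4.33:514
# $ActionExecOnlyWhenPreviousIsSuspended off
```

If duplication is unacceptable and failover semantics are fiddly, a TCP load balancer in front of the HFs is usually the cleaner option.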