I'm encountering an issue where, after I change or create a script-based app that runs periodically (e.g. once per hour), the forwarders check in, deploy the app, and then all of them (possibly thousands) restart and execute the script at effectively the same time. Among other issues, this causes an unwanted spike in traffic.
To fix this, either the forwarders need to check in at longer, ideally random, intervals, or they need to not all restart at the same time. I'd like to avoid adding a random wait at the start of each script if I can help it.
Is there a way to have the forwarders restart after a random delay generated by each forwarder?
I think this should be a feature of the Splunk forwarders.
Thanks!
Assuming *NIX and a random interval from 1 to 100 seconds:
Do NOT use restartSplunkd in your serverclass.conf.
Run this command once:
/bin/cksum /opt/splunk/etc/apps | /bin/sed "s/ .*$//" > /tmp/splunkcksum.txt; cat /tmp/splunkcksum.txt
Then add this cron job
*/5 * * * * if [ $(/bin/cksum /opt/splunk/etc/apps | /bin/sed "s/ .*$//") -ne $(cat /tmp/splunkcksum.txt) ]; then /bin/sleep $(((RANDOM\%100 )+1)); /opt/splunk/bin/splunk restart; /bin/cksum /opt/splunk/etc/apps | /bin/sed "s/ .*$//" > /tmp/splunkcksum.txt; fi
Obviously, this will need to be adjusted to fit.
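If the one-liner gets unwieldy, the same idea can be moved into a small script. What follows is only a sketch under a few assumptions, not a tested deployment: the paths are the ones used above, the function names and the --run flag are made up for illustration, and the checksum is computed over every file under the apps directory (on some systems cksum refuses to read a directory directly). It also draws the random delay from /dev/urandom instead of $RANDOM, since cron usually runs jobs with /bin/sh, where $RANDOM may not exist.

```shell
#!/bin/sh
# Sketch only: paths, names and the --run flag are assumptions for illustration.
APPS_DIR="${APPS_DIR:-/opt/splunk/etc/apps}"
STATE_FILE="${STATE_FILE:-/tmp/splunkcksum.txt}"
SPLUNK_BIN="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"
MAX_DELAY="${MAX_DELAY:-100}"

# Fingerprint a directory tree: checksum every regular file,
# sort the list for a stable order, then checksum the list itself.
apps_fingerprint() {
    find "$1" -type f -exec cksum {} + 2>/dev/null | sort | cksum | cut -d' ' -f1
}

# Print a random integer between 1 and $1, using two bytes from /dev/urandom.
random_delay() {
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $(( $n % $1 + 1 ))
}

# Restart Splunk after a random delay if the apps directory changed
# since the last recorded fingerprint, then record the new fingerprint.
restart_if_changed() {
    current=$(apps_fingerprint "$APPS_DIR")
    previous=$(cat "$STATE_FILE" 2>/dev/null)
    if [ "$current" != "$previous" ]; then
        sleep "$(random_delay "$MAX_DELAY")"
        "$SPLUNK_BIN" restart
        apps_fingerprint "$APPS_DIR" > "$STATE_FILE"
    fi
}

# Only act when explicitly asked to, e.g. from cron.
if [ "${1:-}" = "--run" ]; then
    restart_if_changed
fi
```

Saved somewhere like /usr/local/bin/staggered_restart.sh (a hypothetical path), the cron entry shrinks to `*/5 * * * * /usr/local/bin/staggered_restart.sh --run`, which also avoids having to worry about how cron treats % characters in the command.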
@woodcock - why shouldn't we use restartSplunkd in the serverclass.conf?
Normally you should. In this situation, though, the cron job is doing the restarts so that they are staggered, so you should not (that is the whole point of the question).
Got it ;-) thank you!