I have a question regarding timeouts and return codes when Splunk is shutting down a cluster peer on a Linux system.
I ran a script that issues a "splunk offline", waits for the command to return, and then starts the next action unless the command came back with a non-zero return code.
If that happens, the script stops and asks for the user's input, to either abort, retry, skip, or continue.
We encountered a situation where the offlining ran into a timeout and the command returned with Splunk still being in the process of terminating.
However, the script started the next command (which then stopped the flow when it detected an inconsistency), indicating that we received an ERR_NOERR return code from Splunk.
Is that expected Splunk behaviour?
Short info about the environment:
Splunk 6.6.5 (build b119a2a8b0ad)
multisite Indexer-Cluster with 16 peers
Shutting down Splunk can take a while if the box is running lots of searches, as it will wait for these to stop.
Clustered Indexers can also take a long time as they try to finalise operations before the process quits.
Instead of relying on the return from the offline command, I would poll the output of ./splunk status until it reports that splunkd is no longer running.
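
A minimal sketch of that polling approach, in Python. The function names, the timeout and interval defaults, and the "is not running" phrase matched in the status output are assumptions for illustration and not taken from the Splunk documentation; check what ./splunk status actually prints on your version before relying on it.

```python
import subprocess
import time


def splunkd_stopped(status_cmd):
    """Return True if the status command reports that splunkd is not running.

    Assumption: once splunkd has stopped, the status output contains the
    phrase "is not running". Verify this against your Splunk version.
    """
    result = subprocess.run(status_cmd, capture_output=True, text=True)
    return "is not running" in (result.stdout + result.stderr)


def wait_until_stopped(is_stopped, timeout=600.0, interval=5.0):
    """Poll is_stopped() until it returns True or `timeout` seconds elapse.

    Returns True if the stop was observed, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_stopped():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the calling script this could be used as `wait_until_stopped(lambda: splunkd_stopped(["/opt/splunk/bin/splunk", "status"]))`, where the install path is again an assumption. A False result would then be handled the same way as a non-zero return code, by stopping and asking the user whether to abort, retry, skip, or continue.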