Hello,
It seems that my current process of quarantining a search peer and then running 'splunk offline' causes searches to become zombified.
"This search has encountered a fatal error and has been marked as zombied."
Is it best practice to quarantine the peer before or after running the 'splunk offline' command? I know that running 'splunk offline' gracefully halts new searches from reaching that indexer, but I suspect that quarantining the host first and then running 'splunk offline' causes some kind of interference.
Thoughts?
I've never seen anyone quarantine an indexer before stopping it, as the offline command accomplishes the same thing. Since using quarantine seems to cause problems for you, you should stop doing it.
Thanks richgalloway. My goal is to upgrade the indexer cluster without end users seeing warnings when a peer goes down for the upgrade. Since I removed the quarantine step, things are a little better; however, I still occasionally get "connection refused for peer=x" when a peer goes down via 'splunk offline' while a search is running.
Is it nearly impossible to perform an indexer cluster upgrade without a few "connection refused" warnings when a search is run while a peer is down?
Many customers perform upgrades during off-peak hours, in part to reduce this problem. I know of no way to avoid it completely.