Has anyone seen what happens to a Universal Forwarder when the filesystem it is running from goes away?
I just found out about some weekend maintenance to our network storage that will cause connectivity issues with the SAN mount points our Splunk UFs are installed on. I'm not sure what Splunk will do when the mount disappears, and I may not have much time to test this scenario.
A few basic thoughts I have on what would occur:
How I could approach handling this:
As part of the SAN work, can "service splunk stop" and "service splunk start" be run before and after the maintenance? This assumes the SAN team has already coordinated shutting down other processes during the work. I'd be surprised if the SAN work didn't include coordinated graceful stops/starts of other production processes.
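If the sysadmins can hook into the maintenance window, the hook could be as simple as a pre/post script along these lines. This is only a sketch: the "splunk" service name and the /san/splunkforwarder path are assumptions to adjust for your environment.

```shell
#!/bin/sh
# Hypothetical pre/post maintenance hook -- the service name "splunk" and
# the /san/splunkforwarder path are assumptions, not confirmed paths.
SPLUNK_HOME=${SPLUNK_HOME:-/san/splunkforwarder}

mount_is_up() {
    # True if the directory is present and readable (i.e. the SAN mount is back)
    [ -d "$1" ] && ls "$1" >/dev/null 2>&1
}

case "$1" in
    pre)
        # Stop the forwarder cleanly before the SAN work starts
        service splunk stop
        ;;
    post)
        # Only start again once the mount has actually returned
        if mount_is_up "$SPLUNK_HOME"; then
            service splunk start
        else
            echo "SAN mount $SPLUNK_HOME not back yet; refusing to start" >&2
            exit 1
        fi
        ;;
esac
```

Run it with "pre" before the window and "post" after; the mount check keeps the forwarder from starting against a missing filesystem.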
That is a great idea... although I'm not 100% sure our sysadmins will be available for this, and the powers that be are pushing this type of work off to be handled manually by each individual application team.
Interesting. I used to have issues with SHP where the mount would get lost and it was all sorts of bad news for Splunk.
Here's a zany idea: I've used a scripted input to have Splunk restart itself. I don't know how much of the scripted input is kept in memory, but you could experiment with pushing out a scripted input that runs after the work should be done and simply restarts the forwarder. For the experiment, try deploying the script to a different filesystem (using the deploymentclient or serverclass settings) so that even when the mount is gone, the forwarder process still running in memory can load the restart script.
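A concrete sketch of that experiment, with the caveat that the stanza name, interval, and every path here are assumptions I haven't tested: deploy something like this to a local, non-SAN filesystem and reference it from a scripted input.

```shell
#!/bin/sh
# Hypothetical restart helper, deployed to local disk
# (e.g. /opt/splunk-helper/restart_uf.sh) and wired up via a
# scripted input stanza along these lines:
#   [script:///opt/splunk-helper/restart_uf.sh]
#   interval = 3600
#   disabled = false
# SPLUNK_HOME is an assumption; point it at the UF on the SAN mount.
SPLUNK_HOME=${SPLUNK_HOME:-/san/splunkforwarder}

uf_ready() {
    # True once the mount is back and the splunk binary is reachable
    [ -x "$1/bin/splunk" ]
}

# Do nothing while the mount is gone; restart once it returns
if uf_ready "$SPLUNK_HOME"; then
    "$SPLUNK_HOME/bin/splunk" restart
fi
```

The readiness check matters: if the scripted input fires while the mount is still down, blindly calling restart would just fail, so the script waits for the binary to be reachable again.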
Some other options: