
What happens when Universal Forwarder loses its filesystem?

jhupka
Path Finder

Has anyone seen what happens to a Universal Forwarder when the filesystem it is running from goes away?

I just found out about some weekend maintenance to our network storage that will cause connectivity issues with the SAN mount points our Splunk UFs are installed on. I'm not sure what Splunk will do when the mount disappears, and I may not have a lot of time to test this scenario.

A few basic thoughts I have on what would occur:

  • Splunk can’t log its own internal log files
  • Splunk can’t update its fishbucket data
  • Splunk can't read/run scripted inputs (not too worried about this, though - it is ok if we are missing that data since it is mostly *nix)
  • Will Splunk continue forwarding data during this scenario?

How I could approach the handling of this:

  • Manually shut the Splunk forwarders down beforehand, and manually start them up after the filesystem comes back (most work for jhupka, but the safest scenario)
  • Let things be, then trick the Splunk forwarders into restarting via the Deployment Server after the filesystem comes back, so they start fresh (minimal work/coordination)
  • Do absolutely nothing (least work, jhupka gets to sleep in on Saturday morning)
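For the second option, the Deployment Server can trigger the restart itself when it pushes an updated app. A minimal sketch of the relevant serverclass.conf setting, assuming hypothetical server class and app names:

```
# serverclass.conf on the Deployment Server (names are hypothetical)
[serverClass:san_hosts:app:post_san_restart]
restartSplunkd = true
```

With restartSplunkd = true, any forwarder in the class that downloads the updated app restarts splunkd afterward, which is effectively the "trick them into restarting" move.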

sloshburch
Ultra Champion

As part of the SAN work, can a service splunk stop and service splunk start be done before and after the work? This assumes the SAN workers have already coordinated other processes to shut down during the maintenance. I'd be surprised if the SAN work didn't include coordinating graceful stops/starts of other production processes.


jhupka
Path Finder

That is a great idea, although I'm not 100% sure our sys admins will be available for this, and the powers that be are pushing all of this type of work off to be handled manually by each individual application team.


sloshburch
Ultra Champion

Interesting. I used to have issues with search head pooling (SHP) where the mount would get lost, and it was all sorts of bad news for Splunk.

Here's a zany idea: I've used a scripted input in Splunk to restart Splunk itself. I don't know how much of the scripted input is kept in memory, but you could try an experiment: push out a scripted input job that runs after the work should be done and simply does a splunk restart on the forwarder. For the experiment, try deploying the script to a different filesystem (using the deploymentclient or serverclass settings) so that even when the mount is gone, the forwarder process running in memory can still load the restart script.
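One way to sketch that scripted input, assuming hypothetical paths and a once-only marker file kept off the SAN so the forwarder doesn't restart in a loop:

```shell
#!/bin/sh
# Sketch of a restart scripted input; all paths here are assumptions.
# Deployed in an app whose inputs.conf might contain:
#   [script://./bin/restart_after_san.sh]
#   interval = 300

# should_restart MOUNT MARKER: a restart is due when the SAN mount is
# visible again and we have not already restarted since the maintenance.
should_restart() {
    [ -d "$1" ] && [ ! -f "$2" ]
}

SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunkforwarder}
MOUNT=/san/splunk                        # hypothetical SAN mount point
MARKER=/var/tmp/splunk_post_san_restart  # deliberately off the SAN

if should_restart "$MOUNT" "$MARKER"; then
    # Record that we handled it, then bounce the forwarder.
    touch "$MARKER"
    "$SPLUNK_HOME/bin/splunk" restart
fi
```

The marker file is the important part of the design: a scheduled scripted input fires repeatedly, so without it the forwarder would restart on every interval once the mount came back.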

Some other options:

  • an enterprise job-scheduling tool?
  • remote SSH has some nifty features for running the same commands across many hosts
  • make sure Splunk is a boot service and ask the SAN team to reboot the machines after their work
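The boot-service option maps to a stock forwarder command; a sketch assuming the default install path and a splunk service account:

```
# Run once as root; installs an init script so the UF starts at boot.
/opt/splunkforwarder/bin/splunk enable boot-start -user splunk
```

After that, a post-maintenance reboot by the SAN team brings the forwarder back with no per-host action from the application team.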