
What happens when Universal Forwarder loses its filesystem?

jhupka
Path Finder

Has anyone seen what happens to a Universal Forwarder when the filesystem it is running from goes away?

I just found out about some weekend maintenance to our network storage that will cause connectivity issues with the SAN mount points our Splunk UFs are installed on. I'm not sure what Splunk will do when the mount disappears, and I may not have much time to test this scenario.

A few basic thoughts I have on what would occur:

  • Splunk can’t log its own internal log files
  • Splunk can’t update its fishbucket data
  • Splunk can't read/run scripted inputs (not too worried about this, though - it is OK if we miss that data, since it is mostly *nix data)
  • Will Splunk continue forwarding data during this scenario?

How I could approach handling this:

  • Manually shut the Splunk forwarders down beforehand, and manually start them back up after the filesystem returns (most work for jhupka, but the safest scenario - see the sketch after this list)
  • Let things be, then trick the Splunk forwarders into restarting via the Deployment Server after the filesystem comes back so they start fresh (minimal work/coordination)
  • Do absolutely nothing (least work, jhupka gets to sleep in on Saturday morning)
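
For the first option, a minimal sketch of the stop/start commands. It assumes a standard *nix UF install under /opt/splunkforwarder - adjust for the real install path:

    # Before the SAN maintenance window (run as the user that owns Splunk):
    /opt/splunkforwarder/bin/splunk stop

    # After the mount points are confirmed healthy again:
    /opt/splunkforwarder/bin/splunk start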

sloshburch
Splunk Employee

As part of the SAN work, can service splunk stop and service splunk start be completed before/after the work? This assumes that the SAN workers have already coordinated other processes to shut down during the work. I'd be surprised if the SAN work didn't include coordinating graceful stops/starts of other production processes.
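
Assuming the forwarder is already registered as a boot service (splunk enable boot-start is what creates the init script that service uses), the SAN team's runbook entry could be as simple as:

    # Before the SAN work:
    sudo service splunk stop

    # After the SAN work:
    sudo service splunk start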


jhupka
Path Finder

That is a great idea... although I'm not 100% sure our sysadmins will be available for this, and the powers that be are pushing this type of work off to be handled manually by each individual application team.


sloshburch
Splunk Employee

Interesting. I used to have issues with search head pooling (SHP) where the mount would get lost, and it was all sorts of bad news for Splunk.

Here's a zany idea: I've used a scripted input to have Splunk restart itself. I don't know how much of the scripted input is kept in memory, but you could try an experiment: push out a scripted input job that runs after the work should be done and simply does a splunk restart on the forwarder. For the experiment, deploy the script to a different filesystem (using the deploymentclient or serverclass settings) so that even when the mount is gone, the forwarder still running in memory can load the restart script.
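
A rough sketch of what that experiment might look like. The /opt/splunk-local path, the Saturday 6 AM schedule, and the setsid trick are all assumptions to adapt - the point is just that the script itself lives off the SAN mount. In the deployed app's inputs.conf:

    [script:///opt/splunk-local/restart_uf.sh]
    # interval accepts a cron expression for scripted inputs;
    # here: 6 AM Saturday, after the SAN work should be finished
    interval = 0 6 * * 6
    disabled = false

And /opt/splunk-local/restart_uf.sh:

    #!/bin/sh
    # Detach from splunkd's process group so the restart isn't killed
    # when splunkd shuts itself down (an 'at now' job would also work)
    setsid /opt/splunkforwarder/bin/splunk restart </dev/null >/dev/null 2>&1 &

One caveat: the deployed inputs.conf itself lands under $SPLUNK_HOME/etc, which in this scenario sits on the SAN - so this only works if the forwarder really does keep the schedule in memory, which is exactly what the experiment would tell you.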

Some other options:

  • an enterprise job scheduling tool?
  • remote ssh has some nifty features for running the same command remotely across many hosts
  • make sure Splunk is enabled as a boot service and ask them to reboot the machines after their work (a sketch of these last two follows)
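
A sketch of those last two, assuming the default UF path and a hosts.txt file listing the forwarder hostnames (both are placeholders):

    # One-time, on each forwarder: register Splunk as a boot service
    sudo /opt/splunkforwarder/bin/splunk enable boot-start -user splunk

    # After the SAN work: start the forwarder on every host over ssh
    while read host; do
        ssh "$host" 'sudo service splunk start'
    done < hosts.txt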