Getting Data In

Looking for a workaround for Windows UFs not starting up after an improper shutdown (SPL-36597)

martin_mueller
SplunkTrust

Occasionally, our Windows terminal servers kill the UF service during shutdown, leaving a stale .pid file behind. This results in Splunk not starting up, requiring manual intervention. With a large number of Windows machines that's not an option, so I'm looking for a workaround. Splunk Support currently doesn't have a bug-fix schedule for me.

I see two ways: Either clean up the .pid file through a custom script/whatever, or make the UF shut down more quickly to skirt around the killing of the service during Windows shutdown.
How do you handle this issue?


dshakespeare_sp
Splunk Employee

http://docs.splunk.com/Documentation/Splunk/latest/ReleaseNotes/6.1.4

Resolved : Startup script should handle stale PID files gracefully after server crashes. (SPL-36597)


dshakespeare_sp
Splunk Employee

This should work for both *nix and Windows, I believe.


martin_mueller
SplunkTrust

Awesome, thanks David!

I don't have the means to test right now. Does that fix apply to both full and UF installs on both Windows and Unix platforms?


letienne
Path Finder

I have the same issue.

I just opened a support case (141636) to see whether they have any plan for a viable solution.

Kind regards,

martin_mueller
SplunkTrust

I have now tested this with 6.0, and the issue exists there as well.

martin_mueller
SplunkTrust

The key problem here is that during startup the forwarder (on both Windows and Linux) only checks whether its old PID exists as a process, but does not check whether that process actually is a Splunk process. As a result, the forwarder believes it is already running if a different process happens to have the old Splunk PID.
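To illustrate the missing check, here is a minimal Python sketch of the logic a cleanup script could apply before deleting a .pid file. It is not Splunk's startup code. The function name `pid_file_is_stale`, the `lookup_name` callback (which the caller would have to implement, for example on top of the platform's process list), and the expected process names are all assumptions for illustration.

```python
from typing import Callable, Optional, Sequence


def pid_file_is_stale(
    pid_text: str,
    lookup_name: Callable[[int], Optional[str]],
    expected_names: Sequence[str] = ("splunkd", "splunkd.exe"),  # assumed names
) -> bool:
    """Return True if the PID file content does not point at a live Splunk process.

    pid_text     -- raw content of the .pid file
    lookup_name  -- hypothetical callback: PID -> process name, or None if no
                    process with that PID exists
    """
    try:
        pid = int(pid_text.strip())
    except ValueError:
        # Empty or corrupted PID file: nothing valid to protect.
        return True

    name = lookup_name(pid)
    if name is None:
        # No process has this PID any more.
        return True

    # The PID exists, so also compare the process name. This is the check
    # described above as missing: a foreign process that reuses the PID
    # must not count as "Splunk is already running".
    return name.lower() not in {n.lower() for n in expected_names}
```

A wrapper script would read the .pid file, call this with a real process lookup, and remove the file only when the result is True.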


martin_mueller
SplunkTrust

I've done some more testing, and 5.0.5 does not fix this issue. Additionally, I've now found precise steps to reproduce, tested under 5.0.1 and 5.0.5:

  • Start a UF as a Windows Service
  • Kill the process
  • Edit the conf-mutator.pid file in /var/run/splunk to change the PID to the PID of an existing process. This simulates a different process having been assigned the UF's old PID while the UF was down. During system startup the chances of this are considerable.
  • Attempt to start the UF service. This will fail with logged events like this: FATAL loader - Timed out waiting for config lock
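To find machines that hit the failure from the last step, one could scan the forwarder's log for the quoted message. The following is a rough Python sketch; the function name and the idea of matching on the literal message text are my assumptions, not anything Splunk ships.

```python
from typing import Iterable, List

# Message text taken from the logged event quoted above.
LOCK_TIMEOUT_MARKER = "Timed out waiting for config lock"


def find_config_lock_failures(lines: Iterable[str]) -> List[str]:
    """Return all FATAL log lines that report the config lock timeout."""
    return [
        line.rstrip("\n")
        for line in lines
        if "FATAL" in line and LOCK_TIMEOUT_MARKER in line
    ]
```

Usage would be along the lines of `find_config_lock_failures(open(path_to_splunkd_log))`, where the log path depends on your install and is left to the caller.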

martin_mueller
SplunkTrust

In the release notes for 5.0.5 I see this entry:

• Splunk on Windows does not start/restart properly with deployment server, fails with FATAL loader - Timed out waiting for config lock; see splunkd_stderr.log for details. Exiting. (SPL-70075)

This feels similar to my issue; the logged events in case of a start failure are the same. Can anyone confirm that the two are related?


MuS
Legend

Hi martin_mueller

You forgot a third option: increasing the WaitToKillServiceTimeout registry value in Windows. I did not test it, but maybe this would be a way to go for you.

In HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control you will find the string value WaitToKillServiceTimeout.

If you double-click it, you can change the Value data in the Edit String window from the default of 12000 (12 seconds) to whatever you need. Click OK to save the change.
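For a large number of machines, the same change could be scripted instead of clicked. Below is an untested Python sketch that only builds the `reg add` command line for the value named above; the helper name `build_reg_command` and the 60000 ms figure in the usage note are hypothetical, and how you roll the command out (GPO, deployment tool, etc.) is up to you.

```python
from typing import List

# Key and value name as given above, using reg.exe's HKLM abbreviation.
CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control"
VALUE_NAME = "WaitToKillServiceTimeout"


def build_reg_command(timeout_ms: int) -> List[str]:
    """Build the reg.exe argument list that sets the service kill timeout.

    timeout_ms -- new timeout in milliseconds (stored as a string value)
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be a positive number of milliseconds")
    return [
        "reg", "add", CONTROL_KEY,
        "/v", VALUE_NAME,
        "/t", "REG_SZ",          # the value is a string, per the Edit String dialog
        "/d", str(timeout_ms),
        "/f",                    # overwrite the existing value without prompting
    ]
```

On a Windows host with administrative rights, something like `subprocess.run(build_reg_command(60000), check=True)` would apply it. A reboot is probably needed before the new timeout takes effect.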

hope this helps...

cheers, MuS

martin_mueller
SplunkTrust

Thanks for your input. That should work on its own; I'll have to see how feasible it is around here to change registry settings on thousands of machines.
