On one of our heavy forwarders (Windows Server 2012 R2) we're having an issue: when an app is installed or updated from the deployment server and a restart is required, the splunkd service shuts down but fails to start again. We then have to start the service manually.
The last two log entries before I manually restarted the service were:
11-14-2017 00:02:54.925 +0000 WARN ProcessRunner - Process with pid 11776 did not exit within a given grace period after being signaled to exit. Will have to forcibly terminate.
11-14-2017 00:02:55.263 +0000 INFO loader - All pipelines finished.
Thanks
I came across this post. I added these to the release notes a while ago:
http://docs.splunk.com/Documentation/DBX/3.1.3/ReleaseNotes/Releasenotes#Known_issues
DBX-4603
Windows Only: Hitting debug/refresh endpoint with DB Connect installed makes splunkweb not restart
Workaround: Restart splunk through services.msc
DBX-4387, DBX-4383
Splunk cannot be restarted via web interface, deployer or deployment server on Windows
Workaround: Use the CLI or the services control panel (services.msc) to restart Splunk
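If you want to script that workaround until a fix ships, here is a minimal Python sketch of a watchdog you could run from a scheduled task. It assumes the Windows service is named `Splunkd` (the usual name for a full Splunk Enterprise install, but check services.msc on your host) and that the task runs with rights to start services. The function names are made up for illustration, and it only wraps the standard Windows `sc query` and `net start` commands.

```python
import subprocess

# Assumption: default service name for a full Splunk Enterprise install.
# Verify the actual name in services.msc before relying on this.
SERVICE_NAME = "Splunkd"


def service_is_running(sc_output: str) -> bool:
    """Return True if `sc query` output reports the RUNNING state."""
    for line in sc_output.splitlines():
        # The state line looks like: "STATE              : 4  RUNNING"
        if line.strip().startswith("STATE"):
            return "RUNNING" in line
    # No STATE line at all (for example, the service was not found).
    return False


def ensure_running(service: str = SERVICE_NAME) -> bool:
    """Start the service if it is not running.

    Returns True if a start was attempted, False if it was already running.
    Raises CalledProcessError if `net start` fails.
    """
    result = subprocess.run(
        ["sc", "query", service], capture_output=True, text=True
    )
    if service_is_running(result.stdout):
        return False
    subprocess.run(["net", "start", service], check=True)
    return True
```

Calling `ensure_running()` every few minutes would bring splunkd back after a failed restart. Note that if the service is still in a stop-pending state when the check runs, `net start` will likely fail and the sketch simply raises, so the next scheduled run has to pick it up.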
Any update on this?
I am seeing this on several heavy forwarders in our environment...
Splunk Support have advised it's a known issue due to DB Connect on the HF.
A fix is currently being tested and should be out in the next release of DB Connect
Nice.
Thank you!
Did you find a fix for this issue? We're facing the same thing. We set up an alert that runs every 15 minutes to check that the HFs are up:
| inputlookup splunk_critical_systems.csv
| join type=outer host
[ search *
| stats count AS currentcount by host ]
| eval status=if(ignore="true", "ignored", if(isnull(currentcount), "missing", "present"))
| table host status
| search status=missing
Lookup table (splunk_critical_systems.csv):
host || ignore
HF1 || false
HF2 || false
HF_LINUX || true
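In case the logic of the search is not obvious, here is a rough Python sketch of what it does: every host in the lookup is kept (the outer join), hosts flagged `ignore=true` are marked ignored, and any remaining host with no event count is reported as missing. The function names and the sample counts are hypothetical. Only the three lookup rows come from the table above.

```python
def classify_hosts(lookup_rows, event_counts):
    """Mirror the eval in the search.

    lookup_rows:  list of dicts with "host" and "ignore" keys (the lookup).
    event_counts: dict mapping host -> event count (the stats subsearch).
    """
    results = []
    for row in lookup_rows:
        host = row["host"]
        # A host absent from the subsearch has no count, which is what
        # isnull(currentcount) detects after the outer join.
        count = event_counts.get(host)
        if row["ignore"] == "true":
            status = "ignored"
        elif count is None:
            status = "missing"
        else:
            status = "present"
        results.append({"host": host, "status": status})
    return results


def missing_hosts(lookup_rows, event_counts):
    """Equivalent of the final `| search status=missing`."""
    return [
        r["host"]
        for r in classify_hosts(lookup_rows, event_counts)
        if r["status"] == "missing"
    ]


# Lookup rows from the table above; the counts are made-up sample data.
lookup = [
    {"host": "HF1", "ignore": "false"},
    {"host": "HF2", "ignore": "false"},
    {"host": "HF_LINUX", "ignore": "true"},
]
counts = {"HF1": 120}  # HF2 and HF_LINUX sent nothing in the time window

print(missing_hosts(lookup, counts))
```

With this sample data only HF2 is reported, because HF_LINUX is flagged as ignored even though it sent no events.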
Still no fix. Nice alert, though! I'll try to implement it in our environment to help detect the problem.