Hello, I have an issue with my Splunk Universal Forwarder as it keeps randomly stopping on Windows server without any explanation. The error found in the logs is "The SplunkForwarder Service service terminated unexpectedly. It has done this 1 time(s)." Does anyone have any idea what might be causing this issue?
This question is a bit like "my car broke down, what's going on?" 😉
There can be so many things that can go wrong.
0. What version are you using? Are you perhaps running out of resources?
1. As @SanjayReddy already mentioned, check the log ($SPLUNK_HOME/var/log/splunk/splunkd.log)
2. If the log doesn't show anything "reasonable" (i.e. you see normal operation logs and then the file is abruptly cut), check if there is any file matching crash-*.log in the same directory. It might show the cause of the crash (see the sketch after this list).
3. The UF very rarely crashes on its own. Check the release notes for your version and subsequent ones to see if there are any known bugs that could be affecting you.
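A quick way to check for crash logs, just a sketch assuming the default Windows install path (adjust it if yours differs):

# List any crash logs splunkd may have left behind, newest first
Get-ChildItem "C:\Program Files\SplunkUniversalForwarder\var\log\splunk\crash-*.log" -ErrorAction SilentlyContinue |
    Sort-Object LastWriteTime -Descending |
    Select-Object Name, LastWriteTime

If it returns nothing, no crash log was written.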
Dear @PickleRick and @SanjayReddy
Thanks for replying.
I have checked the following:
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\etc\splunk.version.
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\log\splunk.
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\log\watchdog.
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\run\splunk\search_telemetry.
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\spool\splunk.
04-16-2023 09:23:34.493 +0300 INFO TcpOutputProc [12564 parsing] - _isHttpOutConfigured=NOT_CONFIGURED
04-16-2023 09:23:34.493 +0300 ERROR TcpOutputProc [12564 parsing] - LightWeightForwarder/UniversalForwarder not configured. Please configure outputs.conf.
04-16-2023 09:23:34.493 +0300 INFO ConfigWatcher [11140 SplunkConfigChangeWatcherThread] - SplunkConfigChangeWatcher initializing...
04-16-2023 09:23:34.493 +0300 INFO ConfigWatcher [11140 SplunkConfigChangeWatcherThread] - Watching path: C:\Program Files\SplunkUniversalForwarder\etc\system\local, C:\Program Files\SplunkUniversalForwarder\etc\system\default, C:\Program Files\SplunkUniversalForwarder\etc\apps, C:\Program Files\SplunkUniversalForwarder\etc\users, C:\Program Files\SplunkUniversalForwarder\etc\peer-apps, C:\Program Files\SplunkUniversalForwarder\etc\instance.cfg
04-16-2023 09:23:34.493 +0300 INFO ConfigWatcher [11140 SplunkConfigChangeWatcherThread] - Finding the deleted watched configuration files (while splunkd was down) completed in duration=0 secs
04-16-2023 09:23:34.493 +0300 INFO loader [6504 HTTPDispatch] - Limiting REST HTTP server to 3333 sockets
04-16-2023 09:23:34.493 +0300 INFO loader [6504 HTTPDispatch] - Limiting REST HTTP server to 1365 threads
04-16-2023 09:23:34.493 +0300 WARN X509Verify [6504 HTTPDispatch] - X509 certificate (O=SplunkUser,CN=SplunkServerDefaultCert) should not be used, as it is issued by Splunk's own default Certificate Authority (CA). This puts your Splunk instance at very high-risk of the MITM attack. Either commercial-CA-signed or self-CA-signed certificates must be used; see: <http://docs.splunk.com/Documentation/Splunk/latest/Security/Howtoself-signcertificates>
04-16-2023 09:23:34.540 +0300 INFO UiHttpListener [10672 WebuiStartup] - Web UI disabled in web.conf [settings]; not starting
04-16-2023 09:23:40.524 +0300 WARN TailReader [5296 tailreader0] - Could not send data to output queue (parsingQueue), retrying...
04-16-2023 09:23:58.338 +0300 INFO loader [6504 HTTPDispatch] - Shutdown HTTPDispatchThread
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - Shutting down splunkd
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_Begin"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_NoahHealthReport"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_FileIntegrityChecker"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_JustBeforeKVStore"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_KVStore"
04-16-2023 09:23:58.338 +0300 INFO CollectionCacheManager [10296 CollectionCacheBookkeepingThread] - CollectionCacheBookkeepingThread finished eloop
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_DFM"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_Thruput"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_FederatedHeartBeat"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_TcpInput1"
04-16-2023 09:23:58.338 +0300 INFO TcpInputProc [3020 Shutdown] - Running shutdown level 1. Closing listening ports.
04-16-2023 09:23:58.338 +0300 INFO TcpInputProc [3020 Shutdown] - Done setting shutdown in progress signal.
04-16-2023 09:23:58.338 +0300 INFO TcpInputProc [10636 TcpListener] - Shutting down listening ports
It looks more or less like a normal shutdown, but there is one interesting thing:
04-16-2023 09:23:40.524 +0300 WARN TailReader [5296 tailreader0] - Could not send data to output queue (parsingQueue), retrying...
This one.
It suggests (though I'm mostly shooting blindly here) that there might be memory pressure, blocked queues and so on.
But again - is there any crash log in forwarder's log directory?
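If you want to verify the blocked-queue theory, one way (again assuming the default install path) is to look for blocked queue entries in metrics.log on the forwarder:

# Blocked queues are reported in metrics.log as group=queue ... blocked=true
Select-String -Path "C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log" -Pattern "group=queue.*blocked=true" |
    Select-Object -Last 20

If that keeps matching around the time the service stops, the forwarder really is backing up.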
Hi @PickleRick
Thank you for your reply.
The weird thing is that the server itself didn't shut down, only the Splunk service. And regarding the crash log - no, I didn't find a crash log on any of the servers facing this issue.
No, the server as such should not crash. Why would it? That's what OS-level resource management is for 😉
Anyway, how many and what kinds of inputs do you have on this box? Are you hitting any limits (like throttled outputs building up queues on the forwarder side)? What are the server's specs (RAM/CPU)? Is it busy otherwise?
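One limit worth checking on a UF is the default thruput cap (256 KBps out of the box, if I recall correctly). If you need to raise or remove it, it goes into limits.conf, e.g. %SPLUNK_HOME%\etc\system\local\limits.conf. Just a sketch, tune the value to your environment:

# limits.conf on the forwarder - 0 removes the throughput cap entirely
[thruput]
maxKBps = 0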
Hi @daniaabujuma
Can you please check for any error messages in splunkd.log in the <Splunk installation dir>\var\log\splunk directory?
Depending on the error message, we can troubleshoot further.
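For example, something like this (assuming the default install path) will pull the most recent errors:

# Show the last 50 ERROR/FATAL lines from splunkd.log
Get-Content "C:\Program Files\SplunkUniversalForwarder\var\log\splunk\splunkd.log" |
    Select-String -Pattern "ERROR|FATAL" |
    Select-Object -Last 50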