Splunk Enterprise

Why is Splunk universal forwarder crashing randomly?

daniaabujuma
Explorer

Hello, I have an issue with my Splunk Universal Forwarder: it keeps stopping at random on a Windows server without any explanation. The error found in the logs is "The SplunkForwarder Service service terminated unexpectedly.  It has done this 1 time(s)." Does anyone have any idea what might be causing this?


PickleRick
SplunkTrust

This question is a bit like "my car broke down, what's going on?" 😉

So many things can go wrong.

0. What version are you using? Is the server running out of resources?

1. As @SanjayReddy already mentioned, check the log ($SPLUNK_HOME/var/log/splunk/splunkd.log)

2. If the log doesn't show anything "reasonable" (i.e. you see normal operation logs and then the file is abruptly cut), check if there is any file matching crash-*.log in the same directory. It might show the cause of the crash.

3. The UF very rarely crashes on its own. Check the release notes for your version and the subsequent ones to see if there are known bugs which could be affecting you.
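Steps 1 and 2 can be sketched as a small script. This is only a rough sketch, not a Splunk tool: the function name `find_crash_logs` is made up for illustration, and the default install path is an assumption, so adjust it to wherever your forwarder lives.

```python
from pathlib import Path

# Assumed default install location of the Windows UF; adjust for your system.
DEFAULT_SPLUNK_HOME = Path(r"C:\Program Files\SplunkUniversalForwarder")


def find_crash_logs(splunk_home):
    """Return crash-*.log files from <splunk_home>/var/log/splunk, sorted by name."""
    log_dir = Path(splunk_home) / "var" / "log" / "splunk"
    if not log_dir.is_dir():
        # Wrong SPLUNK_HOME or no log directory: nothing to report.
        return []
    return sorted(log_dir.glob("crash-*.log"))


if __name__ == "__main__":
    crash_logs = find_crash_logs(DEFAULT_SPLUNK_HOME)
    if crash_logs:
        for path in crash_logs:
            print(path)
    else:
        print("No crash-*.log files found")
```

If this prints nothing useful, that by itself is a hint: the process may have been stopped or killed from outside rather than crashing on its own.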


daniaabujuma
Explorer

Dear @PickleRick  and @SanjayReddy 

Thanks for replying.

I have checked the following:

  1. Splunk Universal Forwarder for Windows version: 9.0.4. The release notes list a UF crash on EventLoop::run assert rv > 0 among the fixed issues ( https://docs.splunk.com/Documentation/Splunk/9.0.4/ReleaseNotes/Fixedissues ).
  2. I have checked splunkd.log on one of the servers where Splunk has crashed, but unfortunately I haven't found anything that explains it. Here is a part of the log:
    • 04-16-2023 09:23:34.477 +0300 INFO  TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\etc\splunk.version.

      04-16-2023 09:23:34.477 +0300 INFO  TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\log\splunk.

      04-16-2023 09:23:34.477 +0300 INFO  TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\log\watchdog.

      04-16-2023 09:23:34.477 +0300 INFO  TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\run\splunk\search_telemetry.

      04-16-2023 09:23:34.477 +0300 INFO  TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\spool\splunk.

      04-16-2023 09:23:34.493 +0300 INFO  TcpOutputProc [12564 parsing] - _isHttpOutConfigured=NOT_CONFIGURED

      04-16-2023 09:23:34.493 +0300 ERROR TcpOutputProc [12564 parsing] - LightWeightForwarder/UniversalForwarder not configured. Please configure outputs.conf.

      04-16-2023 09:23:34.493 +0300 INFO  ConfigWatcher [11140 SplunkConfigChangeWatcherThread] - SplunkConfigChangeWatcher initializing...

      04-16-2023 09:23:34.493 +0300 INFO  ConfigWatcher [11140 SplunkConfigChangeWatcherThread] - Watching path: C:\Program Files\SplunkUniversalForwarder\etc\system\local, C:\Program Files\SplunkUniversalForwarder\etc\system\default, C:\Program Files\SplunkUniversalForwarder\etc\apps, C:\Program Files\SplunkUniversalForwarder\etc\users, C:\Program Files\SplunkUniversalForwarder\etc\peer-apps, C:\Program Files\SplunkUniversalForwarder\etc\instance.cfg

      04-16-2023 09:23:34.493 +0300 INFO  ConfigWatcher [11140 SplunkConfigChangeWatcherThread] - Finding the deleted watched configuration files (while splunkd was down) completed in duration=0 secs

      04-16-2023 09:23:34.493 +0300 INFO  loader [6504 HTTPDispatch] - Limiting REST HTTP server to 3333 sockets

      04-16-2023 09:23:34.493 +0300 INFO  loader [6504 HTTPDispatch] - Limiting REST HTTP server to 1365 threads

      04-16-2023 09:23:34.493 +0300 WARN  X509Verify [6504 HTTPDispatch] - X509 certificate (O=SplunkUser,CN=SplunkServerDefaultCert) should not be used, as it is issued by Splunk's own default Certificate Authority (CA). This puts your Splunk instance at very high-risk of the MITM attack. Either commercial-CA-signed or self-CA-signed certificates must be used; see: <http://docs.splunk.com/Documentation/Splunk/latest/Security/Howtoself-signcertificates>

      04-16-2023 09:23:34.540 +0300 INFO  UiHttpListener [10672 WebuiStartup] - Web UI disabled in web.conf [settings]; not starting

      04-16-2023 09:23:40.524 +0300 WARN  TailReader [5296 tailreader0] - Could not send data to output queue (parsingQueue), retrying...

      04-16-2023 09:23:58.338 +0300 INFO  loader [6504 HTTPDispatch] - Shutdown HTTPDispatchThread

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - Shutting down splunkd

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_Begin"

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_NoahHealthReport"

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_FileIntegrityChecker"

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_JustBeforeKVStore"

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_KVStore"

      04-16-2023 09:23:58.338 +0300 INFO  CollectionCacheManager [10296 CollectionCacheBookkeepingThread] - CollectionCacheBookkeepingThread finished eloop

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_DFM"

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_Thruput"

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_FederatedHeartBeat"

      04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_TcpInput1"

      04-16-2023 09:23:58.338 +0300 INFO  TcpInputProc [3020 Shutdown] - Running shutdown level 1. Closing listening ports.

      04-16-2023 09:23:58.338 +0300 INFO  TcpInputProc [3020 Shutdown] - Done setting shutdown in progress signal.

      04-16-2023 09:23:58.338 +0300 INFO  TcpInputProc [10636 TcpListener] - Shutting down listening ports

 

 


PickleRick
SplunkTrust

It looks more or less like a normal shutdown but there is one interesting thing.

04-16-2023 09:23:40.524 +0300 WARN  TailReader [5296 tailreader0] - Could not send data to output queue (parsingQueue), retrying...

 This one.

It suggests (though I'm mostly shooting blindly here) that there might be blocked queues, some overuse of memory and so on.

But again - is there any crash log in the forwarder's log directory?
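To pull lines like that out of a longer splunkd.log, something along these lines might help. It is a rough sketch: the regular expression only assumes the line layout visible in the excerpt above, and the name `problems_before_shutdown` is made up for illustration.

```python
import re

# Layout assumed from the excerpt above:
# "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL  Component [tid thread] - message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+)\s+(?:\[[^\]]*\]\s+)?-\s(?P<message>.*)$"
)


def problems_before_shutdown(lines):
    """Collect WARN/ERROR/FATAL entries logged before splunkd starts shutting down."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank or continuation line
        if m["component"] == "Shutdown" and "Shutting down splunkd" in m["message"]:
            break  # everything after this is the shutdown sequence itself
        if m["level"] in {"WARN", "ERROR", "FATAL"}:
            found.append((m["ts"], m["level"], m["component"], m["message"]))
    return found


if __name__ == "__main__":
    # Lines copied from the log posted above.
    sample = [
        "04-16-2023 09:23:34.493 +0300 ERROR TcpOutputProc [12564 parsing] - LightWeightForwarder/UniversalForwarder not configured. Please configure outputs.conf.",
        "04-16-2023 09:23:34.540 +0300 INFO  UiHttpListener [10672 WebuiStartup] - Web UI disabled in web.conf [settings]; not starting",
        "04-16-2023 09:23:40.524 +0300 WARN  TailReader [5296 tailreader0] - Could not send data to output queue (parsingQueue), retrying...",
        "04-16-2023 09:23:58.338 +0300 INFO  Shutdown [3020 Shutdown] - Shutting down splunkd",
    ]
    for ts, level, component, message in problems_before_shutdown(sample):
        print(ts, level, component, message)
```

On the posted excerpt this surfaces the TcpOutputProc error about outputs.conf as well as the TailReader warning, so both are worth a look.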


daniaabujuma
Explorer

Hi @PickleRick 

Thank you for your reply.

The weird thing is that the server itself didn't shut down, only the Splunk service. As for the crash log: no, I didn't find one on any of the servers facing this issue.


PickleRick
SplunkTrust

No, the server as such should not crash. Why would it? That's what the OS-level resource management is for 😉

Anyway, how many inputs do you have on this box, and of what kinds? Are you hitting any limits (like throttled outputs and queues building up on the forwarder side)? What are the server's specs (RAM/CPU)? Is it busy otherwise?
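One way to look for queues building up is to scan the forwarder's metrics.log for blocked queues. The sketch below rests on an assumption you should verify against your own file: that queue entries are written as comma-separated key=value pairs containing group=queue and name=..., with blocked=true added when a queue is blocked. The function name `count_blocked_queues` is made up for illustration.

```python
from collections import Counter


def count_blocked_queues(lines):
    """Count, per queue name, the metrics.log entries that report blocked=true."""
    blocked = Counter()
    for line in lines:
        if "group=queue" not in line:
            continue
        # Assumption: everything after " - " is "key=value, key=value, ...".
        _, _, payload = line.partition(" - ")
        fields = {}
        for part in payload.split(","):
            key, sep, value = part.strip().partition("=")
            if sep:
                fields[key] = value
        if fields.get("blocked") == "true":
            blocked[fields.get("name", "unknown")] += 1
    return blocked
```

Pass it an open file handle for metrics.log (it sits in the same directory as splunkd.log) and print the result of `.most_common()`. A queue that shows up as blocked again and again would fit the "Could not send data to output queue" warning.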


SanjayReddy
SplunkTrust

Hi @daniaabujuma 

Can you please check for any error messages in splunkd.log in <Splunk installation dir>\var\log\splunk?

Depending on the error message, we will need to troubleshoot further.
