Hello, I have an issue with my Splunk Universal Forwarder as it keeps randomly stopping on Windows server without any explanation. The error found in the logs is "The SplunkForwarder Service service terminated unexpectedly. It has done this 1 time(s)." Does anyone have any idea what might be causing this issue?
This question is a bit like "my car broke down, what's going on?" 😉
There can be so many things that can go wrong.
0. What version are you using? Are you perhaps running out of resources?
1. As @SanjayReddy already mentioned, check the log ($SPLUNK_HOME/var/log/splunk/splunkd.log)
2. If the log doesn't show anything "reasonable" (i.e. you see normal operation logs and then the file is abruptly cut), check if there is any file matching crash-*.log in the same directory. It might show the cause of the crash (see the sketch after this list).
3. The UF very rarely crashes on its own. Check the release notes for your version and subsequent ones to see if there are any known bugs that could be affecting you.
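A quick way to check for crash logs, just a sketch assuming the default Windows install path (adjust it if yours differs):

# List any crash logs splunkd may have left behind, newest first
Get-ChildItem "C:\Program Files\SplunkUniversalForwarder\var\log\splunk\crash-*.log" -ErrorAction SilentlyContinue |
    Sort-Object LastWriteTime -Descending |
    Select-Object Name, LastWriteTime

If it returns nothing, no crash log was written.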
Dear @PickleRick and @SanjayReddy
Thanks for replying.
I have checked the following:
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\etc\splunk.version.
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\log\splunk.
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\log\watchdog.
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\run\splunk\search_telemetry.
04-16-2023 09:23:34.477 +0300 INFO TailingProcessor [12784 MainTailingThread] - Adding watch on path: C:\Program Files\SplunkUniversalForwarder\var\spool\splunk.
04-16-2023 09:23:34.493 +0300 INFO TcpOutputProc [12564 parsing] - _isHttpOutConfigured=NOT_CONFIGURED
04-16-2023 09:23:34.493 +0300 ERROR TcpOutputProc [12564 parsing] - LightWeightForwarder/UniversalForwarder not configured. Please configure outputs.conf.
04-16-2023 09:23:34.493 +0300 INFO ConfigWatcher [11140 SplunkConfigChangeWatcherThread] - SplunkConfigChangeWatcher initializing...
04-16-2023 09:23:34.493 +0300 INFO ConfigWatcher [11140 SplunkConfigChangeWatcherThread] - Watching path: C:\Program Files\SplunkUniversalForwarder\etc\system\local, C:\Program Files\SplunkUniversalForwarder\etc\system\default, C:\Program Files\SplunkUniversalForwarder\etc\apps, C:\Program Files\SplunkUniversalForwarder\etc\users, C:\Program Files\SplunkUniversalForwarder\etc\peer-apps, C:\Program Files\SplunkUniversalForwarder\etc\instance.cfg
04-16-2023 09:23:34.493 +0300 INFO ConfigWatcher [11140 SplunkConfigChangeWatcherThread] - Finding the deleted watched configuration files (while splunkd was down) completed in duration=0 secs
04-16-2023 09:23:34.493 +0300 INFO loader [6504 HTTPDispatch] - Limiting REST HTTP server to 3333 sockets
04-16-2023 09:23:34.493 +0300 INFO loader [6504 HTTPDispatch] - Limiting REST HTTP server to 1365 threads
04-16-2023 09:23:34.493 +0300 WARN X509Verify [6504 HTTPDispatch] - X509 certificate (O=SplunkUser,CN=SplunkServerDefaultCert) should not be used, as it is issued by Splunk's own default Certificate Authority (CA). This puts your Splunk instance at very high-risk of the MITM attack. Either commercial-CA-signed or self-CA-signed certificates must be used; see: <http://docs.splunk.com/Documentation/Splunk/latest/Security/Howtoself-signcertificates>
04-16-2023 09:23:34.540 +0300 INFO UiHttpListener [10672 WebuiStartup] - Web UI disabled in web.conf [settings]; not starting
04-16-2023 09:23:40.524 +0300 WARN TailReader [5296 tailreader0] - Could not send data to output queue (parsingQueue), retrying...
04-16-2023 09:23:58.338 +0300 INFO loader [6504 HTTPDispatch] - Shutdown HTTPDispatchThread
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - Shutting down splunkd
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_Begin"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_NoahHealthReport"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_FileIntegrityChecker"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_JustBeforeKVStore"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_KVStore"
04-16-2023 09:23:58.338 +0300 INFO CollectionCacheManager [10296 CollectionCacheBookkeepingThread] - CollectionCacheBookkeepingThread finished eloop
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_DFM"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_Thruput"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_FederatedHeartBeat"
04-16-2023 09:23:58.338 +0300 INFO Shutdown [3020 Shutdown] - shutting down level="ShutdownLevel_TcpInput1"
04-16-2023 09:23:58.338 +0300 INFO TcpInputProc [3020 Shutdown] - Running shutdown level 1. Closing listening ports.
04-16-2023 09:23:58.338 +0300 INFO TcpInputProc [3020 Shutdown] - Done setting shutdown in progress signal.
04-16-2023 09:23:58.338 +0300 INFO TcpInputProc [10636 TcpListener] - Shutting down listening ports
It looks more or less like a normal shutdown, but there is one interesting thing:
04-16-2023 09:23:40.524 +0300 WARN TailReader [5296 tailreader0] - Could not send data to output queue (parsingQueue), retrying...
This one.
It suggests (though I'm mostly shooting blindly here) that there might be memory pressure, blocked queues and so on.
But again - is there any crash log in forwarder's log directory?
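If you want to verify the blocked-queue theory, one way (again assuming the default install path) is to look for blocked queue entries in metrics.log on the forwarder:

# Blocked queues are reported in metrics.log as group=queue ... blocked=true
Select-String -Path "C:\Program Files\SplunkUniversalForwarder\var\log\splunk\metrics.log" -Pattern "group=queue.*blocked=true" |
    Select-Object -Last 20

If that keeps matching around the time the service stops, the forwarder really is backing up.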
Hi @PickleRick
Thank you for your reply.
The weird thing is that the server itself didn't shut down, only the Splunk service. And regarding the crash log - no, I didn't find a crash log on any of the servers facing this issue.
No, the server as such should not crash. Why would it? That's what OS-level resource management is for 😉
Anyway, how many and what kinds of inputs do you have on this box? Are you hitting any limits (like throttled outputs building up queues on the forwarder side)? What are the server's specs (RAM/CPU)? Is it busy otherwise?
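One limit worth checking on a UF is the default thruput cap (256 KBps out of the box, if I recall correctly). If you need to raise or remove it, it goes into limits.conf, e.g. %SPLUNK_HOME%\etc\system\local\limits.conf. Just a sketch, tune the value to your environment:

# limits.conf on the forwarder - 0 removes the throughput cap entirely
[thruput]
maxKBps = 0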
Hi @daniaabujuma
Can you please check for any error messages in splunkd.log in the <Splunk installation dir>\var\log\splunk directory?
Depending on the error message, we can troubleshoot further.
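For example, something like this (assuming the default install path) will pull the most recent errors:

# Show the last 50 ERROR/FATAL lines from splunkd.log
Get-Content "C:\Program Files\SplunkUniversalForwarder\var\log\splunk\splunkd.log" |
    Select-String -Pattern "ERROR|FATAL" |
    Select-Object -Last 50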