I have noticed that the Splunk 7.2.9.1 Universal Forwarder on SUSE Linux 12.4 stops communicating with the deployment server and stops forwarding logs to the indexer after a certain period of time. The "splunkd" process still appears to be running while the issue persists. I have to restart the UF for it to resume communicating with the deployment server and forwarding logs, but it stops again after a certain period of time.
I cannot see any specific entries in splunkd.log while this issue occurs. However, I noticed the message below in watchdog.log:
06-16-2020 11:51:09.055 +0200 ERROR Watchdog - No response received from IMonitoredThread=0x7f24365fdcd0 within 8000 ms. Looks like thread name='Shutdown' is busy !? Starting to trace with 8000 ms interval.
Can somebody help me understand what is causing this issue?
In limits.conf, try changing file_tracking_db_threshold_mb in the [inputproc] stanza to a lower value.
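A minimal sketch of that change, assuming the override is placed in $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder. The value 100 is only an illustrative assumption, not a recommendation; check the limits.conf documentation for your version before picking a number.

```
# $SPLUNK_HOME/etc/system/local/limits.conf (assumed location of the local override)
[inputproc]
# Illustrative value only; choose something lower than your current setting
file_tracking_db_threshold_mb = 100
```

The forwarder most likely needs a restart for the new value to take effect.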
@josephgreenson, I see the same issue with Splunk UF 7.2.8: the forwarder stops sending events to the indexers. We observed that a sudden spike in server log volume leaves the forwarder in a hung state for a few hours, and users notice the delay/latency. The spike happens because users enable debug mode on the servers, and we cannot change that logging behavior.
Did you find a solution?
Hi,
For us, it turned out to be a platform issue (a DNS lookup deadlock) rather than a Splunk issue. You can take a look at the case summary below.
----------------------------------------------------------------------------------------------------------
Analysis/Troubleshooting(if applicable):
06-23-2020 22:41:16.323 +0200 INFO WatchdogActions - WatchdogActionsManager reload started.
06-23-2020 22:41:16.323 +0200 INFO Watchdog - Starting WatchdogThread for process pid=6433. Threads monitoring is enabled with response timeout set to 8000 ms.
06-24-2020 13:52:42.871 +0200 ERROR Watchdog - No response received from IMonitoredThread=0x7f896bffecc0 within 8000 ms. Looks like thread name='TcpOutEloop' is busy !? Starting to trace with 8000 ms interval.
06-23-2020 22:42:05.642 +0200 INFO ProxyConfig - Failed to initialize https_proxy from server.conf for splunkd. Please make sure that the https_proxy property is set as https_proxy=http://host:port in case HTTP proxying needs to be enabled.
06-23-2020 22:42:05.642 +0200 INFO ProxyConfig - Failed to initialize the no_proxy setting from server.conf for splunkd. Please provide a valid set of no_proxy rules in case HTTP proxying needs to be enabled.
06-23-2020 22:42:16.312 +0200 INFO DC:HandshakeReplyHandler - Handshake done.
06-24-2020 17:30:04.556 +0200 WARN FileClassifierManager - The file '/var/log/messages-20200623.xz' is invalid. Reason: binary.
06-24-2020 17:30:04.556 +0200 INFO TailReader - Ignoring file '/var/log/messages-20200623.xz' due to: binary
06-24-2020 17:30:08.549 +0200 INFO WatchedFile - File too small to check seekcrc, probably truncated. Will re-read entire file='/var/log/messages'.
06-24-2020 22:41:16.323 +0200 INFO ApplicationLicense - app license disabled by conf setting.
Thread 32 (Thread 0x7f49f43ff700 (LWP 24008)):
#0 0x00007f4a03501bdd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
#1 0x00007f4a034fc803 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#2 0x00007f49f01d385e in ?? () from /usr/lib64/libnss_ldap.so.2
#3 0x00007f49f01d53bc in ?? () from /usr/lib64/libnss_ldap.so.2
#4 0x00007f49f01d7cf4 in _nss_ldap_gethostbyname2_r () from /usr/lib64/libnss_ldap.so.2
#5 0x00007f49f01d7d92 in _nss_ldap_gethostbyname_r () from /usr/lib64/libnss_ldap.so.2
Root Cause(If Applicable):
Resolution/Workaround:
-----------------------------------------------------------------------------------------------
Regards,
Joseph
Good to know. If you want, you could try adding an entry to /etc/hosts to avoid this DNS issue.
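To check whether name resolution on the host is slow or hanging (before and after adding an /etc/hosts entry), something like the following could be used. This is only a sketch: the function name `timed_lookup`, the 8-second default (chosen to mirror the watchdog interval in the logs above), and the injectable `resolver` parameter are my own assumptions, not anything from Splunk. It runs the lookup in a daemon thread so that a resolver stuck in the NSS layer cannot block the script itself.

```python
import socket
import threading
import time


def timed_lookup(hostname, timeout_s=8.0, resolver=socket.gethostbyname):
    """Resolve hostname in a daemon thread and report status and elapsed time.

    resolver is injectable so the helper can be exercised without real DNS.
    """
    outcome = []  # the worker appends exactly one (status, address) tuple

    def worker():
        try:
            outcome.append(("ok", resolver(hostname)))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            outcome.append((f"error: {exc}", None))

    start = time.monotonic()
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)  # give up waiting after timeout_s seconds
    elapsed = time.monotonic() - start

    # No outcome after the join means the lookup is still stuck in the resolver
    status, address = outcome[0] if outcome else ("timeout", None)
    return {
        "host": hostname,
        "status": status,
        "address": address,
        "elapsed_s": round(elapsed, 3),
    }
```

Calling `timed_lookup()` with the hostname of your deployment server or indexer returns a small dict; a status of "timeout" or a large `elapsed_s` would point at the resolver rather than at Splunk. Note that an /etc/hosts entry only helps if the hosts line in /etc/nsswitch.conf consults `files` before `ldap`.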