Hi,

For us, it turned out to be a platform issue (a DNS lookup deadlock) rather than a Splunk issue. You can take a look at the case summary below.

----------------------------------------------------------------------------------------------------------
Analysis/Troubleshooting (if applicable):

The Watchdog alert triggers when there is a busy thread with a response time greater than 8 seconds:

06-23-2020 22:41:16.323 +0200 INFO WatchdogActions - WatchdogActionsManager reload started.
06-23-2020 22:41:16.323 +0200 INFO Watchdog - Starting WatchdogThread for process pid=6433. Threads monitoring is enabled with response timeout set to 8000 ms.
06-24-2020 13:52:42.871 +0200 ERROR Watchdog - No response received from IMonitoredThread=0x7f896bffecc0 within 8000 ms. Looks like thread name='TcpOutEloop' is busy !? Starting to trace with 8000 ms interval.

During that time, there is no output in splunkd.log:

06-23-2020 22:42:05.642 +0200 INFO ProxyConfig - Failed to initialize https_proxy from server.conf for splunkd. Please make sure that the https_proxy property is set as https_proxy=http://host:port in case HTTP proxying needs to be enabled.
06-23-2020 22:42:05.642 +0200 INFO ProxyConfig - Failed to initialize the no_proxy setting from server.conf for splunkd. Please provide a valid set of no_proxy rules in case HTTP proxying needs to be enabled.
06-23-2020 22:42:16.312 +0200 INFO DC:HandshakeReplyHandler - Handshake done.
06-24-2020 17:30:04.556 +0200 WARN FileClassifierManager - The file '/var/log/messages-20200623.xz' is invalid. Reason: binary.
06-24-2020 17:30:04.556 +0200 INFO TailReader - Ignoring file '/var/log/messages-20200623.xz' due to: binary
06-24-2020 17:30:08.549 +0200 INFO WatchedFile - File too small to check seekcrc, probably truncated. Will re-read entire file='/var/log/messages'.
06-24-2020 22:41:16.323 +0200 INFO ApplicationLicense - app license disabled by conf setting.

Initially suspected to be a known issue related to SPL-99316 or SPL-184263.
The workaround was applied, but it did not fix the issue. A coredump and pstack were then collected; they show a deadlock related to DNS lookup (nss_ldap_gethostbyname):

Thread 32 (Thread 0x7f49f43ff700 (LWP 24008)):
#0 0x00007f4a03501bdd in __lll_lock_wait () from /usr/lib64/libpthread.so.0
#1 0x00007f4a034fc803 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#2 0x00007f49f01d385e in ?? () from /usr/lib64/libnss_ldap.so.2
#3 0x00007f49f01d53bc in ?? () from /usr/lib64/libnss_ldap.so.2
#4 0x00007f49f01d7cf4 in _nss_ldap_gethostbyname2_r () from /usr/lib64/libnss_ldap.so.2
#5 0x00007f49f01d7d92 in _nss_ldap_gethostbyname_r () from /usr/lib64/libnss_ldap.so.2

Root Cause (if applicable):

Resolution/Workaround:
1. Based on the update from sustaining, it is a DNS lookup issue, not a Splunk issue.
2. There is a similar issue described at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=340601
3. Suggest checking the DNS and network configurations between the working and non-working SuSE servers.
4. As a possible workaround, please pick one non-working SuSE server and replace MUN-SCE-TSISplunkIndex3.europe.shell.com (in outputs.conf) with its IP address to bypass the DNS lookup.
-----------------------------------------------------------------------------------------------

Regards,
Joseph
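
For comparing a working and a non-working server (step 3 above), a rough sketch like the following might help to see whether hostname resolution itself stalls on a given host. It is not part of the case notes: the function name resolve_with_timeout and the 8-second default (chosen only to mirror the Watchdog timeout) are assumptions for illustration. It relies on Python's socket.gethostbyname, which on Linux normally goes through the system resolver configured in nsswitch.conf, so it may not reproduce a deadlock that only occurs between threads inside splunkd.

```python
import socket
import threading
import time


def resolve_with_timeout(host, timeout_s=8.0, resolver=socket.gethostbyname):
    """Resolve `host` in a worker thread and report whether it finished in time.

    Returns (status, detail, elapsed_seconds), where status is one of
    "ok", "error", or "timeout". `resolver` is injectable so the function
    can be exercised without touching real DNS.
    """
    result = {}

    def worker():
        try:
            result["value"] = resolver(host)
        except Exception as exc:  # e.g. socket.gaierror for an unknown host
            result["error"] = exc

    start = time.monotonic()
    # Daemon thread: a lookup stuck in the resolver must not block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    elapsed = time.monotonic() - start

    if thread.is_alive():
        # The lookup has not returned within timeout_s.
        return ("timeout", None, elapsed)
    if "error" in result:
        return ("error", repr(result["error"]), elapsed)
    return ("ok", result["value"], elapsed)
```

Calling resolve_with_timeout() with the indexer hostname from outputs.conf on both a working and a non-working server, and comparing the status and elapsed time, would give a first indication of whether name resolution behaves differently between them.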