Splunk Enterprise

Splunk 7.2.9.1 Universal forwarder on SUSE Linux12.4 not communicating and forwarding logs to Indexer after certain peri

josephgreenson
New Member

I have noticed that the Splunk 7.2.9.1 universal forwarder on SUSE Linux 12.4 stops communicating with the deployment server and stops forwarding logs to the indexer after a certain period of time. The "splunkd" process still appears to be running while the issue persists. I have to restart the forwarder for it to resume communicating with the deployment server and forwarding logs, but it stops again after a certain period of time.

I cannot see any specific logs in splunkd.log while this issue occurs. However, I noticed the message below in watchdog.log:

06-16-2020 11:51:09.055 +0200 ERROR Watchdog - No response received from IMonitoredThread=0x7f24365fdcd0 within 8000 ms. Looks like thread name='Shutdown' is busy !? Starting to trace with 8000 ms interval.

Can somebody help me understand what is causing this issue?


kiragsplunk
Explorer

In limits.conf, try changing file_tracking_db_threshold_mb in the [inputproc] stanza to a lower value.


CONSORP
Loves-to-Learn Lots

@josephgreenson, I noticed the same issue with version 7.2.8 of the Splunk UF: it stops sending events to the indexers. We observed a sudden spike in server log volume that leaves the forwarder in a hung state for a few hours, and users notice the delay/latency. The spike happens because users enable debug mode on the servers, and that logging behavior is something we cannot change.

Did you find a solution?


josephgreenson
New Member

Hi, 

For us, it turned out to be a platform issue (a DNS lookup deadlock) rather than a Splunk issue. You can take a look at the case summary below.

----------------------------------------------------------------------------------------------------------

Analysis/Troubleshooting(if applicable):

  1. The watchdog alert triggers when a monitored thread fails to respond within 8 seconds (8000 ms).

 

06-23-2020 22:41:16.323 +0200 INFO WatchdogActions - WatchdogActionsManager reload started.

06-23-2020 22:41:16.323 +0200 INFO Watchdog - Starting WatchdogThread for process pid=6433. Threads monitoring is enabled with response timeout set to 8000 ms.

06-24-2020 13:52:42.871 +0200 ERROR Watchdog - No response received from IMonitoredThread=0x7f896bffecc0 within 8000 ms. Looks like thread name='TcpOutEloop' is busy !? Starting to trace with 8000 ms interval.
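
If you want to see which threads the watchdog flags and how often, a small script can pull the busy-thread alerts out of watchdog.log. This is only a rough sketch: the regular expression is built from the message format quoted in this thread, and the function name `busy_threads` is my own, not anything shipped with Splunk.

```python
import re

# Pattern derived from the "busy thread" message quoted in this thread.
# Assumption: other Splunk versions may word the message differently.
BUSY_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"ERROR Watchdog - No response received from "
    r"IMonitoredThread=(?P<addr>0x[0-9a-f]+) "
    r"within (?P<timeout_ms>\d+) ms\. "
    r"Looks like thread name='(?P<thread>[^']+)' is busy"
)


def busy_threads(lines):
    """Yield (timestamp, thread name, timeout in ms) for each busy-thread alert."""
    for line in lines:
        match = BUSY_RE.match(line.strip())
        if match:
            yield match["ts"], match["thread"], int(match["timeout_ms"])
```

To use it, open your watchdog.log, pass the file object to `busy_threads`, and print or count the results. In our case the names that showed up were 'Shutdown' and 'TcpOutEloop'.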

 

  2. During that time, there is no output in splunkd.log.

 

06-23-2020 22:42:05.642 +0200 INFO ProxyConfig - Failed to initialize https_proxy from server.conf for splunkd. Please make sure that the https_proxy property is set as https_proxy=http://host:port in case HTTP proxying needs to be enabled.

06-23-2020 22:42:05.642 +0200 INFO ProxyConfig - Failed to initialize the no_proxy setting from server.conf for splunkd. Please provide a valid set of no_proxy rules in case HTTP proxying needs to be enabled.

06-23-2020 22:42:16.312 +0200 INFO DC:HandshakeReplyHandler - Handshake done.

06-24-2020 17:30:04.556 +0200 WARN FileClassifierManager - The file '/var/log/messages-20200623.xz' is invalid. Reason: binary.

06-24-2020 17:30:04.556 +0200 INFO TailReader - Ignoring file '/var/log/messages-20200623.xz' due to: binary

06-24-2020 17:30:08.549 +0200 INFO WatchedFile - File too small to check seekcrc, probably truncated. Will re-read entire file='/var/log/messages'.

06-24-2020 22:41:16.323 +0200 INFO ApplicationLicense - app license disabled by conf setting.

 

  3. We suspected a known issue related to SPL-99316 or SPL-184263, but applying the workaround did not fix the problem.

 

  4. We collected a core dump and pstack output; they show a deadlock related to DNS lookup (nss_ldap_gethostbyname).

 

Thread 32 (Thread 0x7f49f43ff700 (LWP 24008)):

#0 0x00007f4a03501bdd in __lll_lock_wait () from /usr/lib64/libpthread.so.0

#1 0x00007f4a034fc803 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0

#2 0x00007f49f01d385e in ?? () from /usr/lib64/libnss_ldap.so.2

#3 0x00007f49f01d53bc in ?? () from /usr/lib64/libnss_ldap.so.2

#4 0x00007f49f01d7cf4 in _nss_ldap_gethostbyname2_r () from /usr/lib64/libnss_ldap.so.2

#5 0x00007f49f01d7d92 in _nss_ldap_gethostbyname_r () from /usr/lib64/libnss_ldap.so.2
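
For anyone who wants to check their own pstack output for the same pattern, here is a rough sketch that lists threads whose top frame is waiting on a lock while the stack passes through libnss_ldap. The regular expressions assume the frame layout shown in the excerpt above, and the function names `parse_threads` and `blocked_in_nss_ldap` are mine, chosen only for illustration.

```python
import re

# Assumption: headers and frames look like the pstack excerpt in this thread.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread (0x[0-9a-f]+) \(LWP (\d+)\)\):")
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\S+) \(.*?\) from (\S+)")


def parse_threads(text):
    """Return {thread number: [(function, library), ...]} in frame order."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append((frame.group(2), frame.group(3)))
    return threads


def blocked_in_nss_ldap(threads):
    """List threads waiting on a lock with libnss_ldap somewhere in the stack."""
    hits = []
    for number, frames in threads.items():
        if not frames:
            continue
        top_function = frames[0][0]
        uses_nss_ldap = any("libnss_ldap" in lib for _, lib in frames)
        if top_function == "__lll_lock_wait" and uses_nss_ldap:
            hits.append(number)
    return hits
```

Run against the excerpt above, this would report thread 32. Several threads showing the same pattern at once is a hint at lock contention, but it does not prove a deadlock on its own.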

 

Root Cause(If Applicable):

 

Resolution/Workaround:

  1. Based on the update from sustaining, it's a DNS lookup issue, not a Splunk issue.
  2. There's a similar issue described at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=340601
  3. We suggest comparing the DNS and network configurations of working and non-working SuSE servers.
  4. As a possible workaround, pick one non-working SuSE server and replace MUN-SCE-TSISplunkIndex3.europe.shell.com in outputs.conf with its IP address to bypass the DNS lookup.
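
To compare name resolution on a working and a non-working server, something like the sketch below can time a lookup through the system resolver and give up after a timeout instead of hanging. It is not part of the case summary, and `timed_lookup` is a name I made up. Note that it runs in its own process, so it may not reproduce a deadlock that only happens inside splunkd; treat it as a quick sanity check.

```python
import socket
import threading
import time


def timed_lookup(hostname, timeout_s=8.0):
    """Resolve hostname via the system resolver; report status and elapsed time."""
    result = {}

    def worker():
        try:
            infos = socket.getaddrinfo(hostname, None)
            result["addresses"] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            result["error"] = str(exc)

    start = time.monotonic()
    # Daemon thread, so a hung lookup cannot keep the script from exiting.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    elapsed = time.monotonic() - start

    if thread.is_alive():
        status = "timeout"
    elif "error" in result:
        status = "error"
    else:
        status = "ok"
    return {"host": hostname, "status": status, "seconds": elapsed, **result}
```

Call it with the indexer hostname from your outputs.conf on both servers and compare the `status` and `seconds` values. Passing an IP address literal should return almost immediately because no name lookup is needed, which is the same idea as the workaround in item 4.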

-----------------------------------------------------------------------------------------------

 

Regards,

Joseph

 

 


kiragsplunk
Explorer

Good to know. You could also try adding an entry for the indexer in /etc/hosts to avoid this DNS issue, if you want.
