
ITSI - Some Notables Are Missing Intermittently

sylim_splunk
Splunk Employee

Some notable events have intermittently gone missing over time. ITSI runs in a search head cluster (SHC) environment: ITSI 4.2.1 on Splunk 7.2.8 in an SHC, with a multisite indexer cluster.

There are times when correlation searches do NOT create Notable Events. This happens throughout the day, at random times. Most of the time Notable Events are created, but sometimes business-critical alerts are missed.


sylim_splunk
Splunk Employee

Here's what I found:

i) The search below shows that notables were intermittently not created, and that this happened whenever the correlation searches were assigned to one particular search head.

index=_internal source=*scheduler.log status=success result_count>0 alert_actions="" (savedsearch_name=A OR savedsearch_name=B OR savedsearch_name=C) | stats count by host

This had been happening on that SH for the previous 7 days.
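To cross-check a single search head from the shell, the same filter can be approximated against that SH's local scheduler.log. This is a minimal sketch, assuming the log is at $SPLUNK_HOME/var/log/splunk/scheduler.log and that each line carries key=value pairs named savedsearch_name, status, result_count and alert_actions; count_silent_runs is a hypothetical helper name, and the Splunk search above remains the authoritative check across the SHC.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count scheduler runs that succeeded with results
# but fired no alert action, grouped by saved search name.
# ASSUMPTION: lines contain pairs such as savedsearch_name="A", status=success,
# result_count=3, alert_actions=""
count_silent_runs() {
  local log="${1:-$SPLUNK_HOME/var/log/splunk/scheduler.log}"
  awk '
    /status=success/ && /alert_actions=""/ {
      # Extract result_count=N and skip runs that returned nothing
      if (match($0, /result_count=[0-9]+/) == 0) next
      n = substr($0, RSTART + 13, RLENGTH - 13) + 0
      if (n == 0) next
      # Extract the quoted saved search name
      if (match($0, /savedsearch_name="[^"]*"/) == 0) next
      name = substr($0, RSTART + 18, RLENGTH - 19)
      count[name]++
    }
    END { for (s in count) print s, count[s] }
  ' "$log" | sort
}
```

Running it on each SH member and comparing the counts may help point to the member where the alert action stops firing.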

ii) There are also errors in splunkd.log:

ERROR ModularInputs - Unable to initialize modular input "itsi_entity_exchange_consumer"

A timechart showed that this error started at a specific time, Oct 26 05:49, and has persisted since then. splunkd.log also shows that splunkd stopped and then started again.
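The onset of the modular input error can also be confirmed on the affected SH without a timechart. This is a minimal sketch, assuming splunkd.log lines start with the date and time as in the log excerpts in this thread; error_window is a hypothetical helper name, and it reads only the current splunkd.log, not rotated files such as splunkd.log.1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how often a message appears in splunkd.log,
# plus the timestamps of its first and last occurrence.
# ASSUMPTION: lines start with "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ".
error_window() {
  local pattern="$1"
  local log="${2:-$SPLUNK_HOME/var/log/splunk/splunkd.log}"
  # -F treats the pattern as a fixed string, so quotes and dots are literal
  grep -F -- "$pattern" "$log" | awk '
    NR == 1 { first = $1 " " $2 }
    { last = $1 " " $2 }
    END { if (NR > 0) printf "count=%d first=%s last=%s\n", NR, first, last }
  '
}
```

For example: `error_window 'Unable to initialize modular input "itsi_entity_exchange_consumer"'`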

Checking /var/log/messages showed that the kernel had complained about running out of memory, and that the OOM killer killed the KV store and splunkd at Oct 26 05:46. Splunk was then most likely restarted at 05:49.
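A quick way to pull the OOM killer's victims out of /var/log/messages is sketched below. It assumes classic syslog timestamps (for example "Oct 26 05:46:12") and the kernel wording "Out of memory: Kill process" or "Out of memory: Killed process", which varies between kernel versions; oom_victims is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list processes the kernel OOM killer selected,
# printed as "<syslog timestamp> pid=<pid> comm=<process name>".
# ASSUMPTION: classic syslog lines and "Out of memory: Kill(ed) process" wording.
oom_victims() {
  local log="${1:-/var/log/messages}"
  grep -E 'Out of memory: Kill(ed)? process [0-9]+ \(' "$log" |
    sed -E 's/^([A-Z][a-z]{2} +[0-9]+ [0-9:]{8}).*Out of memory: Kill(ed)? process ([0-9]+) \(([^)]*)\).*/\1 pid=\3 comm=\4/'
}
```

On a Splunk host the KV store runs as mongod, so both mongod and splunkd entries are worth looking for in the output.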

iii) Ever since then, the itsi_event_generator alert action has failed to load Python libraries:

10-29-2019 00:12:49.313 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - Traceback (most recent call last):
10-29-2019 00:12:49.313 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - File "/users/splunk/etc/apps/SA-ITOA/bin/itsi_event_generator.py", line 8, in
10-29-2019 00:12:49.313 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - from ITOA.setup_logging import getLogger
10-29-2019 00:12:49.313 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - File "/users/splunk/etc/apps/SA-ITOA/lib/ITOA/setup_logging.py", line 9, in
10-29-2019 00:12:49.313 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - from splunk.appserver.mrsparkle.lib import i18n

SNIP

10-29-2019 00:12:49.313 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - _install_highlighting()
10-29-2019 00:12:49.314 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - File "/users/splunk/lib/python2.7/site-packages/mako/exceptions.py", line 252, in _install_highlighting
10-29-2019 00:12:49.314 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - _install_fallback()
10-29-2019 00:12:49.314 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - File "/users/splunk/lib/python2.7/site-packages/mako/exceptions.py", line 243, in _install_fallback
10-29-2019 00:12:49.314 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - from mako.filters import html_escape
10-29-2019 00:12:49.314 -0700 ERROR sendmodalert - action=itsi_event_generator STDERR - ImportError: cannot import name html_escape
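To check whether the ImportError reproduces outside the alert action, the failing import can be run directly with an interpreter. This is a minimal sketch: check_import is a hypothetical helper, and the usage below assumes that `splunk cmd python` passes `-c` through to Splunk's bundled Python.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try "from <module> import <name>" with the given
# interpreter command and report OK or FAILED.
check_import() {
  local module="$1" name="$2"
  shift 2
  # Remaining arguments form the interpreter command line, e.g. python3 or
  # "$SPLUNK_HOME/bin/splunk" cmd python (ASSUMPTION: it forwards -c)
  if "$@" -c "from ${module} import ${name}" >/dev/null 2>&1; then
    echo "OK: from ${module} import ${name}"
  else
    echo "FAILED: from ${module} import ${name}"
    return 1
  fi
}
```

For example: `check_import mako.filters html_escape "$SPLUNK_HOME/bin/splunk" cmd python`. If corrupt .pyc files are the cause, this would be expected to fail before the cleanup below and succeed after it.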

iv) The ImportError shows that the script fails to load a Python library, and corruption of .pyc files caused by the OOM kill was suspected.
On the affected SH, we followed the steps below to remove the .pyc files so that the interpreter regenerates them automatically when Splunk starts and imports the modules.

$ splunk stop
$ cd $SPLUNK_HOME/lib/python2.7/site-packages
$ find . -name "*.pyc" -exec rm -f {} \;
$ splunk start
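The find command above deletes every .pyc file, including any that has no .py source next to it, and such a module could not be imported again afterwards. A more cautious variant is sketched below, assuming the Python 2.7 layout where foo.pyc sits beside foo.py: it removes only .pyc files that can be regenerated and supports a dry run. clean_regenerable_pyc and DRY_RUN are hypothetical names; run it between `splunk stop` and `splunk start`, as in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove only .pyc files that have a matching .py source
# beside them (ASSUMPTION: Python 2.7 layout, no __pycache__), so every deleted
# file can be regenerated on the next import. Set DRY_RUN=1 to only list them.
clean_regenerable_pyc() {
  local root="${1:-$SPLUNK_HOME/lib/python2.7/site-packages}"
  local matched=0 pyc
  while IFS= read -r -d '' pyc; do
    if [ -f "${pyc%c}" ]; then   # foo.pyc -> foo.py
      if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'would remove %s\n' "$pyc"
      else
        rm -f -- "$pyc"
      fi
      matched=$((matched + 1))
    fi
  done < <(find "$root" -type f -name '*.pyc' -print0)
  echo "matched=$matched"
}
```

For example, `DRY_RUN=1 clean_regenerable_pyc` first, then `clean_regenerable_pyc` once the list looks right.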

After this, the issue never came back.
