Splunk Enterprise Security

Splunk App for Enterprise Security 3.2.1: Why am I getting hourly script failure errors with configuration_check.py on the search head?

Path Finder

I realize this will be simple for someone with more experience than I have. We are running 2 search heads, 2 indexers, and a management server, all fed from our centralized rsyslog server. One search head and one indexer are dedicated to Enterprise Security (ES). On the ES search head only, I am getting the following set of "error messages" hourly.
A simple search using source=*python_modular_input.log index=_internal configuration_check.py shows:

2015-02-09 13:00:00,591 INFO  pid=17738 tid=MainThread file=configuration_check.py::184 | status="starting"
2015-02-09 13:00:00,592 INFO  pid=17738 tid=MainThread file=__init__.py:execute:883 | Execute called
2015-02-09 13:00:00,596 INFO  pid=17738 tid=MainThread file=configuration_check.py:run:85 | status="executing"
2015-02-09 13:00:00,596 INFO  pid=17738 tid=MainThread file=configuration_check.py:run:91 | status="retrieved task" task="confcheck_script_errors"
2015-02-09 13:00:00,690 INFO  pid=17738 tid=MainThread file=configuration_check.py:run:101 | 
   status="enabled UI message suppression" task="confcheck_script_errors" 
   pattern="((streamfwd|splunk-(wmi\.path|MonitorNoHandle\.exe|winevtlog\.exe|netmon\.exe|perfmon\.exe|regmon\.exe|winprintmon\.exe|admon\.exe)).
   *exited with code 1)"
2015-02-09 13:00:00,719 ERROR pid=17738 tid=MainThread file=configuration_check.py:run:153 | 
   status="completed" task="confcheck_script_errors" message="msg="A script exited abnormally" 
   input="/opt/splunk/etc/apps/SA-Utils/bin/configuration_check.py" 
   stanza="configuration_check://confcheck_related_searches_not_enabled" 
   status="exited with code 2""
2015-02-09 13:00:00,743 INFO  pid=17738 tid=MainThread file=configuration_check.py:run:180 | status="exiting" exit_status="0"

with the error msg showing up at the console. I have chased this through every log file, found the inputs.conf, tried btool, but nothing leads to what exactly this is looking for. Which "related" search is not there?

I would appreciate any help in resolving this; however, I would also like the "why & how" so I will be able to chase this down in the future. We do have SoS but it is on the general search head which makes the ES search head the only server it cannot see.

1 Solution

Splunk Employee

Looking at the log message on the initial post, that does not appear to be an error with a search not being enabled. The script is actually erroring out with a non-zero exit code: "exited with code 2". Looking at configuration_check.py, the exit code "2" corresponds to "ERR_REST_EXC", indicating that there was an exception making a REST call as part of the configuration check. This indicates some other issue with the system. In this case the exception should be logged to configuration_check.log and I would begin looking there for answers, specifically for Python tracebacks that correspond in time to the log messages in python_modular_input.log.
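As a rough sketch of how the two logs could be lined up: pull the timestamp, stanza and exit code out of each ERROR event in python_modular_input.log, then look in configuration_check.log for tracebacks close to those times. The function name and regular expressions below are illustrative assumptions based only on the log format shown in the question; they are not part of SA-Utils.

```python
import re
from datetime import datetime

# Assumed event layout, taken from the excerpt in the question:
# "YYYY-MM-DD HH:MM:SS,mmm LEVEL pid=... | message"
TS = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}'
HEAD_RE = re.compile(r'^(?P<ts>' + TS + r') (?P<level>[A-Z]+)\s+')
EVENT_START_RE = re.compile(r'(?m)^(?=' + TS + r' )')
EXIT_RE = re.compile(r'exited with code (\d+)')
STANZA_RE = re.compile(r'stanza="([^"]+)"')


def find_script_failures(log_text):
    """Return (timestamp, stanza, exit_code) for ERROR events with a non-zero exit code."""
    failures = []
    # An event can wrap over several lines, so split on the leading timestamp.
    for event in EVENT_START_RE.split(log_text):
        head = HEAD_RE.match(event)
        if head is None or head.group('level') != 'ERROR':
            continue
        code = EXIT_RE.search(event)
        if code is None or code.group(1) == '0':
            continue
        stanza = STANZA_RE.search(event)
        when = datetime.strptime(head.group('ts'), '%Y-%m-%d %H:%M:%S,%f')
        failures.append((when, stanza.group(1) if stanza else None, int(code.group(1))))
    return failures
```

Each returned timestamp gives a narrow window to search for in configuration_check.log.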


Champion

If it helps, we were able to find our issue with the following search:

index=_internal configuration_check rest sourcetype=python_modular_input

In our case, we had a correlation search that was created, but something happened, maybe during creation or replication (search head cluster), that left it hung. Someone tried to delete the search in Splunk Web, which looked like it worked, but in fact the search was still there on the back end. Once we found it, we deleted it and restarted Splunk. That resolved it for us.

Thanks,


Splunk Employee

That's correct and is probably how this problem gets introduced most frequently, although it's probably not the only way it can be introduced. Deletion of Enterprise Security correlation searches via the normal Splunk Manager GUI is not supported because a Correlation Search consists of several configuration file artifacts - stanzas in savedsearches.conf and correlationsearches.conf. The normal Splunk Manager page for managing searches is not aware of the other configuration files. So, deleting a Correlation Search via the unapproved mechanism will leave the system in an inconsistent state. All management of Correlation Searches is expected to occur through the configuration pages in the Enterprise Security app - which do not currently support deletion to my knowledge, only disabling.

We've rectified the error in our configuration check in the next ES release to more properly reflect the condition - we now warn that you have an "orphaned" stanza in correlationsearches.conf, possibly as a result of an attempt to manage a Correlation Search via the unapproved mechanism.
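To illustrate what an "orphaned" stanza means here, the following is a minimal sketch that compares the stanza names in one correlationsearches.conf against one savedsearches.conf. It assumes a correlation search uses the same stanza name in both files, and it only looks at a single pair of files rather than the merged configuration across all apps. The function names are hypothetical and this is not the check that ES itself performs.

```python
import re

# Matches a stanza header such as "[Some Search - Rule]" on a line of its own.
STANZA_HEADER_RE = re.compile(r'^\[(?P<name>[^\]]+)\][ \t]*$', re.MULTILINE)


def stanza_names(conf_text):
    """Collect stanza names from the text of a .conf file, ignoring [default]."""
    names = {m.group('name') for m in STANZA_HEADER_RE.finditer(conf_text)}
    return names - {'default'}


def orphaned_correlation_stanzas(correlationsearches_text, savedsearches_text):
    """Stanzas in correlationsearches.conf with no same-named saved search (assumed naming)."""
    return sorted(stanza_names(correlationsearches_text) - stanza_names(savedsearches_text))


if __name__ == '__main__':
    # Hypothetical file contents, for illustration only.
    corr = "[Example A - Rule]\n\n[Example B - Rule]\n"
    saved = "[Example B - Rule]\ndisabled = 0\n"
    print(orphaned_correlation_stanzas(corr, saved))
```

In practice the two texts would be read from the app's default and local directories, and any stanza reported should be verified by hand before anything is removed.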

Path Finder

Thanks and Kudos to the support people at Splunk--I believe there has been a resolution I probably would have never found.

In this case the app causing the friction was DA-ESS-IdentityManagement. In the /etc/apps/DA-ESS-IdentityManagement/local directory there were a couple of files, one of them being correlationsearches.conf, which had an "extra" stanza in it for the nonexistent or orphaned search. Removal of that file followed by a reload command:

curl -k -u admin https://your.splunk.server.here:8089/services/alerts/correlationsearches/_reload

hopefully has solved this issue. Will know tomorrow.


Community Manager

Hi @dschmidt_cfi

Just to follow up and confirm, did this solve your issue? If yes, please be sure to accept the answer to resolve this question so other users with the same problem can find this post with a concrete solution. Thanks!

Patrick


Path Finder

Tried and failed. Then again... I am going to retrace my steps this afternoon. When the correct solution is found I will be sure to give credit to whoever found it, because I am pretty sure it will not be me (at this point in time).


Community Manager

No problem, glad you found a solution through @jervin_splunk 🙂


Path Finder

Actually, it was me that failed. When the messages returned the following morning at 3am, I thought the fix had failed. Only after another follow-up check did I realize it was the same message caused by a different correlation search. I repeated the steps for that search and now everything has cleared up.

Once again, Thanks to everyone.



Path Finder

Thanks for pointing out a new avenue of searching. I still do not have an answer from Splunk, but with your answer I changed my search and found:
2015-02-15 03:08:03,287 ERROR pid=22083 tid=MainThread file=configuration_check.py:run:168 | status="RESTException when executing configuration check" exc="[HTTP 404] https://127.0.0.1:8089/servicesNS/dhorn/DA-ESS-IdentityManagement/saved/searches/Access%20-%20Intera...; [{'code': None, 'text': "\n In handler 'savedsearch': Could not find object id=Access - Interactive Logon by a Service Account - Rule", 'type': 'ERROR'}]"
Traceback (most recent call last):

and the traceback which gives a new trail to go hunting down.
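For anyone chasing the same message, here is a small sketch that pulls the useful fields (HTTP status, namespace owner, app, and the missing object) out of a RESTException line like the one above. The regular expressions are assumptions based on this single log line, and the function name is made up for illustration.

```python
import re

# Assumed shape: exc="[HTTP 404] https://host:port/servicesNS/<owner>/<app>/..."
REST_EXC_RE = re.compile(
    r'\[HTTP (?P<status>\d{3})\] \S*?/servicesNS/(?P<owner>[^/\s]+)/(?P<app>[^/\s]+)/'
)
# Assumed shape: ... Could not find object id=<name>", ...
OBJECT_ID_RE = re.compile(r'Could not find object id=(?P<name>[^"]+)"')


def summarize_rest_exception(line):
    """Return a dict with status, owner, app and missing object, or None if no match."""
    rest = REST_EXC_RE.search(line)
    if rest is None:
        return None
    obj = OBJECT_ID_RE.search(line)
    return {
        'status': int(rest.group('status')),
        'owner': rest.group('owner'),
        'app': rest.group('app'),
        'missing_object': obj.group('name') if obj else None,
    }
```

The owner and app values show which namespace the check queried, which helps when deciding where to look for the leftover stanza.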


Splunk Employee

(Splitting reply in half due to arbitrary character limits)

There are probably two errors here:

  1. The configuration check is verifying that any correlation searches that use the new "Extreme Search" capabilities packaged with ES are enabled in tandem with their baselining searches. This check should not be erroring out when it encounters a search that it cannot identify. I'll track this as a new bug in the ES project (I am one of the developers on the team).
  2. The real error here is that this search: "Access - Interactive Logon by a Service Account - Rule" likely does not exist or cannot be found. Based on the URL being shown in the log, which contains a username ( https://127.0.0.1:8089/servicesNS/**dhorn**/DA-ESS-IdentityManagement/saved/searches) I expect that the error here may be that we are using the "owner" field from the correlation search, and that field is returning a user name. In order to retrieve a Correlation Search without incident, it should probably be using the unscoped name "nobody" here.
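To make the difference between the two namespaces concrete, here is a minimal sketch that builds the saved-search URL once with the user name from the log and once with "nobody". The helper name is hypothetical; the host, app and search name are taken from the log line quoted earlier in the thread. It only constructs the URLs and does not call the REST API.

```python
from urllib.parse import quote


def saved_search_url(base, owner, app, search_name):
    """Build a servicesNS saved/searches URL, percent-encoding each path segment."""
    return '{}/servicesNS/{}/{}/saved/searches/{}'.format(
        base.rstrip('/'),
        quote(owner, safe=''),
        quote(app, safe=''),
        quote(search_name, safe=''),
    )


if __name__ == '__main__':
    base = 'https://127.0.0.1:8089'
    app = 'DA-ESS-IdentityManagement'
    name = 'Access - Interactive Logon by a Service Account - Rule'
    # User-scoped lookup, as seen in the failing request.
    print(saved_search_url(base, 'dhorn', app, name))
    # Unscoped lookup, as suggested for correlation searches.
    print(saved_search_url(base, 'nobody', app, name))
```

Comparing the responses for the two URLs should show whether the search is missing entirely or only missing from the user's namespace.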

Splunk Employee

I would recommend running these two queries via curl on your system and appending the output to the support ticket if you have already created one. Also feel free to send me the Salesforce ticket number directly; my username minus the "_splunk" @splunk.com will get to me.

curl -k -u admin https://127.0.0.1:8089/servicesNS/dhorn/DA-ESS-IdentityManagement/saved/searches/Access%20-%20Intera...

curl -k -u admin https://127.0.0.1:8089/servicesNS/nobody/DA-ESS-IdentityManagement/saved/searches/Access%20-%20Inter...

Also, if a diag hasn't been created for this issue, submit one as we would like to see the configuration files from the system, if possible.


Path Finder

Everyone helped but Jervin had the correct solution


Champion

Were you able to figure out what was causing that to continually fire? We just noticed it in our environment this morning as well - same message showing up on the hour. We're still in the middle of our implementation and will reach out to our PS resource as well.
