Why is Splunk saying "Script execution failed for external search command 'runshellscript'." after 9.0.0 upgrade?

kaurinko
Communicator

Hi,

I just upgraded Splunk to 9.0.0 and noticed that the log ~/var/log/splunk/splunkd.log started to fill up with messages like

06-14-2022 16:41:00.924 +0300 ERROR script [2047436 SchedulerThread] - Script execution failed for external search command 'runshellscript'.

06-14-2022 16:41:00.906 +0300 WARN SearchScheduler [2047436 SchedulerThread] - addRequiredFields: SearchProcessorException is ignored, sid=AlertActionsRequredFields_1655214060.1451, error=Error in 'script': Script execution failed for external search command 'runshellscript'.


The above comes to the logs regardless of whether the alert has been fired or not, and we rely quite heavily on running external scripts to make external systems aware of problems.

I thought all our script bindings to alerts were now broken and that we would have to roll back. However, I tested, and the scripts were executed nicely. My question is, what has changed here, if anything? I would like to get rid of those messages cluttering the logs for no reason. The other thing is, if something really has changed, what should I do to make Splunk happy about the scripts in alerts? I am looking for something other than "Please write a Python script to do the job."

Any clues?


supreet
Explorer

I am also facing a similar issue after upgrading to 9.0.1. These errors are on the SH and not the HF. I do not want to create a custom action either and would like to continue using the "run a script" action instead. @kaurinko: can you please explain more about "per_result_alert/*.gz", as well as any other solution you might have found?


kaurinko
Communicator

Now I am actually on thin ice, because this is very much about reverse-engineering things, but here is what I have. First, I have a simple SPL search that generates five random integers ranging from 0 to 10 and keeps only the rows where the random number is greater than 2:

 

| makeresults count=5
| eval low = 0, high = 10, rand = random() % (high - low + 1) + low
| streamstats count AS ID
| where (rand > 2)
| table ID

 

 

Using this as the search expression in an alert, defining the alert to be fired for each result individually, and employing throttling for a reasonable period of time, I should be able to follow what is going on. When the alert is fired, I simply execute a Perl script to do what I want. I have used something like this:

 

#!/usr/bin/perl
use strict;
my ($numev, $srch, $FQNSS, $searchname, $reason, $saved_search, $tags, $resultfile) = @ARGV;

# -- Begin throttling bug workaround
my $resultdir  = $resultfile;
$resultdir =~ s/[^\/]+$/per_result_alert/;
my $file = qx(ls -rt $resultdir | tail -1);
$file =~ s/\s*$//;
$file = "$resultdir/$file";

$resultfile = $file if (-f $file);
# -- End throttling bug workaround

my $sessionKey = "";
$sessionKey=<STDIN>;

my @data = qx(/bin/zcat $resultfile);
my $date = qx(/bin/date +"%Y%m%d%H%M%S.%N");

open RESULTFILE, q(>>).q(/tmp/splunk-alert-results.txt);
print RESULTFILE qq(\nTimestamp $date);
print RESULTFILE "Resultfile = $resultfile \n";

foreach (@data)
{
        my @fields = split(',');
        $fields[0] =~ s/\"//g;
        print RESULTFILE join(';',@fields) ;
}
print RESULTFILE qq(Timestamp ) . qx(/bin/date +"%F %H:%M:%S.%N");
close RESULTFILE;

exit(0);

 

 

This is supposed to read the latest file in the per_result_alert directory, and with this kind of a test it seemed to work. However, with real events and real data, I faced unexpected problems, and I had to implement the throttling manually in the real alert. In any case, working with these per_result_alert files feels a little shaky, as one really doesn't know what one will find in that directory.

In any case, I played around with the above and monitored the output files and picked from there the proper locations for alert-results.
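
As a hedged variation on the workaround above, the newest file in per_result_alert can also be picked without shelling out to ls and zcat. This is only a sketch, not a tested fix for the throttling bug: the helper names newest_gz, pick_result_file and read_gz_lines are made up for illustration, and it assumes the per-result files end in .gz and sit in a per_result_alert directory next to the results file Splunk passes as the eighth argument.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Return the most recently modified *.gz file in $dir,
# or undef if the directory is missing or holds no such file.
sub newest_gz {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return;
    my @files = grep { -f $_ } map { "$dir/$_" } grep { /\.gz$/ } readdir($dh);
    closedir($dh);
    return unless @files;
    my %mtime = map { $_ => (stat($_))[9] } @files;
    my ($newest) = sort { $mtime{$b} <=> $mtime{$a} || $b cmp $a } @files;
    return $newest;
}

# Prefer the newest per_result_alert/*.gz next to the results file;
# fall back to the results file itself if nothing is found.
sub pick_result_file {
    my ($resultfile) = @_;
    my $candidate = newest_gz(dirname($resultfile) . "/per_result_alert");
    return defined $candidate ? $candidate : $resultfile;
}

# Read a gzip-compressed file into a list of lines (newlines kept).
sub read_gz_lines {
    my ($path) = @_;
    my $content;
    gunzip($path => \$content)
        or die "gunzip failed for $path: $GunzipError\n";
    return split /^/, $content;
}
```

In the script above this would replace the block between the "throttling bug workaround" comments and the zcat call, for example: my @data = read_gz_lines(pick_result_file($resultfile)); The caveat from the post still applies: "newest file" is a guess about which result belongs to the current alert.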


supreet
Explorer

Seems complex! I decided to go the custom alert action route, replacing "run a script", and all those errors are now gone 🙂


isoutamo
SplunkTrust

Hi

I haven't seen this, and I can't try it myself right now.

Can you share any of your alert scripts (heavily modified or trimmed is fine), just to see their parameters etc.?

With a quick search I found two things which you could check:

SPL-146802: Distributed environment requires index defined on search head for log event alerts

and  https://docs.splunk.com/Documentation/Splunk/9.0.0/Security/SPLsafeguards

Maybe one of those can lead to this situation?

r. Ismo


kaurinko
Communicator

Hi,

I also discovered that the whole mechanism of executing an external script from an alert was broken in the 9.0.0 release. The throttling mechanism doesn't work properly. I have opened a bug for it and hopefully there will be a solution. It is Case #3045767 if you are interested.

/Petri

 


JorgeFT
Explorer

Hi,
Have you found any solution to this issue?

I'm also getting the following message:

SearchMessage orig_component= sid=AlertActionsRequredFields_1665516540.1276 message_key=EXTERN:RUNSHELLSCRIPT_INVALID_SID__%s message=Invalid search ID ''.

Thanks!


kaurinko
Communicator

Hi,

No solution so far. Now that you mention it, I get the same error message as you, with the same timestamp as the one I quoted, so they evidently originate from the same root cause. Running the following search should make it obvious:

index=_internal "message_key=EXTERN:RUNSHELLSCRIPT_INVALID_SID__%s message=Invalid search ID" OR "Script execution failed for external search command 'runshellscript'."

 

It seems like these messages appear whenever an alert with script execution is processed, because on another Splunk server with very few alerts in the first place, there are no such error messages when no alerts with script execution are active. Clearly something has gone badly wrong in the Splunk-9 development, and for reasons unclear to me Splunk would rather not have any external script execution available at all. It is really unfortunate, as we use external script execution extensively in alerts.

JorgeFT
Explorer

I received a reply from Splunk Support.

They said so far it appears to be a benign message -- the 'runshellscript' actually does complete. The message only occurs on HFs and can be eliminated by setting precalculate_required_fields_for_alerts = 0 in your /local/savedsearches.conf file -- add to the stanza for the saved search associated with your 'run a script' trigger action.

They DID mention this is safe if no other alert actions are attached to that search. Basically, the required fields logic is in place for all alert actions but is no use in this one, so the logic is not needed. However, if other alert actions are attached to the search, you could encounter a performance impact.
 

I just decided to create a custom alert action, so I haven't tested this workaround and I cannot confirm that it works but perhaps this could help to fix your issues.
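
For reference, a minimal sketch of what the setting described by support might look like. The stanza name "My Script Alert" is a placeholder for the name of your own saved search, and as noted above this is untested, so treat it as an assumption rather than a confirmed fix.

```
# local/savedsearches.conf (the one that holds the saved search)
# "My Script Alert" is a hypothetical stanza name; use your own saved search name.
[My Script Alert]
precalculate_required_fields_for_alerts = 0
```

Per the support reply, this is only considered safe when "run a script" is the sole alert action on that search.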

kaurinko
Communicator

Good to know. My original problem is making progress. It has been narrowed down to an issue with throttling combined with running an external script, but unfortunately there is no solution yet. Just recently I learnt about the per_result_alert/*.gz files, and I just might be able to create a dirty workaround with them. I just have to hope for a stable solution in a new version that is hopefully released soon.

abhishekkalokhe
Explorer

Hey, I am facing the exact same issue.

We have a Perl script running as an alert action and are getting the same error.

Did you manage to find a workaround for this?


kaurinko
Communicator

Hi,

Unfortunately there was no real solution. I had to get rid of the throttling, or try to work around it some other way. It is really a pity, but the message from Splunk support was clear: They don't care. I will have to create a script executor of my own. It should not be difficult, but I have a hard time finding time for it.
