Hi,
I just upgraded Splunk to 9.0.0 and noticed that the log ~/var/log/splunk/splunkd.log started getting populated with messages like:
06-14-2022 16:41:00.924 +0300 ERROR script [2047436 SchedulerThread] - Script execution failed for external search command 'runshellscript'.
06-14-2022 16:41:00.906 +0300 WARN SearchScheduler [2047436 SchedulerThread] - addRequiredFields: SearchProcessorException is ignored, sid=AlertActionsRequredFields_1655214060.1451, error=Error in 'script': Script execution failed for external search command 'runshellscript'.
The above appears in the logs regardless of whether the alert has fired or not, and we rely quite heavily on running external scripts to make external systems aware of problems.
At first I thought all our script bindings to alerts were broken and we would have to do a rollback. However, I tested and the scripts were executed nicely. My question is, what has changed here, if anything? I would like to get rid of those messages cluttering the logs in vain. And the other thing is, if something really has changed, what should I do to make Splunk happy about the scripts in alerts? I am looking for something other than "Please write a Python script to do the job."
Any clues?
I am also facing a similar issue after upgrading to 9.0.1. These errors are on the SH, not the HF. I do not want to create a custom action either and would like to continue using the run-a-script action instead. @kaurinko: can you please explain more about "per_result_alert/*.gz" as well as any other solution you might have found?
Now I am actually on thin ice, because this is very much about reverse-engineering things, but here is what I have. First, I have a simple SPL search that generates five random integers in the range 0-10. Rows where the random number is greater than 2 are kept as results:
| makeresults count=5
| eval low = 0, high = 10, rand = random() % (high - low + 1) + low
| streamstats count AS ID
| where (rand > 2)
| table ID
Using this as the search expression in an alert, defining the alert to fire for each result individually, and employing throttling for a reasonable period of time, I should be able to follow what is going on. When the alert fires, I simply execute a Perl script to do what I want. I have used something like this:
#!/usr/bin/perl
use strict;
use warnings;
# Arguments Splunk passes to a legacy "run a script" alert action;
# the last one is the path to the gzipped results file.
my ($numev, $srch, $FQNSS, $searchname, $reason, $saved_search, $tags, $resultfile) = @ARGV;
# -- Begin throttling bug workaround
# Derive the per_result_alert directory from the result file path
my $resultdir = $resultfile;
$resultdir =~ s/[^\/]+$/per_result_alert/;
# Pick the most recently modified file in that directory
my $file = qx(ls -rt $resultdir | tail -1);
$file =~ s/\s*$//;
$file = "$resultdir/$file";
$resultfile = $file if (-f $file);
# -- End throttling bug workaround
my $sessionKey = "";
$sessionKey=<STDIN>;
# Read the gzipped CSV results
my @data = qx(/bin/zcat $resultfile);
my $date = qx(/bin/date +"%Y%m%d%H%M%S.%N");
open RESULTFILE, q(>>).q(/tmp/splunk-alert-results.txt);
print RESULTFILE qq(\nTimestamp $date);
print RESULTFILE "Resultfile = $resultfile \n";
foreach (@data)
{
my @fields = split(',');
$fields[0] =~ s/\"//g;
print RESULTFILE join(';',@fields) ;
}
print RESULTFILE qq(Timestamp ) . qx(/bin/date +"%F %H:%M:%S.%N");
close RESULTFILE;
exit(0);
This is now supposed to read the latest file in the per_result_alert directory, and with this kind of test it seemed to work. However, with real events and real data I faced unexpected problems, and I had to implement the throttling manually for the real alert. In any case, working with these per_result_alert files feels a little shaky, as one really doesn't know what one will find in that directory.
In any case, I played around with the above, monitored the output files, and picked the proper locations for alert results from there.
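For reference, behind the UI the test alert corresponds roughly to a savedsearches.conf stanza like the one below. The stanza and script names are just examples, and the exact attribute set the UI writes may differ slightly:
[Random number test alert]
search = | makeresults count=5 | eval low = 0, high = 10, rand = random() % (high - low + 1) + low | streamstats count AS ID | where (rand > 2) | table ID
enableSched = 1
cron_schedule = */10 * * * *
counttype = number of events
relation = greater than
quantity = 0
# Fire the actions once per result row instead of once per search
alert.digest_mode = 0
# Throttle further alerts for the same ID for a while
alert.suppress = 1
alert.suppress.fields = ID
alert.suppress.period = 10m
# Legacy run-a-script action; the script lives under $SPLUNK_HOME/bin/scripts/
action.script = 1
action.script.filename = splunk-alert-test.pl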
Seems complex! I decided to go the custom alert action route, replacing run a script, and all those errors are now gone 🙂
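I can't share the real configuration, but roughly, a minimal custom alert action only needs a stanza like this in an app's alert_actions.conf, plus the executable in the app's bin directory. The app, stanza and script names here are made up:
# etc/apps/my_alert_app/default/alert_actions.conf
[notify_ext]
is_custom = 1
label = Notify external system
description = Calls an external script with the alert payload
payload_format = json
# The executable lives in etc/apps/my_alert_app/bin/ and is invoked with
# --execute and a JSON payload on stdin (results file path, sid, etc.)
alert.execute.cmd = notify_ext.pl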
Hi
I haven't seen this myself and cannot try it right now.
Can you share one of your alert scripts (heavily modified/stripped), just to see its parameters etc.?
With a quick search I found two things which you could check:
SPL-146802: Distributed environment requires index defined on search head for log event alerts
and https://docs.splunk.com/Documentation/Splunk/9.0.0/Security/SPLsafeguards
Maybe one of those can lead to this situation?
r. Ismo
Hi,
I also discovered that the whole mechanism of executing an external script from an alert is broken in the 9.0.0 release: the throttling mechanism doesn't work properly. I have opened a bug for it and hopefully there will be a solution. It is Case #3045767 if you are interested.
/Petri
Hi,
Have you found any solution to this issue?
I'm also getting the following message:
SearchMessage orig_component= sid=AlertActionsRequredFields_1665516540.1276 message_key=EXTERN:RUNSHELLSCRIPT_INVALID_SID__%s message=Invalid search ID ''.
Thanks!
Hi,
No solution so far. Now that you mention it, it seems like I get the same error message as you, together with the one I quoted, at the same timestamp, so obviously they originate from the same root cause. Running the following search should make it obvious:
index=_internal "message_key=EXTERN:RUNSHELLSCRIPT_INVALID_SID__%s message=Invalid search ID" OR "Script execution failed for external search command 'runshellscript'."
It seems like these messages appear whenever an alert with script execution is processed, because on another Splunk server with very few alerts in the first place, there are no such error messages when no alerts with script execution are active. Clearly something has gone badly wrong in the Splunk 9 development, and for reasons unclear to me Splunk would apparently rather not have any external script execution available at all. That is really unfortunate, as we use external script execution extensively in alerts.
I received a reply from Splunk Support.
They said so far it appears to be a benign message -- the 'runshellscript' actually does complete. The message only occurs on HFs and can be eliminated by setting precalculate_required_fields_for_alerts = 0 in your /local/savedsearches.conf file -- add to the stanza for the saved search associated with your 'run a script' trigger action.
I just decided to create a custom alert action, so I haven't tested this workaround and I cannot confirm that it works but perhaps this could help to fix your issues.
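If you want to try the workaround support described, the stanza should look roughly like this (the search name is just an example):
# local/savedsearches.conf on the instance producing the messages
[My script alert]
precalculate_required_fields_for_alerts = 0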
Good to know. My original problem is making progress. It has been narrowed down to an issue with the combination of throttling and running an external script, but unfortunately there is no solution yet. Just recently I learnt about the per_result_alert/*.gz files, and I just might be able to build a dirty workaround around them. I just have to hope for a stable solution in a hopefully soon-to-be-released new version.
Hey, I am facing the exact same issue.
We have a Perl script running as an alert action and are getting the same error.
Did you manage to find a workaround for this?
Hi,
Unfortunately there was no real solution. I had to get rid of the throttling, or try to work around it in some other way. It is really a pity, but the message from Splunk support was clear: they don't care. I will have to create a script executor of my own. It should not be difficult, but I have a hard time finding time for it.