Hi guys.
I'd like to know if there is a way to schedule a scripted input to run on every interval, even if the previous run has not yet finished and returned an exit code.
Let me explain:
I have some scripts which need to wait some time before exiting with their output and/or an exit code.
At the same time, I need to rerun the same script even if the previous run is still running in the background.
Splunk can't do this, since it monitors the running script and waits for its exit code before launching the new one.
Example:
[script://./bin/my_script.sh]
index=blablabla
sourcetype=blablabla
source=blablabla
interval=60
...
Let's say "my_script.sh" contains a simple (it's only an example),
#!/bin/bash
date
sleep 90
Now, with all the methods I have tried, whether running it as
[script://./bin/my_script.sh &]
or via a launcher.sh which detaches a child process with a "bash -c 'my_script.sh &' &" or an "exec my_script.sh &", the "sleep 90" prevents splunkd from rerunning the script every 60 seconds, since it waits 90s for the previous script's sleep to finish.
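To be clear, the launcher.sh I mean is roughly like this (just a sketch; the paths are only examples):
#!/bin/bash
# launcher.sh - rough sketch of the detaching wrapper mentioned above
# (paths are only examples)
# variant 1: detach via a backgrounded subshell
bash -c "$(dirname "$0")/my_script.sh &" &
# variant 2: the exec form mentioned above
# exec "$(dirname "$0")/my_script.sh" &
exit 0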
So, in my indexed data, I'll only find data every 2 minutes, because of the 90s sleep:
10:00 splunkd launches "my_script.sh" and waits for its exit code to index the data
10:01 splunkd tries to launch a new "my_script.sh", but skips it since the previous run is still in its "sleep 90"
10:02 splunkd indexes the 10:00 data and reschedules a new "my_script.sh"
10:03 as 10:01
10:04 as 10:02
... and so on...
Is there a way to force a re-run, even if a previous script PID is still running, and get data for:
10:00 output from "my_script.sh"
10:01 output from "my_script.sh"
10:02 output from "my_script.sh"
10:03 output from "my_script.sh"
......?
Thanks.
Expanding on @PickleRick's answer, we can use Splunk as the scheduler by "forking" the script to the background. The detached background process will continue to run after the parent script exits with 0 (no error):
#!/bin/bash
# If FORK is not set yet, re-launch this same script in the background with
# nohup and exit 0 immediately, so splunkd is not left waiting on it.
if [ "${FORK:="0"}" = "0" ]
then
    FORK=1 nohup "$0" "$@" >/dev/null 2>&1 &
    exit 0
fi

# From here on we are the detached child doing the real (slow) work.
BASENAME=$(basename -- "$0")
logger --id=$$ -t "${BASENAME}" "start"
sleep 90
logger --id=$$ -t "${BASENAME}" "finish"
I've used the logger command in the example. On standard Linux configurations, this will log messages to /var/log/messages or /var/log/syslog, depending on the local syslog daemon configuration. We can use any log file, but since the background process is detached from splunkd, we can't use stdout.
The scripted input can use either intervals or cron expressions. The file input or the input specific to wherever you write your script's output would be configured separately as required.
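For example, the pairing could look something like this (a sketch only; the index, sourcetype, and monitored path are placeholders for your environment):
[script://./bin/my_script.sh]
# a plain interval in seconds, or a cron expression
interval = 60
index = blablabla
sourcetype = blablabla

[monitor:///var/log/messages]
# or wherever your syslog daemon / script writes its output
index = blablabla
sourcetype = syslog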
Just be careful not to unintentionally fork-bomb yourself. Check Splunk (limits.conf) and host (ulimit) limits.
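On the host side, a quick sanity check might be something like this (a sketch; the splunk user name is an assumption):
# max processes allowed for the current user (run as the user that runs splunkd)
ulimit -u
# how many processes that user currently has
ps --no-headers -u splunk | wc -l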
We can also write a long-lived script or modular input that manages its own child processes.
We're all about options, so here's a prototype designed to work with inputs.conf interval = 0 (run at startup and re-run on exit) or interval = -1 (run once at startup):
#!/bin/bash

function cleanup_script() {
    # script cleanup
    for worker in $(jobs -p)
    do
        echo "$(date +"%b %e %H:%M:%S") $(hostname) ${BASENAME}[${BASHPID}]: killing worker ${worker}"
        kill "${worker}"
    done
    echo "$(date +"%b %e %H:%M:%S") $(hostname) ${BASENAME}[${BASHPID}]: finish script"
}

function work() {
    function cleanup_work() {
        # work cleanup
        echo "$(date +"%b %e %H:%M:%S") $(hostname) ${BASENAME}[${BASHPID}]: finish work"
    }
    trap "cleanup_work" RETURN EXIT ABRT
    echo "$(date +"%b %e %H:%M:%S") $(hostname) ${BASENAME}[${BASHPID}]: start work"
    # do something
    sleep 90
    # do something else
    return
}

trap "cleanup_script" EXIT ABRT

BASENAME=$(basename -- "$0")
echo "$(date +"%b %e %H:%M:%S") $(hostname) ${BASENAME}[${BASHPID}]: start script"

while :
do
    work &
    sleep 60
done
Splunk will index stdout as in your original script. In this example, I'm writing BSD syslog style messages.
Stopping Splunk or disabling the input at runtime will send SIGABRT to the parent process. Note that stdout won't be captured by Splunk at this point. If you need those messages, write them to a log file.
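If you do go the log file route, one simple way (just a sketch; the path is a placeholder you would then cover with a monitor input) is to redirect the script's output near the top:
# send all further stdout/stderr to a file instead of stdout
# (path is only an example; add a [monitor://...] input for it)
exec >>"/path/to/my_script.log" 2>&1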
Actually, I find this even more complicated than a stand-alone cron-launched solution.
I'm saying this as a seasoned admin. It is very "inconsistent".
It is spawned by Splunk, it emits syslog, and of course each distro handles syslog differently.
While it is tempting to use Splunk's internal scheduler, I'd rather advise using the system-wide cron and explicitly created log files. It's more obvious this way.
Anyway, a question to @verbal_666 - why the need to delay the script's output in the first place? It seems a very unusual requirement.
... originally the script used a "timeout [variable_secs]" wrapping a while loop that polled with "test -f file" to check whether "file" was generated between STARTTIME and the end of the "timeout [variable_secs]" window (variable_secs is taken from a table; every file has its own variable_secs). If timeout exits with exit code 124, a stdout + log entry is written to a log file. If a new instance of the script is launched with the same identical args, it checks whether a previous one is still running and exits immediately, letting the previous one do its job. So I have a single indexed entry for each file in the table (file exists or file does not exist after start_time [the table also has a start_time variable for every file] + variable_secs).
During that "variable_secs" window, if there is a new file in the table to check, the script is blocked by the previous instance, so I can't check it.
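Roughly, the original approach looked like this (only a sketch to illustrate; the file name, seconds, and lock path are placeholders, and flock stands in for the original "previous instance still running" check):
#!/bin/bash
FILE="/tmp/flow.log"     # taken from the table
VARIABLE_SECS=1800       # taken from the table

# exit immediately if another instance for the same file is already running
exec 9>"/tmp/$(basename -- "$0").$(basename -- "${FILE}").lock"
flock -n 9 || exit 0

# wait up to VARIABLE_SECS for the file to appear
timeout "${VARIABLE_SECS}" bash -c "until test -f \"${FILE}\"; do sleep 5; done"
if [ $? -eq 124 ]
then
    # timeout exit code 124: the file never appeared within the window
    echo "$(date) ${FILE} NOT generated within ${VARIABLE_SECS}s"
else
    echo "$(date) ${FILE} generated"
fi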
Let's say the table looks like this:
server1 | /tmp/flow.log | 07:00 | 07:30 |
server1 | /tmp/flow2.log | 07:10 | 07:15 |
The scripted input run by splunkd is scheduled every 5 minutes.
Let's say it's now 06:55:
06:55 splunkd runs the script; it exits with no output/log since it checks that the time is not 07:00 or 07:10 (the script does this check)
07:00 splunkd runs the script; the task starts for "/tmp/flow.log" and waits until 07:30 for the file to be generated
07:05 splunkd runs the script; it aborts since the 07:00 run is still running in the background
07:10 same as 07:05, "/tmp/flow2.log" is skipped
07:15 same as 07:10, "/tmp/flow2.log" is skipped
07:20 same as 06:55
...
So "/tmp/flow2.log" is totally skipped.
Now, on some servers, as I said, cron was used.
On other servers I rewrote the script without the timeout/sleep, so it writes an entry every 5 minutes with a variable "FOUND=[0|1]"; then in SPL I run a "stats count sum(FOUND) as found by host,file" and, with some dashboards/alerts that track them, a sum of 0 in that time range means the file is not present.
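For example, the check could be a search roughly like this (a sketch; the index, sourcetype, and time range are assumptions):
index=blablabla sourcetype=blablabla earliest=-30m
| stats count sum(FOUND) as found by host, file
| where found == 0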
To me, it depends on whether or not cron is centrally managed. If the Splunk administrator has access to the tools or teams that manage cron, then it may be preferred.
cron, its derivatives, and Splunk are all very poor schedulers overall. I always recommend a full-featured scheduler or workload automation tool, but they can be cost and resource prohibitive.
logger was just an example.
True. Sometimes users don't have permissions to run their own crons and the system-wide crontab is fixed. That can be problematic here.
Anyway. The (ugly) workaround to the issue of spawning such stuff from within Splunk itself would be to simply create multiple inputs. If you want to spawn 2-minute-long jobs every minute, you can just create two (or better yet, three, so that you're sure there's no overlap) separate inputs.
One runs on */3, another one on 1,4,7,10..., and another one on 2,5,8,11... (as cron schedules).
Ugly, but should work.
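For example (a sketch only; my_script_a/b/c.sh are assumed to be copies or thin wrappers of the same script, since stanza names must be unique):
# three staggered copies of the same scripted input
[script://./bin/my_script_a.sh]
interval = */3 * * * *
index = blablabla
sourcetype = blablabla

[script://./bin/my_script_b.sh]
interval = 1-59/3 * * * *
index = blablabla
sourcetype = blablabla

[script://./bin/my_script_c.sh]
interval = 2-59/3 * * * *
index = blablabla
sourcetype = blablabla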
Let me first say that your requirement is very unusual. Typically with scripts people have the opposite requirement: that you don't run another instance if the previous one is still running.
In your case I think I'd rather go for an external cron-launched script, with each instance writing to its own log file (and some tmpreaper or manual script to clean up the log files after some time) and a monitor input to ingest those files. It'll probably be way easier (but not pure Splunk).
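Something along these lines (only a sketch; the user, paths, schedule, and per-instance log naming are examples):
# /etc/cron.d entry: one instance per minute, each writing its own timestamped log
* * * * *  splunk  /opt/scripts/my_script.sh >> /var/log/my_script/run_$(date +\%Y\%m\%d\%H\%M\%S).log 2>&1

# and a monitor input on the Splunk side (inputs.conf)
[monitor:///var/log/my_script/*.log]
index = blablabla
sourcetype = blablabla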
I know it's an unusual question, that's why I asked 😂😂😂
THAT is the way I'm using now, getting data from a monitor input.
That's why I wanted to know if I could manage it with splunkd directly 😊
On some servers I also used /etc/cron.hourly/ directly from splunkd, creating tasks dynamically on the fly by launching a root subshell to create the task in cron.hourly, and then getting the output file via a monitor input after the timeout/sleep does its job and the script exits with its exit code.
But the mission was to NOT TOUCH ANYTHING IN THE SYSTEM, so I asked if a scripted input could be forced to rerun even if the previous run was still in the background 🤷♀️ Thanks anyway everyone 👍👍👍