Data Input Script Suddenly Stopped Running

drautb
Explorer

Hey all,

The Splunk instance that I work with has about 30 data input scripts. One of them is scheduled to run hourly; its cron string looks like this: "0 * * * *". It was working great, but it abruptly stopped running for some reason. The last time it ran (as determined by the timestamp on its output files) was June 30th at 11:00pm. I restarted Splunk, and it started running again, but I still haven't been able to determine what caused it to stop in the first place.

Because of the timing, I thought it might be an error in my cron string, but everything I have found online says that the string is correct. Has anyone else run into this before? Scripts that abruptly stop running?

bmas10
Explorer

I am seeing the same issue. The script just randomly gets removed from the schedule. A debug refresh causes it to get added to the schedule again, but it will not start running again until I disable and re-enable it. I have checked the _internal index and opened splunkd.log locally: no errors. I have performed the basic troubleshooting listed by grijhwani, with no luck.

grijhwani
Motivator

Do you understand the script, or are you simply relying on a legacy job instated by someone else? You give no indication as to what the script is, or your underlying platform, so it is hard to answer the question. A number of generic possibilities:

  • Have you tried running the script manually? Does it even run any more?
  • Has someone screwed your source permissions (or indeed the run permissions of the script) in such a fashion that the two are now incompatible?
  • Has the data the script works from become corrupted, or too large for whatever tool you are using to manage?
  • Have there been any updates to the system? Perhaps an update has broken some tool that the script depends on.
  • Have you checked for disk space problems wherever it is that the script places any intermediate or output files?
  • Have you checkpointed the script? If it is running on unix you could try "touch"ing semaphore files to track progress in the background.

In short - have you done any debugging before assuming it is the scheduling which has broken?
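The checkpointing and disk space checks above can be sketched as follows. This is a minimal illustration, not the actual input script: `run_input`, the marker file names, and the `min_free_mb` threshold are all hypothetical, and the marker files simply play the role of "touch"ed semaphore files.

```python
import os
import shutil


def checkpoint(marker_dir, stage):
    """Create (or refresh) an empty marker file for a stage, like `touch`."""
    os.makedirs(marker_dir, exist_ok=True)
    path = os.path.join(marker_dir, stage + ".marker")
    with open(path, "a"):
        os.utime(path, None)  # set the modification time to now
    return path


def free_megabytes(path):
    """Return the free space, in MiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def run_input(marker_dir, output_dir, min_free_mb=100):
    """Hypothetical stand-in for the real input script, with checkpoints."""
    checkpoint(marker_dir, "01_started")
    if free_megabytes(output_dir) < min_free_mb:  # threshold is an assumption
        checkpoint(marker_dir, "02_low_disk")
        return False
    # ... the script's real work would go here ...
    checkpoint(marker_dir, "03_finished")
    return True
```

If the script stops again, the newest marker file and its timestamp show whether it was launched at all and how far it got before stopping.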

lguinn2
Legend

You could find out more by looking in the Splunk logs. Log into Splunk as an admin, and run this search:

index=_internal nameofyourscript

You should be able to see what happened each time that Splunk attempted to run your script.
You can find this same information in $SPLUNK_HOME/var/log/splunk/splunkd.log*
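If you would rather check the log files directly, something like the following could pull out every line that mentions the script. It is only a sketch: the function names are made up for illustration, and the `/opt/splunk` fallback for `$SPLUNK_HOME` is an assumption about a typical install.

```python
import glob
import os


def default_log_dir():
    """Build $SPLUNK_HOME/var/log/splunk; the fallback path is an assumption."""
    splunk_home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    return os.path.join(splunk_home, "var", "log", "splunk")


def find_script_events(log_dir, script_name):
    """Return (file name, line) pairs from splunkd.log* that mention the script."""
    hits = []
    pattern = os.path.join(log_dir, "splunkd.log*")
    for log_path in sorted(glob.glob(pattern)):
        with open(log_path, "r", errors="replace") as handle:
            for line in handle:
                if script_name in line:
                    hits.append((os.path.basename(log_path), line.rstrip("\n")))
    return hits
```

Calling `find_script_events(default_log_dir(), "nameofyourscript")` and looking at the last few entries should show roughly when Splunk stopped mentioning the script.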

grijhwani
Motivator

The cron notation is just fine - the zeroth minute of every hour. And besides, if you didn't change it why would it be wrong now when it wasn't before?
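To check a cron string for yourself, a small matcher like the one below can count how often it fires. This is a rough sketch of standard five-field cron semantics, not Splunk's own parser: it handles only `*`, `*/n`, ranges, and comma lists, requires all five fields to match, and the date used for counting is arbitrary.

```python
from datetime import datetime, timedelta


def field_matches(field, value, low):
    """Check one cron field against a value (low is the field's minimum)."""
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, value  # wildcard: any value from low upward
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expr, when):
    """Return True if the five-field cron expression fires at `when`."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    minute, hour, day, month, weekday = fields
    cron_weekday = (when.weekday() + 1) % 7  # cron counts Sunday as 0
    return (field_matches(minute, when.minute, 0)
            and field_matches(hour, when.hour, 0)
            and field_matches(day, when.day, 1)
            and field_matches(month, when.month, 1)
            and field_matches(weekday, cron_weekday, 0))


def runs_per_day(expr, day=datetime(2024, 1, 1)):
    """Count the minutes in one (arbitrary) day on which the expression fires."""
    return sum(cron_matches(expr, day + timedelta(minutes=m)) for m in range(1440))
```

With this sketch, `runs_per_day("0 * * * *")` gives 24, once at the top of each hour, while `runs_per_day("* * * * *")` gives 1440, once every minute.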

drautb
Explorer

I just fixed the cron string in my post; I didn't realize the website removed some of the asterisks. Is yours correct? I'm not seeing how * * /1 * would equate to an hourly job.

linu1988
Champion

The cron schedule should be like this:
* * * * /1 * * *
Please refer to this: docs.splunk.com/Documentation/Splunk/5.0.3/Alert/Scheduledsearch
