Report to show scheduled searches missed due to downtime

Matthias_BY
Communicator

Hello,

I have some scheduled searches. Some run every 5 minutes, some every 15 minutes, some hourly, etc.

Some of those searches populate a summary index; a few others export CSV to feed another tool regularly.

If there is an outage of the search head (even with two search heads or SH pooling), some jobs might be skipped or missed, because they won't be rerun. This results in an incomplete dataset.

As a first step, I need a report that shows me, based on the actual schedule, which searches were skipped.

index=_internal source=*scheduler.log | eval sched = strftime(scheduled_time, "%Y-%m-%d %H:%M:%S") | search savedsearch_name="Project Honey Pot - Threatscore AVG Last 4 Hours" NOT continued | table sched status savedsearch_name

alt text

Report like:

User Activity Search
- Last run | scheduled every 5 minutes | Status=Completed
- Last run - 5 minutes | Status=Completed
- Last run - 10 minutes | Status=Completed
- Last run - 15 minutes | Status=Not Executed
- Last run - 20 minutes | Status=Completed

And this for each scheduled search, dynamically, based on its own schedule (every 5 minutes in this example).

The above example would point to a potential restart of the SH 15 minutes ago. I could then investigate manually and re-run the export for that specific timeframe to add the data again. Alternatively, the report could review the last 10 successful runs, subtract the times from each other, and automatically detect that the search runs every 5 minutes.
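The "review the last 10 successful runs and subtract the times" idea can be sketched outside Splunk in a few lines of Python. This is only an illustration under stated assumptions: run times are epoch seconds for successful runs (for example exported from scheduler.log), and the function name `find_missed_runs` and its `sample` parameter are hypothetical. The median is used so that a single long gap does not distort the inferred interval; it will give a wrong interval if most of the recent deltas are themselves gaps.

```python
from statistics import median


def find_missed_runs(run_times, sample=10):
    """Infer the schedule interval and list expected run times that have no run.

    run_times: epoch seconds of successful runs (assumed input, any order).
    sample:    how many of the most recent deltas to use for the interval.
    """
    runs = sorted(run_times)
    if len(runs) < 3:
        raise ValueError("need at least three runs to infer an interval")

    # Interval = median delta between the most recent consecutive runs.
    recent = runs[-(sample + 1):]
    interval = median(b - a for a, b in zip(recent, recent[1:]))
    if interval <= 0:
        raise ValueError("could not infer a positive interval")

    missed = []
    for prev, cur in zip(runs, runs[1:]):
        # Whole intervals that fit into this gap, minus the one expected run.
        skipped = round((cur - prev) / interval) - 1
        missed.extend(prev + interval * i for i in range(1, skipped + 1))
    return interval, missed


# Runs at 0, 600, 900 and 1200 seconds: the run expected at 300 is missing.
print(find_missed_runs([0, 600, 900, 1200]))  # (300, [300])
```

The list of missed times is what would have to be re-run manually to fill the summary index or the CSV export.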

Thanks a lot
Matthias

1 Solution

Matthias_BY
Communicator

Hi,

I found a nearly ideal solution that is independent of the schedule, using kmeans. First I list all successful runs, then calculate the delta between consecutive runs, and then use kmeans to cluster the deltas so that the outliers stand out. If the system was down or a scheduled run was missed, it shows up like this:

`set_internal_index` host="mmaier-mbp15.local" source=*scheduler.log savedsearch_name="BlueCoat - Stats - Collect" status!=continued | stats max(run_time) as Max, count by _time savedsearch_name | sort -_time | delta _time as delta | where Max>0 | eval delta = round (delta/60*-1,2) | kmeans delta | sort -_time | replace 1 with OK, 2 with "ERROR" in CLUSTERNUM

alt text

It also shows up very clearly in a line chart:

alt text
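To make the delta-and-cluster idea easier to follow, here is a rough Python sketch of the same logic. It is not what Splunk's kmeans command does internally; it is a plain two-cluster k-means on one-dimensional data, and the function name `label_gaps` is hypothetical. The sketch labels by cluster centre (the cluster with the larger centre is "ERROR") rather than by cluster number, so it does not depend on how the clusters happen to be numbered.

```python
def label_gaps(deltas, iterations=20):
    """Label each delta (minutes between consecutive runs) as OK or ERROR.

    Two-cluster k-means in one dimension: the cluster with the larger
    centre is treated as the outlier ("ERROR") cluster.
    """
    if not deltas:
        return []
    lo, hi = min(deltas), max(deltas)
    if lo == hi:
        # All deltas identical: nothing to separate.
        return ["OK"] * len(deltas)

    for _ in range(iterations):
        # Assign every delta to the nearer of the two centres.
        near_lo = [d for d in deltas if abs(d - lo) <= abs(d - hi)]
        near_hi = [d for d in deltas if abs(d - lo) > abs(d - hi)]
        new_lo = sum(near_lo) / len(near_lo)
        new_hi = sum(near_hi) / len(near_hi)
        if (new_lo, new_hi) == (lo, hi):
            break  # centres stopped moving
        lo, hi = new_lo, new_hi

    return ["OK" if abs(d - lo) <= abs(d - hi) else "ERROR" for d in deltas]


# Five-minute schedule with one 60-minute gap.
print(label_gaps([5, 5, 5, 60, 5, 5]))
# ['OK', 'OK', 'OK', 'ERROR', 'OK', 'OK']
```

One caveat of this approach: two clusters are always formed when the deltas differ at all, so a period without any outage but with slight jitter would still get some rows labelled "ERROR". Comparing the two cluster centres before trusting the labels would be a sensible extra check.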

_d_
Splunk Employee

Take a look at the SoS app under Search | Scheduler Activity.

_d_
Splunk Employee

Correct, and that's because the scheduler does not keep state across restarts and naturally does not log anything while the instance is down. I suppose you could modify your search to check whether a shutdown event occurred in splunkd.log during the time range in question and add a field to indicate it. You could then use that field to decide whether or not to re-run the reports.
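As a rough illustration of that suggestion, the Python sketch below marks each unusually long gap between runs with a flag saying whether a shutdown fell inside it. Everything here is an assumption for the sake of the example: `shutdown_times` stands for epoch timestamps of shutdown events already extracted from splunkd.log by a separate search (the exact log message to match is not given in this thread), and the function name `annotate_gaps` and the 1.5x tolerance are hypothetical choices.

```python
def annotate_gaps(run_times, shutdown_times, expected_interval, tolerance=1.5):
    """Report gaps between runs and whether a shutdown occurred inside each.

    run_times:         epoch seconds of completed runs (assumed input).
    shutdown_times:    epoch seconds of shutdown events (assumed input).
    expected_interval: schedule interval in seconds.
    tolerance:         a gap counts as abnormal above interval * tolerance.
    """
    runs = sorted(run_times)
    report = []
    for prev, cur in zip(runs, runs[1:]):
        if cur - prev <= expected_interval * tolerance:
            continue  # normal spacing, nothing to report
        report.append({
            "gap_start": prev,
            "gap_end": cur,
            # True if any shutdown event lies within this gap.
            "shutdown_in_gap": any(prev <= s <= cur for s in shutdown_times),
        })
    return report


gaps = annotate_gaps([0, 300, 600, 4200, 4500], [700], expected_interval=300)
print(gaps)
# [{'gap_start': 600, 'gap_end': 4200, 'shutdown_in_gap': True}]
```

A gap with `shutdown_in_gap` set would be a candidate for re-running the report over that time range; a gap without it would need a closer look at the scheduler itself.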

Matthias_BY
Communicator

Thanks for your hint, but this only shows successful runs vs. errors. If I have a scheduled activity every 5 minutes, shut down my Splunk instance for 1 hour, and start it again, it won't show me that several scheduled runs were missed...
