How do I write a search that shows which scheduled search jobs were delayed, and for how long?

supabuck
Path Finder

We have a problem where scheduled searches are sometimes delayed due to heavy load on our search heads/indexers. I want to know which jobs have been delayed and for how long, in minutes if possible, so that they can be scheduled for a new time or backfilled.

1 Solution

Raghav2384
Motivator

Hey Supabuck,

Try this. I got it from the Smart AnSwerS blog series:

index=_internal sourcetype=scheduler app="cvod" scheduled_time=* 
 | eval time=strftime(_time,"%Y-%m-%d %H:%M:%S") | eval delay_in_start = (dispatch_time - scheduled_time) 
 | eval scheduled_time=strftime(scheduled_time,"%Y-%m-%d %H:%M:%S") 
 | eval dispatch_time=strftime(dispatch_time,"%Y-%m-%d %H:%M:%S") 
 | table savedsearch_name, delay_in_start, scheduled_time, dispatch_time, time, run_time, status
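
Since you want the delay in minutes, a follow-on like this should roll it up per search (untested sketch; drop or change the app="cvod" filter to match your environment):

index=_internal sourcetype=scheduler scheduled_time=* dispatch_time=* 
 | eval delay_in_minutes = round((dispatch_time - scheduled_time) / 60, 1) 
 | where delay_in_minutes > 0 
 | stats count max(delay_in_minutes) as max_delay_min avg(delay_in_minutes) as avg_delay_min by savedsearch_name 
 | sort -max_delay_min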

Thanks,
Raghav


Runals
Motivator

We've been working through issues with our scheduled searches and came up with the following queries, which might be helpful. They need to be run from the SH in question.

This first query looks at the number of searches using the same cron schedule and is pretty straightforward. It could be tightened up a bit; I was mostly reusing a query out of the config tracker app. What it doesn't take into account is searches that end up starting at the same time even though they have different cron schedules. In other words, if you have 5 searches that run every 5 minutes (*/5 * * * *) and 5 searches that run every 10 minutes (*/10 * * * *), then every 10 minutes you will have 10 searches starting at once. If you have lots of these, consider offsetting the schedules, e.g. 1/5 * * * *. Another thing to impress on people is time snapping: if they want a search to cover midnight to midnight, don't run it AT midnight. Instead use something like earliest=-1d@d latest=@d and have the cron fire at some minute not divisible by 5, since people naturally gravitate to those minutes and they are the default GUI options. (There's an example stanza after the query below.)

| rest splunk_server=local /servicesNS/-/-/configs/conf-savedsearches 
| rename eai:appName as app eai:acl.sharing as sharing 
| eval status = if(disabled=0, "Enabled", "Disabled") 
| foreach cron_schedule action.email.to action.email.subject 
    [eval <<FIELD>> = if(len('<<FIELD>>') > 0, '<<FIELD>>', "-")] 
| fields app title author search cron_schedule action.email action.email.subject action.email.to splunk_server sharing status 
| join app type=left 
    [| rest splunk_server=local /servicesNS/-/-/apps/local 
     | rename title as app label as app_label 
     | table app app_label] 
| search status=enabled cron_schedule!="-" 
| where cron_schedule!="-" 
| eventstats dc(title) as concurrentCron by cron_schedule 
| table app app_label title author sharing cron_schedule concurrentCron search 
| sort -concurrentCron cron_schedule app title
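
As a footnote to the offset and snapping tips above, here is a hypothetical savedsearches.conf stanza showing both ideas together (the stanza name and the search itself are made up; the settings are standard):

# Hypothetical stanza: the name and search are invented for illustration
[my_daily_summary]
enableSched = 1
# Fire at 00:23 instead of midnight to stay off the top-of-hour pile-up;
# an every-5-minutes search could likewise use 1/5 * * * * instead of */5 * * * *
cron_schedule = 23 0 * * *
# Snapped window: the search still covers yesterday midnight to today midnight
dispatch.earliest_time = -1d@d
dispatch.latest_time = @d
search = index=main sourcetype=access_combined | stats count by host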

This search breaks down which scheduled searches fired and are running concurrently, at 1-second granularity. In the results, the activeSrchs field is the number of searches (scheduled, ad hoc, inline) currently running. The schedSrchConcurrency field is the number of scheduled searches currently running, while the schedSrchFired field shows the distinct count of scheduled searches that actually initiated, with that list just to the right. Relative to your question, I believe a status of continued is similar to the delay you are talking about, though to get a feel for how long a search was delayed you will need to use Raghav's query.

index=_internal sourcetype=scheduler 
    [| rest splunk_server=local /servicesNS/-/-/configs/conf-savedsearches 
     | head 1 
     | rename splunk_server AS host 
     | fields host] 
| eval _time = if(isnotnull(run_time), _time - run_time, scheduled_time) 
| eval run_time = coalesce(run_time, 0) 
| eval reason = coalesce(reason, "-") 
| concurrency duration=run_time 
| bin span=1s _time 
| stats count by concurrency savedsearch_id run_time status _time reason 
| stats max(concurrency) AS schedSrchConcurrency dc(savedsearch_id) AS schedSrchFired list(savedsearch_id) as schedSrchName list(status) as Status list(reason) as Reason list(run_time) as runTime by _time 
| append 
    [search index=_internal "group=search_concurrency" "system total" source="/opt/splunk/var/log/splunk/metrics.log" 
        [| rest splunk_server=local /servicesNS/-/-/configs/conf-savedsearches 
         | head 1 
         | rename splunk_server AS host 
         | fields host] 
     | table _time active_hist_searches] 
| rename active_hist_searches as activeSrchs 
| search activeSrchs!=0 OR schedSrchFired=* 
| table _time activeSrchs schedSrchConcurrency schedSrchFired schedSrchName Status Reason runTime 
| sort _time
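
If you just want a quick tally of which searches the scheduler has been deferring, something like this keys directly off that continued status (untested sketch):

index=_internal sourcetype=scheduler status=continued 
 | stats count as deferred_count latest(_time) as last_deferred by savedsearch_id 
 | eval last_deferred = strftime(last_deferred, "%Y-%m-%d %H:%M:%S") 
 | sort -deferred_count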

supabuck
Path Finder

Hi Runals,

Thank you very much. I will add these searches to my list and use them when troubleshooting scheduled searches in the future. I also like those tips concerning the scheduling of cron jobs!

Best regards,
Supabuck

Runals
Motivator

To work around the cron scheduling/display issue I came up with the following query. Generally speaking, Splunk is smart enough to delay (status=continued) scheduled searches when too many are running, but if you want a visual of when searches are scheduled to run, this works pretty well. For the visualization I use a column chart and convert the maxScheduled field to a chart overlay.

index=_internal sourcetype=scheduler NOT "status=continued" scheduled_time=* 
    [| rest splunk_server=local /servicesNS/-/-/server/status/limits/search-concurrency 
     | rename splunk_server as host 
     | table host] 
| eval _time = scheduled_time 
| timechart dc(savedsearch_id) as scheduledSearches 
| eval maxScheduled = 
    [| rest splunk_server=local /servicesNS/-/-/server/status/limits/search-concurrency 
     | table max_hist_scheduled_searches 
     | rename max_hist_scheduled_searches as query]
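
If a flat list of the overloaded buckets is more useful than the chart overlay, tacking a where clause onto the end of that query (untested) should keep only the time buckets where the schedule exceeds the ceiling:

 ... | where scheduledSearches > tonumber(maxScheduled)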

supabuck
Path Finder

Wow Raghav, thank you. I did a quick test and it looks like what I need. You da man! 🙂

ppablo
Retired

Glad you found an answer through @Raghav2384 🙂 For reference, this is the Smart AnSwerS blog series he was referring to, in case you want to see other useful tips and tricks with Splunk: http://blogs.splunk.com/2016/08/05/smart-answers-73/

supabuck
Path Finder

Hi ppablo, thank you very much for this information. I will read through the entries within this blog series. It's great that the Splunk community is so helpful!
