I'm working with a similar issue to the one in: https://answers.splunk.com/answers/512103/how-to-get-a-list-of-schedules-searches-reports-al.html
The addendum to that is that I want to find the run time of each of those searches. I'm thinking perhaps there are too many searches running at the same time and that this is causing internal Splunk connectivity issues.
It would be really nice to have each job's scheduled time alongside the amount of time it took to run the last time it executed (or the last several times).
Good morning,
I think this SPL does what you're asking to do. It's similar to searches built into the monitoring console but more specifically tailored to your requirements.
index=_internal sourcetype=scheduler
| stats values(app) AS splunk_app, values(scheduled_time) AS scheduled_time, values(dispatch_time) AS dispatch_time, values(result_count) AS result_count, values(search_type) AS search_type, values(status) AS status, values(run_time) AS run_time
by sid
| convert ctime(scheduled_time) AS scheduled_time_pretty ctime(dispatch_time) AS dispatch_time_pretty
| eval schedule_dispatch_delta = dispatch_time-scheduled_time, schedule_dispatch_delta_pretty = tostring(schedule_dispatch_delta,"duration")
| table sid, splunk_app,status,search_type,result_count,run_time,scheduled_time_pretty,dispatch_time_pretty,schedule_dispatch_delta_pretty
| sort - run_time
This search uses the _internal index and the scheduler sourcetype to pull metadata about your scheduled searches. It focuses specifically on the scheduling and run time of the searches and helps identify searches that are struggling.
I recommend using this as a starting point, then investigating further by appending | stats count by FIELDNAME for whatever field you want to investigate. For example, adding | stats count by scheduled_time_pretty will give you a count of searches based on the times they are scheduled to run, which can help you identify whether you have too many searches scheduled at the same time (see the sketch below).
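As a minimal sketch of that idea (the field names come from the search above; the concurrent_searches label is just my own), something like this should show how many scheduled searches fire at each scheduled time:

index=_internal sourcetype=scheduler
| stats values(scheduled_time) AS scheduled_time by sid
| convert ctime(scheduled_time) AS scheduled_time_pretty
| stats count AS concurrent_searches by scheduled_time_pretty
| sort - concurrent_searches

A long tail of times with only one or two searches is fine; the interesting rows are the scheduled times where many searches kick off together.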
In that question they look at the REST API. However, timings can also be found in index=_audit. Depending on what your exact criteria are, you may want to join the two approaches (a rough sketch of such a join is included after the links below). Below I demonstrate that the timings are in _audit:
index=_audit savedsearch_name=* savedsearch_name!="" timestamp=* total_run_time=*
| eval temp=strptime(timestamp,"%m-%d-%Y %H:%M:%S.%3N")
| sort - temp
| convert timeformat="%m-%d-%Y %H:%M:%S" ctime(temp)
| table timestamp total_run_time savedsearch_name
For more information, this was cobbled together from:
https://answers.splunk.com/answers/507790/index-audit-contents.html
https://answers.splunk.com/answers/39402/convert-timeformat.html
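For completeness, here is a rough, untested sketch of joining the REST view of saved searches with the _audit timings. The avg/max stats and the left join are my own choices; title, cron_schedule and is_scheduled are fields returned by the saved/searches endpoint, and the rename lines the audit field up with the REST field. Note that join subsearches are subject to the usual subsearch limits.

| rest /servicesNS/-/-/saved/searches
| search is_scheduled=1
| fields title cron_schedule
| join type=left title
    [ search index=_audit savedsearch_name=* savedsearch_name!="" total_run_time=*
      | stats avg(total_run_time) AS avg_run_time max(total_run_time) AS max_run_time count AS runs by savedsearch_name
      | rename savedsearch_name AS title ]
| sort - avg_run_time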
I'll take this answer, as it's exactly what I was looking for! I do have a follow-up, though: what is the "total_run_time" value if it's "*" in the result set?
Example: I assume the records show run time in seconds, e.g. 5, 6, 15, 400 and *. What is the value of "*" in the output result set?
The * is a wildcard saying we want something in that field, not null; it is not a value that will appear in the results. The details I found on total_run_time were for the history command: "The total time it took to run the search in seconds." Source: http://docs.splunk.com/Documentation/Splunk/7.1.2/SearchReference/History
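If it helps, here is a quick, untested way to see that the * is only a filter (the has_run_time label is mine, and action=search just narrows _audit to search activity):

index=_audit action=search
| eval has_run_time=if(isnotnull(total_run_time),"yes","no")
| stats count by has_run_time

Adding total_run_time=* to the base search simply keeps the has_run_time="yes" events; you will never see a literal * in the output.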
If you are trying to troubleshoot scheduled search concurrency, why not use the monitoring console? Check "Search >> Scheduler Activity: Instance". You can get a lot of information there (including the average run time for the searches).
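That dashboard is built on the same scheduler logs used in the answer above. If you want the numbers in raw SPL, a rough approximation (not the console's exact query; the field names are standard scheduler.log fields) could be:

index=_internal sourcetype=scheduler status=success
| stats count AS executions avg(run_time) AS avg_run_time max(run_time) AS max_run_time by savedsearch_name
| sort - avg_run_time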
I'll double-check, but I don't think it has the info I'm looking for. In short, we're seeing connection issues at what appear to be random times/intervals. Knowing "nothing is random", there's a pattern somewhere, so an average isn't going to give me the info I think I need. Ha, notice I said "think I need"; I'm not sure it'll answer what I'm looking for.
Other Answers posts indicate it's likely due to a query timeout in the configs. We've more than doubled the default timeouts, but I'm still thinking it's bottlenecked somewhere. We can run the same query that timed out a couple of minutes later and it's fine.
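One untested way to check whether those seemingly random times line up with scheduler congestion is to chart skipped scheduled searches over time (reason is a field scheduler.log records for skipped runs; the 5-minute span is arbitrary):

index=_internal sourcetype=scheduler status=skipped
| timechart span=5m count by reason

Spikes that coincide with the connection problems would point at concurrency limits rather than timeouts.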