I have a search that finds the longest currently running jobs based on logs from a job scheduling system. For jobs whose latest status is "Running", it also calculates the current runtime. The goal is to keep track of long-running jobs.
I set this up as an alert that checks for jobs running longer than 4 hours; when it triggers, it emails a list of those jobs. However, the alert works only about 90% of the time: it fails when an issue occurs on the server. The alert runs every hour and searches the past 3 days of logs.
This is the error: "Unknown error for peer . Search Results might be incomplete. If this occurs frequently, please check on the peer." When it occurs, the alert triggers with incomplete results. The Splunk admins in our company say the search may need to be optimized. Is this actually the case?
index=autosys source= jobName=
| where NOT ((LIKE(jobName, "%box%")) OR (LIKE(jobName, "%bx%")))
| stats latest(statusText) AS "latestStatus" latest(timestamp) AS "latestTimestamp" by jobName
| where latestStatus="RUNNING"
| eval nowstring=strftime(now(), "%Y-%m-%d %H:%M:%S")
| eval start = strptime(latestTimestamp, "%Y-%m-%d %H:%M:%S")
| eval diffInSecs = now() - start
| eval strSecs=tostring(diffInSecs,"duration")
| eval strFormatSecs=if(match(strSecs,"\+"),replace(strSecs,"(\d+)\+(\d+)\:(\d+)\:(\d+)","\1 Day \2 Hr \3 Min \4 Secs"),replace(strSecs,"(\d+)\:(\d+)\:(\d+)","\1 Hr \2 Min \3 Secs"))
| eval hour = diffInSecs / 3600
| where (hour > 4)
| sort -hour
| fields jobName,latestStatus,latestTimestamp,nowstring, strFormatSecs
| rename latestStatus AS "Status" latestTimestamp AS "Job/Box Start Time" nowstring AS "Current Time" strFormatSecs AS "Runtime Days Hrs Min Secs"
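
To sanity-check the runtime arithmetic and the "Day Hr Min Secs" formatting outside Splunk, here is a minimal Python sketch of the same logic. It is only an illustration under assumptions: the `(jobName, latestStatus, latestTimestamp)` tuples, the function names, and the sample values are hypothetical, and the timestamp format is assumed to match the one used in the search above.

```python
from datetime import datetime

# Assumed to match the timestamp format parsed by strptime in the search.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_runtime(total_secs):
    """Render seconds as 'D Day HH Hr MM Min SS Secs', omitting days when zero."""
    days, rem = divmod(total_secs, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    if days:
        return f"{days} Day {hours:02d} Hr {minutes:02d} Min {secs:02d} Secs"
    return f"{hours:02d} Hr {minutes:02d} Min {secs:02d} Secs"


def long_running(jobs, now, threshold_hours=4):
    """Return (jobName, runtime) for RUNNING non-box jobs past the threshold.

    `jobs` is a hypothetical list of (jobName, latestStatus, latestTimestamp)
    tuples, i.e. one row per job as produced by the stats step.
    """
    rows = []
    for name, status, started in jobs:
        if status != "RUNNING":
            continue
        if "box" in name or "bx" in name:  # same exclusion as the like() filter
            continue
        start = datetime.strptime(started, TS_FORMAT)
        secs = int((now - start).total_seconds())
        if secs > threshold_hours * 3600:
            rows.append((name, secs))
    rows.sort(key=lambda row: row[1], reverse=True)  # longest runtime first
    return [(name, format_runtime(secs)) for name, secs in rows]


if __name__ == "__main__":
    sample = [("job_a", "RUNNING", "2024-01-03 06:30:00")]
    for name, runtime in long_running(sample, datetime(2024, 1, 3, 12, 0, 0)):
        print(name, runtime)
```

Filtering on the threshold before sorting keeps the sort small, which is the same ordering idea as applying `where` before `sort` in the search.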