Splunk Search

How to manipulate the time formats to get the exact results in Status

varunghai
Engager

Hi,

I have created a query to fetch the status of some jobs in a particular format.
There are different scheduled jobs which run in the environment which we want to monitor, some of these runs once daily and some of the runs every 5 or 10 or 30 mins etc.

Query-
index=tomcat source ="/files0/nlhyp*" [Job] cronjob earliest=@d Action=Starting | table CronJobName _time | rename _time as time1 | eval Actual_Start_Time=strftime(time1,"%d-%m-%Y %H:%M:%S") | Join

[

search index=tomcat source ="/files0/nlhyp*" [Job] cronjob earliest=@d Action=Finished | table CronJobName _time | rename _time as time2 | eval Actual_End_Time=strftime(time2,"%d-%m-%Y %H:%M:%S")

] | table CronJobName Actual_Start_Time Actual_End_Time|dedup CronJobName
| join CronJobName type=inner[|inputlookup CronJobLookup.csv]
| dedup CronJobName | table CronJobName Job_Frequency_min Actual_Start_Time Expected_Start_Time Actual_End_Time Expected_End_Time
| eval epoch_a=now() | eval CurrentDate=strftime(now(),"%d-%m-%Y")
| eval epoch_b=strptime(CurrentDate." ".Expected_Start_Time,"%d-%m-%Y %H:%M:%S"), ExpectedStart=strftime(epoch_b,"%d-%m-%Y %H:%M:%S"), CurrentTime=strftime(epoch_a,"%d-%m-%Y %H:%M:%S")
| eval epoch_c=strptime(CurrentDate." ".Expected_End_Time,"%d-%m-%Y %H:%M:%S"), ExpectedEnd=strftime(epoch_c,"%d-%m-%Y %H:%M:%S"), CurrentTime=strftime(epoch_a,"%d-%m-%Y %H:%M:%S")
| table CronJobName Job_Frequency_min Actual_Start_Time ExpectedStart Actual_End_Time ExpectedEnd
| eval Expected_Start=strftime(Expected_Start_Time, "%H:%M:%S") | eval Expected_End=strftime(Expected_End_Time, "%H:%M:%S")
| eval Status = case(
Actual_End_Time > ExpectedEnd AND isnotnull(Actual_End_Time), "Over Run",
Actual_End_Time < ExpectedEnd AND isnotnull(Actual_End_Time), "OK",
isnull(Actual_Start_Time), "Not Run")

Issue 1:
currently for the jobs with frequency as Once, i am getting the result as :
CronJobName Job_Frequency_min Actual_Start_Time ExpectedStart Actual_End_Time ExpectedEnd Status
staffAuditFeedJob Once 21-11-2017 06:30:01 21-11-2017 05:30:00 21-11-2017 06:30:02 21-11-2017 06:30:00 Over Run

here the difference between expected and actual end time is only 2 secs yet the status says Over Run, so i want to give some lead time for 5 mins. like if the job finishes after 5 mins i should get the over run status if it finishes within 5 mins over the expectedend time then it should show as OK.

Issue 2:

For the jobs with frequency other than 1, say the frequency is every 10 min so the expected start and expected end times should vary for each run. Or the status should be more accurate based on some calculations of the time and frequency of the job, which I am not able to get. Can you please suggest?

Tags (1)
0 Karma
1 Solution

DalJeanis
SplunkTrust
SplunkTrust

Change from

Actual_End_Time > ExpectedEnd 

to

(Actual_End_Time > ExpectedEnd +300)

to give 5 minutes (300 seconds) leeway.


The other one requires a major refactor... which the query does need...

Okay, here's the refactor. This should SCREAM in speed relative to the other one.

Obviously, it is air code, since we don't have your data to test with.

index=tomcat source ="/files0/nlhyp*" [Job] cronjob (Action=Starting OR Action=Finished)
| fields _time CronJobName Action

| rename COMMENT as "Find the latest _time for each action for each CronJobName, then roll them together"
| stats max(_time) as _time by CronJobName Action
| stats max(eval(case(Action="Starting",_time))) as Actual_Start_Epoch, max(eval(case(Action="Finished",_time))) as Actual_End_Epoch by CronJobName

| rename COMMENT as "Default the start time to end time if start is missing, then kill the end time if it's before the start time."
| eval Actual_Start_Epoch=coalesce(Actual_Start_Epoch, Actual_End_Epoch)
| eval Actual_End_Epoch=case(Actual_End_Epoch>=Actual_Start_Epoch, Actual_End_Epoch)

| rename COMMENT as "look up the time and frequency for each CronJobName"
| lookup CronJobLookup.csv CronJobName OUTPUT Job_Frequency_min  Expected_Start_Time Expected_End_Time
| fields CronJobName Actual_Start_Epoch Actual_End_Epoch Job_Frequency_min Expected_Start_Time Expected_End_Time

| rename COMMENT as "Mark the records so we can delete each one after handling it one of two ways"
| eval rectype="detail1"

| rename COMMENT as "This section handles the once-only jobs"
| appendpipe [
    | where rectype="detail1" AND  Job_Frequency_min="Once"
    | eval Current_Date=strftime(now(),"%Y-%m-%d")
    | eval Expected_Start=Current_Date." ".Expected_Start_Time 
    | eval Expected_Start_Epoch=strptime(Expected_Start,"%Y-%m-%d %H:%M:%S")
    | eval Expected_End=Current_Date." ".Expected_End_Time
    | eval Expected_End_Epoch=strptime(Expected_End,"%Y-%m-%d %H:%M:%S")
    | eval Actual_Start=strftime(Actual_Start_Epoch,"%Y-%m-%d %H:%M:%S")
    | eval Actual_End=strftime(Actual_End_Epoch,"%Y-%m-%d %H:%M:%S")
    | eval Status = case(
        isnull(Actual_End_Epoch),"Not Complete",
        Actual_End_Epoch <= Expected_End_Epoch + 300, "OK", 
        true(), "Over Run")
    | table CronJobName Job_Frequency_min Expected_Start Expected_End Actual_Start Actual_End Status
    | eval rectype="detailOnce"
    ]

| rename COMMENT as "This section handles the more-than-once jobs"
| rename COMMENT as "Assumes CronJobLookup.csv table has one record per CronJobName, w expected start and end time of one job. "
| appendpipe [
    | where rectype="detail1" AND Job_Frequency_min!="Once"

    | rename COMMENT as "Calculate the epoch time of the 'Anchor' version of the Expected Start Time "        
    | eval Current_Date=strftime(Actual_Start_Epoch,"%Y-%m-%d")
    | eval Anchor_Start=CurrentDate." ".Expected_Start_Time 
    | eval Anchor_Start_Epoch=coalesce(strptime(Anchor_Start,"%Y-%m-%d %H:%M:%S")

    | rename COMMENT as "Calculate the job frequency in seconds and figure out how many steps we are probably away from the anchor time"        
    | eval Job_Frequency_Sec=if(isnum(Job_Frequency_min),60*tonumber(Job_Frequency_min),600) 
    | eval steps_off=round( (Actual_Start_Epoch-Anchor_Start_Epoch) / Job_Frequency_Sec,0)
    | eval secs_off=Job_Frequency_Sec*steps_off

    | rename COMMENT as "Calculate the Expected Start Time given the timestep"        
    | eval Expected_Start_Epoch=Anchor_Start_Epoch+secs_off

    | rename COMMENT as "Calculate the Expected End Time given the timestep"        
    | eval Expected_End=CurrentDate." ".Expected_End_Time
    | eval Expected_End_Epoch=strptime(Expected_End,"%Y-%m-%d %H:%M:%S")+secs_off
    | eval Expected_End=strftime(Expected_End_Epoch,"%Y-%m-%d %H:%M:%S")

    | eval Status = case(
        isnull(Actual_End_Epoch),"Not Complete",
        Actual_End_Epoch <= Expected_End_Epoch +300, "OK", 
        true(), "Over Run")

    | table CronJobName Job_Frequency_min Expected_Start Expected_End Actual_Start Actual_End Status
    | eval rectype="detailMultiples"
    ]

| rename COMMENT as "Now we get rid of the pre-processing details"
| where rectype!="detail1"

| rename COMMENT as "and sort either by name or by start or end date (see note 1)"
| sort 0 CronJobName 

NOTES -

1) The dates are now presented in ISO format, so that you can sort by date/time without converting back to epoch. If you get your techs and users used to looking at yyyy-mm-dd, then it will save you huge headaches in the long run, because ISO format cannot be misinterpreted. (No one uses yyyy-dd-mm.)

2) It should be easy enough to modify the second appendpipe based on different assumptions.

View solution in original post

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

Change from

Actual_End_Time > ExpectedEnd 

to

(Actual_End_Time > ExpectedEnd +300)

to give 5 minutes (300 seconds) leeway.


The other one requires a major refactor... which the query does need...

Okay, here's the refactor. This should SCREAM in speed relative to the other one.

Obviously, it is air code, since we don't have your data to test with.

index=tomcat source ="/files0/nlhyp*" [Job] cronjob (Action=Starting OR Action=Finished)
| fields _time CronJobName Action

| rename COMMENT as "Find the latest _time for each action for each CronJobName, then roll them together"
| stats max(_time) as _time by CronJobName Action
| stats max(eval(case(Action="Starting",_time))) as Actual_Start_Epoch, max(eval(case(Action="Finished",_time))) as Actual_End_Epoch by CronJobName

| rename COMMENT as "Default the start time to end time if start is missing, then kill the end time if it's before the start time."
| eval Actual_Start_Epoch=coalesce(Actual_Start_Epoch, Actual_End_Epoch)
| eval Actual_End_Epoch=case(Actual_End_Epoch>=Actual_Start_Epoch, Actual_End_Epoch)

| rename COMMENT as "look up the time and frequency for each CronJobName"
| lookup CronJobLookup.csv CronJobName OUTPUT Job_Frequency_min  Expected_Start_Time Expected_End_Time
| fields CronJobName Actual_Start_Epoch Actual_End_Epoch Job_Frequency_min Expected_Start_Time Expected_End_Time

| rename COMMENT as "Mark the records so we can delete each one after handling it one of two ways"
| eval rectype="detail1"

| rename COMMENT as "This section handles the once-only jobs"
| appendpipe [
    | where rectype="detail1" AND  Job_Frequency_min="Once"
    | eval Current_Date=strftime(now(),"%Y-%m-%d")
    | eval Expected_Start=Current_Date." ".Expected_Start_Time 
    | eval Expected_Start_Epoch=strptime(Expected_Start,"%Y-%m-%d %H:%M:%S")
    | eval Expected_End=Current_Date." ".Expected_End_Time
    | eval Expected_End_Epoch=strptime(Expected_End,"%Y-%m-%d %H:%M:%S")
    | eval Actual_Start=strftime(Actual_Start_Epoch,"%Y-%m-%d %H:%M:%S")
    | eval Actual_End=strftime(Actual_End_Epoch,"%Y-%m-%d %H:%M:%S")
    | eval Status = case(
        isnull(Actual_End_Epoch),"Not Complete",
        Actual_End_Epoch <= Expected_End_Epoch + 300, "OK", 
        true(), "Over Run")
    | table CronJobName Job_Frequency_min Expected_Start Expected_End Actual_Start Actual_End Status
    | eval rectype="detailOnce"
    ]

| rename COMMENT as "This section handles the more-than-once jobs"
| rename COMMENT as "Assumes CronJobLookup.csv table has one record per CronJobName, w expected start and end time of one job. "
| appendpipe [
    | where rectype="detail1" AND Job_Frequency_min!="Once"

    | rename COMMENT as "Calculate the epoch time of the 'Anchor' version of the Expected Start Time "        
    | eval Current_Date=strftime(Actual_Start_Epoch,"%Y-%m-%d")
    | eval Anchor_Start=CurrentDate." ".Expected_Start_Time 
    | eval Anchor_Start_Epoch=coalesce(strptime(Anchor_Start,"%Y-%m-%d %H:%M:%S")

    | rename COMMENT as "Calculate the job frequency in seconds and figure out how many steps we are probably away from the anchor time"        
    | eval Job_Frequency_Sec=if(isnum(Job_Frequency_min),60*tonumber(Job_Frequency_min),600) 
    | eval steps_off=round( (Actual_Start_Epoch-Anchor_Start_Epoch) / Job_Frequency_Sec,0)
    | eval secs_off=Job_Frequency_Sec*steps_off

    | rename COMMENT as "Calculate the Expected Start Time given the timestep"        
    | eval Expected_Start_Epoch=Anchor_Start_Epoch+secs_off

    | rename COMMENT as "Calculate the Expected End Time given the timestep"        
    | eval Expected_End=CurrentDate." ".Expected_End_Time
    | eval Expected_End_Epoch=strptime(Expected_End,"%Y-%m-%d %H:%M:%S")+secs_off
    | eval Expected_End=strftime(Expected_End_Epoch,"%Y-%m-%d %H:%M:%S")

    | eval Status = case(
        isnull(Actual_End_Epoch),"Not Complete",
        Actual_End_Epoch <= Expected_End_Epoch +300, "OK", 
        true(), "Over Run")

    | table CronJobName Job_Frequency_min Expected_Start Expected_End Actual_Start Actual_End Status
    | eval rectype="detailMultiples"
    ]

| rename COMMENT as "Now we get rid of the pre-processing details"
| where rectype!="detail1"

| rename COMMENT as "and sort either by name or by start or end date (see note 1)"
| sort 0 CronJobName 

NOTES -

1) The dates are now presented in ISO format, so that you can sort by date/time without converting back to epoch. If you get your techs and users used to looking at yyyy-mm-dd, then it will save you huge headaches in the long run, because ISO format cannot be misinterpreted. (No one uses yyyy-dd-mm.)

2) It should be easy enough to modify the second appendpipe based on different assumptions.

View solution in original post

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

where Job_Frequency_min is 10 min, what is the value of ExpectedStart and how many records are there? One, or 144?

0 Karma
Register for .conf21 Now! Go Vegas or Go Virtual!

How will you .conf21? You decide! Go in-person in Las Vegas, 10/18-10/21, or go online with .conf21 Virtual, 10/19-10/20.