Splunk Search

How to extract the start and end date/time from my sample data, then calculate the duration of a job?

athorat
Communicator

Hi

I have to extract start date, end date, and the duration of a job based on the following two events:

Started:
2016-01-06 07:00:10,314  INFO ActionStartXCommand:543 - SERVER[redacted.redacted.redacted.com] USER[hdfs] GROUP[-] TOKEN[] APP[BOB_CAS_WF_IM_DataStore] JOB[0018443-150810162217214-oozie-oozi-W] ACTION[0018443-150810162217214-oozie-oozi-W@:start:] [***0018443-150810162217214-oozie-oozi-W@:start:***]Action updated in DB!


End:
2016-01-06 08:10:50,301  INFO ActionStartXCommand:543 - SERVER[redacted.redacted.redacted.com] USER[hdfs] GROUP[-] TOKEN[] APP[BOB_CAS_WF_IM_DataStore] JOB[0018443-150810162217214-oozie-oozi-W] ACTION[0018443-150810162217214-oozie-oozi-W@end] [***0018443-150810162217214-oozie-oozi-W@end***]Action updated in DB!

How do I extract the date and time and then calculate the difference.

0 Karma
1 Solution

sundareshr
Legend

Try something like this (untested code, assuming there are only two events per JOB)

... | rex "JOB\[?(?<jobid>[\d-]+)-" | streamstats window=2 current=f range(_time) as diff by jobid | table jobid diff | search diff=*

View solution in original post

jkat54
SplunkTrust
SplunkTrust

In your props.conf:

[sourcetypeName]
SHOULD_LINEMERGE = false
BREAK_ONLY_BEFORE = \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\,\d{3}
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
MAX_TIMESTAMP_LOOKAHEAD = 24
EXTRACT-myfields = (?<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3})\s(?<log_level>\w+)\s(?<unknownfield>\w+):(?<number>\d+)\s-\sSERVER\[(?<servername>\S+)] USER\[(?<username>\S+)]\sGROUP\[(?<groupname>\S+)]\sTOKEN\[]\sAPP\[(?<appname>\S+)]\sJOB\[(?<jobname>\S+)]\sACTION\[\S+@(?<action>.*)]\s\[

Your search:

 index="foo" servername="server.acme.com" action=:start: 
| stats first(_time) AS startTime BY servername 
| append [
   search index="foo" sourcetype="testing" servername="server.acme.com" action=end 
   | stats first(_time) AS endTime BY servername 
] | chart eval(sum(endTime)-sum(startTime)) AS diffTimeInSecs by servername
0 Karma

athorat
Communicator

thanks for the reply @jka154 @sundareshr

I am using the following query to extract job name and Action=:start: or Action=@end

index=aap_prod sourcetype="HDP:PROD:OOZIE"  ("@:start:" OR "@end")  | rex "TOKEN\[\] APP\[(?[^\]]*)"    | rex "ACTION\[[^\@]*(?[^\d\]]*)"

If I append rex "JOB\[?(?[\d-]+)-" | streamstats window=2 current=f range(_time) as diff by jobid | table jobid diff

I get the table but there are jobid's with the same id with value = 0 and then there is one value with the actual duration.
How do we filter only the values with actual duration.
tried using | stats values(jobid) which does not work.

Thanks for looking into this.

Anil.

0 Karma

jkat54
SplunkTrust
SplunkTrust
index=aap_prod sourcetype="HDP:PROD:OOZIE" ("@:start:" OR "@end") 
 | rex field=_raw "(?<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3})\s(?<log_level>\w+)\s(?<unknownfield>\w+):(?<number>\d+)\s-\sSERVER\[(?<servername>\S+)] USER\[(?<username>\S+)]\sGROUP\[(?<groupname>\S+)]\sTOKEN\[]\sAPP\[(?<appname>\S+)]\sJOB\[(?<jobname>\S+)]\sACTION\[\S+@(?<action>.*)]\s\["
  | search action=:start:
  | stats first(_time) AS startTime by servername
  | append [ 
   search index=aap_prod sourcetype="HDP:PROD:OOZIE" ("@:start:" OR "@end") 
   | rex field=_raw "(?<date>\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3})\s(?<log_level>\w+)\s(?<unknownfield>\w+):(?<number>\d+)\s-\sSERVER\[(?<servername>\S+)] USER\[(?<username>\S+)]\sGROUP\[(?<groupname>\S+)]\sTOKEN\[]\sAPP\[(?<appname>\S+)]\sJOB\[(?<jobname>\S+)]\sACTION\[\S+@(?<action>.*)]\s\["
   | stats first(_time) AS endTime by servername
  ]
  | chart eval(sum(endTime)-sum(startTime)) AS diffTimeInSecs by servername
0 Karma

sundareshr
Legend

Updated my answer above to show how you can filter out where diff is not calculated.

0 Karma

athorat
Communicator

@sundareshr

I tired | search diff!=0.0 and it worked.
How can I add the Start and the end dates, can you throw some light on it ?

Thanks,

0 Karma

sundareshr
Legend

Try something like this (untested code, assuming there are only two events per JOB)

... | rex "JOB\[?(?<jobid>[\d-]+)-" | streamstats window=2 current=f range(_time) as diff by jobid | table jobid diff | search diff=*

jkat54
SplunkTrust
SplunkTrust

What happens if there are more than 2 events per job?

I like how you simplified by jobid and used streamstats... pretty cool.

0 Karma

athorat
Communicator

@sundareshr

I tired | search diff!=0.0 and it worked.
How can I add the Start and the end dates, can you throw some light on it ?

Thanks,

0 Karma

jkat54
SplunkTrust
SplunkTrust

My search works fine with the sample data you provided. It's not as efficient as sundareshr's search but i believe it handles more scenarios / possibilities. If you want to still see start/end, and diff, and servername...

Edit my chart command at the end of my search on the bottom of my reply

...| chart startTime endTime eval(sum(endTime)-sum(startTime)) AS diffTimeInSecs by servername

0 Karma

jkat54
SplunkTrust
SplunkTrust

only issue is the time's will be in epoch...

0 Karma

sundareshr
Legend

Use this

.... | streamstats current=f window=2 range(_time) as diff latest(_time) as end earliest(_time) as start| table jobid, diff, start, end | eval start=strftime(start, "%c") | search diff=*
0 Karma

athorat
Communicator

@sundareshr is there a way we can we convert the range-> diff to seconds or minutes?

0 Karma

sundareshr
Legend
| eval diff=tostring(diff, "duration")
0 Karma
Get Updates on the Splunk Community!

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

Take a look below to explore our upcoming Community Office Hours, Tech Talks, and Webinars this month. This ...

They're back! Join the SplunkTrust and MVP at .conf24

With our highly anticipated annual conference, .conf, comes the fez-wearers you can trust! The SplunkTrust, as ...

Enterprise Security Content Update (ESCU) | New Releases

Last month, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...