Splunk Search

How to display SLA status based on SLA times defined in lookup file?

bud9
New Member

Lookup file sla_jobs.csv:

Business    AppName  RunDays  BatchStartJob AvgBatchStartTime  BatchEndJob  SLA_time SameDayFlag
Retail      App1     Monday    APP1T001       21:30             APP1T099      03:00    1
Retail      App1     Tuesday   APP1T001       21:30             APP1T099      03:00    1
Medicine    App2     Saturday  APP2T002       20:00             APP2T099      23:00    0
Medicine    App2     Sunday    APP2T002       20:00             APP2T099      23:00    0

Lookup file contains the SLA times for each appname, business and the respective start, end jobs for each of the batch and the day of week they run. SameDayFlag indicates if the SLA_time is for same day or if it can fall to next day. For Eg, for App1 the batch starts usually at 21:30 and runs only on Mon and Tues, but the SLA is till next day 03:00. So for Tuesday's batch the SLA is till Wed 03:00.

Events in the index :

JobName   JobStartTime    JobEndTime     JobStatus
APP1T099                  20190130 0130   COMPLETED
APP1T099  20190130 0100                   STARTED
APP1T001                  20190129 2230   COMPLETED
APP1T001  20190129 2200                   STARTED

As and when the jobs start and complete, the events show with JobStatus as 'STARTED', 'COMPLETED' or 'FAILED'.

Output: Assuming current time is Jan-30-2019(Wednesday) 03:00, for each appname the SLA_Status can be 'Met' or 'Missed' as shown below:

Business  AppName  SLA_Status
Retail    App1     Met
Medicine  App2     Doesnt Run

For the sameday as above, if the batch starts beyond the 'AvgBatchStartTime' field from lookup, the SLA_Status needs to show up as 'Running Late'. Eg.: If APP1T001 starts after 21:30, then the SLA status must show '

Business  AppName  SLA_Status
Retail    App1     Running Late
Medicine  App2     Doesnt Run

Since the question is too long, I will reply to this post the search query I was working on.Please feel free to modify the lookup file if needed. Any help is appreciated. Thanks in advance.

0 Karma

woodcock
Esteemed Legend

OK, this is NOT a complete answer; it is maybe 92%. In any case, it should give you all of the examples and tools that you will need to walk it to the finish line. Be aware that the | append ... lines would be better as a | lookup sla_jobs.csv JobName AS BatchStartJob ... but this was easier for me to work with:

|makeresults | eval raw="JobName=APP1T099,JobEndTime=20190130T0130,JobStatus=COMPLETED JobName=APP1T099,JobStartTime=20190130T0100,JobStatus=STARTED JobName=APP1T001,JobEndTime=20190129T2230,JobStatus=COMPLETED JobName=APP1T001,JobStartTime=20190129T2200,JobStatus=STARTED"
| rename COMMENT AS "As and when the jobs start and complete, the events show with JobStatus as 'STARTED', 'COMPLETED' or 'FAILED'."
| makemv raw
| mvexpand raw
| rename raw AS _raw
| kv
| eval which="EVENTS"
| foreach Job*Time [ rex field=<<FIELD>> mode=sed "s/T/ /" | eval _time=coalesce(_time, strptime(<<FIELD>>, "%Y%m%d %H%M")) ]
| sort 0 - _time
| eval RunDay = strftime(_time, "%A")
| stats first(*) AS * BY JobName
| eval Run_time = strptime(JobEndTime, "%Y%m%d %H%M") - strptime(JobStartTime, "%Y%m%d %H%M")
| append [|makeresults | eval raw="Business=Retail,AppName=App1,RunDays=Monday,BatchStartJob=APP1T001,AvgBatchStartTime=21:30,BatchEndJob=APP1T099,SLA_time=03:00,SameDayFlag=1 Business=Retail,AppName=App1,RunDays=Tuesday,BatchStartJob=APP1T001,AvgBatchStartTime=21:30,BatchEndJob=APP1T099,SLA_time=03:00,SameDayFlag=1 Business=Medicine,AppName=App2,RunDays=Saturday,BatchStartJob=APP2T002,AvgBatchStartTime=20:00,BatchEndJob=APP2T099,SLA_time=23:00,SameDayFlag=0 Business=Medicine,AppName=App2,RunDays=Sunday,BatchStartJob=APP2T002,AvgBatchStartTime=20:00,BatchEndJob=APP2T099,SLA_time=23:00,SameDayFlag=0"
| rename COMMENT AS "Lookup file file sla_jobs.csv contains the SLA times for each appname, business and the respective start, end jobs
for each of the batch and the day of week they run.
SameDayFlag indicates if the SLA_time is for same day or if it can fall to next day.
For example, for App1 the batch starts usually at 21:30 and runs only on Mon and Tues,
but the SLA is till next day 03:00. So for Tuesday's batch the SLA is till Wed 03:00."
| makemv raw
| mvexpand raw
| rename raw AS _raw
| rex field=SLA_time "^(?<hours>\d+):(?<minutes>\d+)$"
| eval SLA_time = 60 * (hours + (60 * minutes))
| eval which="LOOKUP"]
| kv
| fields - _raw _time
| rex field=SLA_time "^(?<hours>\d+):(?<minutes>\d+)$"
| eval SLA_time = coalesce(60 * (minutes + (60 * hours)), SLA_time)
| fields - hours minutes
| multireport
[ eval JOINER = coalesce(JobName, BatchStartJob) | stats list(*) AS * BY JOINER | eval which="BatchStartJob" ]
[ eval JOINER = coalesce(JobName, BatchEndJob) | stats list(*) AS * BY JOINER | eval which="BatchEndJob" ]
| where isnotnull(JobStatus) AND isnotnull(AppName)
| eval zeroBasePos=mvfind(RunDays, RunDay)
| foreach A* B* RunDays SLA_time SameDayFlag [ eval <<FIELD>> = mvindex(<<FIELD>>, zeroBasePos) ]
| rex field=JobStartTime mode=sed "s/^\d+\s+//"
| rex field=AvgBatchStartTime mode=sed "s/://"
| eval Status = case(
   (JobStartTime > AvgBatchStartTime) AND (Run_time > SLA_time),  "Running Late AND Not Met",
   (JobStartTime > AvgBatchStartTime) AND (Run_time <= SLA_time), "Running Late BUT Met",
   Run_time > SLA_time,              "Not Met",
   true(),                           "Met")
0 Karma

bud9
New Member

PFB the query i was working on:
I started with a join query so that i could search for Jobs only present in my lookup file, but was having problems with it so dropped it:

index="test"
| lookup sla_jobs.csv BatchEndJob as JobName OUTPUTNEW Business as EndBusiness,
                             AppName as EndAppName,
                             RunDays as EndRunDays,
                             AvgBatchStartTime as EndAvgBatchStartTime,
                             BatchEndJob as EndBatchEndJob,
                             SLA_time as EndSLA_time,
                             SameDayFlag as EndSameDayFlag
| eval JobEndTime_HH=tonumber(strftime(strptime(JobEndTime,"Y%m%d %H%M"),"%H"))*60,
       JobEndTime_MM=tonumber(strftime(strptime(JobEndTime,"Y%m%d %H%M"),"%M"))
       JobEndTime_Total = JobEndTime_HH + JobEndTime_MM
|eval SLA_HH=tonumber(strftime(strptime(SLA,"%H:%M"),"%H"))*60,
          SLA_MM=tonumber(strftime(strptime(SLA,"%H:%M"),"%M")),
          SLA_Total=SLA_HH+SLA_MM
| eval SLA_Status = if(EndSameDayFlag==0 AND JobStatus=="COMPLETE" AND JobEndTime_Total < SLA_Total,"Met","Missed")
0 Karma

sdchakraborty
Contributor

Hi,

I did some minor changes to the lookup, current structure is below,

Business    AppName RunDays JobName AvgBatchStartTime   JobType SLA_time    SameDayFlag
Retail  App1    Monday  APP1T001    21:30   BatchStart  03:00   1
Retail  App1    Monday  APP1T099    21:30   BatchEnd    03:00   1
Retail  App1    Tuesday APP1T001    21:30   BatchStart  03:00   1
Retail  App1    Tuesday APP1T099    21:30   BatchEnd    03:00   1
Medicine    App2    Saturday    APP2T002    20:00   BatchStart  23:00   0
Medicine    App2    Saturday    APP2T099    20:00   BatchEnd    23:00   0
Medicine    App2    Sunday  APP2T002    20:00   BatchStart  23:00   0
Medicine    App2    Sunday  APP2T099    20:00   BatchEnd    23:00   0

Also I Indexed your data in main index. Below is the sample query. I did it only for SLA status "Running Late". Other status you can code similarly.

index="main"
| table JobName,JobStartTime,JobEndTime,JobStatus
| lookup sla_jobs JobName as JobName OUTPUTNEW Business,AppName,RunDays,JobType,AvgBatchStartTime,SLA_time,SameDayFlag
| eval AppName=mvdedup(AppName), AvgBatchStartTime=mvdedup(AvgBatchStartTime), Business=mvdedup(Business),JobType=mvdedup(JobType),SLA_time=mvdedup(SLA_time),SameDayFlag=mvdedup(SameDayFlag)
| rex field=JobStartTime "(?<job_start_day>\d{8})\s+(?<job_start_hr>\d{2})(?<job_start_mi>\d{2})"
| rex field=JobEndTime "(?<job_end_day>\d{8})\s+(?<job_end_hr>\d{2})(?<job_end_mi>\d{2})"
| eval job_start_hr_mi = job_start_hr + ":" + job_start_mi, job_start_hr_mi_epoc = strptime(job_start_hr_mi,"%H:%M"),job_end_hr_mi = job_end_hr + ":" + job_end_mi, job_end_hr_mi_epoc = strptime(job_end_hr_mi,"%H:%M"), SLA_time_epoc = strptime(SLA_time,"%H:%M"), batch_start_time_epoc = strptime(AvgBatchStartTime,"%H:%M")
| eval job_start_weekday=strftime(strptime(job_start_day,"%Y%m%d"),"%A"), job_end_weekday=strftime(strptime(job_end_day,"%Y%m%d"),"%A")
| eval SLA_status = case(job_start_hr_mi_epoc > batch_start_time_epoc,"Running Late")
| table  Business,AppName,job_start_weekday,SLA_status
| search job_start_weekday = "*"
| rename job_start_weekday as "Batch Start Day"

Sid

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...