Adding a field for the amount of minutes the failure rate is above a certain threshold

joe06031990
Communicator

Hi,

 

I have the search below, which works out the successes, failures, success_rate, failure_rate and total. I would like to add a field showing the number of minutes the failure rate is above a certain threshold (for example, a 20% failure rate), but I am unsure how to do that:

index="main" source="C:\\inetpub\\logs\\LogFiles\\*"
| eval Time = (time_taken/1000)
| eval status=case(Time>20,"TimeOut",(sc_status!=200),"HTTP_Error",true(),"Success")
| stats sum(Time) as sum_sec,max(Time) as max_sec,count by status,sc_status,host,_time
| chart sum(count) by host,status
| addcoltotals labelfield=host label="(TOTAL)"
| addtotals fieldname=total
| eval successes=(total-(TimeOut+HTTP_Error))
| eval failures=(TimeOut+HTTP_Error)
| eval success_rate=round((successes/total)*100,2)
| eval failure_rate=round((failures/total)*100,2)
| table successes failures success_rate failure_rate total

 

Any help would be greatly appreciated.

 

Thanks,

 

Joe


bowesmana
SplunkTrust

@joe06031990 

Without your data it's a little tricky to produce an exact example, but the search below roughly simulates what you are doing. Hopefully you can map this technique onto your data.

Here I am just using the _audit index and creating a random Time and status.

Note that I am setting 'status' to a numeric value and then using count(eval(status=X)) in the stats command to get the total for each status type, which is basically doing what your chart command is doing.

This also uses bin _time to bucket events into 1-minute spans for the stats command, then calculates the failure rate per minute, and finally counts the minutes where the failure rate is greater than the threshold (40% in this example).

index=_audit
| eval Time = random() % 30
| eval r=random() % 100
| eval h=random() % 4, host=host."_".h
| eval sc_status=case(Time>20,0,r<90,200,1==1,404)
| eval status=case(Time>20,1,(sc_status!=200),2,true(),0) 
| bin _time span=1m 
| stats count(eval(status=1)) as Timeout count(eval(status=2)) as HttpError count(eval(status=0)) as Success sum(Time) as sum_sec,max(Time) as max_sec,count by host,_time
| eval failures=Timeout+HttpError
| eval total=failures+Success
| eval failure_rate=round((failures/total)*100,2)
| stats sum(eval(if(failure_rate>40,1,0))) as AboveThreshold by host
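
Mapped back onto the search in the question, the same technique might look like the sketch below. It is untested against the real data and assumes the field names from the original post (time_taken, sc_status, host) and the 20% threshold mentioned there. The output field name minutes_above_threshold is just an illustrative choice.

```spl
index="main" source="C:\\inetpub\\logs\\LogFiles\\*"
| eval Time = time_taken/1000
| eval status=case(Time>20,"TimeOut",(sc_status!=200),"HTTP_Error",true(),"Success")
| bin _time span=1m
| stats count(eval(status="TimeOut")) as TimeOut count(eval(status="HTTP_Error")) as HTTP_Error count(eval(status="Success")) as Success by host,_time
| eval failures=TimeOut+HTTP_Error
| eval total=failures+Success
| eval failure_rate=round((failures/total)*100,2)
| stats sum(Success) as successes sum(failures) as failures sum(total) as total sum(eval(if(failure_rate>20,1,0))) as minutes_above_threshold by host
| eval success_rate=round((successes/total)*100,2)
| eval failure_rate=round((failures/total)*100,2)
| table host successes failures success_rate failure_rate total minutes_above_threshold
```

The first stats produces one row per host per minute, so the second stats can both roll the counts up to an overall rate per host and count the minutes whose per-minute failure_rate exceeded the threshold. Minutes with no events for a host produce no row at all, so they are not counted either way.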

 Hope this helps you to get where you are going.

 
