Adding a field for the number of minutes the failure rate is above a certain threshold




I have the search below, which works out the successes, failures, success_rate, failure_rate and total. However, I would like to add a field that works out the number of minutes the failure rate is above a certain threshold (for example, a 20% failure rate), but I am unsure how to do that:

index="main" source="C:\\inetpub\\logs\\LogFiles\\*"
| eval Time=(time_taken/1000)
| eval status=case(Time>20,"TimeOut",(sc_status!=200),"HTTP_Error",true(),"Success")
| stats sum(Time) as sum_sec, max(Time) as max_sec, count by status, sc_status, host, _time
| chart sum(count) by host, status
| addcoltotals labelfield=host label="(TOTAL)"
| addtotals fieldname=total
| eval successes=(total-(TimeOut+HTTP_Error))
| eval failures=(TimeOut+HTTP_Error)
| eval success_rate=round((successes/total)*100,2)
| eval failure_rate=round((failures/total)*100,2)
| table successes failures success_rate failure_rate total


Any help would be greatly appreciated.








Without your data it's a little tricky to produce an exact example, but the search below roughly simulates what you are doing. Hopefully you can map this technique onto your data.

Here I am just using the _audit index and creating a random Time and status.

Note that I am setting 'status' to a numerical value and then using count(eval(status=X)) in the stats statement to get the totals of each status type, which is basically doing what your chart statement is doing.

This also uses bin _time to bucket the events into 1-minute intervals for the stats statement, then calculates the failure rate per minute, and finally counts the minutes where the failure rate is greater than the threshold, in this example 40%.

index=_audit
| eval Time = random() % 30
| eval r=random() % 100
| eval h=random() % 4, host=host."_".h
| eval sc_status=case(Time>20,0,r<90,200,1==1,404)
| eval status=case(Time>20,1,(sc_status!=200),2,true(),0) 
| bin _time span=1m 
| stats count(eval(status=1)) as Timeout count(eval(status=2)) as HttpError count(eval(status=0)) as Success sum(Time) as sum_sec,max(Time) as max_sec,count by host,_time
| eval failures=Timeout+HttpError
| eval total=failures+Success
| eval failure_rate=round((failures/total)*100,2)
| stats sum(eval(if(failure_rate>40,1,0))) as AboveThreshold by host
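
Mapped onto the search in the question, the same technique might look roughly like the sketch below. It has not been run against your data, so treat it as a starting point: it assumes time_taken and sc_status are extracted as in your original search, it uses your 20% example as the threshold, and minutes_above_threshold is just an illustrative field name.

```spl
index="main" source="C:\\inetpub\\logs\\LogFiles\\*"
| eval Time=time_taken/1000
| eval status=case(Time>20,"TimeOut",sc_status!=200,"HTTP_Error",true(),"Success")
| bin _time span=1m
| stats count(eval(status="TimeOut")) as TimeOut count(eval(status="HTTP_Error")) as HTTP_Error count(eval(status="Success")) as successes by host, _time
| eval failures=TimeOut+HTTP_Error
| eval total=failures+successes
| eval failure_rate=round((failures/total)*100,2)
| stats sum(eval(if(failure_rate>20,1,0))) as minutes_above_threshold by host
```

Because the first stats only produces a row for each host and minute that actually has events, minutes with no traffic at all are not counted either way.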

Hope this helps you to get where you are going.

