Adding a field for the number of minutes the failure rate is above a certain threshold

joe06031990
Communicator

Hi,

I have the search below, which works out successes, failures, success_rate, failure_rate and total. I would also like to add a field for the number of minutes during which the failure rate is above a certain threshold (for example, a 20% failure rate), but I am unsure how to do that:

index="main" source="C:\\inetpub\\logs\\LogFiles\\*"
| eval Time = (time_taken/1000)
| eval status=case(Time>20,"TimeOut",(sc_status!=200),"HTTP_Error",true(),"Success")
| stats sum(Time) as sum_sec,max(Time) as max_sec,count by status,sc_status,host,_time
| chart sum(count) by host,status
| addcoltotals labelfield=host label="(TOTAL)"
| addtotals fieldname=total
| eval successes=(total-(TimeOut+HTTP_Error))
| eval failures=(TimeOut+HTTP_Error)
| eval success_rate=round((successes/total)*100,2)
| eval failure_rate=round((failures/total)*100,2)
| table successes failures success_rate failure_rate total

Any help would be greatly appreciated.

Thanks,

Joe

bowesmana
SplunkTrust

@joe06031990 

Without your data it's a little tricky to produce an example, but the search below roughly simulates what you are doing. Hopefully you can map this technique onto your data.

Here I am just using the _audit index and creating a random Time and status.

Note that I am setting 'status' to a numerical value and then using count(eval(status=X)) in the stats statement to get the total for each status type, which is basically what your chart statement is doing.

This also uses bin _time to give the stats statement a 1-minute granularity. It then calculates the failure rate per minute and finally counts the minutes in which the failure rate is greater than the threshold, which is 40% in this example.

index=_audit
| eval Time = random() % 30
| eval r=random() % 100
| eval h=random() % 4, host=host."_".h
| eval sc_status=case(Time>20,0,r<90,200,1==1,404)
| eval status=case(Time>20,1,(sc_status!=200),2,true(),0) 
| bin _time span=1m 
| stats count(eval(status=1)) as Timeout count(eval(status=2)) as HttpError count(eval(status=0)) as Success sum(Time) as sum_sec,max(Time) as max_sec,count by host,_time
| eval failures=Timeout+HttpError
| eval total=failures+Success
| eval failure_rate=round((failures/total)*100,2)
| stats sum(eval(if(failure_rate>40,1,0))) as AboveThreshold by host
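
As a rough, untested sketch of how the same technique might map onto the search in the question: it assumes the time_taken, sc_status and host fields behave as they do in the original search, uses the 20% threshold mentioned there, and introduces minutes_above_threshold as an illustrative field name only.

```spl
index="main" source="C:\\inetpub\\logs\\LogFiles\\*"
| eval Time = time_taken/1000
| eval status=case(Time>20,"TimeOut",(sc_status!=200),"HTTP_Error",true(),"Success")
| bin _time span=1m
| stats count(eval(status!="Success")) as failures count as total by host,_time
| eval failure_rate=round((failures/total)*100,2)
| stats sum(eval(if(failure_rate>20,1,0))) as minutes_above_threshold by host
```

After bin and the first stats, each row is one host in one minute, so summing the rows above the threshold gives a count of minutes. A minute with no events for a host produces no row, so it is counted as neither above nor below the threshold.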

Hope this helps you get where you are going.
