Splunk Search

Alternate to EventStats | Using EventStats for a longer duration results in data loss

nishantberiwal
New Member

Hi Team,

Is there an alternative to count all the events to 'eventstats' using it results in data loss if exceed the duration only works fine until 6 or 7 hours so.

My search is basis 'transactionStatus = all', it works fine if I specify it as 'Failed' but using fail it doesnt calculate percentage by Total...but only by Total failed events or data count.

My BaseSearch....  **transactionStatus=***  earliest=-**1d**
| eventstats count as "Total"
| rex field=_raw "Error response from Sabre pad for getDetails : ErrorType : BusinessLogic Code : ERR.SWS.HOST.ERROR_IN_RESPONSE, Message\s*:\s*(?.+)" 
| rex field=vldCxrs "carrierCode=(?..)" 
| search carrierCode=*
| stats count(eval(transactionStatus="FAILED")) as "Failed_Count" by Total, ErrorMsg, tpid, carrierCode
| eval Error_Percent=round((Failed_Count/Total)*100,2)
| fields tpid, carrierCode, ErrorMsg, Error_Percent, Failed_Count, Total
| sort -Failed_Count

Regards
Nishant

0 Karma

nishantberiwal
New Member

The Join query on 3 days not only returned results but also took about 430 KB compared to single query which is not able to handle 3days plus consumes 450 MB

0 Karma

nishantberiwal
New Member

Lastly - This query could not keep up when did 3 days, but it was a great alternative.....The only alternate way I found to make this work is by joining 3 different queries where 'Total' is calculated separately and 'ErrorMsg' is counted separately.

0 Karma

nishantberiwal
New Member

Can you tell what does this "first(Total) as Total" do.

its also working if I simply do 'by Total'

0 Karma

nishantberiwal
New Member

Actually ignore me the first solution worked, there need to be 'vldcxrs' also in the fields line

0 Karma

nishantberiwal
New Member

Thanks, I am using the first solution where when adding this line

first(Total) as Total by ErrorMsg, tpid, carrierCode

It resulting in no results to be displayed for some reason.

0 Karma

DalJeanis
SplunkTrust
SplunkTrust

Okay, I marked your code, but a couple of html-like features were stripped out, so be skeptical of this code and validate it carefully.

I've made two changes here. First, I've added a line immediately after the first pipe that limits the fields extracted to the ones you need. Second, I've moved "Total" in the stats command to not being a "by" field. Each of those should get you some speed.

 My BaseSearch....  **transactionStatus=***  earliest=-**1d**
 | fields  _raw  transactionStatus ErrorMsg tpid 
 | eventstats count as "Total"
 | rex field=_raw "Error response from Sabre pad for getDetails : ErrorType : BusinessLogic Code : ERR.SWS.HOST.ERROR_IN_RESPONSE, Message\s*:\s*(?<vldCxrs>.+)" 
 | rex field=vldCxrs "carrierCode=(?<carrierCode>..)" 
 | search carrierCode=*
 | stats count(eval(transactionStatus="FAILED")) as "Failed_Count" first(Total) as Total by ErrorMsg, tpid, carrierCode
 | eval Error_Percent=round((Failed_Count/Total)*100,2)
 | fields tpid, carrierCode, ErrorMsg, Error_Percent, Failed_Count, Total
 | sort - Failed_Count

Here's another way that might work faster, depending on data. Here, instead of calculating Total with an early eventstats, we retain records without a carrierCode and then calculate Total with a late one, and THEN drop the records with no carrierCode.

   My BaseSearch....  **transactionStatus=***  earliest=-**1d**
   | fields  _raw  transactionStatus ErrorMsg tpid 
   | rex field=_raw "Error response from Sabre pad for getDetails : ErrorType : BusinessLogic Code : ERR.SWS.HOST.ERROR_IN_RESPONSE, Message\s*:\s*(?<vldCxrs>.+)" 
   | rex field=vldCxrs "carrierCode=(?<carrierCode>..)" 
   | eval carrierCode=coalesce(carrierCode, "NONE")
   | stats count(eval(transactionStatus="FAILED")) as "Failed_Count" count as allCount by ErrorMsg, tpid, carrierCode
   | eventstats sum(allcount) as Total 
   | where carrierCode!="NONE"
   | eval Error_Percent=round((Failed_Count/Total)*100,2)
   | fields tpid, carrierCode, ErrorMsg, Error_Percent, Failed_Count, allCount, Total
   | sort - Failed_Count
0 Karma