Splunk Search

How can I get a timeline of the percentage of a particular error code among a total of logs coming into Splunk?

Explorer

Hi,

I'm trying to get a timeline of the percentage of a particular error code among the total number of logs. Based on some of the questions around the topic, I've essentially been trying two approaches. The first one I'm not even able to get working:

index="logger" "Log Info" 
| bin _time span=1h
| eval _raw=replace(_raw,"\\\\\"","\"")
| rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
| eval json= raw_request
| spath input=json output=country_code path=customer.billingAddress.countryCode
| spath input=json output=reason_code path=history{}.reasonCode
| eval error=if(like(reason_code,"%X201%"),1,0)
| stats count as total by _time, country_code
| eventstats sum(error) as errors by _time
| eval error_percentage=100*errors/total 
| timechart span=1h max(error_percentage) by country_code

errors is not even populated (edit: placing the eval error... after the stats count... populates it, but it shows 0 in all cases).
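Edit 2: I suspect the problem is that stats count as total by _time, country_code keeps only its aggregates and group-by fields, so error no longer exists by the time eventstats sum(error) runs. Folding the sum into the stats itself might fix it (a sketch I haven't verified yet, replacing the last four lines above):

 | stats count as total sum(error) as errors by _time, country_code
 | eval error_percentage=100*errors/total
 | timechart span=1h max(error_percentage) by country_code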

The second approach looks like it works:

index="logger" "Log Info"
| eval _raw=replace(_raw,"\\\\\"","\"")
| rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
| eval json= raw_request
| spath input=json output=country_code path=customer.billingAddress.countryCode
| spath input=json output=reason_code path=history{}.reasonCode
| eval error=if(like(reason_code,"%X201%"),1,0)
| streamstats count as total sum(error) as errors by country_code
| timechart span=1h max(eval(100*errors/total)) as error_percentage by country_code usenull=f useother=f

But if I change max to avg, I don't get the same result. They should match if I had one entry per bucket and country_code, as was my intention.

If I could get some direction on this I would really appreciate it, as well as an easy way to avoid displaying all the country codes and only show the ones that have had an error at some point in the displayed timeline.

Thanks


SplunkTrust

@Esperteyu,

Based on your searches, you are considering only the error code "%X201%". Also, the percentage here is based on the errors, not on the total count per country code. The search below should give us a list of country codes and reason codes where an error exists:

 index="logger" "Log Info"
 | eval _raw=replace(_raw,"\\\\\"","\"")
 | rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
 | eval json= raw_request
 | spath input=json output=country_code path=customer.billingAddress.countryCode
 | spath input=json output=reason_code path=history{}.reasonCode
 | eval error=if(like(reason_code,"%X201%"),1,0)
 | where error==1

Now we plot the count of errors in a timechart

 index="logger" "Log Info"
 | eval _raw=replace(_raw,"\\\\\"","\"")
 | rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
 | eval json= raw_request
 | spath input=json output=country_code path=customer.billingAddress.countryCode
 | spath input=json output=reason_code path=history{}.reasonCode
 | eval error=if(like(reason_code,"%X201%"),1,0)
 | where error==1
 | timechart span=1h count by country_code

Now we calculate the percentage for each country code:

 index="logger" "Log Info"
 | eval _raw=replace(_raw,"\\\\\"","\"")
 | rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
 | eval json= raw_request
 | spath input=json output=country_code path=customer.billingAddress.countryCode
 | spath input=json output=reason_code path=history{}.reasonCode
 | eval error=if(like(reason_code,"%X201%"),1,0)
 | where error==1
 | timechart span=1h count by country_code
 | addtotals row=true fieldname=_Total
 | foreach * [eval <<FIELD>>=<<FIELD>> * 100 / _Total]
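If you want to sanity-check the addtotals + foreach step in isolation, here is a self-contained toy (makeresults fabricates two rows of counts; nothing comes from your real index). Note that _Total starts with an underscore precisely so that the * wildcard in foreach skips it:

 | makeresults count=2
 | streamstats count as row
 | eval US=if(row=1,2,3), UK=if(row=1,4,5), AU=if(row=1,6,7)
 | fields - row
 | addtotals row=true fieldname=_Total
 | foreach * [eval <<FIELD>>=round(<<FIELD>> * 100 / _Total, 1)]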

Let us know if it works.

Explorer

Thanks @renjith.nair. I'm not able to follow the solution, especially what the last line is doing with the <<FIELD>> equality, but in any case it's not giving a reasonable result. Maybe I didn't explain my question properly: I'm after the number of X201 / total, grouped by countrycode (the removal of the ones with no errors would be a nice-to-have). In the streamstats case, I can't understand why max and avg differ. In the first option I'm not sure I follow you; I thought eventstats would append an integer to each of the events, and then the sum could be grouped by countrycode in the same way as the total.


SplunkTrust

@Esperteyu, ok let me try to explain
- In the first two SPLs, we calculated the count of errors by country_code in a 1-hour time window. Could you please confirm whether you are able to get the expected values there?

sample result

    _time                 US  UK  AU
    2018-10-14 10:15:00   2   4   6
    2018-10-14 11:15:00   3   5   7

Using addtotals, we add up the counts of all countries for that time window, which is 12 for the first row and 15 for the second.
Then, using foreach, we traverse the columns and divide each count by the total of that row, which gives you a % for that time window, i.e. 2*100/12, 4*100/12, 6*100/12, etc.
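On your streamstats question: streamstats emits a running value on every event, so a single 1-hour bucket contains many intermediate ratios for a country. max picks the largest of those intermediate ratios, while avg averages all of them, so they generally differ. A toy reproduction with fabricated events (all landing in the same bucket):

 | makeresults count=4
 | streamstats count as n
 | eval error=if(n=1,1,0)
 | streamstats count as total sum(error) as errors
 | eval running_pct=round(100*errors/total,1)
 | stats max(running_pct) as max_pct avg(running_pct) as avg_pct

Here the running percentages are 100, 50, 33.3, 25, so max gives 100 while avg gives about 52.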

If this is not what you are looking for, kindly provide some sample data and then the expected output.

Thanks!


Explorer

I tried to answer this comment but it has to be reviewed by a moderator (I think it might be because of the "tables" I put in). A shorter response with no tables: what I am trying to get is the ratio error/total on a country basis, i.e. errors per country / total per country in a given period of time (I will need to monitor such a running percentage for spikes).

I was playing around with this query now; it's simpler, works the same whether I use max or avg, and I think it might do the trick, although I need to double-check with enough data:

index="logger" "Log Info"
 | bin _time span=1h
 | eval _raw=replace(_raw,"\\\\\"","\"")
 | rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
 | eval json= raw_request
 | spath input=json output=country_code path=customer.billingAddress.countryCode
 | spath input=json output=reason_code path=history{}.reasonCode
 | eval error=if(like(reason_code,"%X201%"),1,0) 
 | stats count as total sum(error) as errors by _time, country_code
 | eval error_percentage=100*errors/total 
 | timechart span=1h max(error_percentage) by country_code
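For the nice-to-have of hiding countries that never get an error, I think an eventstats/where pair just before the timechart could drop them (untested sketch, replacing the last three lines above):

 | stats count as total sum(error) as errors by _time, country_code
 | eventstats sum(errors) as total_errors by country_code
 | where total_errors > 0
 | eval error_percentage=100*errors/total
 | timechart span=1h max(error_percentage) by country_code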

Sorry for not explaining it properly, and thanks for your patience @renjith.nair.


SplunkTrust

Alright @Esperteyu. In the above SPL, don't you want total and errors interchanged? i.e. count as errors, sum(error) as total.


Explorer

@renjith.nair you make me doubt now, but my understanding is that the count brings in all the rows, while the sum(error) brings in only the ones which actually have the "X201" substring, so I think it should be like this.


SplunkTrust

Ah, sorry for that. I was still thinking about the rows with errors only (1), but you have both 0 and 1 in error.


Explorer

I see, my bad for not explaining it properly then. I really appreciate your patience @renjith.nair.
What I am trying to get is the ratio errors/total per country, something like this (as returned by the streamstats version):

_time ARE AUS AUT BGR BRA USA
t1 0 0.17 0 0.28
t2 0.18 0 0.28
t3 0 0.18 0 0.28
t4 0 0.19 0 0 0 0.28
t5 0 0.21 0 0 0.28

I guess another acceptable format would be

time  countrycode  total  errors  percentage
t1    UK           100    20      20
t1    US           50     5       10
t2    UK           10     3       30
t2    US           40     10      25

With a view to displaying them in the timeline to see how that percentage averages and spikes (I think that brings in the concept of a running percentage) on a country basis.
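The second format is essentially the stats output before any timechart; dropping the final timechart from my earlier query and rounding should give exactly those columns (sketch):

 | stats count as total sum(error) as errors by _time, country_code
 | eval percentage=round(100*errors/total,1)
 | table _time, country_code, total, errors, percentage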
