Splunk Search

How can I get a timeline of the percentage of a particular error code among a total of logs coming into Splunk?

Explorer

Hi,

I'm trying to get a timeline of the percentage of a particular error code among the total number of logs. Based on some of the questions around the topic, I've essentially been trying two approaches. The first one I'm not even able to get working:

index="logger" "Log Info" 
| bin _time span=1h
| eval _raw=replace(_raw,"\\\\\"","\"")
| rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
| eval json= raw_request
| spath input=json output=country_code path=customer.billingAddress.countryCode
| spath input=json output=reason_code path=history{}.reasonCode
| eval error=if(like(reason_code,"%X201%"),1,0)
| stats count as total by _time, country_code
| eventstats sum(error) as errors by _time
| eval error_percentage=100*errors/total 
| timechart span=1h max(error_percentage) by country_code

errors is not even populated (edit: placing the eval error... after the stats count... populates it, but it shows 0 in all cases).
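Edit 2: I suspect the problem is that stats count as total by _time, country_code keeps only its aggregates and group-by fields, so error no longer exists by the time eventstats sum(error) runs. Folding the sum into the stats itself might fix it (a sketch I haven't verified yet, replacing the last four lines above):

 | stats count as total sum(error) as errors by _time, country_code
 | eval error_percentage=100*errors/total
 | timechart span=1h max(error_percentage) by country_code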

The second approach looks like it works:

index="logger" "Log Info"
| eval _raw=replace(_raw,"\\\\\"","\"")
| rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
| eval json= raw_request
| spath input=json output=country_code path=customer.billingAddress.countryCode
| spath input=json output=reason_code path=history{}.reasonCode
| eval error=if(like(reason_code,"%X201%"),1,0)
| streamstats count as total sum(error) as errors by country_code
| timechart span=1h max(eval(100*errors/total)) as error_percentage by country_code usenull=f useother=f

But if I change max to avg, I don't get the same result. They should match if I had one entry per bucket and country_code, as was my intention.

If I could get some direction on this I would really appreciate it, as well as an easy way to avoid displaying all the country codes and only show the ones that have had an error at some point in the displayed timeline.

Thanks


SplunkTrust

@Esperteyu,

Based on your searches, you are considering only the error code "%X201%". Also, the percentage here is based on the errors, not on the total count per country code. The search below should give us a list of country codes and reason codes where an error exists:

 index="logger" "Log Info"
 | eval _raw=replace(_raw,"\\\\\"","\"")
 | rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
 | eval json= raw_request
 | spath input=json output=country_code path=customer.billingAddress.countryCode
 | spath input=json output=reason_code path=history{}.reasonCode
 | eval error=if(like(reason_code,"%X201%"),1,0)
 | where error==1

Now we plot the count of errors in a timechart

 index="logger" "Log Info"
 | eval _raw=replace(_raw,"\\\\\"","\"")
 | rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
 | eval json= raw_request
 | spath input=json output=country_code path=customer.billingAddress.countryCode
 | spath input=json output=reason_code path=history{}.reasonCode
 | eval error=if(like(reason_code,"%X201%"),1,0)
 | where error==1
 | timechart span=1h count by country_code

Now we calculate the percentage for each country code:

 index="logger" "Log Info"
 | eval _raw=replace(_raw,"\\\\\"","\"")
 | rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
 | eval json= raw_request
 | spath input=json output=country_code path=customer.billingAddress.countryCode
 | spath input=json output=reason_code path=history{}.reasonCode
 | eval error=if(like(reason_code,"%X201%"),1,0)
 | where error==1
 | timechart span=1h count by country_code
 | addtotals row=true fieldname=_Total
 | foreach * [eval <<FIELD>>=<<FIELD>> * 100 / _Total]
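If you want to sanity-check the addtotals + foreach step in isolation, here is a self-contained toy (makeresults fabricates two rows of counts; nothing comes from your real index). Note that _Total starts with an underscore precisely so that the * wildcard in foreach skips it:

 | makeresults count=2
 | streamstats count as row
 | eval US=if(row=1,2,3), UK=if(row=1,4,5), AU=if(row=1,6,7)
 | fields - row
 | addtotals row=true fieldname=_Total
 | foreach * [eval <<FIELD>>=round(<<FIELD>> * 100 / _Total, 1)]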

Let us know if it works.

Explorer

Thanks @renjith.nair. I'm not able to follow the solution, especially what the last line is doing with the <<FIELD>> equality, but in any case it's not giving a reasonable result. Maybe I didn't explain my question properly: I'm after the number of X201 / total, grouped by countrycode (the removal of the ones with no errors would be a nice-to-have). In the streamstats case, I can't understand why max and avg differ. In the first option I'm not sure I follow you; I thought eventstats would append an integer to each of the events, and then the sum could be grouped by countrycode in the same way as the total.


SplunkTrust

@Esperteyu, ok let me try to explain
- In the first two SPLs, we calculated the count of errors by country_code in a 1-hour time window. Could you please confirm whether you are able to get the expected values there?

sample result

    _time                 US  UK  AU
    2018-10-14 10:15:00   2   4   6
    2018-10-14 11:15:00   3   5   7

Using addtotals, we add up the counts of all countries for that time window, which is 12 for the first row and 15 for the second.
Then, using foreach, we traverse the columns and divide each count by the total of that row, which gives you a % for that time window, i.e. 2*100/12, 4*100/12, 6*100/12, etc.
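On your streamstats question: streamstats emits a running value on every event, so a single 1-hour bucket contains many intermediate ratios for a country. max picks the largest of those intermediate ratios, while avg averages all of them, so they generally differ. A toy reproduction with fabricated events (all landing in the same bucket):

 | makeresults count=4
 | streamstats count as n
 | eval error=if(n=1,1,0)
 | streamstats count as total sum(error) as errors
 | eval running_pct=round(100*errors/total,1)
 | stats max(running_pct) as max_pct avg(running_pct) as avg_pct

Here the running percentages are 100, 50, 33.3, 25, so max gives 100 while avg gives about 52.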

If this is not what you are looking for, kindly provide some sample data and then the expected output.

Thanks!


Explorer

I tried to answer this comment but it has to be reviewed by a moderator (I think it might be because of the "tables" I put in). A shorter response with no tables: what I am trying to get is the ratio error/total on a country basis, i.e. errors per country / total per country in a given period of time (I will need to monitor such a running percentage for spikes).

I was playing around with this query now; it's simpler, works the same whether I use max or avg, and I think it might do the trick, although I need to double-check with enough data:

index="logger" "Log Info"
 | bin _time span=1h
 | eval _raw=replace(_raw,"\\\\\"","\"")
 | rex "\"RawRequest\":\"(?<raw_request>.+)\"}$"
 | eval json= raw_request
 | spath input=json output=country_code path=customer.billingAddress.countryCode
 | spath input=json output=reason_code path=history{}.reasonCode
 | eval error=if(like(reason_code,"%X201%"),1,0) 
 | stats count as total sum(error) as errors by _time, country_code
 | eval error_percentage=100*errors/total 
 | timechart span=1h max(error_percentage) by country_code
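For the nice-to-have of hiding countries that never get an error, I think an eventstats/where pair just before the timechart could drop them (untested sketch, replacing the last three lines above):

 | stats count as total sum(error) as errors by _time, country_code
 | eventstats sum(errors) as total_errors by country_code
 | where total_errors > 0
 | eval error_percentage=100*errors/total
 | timechart span=1h max(error_percentage) by country_code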

Sorry for not explaining it properly, and thanks for your patience @renjith.nair.


SplunkTrust

Alright @Esperteyu. In the above SPL, don't you want total and errors interchanged? i.e. count as errors, sum(error) as total.


Explorer

@renjith.nair you make me doubt now, but my understanding is that the count brings in all the rows, while the sum(error) brings in only the ones which actually have the "X201" substring, so I think it should be like this.


SplunkTrust

Ah, sorry for that. I was still thinking about the rows with errors only (1), but you have both 0 and 1 in error.


Explorer

I see, my bad for not explaining it properly then. I really appreciate your patience @renjith.nair.
What I am trying to get is the ratio errors/total per country, something like this (as returned by the streamstats version):

_time ARE AUS AUT BGR BRA USA
t1 0 0.17 0 0.28
t2 0.18 0 0.28
t3 0 0.18 0 0.28
t4 0 0.19 0 0 0 0.28
t5 0 0.21 0 0 0.28

I guess another acceptable format would be

time  countrycode  total  errors  percentage
t1    UK           100    20      20
t1    US           50     5       10
t2    UK           10     3       30
t2    US           40     10      25

With a view to displaying them in the timeline to see how that percentage averages and spikes (I think that brings in the concept of a running percentage) on a country basis.
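The second format is essentially the stats output before any timechart; dropping the final timechart from my earlier query and rounding should give exactly those columns (sketch):

 | stats count as total sum(error) as errors by _time, country_code
 | eval percentage=round(100*errors/total,1)
 | table _time, country_code, total, errors, percentage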
