Splunk Search

Removing a subset of data from average calculation

Explorer

Hi everyone,

I need some help figuring out how can I exclude certain users' data from my calculation of average of a field. Please also refer to the screenshot.

I have 2 separate queries that give me the raw materials to make this happen but I don't know how to combine them.

The source of my data is a csv file. Here is the first query which is used to calculate the average:

| inputlookup uao0nqok.csv
| where read_seconds > 0
| stats avg(read_seconds)

My second query helps me figure out which users I want excluded from my source data (but that's where my knowledge stops; I don't actually know how to exclude them)

I want to exclude all users where ALL of their values are either 20 or -1. In this context, -1 means that the record contains bad data. 20 is actually the best possible value, but if 100% of records are "best score", that is a bit suspicious so I need to exclude them.

This is the second query; it shows me the data for all users (count of how many of their records are of a particular value...in this case the value is seconds). From the data for all users, I can visually see which users I want to include in my average calculation and which I want to exclude but I don't know how to do that inclusion/exclusion via code:

| inputlookup uao0nqok.csv
| eval user_only = substr(custom_type,0,17)  
| eval first_char = substr(custom_type,0,1)
| where first_char = "X" or first_char = "Y"
| chart count(eval(read_seconds)) over user_only limit=0 by read_seconds

Now, I need one query which gives me the average of "read_seconds" for all the users that I don't want excluded. Any help will be greatly appreciated!

Here is a screenshot to help visualize the situation:

alt text

0 Karma
1 Solution

SplunkTrust
SplunkTrust

Hi @skribble5

Give this a try:

 | inputlookup uao0nqok.csv
 | eval user_only = substr(custom_type,0,17)  
 | eval first_char = substr(custom_type,0,1)
 | where first_char = "X" or first_char = "Y"  
 | eval goodTiming = if(read_seconds != "-1" OR read_seconds != "20", 1, 0)
 | stats max(goodTiming) as goodTiming sum(read_seconds) as read_seconds count as count by user_only
 | search goodTiming=1
 | stats sum(read_seconds) as read_seconds sum(count) as count
 | eval avg = read_seconds / count

All the best!

View solution in original post

0 Karma

SplunkTrust
SplunkTrust

Hi @skribble5

Give this a try:

 | inputlookup uao0nqok.csv
 | eval user_only = substr(custom_type,0,17)  
 | eval first_char = substr(custom_type,0,1)
 | where first_char = "X" or first_char = "Y"  
 | eval goodTiming = if(read_seconds != "-1" OR read_seconds != "20", 1, 0)
 | stats max(goodTiming) as goodTiming sum(read_seconds) as read_seconds count as count by user_only
 | search goodTiming=1
 | stats sum(read_seconds) as read_seconds sum(count) as count
 | eval avg = read_seconds / count

All the best!

View solution in original post

0 Karma

Explorer

Thanks @chrisyoungerjds. I ran your code (with one change...i removed the OR in the if statement and just kept the if (read_seconds !="20") ... I decided to manage the "-1" separately) on a smaller subset of my data and the got the expected results.

Thanks again. This was a big help!

0 Karma

Explorer

Thanks @chrisyoungerjds for looking into this and responding.

I tried your code and it has given me an average number which is lower than my initial average...which is GOOD because now we are excluding certain records in calculation. My initial average was 11.8 seconds and now the average is 10.7 seconds.

I will do some testing manually to confirm if the new average is correct. Thanks again for looking into this.

I will confirm shortly.

0 Karma