Solved: Trimmed Average Calculation

zacksoft · ‎05-02-2019

host = Mayhem
sourcetype="phutans:servo" host=R00878
| eval headers=split(_raw," ")
| eval plant_length=mvindex(headers,10) | sort 0 plant_length

(I am trying to calculate mean plant_lenght here but in a different way)

Let me put it as example :
say we get 10 values of plant_length. 1, 2, 2, 3, 4, 4, 4, 5, 20
Ideally to get mean, we would do (1+2+2+3+4+4+4++5+20)/10
But I don't want it this way. What I am looking for is to drop 10% of data in each tail and divide by number of values left.

Like this : (2+2+3+4+4+4++5)/8
In the numerator here the bottom 10% data (1) and top 10% of data(20) is dropped , so we have 8 vales left, hence the denominator is 8.

(In the above example as we have 10 values so 10% of 10 becomes 1. Therefore 1 value is dropped from top end and 1 value from bottom end.) And after dropping 2 values the denominator has to be 8 , not 10)

I could use some help in implementing the above scenario.

Sukisen1981 · ‎05-02-2019

I have bulit out this on action_count on the _audit index, so the code works as is for you. Now I have renamed action_count as x and removed the top and bottom 10% values, the left over values of x is what you need to use to form your average.
WARNING - Run this for last 7 days to have a large data set AND the values keep updating real time so look at maxcount and perc. macount is your total at any point in time and perc is what you need to trim from the bottom and top at any point of time..hope it helps

View solution in original post

Sukisen1981 · ‎05-02-2019

I have bulit out this on action_count on the _audit index, so the code works as is for you. Now I have renamed action_count as x and removed the top and bottom 10% values, the left over values of x is what you need to use to form your average.
WARNING - Run this for last 7 days to have a large data set AND the values keep updating real time so look at maxcount and perc. macount is your total at any point in time and perc is what you need to trim from the bottom and top at any point of time..hope it helps

zacksoft · ‎05-03-2019

stats value(available_count) ...gives me the unique entries only.
if I have six values such as 2,2,3,3,4,4. It only shows 2,3,4

Sukisen1981 · ‎05-03-2019

hi you need to replace vlaues with list if you want non-unique values as well

index="_audit" | stats list(available_count) | rename list(available_count) as x | eval perc=0.1*mvcount(x) | mvexpand x | eventstats count(x) as maxcount | streamstats count as row | where row>perc | eval z=maxcount-row | where z>perc|reverse

zacksoft · ‎05-03-2019

The 'list' only returns 100 values. Is there anyway to override that to see all the values?

Sukisen1981 · ‎05-03-2019

Hi @zacksoft - Yes the list is limited to 100 values - https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/Multivaluefunctions

I honestly, did not think from the examples that you had provided that in any specific case your list of values will exceed 100 .
What is/could be the maximum number of values in your case?

Sukisen1981 · ‎05-03-2019

Try this - this should account for all values , not just the list limit of 100

index="_audit" | table available_count,_time
| where isnotnull(available_count) | rename available_count as x 
|  eventstats count(x) as maxcount
| eval perc=0.1*maxcount  | streamstats count as row | where row>perc | eval z=maxcount-row | where z>perc|reverse

zacksoft · ‎05-06-2019

@Sukisen1981 Thank you for so patiently assisting with this issue. Really appreciate it.
This is exactly what I wanted.
If I may add one small thing...."what if instead of dropping the tail 10 % of data, I choose to replace them with the next available data.
e.g. 1, 2, 2, 3, 4, 4, 4, 5, 20

The 10 % on each side is, 1 and 20.
So the updated data set to become 2, 2, 2, 3, 4, 4, 4, 5, 5. (I just replaced the each extreme 10 % with the nearest value). Could you please guide how to achieve this. Thank you.

Sukisen1981 · ‎05-06-2019

Hi @zacksoft - this looks a bit tough, I suggest asking a separate question on this in the forum. Meanwhile, I will try to find a solution,many apologies but office work calls 😞 😞

zacksoft · ‎05-07-2019

Sure @Sukisen1981 . If you get time.
Following is the link of the new question.
https://answers.splunk.com/answers/744080/winsorized-average-calculation.html?minQuestionBodyLength=...

Trimmed Average Calculation

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

Splunk Observability for AI

🔐 Trust at Every Hop: How mTLS in Splunk Enterprise 10.0 Makes Security Simpler

Are you a member of the Splunk Community?

Trimmed Average Calculation

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

Splunk Observability for AI

🔐 Trust at Every Hop: How mTLS in Splunk Enterprise 10.0 Makes Security Simpler