host = Mayhem
sourcetype="phutans:servo" host=R00878
| eval headers=split(_raw," ")
| eval plant_length=mvindex(headers,10) | sort 0 plant_length
(I am trying to calculate mean plant_lenght here but in a different way)
Let me put it as example :
say we get 10 values of plant_length. 1, 2, 2, 3, 4, 4, 4, 5, 20
Ideally to get mean, we would do (1+2+2+3+4+4+4++5+20)/10
But I don't want it this way. What I am looking for is to drop 10% of data in each tail and divide by number of values left.
Like this : (2+2+3+4+4+4++5)/8
In the numerator here the bottom 10% data (1) and top 10% of data(20) is dropped , so we have 8 vales left, hence the denominator is 8.
(In the above example as we have 10 values so 10% of 10 becomes 1. Therefore 1 value is dropped from top end and 1 value from bottom end.) And after dropping 2 values the denominator has to be 8 , not 10)
I could use some help in implementing the above scenario.
Try this - index="_audit"
| stats values(available_count)
| rename values(available_count) as x
| eval perc=0.1*mvcount(x)
| mvexpand x
| eventstats count(x) as maxcount
| streamstats count as row
| where row>perc
| eval z=maxcount-row
| where z>perc|reverse
I have bulit out this on action_count on the _audit index, so the code works as is for you. Now I have renamed action_count as x and removed the top and bottom 10% values, the left over values of x is what you need to use to form your average.
WARNING - Run this for last 7 days to have a large data set AND the values keep updating real time so look at maxcount and perc. macount is your total at any point in time and perc is what you need to trim from the bottom and top at any point of time..hope it helps
Try this - index="_audit"
| stats values(available_count)
| rename values(available_count) as x
| eval perc=0.1*mvcount(x)
| mvexpand x
| eventstats count(x) as maxcount
| streamstats count as row
| where row>perc
| eval z=maxcount-row
| where z>perc|reverse
I have bulit out this on action_count on the _audit index, so the code works as is for you. Now I have renamed action_count as x and removed the top and bottom 10% values, the left over values of x is what you need to use to form your average.
WARNING - Run this for last 7 days to have a large data set AND the values keep updating real time so look at maxcount and perc. macount is your total at any point in time and perc is what you need to trim from the bottom and top at any point of time..hope it helps
stats value(available_count) ...gives me the unique entries only.
if I have six values such as 2,2,3,3,4,4. It only shows 2,3,4
hi you need to replace vlaues with list if you want non-unique values as well
index="_audit" | stats list(available_count) | rename list(available_count) as x | eval perc=0.1*mvcount(x) | mvexpand x | eventstats count(x) as maxcount | streamstats count as row | where row>perc | eval z=maxcount-row | where z>perc|reverse
The 'list' only returns 100 values. Is there anyway to override that to see all the values?
Hi @zacksoft - Yes the list is limited to 100 values - https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/Multivaluefunctions
I honestly, did not think from the examples that you had provided that in any specific case your list of values will exceed 100 .
What is/could be the maximum number of values in your case?
Try this - this should account for all values , not just the list limit of 100
index="_audit" | table available_count,_time
| where isnotnull(available_count) | rename available_count as x
| eventstats count(x) as maxcount
| eval perc=0.1*maxcount | streamstats count as row | where row>perc | eval z=maxcount-row | where z>perc|reverse
@Sukisen1981 Thank you for so patiently assisting with this issue. Really appreciate it.
This is exactly what I wanted.
If I may add one small thing...."what if instead of dropping the tail 10 % of data, I choose to replace them with the next available data.
e.g. 1, 2, 2, 3, 4, 4, 4, 5, 20
The 10 % on each side is, 1 and 20.
So the updated data set to become 2, 2, 2, 3, 4, 4, 4, 5, 5. (I just replaced the each extreme 10 % with the nearest value). Could you please guide how to achieve this. Thank you.
Hi @zacksoft - this looks a bit tough, I suggest asking a separate question on this in the forum. Meanwhile, I will try to find a solution,many apologies but office work calls 😞 😞
Sure @Sukisen1981 . If you get time.
Following is the link of the new question.
https://answers.splunk.com/answers/744080/winsorized-average-calculation.html?minQuestionBodyLength=...