<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Winsorized Average Calculation in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441540#M171476</link>
    <description>&lt;P&gt;you can test for values 1-20, replace 1-10 with 1-20 (remember the space between each number &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; ) and to make the change in 3 places.&lt;BR /&gt;
The code works, but it is most probably an issue with your index data the parts before where I start doing the real stuff...but I am sure this will give you a good idea of what to do&lt;/P&gt;</description>
    <pubDate>Tue, 21 May 2019 16:46:56 GMT</pubDate>
    <dc:creator>Sukisen1981</dc:creator>
    <dc:date>2019-05-21T16:46:56Z</dc:date>
    <item>
      <title>Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441532#M171468</link>
      <description>&lt;PRE&gt;&lt;CODE&gt; host = Mayhem
 sourcetype="phutans:servo" host=R00878
 | eval headers=split(_raw," ")
 | eval plant_length=mvindex(headers,10) | sort 0 plant_length
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(I am trying to calculate mean plant_lenght here but in a different way)&lt;/P&gt;

&lt;P&gt;Let me put it as example :&lt;BR /&gt;
say we get 10 values of plant_length. 1, 2, 2, 3, 4, 4, 4, 5, 20&lt;BR /&gt;
Ideally to get mean, we would do (1+2+2+3+4+4+4++5+20)/10&lt;BR /&gt;
But I don't want it this way. What I am looking is to identify 10% of data in each tail.&lt;BR /&gt;
 And then to replace them with the next available data.&lt;BR /&gt;
e.g. 1, 2, 2, 3, 4, 4, 4, 5, 20 &lt;BR /&gt;
The 10 % on each side is, 1 and 20.&lt;BR /&gt;
So the updated data set to become 2, 2, 2, 3, 4, 4, 4, 5, 5. (I just replaced the each extreme 10 % with the nearest value).&lt;BR /&gt;
And the new average should be calculated like (2+2+2+3+4+4+4++5+5)/10&lt;BR /&gt;
 Could you please guide how to achieve this. Thank you. &lt;/P&gt;

&lt;P&gt;I could use some help in implementing the above scenario.&lt;/P&gt;</description>
      <pubDate>Tue, 07 May 2019 14:29:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441532#M171468</guid>
      <dc:creator>zacksoft</dc:creator>
      <dc:date>2019-05-07T14:29:06Z</dc:date>
    </item>
    <item>
      <title>Re: Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441533#M171469</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/65483"&gt;@zacksoft&lt;/a&gt; &lt;BR /&gt;
I am back again &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;BR /&gt;
Try this - &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="_audit" | table date_minute,_time
  | where isnotnull(date_minute) | rename date_minute as x 
  |  eventstats count(x) as maxcount
  | eval perc=0.1*maxcount  | streamstats count as row  | eval z=maxcount-row
 | eval diff=row-perc
 | eval diff1=z-perc  | where diff &amp;lt;=perc 
  | where diff &amp;gt;0 
 | append 
     [search index="_audit" | table date_minute,_time
  | where isnotnull(date_minute) | rename date_minute as x 
  |  eventstats count(x) as maxcount
  | eval perc=0.1*maxcount  | streamstats count as row | where row&amp;gt;perc | eval z=maxcount-row | where z&amp;gt;=perc]
 | append 
     [search 
 index="_audit" | table date_minute,_time
  | where isnotnull(date_minute) | rename date_minute as x 
  |  eventstats count(x) as maxcount
  | eval perc=0.1*maxcount  | streamstats count as row  | eval z=maxcount-row
 | eval diff=row-perc
 | eval diff1=z-perc
  | where diff1&amp;lt;perc
 | where diff1&amp;gt;=0]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Once again, run this on the _aidt index before implementing on your actual data.&lt;BR /&gt;
NOTE - I have used  the date_minute field to simulate number values.&lt;BR /&gt;
It would be good if you can run this on a fixed time range , something like yesterday 2-3 PM or something like that, so that the data becomes easier to verify. Remember the first field x is the only one we are interested in, the rest of the fields are added for verification purposes only.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 00:29:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441533#M171469</guid>
      <dc:creator>Sukisen1981</dc:creator>
      <dc:date>2020-09-30T00:29:23Z</dc:date>
    </item>
    <item>
      <title>Re: Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441534#M171470</link>
      <description>&lt;P&gt;hi @zacksoft - Were you able to test this out?&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2019 12:28:08 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441534#M171470</guid>
      <dc:creator>Sukisen1981</dc:creator>
      <dc:date>2019-05-15T12:28:08Z</dc:date>
    </item>
    <item>
      <title>Re: Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441535#M171471</link>
      <description>&lt;P&gt;Not yet . I will test it today and will let you know by tomorrow.&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2019 13:25:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441535#M171471</guid>
      <dc:creator>zacksoft</dc:creator>
      <dc:date>2019-05-15T13:25:42Z</dc:date>
    </item>
    <item>
      <title>Re: Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441536#M171472</link>
      <description>&lt;P&gt;Does the value of 'x' contain the nearest value after dropping 10 % of extreme values on both end ?  Which line in the query is confirming that ?&lt;BR /&gt;
Also , I added | sort 0 x  after |  rename command as the values are only to be trimmed after they are sorted. Could you please clarify &lt;/P&gt;</description>
      <pubDate>Thu, 16 May 2019 12:37:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441536#M171472</guid>
      <dc:creator>zacksoft</dc:creator>
      <dc:date>2019-05-16T12:37:35Z</dc:date>
    </item>
    <item>
      <title>Re: Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441537#M171473</link>
      <description>&lt;P&gt;Hi @zacksoft &lt;BR /&gt;
I have modified the query above. Here is what I understood from your requirements. Firstly, I will try to explain what I have done with the bottom(tail) values for x.&lt;BR /&gt;
this screen shot is an as is run on the _audit index without any trims being done&lt;BR /&gt;
&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/6989i7C7C95E80E2B3E8C/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;This has total 36 events (ref. - column maxcount), trimming 10% of this at the tail means trimming 3.6 , ~ 4 tail rows.&lt;BR /&gt;
So, my understanding is here we need to trim the rows with values for x=41,41,30.30 AND replace them with the immediate 4 rows above. So the last 4 rows should become x=0,0,45,45 AND this will be the same as the 4 rows above that. In essence we will have 8 rows , with sets of values repeating for 4 rows. That means that the last 8 rows become, x=0,0,45,45,0,0,45,45. &lt;BR /&gt;
Is this correct?&lt;BR /&gt;
If it is then refer to this screen shot.&lt;BR /&gt;
&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/6990i20C99E81717C9BD3/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;BR /&gt;
see how the last 8 rows now repeat in sets of 4, is this correct? The same logic has been applied to the first 4(or &lt;span class="lia-unicode-emoji" title=":smiling_face_with_sunglasses:"&gt;😎&lt;/span&gt; rows.&lt;BR /&gt;
Now, on the query -&lt;BR /&gt;
The query actually consists of 3 queries, the first main one followed by the 2 appends.&lt;BR /&gt;
a) The first append or the second query is just trimming off the top and bottom 10%(4 rows) in this example.&lt;BR /&gt;
b) the main search implements the logic for removing the head(top) 4 rows and replacing them by the first 4 rows in the second(first append) query.&lt;BR /&gt;
c) I suggest you test it like this - use this query first.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;index="_audit" | table date_minute,_time
  | where isnotnull(date_minute) | rename date_minute as x 
  |  eventstats count(x) as maxcount
  | eval perc=0.1*maxcount  | streamstats count as row  | eval z=maxcount-row
 | eval diff=row-perc
 | eval diff1=z-perc  
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Here, i merely removed the where conditions from the main search (the top/head) query part. You can then manually see what applying the 2 where conditions ( | where diff &amp;lt;=perc  | where diff &amp;gt;0 ) would do to the overall result.&lt;BR /&gt;
d) exact same goes for the bottom/tail (second append) query.&lt;BR /&gt;
Do not get confused by the query. Look at it as 3 parts, the middle part does the trim of the top/bottom 10% and the first and third queries append the values from the middle query, based on the number of rows as per the 10% rule.&lt;/P&gt;</description>
      <pubDate>Sat, 18 May 2019 14:59:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441537#M171473</guid>
      <dc:creator>Sukisen1981</dc:creator>
      <dc:date>2019-05-18T14:59:16Z</dc:date>
    </item>
    <item>
      <title>Re: Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441538#M171474</link>
      <description>&lt;P&gt;@Sukisen1981   - Thanks for the response.&lt;BR /&gt;
After sorting the values ,  the 10% isn't ONLY to be removed from the tail end.  10% of the head end also to be removed.&lt;BR /&gt;
like ,    1 2 3 4 5 6 7 8 9 10&lt;BR /&gt;
Lets say I have above 10 values)for simplicity I am considering 1 through 10). So when we say trimming the 10 % on each end it means, the value 1 from left and 10 from right will be removed.  .&lt;/P&gt;

&lt;P&gt;Lets move to the part where we understand what values to be replaced in place of "1"  and "10".&lt;BR /&gt;
After trimming 10% on both end , the nearest value of the trimmed number is to be replaced with.&lt;BR /&gt;
Meaning the new set will be , 2 2 3 4 5 6 7 8 9 9 &lt;/P&gt;

&lt;P&gt;The average now will be calculated as (2+2+3+4+5+6+7+8+9+9)/10&lt;/P&gt;

&lt;P&gt;I hope I didn't confuse you.&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2019 16:10:53 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441538#M171474</guid>
      <dc:creator>zacksoft</dc:creator>
      <dc:date>2019-05-21T16:10:53Z</dc:date>
    </item>
    <item>
      <title>Re: Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441539#M171475</link>
      <description>&lt;P&gt;Hi @zacksoft &lt;BR /&gt;
Yes this works..to avoid index confusion I have now hard coded the values using makeresults.&lt;BR /&gt;
I have tested for date_minute =1-10 and 1-20. Remember we have to change this in 3 places while testing, main query, first append and second append.&lt;BR /&gt;
Try this :&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval date_minute="1 2 3 4 5 6 7 8 9 10" | eval date_minute=split(date_minute," ")
| mvexpand date_minute
|  table date_minute,_time
   | where isnotnull(date_minute) | rename date_minute as x 
   |  eventstats count(x) as maxcount
   | eval perc=0.1*maxcount  | streamstats count as row  | eval z=maxcount-row
  | eval diff=row-perc
  | eval diff1=z-perc  | where diff &amp;lt;=perc 
   | where diff &amp;gt;0 
  | append 
      [ | makeresults
| eval date_minute="1 2 3 4 5 6 7 8 9 10" | eval date_minute=split(date_minute," ")
| mvexpand date_minute
|  table date_minute,_time
   | where isnotnull(date_minute) | rename date_minute as x 
   |  eventstats count(x) as maxcount
   | eval perc=0.1*maxcount  | streamstats count as row | where row&amp;gt;perc | eval z=maxcount-row | where z&amp;gt;=perc]
  | append 
      [ | makeresults
| eval date_minute="1 2 3 4 5 6 7 8 9 10" | eval date_minute=split(date_minute," ")
| mvexpand date_minute
|  table date_minute,_time
   | where isnotnull(date_minute) | rename date_minute as x 
   |  eventstats count(x) as maxcount
   | eval perc=0.1*maxcount  | streamstats count as row  | eval z=maxcount-row
  | eval diff=row-perc
  | eval diff1=z-perc
   | where diff1&amp;lt;perc
  | where diff1&amp;gt;=0]
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Here is the screen shot of the output..&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/6991i7D23D9570256E58D/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2019 16:45:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441539#M171475</guid>
      <dc:creator>Sukisen1981</dc:creator>
      <dc:date>2019-05-21T16:45:04Z</dc:date>
    </item>
    <item>
      <title>Re: Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441540#M171476</link>
      <description>&lt;P&gt;you can test for values 1-20, replace 1-10 with 1-20 (remember the space between each number &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; ) and to make the change in 3 places.&lt;BR /&gt;
The code works, but it is most probably an issue with your index data the parts before where I start doing the real stuff...but I am sure this will give you a good idea of what to do&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2019 16:46:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441540#M171476</guid>
      <dc:creator>Sukisen1981</dc:creator>
      <dc:date>2019-05-21T16:46:56Z</dc:date>
    </item>
    <item>
      <title>Re: Winsorized Average Calculation</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441541#M171477</link>
      <description>&lt;P&gt;Thanks @Sukisen1981  for your time. The solution helps.&lt;/P&gt;</description>
      <pubDate>Wed, 22 May 2019 08:05:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Winsorized-Average-Calculation/m-p/441541#M171477</guid>
      <dc:creator>zacksoft</dc:creator>
      <dc:date>2019-05-22T08:05:17Z</dc:date>
    </item>
  </channel>
</rss>

