Splunk Search

How can I keep only n% of results from a search?

nonspecialist
New Member

I have a set of web page performance measurements spanning quite some time, generated by an external monitoring provider. I want to find the mean page performance after removing spikes caused by external factors outside our control. I'm thinking of using a truncated mean as the best measure of central tendency, but I'm having problems with the implementation.

Here's my thinking so far:

  • find all page render times for the past 7 days
  • order by render time
  • remove the top and bottom 2.5%
  • calculate truncated mean from remaining values

I can calculate how many values I should be removing easily, but can't work out how to actually remove them. If there's a better way, I'd love to know it!

My query string (not yet working properly) so far is:

startdaysago=7 monitorid=<foo> | eventstats count(rendertime) as nresults | eval nkeep=nresults-ceil(nresults*0.05) | sort 0 -rendertime | head nkeep

but of course head needs a literal integer and won't accept a field name like nkeep.

southeringtonp
Motivator

Have you considered using outlier to get rid of the edge cases?

     http://www.splunk.com/base/Documentation/4.1.5/SearchReference/Outlier
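
For reference, a minimal sketch of the outlier route. It assumes the option names from the current Search Reference (action=remove, with param left at its default) and has not been checked against 4.1.5. Note that outlier filters by distance from the interquartile range rather than by a fixed percentage, so this gives an outlier-filtered mean, not a strict 2.5% truncated mean. The field name filtered_mean is only illustrative.

```
startdaysago=7 monitorid=<foo>
| outlier action=remove rendertime
| stats avg(rendertime) as filtered_mean
```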


Alternately, how about this:

startdaysago=7 monitorid=<foo>
| eventstats count(rendertime) as nresults
| eval low_clipping=floor(nresults*0.025)
| eval high_clipping=nresults-low_clipping
| sort 0 rendertime
| streamstats count as sequence_number
| where sequence_number>low_clipping AND sequence_number<=high_clipping
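
To go from the filtered events to the truncated mean itself, one more stats stage should be enough. A minimal sketch, with a few assumptions beyond the search above: rendertime=* is added so that the eventstats and streamstats counts cover the same events, sort 0 lifts sort's default result limit, num() forces a numeric sort, and floor() makes the number of events trimmed from each end equal. The names ntrim and truncated_mean are hypothetical.

```
startdaysago=7 monitorid=<foo> rendertime=*
| eventstats count(rendertime) as nresults
| eval ntrim=floor(nresults*0.025)
| sort 0 num(rendertime)
| streamstats count as sequence_number
| where sequence_number>ntrim AND sequence_number<=nresults-ntrim
| stats avg(rendertime) as truncated_mean
```

With 200 events, for example, ntrim is 5, so sequence numbers 6 through 195 are kept and 190 values go into the average.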

nonspecialist
New Member

Awesome! I hadn't managed to find any reasonable examples of 'where', but that's exactly what I need. Thanks!
