Solved: How can I keep only n% of results from a search?

nonspecialist · ‎10-07-2010

I have a set of web page performance measurements spanning quite some time, generated by an external monitoring provider. I want to find the mean page performance after removing spikes caused by external factors outside our control. I'm thinking of using a truncated mean as the best measure of central tendency, but am having problems with the implementation.

Here's my thinking so far:

find all page render times for the past 7 days
order by render time
remove the top and bottom 2.5%
calculate truncated mean from remaining values
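
For reference, the four steps above amount to the following arithmetic. This is a minimal Python sketch rather than SPL, assuming the render times are already available as a list of numbers. `truncated_mean` and its `trim_fraction` parameter are hypothetical names used only for illustration, the sample data is made up, and rounding the per-end cut down to a whole number is just one possible convention.

```python
import math


def truncated_mean(values, trim_fraction=0.025):
    """Mean of values after dropping trim_fraction of them from each end.

    Hypothetical helper for illustration; the per-end cut is rounded down.
    """
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)                         # order by render time
    cut = math.floor(len(ordered) * trim_fraction)   # values to drop per end
    kept = ordered[cut:len(ordered) - cut]           # remove top and bottom
    if not kept:
        raise ValueError("no values left after trimming")
    return sum(kept) / len(kept)                     # mean of the remainder


# 38 ordinary render times plus one extreme value at each end (made-up data)
sample = [0.01] + [1.0] * 38 + [60.0]
print(truncated_mean(sample))
```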

I can calculate how many values I should be removing easily, but can't work out how to actually remove them. If there's a better way, I'd love to know it!

My query string (not yet working properly) so far is:

startdaysago=7 monitorid=<foo> | eventstats count(rendertime) as nresults | eval nkeep=nresults-ceil(nresults*0.05) | sort 0 -rendertime | head nkeep

but of course head can't take a field like nkeep as its argument; it needs a literal integer.

southeringtonp · ‎10-07-2010

Have you considered using outlier to get rid of the edge cases?

http://www.splunk.com/base/Documentation/4.1.5/SearchReference/Outlier

Alternately, how about this:

startdaysago=7 monitorid=<foo>
| eventstats count(rendertime) as nresults
| eval low_clipping=(nresults*0.025)
| eval high_clipping=nresults-low_clipping
| sort 0 rendertime
| streamstats count as sequence_number
| where sequence_number>low_clipping AND sequence_number<=high_clipping
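
To see which rows a rank filter like this keeps, the window can be worked out by hand for a given result count. The Python sketch below is only an illustration under stated assumptions: `kept_ranks` is a hypothetical helper, and ranks start at 1 as they do with streamstats count. By default the upper bound is inclusive, so the same number of rows is trimmed from each end. With a strict `<` on the upper bound, a result count that makes the clipping value a whole number (for example 40 results at 2.5%) loses one extra row from the top.

```python
def kept_ranks(nresults, fraction=0.025, inclusive_upper=True):
    """Return the 1-based ranks that survive the clipping window.

    Hypothetical helper; mirrors the low_clipping/high_clipping arithmetic.
    """
    low = nresults * fraction        # low_clipping
    high = nresults - low            # high_clipping
    if inclusive_upper:
        return [r for r in range(1, nresults + 1) if low < r <= high]
    return [r for r in range(1, nresults + 1) if low < r < high]


for n in (40, 100):
    ranks = kept_ranks(n)
    # result count, first kept rank, last kept rank, number of rows kept
    print(n, ranks[0], ranks[-1], len(ranks))
```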

nonspecialist · ‎10-08-2010

Awesome! I hadn't managed to find any reasonable examples of 'where', but that's exactly what I need. Thanks!
