Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results forย

Splunk Search

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results forย

- Community
- :
- Splunk Answers
- :
- Using Splunk
- :
- Splunk Search
- :
- Find the average of the first 99% range of data

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark Topic
- Subscribe to Topic
- Mute Topic
- Printer Friendly Page

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

sepkarimpour

Path Finder

โ08-25-2017
04:15 AM

My search currently gives me some statistics regarding response times including total count, average, min, max and 99% percentile value (I'm assuming that this is ordered) based on one of the fields from the query.

I need to find the average query count for the first 99% of my data count per query name, so my logic was to take the minimum value from the 99% percentile value, and dividing that by 99% of the total count. But when I run this, I often get values for this not in the range between minimum and maximum for certain values, **e.g. Average: 15533.5, Min: 9076, Max: 24737, *99% Average: 479.0***, which clearly wouldn't make sense even if the numbers are heavily skewed.

This was my initial search attempt:

index=* ...

| stats count AS "Query Count", avg(ResponseTime) AS "AvgRespTime", min(ResponseTime) AS "MinRespTime", max(ResponseTime) AS "MaxRespTime", p99(ResponseTime) AS "99PercRespTime" by *queryName
| eval AvgRespTime=round(AvgRespTime,1), 99PercAvgRespTime=('99PercRespTime'-'MinRespTime')/(0.99*'Query Count'), 99PercAvgRespTime=round('99PercAvgRespTime',1)

| sort - 99PercAvgRespTime

*EDIT: I've seen a flaw in my initial idea where I'm taking single values rather than sums, which are how you get averages (oops) - what I'd ideally want is to take the sum of the 99th percentile (and above) value(s) away from the total sum and divide that by the count of queries minus the count of queries above (and including) the 99th percentile. *

(1) Is it even possible to accurately get the average of the first 99% of the data? My logic was that by removing the last 1% of queries (so if it was 1000, the 10 queries with the highest response times), I can get this average but what should the query be? I considered taking 99% average, but I assume that would remove 0.5% from the lower end as well as the upper end, which is what I've been asked to avoid.

(2) Is there a way to show skewness of data based off statistics on Splunk? If possible, I'd want to show a graph or at least data showing this if possible.

1 Solution

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

sepkarimpour

Path Finder

โ09-12-2017
01:52 AM

I ended up doing it so it was just the values less than or equal to the 99th percentile value for 'x':

index=* ... "OK"

| eventstats perc99(ResponseTime) as p99Resp by x

| where ResponseTime<=p99Resp

| stats min(ResponseTime) as p98ResponseMin, max(ResponseTime) as p98ResponseMax, avg(ResponseTime) as p98AvgRespTime by x

For average, I imagine using an 'eval' to get the value may be necessary.

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

sepkarimpour

Path Finder

โ09-12-2017
01:52 AM

I ended up doing it so it was just the values less than or equal to the 99th percentile value for 'x':

index=* ... "OK"

| eventstats perc99(ResponseTime) as p99Resp by x

| where ResponseTime<=p99Resp

| stats min(ResponseTime) as p98ResponseMin, max(ResponseTime) as p98ResponseMax, avg(ResponseTime) as p98AvgRespTime by x

For average, I imagine using an 'eval' to get the value may be necessary.

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

DalJeanis

SplunkTrust

โ08-25-2017
11:08 AM

It's not very difficult, just use eventstats to get the 99the percentile and then act accordingly

```
your query that gets the data by queryName
| fields ResponseTime queryName
| eventstats perc99(ResponseTime) as p99Resp by queryName
| where ResponseTime<=p99Resp
| rename COMMENT as "the above gives you only those records which are below the 99th percentile"
| rename COMMENT as "you can now analyze them by themselves"
| stats count as P98Count, avg(ResponseTime) as p98ResponseAvg, stdev(ResponseTime) as p98ResponseStdev,
min(ResponseTime) as p98ResponseMin, max(ResponseTime) as p98ResponseMax
by queryName
```

...where p98 is the population below the 99th percentile.

...or...

```
your query that gets the data by queryName
| fields ResponseTime queryName
| eventstats perc99(ResponseTime) as p99Resp by queryName
| eval p00Unit=1
| eval p98Unit=if(ResponseTime<=p99Resp,1,0)
| eval p99Unit=if(ResponseTime<=p99Resp,0,1)
| eval p00Response=ResponseTime
| eval p98Response=if(ResponseTime<=p99Resp,ResponseTime,null())
| eval p99Response=if(ResponseTime<=p99Resp,null(),ResponseTime)
| stats sum(p00Unit) as P00Count, avg(p00Response) as p00ResponseAvg, stdev(p00Response) as p00ResponseStdev,
min(p00Response) as p00ResponseMin, max(p00Response) as p00ResponseMax,
sum(p98Unit) as P98Count, avg(p98Response) as p98ResponseAvg, stdev(p98Response) as p98ResponseStdev,
min(p98Response) as p98ResponseMin, max(p98Response) as p98ResponseMax,
sum(p99Unit) as P99Count, avg(p99Response) as p99ResponseAvg, stdev(p99Response) as p99ResponseStdev,
min(p99Response) as p99ResponseMin, max(p99Response) as p99ResponseMax
by queryName
```

...where p00 is the entire population, p98 is the population below the 99th percentile, and p99 is the population above the 99th percentile.

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

sepkarimpour

Path Finder

โ08-29-2017
02:13 AM

I tweaked the first query so it'd work with the initial search - I was running a comparison of the query counts and I can see whilst some of the query names that have fewer queries don't lose any, some queries are losing more than 1% from the top (e.g. the query with the highest count initially has a count of 152638, but when running your version, it falls to 149949, which is a drop of around 1.8%) - is this expected?

Also, is there a way I can still display the 100% values for certain items, i.e. Query Count, Min and Max but show the 99% Average Response as well? I'd like to show that although some queries may take longer (as outiers), the majority (99%) of searches have a low response time, but it's more of a preference.

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

DalJeanis

SplunkTrust

โ09-12-2017
08:15 AM

@sepkarimpour - yes, that result is mathematically possible, especially if the top end of the data has lots of clumping. To make up a simple example, let's suppose the data was one each of the numbers from 1 to 97, then three 100s. the 99th percentile is 100, so only 97% of the data are below the 99th percentile.

Yes, you can do all kinds of things like that easily. I already did that for you in the last example, where P00 is the whole data, p98 is the data below the 99th, and p99 is the data in the 99th.

You can also use the percentile aggregate functions p25(), p50()/median(), and p75() against the p00 data if you'd like, to show the quartiles.

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

DalJeanis

SplunkTrust

โ08-25-2017
10:36 AM

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

sepkarimpour

Path Finder

โ08-29-2017
02:17 AM

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

woodcock

Esteemed Legend

โ08-25-2017
06:58 AM

`Event Sampling`

feature that is a blue link on the left side towards the middle under the histogram/timeline and above the `Event/Patterns/Statistics/Visualization`

tabs.

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

sepkarimpour

Path Finder

โ08-25-2017
07:11 AM

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Richfez

SplunkTrust

โ08-25-2017
06:06 AM

- Mark as New
- Bookmark Message
- Subscribe to Message
- Mute Message
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

sepkarimpour

Path Finder

โ08-25-2017
06:33 AM