Summary Index & sistats

sranga
Path Finder

Hi

I am seeing some odd behavior with one of our saved searches. The search is of the form:

... | bucket span=1h _time | sistats median(field1), avg(field1) by _time,field2,field3  

When I ran this query for the last 60 minutes, it generated a lot of results (close to 295K). When I ran the same query using the stats command instead, I got close to 160 results.

I am not sure why this is happening. We are using version 4.0.10. Thanks for your help.

Ranga

0 Karma
1 Solution

gkanapathy
Splunk Employee

field1 is probably a numeric field with many distinct values. If you use sistats to calculate a median or other percentile (or a distinct_count) on such a field, you will wind up with a very large summary index, with at least one entry for every distinct value. (In your case you probably have about 1600 distinct values, each accounting for about 160 entries.) This is correct behavior: it is the only way to summarize a median so that an aggregated median can later be computed from the summary.

If you don't want this, then you should first "bucket" field1. This will reduce the size of your summary, at some slight cost in the accuracy of the aggregated medians.
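
To make the "reduced from this" part concrete, here is a minimal sketch of how the summarized rows would typically be read back. It assumes the saved search writes to an index named summary, that the search is called my_hourly_summary, and that the summary events carry the saved search name in source (the usual default). The index and search names are placeholders, not something from the original post.

```
index=summary source="my_hourly_summary"
| stats median(field1), avg(field1) by _time, field2, field3
```

Running stats with the same functions and fields over the sistats output should combine the per-value rows into one result per group, which is why the stored summary has many more rows than the final report.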

0 Karma

gkanapathy
Splunk Employee

It occurs to me that you can use the makecontinuous command instead of bucket, and that should give you better results.

0 Karma

gkanapathy
Splunk Employee

That's because bucket creates a range for non-time values, not a single value. You need to eval it apart into two numbers and decide which value should represent everything within the range.
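
One way to do the "eval it apart" step, as a rough sketch rather than a tested recipe. It assumes bucket labels each range as lo-hi (for example 100-200), that the midpoint is an acceptable representative value, and that rex and the tonumber eval function are available in your version. The field names lo and hi are made up for illustration.

```
... | bucket span=1h _time
    | bucket bins=50 field1
    | rex field=field1 "^(?<lo>-?\d+(\.\d+)?)-(?<hi>-?\d+(\.\d+)?)$"
    | eval field1=(tonumber(lo)+tonumber(hi))/2
    | sistats median(field1), avg(field1) by _time, field2, field3
```

With field1 numeric again, sistats has at most one distinct value per bucket to store, so the summary stays small. The median and average you get back are approximations, accurate only to within the bucket width.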

0 Karma

sranga
Path Finder

I tried adding bucket bins=50 field1 before the sistats command, but that results in the Median & Average values not getting calculated.

0 Karma