Anybody familiar with a Splunk search command that does the equivalent of the Excel Histogram function found in the Analysis ToolPak add-in (Options --> Add-Ins)? This Excel function calculates data bins based on the min and max values of a data set, then shows a count per bin. For example, if you had 1000 data points ranging from 4820 to 71750, this would be the bin breakdown (for 6 bins):
4820 if <= 4820
18206 if > 4820 & <= 18206
31592 if >18206 & <= 31592
44978 if > 31592 & <= 44978
58364 if > 44978 & <= 58364
71750 if > 58364 & <= 71750
Now, we would need to determine the count of data points per bin, so we add frequency...
4820 2 if <= 4820
18206 800 if > 4820 & <= 18206
31592 100 if >18206 & <= 31592
44978 47 if > 31592 & <= 44978
58364 50 if > 44978 & <= 58364
71750 1 if > 58364 & <= 71750
With Splunk I am able to determine the min, max, and bin values, but I am having trouble figuring out the best way to go back through the raw data to count events per bin. I'm not sure if there is a Splunk search command that will just do this, or whether I have to use a subsearch.
Thanks in advance for any help.
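One way to express this kind of dynamic binning directly in SPL is sketched below. This is a minimal sketch rather than a tested answer: datavalue is a placeholder field name, the bin count of 6 is hard-coded, and it splits the min-to-max range into 6 equal-width intervals, which differs slightly from the Excel layout above (where the first bin holds only values <= the minimum). eventstats attaches the overall min and max to every event, eval computes a bin number and a label, and stats counts events per bin:
_your_search_
| eventstats min(datavalue) as minval, max(datavalue) as maxval
| eval width = (maxval - minval) / 6
| eval binnum = if(datavalue = maxval, 5, floor((datavalue - minval) / width))
| eval bin = tostring(round(minval + binnum * width, 0)) . "-" . tostring(round(minval + (binnum + 1) * width, 0))
| stats count by binnum, bin
| sort binnum
| fields bin, count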
It looks like you want the buckets to be of even size. If for some reason you want more control over the bucket sizes/ranges (which is also useful for lumping outliers over a certain value into one "above X" bucket), you can do something like this, for example:
_your_search_ | fields some_field
| eval some_field_group = case(
    some_field<50,   "0-50ms",
    some_field<100,  "50-100ms",
    some_field<150,  "100-150ms",
    some_field<200,  "150-200ms",
    some_field<250,  "200-250ms",
    some_field<500,  "250-500ms",
    some_field<750,  "500-750ms",
    some_field<1000, "750-1000ms",
    1==1,            ">1000ms")
| chart count by some_field_group
... | chart count, max(datavalue), min(datavalue) over datavalue bins=6
or, counting only events where datavalue is present (equivalent here, since we chart over datavalue):
... | chart count(datavalue), max(datavalue), min(datavalue) over datavalue bins=6
bins= can be replaced with a fixed or logarithmic bucket size as well; see the bucket command documentation for the bucketing options: http://docs.splunk.com/Documentation/Splunk/5.0/SearchReference/Bucket
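For example (again just a sketch; the span values and the datavalue field name are illustrative placeholders), a fixed bucket width versus logarithmic buckets would look like:
... | bucket datavalue span=10000 | stats count by datavalue
... | bucket datavalue span=log10 | stats count by datavalue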
The bucket command is what you're looking for. The search below uses the windbag command, which generates synthetic test data. One of its auto-extracted fields is called position, a monotonically increasing number from 0 to 99. If you want to create 5 buckets of 20 numbers each:
| windbag | bucket position span=20 | stats count by position
This produces a table of values like:
position  count
0-20      20
20-40     20
40-60     20
60-80     20
80-100    20
The bucket command also handles time natively, so you can specify a relative time for the span, for example:
| bucket _time span=5m
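In practice you would usually follow the time bucketing with an aggregation, e.g. counting events per 5-minute bucket (a minimal sketch):
| bucket _time span=5m | stats count by _time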
Yes, the bins generated will always be spaced such that the min and max values of whatever field you are interested in are covered. Is there some specific method of binning that you're looking for?