Splunk Search

Equivalent Search Command for Excel Histogram (analysis toolpak addin)

kbecker
Communicator

Anybody familiar with a Splunk search command that does the equivalent of the Excel Histogram function that is found in the Analysys Toolpak addin (options --> addin). This Excel function calculates data bins based on the min and max values of a data set, it then shows a count per bin. For example. If you had 1000 data points from 4820 to 71750, this would be the bin breakdown (for 6 bins).

Bins
4820 if <= 4820

18206 if > 4820 & <= 18206

31592 if >18206 & <= 31592

44978 if > 31592 & <= 44978

58364 if > 44978 & <= 58364

71750 if > 58364 & <= 71750

Now, we would need to determine the count of data points for bin, so we add frequency...

Bin Frequency

4820 2 if <= 4820

18206 800 if > 4820 & <= 18206

31592 100 if >18206 & <= 31592

44978 47 if > 31592 & <= 44978

58364 50 if > 44978 & <= 58364

71750 1 if > 58364 & <= 71750

With Splunk I am able to determine the min, max, and bin values, but I am having trouble figuring out the best way to go back through the raw data to count events per bin. Not sure if there is a Splunk search command that will just do this or do I have to use a sub-search?

Thanks in advance for any help.

Glenn
Builder

It looks like you want the buckets to be of even size. If for some reason you wanted to have more control over the bucket sizes/ranges (also useful to be able to lump outliers over a certain value into one "above X" bucket, you can do this, for example:

_your_search_ | fields some_field | eval some_field_group = case(some_field<50, "0-50ms", some_field<100, "50-100ms", some_field<150 , "100-150ms", some_field<200, "150-200ms", some_field<250, "200-250ms", some_field<500, "250-500ms", some_field<750, "500-750ms", some_field<1000, "750-1000ms", 1==1, ">1000ms" ) | chart count by some_field_group

gkanapathy
Splunk Employee
Splunk Employee

... | chart count, max(datavalue), min(datavalue) over datavalue bins=6

or

... | chart count(datavalue), max(datavalue), min(datavalue) over datavalue bins=6

The bins= can be replaced with fixed or logarithmic bucket sizes as well, see this for bucketing options: http://docs.splunk.com/Documentation/Splunk/5.0/SearchReference/Bucket

Johnvey
Contributor

Yes, the bucket command is what you're looking for. Given the following search command (that generates synthetic data):

| windbag

An auto-extracted field is called position, which is a monotonically increasing number from 0 to 99. If you want to create 5 buckets of 20 numbers each:

| windbag | bucket position span=20 | stats count by position

This produces a table of values like:

position    count
0-20        20
20-40       20
40-60       20
60-80       20
60-100      20

The bucket command also handles time natively, so you can specify a relative time for the span argument:

| bucket _time span=5m

You can read more about the bucket command and about relative time identifiers.

Johnvey
Contributor

Yes, the bins generated will always be spaced such that the min and max values of whatever field of interest is covered. Is there some specific method of binning that you're looking for?

0 Karma

kbecker
Communicator

Is there a way for Splunk to calculate the bins based on the min and max values of the position value?

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...