Splunk Search

Reliable way to specify span to bucket numeric values

sideview
SplunkTrust
SplunkTrust

OK. A bit of a journey here. I am searching for a good reliable method of bucketing numeric field values into categorical ranges, with no a-priori knowledge of what the range actually is, and preferably where I can specify based on the number of bins rather than with explicit span.

First step, bin command (aka bucket) The problem here is that bin doesn't represent the intervening buckets for which no data exists. Seen from the perspective of the individual search commands, the behavior makes sense. But it's unacceptable for use in charts.

<search terms> | bin foo bins=10 | stats count by foo 

alt text

Next stop: chart command. It can actually do this... sometimes. timechart can have a "bins" or "span" command not just in the beginning to control _time, but also after the split by field, to control on-the-fly bucketing of a numeric split-by. Chart command is even nuttier, allowing on the fly numeric-to-categorical bucketing for both the group-by and the split-by field. Here it is in action:

<search terms> | chart bins=10 count over foo 

And it works! Yay. The intervening buckets of 1-2, 2-3, 3-4 all show up in this particular case.

Except... in most cases this feature completely loses its mind. Here is a good illustration. It's quite interesting behavior but unfortunately answers will only let me post two images so I have tried to find a representatively nutty one for you:

<search terms> | chart bins=50 count over foo 

alt text

So... Is there any explanation for what makes chart command unreliable here, and is there a way to beat it with a stick to make it reliably bucket things? Or has anyone found a good flexible way to add in the empty buckets that bin/bucket omits?

Bad solutions:
- In other questions I've seen makecontinuous put forward as a solution, but much effort seems to suggest that makecontinuous does not do anything useful, at least not in 6.2. =/ If you can make makecontinuous solve this though, please tell me how.
- Various tricks that allow for | append [| stats count | eval foo=mvrange(0,50) ] | stats sum(count) as count by foo to glue on a dummy row for each value being bucketed. While these work for particular cases, it's super clunky and I believe all such tricks require a-priori knowledge that I don't have.

martin_mueller
SplunkTrust
SplunkTrust

Here's how you'd use makecontinuous:

| stats count | eval foo = mvrange(0,100) | mvexpand foo | eval foo = random()%100 | rename comment as "dummy"
| bin bins=100 foo | stats count by foo | makecontinuous foo | fillnull count

That'll create the gaps in foo, and fillnull lets the count be zero for those gaps. Works fine in 6.2.3.

Do you have dummy data to reproduce the wonky chart? Attempting something similar suggests chart doesn't even fill in the blanks:

| stats count | eval foo = mvrange(0,100) | mvexpand foo | eval foo = (random()%500)/100
| chart bins=50 count over foo

alt text

martin_mueller
SplunkTrust
SplunkTrust

I think the reporting-makecontinuous order on your second example is wrong, but it still goes bananas:

| stats count | eval foo=mvrange(1,10) | mvexpand foo | eval foo=random()%10 | bin foo bins=200 | stats count by foo | makecontinuous foo

I've observed something even more weird: If you run the search up to bin|stats and then do |loadjob | makecontinuous it seems to behave differently, most filled-in buckets seem fine but there seems to be some added 6.9-7.0 buckets instead of some 7.0-7.1 buckets...

alt text

sideview
SplunkTrust
SplunkTrust

Thanks for your answer. I understand makecontinuous better now and didn't realize it will only have a useful effect on incoming rows whose values are already bucketed, "1-5","6-10", etc..

However, digging deeper, makecontinuous has the same behavior as chart, and the key is that both of them lose their minds if the bucket size becomes less than 1.

Here's a canned example of chart working perfectly, bucketing correctly, and putting in the intervening empty buckets -- | stats count | eval foo=mvrange(1,10) | mvexpand foo | eval foo=random()%10 | chart count over foo bins=10 | sort foo

and here's a canned example of makecontinuous losing it's mind. | stats count | eval foo=mvrange(1,10) | mvexpand foo | eval foo=random()%10 | bin foo bins=200 | makecontinuous foo | chart count over foo | sort foo

I suppose, in theory I could multiply the values by some very large power of 10, then use something truly evil like rex mode=sed to move the decimal point back over. !! Say it aint so. There has to be a better way. Unfortunately here I definitely have to support bucketing ranges into buckets of size<1. eg: many variables I need to bucket for my users in this reporting UI are floats in [0,1].

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...