How to dynamically set the global search bin span based on the time range picker?

Graham_Hanningt · ‎04-20-2016

I have a dashboard that contains multiple timecharts. (Splunk Enterprise 6.4.)

All of the timecharts present performance metrics from the same events, in the same time range. For example: average CPU time, average response time, count (number of events).

Currently, each timechart has its own self-contained search string.

I want to "save search resources by creating a base search", as described in the Splunk docs.

Because of the documented limitations of post-process searches, I think I ought to set a bin span in the base ("global") search, to limit the number of search results that the base search passes to each timechart's post-process search. I want this dashboard to work for any time range, from several years to fractions of a second.

Hence this question: I want the base search to dynamically set a bin span that is proportional to the time range set by the time range picker. I don't yet have a clear notion of what spans to set for different time ranges, but for a time range of a few years, it might be something like 7 days. The bigger the time range, the bigger the span.

What's the best way to do this?

I'm prepared to investigate an answer myself, but it occurs to me that this might be a common requirement, and I'd prefer not to reinvent the wheel (especially not one that I later discover is inferior to an existing wheel 🙂 ).

A closely related follow-on issue, that probably deserves its own, separate question: when I zoom a timechart, I want (as expressed by others in existing questions) the rest of the dashboard (including the time range picker and other timecharts) to update to match that zoomed time range. But I want something more (which might happen anyway, depending on the implementation of the "dynamic bin based on time range" solution): I want the base search to adjust its bin span to match the zoomed time range.

If all this sounds too complicated, here's my requirement in a nutshell: when I zoom, I want to see more detail.

martin_mueller · ‎04-21-2016

For dynamic sizing of bucket spans, use the bins parameter. Splunk will then use as small a span as possible while not exceeding the number of bins you specified and using a "pretty" span. IIRC Splunk considers these spans to be "pretty": 1s, 5s, 10s, 30s, 1m, 5m, 10m, 30m, 1h, 1d, 1mon.

So for example if you set bins=500 and run a search over 30 days, you will get 30 one-day buckets because one-hour buckets would require 720 of them.
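
To make the arithmetic in that example concrete, here is a minimal Python sketch of the selection rule as described above: walk the list of "pretty" spans from smallest to largest and take the first one that needs no more than the requested number of bins. This is an illustration only, not Splunk's actual implementation; the function name pick_span is hypothetical, the span list is the one recalled in the post, and 1mon is approximated as 30 days.

```python
import math

# "Pretty" spans as recalled in the post above, in seconds.
# Assumption: 1mon is approximated as 30 days for this sketch.
PRETTY_SPANS = [
    ("1s", 1), ("5s", 5), ("10s", 10), ("30s", 30),
    ("1m", 60), ("5m", 300), ("10m", 600), ("30m", 1800),
    ("1h", 3600), ("1d", 86400), ("1mon", 30 * 86400),
]

def pick_span(range_seconds, bins):
    """Return (label, bucket_count) for the smallest span that fits in `bins`."""
    for label, seconds in PRETTY_SPANS:
        count = math.ceil(range_seconds / seconds)
        if count <= bins:
            return label, count
    # Nothing fits: fall back to the largest span in the list.
    label, seconds = PRETTY_SPANS[-1]
    return label, math.ceil(range_seconds / seconds)

print(pick_span(30 * 86400, 500))  # ('1d', 30): one-hour buckets would need 720
print(pick_span(3600, 500))        # ('10s', 360): 1s and 5s would exceed 500
```

The real command may also snap buckets to clock boundaries, which can change the bucket count slightly, so treat the numbers as approximate.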

somesoni2 · ‎04-20-2016

Splunk automatically selects a span for timechart based on the selected time range if no span is specified. Is that not working correctly for you (for example, are the panels showing different spans)?

Graham_Hanningt · ‎04-21-2016

Thanks for the quick response. Yes, you're absolutely right: timechart dynamically selects the span, and all panels in my dashboard show the same span. Works great.

What I'm concerned about is when I change those (currently self-contained, standalone) search strings for each timechart to use a base search, and then I specify a time range that encompasses a few million events, thereby hitting the limits described in that Splunk doc topic I cited.

That's when I think I'll need to worry about explicitly setting a bin span and aggregating in the base search, so that the base search isn't attempting to pass millions of search results to the post-process searches.

And since the aggregation will be done in the base search, I wonder whether to continue using timechart, but that introduces other issues.

For example, here's a base search (note: with a fixed bin span; not what I want to end up with):

sourcetype=my_log_type | bin _time span=5s | stats avg(response) as avg_response, avg(cpu) as avg_cpu by transaction_code, _time

(in practice, the stats function would process more fields; I've deliberately kept this example short, averaging only two fields.)

and here are two corresponding post-process search strings. A timechart:

timechart avg(avg_cpu) by transaction_code

and an xyseries:

xyseries _time, transaction_code, avg_response

Using xyseries avoids "double aggregation" (averaging averages, producing incorrect results). However, xyseries doesn't "span": it marks every discrete time value on the X-axis with the complete time stamp value (optionally truncated); but if those values don't fit along the X-axis, you'll get no time values displayed on the X-axis at all. Whereas timechart does a good job of marking the time on the X-axis at readable intervals.

I'd appreciate advice on all of this. The more I look at that (bogus) avg(avg_cpu) in my example timechart (above), and then consider the cons of using xyseries instead, the more I think I'm making some fundamental error in my approach to managing that (understandable) "bottleneck" limitation of passing a huge number of search results from the base search to the post-process searches.
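
The "double aggregation" concern can be seen with a small Python sketch. The numbers are made up for illustration: two adjacent 5-second bins with different event counts. Averaging the per-bin averages gives a different answer from the true average over all events, whereas carrying a sum and a count per bin (one possible approach, stated here as an assumption rather than a recommendation from the thread) lets a coarser bucket recover the true value.

```python
# Hypothetical response times (ms) in two adjacent 5-second bins.
bin_a = [100, 100, 100, 100]   # 4 events
bin_b = [500]                  # 1 event

def mean(values):
    return sum(values) / len(values)

# Averaging the per-bin averages weights both bins equally,
# regardless of how many events each bin contains.
avg_of_avgs = mean([mean(bin_a), mean(bin_b)])

# Carrying (sum, count) per bin lets a coarser bucket recover the true mean.
partials = [(sum(b), len(b)) for b in (bin_a, bin_b)]
true_avg = sum(s for s, _ in partials) / sum(n for _, n in partials)

print(avg_of_avgs)  # 300.0
print(true_avg)     # 180.0
```

The two results agree only when every bin holds the same number of events, which is unlikely for real transaction logs.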

Graham_Hanningt · ‎04-21-2016

I've belatedly seen the question "How do I get the time span (span=X) in a search to automatically adjust depending on the time picker...", which I think answers my original question (I'll try it tomorrow), but now I'm mired in the follow-on issues, mentioned in my previous comment, of aggregating over time in the base search, and then charting selected values in the post-process searches.
