Subsecond minspan in auto-span timechart?

Graham_Hanningt · ‎09-13-2017

(How) can I create an auto-span timechart that has a subsecond minimum span, such as 0.001s?

Background to this question

My dashboards show logs from systems that process many transactions per second. I have deliberately designed the dashboards to "work" regardless of the duration specified by the time picker. Timecharts "auto span": they automatically infer span from the data time range. (The X-axis of time-based charts automatically adjusts to match the duration.) These dashboards can be used to analyze logs across a wide variety of durations. In practice, users might initially be interested in a time range of several minutes, and then zoom in to analyze progressively narrower time ranges, down to subsecond time ranges.

As far as I can tell, auto-span timecharts don't create buckets smaller than one second.

Specifying minspan=1ms on my timechart command seems to have no effect.

Here, I am (re)developing in Splunk some dashboards that I have already developed in Kibana (5.5).

In Kibana, I can zoom time-based charts to a minimum span of 0.001s. For example, I can zoom so far in that buckets are marked on the X-axis as 14:34:45.242, 14:34:45.243, 14:34:45.244, etc.

In Splunk, the granularity of auto-span timecharts "bottoms out" at 1 second; if I have a time range of 2 seconds, I'm looking at two buckets. Two fat bars. That's disappointing. Am I missing something? Can I somehow increase this granularity to match what I can do in Kibana?

rjthibod · ‎09-15-2017

You can use a subsearch in the timechart command to simulate what you are doing. Like this,

| timechart [| stats count | addinfo | eval range = info_max_time - info_min_time | eval span = "span=" . case(range < 24*3600+3600, "30m", range < 7*24*3600+3600, "2h", 1=1, "4h") | return $span] useother=f limit=10 sum(alert_logon) by domain

The addinfo command adds the fields info_max_time and info_min_time to a result set, indicating the epoch time values of latest and earliest for your search. This subsearch calculates their difference (i.e., the search duration) and sets the span value based on my predetermined breakpoints: for less than 25 hours, set span to "30m"; for less than one week and an hour, set span to "2h"; otherwise, set span to "4h".

If you are going to do this repeatedly, you can put this subsearch in a macro, and then just reference the macro,

 | timechart `dynamic_span` useother=f limit=10 sum(alert_logon) by domain

Or you can add a manual input and then set its value based on a search, as you do now.

Here is a blog post I did with more info: https://blog.octoinsight.com/customizing-dynamic-time-spans-in-splunk-dashboards/

Graham_Hanningt · ‎09-17-2017

Hi @rjthibod,

Thanks very much for sticking with me across questions...

With apologies, I'm unlikely to have time to try your answer today (I want a few minutes spare to look at it properly), but I will as soon as I can.

One fatal flaw that I have realized with my approach (my answer), and one reason I want to test your answer carefully, is that the size of the buckets in my approach is based on the earliest and latest records that exist in the time range, which is not the same thing as the earliest and latest times of the time range; in some instances, depending on the available records, not even close.
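
A minimal sketch of one way to address this, combining the addinfo command from the answer above with the stats-based search: it derives diff from the bounds of the search time range rather than from the events. This is untested. It assumes that info_min_time and info_max_time carry subsecond precision and that the time range has both bounds (with no latest bound, info_max_time may not be numeric, so the case function would need an extra guard). The index and sourcetype names are placeholders.

```spl
index=my_index sourcetype=my_sourcetype
| stats count
| addinfo
| eval diff = info_max_time - info_min_time
| eval span = case(diff > 20, " ", diff > 1, "span=50ms", diff > 0.01, "span=10ms", true(), "span=1ms")
```

Because stats count returns one row even when no events match, span would still be calculated for an empty time range.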

rjthibod · ‎09-25-2017

Any update on whether this worked for you?

Graham_Hanningt · ‎09-15-2017

Answering my own question...

In brief

I'm using a case function to set a token that, depending on the time range, specifies either an empty string (to rely on auto-spanning) or a span attribute:

eval span=case(diff>20," ",diff>1,"span=50ms",diff>0.01,"span=10ms",true(),"span=1ms")

where diff is the difference in seconds between the earliest and latest records in the time range.

The "breakpoints" in the case function are an experimental work in progress. I might change these breakpoints, and add more.

In detail

I use the following search(es):

<search id="base_metrics">
  <query>index=my_index sourcetype=my_sourcetype | stats count, earliest(_time) as earliest, latest(_time) as latest</query>
  <earliest>$earliest$</earliest>
  <latest>$latest$</latest>
</search>
<search base="base_metrics">
  <query>eval startTime=strftime(earliest, "%x %H:%M:%S"), endTime=if(strftime(earliest, "%x")=strftime(latest, "%x"), strftime(latest, "%H:%M:%S"), strftime(latest, "%x %H:%M:%S")), diff=(latest-earliest) | eval hours=floor(diff/3600) | eval minutes=floor((diff-(hours*3600))/60) | eval seconds=floor(diff-(hours*3600)-(minutes*60)) | eval duration=hours." hour".if(hours!=1,"s "," ").minutes." minute".if(minutes!=1,"s "," ").seconds." second".if(seconds!=1,"s",""), span=case(diff>20," ",diff>1,"span=50ms",diff>0.01,"span=10ms",true(),"span=1ms")</query>
  <progress>
    <condition match="'job.resultCount' > 0">
      <set token="startTime">$result.startTime$</set>
      <set token="endTime">$result.endTime$</set>
      <set token="duration">$result.duration$</set>
      <set token="eventCount">$result.count$</set>
      <set token="span">$result.span$</set>
    </condition>
    <condition>
      <unset token="startTime"></unset>
      <unset token="endTime"></unset>
      <unset token="duration"></unset>
      <unset token="eventCount"></unset>
      <unset token="span"></unset>
    </condition>
  </progress>
</search>

Here, I've included tokens that are not required for this particular solution (sorry, I couldn't be bothered stripping them out for this answer), which I use in the following HTML panel:

<html depends="$eventCount$,$duration$,$startTime$,$endTime$">
  $eventCount$ events spanning $duration$ ($startTime$ to $endTime$)
</html>

I use the span token in timechart commands:

timechart $span$ ...

Graham_Hanningt · ‎09-14-2017

I'm going to try setting a token with a value based on the time range, and then injecting the token into the timechart command:

For time ranges down to a few seconds—say, 20 seconds—set the token to an empty string (let the timechart "auto-span")
For narrower time ranges, take control of spanning: set the token to span=<milliseconds>ms, where <milliseconds> is a number of milliseconds, calculated from the time range, that results in a "reasonable" number of buckets

Graham_Hanningt · ‎09-14-2017

Dang. span values must be integers. So my calculation needs to work out which subsecond unit to use: us | ms | cs | ds. Not rocket science, but a tad tedious.

Graham_Hanningt · ‎09-14-2017

There's more:

Error in 'timechart' command: The value for option span (741ms) is invalid. When span is expressed using a sub-second unit (ds, cs, ms, us), the span value needs to be < 1 second, and 1 second must be evenly divisible by the span value.

To recap: Kibana zooms auto-interval time-based charts down to 1ms buckets. Out of the box.

Graham_Hanningt · ‎09-14-2017

I don't really fancy implementing a GCD function just to do this.

Graham_Hanningt · ‎09-14-2017

I'm going to fall back to using a case function.
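
Only a handful of values can appear in such a case function. Going by the error message above (a subsecond span must be less than 1 second, and 1 second must be evenly divisible by it), the valid whole-millisecond spans are the divisors of 1000 below 1000. A run-anywhere sketch that lists them, where the field names ms and span are only illustrative:

```spl
| makeresults count=999
| streamstats count as ms
| where 1000 % ms == 0
| eval span = "span=" . ms . "ms"
| table ms span
```

This should return 15 rows: 1, 2, 4, 5, 8, 10, 20, 25, 40, 50, 100, 125, 200, 250, and 500 ms. The sketch covers only the ms unit; the same rule applies to us, cs, and ds.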

MuS · ‎09-14-2017

Hi Graham_Hannington,

have you tried something like this:

index=_internal earliest=-10sec@sec latest=-1sec@sec
| bin _time span=1ms 
| chart count over _time by sourcetype

This produces the same output as timechart, but gives you more control over the time span.

cheers, MuS

Graham_Hanningt · ‎09-14-2017

Hi @MuS,

Thanks for the tip. Nope, I've not tried that. From the Splunk docs for bin:

The bin command is automatically called by the chart and the timechart commands

I'd thought, perhaps naively, that specifying span via timechart gave me the same control as specifying span directly via bin; that timechart "passed through" the span value to bin "under the covers".