Deployment Architecture

How to produce empty time buckets

Builder

In order to perform fast Fourier transform (FFT), I need data from equal time intervals. Here is my first attempt:
| bucket span=5m _time | stats count as COUNT by _time
The idea is to use bins for sampling. However, this doesn't work when no events occur in some bins. Splunk will not output rows for any bin in which COUNT = 0 because _raw doesn't contain such data. Is there some way to force/coerce Splunk into producing empty time buckets?

FFT (in R app) is the most hopeful tool for spectrum analysis. Before I explore interpolation in an external tool, I'd like to see if there is a direct Splunk method. I understand that there may not be a meaningful way to fill empty buckets in general. But calendar time is defined in the system, so there should be an easy way to force it.

I looked at zero-count discussions. One post suggests a subsearch over dense data, i.e., data sets that have non-zero counts in each of desired time bucket. Whereas in my range this is possible, this appears to be a really expensive way to implement a simple calendaring function.

Tags (3)
1 Solution

SplunkTrust
SplunkTrust

Try this workaround

your base search | timechart span=1h count | appendpipe [|stats count |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval count=0 | eval _time=temp | table _time count] | timechart span=1h sum(count) as count | fillnull

View solution in original post

Motivator

In order to create zero values in each time bucket, you need append and stats/eventstats. I will introduce a slightly more complicated search to demonstrate the full power of this approach: the sum of the events for each value in "field1" every 5 minutes (even if there are no events from up to all but one of the values in "field1"):

yoursearch | bucket span=5m _time | stats sum(events) as events by _time field1

Add append, but be wary that the number of events in the subsearch does not expend the subsearch limit:

yoursearch | bucket span=5m _time | stats sum(events) as events by _time field1 | append [yoursearch | bucket span=5m _time | table _time field1 | eventstats values(field1) AS field1 values(_time) AS times | stats count by field1 times | eval events=0 | rename times AS _time] | stats max(events) AS events by _time field1

This approach assumes that every _time 5 minute span contains contains data from at least one value in field1. If this is not the case, then add another data source with values in all time span periods, and then rename the field to field1 at the start of your append, and then remove that value from field1 at the end of your append subsearch.

SplunkTrust
SplunkTrust

Try this workaround

your base search | timechart span=1h count | appendpipe [|stats count |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval count=0 | eval _time=temp | table _time count] | timechart span=1h sum(count) as count | fillnull

View solution in original post

Contributor

This was pretty helpful to get me going, thought I'd share my finished product w/ others if they might find it useful. What I needed to do was get data for each minutely timebucket in a stats call of mine, but I had multiple fields that I was pivoting on so the solution had to be expanded upon a bit. Here's what I did:

base search |bucket _time span=1m| stats sum(field1) as field1 avg(field2) as field2 by field3 field4 _time| appendpipe [|inputlookup field3_lookup |table field3 | dedup field3 |stats values(field3)  |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp  | eval _time=temp | timechart span=1m values(field3) as field3 | eval field3=mvjoin(field3,";") | streamstats last(field3) as field3 | makemv delim=";" field3 | mvexpand field3 | eval field1=0 | eval field2=0 | eval field4="possibility1;possibility2;possiblity3" | makemv field4 delim=";"| mvexpand field4]

The lookup contains all possible values for field3, so basically filling out all possible values that could show up in the base stats call, then afterward further down the line not pictured I do an sistats call that gets me everything I want and now has "0" values for time buckets that otherwise wouldn't show up for certain variants of field3 and field4

Explorer

This solution worked for me, thank you briancronrath. One thing that I had to do was ad an additional AS with the portion which says

|stats values(field3) AS field3

[|inputlookup field3_lookup |table field3 | dedup field3 |stats values(field3) AS field3 |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp  | eval _time=temp | timechart span=1m values(field3) as field3 | eval field3=mvjoin(field3,";") | streamstats last(field3) as field3 | makemv delim=";" field3 | mvexpand field3 | eval field1=0 | eval field2=0 | eval field4="possibility1;possibility2;possiblity3" | makemv field4 delim=";"| mvexpand field4]
0 Karma

Builder

The subsearch only uses two lines produced by calendar, thus is super efficient. Great lesson about makemv. Not that it matters, but in the interest of minimalism this takes a couple less evals:

| timechart span=1h count
| appendpipe [
| stats count
| addinfo
| eval temp=info_min_time."##".info_max_time
| fields temp count
| makemv temp delim="##"
| mvexpand temp
| rename temp as _time
] | timechart span=1h max(count) as count
| fillnull

Explorer

Is there a way to do this when I want to split? When I add "by host" to my timechart this answer stops working. 😞

0 Karma

Motivator

try my solution below.

0 Karma

Explorer

I don't understand how to try your solution at all. What is field1? Is that arbitrary? I'm not looking at fields in my search, only event counts. Additionally you state that it only works if there is at least one event from a value in a field. that statement doesn't even make sense at all to me :(. events from a value? My search is basically (60 minute window):
index=blah eventtype=error | timechart span=1m count by host

0 Karma

Motivator

timechart is very unideal for this case. Try my solution below with append, stats, and eventstats.

0 Karma

Builder

timechart has a very undesirable feature for this purpose: It insists on one extra data point beyond the specified timespan.

0 Karma

Builder

Subsearch using calendar. This fills all, even missing start and end buckets. Brilliant!

Builder

Yes, that's what I need.

0 Karma

Influencer

when raw data itself is not for some buckets, you need those buckets as well. is that right understanding?

0 Karma

Communicator

Hello,

Could you try to use "timechart" instead of "bucket"?

(your search) | timechart span=5m count AS COUNT

You may also get COUNT=0 events. I hope it helps you.

Thank you.

Builder

timechart works, as long as start and end buckets have data. Thanks.

0 Karma