Deployment Architecture

How to produce empty time buckets

yuanliu
SplunkTrust
SplunkTrust

In order to perform fast Fourier transform (FFT), I need data from equal time intervals. Here is my first attempt:
| bucket span=5m _time | stats count as COUNT by _time
The idea is to use bins for sampling. However, this doesn't work when no events occur in some bins. Splunk will not output rows for any bin in which COUNT = 0 because _raw doesn't contain such data. Is there some way to force/coerce Splunk into producing empty time buckets?

FFT (in R app) is the most hopeful tool for spectrum analysis. Before I explore interpolation in an external tool, I'd like to see if there is a direct Splunk method. I understand that there may not be a meaningful way to fill empty buckets in general. But calendar time is defined in the system, so there should be an easy way to force it.

I looked at zero-count discussions. One post suggests a subsearch over dense data, i.e., data sets that have non-zero counts in each of desired time bucket. Whereas in my range this is possible, this appears to be a really expensive way to implement a simple calendaring function.

Tags (3)
1 Solution

somesoni2
Revered Legend

Try this workaround

your base search | timechart span=1h count | appendpipe [|stats count |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval count=0 | eval _time=temp | table _time count] | timechart span=1h sum(count) as count | fillnull

View solution in original post

landen99
Motivator

In order to create zero values in each time bucket, you need append and stats/eventstats. I will introduce a slightly more complicated search to demonstrate the full power of this approach: the sum of the events for each value in "field1" every 5 minutes (even if there are no events from up to all but one of the values in "field1"):

yoursearch | bucket span=5m _time | stats sum(events) as events by _time field1

Add append, but be wary that the number of events in the subsearch does not expend the subsearch limit:

yoursearch | bucket span=5m _time | stats sum(events) as events by _time field1 | append [yoursearch | bucket span=5m _time | table _time field1 | eventstats values(field1) AS field1 values(_time) AS times | stats count by field1 times | eval events=0 | rename times AS _time] | stats max(events) AS events by _time field1

This approach assumes that every _time 5 minute span contains contains data from at least one value in field1. If this is not the case, then add another data source with values in all time span periods, and then rename the field to field1 at the start of your append, and then remove that value from field1 at the end of your append subsearch.

somesoni2
Revered Legend

Try this workaround

your base search | timechart span=1h count | appendpipe [|stats count |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp | eval count=0 | eval _time=temp | table _time count] | timechart span=1h sum(count) as count | fillnull

nouraali
Explorer

Hi ,

given the below input (4 mins of sample access log data):
_time,URI,Bytes
2021-05-18 02:01:00,a,1
2021-05-18 02:01:00,a,1
2021-05-18 02:02:00,a,1
2021-05-18 02:03:00,b,1
2021-05-18 02:03:00,b,1
2021-05-18 02:04:00,a,1
assuming a window of 2 mins, i want to perform some computations (average and standard dev of bytes grouped by URI) as below
source="ds1.csv" host="vgspl11hr" index="sfp" sourcetype="csv"
| table _time,URI,Bytes
| timechart span=1m avg(Bytes) AS avg_bytes, stdev(Bytes) AS std_bytes by URI limit=0
| fillnull value=""
| untable _time Measure Value
| eval Metric=mvindex(split(Measure,": "),0),uri=mvindex(split(Measure,": "),1)
| fields - Measure
| eval time_uri=_time."__".uri
| fields - uri - _time
| xyseries time_uri Metric Value
| eval _time=mvindex(split(time_uri,"__"),0),uri=mvindex(split(time_uri,"__"),1)
| fields - time_uri
with 2-min time window between (5/18/21 2:01:00.000 AM to 5/18/21 2:03:00.000 AM), below is the output:
_time	            uri	avg_bytes	std_bytes
2021-05-18 02:01:00 a 1 0
2021-05-18 02:02:00 a 1 0
So, the timechart performed the computations on the existing URIs in the first 2 mins time window, in that case the URI=a.
but i want the timechart to consider the existence of the URI = b.
Is there a way to have the timechart consider all the values of the URI in the computation, even if not all of the URI have values in the time window?
I need the output to be as below in the first 2 mins time window:
_time	            uri	avg_bytes	std_bytes
2021-05-18 02:01:00 a 1 0
2021-05-18 02:01:00 b
2021-05-18 02:02:00 a 1 0
2021-05-18 02:02:00 b
Is that possible?

0 Karma

patrickprodoehl
Explorer

Thanks for sharing.

Why is the last | fillnull needed though?

I removed is and the _time bins are still filled with 0 values and interpreted correctly in the dashboard.

 

 

0 Karma

briancronrath
Contributor

This was pretty helpful to get me going, thought I'd share my finished product w/ others if they might find it useful. What I needed to do was get data for each minutely timebucket in a stats call of mine, but I had multiple fields that I was pivoting on so the solution had to be expanded upon a bit. Here's what I did:

base search |bucket _time span=1m| stats sum(field1) as field1 avg(field2) as field2 by field3 field4 _time| appendpipe [|inputlookup field3_lookup |table field3 | dedup field3 |stats values(field3)  |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp  | eval _time=temp | timechart span=1m values(field3) as field3 | eval field3=mvjoin(field3,";") | streamstats last(field3) as field3 | makemv delim=";" field3 | mvexpand field3 | eval field1=0 | eval field2=0 | eval field4="possibility1;possibility2;possiblity3" | makemv field4 delim=";"| mvexpand field4]

The lookup contains all possible values for field3, so basically filling out all possible values that could show up in the base stats call, then afterward further down the line not pictured I do an sistats call that gets me everything I want and now has "0" values for time buckets that otherwise wouldn't show up for certain variants of field3 and field4

jbuckner85
Path Finder

This solution worked for me, thank you briancronrath. One thing that I had to do was ad an additional AS with the portion which says

|stats values(field3) AS field3

[|inputlookup field3_lookup |table field3 | dedup field3 |stats values(field3) AS field3 |addinfo | eval temp=info_min_time."##".info_max_time | makemv temp delim="##" | mvexpand temp  | eval _time=temp | timechart span=1m values(field3) as field3 | eval field3=mvjoin(field3,";") | streamstats last(field3) as field3 | makemv delim=";" field3 | mvexpand field3 | eval field1=0 | eval field2=0 | eval field4="possibility1;possibility2;possiblity3" | makemv field4 delim=";"| mvexpand field4]
0 Karma

yuanliu
SplunkTrust
SplunkTrust

The subsearch only uses two lines produced by calendar, thus is super efficient. Great lesson about makemv. Not that it matters, but in the interest of minimalism this takes a couple less evals:

| timechart span=1h count
| appendpipe [
| stats count
| addinfo
| eval temp=info_min_time."##".info_max_time
| fields temp count
| makemv temp delim="##"
| mvexpand temp
| rename temp as _time
] | timechart span=1h max(count) as count
| fillnull

csendekw
Explorer

Is there a way to do this when I want to split? When I add "by host" to my timechart this answer stops working. 😞

0 Karma

landen99
Motivator

try my solution below.

0 Karma

csendekw
Explorer

I don't understand how to try your solution at all. What is field1? Is that arbitrary? I'm not looking at fields in my search, only event counts. Additionally you state that it only works if there is at least one event from a value in a field. that statement doesn't even make sense at all to me :(. events from a value? My search is basically (60 minute window):
index=blah eventtype=error | timechart span=1m count by host

0 Karma

landen99
Motivator

timechart is very unideal for this case. Try my solution below with append, stats, and eventstats.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

timechart has a very undesirable feature for this purpose: It insists on one extra data point beyond the specified timespan.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Subsearch using calendar. This fills all, even missing start and end buckets. Brilliant!

yuanliu
SplunkTrust
SplunkTrust

Yes, that's what I need.

0 Karma

strive
Influencer

when raw data itself is not for some buckets, you need those buckets as well. is that right understanding?

0 Karma

Suda
Communicator

Hello,

Could you try to use "timechart" instead of "bucket"?

(your search) | timechart span=5m count AS COUNT

You may also get COUNT=0 events. I hope it helps you.

Thank you.

yuanliu
SplunkTrust
SplunkTrust

timechart works, as long as start and end buckets have data. Thanks.

0 Karma
Get Updates on the Splunk Community!

Automatic Discovery Part 1: What is Automatic Discovery in Splunk Observability Cloud ...

If you’ve ever deployed a new database cluster, spun up a caching layer, or added a load balancer, you know it ...

Real-Time Fraud Detection: How Splunk Dashboards Protect Financial Institutions

Financial fraud isn't slowing down. If anything, it's getting more sophisticated. Account takeovers, credit ...

Splunk + ThousandEyes: Correlate frontend, app, and network data to troubleshoot ...

 Are you tired of troubleshooting delays caused by siloed frontend, application, and network data? We've got a ...