I am writing a search that computes weighted moving averages over data points that are summarized and logged at 2-minute intervals. I need to bucket the data into 2-minute spans within a 10-minute window. The search will run every minute and look at the past ten minutes' worth of data, so there should always be five buckets of 2 minutes each. You'd think this would be as easy as:
earliest=-10m@m latest=@m *base_search* | bucket _time span=2m | stats xxx by _time
However, the bucket command (and timechart, etc.) always snaps bucket boundaries to even-numbered minutes rather than aligning them to the search's time range. To elaborate: if the search runs at 10:10:23, there are five buckets (10:00, 10:02, 10:04, 10:06, and 10:08), but if it runs at 10:11:xx, there are six buckets (10:00, 10:02 ... 10:10), with the first and last buckets each containing only one minute's worth of data (half the data).
What I think should happen with the 10:11 search is five buckets, the first being 10:01, then 10:03, etc. Has anyone found a way to do this that still lets them sleep at night? Maybe a call to eval that segments time similarly to the bucket command (could be a macro).
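You could indeed work around this using eval. The idea is to measure how far the search's earliest time sits from the nearest 2-minute boundary, shift _time back by that offset, bucket, and then shift _time forward again: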
earliest=-10m@m latest=@m *base_search* | addinfo | eval min_time = info_min_time | bucket span=2m info_min_time | eval offset = min_time - info_min_time | eval _time=_time-offset | bucket span=2m _time | eval _time=_time+offset | stats xxx by _time
This is the search I used for testing my work:
sourcetype=access_combined earliest=-9m@m | addinfo | eval orig_time = strftime(_time, "%H:%M:%S")| eval min_time = info_min_time | bucket span=2m info_min_time | eval offset = min_time - info_min_time | eval _time=_time-offset| bucket span=2m _time | eval _time=_time+offset | eval min_time = strftime(min_time, "%H:%M:%S") | eval info_min_time = strftime(info_min_time, "%H:%M:%S")| table min_time info_min_time _time orig_time
Thanks dart, this seems to work well for me. I'll try to make it into a macro so that the span time can be supplied as an argument and the macro used as a replacement for bucket.
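For what it's worth, here is a minimal sketch of what such a macro might look like in macros.conf. The macro name rel_bucket and the argument name span are made up for illustration, and this is untested, so treat it as a starting point rather than a finished implementation. It wraps the same offset logic as the eval workaround and leaves the helper fields (min_time, offset, and the fields added by addinfo) on the events.

```
# macros.conf (hypothetical macro name and argument; adjust to taste)
# One-argument macro: $span$ is substituted wherever it appears in the definition.
[rel_bucket(1)]
args = span
definition = addinfo | eval min_time = info_min_time | bucket span=$span$ info_min_time | eval offset = min_time - info_min_time | eval _time = _time - offset | bucket span=$span$ _time | eval _time = _time + offset
```

Assuming that stanza, the original search would become something like: earliest=-10m@m latest=@m *base_search* | `rel_bucket(2m)` | stats xxx by _time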
Just filed an ER for it.
Regardless of the actual answer, please file an enhancement request / bug report at http://www.splunk.com/support - this behavior is not intuitive...