Splunk Search

## How do we overcome the shortcomings of "bin" when trying to find X number of events in a certain time period?

Path Finder

Hello all,

I've had this issue in the past but never really spent the time to find a solution as bin is usually "good enough." Take the following data set:

12:00:00 -- id = A
12:01:00 -- id = B
12:02:00 -- id = A
12:03:00 -- id = A
12:04:00 -- id = A
12:05:00 -- id = A
12:06:00 -- id = A
12:07:00 -- id = A
12:08:00 -- id = A
12:09:00 -- id = A
12:10:00 -- id = B
12:11:00 -- id = A

If I did a |bin _time span=5m | stats count by idI'd get something like:

12:00:00| A : 4 | B : 1
12:05:00| A : 5 | B : 0
12:10:00| A : 1 | B : 1

And if then wanted to find a period where I had 5 or more As I'd have 1 period. If I wanted a period with 4 or more As I'd get 2 periods. Here's a very advanced diagram to illustrate the point:

Using what I'm intending I would have 8 periods with A>=4 and 4 periods with A>=5

I've tried stuff like transaction with maxspan but that doesn't seem to work and streamstats has been giving me issues when I'm not using only one field. bin and span seem to do the job most of the time but I feel like a lot of interesting data can be missed simply because the data happens to fall across the binning period.

More severely if it the following happened and I was looking for A>=5 I would get no results!

12:00:00 -- id = B
12:01:00 -- id = B
12:02:00 -- id = B
12:03:00 -- id = A
12:04:00 -- id = A
12:05:00 -- id = A
12:06:00 -- id = A
12:07:00 -- id = A
12:08:00 -- id = B
12:09:00 -- id = B
12:10:00 -- id = B
12:11:00 -- id = B
Tags (5)
Ultra Champion
| makeresults count=2
| streamstats count
| eval _time=relative_time(_time,-1*count."d@d")
| makecontinuous _time span=1min
| eval id=mvindex(split("ABC",""),random() % 3)
| table _time id
| streamstats dc(eval(id="A")) as p_test by id
| streamstats window=5 sum(p_test) as window_num count as tmp
| streamstats count(eval(tmp="5")) as session
| table _time id window_num session
| eventstats count(eval(window_num=4)) as "A>=4" count(eval(window_num=5)) as "A>=5"

From the table, time_window does not work.

Contributor

Streamstats seems to work the way you are thinking.

<your search> | reverse| streamstats time_window=5m count by id | table _time id count

I put the reverse in to get the events in the right order. It produces this outcome which looks like it lines up with your expectations.

_time   id  count
2020-01-09 12:00:00 A   1
2020-01-09 12:01:00 B   1
2020-01-09 12:02:00 A   2
2020-01-09 12:03:00 A   3
2020-01-09 12:04:00 A   4
2020-01-09 12:05:00 A   4
2020-01-09 12:06:00 A   5
2020-01-09 12:07:00 A   5
2020-01-09 12:08:00 A   5
2020-01-09 12:09:00 A   5
2020-01-09 12:10:00 B   1
2020-01-09 12:11:00 A   4
Path Finder

That works if you've filtered down to a single entity with those values, but say I have that dataset 3 or 4 times in roughly the same time period for other entities. Those events intermingle with the other data and I think streamstats ends up just resetting basically every event. At the very least I have not had any success.

Contributor

It seems to work pretty consistently for me for both values, but I'm using pretty small samples (~150 events).

You may need to tweak the time_window, for the counting, but give it a try.

Get Updates on the Splunk Community!

#### What’s New in Splunk Cloud Platform 9.1.2308?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2308! Analysts can ...

#### Index This | Why do they call it hyper text?

November 2023 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

#### State of Splunk Careers 2023: Career Resilience and the Continued Value of Splunk

For the past three years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...