Splunk Search

How do we overcome the shortcomings of "bin" when trying to find X number of events in a certain time period?

jadamsplunk
Path Finder

Hello all,

I've had this issue in the past but never really spent the time to find a solution as bin is usually "good enough." Take the following data set:

12:00:00 -- id = A
12:01:00 -- id = B
12:02:00 -- id = A
12:03:00 -- id = A
12:04:00 -- id = A
12:05:00 -- id = A
12:06:00 -- id = A
12:07:00 -- id = A
12:08:00 -- id = A
12:09:00 -- id = A
12:10:00 -- id = B
12:11:00 -- id = A

If I did a |bin _time span=5m | stats count by idI'd get something like:

12:00:00| A : 4 | B : 1
12:05:00| A : 5 | B : 0
12:10:00| A : 1 | B : 1

And if then wanted to find a period where I had 5 or more As I'd have 1 period. If I wanted a period with 4 or more As I'd get 2 periods. Here's a very advanced diagram to illustrate the point:

alt text

Using what I'm intending I would have 8 periods with A>=4 and 4 periods with A>=5

I've tried stuff like transaction with maxspan but that doesn't seem to work and streamstats has been giving me issues when I'm not using only one field. bin and span seem to do the job most of the time but I feel like a lot of interesting data can be missed simply because the data happens to fall across the binning period.

More severely if it the following happened and I was looking for A>=5 I would get no results!

12:00:00 -- id = B
12:01:00 -- id = B
12:02:00 -- id = B
12:03:00 -- id = A
12:04:00 -- id = A
12:05:00 -- id = A
12:06:00 -- id = A
12:07:00 -- id = A
12:08:00 -- id = B
12:09:00 -- id = B
12:10:00 -- id = B
12:11:00 -- id = B
0 Karma

to4kawa
Ultra Champion
| makeresults count=2 
| streamstats count 
| eval _time=relative_time(_time,-1*count."d@d") 
| makecontinuous _time span=1min
| eval id=mvindex(split("ABC",""),random() % 3)
| table _time id 
| streamstats dc(eval(id="A")) as p_test by id 
| streamstats window=5 sum(p_test) as window_num count as tmp 
| streamstats count(eval(tmp="5")) as session 
| table _time id window_num session 
| eventstats count(eval(window_num=4)) as "A>=4" count(eval(window_num=5)) as "A>=5"

From the table, time_window does not work.
Please set _time appropriately.

0 Karma

jimodonald
Contributor

Streamstats seems to work the way you are thinking.

<your search> | reverse| streamstats time_window=5m count by id | table _time id count

I put the reverse in to get the events in the right order. It produces this outcome which looks like it lines up with your expectations.

_time   id  count
2020-01-09 12:00:00 A   1
2020-01-09 12:01:00 B   1
2020-01-09 12:02:00 A   2
2020-01-09 12:03:00 A   3
2020-01-09 12:04:00 A   4
2020-01-09 12:05:00 A   4
2020-01-09 12:06:00 A   5
2020-01-09 12:07:00 A   5
2020-01-09 12:08:00 A   5
2020-01-09 12:09:00 A   5
2020-01-09 12:10:00 B   1
2020-01-09 12:11:00 A   4
0 Karma

jadamsplunk
Path Finder

That works if you've filtered down to a single entity with those values, but say I have that dataset 3 or 4 times in roughly the same time period for other entities. Those events intermingle with the other data and I think streamstats ends up just resetting basically every event. At the very least I have not had any success.

0 Karma

jimodonald
Contributor

It seems to work pretty consistently for me for both values, but I'm using pretty small samples (~150 events).

You may need to tweak the time_window, for the counting, but give it a try.

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...