Why are bucket times expanding?

ericl42
Path Finder

I've read a few other forum posts describing similar issues, but I never found a true solution. I'm trying to mock up some correlation rules within Enterprise Security where my time frame is -5h@h to -1h@h. I want to split this into two buckets so I can compare the two 2-hour time frames against one another.

I continually get 3 buckets even though there should only be two.

In my most recent test I did this search:

(index=os_windows* OR index=os_unix*) (source=WinEventLog:Security OR sourcetype=linux_secure OR tag=authentication) action=failure NOT Result_Code=0x17 NOT Account_Name="*$"  earliest=-5h@h latest=-1h@h
| bucket _time span=2h 
| stats values(user) AS affected_users, values(ComputerName) as dc, dc(user) AS num_users count BY src_ip _time 
| where num_users > 3

When I look at the results, I see multiple src_ip's that have a 9:00, 11:00, and 13:00 row. It's currently 2:30 pm so the breakdown is:

2:30 = current time
1:30 = -1h@h
12:30 = -2h@h
11:30 = -3h@h
10:30 = -4h@h
09:30 = -5h@h

So I should have a 9:00 - 11:00 bucket and an 11:00 - 1:00 bucket. I have no idea why it's also showing me a 13:00 bucket in my search results. This throws off my math, since the count in that bucket is quite a bit different; I assume it's not the full hour and the time isn't snapping correctly or something.

1 Solution

woodcock
Esteemed Legend

OK, I finally took the time to try this and I know what is happening. Splunk is "too smart" here for its own good. It knows that there are an even number of hours in a day so when you tell it to bin _time span=2h, the buckets that it automatically creates fall on even-hour boundaries. In your case, when you are displaying odd-numbered times, such as from 1PM-5PM, it is creating 3 even-hour-based buckets: 12-2, 2-4, 4-6, instead of 2 odd-based buckets: 1-3, 3-5. I believe that @rich7177 has experimented and pontificated at length about this and perhaps he will post a comment or a link. In any case, now that I understand the fundamental nature of the problem, I believe that it can be addressed like this:

((index="os_windows*" AND source=WinEventLog:Security) OR (index="os_unix*" AND sourcetype=linux_secure) OR tag=authentication) AND action=failure AND NOT (Result_Code="0x17" OR Account_Name="*$") earliest=-5h@h latest=-1h@h
| bucket _time span=2h aligntime=-1h@h
| stats values(user) AS affected_users, values(ComputerName) AS dc, dc(user) AS num_users count BY src_ip _time 
| where num_users > 3

Alternatively you might consider starting over and using a sliding window instead of a discretized window with streamstats time_window=2h.
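
As a rough illustration of the alignment behaviour described above, here is a simplified Python model. It is not Splunk's actual implementation, and the function names `bucket_start` and `bucket_hours` are made up for this sketch. It treats bucket starts as multiples of the span counted from an alignment origin, and uses the 1PM-5PM window from the explanation:

```python
def bucket_start(minute, span_min, align_min=0):
    """Floor a minute-of-day to the bucket boundary at or below it.

    Boundaries are assumed to sit at align_min + k * span_min.
    """
    return align_min + ((minute - align_min) // span_min) * span_min


def bucket_hours(start_hour, end_hour, span_hours, align_hour=0):
    """Return the distinct bucket start hours hit by events in [start, end)."""
    span = span_hours * 60
    align = align_hour * 60
    starts = {
        bucket_start(m, span, align)
        for m in range(start_hour * 60, end_hour * 60)
    }
    return [s // 60 for s in sorted(starts)]


# Default alignment (origin at hour 0): a 13:00-17:00 window touches 3 buckets.
print(bucket_hours(13, 17, 2))                 # [12, 14, 16]

# Aligning the boundaries to the window edge leaves exactly 2 buckets.
print(bucket_hours(13, 17, 2, align_hour=13))  # [13, 15]
```

In this model the first and last default buckets are only half covered by the search window, which would explain counts that look off compared with a full 2-hour bucket.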

ericl42
Path Finder

Thank you very much. Yesterday, before this post, I accidentally started testing with aligntime and it seemed to fix the issue, but I wasn't 100% sure why. I don't think I can use sliding windows because I'm mocking all of these rules up for ES correlation searches.

to4kawa
Ultra Champion

(index=os_windows* OR index=os_unix*) (source=WinEventLog:Security OR sourcetype=linux_secure OR tag=authentication) action=failure NOT Result_Code=0x17 NOT Account_Name="*$"  earliest=-5h@h latest=-1h@h
| timechart span=2h values(user) AS affected_users, values(ComputerName) as dc, dc(user) AS num_users count BY src_ip
| where num_users > 3

Hi, how about simply using timechart?

ericl42
Path Finder

Timechart may work for this one scenario, but I have others where I count by multiple fields and timechart only allows me to do one.

to4kawa
Ultra Champion

Like makeresults, bin seems to create a bucket for the end time when it generates its time buckets.

If you really need exactly two buckets:

(index=os_windows* OR index=os_unix*) (source=WinEventLog:Security OR sourcetype=linux_secure OR tag=authentication) action=failure NOT Result_Code=0x17 NOT Account_Name="*$"  earliest=-5h@h latest=-1h@h
| addinfo
| eval sessionId = if(_time < relative_time(info_min_time,"+2h"),1,2)
| stats earliest(_time) as _time values(user) AS affected_users, values(ComputerName) as dc, dc(user) AS num_users count BY src_ip sessionId
| where num_users > 3

With this query, the search period is divided into the first 2 hours and the rest, and the results are displayed.
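
The split performed by the eval above can be sketched outside Splunk as follows. This is a minimal Python model, assuming plain epoch seconds and a fixed 7200-second first window; the helper names `session_id` and `split_sessions` are invented for illustration and are not part of any Splunk API:

```python
from collections import defaultdict


def session_id(event_time, info_min_time, first_window=2 * 3600):
    """Return 1 for events in the first 2 hours of the search window, else 2."""
    return 1 if event_time < info_min_time + first_window else 2


def split_sessions(event_times, info_min_time):
    """Group event timestamps (epoch seconds) by their session id."""
    sessions = defaultdict(list)
    for t in event_times:
        sessions[session_id(t, info_min_time)].append(t)
    return dict(sessions)


# Hypothetical window starting at t=0; the boundary falls at 7200 seconds.
events = [600, 7199, 7200, 14000]
print(split_sessions(events, info_min_time=0))
# {1: [600, 7199], 2: [7200, 14000]}
```

Because the boundary is computed from the start of the search window rather than from clock-aligned buckets, this approach can never produce more than two groups.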

ericl42
Path Finder

Thanks for the response. After digging around a little, I think I may have fixed the issue by adding the aligntime portion. However I'll take a look at your new query as well.

| bucket _time span=2h@h aligntime=-1h@h

to4kawa
Ultra Champion

2:30 = current time
1:00 = -1h@h
12:00 = -2h@h
11:00 = -3h@h
10:00 = -4h@h
09:00 = -5h@h

Hi, @h snaps to the beginning of the hour, so the breakdown above is the correct one.
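
The snapping can be checked with a small Python sketch. It assumes that a modifier such as -1h@h means "subtract the offset, then round down to the start of the hour"; the function name `hours_ago_snapped` and the date are illustrative only:

```python
from datetime import datetime, timedelta


def hours_ago_snapped(now, hours):
    """Subtract a number of hours, then snap down to the top of the hour."""
    shifted = now - timedelta(hours=hours)
    return shifted.replace(minute=0, second=0, microsecond=0)


# 2:30 PM as the current time, matching the example in this thread.
now = datetime(2019, 11, 22, 14, 30)
for h in (1, 5):
    print(f"-{h}h@h ->", hours_ago_snapped(now, h).strftime("%H:%M"))
# -1h@h -> 13:00
# -5h@h -> 09:00
```

So at 2:30 PM the search window runs from 09:00 to 13:00, not from 09:30 to 1:30.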

woodcock
Esteemed Legend

Try this instead:

... | bucket _time bins=2

Also, BE SURE TO SET YOUR PERSONAL Time zone setting: Your Name Here -> Preferences -> Time zone.
This looks like a bug and I would open a support ticket for sure.
You can add this to the end:

... | where _time >= relative_time(now(), "-5h@h") AND _time <= relative_time(now(), "-1h@h")

ericl42
Path Finder

Thanks for the quick response. I like the concept of bins, since I'd always know I'm comparing two items rather than potentially three. I tried this on my query, and the time now just says 2019-11-22 with no hour or delimiter. So even though I specified 2 bins, I'm only seeing one row per user ID.
