Alerting

How to create a real-time conditional alert by comparing the results of events in 2 different rolling windows

poddura
Observer

Hi,
My scenario is that I have a set of commands, and for each command I have the total hits & total failures in the last 30 mins.
Let's say Command A got 100 hits in the last 30 mins and 30 of them failed. Now I want to check the total hits & total failures of the same command in the previous 30 mins, and if I see the same, check the 30 mins before that. If I see the same kind of failure % in all three windows, I want to trigger an alert.

How can I do this in Splunk?

DalJeanis
Legend

Okay, first, if you're looking at 30m increments, you are probably not looking for a real-time search. How fast will the person have to respond? What is the actual SLA? If they don't have to respond to an alert within 5m, then you want a scheduled search.
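
For example, you can schedule the finished search to run every 30 minutes over the last 90 minutes of data, either through the Save As Alert UI or in savedsearches.conf. A minimal sketch, assuming a stanza name, schedule, and notification address that are only placeholders you would adjust:

[command_failure_rate_alert]
enableSched = 1
cron_schedule = */30 * * * *
dispatch.earliest_time = -90m
dispatch.latest_time = now
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = oncall@example.com
search = <the search built below>

With counttype/relation/quantity set that way, the alert fires whenever the search returns at least one row, so the search itself should only return rows when your error condition actually holds.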

Second, is your 30 minute window a rolling window, or a fixed window?

It's expensive to go back and do things a second or third time. Just get the data all at the same time. What I would tend to do for what you talked about is this -

 your search that gets the events for the last 90 minutes

| rename COMMENT as "divide up the three time periods"
| addinfo 
| eval timeframe= ceiling((_time - info_min_time)/1800)

| rename COMMENT as "set up all the fields you need to stats the three periods"
| eval command = (whatever the command was)
| eval errorMessage = coalesce(whatever the error message was, "(NONE)")
| stats count as totalCount by command errorMessage timeframe

Now you have records for each combination of time period, command and error message, with "(NONE)" for records with no errors.
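
To make that concrete, at this point the results might look something like this (the command name, error message, and counts are purely hypothetical, for illustration only):

command   errorMessage   timeframe   totalCount
A         (NONE)         1           70
A         Timeout        1           30
A         (NONE)         2           75
A         Timeout        2           25
A         (NONE)         3           68
A         Timeout        3           32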

| rename COMMENT as "find total of records for each command for each timeframe "
| eventstats sum(totalCount) as commandcount by command timeframe

| rename COMMENT as "set the  _time to the end of the three time periods"
| eval _time=_info_min_time + 1800*timeframe

Now you can look at the absolute number and/or percentage of records in each timeframe whose errorMessage is not "(NONE)" and see whether you have a consistent error condition. One way would be to do this:

| where errorMessage!="(NONE)"
| eval errorpercent = totalCount/commandcount
| eventstats min(errorpercent) as minpercent max(errorpercent) as maxpercent by command
| where ... minpercent and maxpercent match some criteria you set.
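
As one hypothetical version of that last condition (the 20% threshold and the requirement that all three windows contain errors are assumptions to tune for your own data):

| eventstats dc(timeframe) as windowCount by command
| where windowCount==3 AND minpercent>0.2

Here windowCount==3 only holds when every one of the three 30-minute periods contained at least one error for that command, and minpercent>0.2 means even the smallest observed error percentage was above 20%.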

tauliang
Communicator

One thing you could try is to apply a time-based window of 30m to streamstats and build your alert condition based on that.
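
A rough sketch of that idea, assuming each event has a command field and a status field that equals "failure" for a failed hit (both field names are assumptions about your data):

 your search over the last 90 minutes
| eval isFailure=if(status=="failure", 1, 0)
| streamstats time_window=30m count as hits sum(isFailure) as failures by command
| eval failurePercent=round(100*failures/hits, 2)

The time_window option needs the events to be in time order, which a normal search already gives you; each event then carries the running hit count and failure percentage for a 30-minute window of that command's events.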
