Alert when 10 consecutive events match a certain condition?

sonicant
Path Finder

Is there any way to tell Splunk to check whether certain error codes appear in 10 consecutive events?
The key point is "10 consecutive events": if a normal event breaks in, the alert should not be triggered.

For example:
if the return code is "ER1111", "222222", "GT9999", empty, or less than zero, and such events appear 10 times in a row, then raise an alert.
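The alert condition can be stated as a single predicate. The following is a minimal Python sketch for illustration only (it is not Splunk code); ERROR_CODES and is_error are hypothetical names, and it assumes a negative return code is always an integer such as -3131:

```python
import re

# Return codes the question lists explicitly as errors (assumed exact matches).
ERROR_CODES = {"ER1111", "222222", "GT9999"}


def is_error(code):
    """Return True if a return code meets the alert condition."""
    if code is None or code == "":
        return True  # empty return value
    if code in ERROR_CODES:
        return True
    # Any other code counts only if it is a negative integer, e.g. "-3131".
    return re.fullmatch(r"-?\d+", code) is not None and int(code) < 0
```

With this predicate, "000000" is normal (it parses as 0, which is not below zero), while "", "ER1111" and "-3131" are errors.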

Logs like this should alert, because 10 consecutive events all match the condition:


[2012-05-09 11:10:02] ## XXXXX return=ER1111
[2012-05-09 11:10:03] ## XXXXX return=ER1111
[2012-05-09 11:10:04] ## XXXXX return=222222
[2012-05-09 11:10:05] ## XXXXX return=ER1111
[2012-05-09 11:10:06] ## XXXXX return=GT9999
[2012-05-09 11:10:07] ## XXXXX return=ER1111
[2012-05-09 11:10:08] ## XXXXX return=
[2012-05-09 11:10:09] ## XXXXX return=ER1111
[2012-05-09 11:10:10] ## XXXXX return=-3131
[2012-05-09 11:10:11] ## XXXXX return=GT9999


But logs like this should not alert, because an event with return=000000 breaks in at line 5:


[2012-05-09 11:10:13] ## XXXXX return=ER1111
[2012-05-09 11:10:14] ## XXXXX return=-99433
[2012-05-09 11:10:15] ## XXXXX return=-3041
[2012-05-09 11:10:16] ## XXXXX return=222222
[2012-05-09 11:10:17] ## XXXXX return=000000
[2012-05-09 11:10:18] ## XXXXX return=ER1111
[2012-05-09 11:10:19] ## XXXXX return=ER1111
[2012-05-09 11:10:22] ## XXXXX return=222222
[2012-05-09 11:10:26] ## XXXXX return=GT9999
[2012-05-09 11:10:28] ## XXXXX return=222222
[2012-05-09 11:10:31] ## XXXXX return=
[2012-05-09 11:10:33] ## XXXXX return=-796
[2012-05-09 11:10:37] ## XXXXX return=-4176
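To make the "consecutive" rule concrete, here is a minimal Python sketch (not Splunk code) that scans lines shaped like the two samples and resets its count whenever a normal event appears. The names RETURN_RE, is_error, has_error_run and as_lines are hypothetical, and it assumes lines without a return= field can simply be skipped:

```python
import re

ERROR_CODES = {"ER1111", "222222", "GT9999"}
RETURN_RE = re.compile(r"return=(\S*)")  # captures "" when return= is empty


def is_error(code):
    """True for a listed code, an empty value, or a negative integer."""
    if code == "" or code in ERROR_CODES:
        return True
    return re.fullmatch(r"-?\d+", code) is not None and int(code) < 0


def has_error_run(lines, run_length=10):
    """True if run_length consecutive return events all meet the condition."""
    streak = 0
    for line in lines:
        match = RETURN_RE.search(line)
        if match is None:
            continue  # assumption: lines without return= are ignored
        if is_error(match.group(1)):
            streak += 1
            if streak >= run_length:
                return True
        else:
            streak = 0  # a normal event breaks the run
    return False


def as_lines(codes):
    """Build log lines shaped like the samples (timestamps omitted)."""
    return ["## XXXXX return=" + code for code in codes]


FIRST_SAMPLE = as_lines(["ER1111", "ER1111", "222222", "ER1111", "GT9999",
                         "ER1111", "", "ER1111", "-3131", "GT9999"])
SECOND_SAMPLE = as_lines(["ER1111", "-99433", "-3041", "222222", "000000",
                          "ER1111", "ER1111", "222222", "GT9999", "222222",
                          "", "-796", "-4176"])
```

has_error_run(FIRST_SAMPLE) is True, while has_error_run(SECOND_SAMPLE) is False: the 000000 event resets the streak, and only 8 error events follow it.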


I would appreciate it if someone could help~~

kristian_kolb
Ultra Champion

Hi there, this approach is a little different from Ayn's. It may be useful if you have a limited number of 'bad' return codes that you want to alert on, and a larger number of 'good' return codes (that break the spell, so to speak). It uses an eval'd field (XYZ) as a 0/1 flag, and streamstats for a rolling window over the last 10 events.

sourcetype=your_sourcetype
| eval XYZ=case(coalesce(return, "")=="", 1,
    return=="ER1111" OR return=="GT9999" OR return=="222222", 1,
    tonumber(return) < 0, 1,
    true(), 0)
| streamstats window=10 sum(XYZ) AS XYZ_Count values(return) AS Return_Codes
| table _time Return_Codes XYZ_Count

Then you can set up alerting with the custom condition set to: search XYZ_Count >= 10 (with window=10 the sum can never exceed 10, so a condition of > 10 would never fire).
The downside is that you'll get 2 alerts if there are 11 consecutive 'bad' return codes, 3 alerts if there are 12, and so on.
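To see why a rolling sum over 10 events can only reach 10, and where the duplicate alerts come from, here is a Python model of a 10-event rolling sum over 0/1 flags. It approximates streamstats window=10 sum() and is not Splunk code; rolling_sums and alerting_events are hypothetical names, and it ignores ordering details (Splunk returns events newest first by default):

```python
from collections import deque


def rolling_sums(flags, window=10):
    """For each event, sum its 0/1 flag with the flags of up to
    window - 1 previously seen events (a model of streamstats window=10 sum())."""
    recent = deque(maxlen=window)
    sums = []
    for flag in flags:
        recent.append(flag)
        sums.append(sum(recent))
    return sums


def alerting_events(flags, window=10):
    """Count events whose rolling sum reaches the window size, i.e. events
    that close a run of `window` consecutive bad events."""
    return sum(1 for total in rolling_sums(flags, window) if total >= window)
```

alerting_events([1] * 11) returns 2 and alerting_events([1] * 12) returns 3, matching the duplicate alerts described above, while max(rolling_sums([1] * 25)) is 10. A single good event resets the effect: alerting_events([1] * 4 + [0] + [1] * 8) is 0.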


UPDATE:

streamstats does not automatically use real-time searching. Its main feature is rather that, by design, it works on the last n events that are piped through it.

Hope this helps,

Kristian

sanjeev_srivast
New Member

Can anyone help with my query, please? I need to set up an alert that fires only when there are 20 consecutive error events within a 5-minute span.

Would this work:
index="PQR" "The Network Adapter could not establish the connection" earliest=-5m@m
| timechart partial=f span=5m count
| eval alert=if(count>=20, 1, 0)
| addcoltotals
| where isnull(_time)
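The query above counts matching events in fixed 5-minute buckets. A rough Python model of that check (not Splunk code; buckets_over_threshold is a hypothetical name, and it assumes buckets aligned to multiples of 300 seconds since the epoch) shows one consequence worth testing for:

```python
from collections import Counter


def buckets_over_threshold(epoch_seconds, threshold=20, span=300):
    """Group event times (epoch seconds) into fixed span-second buckets,
    roughly like timechart span=5m, and return the start of every bucket
    whose event count reaches threshold."""
    counts = Counter(t - t % span for t in epoch_seconds)
    return sorted(start for start, n in counts.items() if n >= threshold)
```

Twenty error events inside one bucket are flagged, but the same 20 events straddling a bucket boundary are split 10/10 and not flagged. Counting per bucket also does not check that the events were consecutive, only how many matched the search.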

kristian_kolb
Ultra Champion

you're welcome. /k

sonicant
Path Finder

thank you Kristian, now I understand 🙂

kristian_kolb
Ultra Champion

see update above. /k

sonicant
Path Finder

Hi Kristian,
the streamstats command you just introduced is useful in this case, thank you!
One more concern: since streamstats monitors new events in real time, will it consume too much CPU or I/O?

Ayn
Legend

If you can come up with a solid definition of a "normal" event, this should be easy enough to accomplish using the transaction command, which creates groups (or "transactions") of events based on rules you specify. I'm assuming that the XXXXX in your sample events is some kind of identifier that ties the events together. If so, and assuming the XXXXX value is extracted into a field (let's call it id), you could do this:

... | transaction id endswith=eval(return=="000000")

Also, this assumes that a "normal" event is one where return is 000000. You could easily extend this by adding other values as well (return=="000000" OR return==...).

Once you have run the transaction command, a couple of new fields will have been created, among them "eventcount", which says how many events were included in the transaction. Because the transaction command has been told to end at a "normal" event, and that closing event is itself counted in eventcount, a transaction with an eventcount of 11 or higher contains at least 10 consecutive non-normal events. So:

... | transaction id endswith=eval(return=="000000") | search eventcount>=11

This should give you the results you want.
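A rough Python model of that grouping makes the counting explicit. It is not Splunk code; NORMAL_CODES, group_like_transaction and error_counts are hypothetical names. It assumes the event that satisfies endswith is included in its transaction, and it does not model how Splunk treats a group that never sees a closing event:

```python
NORMAL_CODES = {"000000"}  # hypothetical: add other "normal" codes here


def group_like_transaction(codes, normal=NORMAL_CODES):
    """Split codes into groups that each end with a normal code, roughly
    like transaction endswith=...; the closing normal code stays in its group.
    A trailing group with no normal code is kept as an open group."""
    groups, current = [], []
    for code in codes:
        current.append(code)
        if code in normal:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def error_counts(codes, normal=NORMAL_CODES):
    """Number of non-normal events in each group."""
    return [sum(1 for c in group if c not in normal)
            for group in group_like_transaction(codes, normal)]
```

For nine errors followed by 000000, the single group has 10 events but only 9 errors, which is why the closing event matters when choosing the eventcount threshold.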

sonicant
Path Finder

Hi Ayn

In fact, I don't have the real logs on hand. They are sensitive (they include bank account information), so I'm not authorized to copy them out... sad... but thank you!

Ayn
Legend

Well, the more information you can include about the logs you're working with and your specific goal, the easier it will be to help you, of course.

sonicant
Path Finder

Hi Ayn, thank you for your answer, but the sample log is just a simplified version of the original log. XXXXX stands for other fields/words, and some log events actually come in XML format, which needs xmlkv to handle.

For example:
[2012-05-09 11:10:16] abc=123 def=456 return=222222

[2012-05-09 11:10:17] def=789 ghi=012 return=000000

Or:
[2012-05-09 11:10:17] action 123 started..

[2012-05-09 11:10:18] 123
ER1111
001211
action 123 finished

[2012-05-09 11:11:20] some other text.....

But your answer is helpful to me, thanks anyway!
