Solved: Alert when 10 continuous events match certain cond...

sonicant · ‎05-08-2012

Is there any way to make Splunk check whether certain error codes appear in 10 consecutive events?
The key point is "10 consecutive events": if a normal event breaks the run, the alert should not trigger.

For example:
If the return code is "ER1111", "222222", "GT9999", empty, or less than zero, and such events appear 10 times in a row, then raise an alert.

Logs like this should alert because 10 continuous events all match the condition:

[2012-05-09 11:10:02] ## XXXXX return=ER1111

[2012-05-09 11:10:03] ## XXXXX return=ER1111

[2012-05-09 11:10:04] ## XXXXX return=222222

[2012-05-09 11:10:05] ## XXXXX return=ER1111

[2012-05-09 11:10:06] ## XXXXX return=GT9999

[2012-05-09 11:10:07] ## XXXXX return=ER1111

[2012-05-09 11:10:08] ## XXXXX return=

[2012-05-09 11:10:09] ## XXXXX return=ER1111

[2012-05-09 11:10:10] ## XXXXX return=-3131

[2012-05-09 11:10:11] ## XXXXX return=GT9999

But logs like this should not alert, because an event with return=000000 broke in at line 5:

[2012-05-09 11:10:13] ## XXXXX return=ER1111

[2012-05-09 11:10:14] ## XXXXX return=-99433

[2012-05-09 11:10:15] ## XXXXX return=-3041

[2012-05-09 11:10:16] ## XXXXX return=222222

[2012-05-09 11:10:17] ## XXXXX return=000000

[2012-05-09 11:10:18] ## XXXXX return=ER1111

[2012-05-09 11:10:19] ## XXXXX return=ER1111

[2012-05-09 11:10:22] ## XXXXX return=222222

[2012-05-09 11:10:26] ## XXXXX return=GT9999

[2012-05-09 11:10:28] ## XXXXX return=222222

[2012-05-09 11:10:31] ## XXXXX return=

[2012-05-09 11:10:33] ## XXXXX return=-796

[2012-05-09 11:10:37] ## XXXXX return=-4176

I would appreciate it if someone could help~~

kristian_kolb · ‎05-09-2012

Hi there, this approach is a little bit different from Ayn's, and may be useful if you have a limited number of 'bad' return codes that you want to alert on, and a larger number of 'good' return codes (that break the spell, so to speak). It uses an eval'd field (XYZ) as a boolean flag, and uses streamstats for a rolling window over the last 10 events.

sourcetype=your_sourcetype
| eval XYZ=case(isnull(return) OR return=="", 1, return=="ER1111", 1, return=="GT9999", 1, return==222222, 1, return<0, 1, true(), 0)
| streamstats window=10 sum(XYZ) AS XYZ_Count values(return) AS Return_Codes
| table Return_Codes XYZ_Count

Then you can set up alerting with the custom condition set to search XYZ_Count >= 10 (with window=10 the sum can never exceed 10, so a condition of > 10 would never fire).
The downside is that you'll get 2 alerts if there are 11 consecutive 'bad' return codes, 3 alerts if there are 12, and so on.
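
One way to avoid the repeated matches described above is to count the length of the current run instead of summing over a rolling window. The following is only a sketch and not part of the original answer. It assumes a Splunk version whose streamstats supports the reset_on_change option, and that return is already extracted as a field. run_length is a made-up field name.

```
sourcetype=your_sourcetype
| sort 0 _time
| eval XYZ=case(isnull(return) OR return=="", 1, return=="ER1111", 1, return=="GT9999", 1, return==222222, 1, return<0, 1, true(), 0)
| streamstats reset_on_change=true count AS run_length BY XYZ
| where XYZ=1 AND run_length=10
```

Because the count restarts whenever XYZ flips between 0 and 1, only the 10th event of each 'bad' run should pass the final where clause, so a run of 11 or 12 would still produce a single result.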

UPDATE:

streamstats does not automatically use real-time searching. Rather, its main feature is that, by design, it works on the last n events that are piped through it.

Hope this helps,

Kristian

sanjeev_srivast · ‎11-14-2018

Can anyone help with my query, please? I need to set up an alert that fires only when there are 20 consecutive error events within a 5-minute time span.
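
A possible starting point for this variant, adapting the streamstats approach from the accepted answer, is sketched below. It is untested against real data. The field log_level and the value "ERROR" are placeholders, since the post does not say how an error event is identified. is_error, error_count and span_seconds are made-up field names.

```
sourcetype=your_sourcetype
| sort 0 _time
| eval is_error=if(log_level=="ERROR", 1, 0)
| streamstats window=20 sum(is_error) AS error_count range(_time) AS span_seconds
| where error_count=20 AND span_seconds<=300
```

The idea is that range(_time) gives the number of seconds between the oldest and newest event in each 20-event window, so a result survives only when all 20 events are errors and they fall within 300 seconds.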

kristian_kolb · ‎05-09-2012

you're welcome. /k

sonicant · ‎05-09-2012

thank you Kristian, now I understand 🙂

kristian_kolb · ‎05-09-2012

see update above. /k

sonicant · ‎05-09-2012

Hi Kristian
The streamstats command you just introduced is useful in this case, thank you!
One more concern: since streamstats monitors new events in real time, will it consume too much CPU or IO resources?

Ayn · ‎05-09-2012

If you can come up with a solid definition of a "normal" event this should be easy enough to accomplish using the transaction command, which will create groups (or "transactions") of events based on rules you specify. I'm assuming that the XXXXX you show in your sample events is some kind of identifier to tie the events together? If so you could do this, assuming the XXXXX value is extracted into a field, let's call it id:

... | transaction id endswith=eval(return=000000)

Also, this assumes that the definition of a "normal" event is one where return is 000000. You could easily extend this by adding other values as well (return=000000 OR return=...)

Once you have run the transaction command, a couple of new fields will have been created - among those, "eventcount" which says how many events were included in the transaction. Because the transaction command has been told to stop if it finds a "normal" event, all transactions with an eventcount of 10 or higher will match what you're looking for. So:

... | transaction id endswith=eval(return=000000) | search eventcount>=10

This should give you the results you want.

sonicant · ‎05-09-2012

Hi Ayn

In fact, I don't have the real log on hand....those logs are sensitive (including bank account information), so I'm not authorized to copy them out....sad...but thank you!

Ayn · ‎05-09-2012

Well, the more information you can include about the logs you're operating on and what specific goal you have, the easier it will be to help you, of course.

sonicant · ‎05-09-2012

Hi Ayn, thank you for your answer, but the sample log is simplified from the original log. The XXXXX stands for some other fields/words, and some log events actually come in XML format, which needs xmlkv to deal with it.

for example:
[2012-05-09 11:10:16] abc=123 def=456 return=222222

[2012-05-09 11:10:17] def=789 ghi=012 return=000000

Or:
[2012-05-09 11:10:17] action 123 started..

[2012-05-09 11:10:18] 123
ER1111
001211
action 123 finished

[2012-05-09 11:11:20] some other text.....

But your answer is helpful to me, thanks anyway!

Alert when 10 continuous events match certain condition?
