Is there any way to tell Splunk to check whether certain error codes appear in 10 consecutive events?
The key point is "10 consecutive events": if a normal event breaks the sequence, no alert should be triggered.
For example:
If the return code is one of "ER1111", "222222", "GT9999", empty, or less than zero, and such events appear 10 times in a row, then raise an alert.
Logs like this should alert because 10 continuous events all match the condition:
[2012-05-09 11:10:02] ## XXXXX return=ER1111
[2012-05-09 11:10:03] ## XXXXX return=ER1111
[2012-05-09 11:10:04] ## XXXXX return=222222
[2012-05-09 11:10:05] ## XXXXX return=ER1111
[2012-05-09 11:10:06] ## XXXXX return=GT9999
[2012-05-09 11:10:07] ## XXXXX return=ER1111
[2012-05-09 11:10:08] ## XXXXX return=
[2012-05-09 11:10:09] ## XXXXX return=ER1111
[2012-05-09 11:10:10] ## XXXXX return=-3131
[2012-05-09 11:10:11] ## XXXXX return=GT9999
But logs like this should not alert, because an event with return=000000 breaks the sequence at line 5:
[2012-05-09 11:10:13] ## XXXXX return=ER1111
[2012-05-09 11:10:14] ## XXXXX return=-99433
[2012-05-09 11:10:15] ## XXXXX return=-3041
[2012-05-09 11:10:16] ## XXXXX return=222222
[2012-05-09 11:10:17] ## XXXXX return=000000
[2012-05-09 11:10:18] ## XXXXX return=ER1111
[2012-05-09 11:10:19] ## XXXXX return=ER1111
[2012-05-09 11:10:22] ## XXXXX return=222222
[2012-05-09 11:10:26] ## XXXXX return=GT9999
[2012-05-09 11:10:28] ## XXXXX return=222222
[2012-05-09 11:10:31] ## XXXXX return=
[2012-05-09 11:10:33] ## XXXXX return=-796
[2012-05-09 11:10:37] ## XXXXX return=-4176
I would appreciate it if someone could help.
Hi there, this approach is a little bit different from Ayn's, and may be useful if you have a limited number of 'bad' return codes that you want to alert on, and a larger number of 'good' return codes (that break the spell, so to speak). It uses an eval'd field (XYZ) as a boolean flag, and uses streamstats for a rolling window of the last 10 events.
sourcetype=your_sourcetype
| eval XYZ=case(isnull(return), 1, return=="", 1, return=="ER1111", 1, return=="GT9999", 1, return==222222, 1, tonumber(return)<0, 1, true(), 0)
| streamstats window=10 sum(XYZ) AS XYZ_Count values(return) AS Return_Codes
| table Return_Codes XYZ_Count
Then you can set up alerting with the custom condition set to search XYZ_Count >= 10 (with window=10 the sum can never exceed 10).
The downside is that you'll get 2 alerts if there are 11 consecutive 'bad' return codes, 3 alerts if there are 12, and so on.
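To make the rolling-window behaviour concrete, here is a minimal Python sketch of the same idea. It is not Splunk code and only approximates what the search above does; the names is_bad, rolling_bad_counts and count_matches are made up for illustration, and the set of bad codes is taken from the question.

```python
from collections import deque

# Assumption: the 'bad' codes listed in the question.
BAD_CODES = {"ER1111", "GT9999", "222222"}


def is_bad(code):
    """Return 1 for a 'bad' return code, 0 otherwise (plays the role of XYZ)."""
    if code is None or code == "":
        return 1  # a missing or empty return value counts as bad
    if code in BAD_CODES:
        return 1
    try:
        return 1 if int(code) < 0 else 0  # negative numeric codes are bad
    except ValueError:
        return 0  # any other non-numeric code is treated as normal


def rolling_bad_counts(codes, window=10):
    """For each event, sum the bad flags over the last `window` events."""
    recent = deque(maxlen=window)  # old flags fall out automatically
    counts = []
    for code in codes:
        recent.append(is_bad(code))
        counts.append(sum(recent))
    return counts


def count_matches(codes, window=10):
    """Number of events whose rolling count reaches the full window size."""
    return sum(1 for c in rolling_bad_counts(codes, window) if c >= window)
```

Feeding it 11 consecutive bad codes gives two matching positions and 12 gives three, which is the repeated-alert effect described above.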
UPDATE:
streamstats does not automatically use real-time searching. The main feature is rather that it by design works on the last n number of events that are piped through it.
Hope this helps,
Kristian
Can anyone help with my query, please? I need to set up an alert that fires only when there are 20 consecutive error events within a 5-minute time span.
Would this work:
index="PQR" "The Network Adapter could not establish the connection" earliest=-5m@m | timechart partial=f span=5m count | eval alert=0
| foreach count [eval alert=if(count<500, 0, 1)] | addcoltotals | where isnull(_time)
you're welcome. /k
thank you Kristian, now I understand 🙂
see update above. /k
Hi Kristian
The command streamstats you just introduced is useful on this case, thank you!
One more concern: since streamstats monitors new events in real time, will it consume too much CPU or I/O resources?
If you can come up with a solid definition of a "normal" event this should be easy enough to accomplish using the transaction command, which will create groups (or "transactions") of events based on rules you specify. I'm assuming that the XXXXX you show in your sample events is some kind of identifier to tie the events together? If so you could do this, assuming the XXXXX value is extracted into a field, let's call it id:
... | transaction id endswith=eval(return=000000)
This also assumes that the definition of a "normal" event is one where return is 000000. You could easily extend this by adding other values as well (return=000000 OR return=...)
Once you have run the transaction command, a couple of new fields will have been created - among those, "eventcount" which says how many events were included in the transaction. Because the transaction command has been told to stop if it finds a "normal" event, all transactions with an eventcount of 10 or higher will match what you're looking for. So:
... | transaction id endswith=eval(return=000000) | search eventcount>=10
This should give you the results you want.
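The grouping idea can also be illustrated outside Splunk. The following Python sketch is only an analogy for the transaction approach, not an implementation of it: it splits the stream into runs that are closed by a "normal" event and checks the run lengths. NORMAL_CODES, bad_run_lengths and should_alert are hypothetical names, and treating only 000000 as normal is the same assumption as above.

```python
from itertools import groupby

# Assumption: only 000000 counts as a "normal" event.
NORMAL_CODES = {"000000"}


def bad_run_lengths(codes):
    """Lengths of each run of consecutive non-normal events."""
    return [
        sum(1 for _ in group)  # number of events in this run
        for is_normal, group in groupby(codes, key=lambda c: c in NORMAL_CODES)
        if not is_normal  # skip the runs made of normal events
    ]


def should_alert(codes, threshold=10):
    """True if any run of non-normal events reaches the threshold."""
    return any(n >= threshold for n in bad_run_lengths(codes))
```

Note that this sketch counts only the non-normal events in each run. If the closing "normal" event is counted in eventcount on the Splunk side, the threshold there may need to be one higher, which is worth checking against real data.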
Hi Ayn
In fact, I don't have the real log on hand....those logs are sensitive (including bank account information), so I'm not authorized to copy them out....sad...but thank you!
Well, the more information you can include about the logs you're operating on and what specific goal you have, the easier it will be to help you, of course.
Hi Ayn, thank you for your answer, but the sample log is simplified from the original log. The XXXXX stands for other fields/words, and some log events actually come in XML format, which needs xmlkv to handle.
for example:
[2012-05-09 11:10:16] abc=123 def=456 return=222222
[2012-05-09 11:10:17] def=789 ghi=012 return=000000
Or:
[2012-05-09 11:10:17] action 123 started..
[2012-05-09 11:10:18]
action 123 finished
[2012-05-09 11:11:20] some other text.....
But your answer is helpful to me, thanks anyway!