Splunk Search

Alerting on undefined events.

agthurber
Explorer

We have set up many alerts to trigger based on a count threshold for a specific event over a set period of time. Given that we have identified most of the events we expect to see, how can we set a threshold for events that do not fit into those definitions?

Basically I am looking to find a count of all errors, then subtract the number of known events, and send an alert if the result is over 100 events. It's probably just a syntax question, but where would I start?

Thanks, Arlen

1 Solution

ftk
Motivator

A fairly quick and relatively easy to maintain way to do this is with eventtypes. Basically you define your known events as eventtypes and then filter them out -- what is left are your unknown events.

Let's consider the following sample data:

1/1/2011 11:33 [ERROR] processing fault 0x000230da
1/1/2011 11:34 [WARN] failed to instantiate appX
1/1/2011 11:36 MyCustomApp: error saving events to file - action aborted
1/1/2011 14:55 critical failure of PSU 1 SRV234DB01

All of these events show up when you do a search for errors:

error* OR fail*

However, the first three events are known events, and we are only interested in the unknown events at this point. What we can do is define eventtypes for the known events (see the Splunk docs on creating eventtypes from the UI).

In eventtypes.conf:

[known_error_1]
search = "[ERROR] processing fault 0x00*"
[known_error_2]
search = "[WARN] failed to instantiate AppX"
[known_error_3]
search = "MyCustomApp: error saving to file - action aborted"

Now we can do a search like this:

(error* OR fail*) NOT eventtype=known_error_*

which will only spit out the last event of our sample data, since it didn't match any of the eventtypes. The nice thing about this is that whenever you discover new events you want to filter out, you simply add a new stanza to your eventtypes.conf. You don't have to worry about updating all your saved searches, or creating very long and complex searches with lots of ORs and NOTs, in order to filter known errors from your results.
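To tie this back to the threshold in the original question, one way (just a sketch; adjust the base search to whatever actually matches your errors) is to count whatever is left after filtering:

(error* OR fail*) NOT eventtype=known_error_* | stats count

Save that as a scheduled search and set the alert to trigger when count is greater than 100, or run the raw filtered search and alert on the number of matching events instead.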

This technique comes in very handy to filter out the background noise of known events in your environment, especially in alerts and dashboards.
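If you prefer to manage the alert in configuration files rather than the UI, a minimal savedsearches.conf sketch could look something like this (the stanza name, schedule, and time range are just placeholders):

[alert_unknown_errors]
search = (error* OR fail*) NOT eventtype=known_error_*
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -15m
dispatch.latest_time = now
# trigger when more than 100 events remain after the known eventtypes are filtered out
counttype = number of events
relation = greater than
quantity = 100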


mzorzi
Splunk Employee

Let's say the search that returns your event is:

index=myindex errorfield="corrupted file"

You could use the NOT condition:

NOT ( index=myindex  AND errorfield="corrupted file" )

And then trigger the alert when the number of these events is greater than zero.
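For example (a rough sketch, assuming your errors live in index=myindex and a base search like error* OR fail* finds them):

index=myindex (error* OR fail*) NOT errorfield="corrupted file" | stats count

Then set the alert to trigger when count is greater than zero, or greater than 100 if you want the threshold from the original question.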
