Splunk Search

Need to edit search to NOT alert if another similar alert comes in within 5 minutes

Communicator

Hello,

I have two searches that alert on every occurrence:
3rd party agent drops offline: index=appevtlogsprod host=HOST01 "3rd Party App Alert 201"
3rd party agent comes online: index=appevtlogsprod host=HOST01 "3rd Party App Alert 200"

The problem is that I'll get alerted every time (about 70 - 100 alerts per evening) we have a small network/server blip during backups (but the 3rd party agent is actually still running on the server). So we get a lot of false positives.

I want to avoid getting alerted if the 3rd party agent drops offline but comes online within 5 minutes.

Here's an example of the output from the event log it picks up:

OFFLINE EVENT LOG ENTRY:
12/03/2014 09:01:59 PM
LogName=Application
SourceName=gecsABC.SYSMGR
EventCode=2
EventType=3
Type=Warning
ComputerName=serverhostname.domain.com
TaskCategory=None
OpCode=None
RecordNumber=122677
Keywords=Classic
Message=GECS Warning: Controller: gecsABC / GECS 3rd Party App Alert 201 - Agent FNAPP02D set to offline.

ONLINE EVENT LOG ENTRY:
12/03/2014 09:02:30 PM
LogName=Application
SourceName=gecsABC.SYSMGR
EventCode=2
EventType=3
Type=Warning
ComputerName=serverhostname.domain.com
TaskCategory=None
OpCode=None
RecordNumber=122679
Keywords=Classic
Message=GECS Warning: Controller: gecsABC / GECS 3rd Party App Alert 200 - Agent FNAPP02D now online & responding.

I wouldn't mind regex'ing the string for the service name (in this case it's FNAPP02D) so that I can use one Splunk search for all situations with different service names.

Any help on this would be truly appreciated! 🙂

Thanks.

1 Solution

SplunkTrust

If you have to have two searches then I'd make each search check over a slightly longer time range and only alert if a server has been offline for more than five minutes / has come online without going offline within the past five minutes.

Maybe you can consolidate the two into one search though, something like this (untested):

  index=app_evtlogs_prod host=HOST01 "3rd Party App Alert 201" OR "3rd Party App Alert 200"
| transaction AgentId keepevicted=t maxspan=5m startswith="3rd Party App Alert 201" endswith="3rd Party App Alert 200"
| where eventcount < 2

You'll end up with offline and online events that do not have a corresponding "partner" event within five minutes.
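To make the pairing idea concrete, here is a rough Python model of what the search above is meant to do. It is only a sketch of the logic, not how Splunk's transaction command is implemented. It assumes each event has already been reduced to a (timestamp, agent, state) tuple, and the agent name FNAPP99X is made up for illustration.

```python
from datetime import datetime, timedelta

MAX_SPAN = timedelta(minutes=5)  # mirrors maxspan=5m in the search


def unpaired_events(events):
    """Return events that have no partner event within MAX_SPAN.

    events: iterable of (timestamp, agent, state) tuples,
    where state is "offline" or "online".
    """
    pending = {}  # agent -> offline event still waiting for an online event
    result = []
    for event in sorted(events):
        ts, agent, state = event
        if state == "offline":
            # An earlier offline event that was never closed stays unpaired.
            if agent in pending:
                result.append(pending[agent])
            pending[agent] = event
        else:
            start = pending.pop(agent, None)
            if start is not None and ts - start[0] <= MAX_SPAN:
                continue  # offline/online pair within 5 minutes: suppress both
            if start is not None:
                result.append(start)  # the online event came too late
            result.append(event)
    result.extend(pending.values())  # offline events that never recovered
    return sorted(result)


events = [
    (datetime(2014, 12, 3, 21, 1, 59), "FNAPP02D", "offline"),
    (datetime(2014, 12, 3, 21, 2, 30), "FNAPP02D", "online"),
    (datetime(2014, 12, 3, 21, 1, 59), "FNAPP99X", "offline"),  # hypothetical agent
]
for ts, agent, state in unpaired_events(events):
    print(agent, state)  # prints: FNAPP99X offline
```

The FNAPP02D blip from the question (offline at 09:01:59 PM, back at 09:02:30 PM) is dropped, while the agent that never came back is kept. That is the same effect "where eventcount < 2" is aiming for.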

SplunkTrust

Great to hear 😄

SplunkTrust

Based on the two messages, you could go to Settings -> Fields -> Field Extractions -> Add new, enter agent_id as name, select sourcetype and enter the sourcetype of these events, keep inline, and use this as the inline extraction:

Agent (?<agent_id>\S+) (now online|set to offline) in Message

That assumes the key-value field Message has been auto-extracted and pulls the agent ID from its value.
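As a quick sanity check, the pattern can be tried against the two sample messages from the question outside of Splunk. This is a minimal sketch in Python, whose re module spells named groups as (?P<name>...) rather than the (?<name>...) form Splunk accepts. The helper name extract_agent_id is made up for illustration.

```python
import re

# Same pattern as the inline extraction, with Python's named-group syntax.
AGENT_RE = re.compile(r"Agent (?P<agent_id>\S+) (now online|set to offline)")


def extract_agent_id(message):
    """Return the agent ID found in a Message value, or None if absent."""
    match = AGENT_RE.search(message)
    return match.group("agent_id") if match else None


offline_msg = ("GECS Warning: Controller: gecsABC / GECS 3rd Party App "
               "Alert 201 - Agent FNAPP02D set to offline.")
online_msg = ("GECS Warning: Controller: gecsABC / GECS 3rd Party App "
              "Alert 200 - Agent FNAPP02D now online & responding.")

print(extract_agent_id(offline_msg))  # prints: FNAPP02D
print(extract_agent_id(online_msg))   # prints: FNAPP02D
```

Because \S+ matches any run of non-whitespace characters, agent names made of letters and digits in any order and length are captured.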

Communicator

After adding this Field, should it show under "Interesting Fields" when performing the search (the same search that my scheduled search does - under Searches & Reports)?

Or would it show as a field when I expand a particular search result?

Does this new field require a Splunk restart to take effect?

Thanks!

Communicator

Hi Martin,

You are the freakin man! 😄

This is the final search that looks to be working as needed:
index=appevtlogsprod host=HOST01 "3rd party app Alert 201" OR "3rd party app Alert 200" | rex "- Agent (?<AgentId>\S+)" | transaction AgentId keepevicted=t maxspan=5m startswith="3rd party app Alert 201" endswith="3rd party app Alert 200" | where eventcount < 2

SplunkTrust

Yeah, I have assumed there is a field called AgentId that contains some kind of, uhm, Id for your Agents. If that doesn't exist yet you will need to either add a rex command before the transaction or configure the field to be extracted by default for that sourcetype. That'll make sure the pairs are connected together for each agent individually.

Communicator

I'm no regex expert by any means, but would you happen to know how to structure this regex? The field can contain letters and numbers in any sequence and length, so just about anything. I kind of need a rex wildcard in there.

I know how to identify a numeric value for creating the field (e.g.: rex "Blah Blah Blah the Number of message is: (?\d+) messages").

But that's the only rex I've created so far by borrowing someone else's idea. 😉

Thanks!

Communicator

Thanks for the quick reply Martin!

I'm definitely happy to consolidate the searches into one. The cleaner the better. 🙂

One thing to note is that the search must pair each offline event with an online event for the same agent name (e.g. FNAPP02D) within 5 minutes.
That's because sometimes I'll get about 50 offline Splunk alerts and, 2 minutes later, another 50 online Splunk alerts, and all 50 pairs have different agent names.

Is that what you were identifying as "AgentId" in your example above?
Don't I need to create a regex to capture the name that follows the string "Agent "?
