Alerting

Alert when a message can't be found on a host

mahasd
New Member

We have a job that runs on all hosts every 5 minutes, and when it finishes it writes a completion message. We rely on that message to know the run was successful. I was able to create an alert based on the completion message (trigger the alert when the number of hosts is not equal to 100). However, the email lists all the hosts where the message was found, and I have to work out myself which host the job didn't run on.
I want to set up an alert that sends an email with the hostnames on which the job did not complete.

search: host=tm1-dc-cc-* "Completed monitor recovery service"
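The question comes down to a set difference: the hosts you expect, minus the hosts that wrote the completion message. The sketch below models that in plain Python rather than SPL. The host names (following the tm1-dc-cc-* pattern), the event dicts with a host key, and the missing_hosts helper are all hypothetical, for illustration only.

```python
def missing_hosts(expected_hosts, completed_events):
    """Return the expected hosts that wrote no completion message.

    expected_hosts: iterable of host names (hypothetical list of all 100 hosts).
    completed_events: iterable of dicts with a "host" key, standing in for the
    events returned by the "Completed monitor recovery service" search.
    """
    reported = {event["host"] for event in completed_events}
    # Set difference: expected, but never reported a completion.
    return sorted(set(expected_hosts) - reported)


if __name__ == "__main__":
    expected = ["tm1-dc-cc-01", "tm1-dc-cc-02", "tm1-dc-cc-03"]
    events = [{"host": "tm1-dc-cc-01"}, {"host": "tm1-dc-cc-03"}, {"host": "tm1-dc-cc-01"}]
    print(missing_hosts(expected, events))  # ['tm1-dc-cc-02']
```

A host count alone (100 or not) can tell you that something is missing, but only a difference against a known host list tells you which host.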


CryoHydra
Path Finder

host=tm1-dc-cc-* "Completed monitor recovery service" needs to trigger an alert when the host count is not equal to 100:

host=tm1-dc-cc-* "Completed monitor recovery service"
| stats values(host) dc(host) as count
| where count!=100

Here, where count!=100 makes the search return a row only when the number of distinct hosts is not 100, so the alert triggers only in that case. Please accept the answer if it helps!
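To see what this search returns, here is a plain-Python model of stats values(host) dc(host) as count followed by where count!=100. The event dicts and the count_alert name are hypothetical; the expected total of 100 comes from the question and is a parameter here so the example stays small.

```python
def count_alert(events, expected_count=100):
    """Model of: stats values(host) dc(host) as count | where count!=expected_count.

    Returns the single result row if the alert should fire, otherwise None.
    """
    hosts = sorted({event["host"] for event in events})  # values(host)
    count = len(hosts)  # dc(host): number of distinct hosts
    if count != expected_count:  # where count!=100
        return {"values(host)": hosts, "count": count}
    return None  # no result row, so the alert does not fire


if __name__ == "__main__":
    events = [{"host": "h1"}, {"host": "h2"}, {"host": "h2"}]
    print(count_alert(events, expected_count=3))
    # {'values(host)': ['h1', 'h2'], 'count': 2}
```

Note that the row lists the hosts that did report, which is why the email described in the question still has to be checked by hand to find the missing one.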


somesoni2
Revered Legend

I believe you can write this kind of alert in two ways:
1) The first is to find when each host last wrote the completed message. If that wasn't recent, the job didn't complete within your threshold time, so alert on those hosts.
E.g. your job runs every 5 minutes, and I'm assuming your alert search does too; then you'd select data for, say, the last 60 minutes, find when the last Completed message was received from each host, and compare that with the current time. The search below generates events for any host that has not written a completed message in the last 10 minutes (600 seconds). Your alert condition would be "if number of events is greater than 0".

 host=tm1-dc-cc-* "Completed monitor recovery service" | table _time host | dedup host 
| eval age=now()-_time | where age>600
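The age check in method 1 can be modeled like this: keep the most recent completion event per host (which is what dedup host does, since results come back newest first by default), then flag hosts whose last message is older than 600 seconds. The events are hypothetical dicts with host and _time (epoch seconds) keys, and now is passed in explicitly so the example is deterministic.

```python
def stale_hosts(events, now, threshold=600):
    """Return {host: age_seconds} for hosts whose latest completion is older than threshold.

    Hosts with no completion event at all in the searched window never appear,
    because there is no row to compute an age from.
    """
    last_seen = {}
    for event in events:
        host, t = event["host"], event["_time"]
        # Keep only the newest timestamp per host, like dedup host on newest-first results.
        if host not in last_seen or t > last_seen[host]:
            last_seen[host] = t
    # eval age=now()-_time | where age>600
    return {host: now - t for host, t in last_seen.items() if now - t > threshold}


if __name__ == "__main__":
    now = 10_000
    events = [
        {"host": "h1", "_time": 9_800},  # 200 s ago: fine
        {"host": "h2", "_time": 9_000},  # 1000 s ago: stale
        {"host": "h2", "_time": 8_500},  # older event for h2, ignored
    ]
    print(stale_hosts(events, now))  # {'h2': 1000}
```

The docstring states the main caveat of this method: a host that wrote nothing during the whole search window (60 minutes in the example above) is silently absent from the results, which the lookup-based method 2 addresses.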

2) The other option is to have a lookup table file with all your host names. You can set up a scheduled search to update the lookup table frequently with new servers. Once you have the lookup table, use it in the search to find which hosts did not actually report in the given period.
E.g. say you have a lookup table file tm_dc_hosts.csv with a column host; below could be your alert search, with the alert condition "if number of events is greater than 0".

[| inputlookup tm_dc_hosts.csv | table host ] "Completed monitor recovery service"
| table host | eval from=2
| append [| inputlookup tm_dc_hosts.csv | table host | eval from=1]
| stats max(from) as from by host | where from=1
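The eval from=2 / eval from=1 pattern tags each row with where it came from: 2 for hosts seen in the completion events, 1 for every host in the lookup. After stats max(from) by host, a host keeps the value 1 only if it is in the lookup and never appeared in the events, so where from=1 leaves exactly the missing hosts. Below is a plain-Python model of that, with hypothetical host lists standing in for the search results and for tm_dc_hosts.csv.

```python
def missing_via_tags(lookup_hosts, completed_hosts):
    """Model of the append + stats max(from) as from by host + where from=1 pattern."""
    rows = [(host, 2) for host in completed_hosts]  # main search: eval from=2
    rows += [(host, 1) for host in lookup_hosts]    # append [inputlookup ...]: eval from=1
    max_from = {}
    for host, source in rows:
        # stats max(from) as from by host
        max_from[host] = max(source, max_from.get(host, source))
    # where from=1: in the lookup, but never in the completion events
    return sorted(host for host, source in max_from.items() if source == 1)


if __name__ == "__main__":
    lookup = ["tm1-dc-cc-01", "tm1-dc-cc-02", "tm1-dc-cc-03"]  # hypothetical lookup rows
    completed = ["tm1-dc-cc-01", "tm1-dc-cc-03", "tm1-dc-cc-03"]
    print(missing_via_tags(lookup, completed))  # ['tm1-dc-cc-02']
```

Any value pair would work as long as the "seen in events" tag is larger than the "lookup only" tag; 2 and 1 are just a convenient choice.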

CryoHydra
Path Finder

@somesoni2 what does eval from=2 actually mean here? Could you please elaborate on the inputlookup query there?


mahasd
New Member

It didn't work exactly as you suggested, but I was able to make it work with:

host=tm1-dc-cc-* | search NOT [search host=tm1-dc-cc-* "completed monitor recovery" | fields host | format] | stats count by host
Thank you though.
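This works as a host-level exclusion: the subsearch returns the hosts that wrote the completion message, format turns them into an OR of host=... terms, and NOT drops every event from those hosts, so stats count by host is left with only the hosts that did not complete. Below is a plain-Python model of it, using hypothetical event dicts with host and _raw keys; the lower-casing mirrors the fact that Splunk search terms match case-insensitively.

```python
from collections import Counter


def hosts_without_marker(events, marker="completed monitor recovery"):
    """Model of: host=... | search NOT [search ... marker | fields host | format] | stats count by host."""
    marker = marker.lower()
    # Subsearch: every host that logged the marker at least once.
    completed = {e["host"] for e in events if marker in e["_raw"].lower()}
    # NOT [...]: drop all events from those hosts, then count what is left per host.
    return dict(Counter(e["host"] for e in events if e["host"] not in completed))


if __name__ == "__main__":
    events = [
        {"host": "h1", "_raw": "Completed monitor recovery service"},
        {"host": "h1", "_raw": "starting job"},
        {"host": "h2", "_raw": "starting job"},
        {"host": "h2", "_raw": "job error"},
    ]
    print(hosts_without_marker(events))  # {'h2': 2}
```

As with method 1, a host that logged no events at all in the time range cannot show up here, because there is nothing left to count for it.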


PowerPacked
Builder

Hi @mahasd

Use the search below:

your search | search NOT "Completed monitor recovery service" | stats c by host

Trigger the alert if the count is greater than 0, and the results will also give you a list of hosts.
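One thing worth checking with this version: NOT on its own removes individual events, not hosts. A host that did complete but also logged other events in the same window still has rows left after the filter, so it would be listed too. The plain-Python model below shows that; the events and the marker constant are hypothetical, and matching is case-insensitive like a Splunk search term.

```python
from collections import Counter

# Hypothetical marker text; matched case-insensitively, like a Splunk search term.
MARKER = "completed monitor recovery service"


def event_level_not(events):
    """Model of: your search | search NOT "<marker>" | stats count by host.

    NOT removes matching events one by one; it does not remove the host.
    """
    kept = (e["host"] for e in events if MARKER not in e["_raw"].lower())
    return dict(Counter(kept))


if __name__ == "__main__":
    events = [
        {"host": "h1", "_raw": "Completed monitor recovery service"},  # h1 completed...
        {"host": "h1", "_raw": "starting job"},  # ...but also logged this event
        {"host": "h2", "_raw": "starting job"},  # h2 never completed
    ]
    print(event_level_not(events))  # {'h1': 1, 'h2': 1}: h1 is listed despite completing
```

Whether this matters depends on what else "your search" returns for those hosts; if it can return other events from hosts that completed, a host-level exclusion such as the NOT [subsearch] version above avoids the false positives.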

Thanks
