I have two search conditions that I need to trigger alerts from. I have a hundred hosts in an HA cluster. Sometimes hosts leave the HA cluster and come back online, due to network issues or during production changes by engineers. When a host leaves the HA cluster, I get a single message in Splunk that reads "serverX has gone out-of-sync". When the host rejoins the HA cluster, I get a single message in Splunk that reads "serverX has gone in-sync". This means I have two search results to play with.
My goal: when a host leaves the HA cluster and comes back within an hour, do not send any alerts. But if a host leaves the HA cluster and does not come back online within an hour, trigger an alert.
Here is what I have done so far (search period =1hr):
index=test sync_status="out-of-sync" [search index=test sync_status="in-sync" | dedup server | table server]
I get undesired results. I expect to see only the hosts that went offline but did not rejoin the cluster (which I can see when I run the simple searches individually).
Am I headed in the right direction, from a search and logic perspective? Are there better search methods of doing it?
Hi @NatSec,
please, try something like this:
index=test (sync_status="out-of-sync" OR sync_status="in-sync")
| stats dc(sync_status) AS dc_sync_status values(sync_status) AS sync_status BY server
| where dc_sync_status=1 AND sync_status="out-of-sync"
You can schedule an alert to run every hour over a one-hour time period using this search: it triggers when a host has only sync_status="out-of-sync" events in the last hour, meaning it went out of sync and never came back in sync.
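One edge case worth noting: if a host flaps within the hour (for example out-of-sync, in-sync, then out-of-sync again), the distinct-count approach sees both statuses and will not alert even though the host is currently out of sync. A hedged variant, assuming the same index and field names as above, is to keep only the most recent status per server with latest():

index=test (sync_status="out-of-sync" OR sync_status="in-sync")
| stats latest(sync_status) AS last_status BY server
| where last_status="out-of-sync"

This alerts on any server whose latest event in the search window is out-of-sync, regardless of how many times it bounced in between. Scheduled the same way (every hour over the last hour), it should behave identically to the dc() version in the simple leave-and-never-return case.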
Ciao.
Giuseppe