I have about a dozen data sources that I want to monitor for an outage... like >>> No Events in Last 60 Minutes.
Currently I have been using a separate alert for each data source / index; the alerts run every hour and trigger if there are fewer than 1 event.
I am just wondering if there is a better way to do this... I also have to contend with some sources having longer than 60-minute delays at times...
Thank you
Thank you for the suggestion, but we were not able to create alerts for individual indexes / data sources with | metadata. Maybe you know how to do that...?
So we are currently using time-interval outage alerts for each specific index AND/OR sourcetype AND/OR source.
For example >>>
| tstats count where index=<foo> sourcetype=<bar> earliest=-60m
The result is a count, so in the alert we use the custom trigger condition "search count=0".
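For what it's worth, a single scheduled search can cover several sources at once instead of a dozen separate alerts. A minimal sketch (the index names, the 24-hour lookback, and the 60-minute threshold are assumptions to adjust for your environment):
| tstats max(_time) as lastSeen where (index=<foo> OR index=<bar> OR index=<baz>) earliest=-24h by index
| eval minutesSinceLastEvent = round((now() - lastSeen) / 60)
| where minutesSinceLastEvent > 60
Trigger the alert when the number of results is greater than 0. One caveat: an index with no events at all in the lookback window will not appear in the tstats output, so catching a completely dead source would also require comparing against a static list of expected indexes. For the sources that are sometimes delayed, the threshold could be raised above 60.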
I used the searches below to do what you are doing. You can use your own trigger condition for an alert, or just monitor it in a dashboard, for example.
#01 - Monitor the incoming data from hosts
| metadata type=hosts
| convert ctime(lastTime), ctime(recentTime)
#02 - Monitor the incoming data from sourcetypes
| metadata type=sourcetypes
| convert ctime(lastTime), ctime(recentTime)
#03 - Monitor the incoming data from a specific index
| metadata type=sourcetypes index=_internal
| convert ctime(lastTime), ctime(recentTime)
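To alert on an individual index / data source with | metadata, you can restrict the command with the index= argument (as in #03) and then filter on recentTime, which is the index time of the most recent event. A minimal sketch, assuming a 60-minute threshold:
| metadata type=sourcetypes index=<foo>
| eval minutesSinceLastEvent = round((now() - recentTime) / 60)
| where minutesSinceLastEvent > 60
Trigger the alert when the number of results is greater than 0. Note that | metadata only accepts the type= and index= arguments, so to narrow to one sourcetype you would filter the results afterwards, e.g. | search sourcetype=<bar>.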
And take a look at this post on Splunk's blog: https://www.splunk.com/en_us/blog/tips-and-tricks/metadata-metalore.html