I have to figure out a way to do two things: alert me when a forwarder stops sending events to Splunk, and when an event stream (sourcetype) stops, with the forwarder still running. The problem is that there are other customers using the deployment server so we can't use built-in monitoring there, unless there is a way to monitor a subset of all systems reporting in. Let's say there are about a thousand systems to monitor and they don't all have the same sourcetypes on them. Example: server1 has sourcetypes A, B, C, and server2 has sourcetypes A, X, Y, Z. I have looked at other posts from the community. I'm stuck and hope you Splunk gurus can help. Many thanks in advance.
There are a few approaches you can take.
Approach 1) Build a baseline set of the host/sourcetype pairs that you must monitor (you could build it manually or fetch it from a CMDB). Store it as a lookup. Let's call it baselineHosts.csv
Do a search of what is currently being logged in, let's say, the last 10 minutes:
|tstats latest(_time) WHERE index=* earliest=-10m by host,sourcetype
Then compare baselineHosts.csv against the systems that are actually logging, and find the host/sourcetype pairs that are NOT reporting:
|inputlookup baselineHosts.csv | fields host,sourcetype
| join type=left host sourcetype [|tstats count WHERE index=* earliest=-10m by host,sourcetype]
| fillnull value=0 count
| where count < 1
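If you would rather not build baselineHosts.csv by hand, one possible way to seed it is from whatever has reported recently. This is only a sketch: the 7-day window is an arbitrary assumption, and you would want to review the result (and restrict the WHERE clause to your own hosts or indexes) before trusting it as a baseline.

```spl
| tstats count WHERE index=* earliest=-7d by host, sourcetype
| fields host, sourcetype
| outputlookup baselineHosts.csv
```

Run it once, or on a schedule if your set of hosts changes often. Keep in mind that a scheduled rebuild will eventually drop anything that has been silent for the whole window, so a manually reviewed baseline is safer for alerting.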
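Approach 2) Alert based on the last logged time. Get the latest event time for each host/sourcetype pair and alert if it is more than 10 minutes old. Note that the search only looks back 1 hour, so a pair that has been silent for longer than that drops out of the results entirely; widen earliest if you need to catch those.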
|tstats latest(_time) as last_logged WHERE index=* earliest=-1h by host,sourcetype|eval timeDiff=now()-last_logged| where timeDiff > 600
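The two approaches can also be combined so that only the pairs in your baseline are checked, and pairs with no events at all in the search window are still reported. A rough sketch, assuming the baselineHosts.csv lookup from Approach 1 exists; the 24-hour lookback and the 600-second threshold are just example values:

```spl
| inputlookup baselineHosts.csv
| fields host, sourcetype
| join type=left host sourcetype
    [| tstats latest(_time) as last_logged WHERE index=* earliest=-24h by host, sourcetype]
| eval timeDiff = now() - last_logged
| where isnull(last_logged) OR timeDiff > 600
```

Rows where last_logged is empty are baseline pairs that sent nothing during the lookback. If every sourcetype for a host shows up here, the forwarder itself has probably stopped; if only some do, the forwarder is likely still running and just that stream has gone quiet.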
Thanks koshyk! I appreciate the help.