I'm working on 2 scenarios, and I'm sure you guys can help me.
We have about 15 indexes.
Sometimes the applications that send data to Splunk get stuck and no data is indexed.
What I need is to alert the user every time the last event indexed in an index is older than 1 hour.
Using the metadata command I'm able to deal with only one index at a time, with a search like the one below.
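For reference, this is roughly what I run today for a single index (my_index is just a placeholder):

| metadata type=sourcetypes index=my_index | stats max(recentTime) AS lastEvent | where now() - lastEvent > 3600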
We have now about 50 forwarders sending data to Splunk.
What is the best way to alert the user if any forwarder has not phoned home in the last hour?
This method is applicable to both of your scenarios.
You have to create a lookup with all the indexes or forwarders to monitor (e.g. indexes.csv or perimeter.csv) and then run a search like the ones below, using one hour as the time period.
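For example, a minimal sketch of the two lookups, assuming a single column whose name matches the split-by field of the search (the index and host values here are placeholders):

indexes.csv:
index
main
firewall
app_logs

perimeter.csv:
host
fwd-web-01
fwd-db-01

The search for the index scenario: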
| inputlookup indexes.csv | eval count=0 | append [ search index=* | stats count by index ] | stats sum(count) AS total by index | where total=0
(instead of using index=*, you could create a tag or an eventtype covering all the indexes you want to monitor)
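A lighter variant of the same idea, a sketch assuming a reasonably recent Splunk version: tstats only scans index metadata instead of raw events, so it is much faster over large indexes:

| inputlookup indexes.csv | eval count=0 | append [| tstats count where index=* by index ] | stats sum(count) AS total by index | where total=0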
The equivalent search for the forwarder scenario, grouping by host rather than index, since perimeter.csv lists forwarder hostnames (every forwarder writes to _internal, so a host that is absent there has not phoned home):

| inputlookup perimeter.csv | eval count=0, host=lower(host) | append [ search index=_internal | eval host=lower(host) | stats count by host ] | stats sum(count) AS total by host | where total=0
In this way you get the indexes and forwarders that didn't send any logs during the selected period.
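To actually alert the user, save the search as a scheduled alert that fires whenever it returns results. A minimal sketch in savedsearches.conf, assuming you alert by email (the stanza name, schedule, and address are placeholders):

[Alert - silent indexes]
search = | inputlookup indexes.csv | eval count=0 | append [ search index=* | stats count by index ] | stats sum(count) AS total by index | where total=0
dispatch.earliest_time = -1h
dispatch.latest_time = now
cron_schedule = */15 * * * *
enableSched = 1
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = splunk-admins@example.com

The same stanza pointed at perimeter.csv (and the forwarder search above) covers your second scenario.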
If you like, instead of the | where total=0 condition you could use the rangemap command and visualize the status of the indexes or forwarders on a dashboard (also graphically), as in the sketch below.
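For example (rangemap adds a range field that dashboard tables and single-value panels can color by; the range names severe/ok are arbitrary labels):

| inputlookup indexes.csv | eval count=0 | append [ search index=* | stats count by index ] | stats sum(count) AS total by index | rangemap field=total severe=0-0 default=ok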