We have a lot of searches that run to ensure we are receiving data from a Splunk forwarder and that it is still running. To do this, we have a search set up for each forwarder checking against the internal logs:
index="_internal" source="*metrics.log" group=tcpin_connections | eval sourceHost=if(isnull(hostname), sourceHost,hostname) | search sourceHost="phpdbo01"
The above alerts us when the forwarder phpdbo01 is down or not sending data back to the indexers. The problem is that in many cases, as with the alert above, an absence of data is what triggers the alert. When the above search runs while the indexers are down, Splunk returns 0 results and so sends out an alert that the phpdbo01 forwarder is down. Instead, if Splunk attempts to run a search while the indexers are down, we would like a default message to come through, such as "Splunk indexers are down please resubmit your search once they come back up". I was trying to work with the search below, but I am not having much luck:
index="_internal" source="*splunkd.log" host="pl-wlmsplpp01" "Unable to distribute to peer named" | rex field=_raw "Unable to distribute to peer named (?<indexer>.*):\d+ at " | eval status=if(indexer=="pl-wlmsplpp04" OR indexer=="pl-wlmsplpp03","Down","Up") | eval search_result = if(status!="Down",[search index="_internal" source="*metrics.log" group=tcpin_connections | eval sourceHost=if(isnull(hostname), sourceHost,hostname) | search sourceHost="phpdbo01"], "Indexer is down. please run search again at a later date") | table search_result
Unfortunately, I have been unable to get the above working. Can anyone help or have you already done something similar that could be adapted?
Try something like this:
| rest /services/search/distributed/peers splunk_server=local | where status!="Up" | stats count as IndexerDown | appendcols [search index="_internal" source="*metrics.log" group=tcpin_connections | eval sourceHost=if(isnull(hostname), sourceHost,hostname) | search sourceHost="phpdbo01" | stats count as ForwarderUp] | where IndexerDown>0 OR ForwarderUp>0
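With the alert set to trigger on 0 results, as in your existing alert, this will alert only when both IndexerDown=0 (all indexers are up) and ForwarderUp=0 (no events from the forwarder). If any indexer is down or the forwarder is sending data, the search returns a row and the alert stays quiet.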
Shouldn't that read | where IndexerDown=0 AND ForwarderUp=0 ?
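The two where clauses are complements of each other, so which one is right depends on how the alert is triggered. As a quick check, here is a sketch that can be pasted into a search bar. It uses makeresults to fabricate the four possible combinations of IndexerDown and ForwarderUp rather than reading real data, and the field names n, row_returned and alert_on_zero_results are made up for illustration:

```
| makeresults count=4
| streamstats count as n
``` fabricate every combination of the two counters (illustrative values only) ```
| eval IndexerDown=if(n>2,1,0), ForwarderUp=if(n%2==0,1,0)
``` would the original where clause keep this row? ```
| eval row_returned=if(IndexerDown>0 OR ForwarderUp>0,"yes","no")
``` an alert that triggers on 0 results fires only when no row is returned ```
| eval alert_on_zero_results=if(row_returned=="no","fires","suppressed")
| table IndexerDown ForwarderUp row_returned alert_on_zero_results
```

Only the combination IndexerDown=0 and ForwarderUp=0 is dropped by the original where clause, so an alert that triggers when the number of results is 0 fires for that case alone. Using | where IndexerDown=0 AND ForwarderUp=0 instead should behave the same way, provided the alert condition is changed to trigger when the number of results is greater than 0. The triple-backtick comments assume a Splunk version that supports inline search comments; they can be deleted without changing the result.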