I have a deployment with SHC and IDC (3 indexers), when I took down an indexer, all SHs would only see 2 search peers. I know that I could check from indexer cluster master that there are 2 running indexers while the other is down. Is there an easy way to get the same information (mostly GUID, hostname and URI) from SHs?
Hi @zhipengy [Splunk],
in DMC (Splunk Monitoring Console) there's an alert to check if all the peers are up (DMC Alert - Search Peer Not Responding), you can configure an alert on your Monitoring Console or run this alert on a Search Head.
For more information, see https://docs.splunk.com/Documentation/Splunk/8.0.3/DMC/Platformalerts .
Remember that you cannot enable the Monitoring Console on a production Search Head: the full app is very heavy on the system and requires a dedicated Search Head.
Ciao.
Giuseppe
@gcusello Thanks for your quick response! Actually, we need to get the information on a production SH. Meanwhile, we are looking for a way to get the information via REST APIs or configs, so that it can be the input of our app. The only way I found that could be run on SHs is:
| rest /services/search/distributed/peers
But it would only provide me with the running indexers.
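As a rough sketch, that same endpoint can be narrowed to just the fields mentioned above (the exact field names can vary by Splunk version; peerName, guid, host, and status are the ones I'd expect here):

| rest /services/search/distributed/peers
| table peerName guid host status

Note that this still only covers peers the SH currently knows about, so it does not by itself solve the down-vs-removed question.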
Hi @zhipengy [Splunk],
have you explored the option of using the _internal index? Something like this:
| metasearch index=_internal (host=sh1 OR host=sh2 OR host=sh3)
| stats count BY host
| append [ | makeresults | eval count=0, host="sh1" | fields host count ]
| append [ | makeresults | eval count=0, host="sh2" | fields host count ]
| append [ | makeresults | eval count=0, host="sh3" | fields host count ]
| stats sum(count) AS total BY host
| where total=0
Ciao.
Giuseppe
@gcusello Yes, I did think about the "_internal" index. The problem is that it contains historic data, right? E.g., if there were 3 running indexers, we would see events from all of them, so there would be 3 values (of course there will be more if we include SHs). Then, when one of the indexers goes down, from the SHs I'd like to know whether it is really down or has been removed from the indexer cluster. It seems impossible to get that distinction from the _internal index: whether the indexer is down or has actually been removed from the cluster, we will not see any of its events indexed into "_internal".
Hi @zhipengy [Splunk],
about the systems to monitor: if there are more than three, you can put them in a lookup and use the same approach:
| metasearch index=_internal [ | inputlookup systems.csv | eval host=lower(host) | fields host ]
| eval host=lower(host)
| stats count BY host
| append [ | inputlookup systems.csv | eval count=0, host=lower(host) | fields host count ]
| stats sum(count) AS total BY host
| where total=0
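The lookup referenced above (systems.csv is an assumed name; the hostnames are placeholders for illustration) could be as simple as a one-column CSV listing every host you expect to be up:

host
idx1
idx2
idx3

Any host present in the lookup but missing from _internal over the search window then shows up with total=0.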
In this way, you're continuously monitoring your infrastructure (you could run your alert every five minutes) and receive an alert every time there's a problem.
If a peer or an SH is removed from the cluster, you only have to update the lookup; I don't think that's an everyday job and, anyway, you have to keep your infrastructure under control.
I usually use this approach in my infrastructure to monitor all the systems and check that all the central (indexers, Master Node, SHs) and peripheral systems (HFs and UFs) are up and running.
I'm not interested in historic data.
Ciao.
Giuseppe
@gcusello That makes sense! Let me talk with my team about your suggestion. Thank you so much!