Deployment Architecture

How to find a stopped indexer from search head with SHC and IDC deployment

Splunk Employee

I have a deployment with an SHC and an IDC (3 indexers). When I took down an indexer, all the SHs would only see 2 search peers. I know I could check from the indexer cluster master that there are 2 running indexers while the other is down. Is there an easy way to get the same information (mostly GUID, hostname, and URI) from the SHs?

1 Solution

Legend

Hi @zhipengy [Splunk],
in the DMC (Splunk Monitoring Console) there's a platform alert that checks whether all the peers are up (DMC Alert - Search Peer Not Responding); you can enable it on your Monitoring Console or run the same search as an alert on a Search Head.
For more info, see https://docs.splunk.com/Documentation/Splunk/8.0.3/DMC/Platformalerts .
Remember that you shouldn't enable the Monitoring Console on a production Search Head: the full app is very heavy on the system and requires a dedicated Search Head.

Ciao.
Giuseppe


Splunk Employee

@gcusello Thanks for your quick response! Actually, we need to get this information on a production SH. In the meantime, we are looking for a way to get it via REST APIs or configs, so that it can serve as input to our app. The only search I found that can run on the SHs is:
| rest /services/search/distributed/peers
But it only returns the running indexers.
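One possible workaround, assuming the cluster master is also configured as a (non-indexing) search peer of the SH: the cluster master's own peers endpoint lists every peer with its status, not just the running ones. A sketch, where cm-host is a placeholder for the cluster master's server name as known to the SH, and where (as of Splunk 8.x) the title field carries the peer GUID:

| rest /services/cluster/master/peers splunk_server=cm-host
| table title label host_port_pair status

Peers that are down should show a status other than Up, so you could filter with | search status!="Up" to find them.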


Legend

Hi @zhipengy [Splunk],
have you explored using the _internal index? Something like this:

| metasearch index=_internal (host=sh1 OR host=sh2 OR host=sh3)
| stats count BY host
| append [ | makeresults | eval count=0, host="sh1" | fields host count ]
| append [ | makeresults | eval count=0, host="sh2" | fields host count ]
| append [ | makeresults | eval count=0, host="sh3" | fields host count ]
| stats sum(count) AS total BY host
| where total=0

Ciao.
Giuseppe


Splunk Employee

@gcusello Yes, I did think about the _internal index. The problem is that it holds historical data, right? E.g., while there were 3 running indexers, we would see events from all of them, so there would be 3 values (more, of course, if we include the SHs). Then when one of the indexers goes down, from the SHs I'd like to know whether it's really down or has been removed from the indexer cluster. It seems impossible to get that distinction from the _internal index: whether the indexer is down or has actually been removed from the cluster, we simply stop seeing events from it in _internal.


Legend

Hi @zhipengy [Splunk],
about the systems to monitor: if there are more than three, you can put them in a lookup and use the same approach:

 | metasearch index=_internal [ | inputlookup systems.csv | eval host=lower(host) | fields host ]
 | eval host=lower(host)
 | stats count BY host
 | append [ | inputlookup systems.csv | eval count=0, host=lower(host) | fields host count ]
 | stats sum(count) AS total BY host
 | where total=0

In this way, you're continuously monitoring your infrastructure (you could run the alert every five minutes) and receive a notification every time there's a problem.
If a peer or an SH is removed from the cluster, you only have to update the lookup; I don't think that's an everyday job and, in any case, you need to keep your infrastructure under control.
I usually use this approach in my infrastructures to monitor all the systems and check that both the central (indexers, Master Node, SHs) and peripheral systems (HFs and UFs) are up and running.
I'm not interested in historical data.
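To run this check every five minutes as suggested, the search can be saved as a scheduled alert. A minimal savedsearches.conf sketch, assuming the lookup-based search above; the stanza name and email address are placeholders, and the alert fires whenever at least one host is missing:

# savedsearches.conf -- sketch with placeholder names
[Missing Hosts Alert]
cron_schedule = */5 * * * *
dispatch.earliest_time = -10m
dispatch.latest_time = now
alert_type = number of events
alert_comparator = greater than
alert_threshold = 0
action.email = 1
action.email.to = ops@example.com
search = | metasearch index=_internal [ | inputlookup systems.csv | eval host=lower(host) | fields host ] \
| eval host=lower(host) | stats count BY host \
| append [ | inputlookup systems.csv | eval count=0, host=lower(host) | fields host count ] \
| stats sum(count) AS total BY host | where total=0

The 10-minute dispatch window gives each host two polling cycles of margin before it is reported as silent.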

Ciao.
Giuseppe


Splunk Employee

@gcusello That makes sense! Let me talk with my team about your suggestion. Thank you so much!
