I have an indexer cluster where one peer is down (down as in we're talking to Cisco about it). When I run splunk list search-server on one of our search heads, sometimes it shows the IP of the down peer as Down, and other times it doesn't list the downed peer at all.
I have an alert that monitors the output of that command and sends it to my SOC so they know when there is a connection problem, so I'm getting a lot of tickets. Why does the search head see the peer sometimes and not others? How do I stop it? The box is entirely unreachable.
The setup is standard, where the search-head connects to the cluster master, and the indexer is still listed on the master because it hasn't been long enough for it to age out.
Every search involves the full list of search peers. Since you know that one peer is down, you can remove it from your indexer cluster until the server is back up.
Follow the steps below to remove the peer. (Read the notes before performing any of them, and make sure the master is in maintenance mode.)
Best practice for taking the search peer offline:
Hope this helps.
That's not exactly what I'm asking, although it is very helpful.
The behaviour is: I have 4 peers, one of which is down. When I run splunk list search-server, sometimes I see 3 peers, all up, and sometimes I see 4, one of which is marked Down.
This throws off the monitoring system, as it appears to be flapping.
It wouldn't show up in there, as that only has static search-servers. This is an index cluster member, which means that the search-servers are added dynamically by the cluster master. The cluster master, in turn, doesn't poll for indexers, but receives indexers as they say they are available.
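Because the peer list is dynamic, one way to keep the alert from flapping might be to compare each run against a fixed inventory of peers and treat a peer that is missing from the listing the same as one marked Down. The following is only a rough sketch: the addresses are made up, the status strings "Up" and "Down" are assumed from what I see on screen, and turning the real command output into the observed dictionary is left out because it depends on your version's output format.

```python
from typing import Dict, Iterable


def classify_peers(expected: Iterable[str], observed: Dict[str, str]) -> Dict[str, str]:
    """Return one stable status per expected peer.

    A peer that is absent from the observed listing is reported as "Down"
    instead of being silently dropped, so both variants of the listing
    produce the same result.
    """
    result = {}
    for peer in expected:
        status = observed.get(peer)  # None when the peer is not listed at all
        result[peer] = "Up" if status == "Up" else "Down"
    return result


# Hypothetical inventory of the four cluster peers (addresses are made up).
EXPECTED = ["10.0.0.1:8089", "10.0.0.2:8089", "10.0.0.3:8089", "10.0.0.4:8089"]

# Run A: the down peer is not listed at all.
run_a = {"10.0.0.1:8089": "Up", "10.0.0.2:8089": "Up", "10.0.0.3:8089": "Up"}

# Run B: the down peer is listed and marked Down.
run_b = dict(run_a, **{"10.0.0.4:8089": "Down"})

status_a = classify_peers(EXPECTED, run_a)
status_b = classify_peers(EXPECTED, run_b)
```

With this, both runs yield the same status for the fourth peer, so the SOC would get one steady Down instead of a new ticket every time the listing changes shape.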
One possible scenario to explain your problem:
- the specifications or configuration of that indexer are different -> it has more work to do OR it is less powerful than the others
- at times you generate more search load than the indexer can handle -> load spike on this indexer
- the indexer is then slow to respond to the search head -> the search head thinks it's down
So I would:
- investigate why that indexer is different from the others
- check the search concurrency limit on the search head and lower it if needed (lowering the scheduler percentage may be enough)
- spread your search load over time (schedule window, continuous scheduling, skew, ...)
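To get a feel for the concurrency limit mentioned above, here is a small worked calculation. It assumes the limits.conf settings max_searches_per_cpu, base_max_searches and max_searches_perc with the defaults I remember (1, 6 and 50); check the spec file for your version. The 16-CPU search head and the rounding down are assumptions for illustration only.

```python
def max_historical_searches(cpus: int,
                            max_searches_per_cpu: int = 1,
                            base_max_searches: int = 6) -> int:
    """Concurrent historical search limit on a search head:
    max_searches_per_cpu * number_of_cpus + base_max_searches."""
    return max_searches_per_cpu * cpus + base_max_searches


def max_scheduled_searches(cpus: int, max_searches_perc: int = 50) -> int:
    """Share of the limit available to the scheduler.

    Rounding down is an assumption here; the real scheduler may round
    differently.
    """
    return max_historical_searches(cpus) * max_searches_perc // 100


# Hypothetical 16-CPU search head.
total = max_historical_searches(16)          # 1 * 16 + 6 = 22
scheduled_default = max_scheduled_searches(16)       # 50% of 22 = 11
scheduled_lowered = max_scheduled_searches(16, 25)   # 25% of 22, rounded down = 5
```

So on a box like that, lowering the scheduler percentage from 50 to 25 would roughly halve the number of scheduled searches that can hit the indexers at the same time.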