I have a 6-node Search Head Cluster deployment. What are my options for finding out when the captain switched over?
Here are two options that will work for you:
Option 1: Search splunkd.log for the message "SHPoolingMgr - Making node the captain". Only the node that logs this message becomes the captain.
index=_internal (host=SH1 OR host=SH2 OR host=SH3 OR host=SH4 OR host=SH5 OR host=SH6) sourcetype=splunkd "Making node the captain" | table host _raw
Option 2: You can search metrics.log on a node to check if it has been the Captain. For example:
08-31-2016 18:27:07.094 +0000 INFO Metrics - group=captainstability, stable_follower_pct=0, stable_captain_pct=100, num_polled_captain=155, num_polled_follower=0, num_polled_candidate=0, upgrades_to_captain=0, downgrades_from_captain=0, captain_changes=0
These values change every poll period, so look for stable_captain_pct > 0: a node that reports this during a given time frame was the captain during that time.
A search like:
index=_internal (host=SH1 OR host=SH2 OR host=SH3 OR host=SH4 OR host=SH5 OR host=SH6) sourcetype=metrics stable_captain_pct > 0 | timechart count by host
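If you export the metrics.log events instead of charting them in Splunk, the same check can be sketched in Python. This is only a minimal sketch: the helper names `parse_metrics_line` and `was_captain` are made up for illustration, and it assumes every captainstability line uses the comma-separated key=value layout shown in the sample event above.

```python
import re

# Matches key=value pairs such as stable_captain_pct=100 in a metrics.log line.
_KV = re.compile(r"(\w+)=([^,\s]+)")

# Sample event copied from the metrics.log example in this thread.
SAMPLE = (
    "08-31-2016 18:27:07.094 +0000 INFO Metrics - group=captainstability, "
    "stable_follower_pct=0, stable_captain_pct=100, num_polled_captain=155, "
    "num_polled_follower=0, num_polled_candidate=0, upgrades_to_captain=0, "
    "downgrades_from_captain=0, captain_changes=0"
)


def parse_metrics_line(line):
    """Return the key=value pairs of one metrics.log line as a dict of strings."""
    return dict(_KV.findall(line))


def was_captain(line):
    """True if the line is a captainstability event with stable_captain_pct > 0."""
    fields = parse_metrics_line(line)
    if fields.get("group") != "captainstability":
        return False
    return float(fields.get("stable_captain_pct", 0)) > 0


if __name__ == "__main__":
    print(was_captain(SAMPLE))
```

Running `was_captain` over each node's exported lines and noting where the answer flips gives a rough switch-over timeline, with the same poll-period granularity as the search above.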
You could also script it using:
/opt/splunk/bin/splunk show shcluster-status
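A rough sketch of what such a script might look like in Python. It assumes the command prints a "Captain:" section made of indented "key : value" lines (for example label, mgmt_uri and elected_captain), which is how the output looks on the versions I have seen; check it against your own output before relying on it. The function names are hypothetical, and the CLI may prompt for credentials if you are not already logged in.

```python
import subprocess


def parse_captain_section(status_text):
    """Collect the "key : value" lines that follow the "Captain:" header."""
    captain = {}
    in_captain = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "Captain:":
            in_captain = True
            continue
        if not in_captain:
            continue
        # Split on the first " : " only, so timestamps and URIs stay intact.
        key, sep, value = line.partition(" : ")
        if not sep:
            if captain:
                # First line that is not key/value ends the Captain section.
                break
            continue
        captain[key.strip()] = value.strip()
    return captain


def show_captain(splunk_bin="/opt/splunk/bin/splunk"):
    """Run `splunk show shcluster-status` and return the parsed Captain section."""
    result = subprocess.run(
        [splunk_bin, "show", "shcluster-status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_captain_section(result.stdout)
```

Calling `show_captain()` from cron and logging the label whenever it changes would give you the switch-over times without touching the internal indexes.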
Use the Distributed Management Console (DMC). I have it enabled on our deployer.
In the DMC, go to search -> search head clustering -> status and configuration.
Look for the captain election activity panel and, to its right, the captain selection details panel.
Thanks Burwell,
This is the right solution!
Heyo! Stumbled across this thread and thought I'd offer up an easier alternative than using the internal or metrics logging, if you've got privileges to hit the REST endpoints.
You can also use this to output the captain info via SPL:
| rest /services/shcluster/status splunk_server=local
This outputs the captain information in the captain.* fields. We use `splunk_server=local` to avoid querying other SHC / IDX members for captain info (which will throw errors), since we only need information from the SH we're running it on.
See: https://docs.splunk.com/Documentation/Splunk/latest/RESTREF/RESTcluster#shcluster.2Fstatus
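If you would rather hit the endpoint from outside Splunk, you can request it on the management port with `output_mode=json` and pull the captain out of the response. A minimal sketch, assuming the JSON response nests the captain.* fields as a `captain` object under the first entry's `content` (this mirrors the field names the `| rest` search returns, but verify it against your own response). `captain_from_status` is a made-up helper name.

```python
import json


def captain_from_status(response_text):
    """Return the captain object from a shcluster/status JSON response body.

    Assumes the layout {"entry": [{"content": {"captain": {...}}}]}.
    Returns an empty dict if that structure is not present.
    """
    payload = json.loads(response_text)
    entries = payload.get("entry", [])
    if not entries:
        return {}
    return entries[0].get("content", {}).get("captain", {})
```

How you fetch the body is up to you (curl, `urllib`, `requests`); polling it on a schedule and recording the captain's label each time gives a simple history of captain changes.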