The problem:
I mistakenly added new nodes to an indexer cluster. After removing them, I found that searches from the search head return zero results. The DMC reports everything as healthy: indexes are online and have buckets. Searching an individual indexer returns data as expected, and both the "| metadata" command and the Data Summary in the Search app show event activity.
There are no errors in splunkd.log on the search head or the peers. However, when I looked at search.log, I found these lines, which seem very relevant:
03-12-2015 02:33:17.613 INFO DistributedSearchResultCollectionManager - Default search group:dmc_group_indexer
03-12-2015 02:33:17.613 INFO DistributedSearchResultCollectionManager - Connecting to peer ip-172-31-12-201 connectAll 0 connectToSpecificPeer 1
03-12-2015 02:33:17.613 INFO DistributedSearchResultCollectionManager - Not connecting to peer 'ip-172-31-5-228' because it has been optimized out. Groups
03-12-2015 02:33:17.613 INFO DistributedSearchResultCollectionManager - Not connecting to peer 'ip-172-31-5-229' because it has been optimized out. Groups
03-12-2015 02:33:17.613 INFO DistributedSearchResultCollectionManager - Not connecting to peer 'ip-172-31-5-230' because it has been optimized out. Groups
03-12-2015 02:33:17.613 INFO DistributedSearchResultCollectionManager - Not connecting to peer 'ip-172-31-5-231' because it has been optimized out. Groups
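To check whether other searches hit the same problem, the search.log messages above can be scanned for the default group and the "optimized out" peers. This is only a minimal Python sketch: it assumes the message wording is exactly as in the excerpt, and summarize_search_log is a hypothetical helper, not part of Splunk.

```python
import re

# Both patterns assume the exact message wording shown in the search.log excerpt.
DEFAULT_GROUP = re.compile(
    r"DistributedSearchResultCollectionManager - Default search group:(?P<group>\S+)"
)
OPTIMIZED_OUT = re.compile(
    r"DistributedSearchResultCollectionManager - Not connecting to peer "
    r"'(?P<peer>[^']+)' because it has been optimized out"
)


def summarize_search_log(lines):
    """Return (default search group, list of peers skipped as "optimized out")."""
    default_group = None
    skipped_peers = []
    for line in lines:
        match = DEFAULT_GROUP.search(line)
        if match:
            default_group = match.group("group")
            continue
        match = OPTIMIZED_OUT.search(line)
        if match:
            skipped_peers.append(match.group("peer"))
    return default_group, skipped_peers
```

Pass it an open file handle for a job's search.log. If every indexer in the default group appears in the skipped list, the search is likely not reaching any remote peer, which would match the zero-results symptom described here.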
My environment:
Contents of distsearch.conf on the SH:
[distributedSearch]
[distributedSearch:dmc_group_search_head]
servers = localhost:localhost
[distributedSearch:dmc_group_cluster_master]
[distributedSearch:dmc_group_license_master]
servers = localhost:localhost
[distributedSearch:dmc_group_indexer]
default = true
servers = 172.31.5.228:8089,172.31.5.229:8089,172.31.5.230:8089,172.31.5.231:8089,localhost:localhost
[distributedSearch:dmc_group_deployment_server]
[distributedSearch:dmc_group_kv_store]
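A quick way to see which DMC group is acting as the default search target is to parse distsearch.conf. Below is a rough Python sketch using the standard configparser module; it assumes the file is INI-style like the excerpt above, and dmc_groups is a hypothetical helper name.

```python
import configparser

PREFIX = "distributedSearch:"


def dmc_groups(conf_text):
    """Map each dmc_* search group to (is_default, [servers])."""
    # Splunk .conf files use "=" only, so ":" is not treated as a key/value
    # delimiter; strict=False tolerates repeated stanzas or keys.
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.read_string(conf_text)
    groups = {}
    for section in parser.sections():
        if not section.startswith(PREFIX + "dmc_"):
            continue
        stanza = parser[section]
        servers = [s.strip() for s in stanza.get("servers", "").split(",") if s.strip()]
        is_default = stanza.get("default", "false").strip().lower() == "true"
        groups[section[len(PREFIX):]] = (is_default, servers)
    return groups
```

Run against the file above, it should report dmc_group_indexer as the only default group, with the four indexers plus localhost:localhost as members. Note that configparser rejects settings placed before the first stanza, which Splunk .conf files allow, so such files would need extra handling.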
The search groups in your distsearch.conf file were added when the Distributed Management Console was configured in distributed mode. These search groups exist to optimize the searches dispatched by the DMC views. Because they can change the default behavior of your searches in other apps, we do not recommend (and, as a matter of fact, do not support) that you set up the Distributed Managemen.... In your deployment, the DMC should be set up on the Cluster Master.
That being said, the issue you are observing is probably caused by a known product defect (SPL-95114), in which instability in an indexer cluster can cause distributed search groups to lose the internal reference to their members. When this happens, the default target group "dmc_group_indexer" behaves as if it had no members even though they are listed in distsearch.conf. As a result, your searches are not dispatched to any remote peers unless you specify a splunk_server=* or splunk_server_group=* clause.
Suggested corrective actions:
Do not set up the Distributed Management Console in distributed mode on your production search head. Only do so on the Cluster Master (in an indexer clustering environment) or on a search head dedicated to this function, to which only admins have access.
On your production search head, reset the Distributed Management Console to factory defaults with this procedure:
1. Delete the $SPLUNK_HOME/etc/apps/splunk_management_console/local directory.
2. Delete the $SPLUNK_HOME/etc/apps/splunk_management_console/lookups directory.
3. In $SPLUNK_HOME/etc/system/local/distsearch.conf, delete any stanzas that reference distributed search groups created by the Distributed Management Console. These will be named dmc_*.
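The three reset steps above could also be scripted. The sketch below is an illustration only, not an official Splunk tool: reset_dmc and strip_dmc_stanzas are hypothetical names, it only reports what it would do unless dry_run=False, and the distsearch.conf.bak backup it writes before editing is my own safety addition.

```python
import re
import shutil
from pathlib import Path

# Assumes DMC-created groups always use the dmc_ prefix, as stated above.
DMC_STANZA = re.compile(r"^\s*\[distributedSearch:dmc_[^\]]*\]\s*$")
ANY_STANZA = re.compile(r"^\s*\[[^\]]*\]\s*$")


def strip_dmc_stanzas(text):
    """Return distsearch.conf text with every dmc_* group stanza removed."""
    kept = []
    skipping = False
    for line in text.splitlines(keepends=True):
        if ANY_STANZA.match(line):
            # Each stanza header decides whether the lines after it are kept.
            skipping = bool(DMC_STANZA.match(line))
        if not skipping:
            kept.append(line)
    return "".join(kept)


def reset_dmc(splunk_home, dry_run=True):
    """List (and, if dry_run is False, perform) the three reset steps."""
    home = Path(splunk_home)
    app = home / "etc" / "apps" / "splunk_management_console"
    actions = []
    for sub in ("local", "lookups"):
        target = app / sub
        if target.is_dir():
            actions.append(f"delete {target}")
            if not dry_run:
                shutil.rmtree(target)
    conf = home / "etc" / "system" / "local" / "distsearch.conf"
    if conf.is_file():
        original = conf.read_text()
        cleaned = strip_dmc_stanzas(original)
        if cleaned != original:
            actions.append(f"remove dmc_* stanzas from {conf}")
            if not dry_run:
                # Keep a copy of the original file before rewriting it.
                shutil.copy2(conf, conf.with_name(conf.name + ".bak"))
                conf.write_text(cleaned)
    return actions
```

Calling reset_dmc with your $SPLUNK_HOME path first, and reviewing the returned list before running it again with dry_run=False, keeps the destructive steps explicit. Line-based stanza removal is used instead of rewriting the file through configparser so that comments and the remaining stanzas are left untouched.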