I'm trying to confirm that replication and searching can happen on one NIC while ingesting happens over a different NIC.
I have the following simple test setup:
3 indexers in a cluster, each with 2 NICs
1 forwarder sending to all three indexers
The search head is connected to the master, and under Settings > Distributed search > Search peers (or on the command line) I see all three indexers in the cluster:
splunk list search-server
Server at URI "dsplunk-index-test-01.oit.duke.edu:8089" with status as "Up"
Server at URI "splunk-index-test-01-private.oit.duke.edu:8089" with status as "Up"
Server at URI "splunk-index-test-02-private.oit.duke.edu:8089" with status as "Up"
Server at URI "splunk-index-test-03-private.oit.duke.edu:8089" with status as "Up"
But I only see results from one indexer when I search from the web GUI on the search-head, or from its command line.
This is my command-line search:

splunk search "index=* | chart count by splunk_server"

I'm using the same search in the web GUI, just the part inside the quotes.
If I run the command-line search on the indexers individually I get results from the specific search-peer.
If I run the command-line search on the master, I get results from all three search-peers.
splunk_server          count
--------------------   -----
splunk-index-test-01      57
splunk-index-test-02      39
splunk-index-test-03     456
If I run the command-line search from the search-head I get one result.
splunk_server          count
--------------------   -----
splunk-index-test-01      57
If I had misconfigured the search head's connection to the master, I wouldn't see the search peers in the list search-server output, or I wouldn't see any results at all. As it is, it makes no sense that one of the three indexers shows up and the other two don't. Firewalls are open from the search head to both NICs on all three indexers; I can telnet to port 8089 from the search head to both NICs on all three boxes.
Here's the snippet from server.conf on the search-head:
[clustering]
master_uri = https://splunk-master-test-01.oit.duke.edu:8089
mode = searchhead
pass4SymmKey = $1$7/FK0zLe7w3j3t4lkTuxrXaNBB9vpccQ==
And from the master:
[clustering]
cluster_label = oit
mode = master
pass4SymmKey = $1$bYZ2q5Vu//5VNuiwljjQlH9xYhGBKA==
replication_factor = 2
search_factor = 1
(pass4SymmKeys have been changed)
splunk show cluster-status shows that everything is up and searchable, all green lights.
How do I get my search-head to believe that it actually should be able to see the other search-peers?
Any indication in splunkd.log that the search head can successfully connect to all search peers?
Also, have you taken a look at the search job inspector, specifically search.log in the UI?
I can telnet to both nics on all 3 boxes over port 8089.
I inspected this job:
This is under normalizedSearch:
litsearch ( index=* ) ( ( splunk_server=splunk-index-test-01* ) )
| fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server"
| remotetl nb=300 et=1507753287.000000 lt=1507839687.000000 remove=true maxcount=1000 max_prefetch=100
It looks from this like the search-head believes there is only one search-peer.
Further, if I specify the splunk_server in the search, I get:

"Search filters specified using splunk_server/splunk_server_group do not match any search peer."
Even though that search peer is listed under Distributed search > Search peers. If the search head were unable to see them, they would not show up there.
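One thing that might narrow this down: the search head keeps its own distributed-peer list, separate from the master's view. A REST search against the local endpoint should show what the search head itself believes (field names can vary by version; title and status are the ones I'd check):

```
| rest /services/search/distributed/peers splunk_server=local
| table title status
```

If only one peer, or only one NIC's hostname, shows up here while list search-server shows all three, that points at a stale or conflicting peer list on the search head.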
splunkd.log shows no connection problems.
The search.log shows that it is kind of aware of the other two search-peers:
10-12-2017 16:42:57.557 INFO DistributedSearchResultCollectionManager - Connecting to peer splunk-index-test-02 connectAll 0 connectToSpecificPeer 1
10-12-2017 16:42:57.557 INFO DistributedSearchResultCollectionManager - Connecting to peer splunk-index-test-03 connectAll 0 connectToSpecificPeer 1
10-12-2017 16:42:57.563 INFO DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-02 in 0.003000 seconds
10-12-2017 16:42:57.565 INFO DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-03 in 0.003000 seconds
I'm not sure how helpful this is, given that it says they don't exist when specified directly.
Strange. If you haven't already, I would try to remove/re-add the search head to the cluster and see if that helps. I'll dig to see if I can find the REST call to get the list of search peers from the cluster master.
I'm not sure what you mean by removing the search head from the cluster. It's an indexer cluster, and the search head is a singleton.
I'd appreciate that REST call.
Not sure if this will work from your search head, but worth a try:
|rest /services/cluster/master/peers count=0 splunk_server=splunk-master-test-01
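If the | rest command won't run from your search head, the same endpoint can be queried directly with curl from its command line (the admin credentials below are placeholders, and -k just skips certificate checking on Splunk's default self-signed cert):

```
curl -k -u admin:changeme "https://splunk-master-test-01.oit.duke.edu:8089/services/cluster/master/peers?output_mode=json"
```

That returns the master's view of the peers, which you can then compare against what the search head reports locally.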
I did just find "Not connecting to peer 'splunk-index-test-02' because it has been optimized out. Peername and none of it's search groups  match the query."
Searching docs.splunk.com for either of these phrases gets me nothing. I do wish that the error log wording showed up in the documentation.
I also searched for DistributedSearchResultCollectionManager.
Do you by any chance still have a distsearch.conf file on your search head?
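For reference, this is roughly what a leftover static peer list would look like (the hostname below is just illustrative). If a stanza like this exists in etc/system/local/distsearch.conf on the search head, it can pin the peer list independently of what the cluster master pushes:

```
[distributedSearch]
servers = https://splunk-index-test-01-private.oit.duke.edu:8089
```

Removing or emptying that stanza and restarting the search head would let the cluster-supplied peer list take over.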
Re-reading your original question, it almost sounds like you have connected it to the cluster master AND also configured search peers manually in the distributed search setup...?
Maybe I am misreading...
This is just a wild hair idea, but if you are replicating data between three indexers and all the data got replicated onto one indexer, then connecting to the other two wouldn't really be necessary, would it?