Why am I only seeing results from one search-peer?

I'm trying to confirm that replication and searching can happen on one NIC while ingesting happens over a different NIC.
I have the following simple test setup:

3 indexers in a cluster, each with 2 NICs...
1 master
1 search-head
1 forwarder sending to all three indexers

The search-head is connected to the master, and in Settings > Distributed search > Search peers, or on the command line, I see all three indexers in the cluster:

splunk list search-server
Server at URI "dsplunk-index-test-01.oit.duke.edu:8089" with status as "Up"
Server at URI "splunk-index-test-01-private.oit.duke.edu:8089" with status as "Up"
Server at URI "splunk-index-test-02-private.oit.duke.edu:8089" with status as "Up"
Server at URI "splunk-index-test-03-private.oit.duke.edu:8089" with status as "Up"

But I only see results from one indexer when I search from the web GUI on the search-head, or from its command line.

This is my command line search: splunk search "index=* | chart count by splunk_server"
I'm using the same search in the web GUI, just everything inside the quotes.

If I run the command-line search on the indexers individually I get results from the specific search-peer.

If I run the command-line search on the master, I get results from all three search-peers.

    splunk_server        count
    -------------------- -----
    splunk-index-test-01    57
    splunk-index-test-02    39
    splunk-index-test-03   456

If I run the command-line search from the search-head I get one result.

      splunk_server     count      
    -------------------- -----                                                                   
    splunk-index-test-01   57

If I had pointed the search-head at the master incorrectly, I wouldn't see the search peers in the list search-server output, or I wouldn't see any results at all. As it is, it makes no sense that one of the 3 indexers shows up and the other two don't. Firewalls are open to the search-head for both NICs on all 3 indexers, and I can telnet to port 8089 from the search-head to both NICs on all 3 boxes.

Here's the snippet from server.conf on the search-head:

[clustering]
master_uri = https://splunk-master-test-01.oit.duke.edu:8089
mode = searchhead
pass4SymmKey = $1$7/FK0zLe7w3j3t4lkTuxrXaNBB9vpccQ==

And from the master:

    [clustering]            
    cluster_label = oit                                        
    mode = master       
    pass4SymmKey = $1$bYZ2q5Vu//5VNuiwljjQlH9xYhGBKA==
    replication_factor = 2         
    search_factor = 1   

(pass4SymmKeys have been changed)

show cluster-status shows that everything is up and searchable, all green lights.
How do I get my search-head to believe that it actually should be able to see the other search-peers?


10-13-2017 14:04:14.654 INFO  dispatchRunner - initing LicenseMgr in search process: nonPro=0
10-13-2017 14:04:14.655 INFO  dispatchRunner - registering build time modules, count=1
10-13-2017 14:04:14.655 INFO  dispatchRunner - registering search time components of build time module name=vix
10-13-2017 14:04:14.655 INFO  dispatchRunner - Splunkd starting (build aa7d4b1ccb80).
10-13-2017 14:04:14.655 INFO  dispatchRunner - System info: Linux, splunk-search-head-test-01, 3.10.0-693.2.2.el7.x86_64, #1 SMP Sat Sep 9 03:55:24 EDT 2017, x86_64.
10-13-2017 14:04:14.656 INFO  dispatchRunner - Detected 1 (virtual) CPUs, 1 CPU cores, and 975MB RAM
10-13-2017 14:04:14.656 INFO  dispatchRunner - Maximum number of threads (approximate): 487
10-13-2017 14:04:14.656 INFO  dispatchRunner - Arguments are: "search" "--id=1507917854.41" "--maxbuckets=0" "--ttl=600" "--maxout=500000" "--maxtime=8640000" "--lookups=1" "--reduce_freq=10" "--user=bryn" "--pro" "--roles=admin:user"
10-13-2017 14:04:14.656 INFO  dispatchRunner - Getting search configuration data from: /opt/splunk/etc/modules/parsing/config.xml
10-13-2017 14:04:14.662 INFO  BundlesSetup - Setup stats for /opt/splunk/etc: wallclock_elapsed_msec=24, cpu_time_used=0.021666, shared_services_generation=2, shared_services_population=1
10-13-2017 14:04:14.665 WARN  AuthorizationManager - Capability 'delete_by_keyword' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.665 WARN  AuthorizationManager - Capability 'edit_view_html' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.665 WARN  AuthorizationManager - Capability 'list_httpauths' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.665 WARN  AuthorizationManager - Capability 'rtsearch' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.666 WARN  AuthorizationManager - Capability 'delete_by_keyword' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.666 WARN  AuthorizationManager - Capability 'edit_view_html' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.666 WARN  AuthorizationManager - Capability 'list_httpauths' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.666 WARN  AuthorizationManager - Capability 'rtsearch' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.666 WARN  AuthorizationManager - Capability 'delete_by_keyword' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.666 WARN  AuthorizationManager - Capability 'edit_view_html' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.666 WARN  AuthorizationManager - Capability 'rtsearch' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.666 WARN  AuthorizationManager - Capability 'schedule_search' had value 'disable' - only 'enabled' is valid. Ignoring...
10-13-2017 14:04:14.667 INFO  UserManagerPro - Load authentication: forcing roles="admin, user"
10-13-2017 14:04:14.671 INFO  SessionManager - auth tokens will be generated with shpooling shared secret
10-13-2017 14:04:14.671 INFO  UserManager - Setting user context: splunk-system-user
10-13-2017 14:04:14.671 INFO  UserManager - Done setting user context: NULL -> splunk-system-user
10-13-2017 14:04:14.672 INFO  UserManager - Unwound user context: splunk-system-user -> NULL
10-13-2017 14:04:14.672 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.672 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:14.678 INFO  dispatchRunner - search context: user="bryn", app="search", bs-pathname="/opt/splunk/etc"
10-13-2017 14:04:14.685 INFO  SearchParser - PARSING: search index=*\n| chart count by splunk_server
10-13-2017 14:04:14.689 INFO  ISplunkDispatch - Not running in splunkd. Bundle replication not triggered.
10-13-2017 14:04:14.700 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.700 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:14.725 INFO  SearchProcessor - Building search filter
10-13-2017 14:04:14.725 INFO  SearchProcessor - Final search filter= ( ( splunk_server=splunk-index-test-01* )  ) 
10-13-2017 14:04:14.733 INFO  SearchOperator:kv - name=EXTRACT-GUID, can_use_re2=0, regex: (?i)(?!=\w)(?:objectguid|guid)\s*=\s*(?<guid_lookup>[\w\-]+)
10-13-2017 14:04:14.733 INFO  SearchOperator:kv - name=EXTRACT-SID, can_use_re2=0, regex: objectSid\s*=\s*(?<sid_lookup>\S+)
10-13-2017 14:04:14.735 INFO  SearchOperator:kv - name=ad-kv, can_use_re2=0, regex: (?<_KEY_1>[\w-]+)=(?<_VAL_1>[^\r\n]*)
10-13-2017 14:04:14.737 INFO  SearchOperator:kv - name=access-extractions, can_use_re2=0, regex: ^(?P<clientip>\S+)\s++(?P<ident>\S+)\s++(?P<user>\S+)\s++\[(?<req_time>[^\]]*+)\]\s++"\s*+(?P<method>[^\s"]++)?(?:\s++(?<uri>(?:(?<uri_domain>\w++://[^/\s"]++))?+(?<uri_path>(?:/++(?<root>(?:\\"|[^\s\?/"])++)/++)?(?:(?:\\"|[^\s\?/"])*+/++)*(?<file>[^\s\?/]+)?)(?:\?(?<uri_query>[^\s]*))?)(?:\s++(?P<version>[^\s"]++))*)?\s*+"\s++(?P<status>\S+)\s++(?P<bytes>\S+)(?:\s++"(?<referer>(?:(?<referer_domain>\w++://[^/\s"]++))?+[^"]*+)"(?:\s++"(?<useragent>[^"]*+)"(?:\s++"(?<cookie>[^"]*+)")?+)?+)?(?P<other>.*)
10-13-2017 14:04:14.738 INFO  SearchOperator:kv - name=syslog-extractions, can_use_re2=0, regex: \s([^\s\[]+)(?:\[(\d+)\])?:\s
10-13-2017 14:04:14.739 INFO  SearchOperator:kv - name=db2, can_use_re2=0, regex: ([A-Z]+) *: (.*?)(?=\n|$| +[A-Z]+ *:)
10-13-2017 14:04:14.739 INFO  SearchOperator:kv - name=EXTRACT-extract_spent, can_use_re2=0, regex: (?<spent>\d+)ms$
10-13-2017 14:04:14.740 INFO  SearchOperator:kv - name=EXTRACT-1, can_use_re2=0, regex: (?<_KEY_1>\S+)::(?<_VAL_1>\S+)
10-13-2017 14:04:14.742 INFO  SearchOperator:kv - name=bracket-space, can_use_re2=0, regex: \[(\S+) (.*?)\]
10-13-2017 14:04:14.744 INFO  SearchOperator:kv - name=sendmail-extractions, can_use_re2=0, regex: sendmail\[(\d+)\]: (\w+):
10-13-2017 14:04:14.744 INFO  SearchOperator:kv - name=tcpdump-endpoints, can_use_re2=0, regex: (\d+\.\d+\.\d+\.\d+):(\d+) -> (\d+\.\d+\.\d+\.\d+):(\d+)
10-13-2017 14:04:14.744 INFO  SearchOperator:kv - name=colon-kv, can_use_re2=0, regex: (?<= )([A-Za-z]+): ?((0x[A-F\d]+)|\d+)(?= |\n|$)
10-13-2017 14:04:14.752 INFO  SearchOperator:kv - name=EXTRACT-severity,logger, can_use_re2=0, regex: .*?(?<severity>[A-Z]+) ((?<logger>[^\s]+) \-)*
10-13-2017 14:04:14.753 INFO  SearchOperator:kv - name=EXTRACT-collection,category,object, can_use_re2=0, regex: collection=\"?(?P<collection>[^\"\n]+)\"?\ncategory=\"?(?P<category>[^\"\n]+)\"?\nobject=\"?(?P<object>[^\"\n]+)\"?\n
10-13-2017 14:04:14.754 INFO  SearchOperator:kv - name=wel-message, can_use_re2=0, regex: (?sm)^(?<_pre_msg>.+)\nMessage=(?<Message>.+)$
10-13-2017 14:04:14.754 INFO  SearchOperator:kv - name=wel-col-kv, can_use_re2=0, regex: \n([^:\n\r]+):[ \t]++([^\n]*)
10-13-2017 14:04:14.755 INFO  SearchOperator:kv - name=EXTRACT-useragent, can_use_re2=0, regex: userAgent=(?P<browser>[^ (]+)
10-13-2017 14:04:14.755 INFO  SearchOperator:kv - name=splunk-service-extractions, can_use_re2=0, regex: (?i)^(?:[^ ]* ){2}(?P<log_level>[^\s]*)\s+\[(?P<requestid>\w+)]\s+(?P<component>[^ ]+):(?P<line>\d+) - (?P<message>.+)
10-13-2017 14:04:14.755 INFO  SearchOperator:kv - name=EXTRACT-fields, can_use_re2=0, regex: (?i)^(?:[^ ]* ){2}(?:[+\-]\d+ )?(?P<log_level>[^ ]*)\s+(?P<component>[^ ]+) - (?P<message>.+)
10-13-2017 14:04:14.755 INFO  SearchOperator:kv - name=extract_spent, can_use_re2=0, regex: (?P<spent>\d+)ms$
10-13-2017 14:04:14.756 INFO  SearchOperator:kv - name=weblogic-code, can_use_re2=0, regex: <BEA-([0-9]+)>
10-13-2017 14:04:14.756 INFO  SearchOperator:kv - name=colon-line, can_use_re2=0, regex: ^(\w+)\s*:[ \t]*(.*?)$
10-13-2017 14:04:14.756 INFO  SearchOperator:kv - name=was-trlog-code, can_use_re2=0, regex: ] ([a-fA-F0-9]{8})
10-13-2017 14:04:14.757 INFO  UnifiedSearch - base lispy: [ AND index::* splunk_server::splunk-index-test-01* ]
10-13-2017 14:04:14.758 INFO  UnifiedSearch - Processed search targeting arguments
10-13-2017 14:04:14.758 INFO  SortOperator - maxmem = 209715200
10-13-2017 14:04:14.758 INFO  SortOperator - maxmem = 209715200
10-13-2017 14:04:14.758 INFO  SearchParser - PARSING: prestats count by splunk_server
10-13-2017 14:04:14.758 INFO  SearchParser - PARSING: addinfo type=count label=prereport_events
10-13-2017 14:04:14.758 INFO  DispatchThread - BatchMode: allowBatchMode: 1, conf(1): 1, timeline/Status buckets(0):0, realtime(0):0, report pipe empty(0):0, reqTimeOrder(0):0, summarize(0):0, statefulStreaming(0):0
10-13-2017 14:04:14.758 INFO  DispatchThread - required fields list to add to remote search = prestats_reserved_*,psrsvd_*,splunk_server
10-13-2017 14:04:14.758 INFO  SearchParser - PARSING: fields keepcolorder=t "prestats_reserved_*" "psrsvd_*" "splunk_server"
10-13-2017 14:04:14.763 INFO  DispatchThread - Did not find a usable summary_id, setting info._summary_mode=none, not modifying input summary_id=49CAB615-276A-428B-972B-FC67E89AEB46_search_bryn_96102898428831f8
10-13-2017 14:04:14.765 INFO  DispatchThread - Did not find a usable summary_id, setting info._summary_mode=none, not modifying input summary_id=49CAB615-276A-428B-972B-FC67E89AEB46_search_bryn_NScf8163cdac44f862
10-13-2017 14:04:14.766 INFO  DispatchThread - Allow retry on peer failure
10-13-2017 14:04:14.766 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.766 INFO  UserManager - Done setting user context: bryn -> bryn
10-13-2017 14:04:14.766 INFO  UserManager - Unwound user context: bryn -> bryn
10-13-2017 14:04:14.766 INFO  DistributedSearchResultCollectionManager - Stream search: litsearch ( index=* ) ( ( splunk_server=splunk-index-test-01* ) ) | addinfo  type=count label=prereport_events | fields  keepcolorder=t "prestats_reserved_*" "psrsvd_*" "splunk_server" | prestats  count by splunk_server
10-13-2017 14:04:14.766 INFO  ExternalResultProvider - No external result providers are configured
10-13-2017 14:04:14.766 INFO  DistributedSearchResultCollectionManager - Default search group:*
10-13-2017 14:04:14.766 INFO  DistributedSearchResultCollectionManager - Connecting to peer splunk-index-test-01 connectAll 0 connectToSpecificPeer 1
10-13-2017 14:04:14.766 INFO  DistributedSearchResultCollectionManager - Connecting to peer splunk-index-test-02 connectAll 0 connectToSpecificPeer 1
10-13-2017 14:04:14.766 INFO  DistributedSearchResultCollectionManager - Connecting to peer splunk-index-test-03 connectAll 0 connectToSpecificPeer 1
10-13-2017 14:04:14.766 INFO  DistributedSearchResultCollectionManager - Connecting to peer splunk-search-head-test-01 connectAll 0 connectToSpecificPeer 1
10-13-2017 14:04:14.766 INFO  ServerConfig - Using REMOTE_SERVER_NAME=splunk-search-head-test-01
10-13-2017 14:04:14.767 INFO  KeyManagerLocalhost - Checking for localhost key pair
10-13-2017 14:04:14.767 INFO  KeyManagerLocalhost - Public key already exists: /opt/splunk/etc/auth/distServerKeys/trusted.pem
10-13-2017 14:04:14.767 INFO  KeyManagerLocalhost - Reading public key for localhost: /opt/splunk/etc/auth/distServerKeys/trusted.pem
10-13-2017 14:04:14.767 INFO  KeyManagerLocalhost - Finished reading public key for localhost: /opt/splunk/etc/auth/distServerKeys/trusted.pem
10-13-2017 14:04:14.767 INFO  KeyManagerLocalhost - Reading private key for localhost: /opt/splunk/etc/auth/distServerKeys/private.pem
10-13-2017 14:04:14.767 INFO  KeyManagerLocalhost - Finished reading private key for localhost: /opt/splunk/etc/auth/distServerKeys/private.pem
10-13-2017 14:04:14.768 INFO  DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-01 in 0.003000 seconds
10-13-2017 14:04:14.770 INFO  DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-02 in 0.002000 seconds
10-13-2017 14:04:14.772 INFO  DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-03 in 0.002000 seconds
10-13-2017 14:04:14.772 INFO  DispatchThread - Disk quota = 10485760000
10-13-2017 14:04:14.772 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.772 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:14.772 INFO  SearchParser - PARSING: litsearch ( index=* ) ( ( splunk_server=splunk-index-test-01* ) ) | addinfo  type=count label=prereport_events | fields  keepcolorder=t "prestats_reserved_*" "psrsvd_*" "splunk_server" | prestats  count by splunk_server
10-13-2017 14:04:14.784 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.784 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:14.785 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.785 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:14.785 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.785 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:14.793 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.793 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:14.793 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.793 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:14.797 INFO  SearchParser - PARSING: typer | tags
10-13-2017 14:04:14.798 INFO  FastTyper - found nodes count: comparisons=6, unique_comparisons=5, terms=4, unique_terms=4, phrases=12, unique_phrases=12, total leaves=22
10-13-2017 14:04:14.801 INFO  UnifiedSearch - Processed search targeting arguments
10-13-2017 14:04:14.801 INFO  LocalCollector - Final required fields list = prestats_reserved_*,psrsvd_*,splunk_server
10-13-2017 14:04:14.801 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:14.801 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:14.801 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:14.801 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:14.801 WARN  RetryManager - Peer: splunk-search-head-test-01 not found in offset map.
10-13-2017 14:04:15.108 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:15.109 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:15.109 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:15.109 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:15.109 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:15.125 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:15.128 INFO  UserManager - Setting user context: bryn
10-13-2017 14:04:15.128 INFO  UserManager - Done setting user context: NULL -> bryn
10-13-2017 14:04:15.128 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:15.133 INFO  DispatchThread - Downloading all remote search.log files took 0.005 seconds
10-13-2017 14:04:15.135 INFO  DispatchManager - DispatchManager::dispatchHasFinished(id='1507917854.41', username='bryn')
10-13-2017 14:04:15.136 INFO  UserManager - Unwound user context: bryn -> NULL
10-13-2017 14:04:15.136 INFO  ShutdownHandler - Shutting down splunkd
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Begin"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_JustBeforeKVStore"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_KVStore"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Thruput"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_TcpInput1"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_TcpOutput"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_UdpInput"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_FifoInput"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_WinEventLogInput"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_HttpInput"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Scheduler"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Tailing"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_SyslogOutput"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_HTTPOutput"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_TailingXP"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_PeerManager"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_ArchiveAndOneshot"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_AuditTrailManager"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_AuditTrailQueueServiceThread"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_FSChangeMonitor"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_FSChangeManagerProcessor"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_HttpClientPollingThread"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_AsyncQueuedMessageDispatcherThread"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_OfflineFlusher"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Slave"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_SlaveSearch"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Captain"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Select"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_IdataDO_Collector"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_TcpOutput2"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_IndexerService"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Database1"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_LastIndexerLevel"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_TcpInput2"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_LoadLDAPUsers"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_MetricsManager"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Pipeline"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Queue"
10-13-2017 14:04:15.136 INFO  ShutdownHandler - shutting down level "ShutdownLevel_Exec"
10-13-2017 14:04:15.137 INFO  ShutdownHandler - shutting down level "ShutdownLevel_CallbackRunner"
10-13-2017 14:04:15.137 INFO  ShutdownHandler - shutting down level "ShutdownLevel_HttpClient"
10-13-2017 14:04:15.137 INFO  ShutdownHandler - Shutdown complete in 972 microseconds

Splunk Employee

Looks like your search head connects to all three peers:

10-13-2017 14:04:14.768 INFO DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-01 in 0.003000 seconds
10-13-2017 14:04:14.770 INFO DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-02 in 0.002000 seconds
10-13-2017 14:04:14.772 INFO DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-03 in 0.002000 seconds

Your job inspector shows that two of the three peers did not contribute data to the search result set.
Only 8 buckets were searched. How much data do you have at rest, and what was your search timeframe? Most importantly, which indexes are you searching by default (index=*)?
It is entirely possible that all primary buckets for your search resided on a single peer.


That is not the case. When I search individually on each indexer on the command line, I get results. When I search from the command line on the master, I get results for each indexer.


Splunk Employee

Your config looks good to me.
Try index=_* | chart count by splunk_server. Does it also return results from only one search peer?

Also check that you don't have a search group configured on the SH with only one indexer (run btool on distsearch.conf to check this).
If so, just remove it.

From the config, you are not using multisite mode (where you could have affinity between the SH and one indexer), so this is unlikely to be the cause.

As you have multiple NICs, your indexers may report an unreachable search IP to the CM.
On each indexer, I would force the search IP in server.conf (look for register_search_address = IP address in server.conf).
I would start by setting this.


I rebuilt the VM with the same config, and it has the same exact symptoms. Of course this means that it was disconnected from the master/cluster and re-attached.

/opt/splunk/bin/splunk list search-server                                                                   
Server at URI "splunk-index-test-01-private.oit.duke.edu:8089" with status as "Up"
Server at URI "splunk-index-test-02-private.oit.duke.edu:8089" with status as "Up"
Server at URI "splunk-index-test-03-private.oit.duke.edu:8089" with status as "Up"

and

/opt/splunk/bin/splunk search "index=* | stats count by splunk_server"
   splunk_server     count
-------------------- ------
splunk-index-test-01 727141

Interestingly, the search of internal indexes returned nothing at all:

 /opt/splunk/bin/splunk search "index=_* | stats count by splunk_server"
root@splunk-search-head-test-01  /opt/splunk/etc/system/local $ 

There are no distsearch.conf files anywhere that aren't default:

/opt/splunk/etc $ find ./ -name distsearch.conf
./apps/splunk_archiver/default/distsearch.conf
./apps/splunk_management_console/default/distsearch.conf
./system/default/distsearch.conf

I'm really puzzled by this one.


Splunk Employee

So am I. When you run your search on the SH UI and then look at the job inspector output, do you see your three indexers listed under dispatch.stream.remote?
Any chance you can post the full search.log from your search?


Same old search:
index=* | chart count by splunk_server
which returned:
splunk-index-test-01 | 18443
From the job inspector:
0.14 dispatch.stream.remote 11 - 50,997
0.14 dispatch.stream.remote.splunk-index-test-01 9 - 43,249
0.00 dispatch.stream.remote.splunk-index-test-02 1 - 3,874
0.00 dispatch.stream.remote.splunk-index-test-03 1 - 3,874

and at the end of the job inspector:
searchProviders
[
"splunk-index-test-01",
"splunk-index-test-02",
"splunk-index-test-03",
"splunk-search-head-test-01"
]
searchTotalBucketsCount 8
searchTotalEliminatedBucketsCount 0
sid 1507917854.41
statusBuckets 0
ttl 600
Additional info search.log search.log( splunk-index-test-01 splunk-index-test-02 splunk-index-test-03 )

The search.log is attached to the next answer.


I removed the cluster info from the search-head's server.conf and added the peers individually. I'm still only getting one of them, so it isn't the master.


I probably should have mentioned: this was a problem before I added the 2nd NIC. I re-did all of the index-clustering pieces (removed [clustering] from server.conf everywhere and re-ran the cluster-config command everywhere) and re-attached the search-head. It had not really registered with me that there was a connection problem until after I'd done all of this, but it was definitely there.

There are no distsearch.conf files anywhere except the defaults, on the search-head, indexers, and master. btool is fine with those.


Splunk Employee

Any indication in splunkd.log that the search head can successfully connect to all search peers?

Also, have you taken a look at the search job inspector, specifically search.log in the UI?


I can telnet to both NICs on all 3 boxes over port 8089.

I inspected this job:
index=*
This is under normalizedSearch:
litsearch ( index=* ) ( ( splunk_server=splunk-index-test-01* ) ) | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1507753287.000000 lt=1507839687.000000 remove=true max_count=1000 max_prefetch=100

It looks from this like the search-head believes there is only one search-peer.
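
To make that check repeatable, the splunk_server terms can be pulled out of a normalizedSearch string copied from the job inspector. This is a minimal Python sketch under that assumption; `server_filters` is a hypothetical helper name, not part of Splunk. It only looks at the part of the search before the first pipe, so the "splunk_server" field name in the later fields command is ignored.

```python
import re

# Matches splunk_server=<value>; the value ends at whitespace or a parenthesis.
SERVER_TERM = re.compile(r"\bsplunk_server=([^\s()]+)")


def server_filters(normalized_search):
    """Return splunk_server terms found before the first pipe of the search."""
    base = normalized_search.split("|", 1)[0]
    return SERVER_TERM.findall(base)


if __name__ == "__main__":
    # Shortened copy of the normalizedSearch string quoted in this post.
    normalized = (
        "litsearch ( index=* ) ( ( splunk_server=splunk-index-test-01* ) ) "
        '| fields keepcolorder=t "*" "splunk_server"'
    )
    print(server_filters(normalized))
    # prints ['splunk-index-test-01*']
```

If the search as typed was just index=*, any term this returns was added before dispatch. Where it was added is not something this sketch can tell you.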

Further, if I specify the splunk_server in the search:
index=* splunk_server=splunk-index-test-02*
I get "Search filters specified using splunk_server/splunk_server_group do not match any search peer." even though that search peer is listed under Distributed search > Search peers. If the search-head were unable to see the peers, they would not show up there.

splunkd.log shows no connection problems.


I did just find "Not connecting to peer 'splunk-index-test-02' because it has been optimized out. Peername and none of it's search groups [] match the query."

Searching docs.splunk.com for either of these phrases gets me nothing. I do wish that the error log wording showed up in the documentation.
I also searched for DistributedSearchResultCollectionManager.


SplunkTrust

This is just a wild hair idea, but if you are replicating data between three indexers and all the data got replicated onto one indexer, then connecting to the other two wouldn't really be necessary, would it?


Splunk Employee

Highly unlikely, and wouldn't explain the results when running the search on the cluster master.


That's not how index clustering works.
If I stop splunk on the one that I can see, there is no data at all.


Splunk Employee

Do you by any chance still have a distsearch.conf file on your search head?
Re-reading your original question, it almost sounds like you have connected it to the cluster master AND configured search peers in the distributed search setup....?
Maybe I am misreading...


Splunk Employee

Strange. If you haven't already, I would try to remove/re-add the search head to the cluster and see if that helps. I'll dig to see if I can find the REST call to get the list of search peers from the cluster master.


Splunk Employee

Not sure if this will work from your search head, but worth a try:

 |rest /services/cluster/master/peers count=0 splunk_server=splunk-master-test-01

I'm not sure what you mean by removing the search-head from the cluster. It's an index cluster, and the search-head is a singleton.
I'd appreciate that REST call.


The search.log shows that it is kind of aware of the other two search-peers:

10-12-2017 16:42:57.557 INFO DistributedSearchResultCollectionManager - Connecting to peer splunk-index-test-02 connectAll 0 connectToSpecificPeer 1
10-12-2017 16:42:57.557 INFO DistributedSearchResultCollectionManager - Connecting to peer splunk-index-test-03 connectAll 0 connectToSpecificPeer 1

and

10-12-2017 16:42:57.563 INFO DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-02 in 0.003000 seconds
10-12-2017 16:42:57.565 INFO DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-03 in 0.003000 seconds

I'm not sure how helpful this is, given that it says they don't exist when specified directly.
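
For anyone comparing their own search.log against these lines, the relevant entries can be collected in one pass. This is a minimal Python sketch, assuming the log uses the same message wording as the lines quoted in this thread; `summarize_search_log` is a hypothetical name, and the patterns are taken from the quoted log text rather than from any documented format.

```python
import re

# Patterns copied from search.log lines quoted in this thread.
FILTER_RE = re.compile(r"SearchProcessor - Final search filter=\s*(.*?)\s*$")
CONNECT_RE = re.compile(r"Connecting to peer (\S+)")
COLLECTOR_RE = re.compile(
    r"Successfully created search result collector for peer=(\S+)"
)


def summarize_search_log(lines):
    """Collect the final search filter and the peers named in a search.log."""
    summary = {"filter": None, "connecting": [], "collectors": []}
    for line in lines:
        found = FILTER_RE.search(line)
        if found:
            summary["filter"] = found.group(1)
            continue
        found = CONNECT_RE.search(line)
        if found:
            summary["connecting"].append(found.group(1))
            continue
        found = COLLECTOR_RE.search(line)
        if found:
            summary["collectors"].append(found.group(1))
    return summary


if __name__ == "__main__":
    # Three lines taken from the search.log posted in this thread.
    sample = [
        "10-13-2017 14:04:14.725 INFO  SearchProcessor - Final search filter= ( ( splunk_server=splunk-index-test-01* )  ) ",
        "10-13-2017 14:04:14.766 INFO  DistributedSearchResultCollectionManager - Connecting to peer splunk-index-test-02 connectAll 0 connectToSpecificPeer 1",
        "10-13-2017 14:04:14.770 INFO  DistributedSearchResultCollectionManager - Successfully created search result collector for peer=splunk-index-test-02 in 0.002000 seconds",
    ]
    for key, value in summarize_search_log(sample).items():
        print(key, "=", value)
```

Run against the full search.log posted in this thread, this would list the three indexers plus the search head under connecting, the three indexers under collectors, and a final filter that names only splunk-index-test-01*.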
