Deployment Architecture

Distributed search performance troubles

pcsegal
Explorer

I have a setup with one cluster master, one indexer cluster (with 3 peers), one non-clustered indexer and one search head.
So:
- 1 cluster master. (Splunk Enterprise 7.0.2)
- 3 clustered indexers. (Splunk Enterprise 7.0.2)
- 1 non-clustered indexer which contains legacy data that I need to be able to search from the search head. (Splunk Enterprise 6.6.5)
- 1 search head with all of the 4 indexers as search peers. (Splunk Enterprise 7.0.2)

This setup does work, and data gets searched. However, I'm seeing a slowness in searching the peers from the search head. I'm not sure if this is expected, but it happens exactly the same in both my production and my development environments (which have the exact same setup, except that the Splunk versions are all 7.0.1).

Here are the things I'm seeing, when searching from the search head.

1) If I run any search that returns no data, it always takes about 2 seconds. This happens even if I disconnect the non-clustered indexer from the story. When I look at the Job Inspector, I see that "dispatch.finalizeRemoteTimeline" is always a straight 2.00 seconds, for searches with no result, and seems to be at least 2.00 for any other search. Picture below
Job Inspector
At the end of this post, I've included the resulting search.log entries for an example of search that returns no results ( index="1234" ) at the end of this post. The lines that caught my attention were:

04-08-2018 12:49:09.168 INFO  UserManager - Unwound user context: admin -> NULL
04-08-2018 12:49:11.168 INFO  UserManager - Unwound user context: admin -> NULL

2) If I run any search against data that resides in the non-clustered indexer, it takes a lot longer than if I run the same search locally in the non-clustered indexer. Typically double the time.

Anyone found something similar?

Best regards.

Example of search.log for item 1 (for a search that returns zero results):

04-08-2018 12:49:09.110 INFO  dispatchRunner - Search process mode: preforked (reused process) (build 2b5b15c4ee89).
04-08-2018 12:49:09.111 INFO  dispatchRunner - registering build time modules, count=1
04-08-2018 12:49:09.111 INFO  dispatchRunner - registering search time components of build time module name=vix
04-08-2018 12:49:09.111 INFO  BundlesSetup - Setup stats for /opt/splunk/etc: wallclock_elapsed_msec=173, cpu_time_used=0.02, shared_services_generation=2, shared_services_population=1
04-08-2018 12:49:09.112 INFO  UserManager - Setting user context: splunk-system-user
04-08-2018 12:49:09.112 INFO  UserManager - Done setting user context: NULL -> splunk-system-user
04-08-2018 12:49:09.112 INFO  UserManager - Unwound user context: splunk-system-user -> NULL
04-08-2018 12:49:09.112 INFO  UserManager - Setting user context: admin
04-08-2018 12:49:09.112 INFO  UserManager - Done setting user context: NULL -> admin
04-08-2018 12:49:09.112 INFO  dispatchRunner - search context: user="admin", app="damas", bs-pathname="/opt/splunk/etc"
04-08-2018 12:49:09.112 INFO  dispatchRunner - Executing the DispatchThread.
04-08-2018 12:49:09.112 INFO  SearchParser - PARSING: search index="1234"
04-08-2018 12:49:09.113 INFO  ISplunkDispatch - Not running in splunkd. Bundle replication not triggered.
04-08-2018 12:49:09.114 INFO  UserManager - Setting user context: admin
04-08-2018 12:49:09.114 INFO  UserManager - Done setting user context: NULL -> admin
04-08-2018 12:49:09.121 INFO  SearchProcessor - Building search filter
04-08-2018 12:49:09.139 INFO  UnifiedSearch - Expanded index search = index="1234"
04-08-2018 12:49:09.139 INFO  UnifiedSearch - base lispy: [ AND [ EQ index 1234 ] ]
04-08-2018 12:49:09.139 INFO  UnifiedSearch - Processed search targeting arguments
04-08-2018 12:49:09.139 INFO  DispatchThread - BatchMode: allowBatchMode: 0, conf(1): 1, timeline/Status buckets(0):300, realtime(0):0, report pipe empty(0):1, reqTimeOrder(0):0, summarize(0):0, statefulStreaming(0):0
04-08-2018 12:49:09.139 INFO  DispatchThread - Storing only 1000 events per timeline buckets due to limits.conf max_events_per_bucket setting.
04-08-2018 12:49:09.139 INFO  DispatchThread - Setup timeliner partialCommits=1
04-08-2018 12:49:09.139 INFO  DispatchThread - required fields list to add to remote search = _bkt,_cd,_si,host,index,linecount,source,sourcetype,splunk_server
04-08-2018 12:49:09.139 INFO  DispatchThread - Timeline information will be computed remotely
04-08-2018 12:49:09.139 INFO  SearchParser - PARSING: fields keepcolorder=t "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1523188140.000000 lt=1523191749.000000 remove=true max_count=1000 max_prefetch=100
04-08-2018 12:49:09.139 INFO  DispatchCommandProcessor - summaryHash=9c75a18ef8348e81 summaryId=1AB2EE33-B3C9-4DF2-B1B2-89B951717AFE_damas_admin_9c75a18ef8348e81 remoteSearch=litsearch index="1234" | fields  keepcolorder=t "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server"  | remotetl  nb=300 et=1523188140.000000 lt=1523191749.000000 remove=true max_count=1000 max_prefetch=100
04-08-2018 12:49:09.139 INFO  DispatchCommandProcessor - summaryHash=NSd4b234a41686a614 summaryId=1AB2EE33-B3C9-4DF2-B1B2-89B951717AFE_damas_admin_NSd4b234a41686a614 remoteSearch=litsearch index="1234" | fields keepcolorder=t "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1523188140.000000 lt=1523191749.000000 remove=true max_count=1000 max_prefetch=100
04-08-2018 12:49:09.139 INFO  DispatchThread - Getting summary ID for summaryHash=NSd4b234a41686a614
04-08-2018 12:49:09.139 INFO  DispatchThread - Matches no summary
04-08-2018 12:49:09.139 INFO  DispatchThread - SrchOptMetrics check_query_matches_ra=26
04-08-2018 12:49:09.139 INFO  SearchParser - PARSING: search index="1234"
04-08-2018 12:49:09.139 INFO  UnifiedSearch - Processed search targeting arguments
04-08-2018 12:49:09.139 INFO  DispatchThread - SrchOptMetrics optimize_toJson=1
04-08-2018 12:49:09.139 INFO  ProjElim - Black listed processors=[addinfo]
04-08-2018 12:49:09.139 INFO  DispatchThread - SrchOptMetrics optimization=1
04-08-2018 12:49:09.140 INFO  SearchPipeline - Command='search' doesnt have raw field 
04-08-2018 12:49:09.140 INFO  DispatchThread - Optimized Search = | search index="1234"
04-08-2018 12:49:09.140 INFO  DispatchThread - SrchOptMetrics fromJsontoSpl=1
04-08-2018 12:49:09.140 INFO  SearchParser - PARSING: | search index="1234"
04-08-2018 12:49:09.140 INFO  DispatchThread - SrchOptMetrics reparse_optimized_query=1
04-08-2018 12:49:09.147 INFO  SearchProcessor - Building search filter
04-08-2018 12:49:09.164 INFO  UnifiedSearch - Expanded index search = index="1234"
04-08-2018 12:49:09.164 INFO  UnifiedSearch - base lispy: [ AND [ EQ index 1234 ] ]
04-08-2018 12:49:09.164 INFO  UnifiedSearch - Processed search targeting arguments
04-08-2018 12:49:09.164 INFO  DispatchThread - BatchMode: allowBatchMode: 0, conf(1): 1, timeline/Status buckets(0):300, realtime(0):0, report pipe empty(0):1, reqTimeOrder(0):0, summarize(0):0, statefulStreaming(0):0
04-08-2018 12:49:09.164 INFO  DispatchThread - Storing only 1000 events per timeline buckets due to limits.conf max_events_per_bucket setting.
04-08-2018 12:49:09.165 INFO  DispatchThread - Setup timeliner partialCommits=1
04-08-2018 12:49:09.165 INFO  DispatchThread - required fields list to add to remote search = _bkt,_cd,_si,host,index,linecount,source,sourcetype,splunk_server
04-08-2018 12:49:09.165 INFO  DispatchThread - Timeline information will be computed remotely
04-08-2018 12:49:09.165 INFO  SearchParser - PARSING: fields keepcolorder=t "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1523188140.000000 lt=1523191749.000000 remove=true max_count=1000 max_prefetch=100
04-08-2018 12:49:09.165 INFO  DispatchCommandProcessor - summaryHash=9c75a18ef8348e81 summaryId=1AB2EE33-B3C9-4DF2-B1B2-89B951717AFE_damas_admin_9c75a18ef8348e81 remoteSearch=litsearch index="1234" | fields  keepcolorder=t "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server"  | remotetl  nb=300 et=1523188140.000000 lt=1523191749.000000 remove=true max_count=1000 max_prefetch=100
04-08-2018 12:49:09.165 INFO  DispatchCommandProcessor - summaryHash=NSd4b234a41686a614 summaryId=1AB2EE33-B3C9-4DF2-B1B2-89B951717AFE_damas_admin_NSd4b234a41686a614 remoteSearch=litsearch index="1234" | fields keepcolorder=t "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1523188140.000000 lt=1523191749.000000 remove=true max_count=1000 max_prefetch=100
04-08-2018 12:49:09.165 INFO  DispatchThread - Setting summary_mode=NONE after optimization
04-08-2018 12:49:09.165 INFO  DispatchThread - SrchOptMetrics FinalEval=26
04-08-2018 12:49:09.165 INFO  UserManager - Setting user context: admin
04-08-2018 12:49:09.165 INFO  UserManager - Done setting user context: admin -> admin
04-08-2018 12:49:09.165 INFO  UserManager - Unwound user context: admin -> admin
04-08-2018 12:49:09.165 INFO  DistributedSearchResultCollectionManager - Stream search: litsearch index="1234" | fields  keepcolorder=t "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server"  | remotetl  nb=300 et=1523188140.000000 lt=1523191749.000000 remove=true max_count=1000 max_prefetch=100
04-08-2018 12:49:09.166 INFO  ExternalResultProvider - No external result providers are configured
04-08-2018 12:49:09.166 INFO  DistributedSearchResultCollectionManager - ERP_FACTORY initialized, but zero external result provider, hence disabling _isERPCollectionEnabled
04-08-2018 12:49:09.166 INFO  DistributedSearchResultCollectionManager - Default search group:*
04-08-2018 12:49:09.166 INFO  DistributedSearchResultCollectionManager - Not connecting to peer 'XXXX' because it has been optimized out. No searchable indexes on this peer that match the query.
04-08-2018 12:49:09.166 INFO  DistributedSearchResultCollectionManager - Not connecting to peer 'XXXX' because it has been optimized out. No searchable indexes on this peer that match the query.
04-08-2018 12:49:09.166 INFO  DistributedSearchResultCollectionManager - Not connecting to peer 'XXXX' because it has been optimized out. No searchable indexes on this peer that match the query.
04-08-2018 12:49:09.166 INFO  DistributedSearchResultCollectionManager - Not connecting to peer 'XXXX' because it has been optimized out. No searchable indexes on this peer that match the query.
04-08-2018 12:49:09.166 INFO  DistributedSearchResultCollectionManager - Not connecting to peer 'XXXX' because it has been optimized out. No searchable indexes on this peer that match the query.
04-08-2018 12:49:09.166 INFO  DistributedSearchResultCollectionManager - Stream search:  | fields  keepcolorder=t "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server"  | remotetl  nb=300 et=1523188140.000000 lt=1523191749.000000 remove=true max_count=1000 max_prefetch=100
04-08-2018 12:49:09.166 INFO  DispatchThread - Disk quota = 10485760000
04-08-2018 12:49:09.167 INFO  UserManager - Setting user context: admin
04-08-2018 12:49:09.167 INFO  UserManager - Done setting user context: NULL -> admin
04-08-2018 12:49:09.167 INFO  UserManager - Setting user context: admin
04-08-2018 12:49:09.167 INFO  UserManager - Done setting user context: NULL -> admin
04-08-2018 12:49:09.167 INFO  SearchParser - PARSING:  | fields  keepcolorder=t "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server"  | remotetl  nb=300 et=1523188140.000000 lt=1523191749.000000 remove=true max_count=1000 max_prefetch=100
04-08-2018 12:49:09.167 INFO  LocalCollector - Final required fields list = _bkt,_cd,_si,host,index,linecount,source,sourcetype,splunk_server
04-08-2018 12:49:09.167 INFO  UserManager - Unwound user context: admin -> NULL
04-08-2018 12:49:09.167 INFO  UserManager - Setting user context: admin
04-08-2018 12:49:09.167 INFO  UserManager - Done setting user context: NULL -> admin
04-08-2018 12:49:09.168 INFO  UserManager - Unwound user context: admin -> NULL
04-08-2018 12:49:09.168 INFO  UserManager - Unwound user context: admin -> NULL
04-08-2018 12:49:11.168 INFO  UserManager - Unwound user context: admin -> NULL
04-08-2018 12:49:11.169 INFO  UserManager - Setting user context: admin
04-08-2018 12:49:11.169 INFO  UserManager - Done setting user context: NULL -> admin
04-08-2018 12:49:11.169 INFO  UserManager - Unwound user context: admin -> NULL
04-08-2018 12:49:11.169 INFO  DispatchManager - DispatchManager::dispatchHasFinished(id='1523191749.62791', username='admin')
04-08-2018 12:49:11.171 INFO  UserManager - Unwound user context: admin -> NULL
0 Karma

jkat54
SplunkTrust
SplunkTrust

Could it be an external service to blame? Are you using hostnames in distsearch.conf (where you configured individual peers) and server.conf (where you declare masters uri)?

If so dns could be to blame, etc

0 Karma

deepashri_123
Motivator

Hi @pcsegal,

Please refer the following link:
http://docs.splunk.com/Documentation/Splunk/7.0.3/Capacity/Accommodatemanysimultaneoussearches#How_t...
http://docs.splunk.com/Documentation/Splunk/7.0.3/Capacity/Referencehardware

Does your instance have the specifications recommended?

Let me know if this helps!!

0 Karma

deepashri_123
Motivator

@pcsegal,

Can you share the query that you are running?

0 Karma

pcsegal
Explorer

@deepashri_123 - for example, index="1234" (which doesn't return any result) takes about 2 seconds. Any query that returns results from the legacy non-clustered indexer seems way slower than it should be (usually at least 2x slower than doing the same query locally in the non-clustered indexer).

0 Karma

jkat54
SplunkTrust
SplunkTrust

are you searching "all time" for every test?

does it behave differently if you select "last 15 minutes"?

0 Karma

pcsegal
Explorer

@jkat54 - Thank you for the comment - I'm testing for different time ranges. Regarding the case of a search that returns no data, it doesn't seem to make any difference whether I choose "last 15 minutes" or "last 24h" or "all time" - "dispatch.finalizeRemoteTimeline" still takes 2 seconds.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...