In a given 7 day search, I can get results back in aproximately 10 seconds but the search finalization takes over a minute and sometimes as long as 5 minutes unless I interrupt it. I can't find much information about this metric in the job inspector but I think its the culprite to many of my performance issues in a distributed splunk environment as I've seen my read ops on indexers hit 3-4k ops. Does anyone have any insight into this metric?
Here is one I've let run to completion with the full inspection. (only changing server names)
Search job inspector
This search has completed and has returned 30,307 results by scanning 36,067 events in 212.625 seconds.
The following messages were returned by the search subsystem:
DEBUG: [Server1] search context: user="me", app="search", bs-pathname="C:\Program Files\Splunk\var\run\searchpeers\searchhead-1386688856"
DEBUG: [Server2] search context: user="me", app="search", bs-pathname="C:\Program Files\Splunk\var\run\searchpeers\Searchhead-1386688856"
DEBUG: [Server3] search context: user="me", app="search", bs-pathname="/opt/splunk/var/run/searchpeers/Searchhead-1386688856"
DEBUG: base lispy: [ AND cpu WantedServerName index::vmware source::virtualmachineperf ]
DEBUG: search context: user="me", app="search", bs-pathname="/opt/splunk/etc"
(SID: 1386689000.46598)
Execution costs
Duration (seconds) Component Invocations Input count Output count
0.004 command.fields 17 30,307 30,307
3.73 command.remotetl 17 30,307 -
13.971 command.search 17 - 30,307
3.623 command.search.index 17 - -
0.287 command.search.fieldalias 13 36,067 36,067
0.119 command.search.filter 13 - -
0.031 command.search.calcfields 13 36,067 36,067
0 command.search.index.usec_1_8 95,546 - -
0 command.search.index.usec_4096_32768 24 - -
0 command.search.index.usec_512_4096 8 - -
0 command.search.index.usec_64_512 7 - -
0 command.search.index.usec_8_64 992 - -
5.807 command.search.rawdata 13 - -
2.21 command.search.typer 17 30,307 30,307
1.555 command.search.kv 13 - -
0.26 command.search.lookups 13 36,067 36,067
0.071 command.search.tags 17 30,307 30,307
0.001 command.search.summary 17 - -
0.609 dispatch.check_disk_usage 2 - -
0.358 dispatch.createProviderQueue 1 - -
0.76 dispatch.evaluate 1 - -
0.735 dispatch.evaluate.search 1 - -
3.157 dispatch.fetch 18 - -
204.589 dispatch.finalizeRemoteTimeline 1 - -
0.007 dispatch.preview 6 - -
6.273 dispatch.process_remote_timeline 8 5,382,700 -
3.023 dispatch.readEventsInResults 1 - -
0.871 dispatch.remote_timeline_fullevents 8 495,752 300
0.001 dispatch.stream.local 1 - -
17.693 dispatch.stream.remote 16 - 5,950,048
7.023 dispatch.stream.remote.Server1 5 - 2,126,597
6.292 dispatch.stream.remote.Server2 5 - 2,115,295
4.378 dispatch.stream.remote.Server3 6 - 1,708,156
2.103 dispatch.timeline 18 - -
0.577 dispatch.writeStatus 20 - -
1.333 startup.handoff 1 - -
Search job properties
bundleVersion 15744914850249264379
canSummarize 0
createTime 2013-12-10T09:23:21.000-06:00
cursorTime 2013-12-03T09:00:00.000-06:00
custom {'dispatch.earliest_time': '-7d@h', 'dispatch.latest_time': 'now', 'search': 'index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName'}
defaultSaveTTL 604800
defaultTTL 600
delegate None
diskUsage 15253504
dispatchState DONE
doneProgress 1.0
dropCount 0
eai:acl {'can_write': '1', 'ttl': '600', 'owner': 'me', 'modifiable': '1', 'perms': {'read': ['me'], 'write': ['me']}, 'sharing': 'global', 'app': 'search'}
earliestTime 2013-12-03T09:00:00.000-06:00
eventAvailableCount 30307
eventCount 30307
eventFieldCount 43
eventIsStreaming True
eventIsTruncated False
eventSearch search index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName
eventSorting desc
indexEarliestTime 1386082815
indexLatestTime 1386688993
isBatchModeSearch False
isDone True
isFailed False
isFinalized False
isPaused False
isPreviewEnabled True
isRealTimeSearch False
isRemoteTimeline True
isSaved False
isSavedSearch False
isZombie False
keywords index::vmware moname::WantedServerName perftype::cpu source::virtualmachineperf
label None
latestTime 2013-12-10T09:23:20.000-06:00
meanPreviewPeriod 35.437500
messages {'debug': ['[Server1] search context: user="me", app="search", bs-pathname="C:\\Program Files\\Splunk\\var\\run\\searchpeers\\Searchhead-1386688856"', '[Server2] search context: user="me", app="search", bs-pathname="C:\\Program Files\\Splunk\\var\\run\\searchpeers\\Searchhead-1386688856"', '[Server3] search context: user="me", app="search", bs-pathname="/opt/splunk/var/run/searchpeers/Searchhead-1386688856"', 'base lispy: [ AND cpu WantedServerName index::vmware source::virtualmachineperf ]', 'search context: user="me", app="search", bs-pathname="/opt/splunk/etc"']}
modifiedTime 2013-12-10T09:31:34.862-06:00
normalizedSearch litsearch index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1386082800.000000 lt=1386689000.000000 remove=true max_count=1000 max_prefetch=100
numPreviews 6
peerNameList ['Server1', 'Server2', 'Server3']
performance {'command.search.tags': {'input_count': '30307', 'invocations': '17', 'duration_secs': '0.071000', 'output_count': '30307'}, 'startup.handoff': {'invocations': '1', 'duration_secs': '1.333000'}, 'command.search.index.usec_512_4096': {'invocations': '8'}, 'command.search.kv': {'invocations': '13', 'duration_secs': '1.555000'}, 'dispatch.timeline': {'invocations': '18', 'duration_secs': '2.103000'}, 'dispatch.finalizeRemoteTimeline': {'invocations': '1', 'duration_secs': '204.589000'}, 'command.search': {'input_count': '0', 'invocations': '17', 'duration_secs': '13.971000', 'output_count': '30307'}, 'command.search.filter': {'invocations': '13', 'duration_secs': '0.119000'}, 'command.search.index': {'invocations': '17', 'duration_secs': '3.623000'}, 'dispatch.createProviderQueue': {'invocations': '1', 'duration_secs': '0.358000'}, 'command.search.typer': {'input_count': '30307', 'invocations': '17', 'duration_secs': '2.210000', 'output_count': '30307'}, 'command.search.rawdata': {'invocations': '13', 'duration_secs': '5.807000'}, 'dispatch.evaluate.search': {'invocations': '1', 'duration_secs': '0.735000'}, 'command.search.index.usec_4096_32768': {'invocations': '24'}, 'command.search.index.usec_1_8': {'invocations': '95546'}, 'dispatch.remote_timeline_fullevents': {'input_count': '495752', 'invocations': '8', 'duration_secs': '0.871000', 'output_count': '300'}, 'command.search.index.usec_8_64': {'invocations': '992'}, 'dispatch.stream.remote.Server3': {'input_count': '0', 'invocations': '5', 'duration_secs': '7.023000', 'output_count': '2126597'}, 'command.search.summary': {'invocations': '17', 'duration_secs': '0.001000'}, 'dispatch.fetch': {'invocations': '18', 'duration_secs': '3.157000'}, 'dispatch.preview': {'invocations': '6', 'duration_secs': '0.007000'}, 'dispatch.evaluate': {'invocations': '1', 'duration_secs': '0.760000'}, 'dispatch.stream.local': {'invocations': '1', 'duration_secs': '0.001000'}, 'dispatch.readEventsInResults': {'invocations': '1', 'duration_secs': '3.023000'}, 'command.search.index.usec_64_512': {'invocations': '7'}, 'dispatch.process_remote_timeline': {'input_count': '5382700', 'invocations': '8', 'duration_secs': '6.273000', 'output_count': '0'}, 'dispatch.stream.remote.Server1': {'input_count': '0', 'invocations': '6', 'duration_secs': '4.378000', 'output_count': '1708156'}, 'command.search.fieldalias': {'input_count': '36067', 'invocations': '13', 'duration_secs': '0.287000', 'output_count': '36067'}, 'dispatch.stream.remote.Server2': {'input_count': '0', 'invocations': '5', 'duration_secs': '6.292000', 'output_count': '2115295'}, 'command.remotetl': {'input_count': '30307', 'invocations': '17', 'duration_secs': '3.730000', 'output_count': '0'}, 'command.search.calcfields': {'input_count': '36067', 'invocations': '13', 'duration_secs': '0.031000', 'output_count': '36067'}, 'dispatch.writeStatus': {'invocations': '20', 'duration_secs': '0.577000'}, 'command.fields': {'input_count': '30307', 'invocations': '17', 'duration_secs': '0.004000', 'output_count': '30307'}, 'command.search.lookups': {'input_count': '36067', 'invocations': '13', 'duration_secs': '0.260000', 'output_count': '36067'}, 'dispatch.check_disk_usage': {'invocations': '2', 'duration_secs': '0.609000'}, 'dispatch.stream.remote': {'input_count': '0', 'invocations': '16', 'duration_secs': '17.693000', 'output_count': '5950048'}}
pid 15881
priority 5
remoteSearch litsearch index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1386082800.000000 lt=1386689000.000000 remove=true max_count=1000 max_prefetch=100
remoteSearchLogs ['Server1\nServer2\nServer3']
reportSearch None
request {'custom.dispatch.earliest_time': '-7d@h', 'custom.search': 'index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName', 'status_buckets': '300', 'adhoc_search_level': 'smart', 'latest_time': 'now', 'custom.dispatch.latest_time': 'now', 'auto_cancel': '30', 'earliest_time': '-7d@h', 'preview': '1', 'ui_dispatch_app': 'search', 'rf': '*', 'indexedRealtime': None, 'search': 'search index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName'}
resultCount 30307
resultIsStreaming True
resultPreviewCount 30307
runDuration 212.625
runtime {'auto_pause': '0', 'auto_cancel': '30'}
scanCount 36067
search search index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName
searchEarliestTime 1386082800.000000000
searchLatestTime 1386689000.000000000
searchProviders ['Server1', 'Server2', 'Server3', 'Searchhead']
sid 1386689000.46598
statusBuckets 300
ttl 600
Additional info timeline field summary search.log search.log( Server1 Server2 Server3 )
Server info: Splunk 6.0, Searchhead, Tue Dec 10 09:31:34 2013 User: me
A slow dispatch.remoteTimeline finish means that splunk is taking a long time to fetch the per-bucket events or the logfiles related to the search -- both of which happen at the end.
Fast mode searches do not even event buckets, so if this happens in fast mode as well, it is probably the log retrieval.
This answers post mentions how the two behaviors can be disabled if they are a problem in your environment:
http://answers.splunk.com/answers/122950/dispatch-finalizeremotetimeline-taking-a-long-time.html
However I believe this particular painpoint was reduced by mid 6.1.x in any event, and the log fetching is greatly helpful in troubleshooting. The choice is yours.
I have been investigating into searches for both admin user and splunk system user. Searched conducted by System User takes very long time. Searches done by system user are typically bucket copy trigger, copy buckets, summarize etc. Can someone explain why there is such a big difference:
Search Activity by User (2)
User Search Count Median Runtime 90th Percentile Runtime Cumulative Runtime Last Search
1 admin 201 0.17s 0.46s 5h 32min 13.32s 2018-01-31 09:55:39
2 splunk-system-user 150 3.58s 10.00s 13min 42.47s 2018-01-31 09:47:14
Common Search Commands (first there are for Splunk System user)
Command Count Average Runtime Max Runtime
1 summarize 100 3.86s 20.25s
2 bucket 25 9.56s 26.89s
3 copybuckets 25 7.92s 21.65s
4 kv 4 0.06s 0.12s
5 metadata 2 1.23s 1.41s
6 search 2 1.23s 1.41s
Appreciate your response.
Thanks,
Anup Pal
Solution Engineer,
SwiftStack Inc.