Monitoring Splunk

Why is the dispatch.finalizeRemoteTimeline causing searches to take extremely long to finish?

mjones414
Contributor

In a given 7 day search, I can get results back in aproximately 10 seconds but the search finalization takes over a minute and sometimes as long as 5 minutes unless I interrupt it. I can't find much information about this metric in the job inspector but I think its the culprite to many of my performance issues in a distributed splunk environment as I've seen my read ops on indexers hit 3-4k ops. Does anyone have any insight into this metric?

Here is one I've let run to completion with the full inspection. (only changing server names)

Search job inspector

This search has completed and has returned 30,307 results by scanning 36,067 events in 212.625 seconds.

The following messages were returned by the search subsystem:

DEBUG: [Server1] search context: user="me", app="search", bs-pathname="C:\Program Files\Splunk\var\run\searchpeers\searchhead-1386688856"
DEBUG: [Server2] search context: user="me", app="search", bs-pathname="C:\Program Files\Splunk\var\run\searchpeers\Searchhead-1386688856"
DEBUG: [Server3] search context: user="me", app="search", bs-pathname="/opt/splunk/var/run/searchpeers/Searchhead-1386688856"
DEBUG: base lispy: [ AND cpu WantedServerName index::vmware source::virtualmachineperf ]
DEBUG: search context: user="me", app="search", bs-pathname="/opt/splunk/etc"

(SID: 1386689000.46598)
Execution costs
Duration (seconds)      Component   Invocations     Input count     Output count
    0.004   command.fields  17  30,307  30,307
    3.73    command.remotetl    17  30,307  -
    13.971  command.search  17  -   30,307
    3.623   command.search.index    17  -   -
    0.287   command.search.fieldalias   13  36,067  36,067
    0.119   command.search.filter   13  -   -
    0.031   command.search.calcfields   13  36,067  36,067
    0   command.search.index.usec_1_8   95,546  -   -
    0   command.search.index.usec_4096_32768    24  -   -
    0   command.search.index.usec_512_4096  8   -   -
    0   command.search.index.usec_64_512    7   -   -
    0   command.search.index.usec_8_64  992     -   -
    5.807   command.search.rawdata  13  -   -
    2.21    command.search.typer    17  30,307  30,307
    1.555   command.search.kv   13  -   -
    0.26    command.search.lookups  13  36,067  36,067
    0.071   command.search.tags     17  30,307  30,307
    0.001   command.search.summary  17  -   -
    0.609   dispatch.check_disk_usage   2   -   -
    0.358   dispatch.createProviderQueue    1   -   -
    0.76    dispatch.evaluate   1   -   -
    0.735   dispatch.evaluate.search    1   -   -
    3.157   dispatch.fetch  18  -   -
    204.589     dispatch.finalizeRemoteTimeline     1   -   -
    0.007   dispatch.preview    6   -   -
    6.273   dispatch.process_remote_timeline    8   5,382,700   -
    3.023   dispatch.readEventsInResults    1   -   -
    0.871   dispatch.remote_timeline_fullevents     8   495,752     300
    0.001   dispatch.stream.local   1   -   -
    17.693  dispatch.stream.remote  16  -   5,950,048
    7.023   dispatch.stream.remote.Server1  5   -   2,126,597
    6.292   dispatch.stream.remote.Server2  5   -   2,115,295
    4.378   dispatch.stream.remote.Server3  6   -   1,708,156
    2.103   dispatch.timeline   18  -   -
    0.577   dispatch.writeStatus    20  -   -
    1.333   startup.handoff     1   -   -
Search job properties
bundleVersion   15744914850249264379
canSummarize    0
createTime  2013-12-10T09:23:21.000-06:00
cursorTime  2013-12-03T09:00:00.000-06:00
custom  {'dispatch.earliest_time': '-7d@h', 'dispatch.latest_time': 'now', 'search': 'index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName'}
defaultSaveTTL  604800
defaultTTL  600
delegate    None
diskUsage   15253504
dispatchState   DONE
doneProgress    1.0
dropCount   0
eai:acl {'can_write': '1', 'ttl': '600', 'owner': 'me', 'modifiable': '1', 'perms': {'read': ['me'], 'write': ['me']}, 'sharing': 'global', 'app': 'search'}
earliestTime    2013-12-03T09:00:00.000-06:00
eventAvailableCount 30307
eventCount  30307
eventFieldCount 43
eventIsStreaming    True
eventIsTruncated    False
eventSearch search index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName
eventSorting    desc
indexEarliestTime   1386082815
indexLatestTime 1386688993
isBatchModeSearch   False
isDone  True
isFailed    False
isFinalized False
isPaused    False
isPreviewEnabled    True
isRealTimeSearch    False
isRemoteTimeline    True
isSaved False
isSavedSearch   False
isZombie    False
keywords    index::vmware moname::WantedServerName perftype::cpu source::virtualmachineperf
label   None
latestTime  2013-12-10T09:23:20.000-06:00
meanPreviewPeriod   35.437500
messages    {'debug': ['[Server1] search context: user="me", app="search", bs-pathname="C:\\Program Files\\Splunk\\var\\run\\searchpeers\\Searchhead-1386688856"', '[Server2] search context: user="me", app="search", bs-pathname="C:\\Program Files\\Splunk\\var\\run\\searchpeers\\Searchhead-1386688856"', '[Server3] search context: user="me", app="search", bs-pathname="/opt/splunk/var/run/searchpeers/Searchhead-1386688856"', 'base lispy: [ AND cpu WantedServerName index::vmware source::virtualmachineperf ]', 'search context: user="me", app="search", bs-pathname="/opt/splunk/etc"']}
modifiedTime    2013-12-10T09:31:34.862-06:00
normalizedSearch    litsearch index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1386082800.000000 lt=1386689000.000000 remove=true max_count=1000 max_prefetch=100
numPreviews 6
peerNameList    ['Server1', 'Server2', 'Server3']
performance {'command.search.tags': {'input_count': '30307', 'invocations': '17', 'duration_secs': '0.071000', 'output_count': '30307'}, 'startup.handoff': {'invocations': '1', 'duration_secs': '1.333000'}, 'command.search.index.usec_512_4096': {'invocations': '8'}, 'command.search.kv': {'invocations': '13', 'duration_secs': '1.555000'}, 'dispatch.timeline': {'invocations': '18', 'duration_secs': '2.103000'}, 'dispatch.finalizeRemoteTimeline': {'invocations': '1', 'duration_secs': '204.589000'}, 'command.search': {'input_count': '0', 'invocations': '17', 'duration_secs': '13.971000', 'output_count': '30307'}, 'command.search.filter': {'invocations': '13', 'duration_secs': '0.119000'}, 'command.search.index': {'invocations': '17', 'duration_secs': '3.623000'}, 'dispatch.createProviderQueue': {'invocations': '1', 'duration_secs': '0.358000'}, 'command.search.typer': {'input_count': '30307', 'invocations': '17', 'duration_secs': '2.210000', 'output_count': '30307'}, 'command.search.rawdata': {'invocations': '13', 'duration_secs': '5.807000'}, 'dispatch.evaluate.search': {'invocations': '1', 'duration_secs': '0.735000'}, 'command.search.index.usec_4096_32768': {'invocations': '24'}, 'command.search.index.usec_1_8': {'invocations': '95546'}, 'dispatch.remote_timeline_fullevents': {'input_count': '495752', 'invocations': '8', 'duration_secs': '0.871000', 'output_count': '300'}, 'command.search.index.usec_8_64': {'invocations': '992'}, 'dispatch.stream.remote.Server3': {'input_count': '0', 'invocations': '5', 'duration_secs': '7.023000', 'output_count': '2126597'}, 'command.search.summary': {'invocations': '17', 'duration_secs': '0.001000'}, 'dispatch.fetch': {'invocations': '18', 'duration_secs': '3.157000'}, 'dispatch.preview': {'invocations': '6', 'duration_secs': '0.007000'}, 'dispatch.evaluate': {'invocations': '1', 'duration_secs': '0.760000'}, 'dispatch.stream.local': {'invocations': '1', 'duration_secs': '0.001000'}, 'dispatch.readEventsInResults': {'invocations': '1', 'duration_secs': '3.023000'}, 'command.search.index.usec_64_512': {'invocations': '7'}, 'dispatch.process_remote_timeline': {'input_count': '5382700', 'invocations': '8', 'duration_secs': '6.273000', 'output_count': '0'}, 'dispatch.stream.remote.Server1': {'input_count': '0', 'invocations': '6', 'duration_secs': '4.378000', 'output_count': '1708156'}, 'command.search.fieldalias': {'input_count': '36067', 'invocations': '13', 'duration_secs': '0.287000', 'output_count': '36067'}, 'dispatch.stream.remote.Server2': {'input_count': '0', 'invocations': '5', 'duration_secs': '6.292000', 'output_count': '2115295'}, 'command.remotetl': {'input_count': '30307', 'invocations': '17', 'duration_secs': '3.730000', 'output_count': '0'}, 'command.search.calcfields': {'input_count': '36067', 'invocations': '13', 'duration_secs': '0.031000', 'output_count': '36067'}, 'dispatch.writeStatus': {'invocations': '20', 'duration_secs': '0.577000'}, 'command.fields': {'input_count': '30307', 'invocations': '17', 'duration_secs': '0.004000', 'output_count': '30307'}, 'command.search.lookups': {'input_count': '36067', 'invocations': '13', 'duration_secs': '0.260000', 'output_count': '36067'}, 'dispatch.check_disk_usage': {'invocations': '2', 'duration_secs': '0.609000'}, 'dispatch.stream.remote': {'input_count': '0', 'invocations': '16', 'duration_secs': '17.693000', 'output_count': '5950048'}}
pid 15881
priority    5
remoteSearch    litsearch index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName | fields keepcolorder=t "*" "_bkt" "_cd" "_si" "host" "index" "linecount" "source" "sourcetype" "splunk_server" | remotetl nb=300 et=1386082800.000000 lt=1386689000.000000 remove=true max_count=1000 max_prefetch=100
remoteSearchLogs    ['Server1\nServer2\nServer3']
reportSearch    None
request {'custom.dispatch.earliest_time': '-7d@h', 'custom.search': 'index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName', 'status_buckets': '300', 'adhoc_search_level': 'smart', 'latest_time': 'now', 'custom.dispatch.latest_time': 'now', 'auto_cancel': '30', 'earliest_time': '-7d@h', 'preview': '1', 'ui_dispatch_app': 'search', 'rf': '*', 'indexedRealtime': None, 'search': 'search index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName'}
resultCount 30307
resultIsStreaming   True
resultPreviewCount  30307
runDuration 212.625
runtime {'auto_pause': '0', 'auto_cancel': '30'}
scanCount   36067
search  search index=vmware source=virtualmachineperf perftype=cpu moname=WantedServerName
searchEarliestTime  1386082800.000000000
searchLatestTime    1386689000.000000000
searchProviders ['Server1', 'Server2', 'Server3', 'Searchhead']
sid 1386689000.46598
statusBuckets   300
ttl 600
Additional info     timeline field summary search.log search.log( Server1 Server2 Server3 )

Server info: Splunk 6.0, Searchhead, Tue Dec 10 09:31:34 2013 User: me

jrodman
Splunk Employee
Splunk Employee

A slow dispatch.remoteTimeline finish means that splunk is taking a long time to fetch the per-bucket events or the logfiles related to the search -- both of which happen at the end.

Fast mode searches do not even event buckets, so if this happens in fast mode as well, it is probably the log retrieval.

This answers post mentions how the two behaviors can be disabled if they are a problem in your environment:
http://answers.splunk.com/answers/122950/dispatch-finalizeremotetimeline-taking-a-long-time.html

However I believe this particular painpoint was reduced by mid 6.1.x in any event, and the log fetching is greatly helpful in troubleshooting. The choice is yours.

anupkpal
New Member

I have been investigating into searches for both admin user and splunk system user. Searched conducted by System User takes very long time. Searches done by system user are typically bucket copy trigger, copy buckets, summarize etc. Can someone explain why there is such a big difference:

Search Activity by User (2)
User Search Count Median Runtime 90th Percentile Runtime Cumulative Runtime Last Search
1 admin 201 0.17s 0.46s 5h 32min 13.32s 2018-01-31 09:55:39
2 splunk-system-user 150 3.58s 10.00s 13min 42.47s 2018-01-31 09:47:14

Common Search Commands (first there are for Splunk System user)
Command Count Average Runtime Max Runtime
1 summarize 100 3.86s 20.25s
2 bucket 25 9.56s 26.89s
3 copybuckets 25 7.92s 21.65s
4 kv 4 0.06s 0.12s
5 metadata 2 1.23s 1.41s
6 search 2 1.23s 1.41s

Appreciate your response.

Thanks,

Anup Pal
Solution Engineer,
SwiftStack Inc.

0 Karma
Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...