From the documentation: "startup.handoff: The time elapsed between the forking of a separate search process and the beginning of useful work by the forked search process. In other words, it is the approximate time it takes to build the search apparatus. This is cumulative across all involved peers. If this takes a long time, it could be indicative of I/O issues with .conf files or the dispatch directory."
The things that I have seen affecting this have been:

1. The search head is very busy
2. The indexers are very busy
3. Large search bundles
4. Slow I/O on the drive that hosts Splunk ($SPLUNK_HOME)
If you are using the SoS app or the Distributed Management Console, you should be able to see whether your problem is #1 or #2.
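If you are not running either app, a rough check is possible straight from the shell. A minimal sketch, assuming Linux hosts and shell access to the search head and each indexer:

    # Run on the search head and on each indexer while a slow search starts.
    # High load averages or a pegged splunkd process point to #1 or #2.
    uptime
    top -b -n 1 | head -20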
For #3: really large search bundles are most often due to large lookup tables. You can grab one of the bundles from the search head or an indexer, move it to a temp directory, and expand it using tar. Then check whether one or more directories is very large (du -m --max-depth=1). That can point you to a lookup table you may not need to distribute. Sometimes you will also see bundle replication timeouts in splunkd.log that indicate this problem.
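Here is a minimal sketch of that inspection. The bundle file name and its location under $SPLUNK_HOME/var/run are examples and will differ on your system (on indexers, received bundles typically live under $SPLUNK_HOME/var/run/searchpeers):

    # Copy a recent bundle to a scratch directory (the file name here is made up)
    mkdir /tmp/bundle-check && cd /tmp/bundle-check
    cp $SPLUNK_HOME/var/run/mysearchhead-1234567890.bundle .

    # Bundles are tar archives; expand and rank directories by size
    tar xf mysearchhead-1234567890.bundle
    du -m --max-depth=1 . | sort -rn | head

    # Large lookup files are the usual offenders
    find . -name '*.csv' -size +50M

    # Check splunkd.log for replication trouble (exact message wording varies by version)
    grep -i DistributedBundleReplicationManager $SPLUNK_HOME/var/log/splunk/splunkd.log | grep -i timeout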
#4 is more difficult to measure, as slow handoff could also be due to network utilization while the bundle is transferred to the indexers. But you should be able to measure disk I/O with a tool such as iostat.
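For example, assuming a Linux box with the sysstat package installed, watch the volume that backs $SPLUNK_HOME while a slow search starts:

    # Extended device statistics, sampled every 5 seconds for one minute;
    # high await or %util on the Splunk volume suggests an I/O bottleneck
    iostat -x 5 12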
It could also be that one indexer is much slower than all the others, and that one slow peer will make the whole startup handoff slow.
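One crude way to spot such an outlier, assuming shell access and valid credentials (the peer names and admin:changeme below are placeholders): time a trivial REST call against each peer's management port and compare.

    # /services/server/info is a cheap endpoint; 8089 is the default management port
    for peer in idx1.example.com idx2.example.com idx3.example.com; do
        printf '%s: ' "$peer"
        curl -k -s -o /dev/null -w '%{time_total}s\n' -u admin:changeme \
            "https://$peer:8089/services/server/info"
    done

A consistently slow responder here is only a hint, not proof; the DMC's per-instance views are the more direct check.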
If you find your answer, please post it for others' benefit.