I have a problem with how long it takes for the first events to show up in Splunk.
When I run a search across all indexers, it takes about 10 to 15 seconds until the first raw events appear.
After that it pulls events very fast, climbing to millions of events quickly, but the start is really slow.
When I limit the search to only one indexer or a few, it starts faster, but still not acceptably fast.
I checked the resource usage of all instances, including the search head, and everything looks fine. The behavior is the same on different search heads.
If I search for data that returns only a few results, say 20 or so, it takes ages for them to show up. It seems as if something is waiting for more results before displaying anything.
Not sure how to explain it any better.
I noticed that the startup.handoff time in the job inspector is always high.
It is a bit faster with data in warm buckets, but only by 1 or 2 seconds. That seems normal, since warm is on SSD and cold is on regular disks.
Thx
This is likely tied to the size of your search head's dispatch (knowledge) bundles. Try something like:
index=_internal sourcetype=splunkd metrics group=bundles_uploads status=success
| stats max(replication_time_msec) AS rep_time BY host peer_name
| sort -rep_time
and look at how long the bundles take to replicate. If it's consistently longer than 10 seconds (10000 msec), you likely have some large lookup files or other artifacts contributing to the bundle size. You can look at blacklisting files from replication or taming the size of your lookups.
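As a sketch of what a replication blacklist can look like, you could exclude a large lookup in distsearch.conf on the search head (the stanza name is real, but the app and file names below are placeholders you'd swap for your own):

[replicationBlacklist]
# exclude a hypothetical oversized lookup from the knowledge bundle
big_lookup = apps[/\\]myapp[/\\]lookups[/\\]huge_lookup.csv

Keep in mind that a blacklisted lookup is no longer available to searches on the indexers, so only exclude files your distributed searches don't need.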
Can you also check the resource utilization on the indexers (they're the ones actually fetching the data)? And check whether your searches follow best practices: put index/sourcetype filters in the base search, and avoid expensive commands.
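For example, a search that filters as early as possible looks something like this (the index, sourcetype, and field values are placeholders for illustration):

index=web sourcetype=access_combined status=500 earliest=-1h
| stats count BY host

The key point is that the restrictive terms sit in the base search, so the indexers discard non-matching events before anything is sent back to the search head, rather than filtering later with commands like search or where.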