We have a Search Head Cluster with 3 search heads. We have 70 searches that are supposed to run every minute.
We find that 14-15% of searches are getting skipped on the SH captain. We changed the captain and observed the same behavior on the new captain. We do not have any SH designated for ad hoc searches.
Please see the image below: the other search heads are not experiencing any skips. Also note that the SH captain is running a higher number of searches.
Please let us know if there is a way to get around this.
You're running 70 searches per minute on 3 SHs. What are the specs on those SHs? Are you seeing skipped searches throughout the day or only during peak hours?
Try running this search to see at what times you're getting the most skips:
index=_internal sourcetype=scheduler status=skipped |timechart span=1min count
It could be that you simply need more CPU cores on your SHs to handle the load.
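Before adding hardware, it may also help to see which hosts are skipping and why. A minimal sketch, assuming the `reason` field is extracted from the scheduler log in your environment (it normally is for skipped searches):

```spl
index=_internal sourcetype=scheduler status=skipped
| stats count by host, reason
| sort - count
```

If most rows point at a concurrency limit rather than at individual slow searches, that suggests a quota problem more than a raw CPU problem.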
Thanks for your note.
Each of our three SHs has 36 CPUs. Skipping is observed only on the search head captain, and it occurs throughout the day.
We have already set max_searches_per_cpu to 4 on all SHs. (This is taking us close to the brink, though.)
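For reference, here is a rough sketch of what that setting works out to per search head. It assumes the other limits.conf values are still at their defaults (base_max_searches = 6, max_searches_perc = 50); if you have changed those, substitute your own numbers. The field names in the output are just illustrative labels.

```spl
| makeresults
| eval cpus=36, max_searches_per_cpu=4, base_max_searches=6, max_searches_perc=50
| eval max_hist_searches=(max_searches_per_cpu * cpus) + base_max_searches
| eval max_hist_scheduled_searches=floor(max_hist_searches * max_searches_perc / 100)
| table cpus max_searches_per_cpu max_hist_searches max_hist_scheduled_searches
```

With these inputs, 4 x 36 + 6 gives 150 concurrent historical searches per member, of which 50% (75) are available to the scheduler. How the captain applies this across the cluster may differ from the per-member figure, so treat it as an estimate only.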
The SHC captain handles scheduling and dispatching, so it consumes more CPU than the other members. You're seeing the problem on the captain regardless of which host is the captain, right?
Interesting. Not my question, but if one SH is skipping searches, how do we check what is creating load on that SH? Or, more generally, what should we look at to reduce load on an overloaded SH?
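One possible starting point is to rank scheduled searches by the total run time they consume on the affected host. This is only a sketch: `YOUR_SEARCH_HEAD` is a placeholder for the host name, and it assumes the scheduler events on that host carry the `run_time` and `savedsearch_name` fields.

```spl
index=_internal sourcetype=scheduler host=YOUR_SEARCH_HEAD run_time=*
| stats count avg(run_time) as avg_run_time sum(run_time) as total_run_time by savedsearch_name
| sort - total_run_time
```

Searches near the top of the list whose average run time approaches or exceeds their schedule interval are usually the first candidates for tuning or rescheduling.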
We made the following settings -
Our intent is to push all scheduled searches to sh1 and monitor the performance. We will keep you apprised of our observations.