Hi,
We have 3 search heads in a clustered environment behind a load balancer.
We are observing that one of the search heads (a non-captain) has very high CPU utilization compared to the other two.
Can anyone please suggest why this is happening and how to troubleshoot it?
Thanks.
I found the following in another post. Please follow the troubleshooting steps below.
A large number of dirs/files can slow things down. Check
$SPLUNK_HOME/var/run/splunk/dispatch
or, if you are using search head pooling,
[Pooled Share]/var/run/splunk/dispatch
To get a count of files/dirs in each directory:
ls -l | wc -l
You might also want to check for a large number of files under the var dirs in general.
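For example, a quick way to check this from the shell (a rough sketch; /opt/splunk as the default install path is an assumption, so adjust SPLUNK_HOME for your environment):
# count search artifacts in the dispatch directory
DISPATCH="${SPLUNK_HOME:-/opt/splunk}/var/run/splunk/dispatch"
echo "dispatch artifacts: $(ls "$DISPATCH" | wc -l)"
# see which var subdirectories hold the most data
du -sh "${SPLUNK_HOME:-/opt/splunk}"/var/* 2>/dev/null | sort -rh | head
A noticeably larger dispatch count on the busy node than on the other two is a useful data point; long job TTLs and users leaving jobs open are common causes.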
Here is also a search to calculate scheduled search lag and see whether the scheduler is falling behind. A lag of around 30 seconds is probably normal, but anything above that may be worth investigating. You can set HIGH_WATERMARK to your liking as a reference line.
As a prerequisite, you need to be indexing scheduler.log.
Replace the host names below with those of your search heads:
(host=hosta OR host=hostb) index=_internal source=*scheduler.log | eval JOB_DELAY_SECS=(dispatch_time-scheduled_time) | timechart span=5m perc95(JOB_DELAY_SECS) by host | eval HIGH_WATERMARK=100
If you are on Linux, you can run this command to see what splunkd or splunkweb is spending its time on:
strace -p <splunk pid> -tt
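To find the PID to attach to, something like this works on most Linux systems (a sketch; splunkweb only exists as a separate process on older Splunk versions):
# list the busiest Splunk processes and pick the PID to strace
ps -eo pid,pcpu,etime,args --sort=-pcpu | grep -E 'splunk[dw]' | grep -v grep | head
Long runs of the same syscall in the strace output (for example repeated stat/getdents calls against the dispatch directory) can point back to the file-count issue above.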
You should have the Monitoring Console (MC) installed and configured on a node outside the SHC. Open it and use it to see what is happening on your SHC, and especially on that one node. A couple of its views make a good starting point.
r. Ismo
Exactly how high is "very high"? What process is responsible for the excess CPU load? Is the excess load consistent or intermittent? If a splunkd process is creating the load, what searches are running at the time?
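To pin those down on the busy node, a rough sketch for Linux (assumption: each running search shows up as a child splunkd process whose arguments contain the search id; the exact argument format varies by version):
# snapshot the top CPU consumers
ps -eo pid,pcpu,etime,args --sort=-pcpu | head -20
# running search processes, with their search ids in the arguments
ps -eo pcpu,etime,args --sort=-pcpu | grep '[s]plunkd search' | head
The search ids map back to jobs in Activity > Jobs (the Job Manager) and to artifact directories under var/run/splunk/dispatch, so you can tell which scheduled searches or users are driving the load.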