Monitoring Splunk

How can we find the root cause of very high CPU levels on the indexer cluster?

danielbb
Motivator

Recently, in the afternoons, we have been seeing high CPU spikes on the indexer cluster, and some indexers reach 100% CPU at some point. How can we detect what causes these spikes? Memory and indexer queues look just fine.


soutamo
SplunkTrust

If those are Linux machines, just install the nmon tool and start with it to see what is happening on those servers. There is also an app for collecting its data and analysing it.

R. Ismo


gcusello
SplunkTrust

Hi @danielbb,
check CPU usage using the Splunk Monitoring Console app [Settings -- Monitoring Console -- Resource Usage -- Machine -- CPU usage 90 percentile]; maybe there are heavy scheduled searches that absorb all the CPUs.
Remember that each search and subsearch takes a CPU core and releases it only when it finishes, so if you have many heavy searches or real-time searches, you may be using all your CPUs.
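One way to spot heavy scheduled searches is to rank them by total runtime around the time of the spikes. This is only a sketch: the `savedsearch_name`, `app`, and `run_time` fields below come from the standard scheduler logs in `_internal`, but verify them on your version.

```
index=_internal sourcetype=scheduler
| stats count sum(run_time) AS total_run_time BY savedsearch_name app
| sort - total_run_time
```

Narrow the time range to the afternoon window where the spikes occur, and the top results are the first candidates to investigate.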

Ciao.
Giuseppe

danielbb
Motivator

On one machine, A, the CPU usage 90 percentile has been close to 100% over the past 4 hours, while another server, B, is at much lower levels.

Looking at one chart above, Average Load Average, A shows a much higher load than B. That's indexing load, right?


nickhills
Ultra Champion

For High CPU search conditions:
If you have indexers with significantly higher load than others in a cluster, it's worth checking that your data is evenly balanced across the peers. If a few peers hold proportionally more buckets than the others, you would expect them to participate more often, and in more searches. A data rebalance can address this.
https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/Rebalancethecluster
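To get a rough picture of bucket distribution before deciding on a rebalance, something like this can help, run from a search head. `dbinspect` is a standard command, but treat the field names as an assumption and check them against your version:

```
| dbinspect index=*
| stats count AS buckets BY splunk_server
| sort - buckets
```

A peer with far more buckets than its siblings is the one you would expect to see doing disproportionate search work.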

Or - if you migrated to Clustering or single site->multisite it could be searches running over old pre-migration data which may only exist on a subset of indexers. https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/Migratetomultisite

For High CPU indexing conditions:
Check that forwarders are targeting all peers evenly; it's not uncommon in deployments that have grown over time to find multiple outputs.conf files with differing indexer targets. Obviously, this means that not all indexers participate in the process, and it can even cause the data-balance issues described above. Indexer discovery can help address this. https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/indexerdiscovery
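A quick way to see whether forwarders are spreading data evenly is to compare indexing throughput per indexer from metrics.log. This is a sketch using the usual `per_index_thruput` group and `kb` field from `_internal`; adjust to taste:

```
index=_internal source=*metrics.log* group=per_index_thruput
| timechart span=5m sum(kb) BY host
```

If one indexer's line sits well above the others, some forwarders are probably not targeting the full peer list.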

Also check that you don't have any local props/transforms on the high-use boxes that don't exist elsewhere in the cluster. (Use the master to distribute all configs.) Whilst this can be a problem, you would normally only expect it to have a noticeable impact in very high-volume indexing environments.

If my comment helps, please give it a thumbs up!

gcusello
SplunkTrust

Probably!
Check when the peaks occur, to identify any scheduled searches running at those times.
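If the peaks recur, you can line them up against per-process search CPU from the `_introspection` index. This assumes introspection is enabled; `data.pct_cpu` and `data.process_type` are the usual resource-usage fields, but confirm them on your version:

```
index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.process_type=search
| timechart span=1m sum(data.pct_cpu) BY host
```

Peaks here that match the afternoon CPU spikes point at search load rather than indexing load.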

Ciao.
Giuseppe
