Recently, in the afternoons, we have been seeing high CPU spikes on the indexer cluster, and some indexers reach 100% CPU at some point. How can we detect what causes these spikes? Memory and indexer queues look fine.
Check CPU usage using the Splunk Monitoring Console App [Settings -- Monitoring Console -- Resource Usage -- Machine -- CPU usage 90 percentile]. There may be heavy scheduled searches that absorb all the CPUs.
Remember that each search and subsearch takes a CPU core and releases it only when it finishes, so if you have many heavy searches or real-time searches running at once, you may be using all your CPUs.
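To make the "one search, one core" point concrete, here is a minimal sketch that counts how many searches overlap at the busiest moment and compares that with the core count. The (start, end) times and the core count are made-up illustration values, not something pulled from Splunk; in practice you would take run times from your own search audit data.

```python
from typing import List, Tuple


def peak_concurrency(searches: List[Tuple[float, float]]) -> int:
    """Return the maximum number of searches running at the same instant."""
    events = []
    for start, end in searches:
        events.append((start, 1))   # a search starts and takes a core
        events.append((end, -1))    # a search ends and releases it
    # Sort by time; at equal timestamps, process ends (-1) before starts (+1)
    events.sort(key=lambda e: (e[0], e[1]))
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Hypothetical (start, end) times in seconds for searches on one indexer
searches = [(0, 300), (60, 400), (120, 180), (150, 600), (590, 700)]
cpu_cores = 4  # assumed core count, for illustration only

peak = peak_concurrency(searches)
print(f"peak concurrent searches: {peak}, cores: {cpu_cores}")
print("cores saturated by searches alone:", peak >= cpu_cores)
```

If the peak is at or above the core count during the afternoon window, scheduled searches are a likely suspect and staggering their schedules is worth trying.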
On one machine, A, the CPU usage 90th percentile has been close to 100% over the past 4 hours, while another server, B, is at much lower levels.
Looking at the chart one above it, Average Load Average, A is at a much higher load than B. That's indexing load, right?
For High CPU search conditions:
If you have indexers with significantly higher load than others in a cluster, it's worth checking that your data is evenly balanced across the peers. If a few peers hold proportionally more buckets than the others, you would expect them to participate more often, and in more searches. A data rebalance can address this.
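As a rough way to judge "proportionally more buckets", the sketch below flags peers whose bucket count sits well above the cluster mean. The peer names, the counts, and the 20% tolerance are all hypothetical; substitute the per-peer bucket counts you see in your own cluster.

```python
from typing import Dict, List


def overloaded_peers(bucket_counts: Dict[str, int], tolerance: float = 0.2) -> List[str]:
    """Return peers whose bucket count exceeds the cluster mean by more than `tolerance`."""
    if not bucket_counts:
        return []
    mean = sum(bucket_counts.values()) / len(bucket_counts)
    threshold = mean * (1 + tolerance)
    return sorted(peer for peer, n in bucket_counts.items() if n > threshold)


# Hypothetical bucket counts per peer (not real data)
counts = {"idx-a": 5200, "idx-b": 2900, "idx-c": 3100, "idx-d": 2800}
print(overloaded_peers(counts))  # ['idx-a']
```

Here the mean is 3500 buckets, so anything above 4200 is flagged, and only idx-a qualifies. A peer that shows up in this list and also shows the high CPU is a good candidate for a rebalance.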
Or, if you migrated to clustering or from single-site to multisite, the cause could be searches running over old pre-migration data, which may exist on only a subset of indexers. https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/Migratetomultisite
For High CPU indexing conditions:
Check that forwarders are evenly targeting all peers. It's not uncommon in deployments that have grown over time to find multiple outputs.conf files with differing indexer targets. This means that not all indexers share the indexing work equally, and it can even cause the data balance issues described above. Indexer discovery can help address this. https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/indexerdiscovery
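If you can collect the outputs.conf files from your forwarders, a small script can show which forwarders are missing indexer targets that others have. This is only a sketch: it reads the `server` setting from `[tcpout:<group>]` stanzas in the text you give it and does not apply Splunk's configuration layering (btool on the forwarder shows the merged view). The forwarder names and file contents below are invented for illustration.

```python
import configparser
from typing import Dict, Set


def tcpout_targets(conf_text: str) -> Set[str]:
    """Collect host:port targets from the `server` setting of [tcpout:<group>] stanzas."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    targets: Set[str] = set()
    for section in parser.sections():
        if section.startswith("tcpout:") and parser.has_option(section, "server"):
            for target in parser.get(section, "server").split(","):
                if target.strip():
                    targets.add(target.strip())
    return targets


def missing_targets(confs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each forwarder to the targets that other forwarders use but it does not."""
    per_forwarder = {name: tcpout_targets(text) for name, text in confs.items()}
    all_targets = set().union(*per_forwarder.values())
    return {
        name: all_targets - targets
        for name, targets in per_forwarder.items()
        if all_targets - targets
    }


# Hypothetical outputs.conf contents from two forwarders
CONFS = {
    "fwd1": "[tcpout]\ndefaultGroup = primary\n\n"
            "[tcpout:primary]\nserver = idx-a:9997, idx-b:9997, idx-c:9997\n",
    "fwd2": "[tcpout:primary]\nserver = idx-a:9997\n",
}

for name, missing in missing_targets(CONFS).items():
    print(name, "is missing", sorted(missing))
```

In this made-up case fwd2 sends only to idx-a, which is exactly the pattern that would leave one indexer doing more indexing work than its peers.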
Also check that you don't have any local props/transforms on the high-use boxes that don't exist elsewhere in the cluster. (Use the master to distribute all configs.) Whilst this can be a problem, you would normally expect it to have a noticeable impact only in very high-volume indexing environments.