Monitoring Splunk

High CPU Usage of one indexer cluster peer node

klowk
Path Finder

Hi community,

One of our indexer cluster peer nodes has very high CPU consumption by the splunkd process; see the attached screenshot.

(screenshot: splunkd CPU usage on the affected peer node)

We have an indexer cluster that consists of two peer nodes and one master server. Only one of the peer nodes has this performance issue.

A few Universal Forwarders and Heavy Forwarders sometimes get the following timeout from the peer node that has the high load.

08-06-2020 15:02:47.001 +0200 WARN TcpOutputProc - Cooked connection to ip=<ip> timed out

I have already looked at the Monitoring Console. In the graph CPU Usage by Process Class I see 117 percent for splunkd server.

(screenshot: Monitoring Console, CPU Usage by Process Class)

But what does splunkd server mean here?
Where can I get further information to solve this issue?

Thanks for your advice.

kind regards
Kathrin


shivanshu1593
Builder

Looks like your indexers are getting overloaded. You may want to consider adding another set of Indexers to distribute the load evenly.

For the CPU utilisation, check the following (example searches for these points follow the list):

1. Check the status of the Indexers' queues. Most likely they'd be blocked due to overload. If so, check the indexing rate and the amount of data coming in, along with the data quality. All of these factors contribute to high CPU utilisation.

2. Check the values of nproc and nofile (ulimit -u and ulimit -n) and consider raising them. Please post the output here so that we can have a look as well.

3. Right next to the CPU graph, check the memory graph as well. What does that tell you? Is the memory mainly consumed by searches or by something else? If it's searches, you've got a lot of searches running at the same time, overwhelming your servers.

4. Are the IOPS of your server sufficient? The incoming load versus how much the server can handle, parse, and write to disk also contributes to CPU utilisation.

5. Check whether real-time searches are being run in your environment. If so, please get rid of them. They never end and take a heavy toll on your Indexers.

6. Under Search > Search: Instance, look for the top 20 memory-consuming searches. Check whether they have been running for a long time and whether they use correct syntax. Anything with index=* and All Time as the time range also contributes to this consumption (I ended up disabling real-time searches for everyone, and the All Time range for everyone except admins).

7. What are the error messages in splunkd.log? You may want to look at them as well.
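
For point 1, a quick way to spot blocked queues is the peer's own metrics.log. This is just a sketch; it assumes the default _internal index, and <affected_peer> is a placeholder for your indexer's host name:

index=_internal host=<affected_peer> source=*metrics.log* sourcetype=splunkd group=queue blocked=true
| timechart span=1m count BY name

If this returns a steady stream of events for parsingqueue, aggqueue, typingqueue or indexqueue, the box really is falling behind on indexing.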
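For point 2, besides running ulimit on the box itself, splunkd normally logs the effective limits when it starts up, so you can also pull them from splunkd.log. Rough sketch, same assumptions as above:

index=_internal host=<affected_peer> source=*splunkd.log* ulimit
| sort - _time
| table _time, host, _raw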
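For point 3, the Monitoring Console memory panels are built on the _introspection index, so you can chart host memory yourself. The data.mem / data.mem_used field names are what I see on recent versions; double-check them on your instance:

index=_introspection host=<affected_peer> sourcetype=splunk_resource_usage component=Hostwide
| eval mem_used_pct = round(100 * 'data.mem_used' / 'data.mem', 1)
| timechart span=5m avg(mem_used_pct) AS avg_mem_used_pct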
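For point 4, platform instrumentation also records disk I/O per mount point (component=IOStats in _introspection). Treat the field names below as a starting point rather than gospel, as they can differ between versions:

index=_introspection host=<affected_peer> sourcetype=splunk_resource_usage component=IOStats
| stats avg(data.reads_ps) AS avg_reads_ps, avg(data.writes_ps) AS avg_writes_ps, avg(data.avg_total_ms) AS avg_total_ms BY data.mount_point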
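For point 5, you can list the currently running real-time searches over REST from the search head. Sketch only; it assumes your role can see other users' jobs, and the field names on the jobs endpoint may vary slightly by version:

| rest /services/search/jobs splunk_server=local
| search isRealTimeSearch=1 dispatchState=RUNNING
| table sid, author, label, runDuration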
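For point 6, the per-search resource data also lives in _introspection, so you can pull the top memory consumers directly (data.mem_used is reported in MB on the versions I've used):

index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.search_props.sid=*
| stats max(data.mem_used) AS mem_used_mb BY data.search_props.sid, data.search_props.user
| sort - mem_used_mb
| head 20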
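And for point 7, a quick overview of what splunkd is complaining about on that peer (again assuming the default _internal index):

index=_internal host=<affected_peer> source=*splunkd.log* (log_level=ERROR OR log_level=WARN)
| stats count BY component, log_level
| sort - count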

Hope this helps,

S

Thank you,
Shiv
###If you found the answer helpful, kindly consider upvoting/accepting it as the answer as it helps other Splunkers find the solutions to similar issues###