Getting Data In

High memory usage by splunk-MonitorNoHandle.exe

sylim_splunk
Splunk Employee

We have been experiencing unusually high memory usage on some of our domain controllers. The culprit is the Splunk process splunk-MonitorNoHandle.exe.
Here is a report of memory usage on the domain controllers:

Host  Image Name                  PID     Session Name  Session#  Mem Usage
DC1   splunk-MonitorNoHandle.exe  17724   Services      0         14,993,012 K
DC2   splunk-MonitorNoHandle.exe  53268   Services      0         38,927,688 K
DC3   splunk-MonitorNoHandle.exe  16164   Services      0         43,997,828 K

1 Solution

sylim_splunk
Splunk Employee

'splunk-MonitorNoHandle.exe' is designed to buffer data in memory when it cannot hand it off to the UF, and by default that buffer is not capped. This symptom typically appears when there is a huge amount of data to forward to the indexers while the UF's forwarding throughput is limited.

 i) Check the status of the parsing queue and the tcpoutput queue to find which one gets blocked first (see the search sketch below).

   If the parsing queue blocks first, it is receiving more data than it can pass on. This can happen when maxKBps is throttled at the default of 256; change it to 0 (unlimited) or to a value your environment allows.
 - Side effect: an unthrottled forwarder can bombard the indexers with a huge amount of data.
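
If the forwarder sends its own _internal logs to the indexers, one way to compare the two queues is a search over metrics.log, as in the sketch below. The host value is a placeholder and the exact tcpout queue name depends on your output group:

  index=_internal host=DC1 source=*metrics.log* group=queue (name=parsingqueue OR name=tcpout*)
  | eval pct_full=round(current_size_kb/max_size_kb*100,1)
  | timechart span=5m max(pct_full) by name

The queue that sits near 100% full (or logs blocked=true) before the other one is the one to address first.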

 

ii) In splunkd.log, look for messages showing that the forwarder is having trouble sending data to the next receiving tier:

Even if increasing maxKBps lets the parsing pipeline move more data, the same symptoms will return if the tcpoutput queue gets blocked.


Below are example log messages showing that the UF has problems connecting to the indexers:
11-12-2021 11:00:11.365 -0500 WARN TcpOutputProc - Cooked connection to ip=172.22.1.218:9997 timed out
11-12-2021 11:01:48.391 -0500 WARN TcpOutputProc - Cooked connection to ip=172.22.1.218:9997 timed out
11-12-2021 11:12:28.757 -0500 WARN TcpOutputProc - The TCP output processor has paused the data flow. Forwarding to output group ABC_indexers has been blocked for 500 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
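
If the forwarder's _internal logs reach your indexers, a search like the sketch below surfaces these messages (the host value is a placeholder); otherwise check splunkd.log directly on the domain controller, by default under C:\Program Files\SplunkUniversalForwarder\var\log\splunk\.

  index=_internal host=DC1 sourcetype=splunkd component=TcpOutputProc (log_level=WARN OR log_level=ERROR)
  | timechart span=10m count by log_level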


iii) Recommendations:

iii-1) If the parsing queue fills up more often than the tcpoutput queue, MonitorNoHandle is sending data faster than the parsing process can handle.

This can happen when maxKBps is throttled at the default of 256; consider increasing the value to match your traffic volume (see the limits.conf sketch below).

http://docs.splunk.com/Documentation/Splunk/latest/Admin/Limitsconf


 - Side effect: an unthrottled forwarder can bombard the indexers with a huge amount of data.
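
For reference, the throttle lives in the [thruput] stanza of limits.conf on the forwarder (for example under $SPLUNK_HOME\etc\system\local\); the value below is only an illustration:

  [thruput]
  # 0 = unlimited; a bounded value such as 1024 or 2048 KB/s is gentler on the indexers
  maxKBps = 0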
 

iii-2) How to set a memory limit for the MonitorNoHandle modular input:

In limits.conf on the forwarder:

[inputproc]
monitornohandle_max_heap_mb=5000
monitornohandle_max_driver_mem_mb=5000

(5000 MB / ~5 GB shown here; adjust to your environment.)

iii-3) If you find intermittent blockages on the tcpoutput queue:

This also contributes to MonitorNoHandle's memory growth: the parsing pipeline cannot send data as fast as it receives it, so MonitorNoHandle.exe has to hold the backlog in its own heap, which can grow unexpectedly.
Consider implementing asynchronous forwarding so that the forwarder can spread the data without pausing the data flow; that way MonitorNoHandle.exe has less chance of hitting its heap limit. You can also consult our Professional Services team for the implementation.
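
The exact asynchronous-forwarding settings depend on your UF version, so check the outputs.conf documentation for your release. For orientation only, the output group the forwarder load-balances across is defined in outputs.conf roughly like this (group name taken from the example logs above, hostnames hypothetical):

  [tcpout]
  defaultGroup = ABC_indexers

  [tcpout:ABC_indexers]
  server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
  autoLBFrequency = 30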
