Solved: Number of appserver.py processes increasing, causing OOM

hrawat · ‎02-14-2024

A search head appears to have a rogue Python process (appserver.py) that slowly eats away all memory on the system and eventually causes an OOM. Recovery requires a manual restart of splunkd, after which the memory usage slowly creeps up and the issue happens again.

hrawat · ‎02-14-2024

Due to an issue with the cleanup of idle processes, the number of Python processes (appserver.py) running on the system grows constantly. The resulting system-wide memory growth from these stale processes eventually causes an OOM.

Run the following search to find out whether any search head is impacted by this issue and what percentage of total system memory is used by stale processes that have been running for more than 24 hours. If these processes are using more than 15% of total system memory, run the script below to kill the stale processes.

index=_introspection host=<all search heads>  appserver.py data.elapsed > 86400
| dedup host, data.pid
| stats dc(data.pid) as cnt sum("data.pct_memory") AS appserver_memory_used by  host
| sort - appserver_memory_used
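
To cross-check a single host from the shell instead of from _introspection, a rough local equivalent might look like the sketch below. It assumes a Linux procps `ps` (needed for `etimes` and `pmem`). The function names and the MIN_AGE variable are made up for this example. Note that `pmem` is the resident memory share as reported by ps, so it may not match data.pct_memory exactly.

```shell
# Minimum process age in seconds (24 hours, same cutoff as the search above).
MIN_AGE=${MIN_AGE:-86400}

# Reads "etimes pid pmem args..." lines on stdin and prints
# "<count> <total pmem>" for appserver.py processes older than MIN_AGE.
summarize_stale() {
  awk -v min_age="$MIN_AGE" '
    $1 >= min_age && /appserver\.py/ { n++; sum += $3 }
    END { printf "%d %.1f\n", n, sum }'
}

# Feeds the live process table into summarize_stale.
appserver_usage() {
  ps -e -o etimes= -o pid= -o pmem= -o args= | summarize_stale
}
```

Calling `appserver_usage` prints the number of stale processes and their combined memory percentage, which can be compared against the 15% threshold mentioned above.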

On Linux/Unix, you can use the following script to kill the stale processes and reclaim memory.

kill -TERM  $(ps -eo etimes,pid,cmd | awk '{if ( $1 >= 86400) print $2 " " $4 }' |grep appserver.py | awk '{print $1}')
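
If you would rather see what would be killed before sending any signal, the one-liner above could be wrapped along these lines. This is only a sketch: the function names and the MIN_AGE and DO_KILL variables are illustrative. Unlike the one-liner, it matches appserver.py anywhere in the command line rather than only in the second word.

```shell
# Minimum process age in seconds (24 hours, as in the one-liner above).
MIN_AGE=${MIN_AGE:-86400}

# Reads "etimes pid args..." lines on stdin and prints the PIDs of
# appserver.py processes that are older than MIN_AGE.
stale_pids() {
  awk -v min_age="$MIN_AGE" '$1 >= min_age && /appserver\.py/ { print $2 }'
}

# Dry run by default; set DO_KILL=1 to actually send SIGTERM.
kill_stale() {
  local pids
  pids=$(ps -e -o etimes= -o pid= -o args= | stale_pids)
  if [ -z "$pids" ]; then
    echo "no stale appserver.py processes found"
    return 0
  fi
  if [ "${DO_KILL:-0}" = "1" ]; then
    # Unquoted on purpose so each PID becomes its own argument.
    kill -TERM $pids
  else
    echo "would send SIGTERM to:" $pids
  fi
}
```

Run `kill_stale` first to review the PID list, then `DO_KILL=1 kill_stale` once it looks right. The empty-list check also avoids calling kill with no arguments when nothing matches.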

waechtler_amaso · ‎07-30-2024

I see this behaviour, too, also for another process coming from the ITSI app:

/opt/splunk/etc/apps/SA-ITOA/bin/command_health_monitor.py

Besides killing processes or restarting Splunk as a workaround, do you know whether there are efforts to finally resolve this bug?

Thanks, Jan

hrawat · ‎07-30-2024

Splunk 9.3.0 has the fix.
