Monitoring Splunk

Possible memory leak in 4.3.6

jakubincloud
Explorer

Hello,

I have an environment with 2 search heads and 2 indexers. There are 70-ish forwarders which send around 50 MB of data a day.

lsof -i :port | wc -l # shows established connections
70
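
To count only established connections and skip the lsof header line, something like this also works (9997 is assumed here as the default forwarder receiving port; substitute the actual port):

lsof -nP -iTCP:9997 -sTCP:ESTABLISHED | tail -n +2 | wc -l   # count ESTABLISHED connections only, header line skipped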

On one search head there are 6 real-time searches, which can be seen in the 'ps' output:

ps -Lef
(...) splunkd search --id=rt_1373011410.1218 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1218 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1219 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1219 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1218 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1219 --maxbuckets=0

However, I see an increasing number of splunkd threads, now sitting at 39:

ps -Lef | grep -v grep | grep "splunkd -p 8089" | wc -l
39
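
A sketch for seeing thread count and resident memory per splunkd process at a glance (on Linux procps, nlwp is the thread count and rss is in KB):

ps -C splunkd -o pid,nlwp,rss,vsz,etime,args --sort=-rss   # one line per process, sorted by resident memory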

Furthermore, there are a couple of threads for mrsparkle (the Splunk Web appserver):

python -O /opt/splunk/lib/python2.7/site-packages/splunk/appserver/mrsparkle/root.py restart
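
These can be counted the same way as the splunkd processes, e.g.:

ps -ef | grep -v grep | grep "mrsparkle/root.py" | wc -l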

The problem is that Splunk gradually uses up all of the memory. The Mem Used Percentage graph can be seen here:

[Screenshot: Mem Used Percentage graph]

(Edit: for your information, the indexers have 34 GB of memory each.)

You can see manual restarts, as well as forced ones when memory usage reaches 100% and splunkd is killed by the OOM killer.
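
(To confirm the OOM kills, the kernel log can be checked; the syslog path below is an assumption and may be /var/log/messages on other distributions.)

dmesg | grep -iE "out of memory|oom-killer"     # kernel messages from the OOM killer
grep -i "killed process" /var/log/syslog        # same information via syslog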

All Splunk instances have been updated to 4.3.6 and have the Deployment Monitor app disabled.

Is there something else I can do to check what is causing the memory leak?
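
One way to narrow it down is to log splunkd memory over time and see which process actually grows; a minimal sketch (the log path and 5-minute interval are arbitrary):

while true; do
    date >> /tmp/splunkd_mem.log
    ps -C splunkd -o pid,rss,nlwp,args --sort=-rss >> /tmp/splunkd_mem.log
    sleep 300
done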

1 Solution

jakubincloud
Explorer

Answering my own question, showing all the steps I took.

  • Upgrading the hot/warm storage volume from 250 IOPS to 1200 IOPS didn't fix the memory usage pattern, but high-IOPS volumes are a good thing anyway.

  • On the search heads and indexers I had the Unix app installed, which produced some errors, but it hadn't caused any problems before, so I didn't look at it at the time. Removing the Unix app (and other default ones) helped a bit, but after a couple of minutes memory started climbing again.

  • Upgrading from 4.3.6 to 5.0.3 solved it. The process was straightforward: dpkg -i the package (rough outline below). No more memory leaks; memory stays at 2-4%.
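
For completeness, a rough sketch of the upgrade steps on a Debian/Ubuntu host (the package filename and backup path are placeholders, not the exact ones used):

/opt/splunk/bin/splunk stop
cp -rp /opt/splunk/etc /backup/splunk-etc-$(date +%F)   # back up configuration first; destination is hypothetical
dpkg -i splunk-5.0.3-linux-2.6-amd64.deb                # actual filename depends on the downloaded build
/opt/splunk/bin/splunk start --accept-license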





krugger
Communicator

Your memory usage pattern seems weird to me because you are processing virtually no data. I have over 50 GB coming in each day with only 8 GB of memory, and it doesn't run out.

Unless you are doing some massively complex processing on the inbound data, it isn't normal to run out of memory with only 50 MB per day. I would say there is some sort of loop behaviour going on in your Splunk infrastructure, but without knowing how things are set up and what you are doing with the data, it is quite hard to give you good guidance.


jakubincloud
Explorer

Thank you for your answer. Upgrading from 4.3.6 to 5.0.3 solved the problem.
