Monitoring Splunk

Possible memory leak in 4.3.6

jakubincloud
Explorer

Hello,

I have an environment with 2 search heads and 2 indexers. There are 70-ish forwarders which send around 50 MB of data a day.

lsof -i :port | wc -l # shows established connections
70
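
To count only established connections and skip the lsof header line, something like this also works (9997 is assumed here as the default forwarder receiving port; substitute the actual port):

lsof -nP -iTCP:9997 -sTCP:ESTABLISHED | tail -n +2 | wc -l   # count ESTABLISHED connections only, header line skipped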

On one search head there are 6 real-time searches, which can be seen in the 'ps' output:

ps -Lef
(...) splunkd search --id=rt_1373011410.1218 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1218 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1219 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1219 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1218 --maxbuckets=0
(...) splunkd search --id=rt_1373011410.1219 --maxbuckets=0

However, I see an increasing number of splunkd threads, now sitting at 39:

ps -Lef | grep -v grep | grep "splunkd -p 8089" | wc -l
39
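
A sketch for seeing thread count and resident memory per splunkd process at a glance (on Linux procps, nlwp is the thread count and rss is in KB):

ps -C splunkd -o pid,nlwp,rss,vsz,etime,args --sort=-rss   # one line per process, sorted by resident memory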

Furthermore, there are a couple of threads for mrsparkle (the Splunk Web appserver):

python -O /opt/splunk/lib/python2.7/site-packages/splunk/appserver/mrsparkle/root.py restart
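
These can be counted the same way as the splunkd processes, e.g.:

ps -ef | grep -v grep | grep "mrsparkle/root.py" | wc -l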

The problem is that Splunk gradually uses up all of the memory. The Mem Used Percentage graph can be seen here:

[Screenshot: Mem Used Percentage graph]

(Edit: for your information, the indexers have 34 GB of memory each.)

You can see manual restarts, as well as forced ones when memory usage reaches 100% and splunkd is killed by the OOM killer.
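
(To confirm the OOM kills, the kernel log can be checked; the syslog path below is an assumption and may be /var/log/messages on other distributions.)

dmesg | grep -iE "out of memory|oom-killer"     # kernel messages from the OOM killer
grep -i "killed process" /var/log/syslog        # same information via syslog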

All Splunk instances have been updated to 4.3.6 and have the Deployment Monitor app disabled.

Is there something else I can do to check what is causing the memory leak?
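
One way to narrow it down is to log splunkd memory over time and see which process actually grows; a minimal sketch (the log path and 5-minute interval are arbitrary):

while true; do
    date >> /tmp/splunkd_mem.log
    ps -C splunkd -o pid,rss,nlwp,args --sort=-rss >> /tmp/splunkd_mem.log
    sleep 300
done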

1 Solution

jakubincloud
Explorer

Answering my own question, showing all the steps I took.

  • Upgrading the hot/warm storage volume from 250 IOPS to 1200 IOPS didn't fix the memory usage pattern, but high-IOPS volumes are a good thing anyway.

  • On the search heads and indexers I had the Unix app installed, which produced some errors, but it hadn't caused any problems before, so I didn't look at it at the time. Removing the Unix app (and other default ones) helped a bit, but after a couple of minutes memory started climbing again.

  • Upgrading from 4.3.6 to 5.0.3 solved it. The process was straightforward: dpkg -i the package (rough outline below). No more memory leaks; memory stays at 2-4%.
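
For completeness, a rough sketch of the upgrade steps on a Debian/Ubuntu host (the package filename and backup path are placeholders, not the exact ones used):

/opt/splunk/bin/splunk stop
cp -rp /opt/splunk/etc /backup/splunk-etc-$(date +%F)   # back up configuration first; destination is hypothetical
dpkg -i splunk-5.0.3-linux-2.6-amd64.deb                # actual filename depends on the downloaded build
/opt/splunk/bin/splunk start --accept-license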





krugger
Communicator

Your memory usage pattern seems weird to me because you are processing virtually no data. I have over 50 GB coming in each day with only 8 GB of memory, and it doesn't run out.

Unless you are doing some massively complex processing on the inbound data, it isn't normal to run out of memory with only 50 MB per day. I would say there is some sort of loop behaviour going on in your Splunk infrastructure, but without knowing how things are set up and what you are doing with the data, it is quite hard to give you good guidance.


jakubincloud
Explorer

Thank you for your answer. Upgrading from 4.3.6 to 5.0.3 solved the problem.
