Monitoring Splunk

splunkd using too much RAM

jtrucks
Splunk Employee

I suspect scheduled searches are either the cause or a symptom of the cause of splunkd using in the neighborhood of 13G RAM and almost 60G swap. There are no new searches, and this happens several times per day. The web UI becomes useless as it times out talking to splunkd, but everything under the hood works eventually and then the server load returns to normal, as does splunkd's memory footprint.

I disabled the scheduler to see if this would fix it, but I won't know until at least tomorrow.

How can I narrow down the cause? Any hints on things to look for in the logs, or processes to strace, etc.? The server runs on a Linux box.

Thanks,
Jesse

Clarification: I specifically meant the main splunkd process with my reference to splunkd above. No other process looks like a memory or CPU problem directly using ps, top, etc.

1 Solution

jtrucks
Splunk Employee

It turns out that splunkd is generating more than 20,000 threads, so this is really a problem with something other than RAM.

This is found with: ps -Lef
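As a rough way to confirm the thread count on Linux, something like this works (a sketch; pgrep -o picks the oldest matching PID, which I'm assuming is the main daemon, and the fallback to the current shell is purely so the commands run anywhere):

```shell
# Count threads (LWPs) of the main splunkd process.
# Assumption: the oldest PID matching "splunkd" is the main daemon;
# fall back to the current shell just for illustration.
PID=$(pgrep -o splunkd || echo "$$")

# nlwp = number of light-weight processes (threads); '=' drops the header.
ps -o nlwp= -p "$PID"

# Equivalent count from the ps -Lef output above (one row per thread).
ps -Lef | awk -v pid="$PID" '$2 == pid' | wc -l
```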

jtrucks
Splunk Employee

The bug was in version 5.0.2, and 5.0.2.4 fixed it.

the_wolverine
Champion

Hi, could you share version information? I don't see any information about which version this was seen in and which point release fixed it.

jtrucks
Splunk Employee

This turned out to be a bug that was fixed in a subsequent point release.

jtrucks
Splunk Employee

If the web UI doesn't work, SoS isn't useful.

jrodman
Splunk Employee

Start by paying attention to which processes are using all the memory. Is it the main splunkd process or the search processes? (SoS can help if this isn't easy to do yourself, but sampling it yourself at moderate time granularity will give you more data, e.g. while true; do date; ps aux | grep splunk; sleep 60; done.)

If you're running Splunk 5.x on Linux, you can generate memory profiles with jemalloc by switching on its MALLOC_CONF environment variable. I'll try to enrich this tomorrow with the specifics.
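A sketch of what that might look like, assuming the jemalloc linked into splunkd was built with profiling support (the option names below are standard jemalloc ones, not Splunk-documented settings, and the dump path is illustrative):

```shell
# Assumption: splunkd's bundled jemalloc has profiling compiled in.
# Dump a heap profile roughly every 2^30 bytes (~1 GiB) allocated.
export MALLOC_CONF="prof:true,prof_prefix:/tmp/jeprof,lg_prof_interval:30"
$SPLUNK_HOME/bin/splunk restart
```

The resulting /tmp/jeprof.* files can then be inspected to see where allocations accumulate.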

On Solaris you can similarly switch on the debug flags for libumem. Windows is a tougher road.

If this is an indexer on 4.x, one large known cause is bundle replication. Check search heads for lots of large files in /etc. Lookups are the usual culprit.
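For example, a quick way to spot oversized lookup files on a search head (a sketch; the SPLUNK_HOME-relative path and the 50 MB threshold are illustrative assumptions):

```shell
# List files over 50 MB under the Splunk configuration tree; the largest
# offenders in bundle replication are usually big lookup CSVs.
find "$SPLUNK_HOME/etc" -type f -size +50M -exec ls -lh {} \;
```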

In general, this should probably be a support case. We should strive to have a better external page about memory growth, but it's quite difficult to pin things down without knowing which process it is, the searches, the data, the growth rate, and the version.

martin_mueller
SplunkTrust

Instead of ps, take a look at $SPLUNK_HOME/bin/splunk list jobs
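To cross-reference the two views, one rough approach (a sketch; the assumption that search helper processes expose the job id in their command line may not hold on every version):

```shell
# Running jobs as Splunk sees them (includes the search id, sid):
$SPLUNK_HOME/bin/splunk list jobs

# Heaviest splunk-related processes by resident memory (RSS, in KB);
# match a hungry process's args against a sid from the list above.
ps -eo pid,rss,args | grep '[s]plunk' | sort -k2,2 -rn | head -5
```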

bmacias84
Champion

Using Splunk on Splunk (SoS) is a good call. In addition, turn on perfmon collection for any process containing "splunk" in its name. You should be able to correlate events and searches there.

jtrucks
Splunk Employee

How do I translate what I see via ps to a scheduled/saved search inside Splunk?

martin_mueller
SplunkTrust

First thing I'd do is look at the jobs running during such a memory peak.

For example, I recently managed to make the Splunk instance on my laptop use 15G simply by running a very unintelligent search involving multikv and a few huge, almost-but-not-quite-table-like events.
