I suspect scheduled searches are either the cause, or a symptom of the cause, of splunkd using in the neighborhood of 13G of RAM and almost 60G of swap. No new searches have been added, and this happens several times per day. The web UI becomes unusable because it times out talking to splunkd, but everything under the hood eventually catches up, and then the server load returns to normal, as does splunkd's memory footprint.
I disabled the scheduler to see if this would fix it, but I won't know until at least tomorrow.
How can I narrow down what's causing this? Any hints on things to look for in the logs, or processes to strace, etc.? The server is a Linux box.
Thanks,
Jesse
Clarification: by splunkd above, I specifically mean the main splunkd process. No other process looks like a memory or CPU problem when checked directly with ps, top, etc.
It turns out that splunkd is spawning more than 20k threads, so this is really a problem with something other than RAM.
This is found with: ps -Lef
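For anyone who wants to watch this themselves, something along these lines should show the thread count growing (a sketch only; the pid-file path assumes a default install, so adjust if yours differs):

```
# Grab the main splunkd PID from the default pid file
SPLUNKD_PID=$(head -1 "$SPLUNK_HOME/var/run/splunk/splunkd.pid")

# NLWP = number of threads (lightweight processes) for that PID,
# alongside its resident and virtual memory
ps -o pid,nlwp,rss,vsz -p "$SPLUNKD_PID"

# Or count the per-thread lines that ps -Lf prints for it
ps -Lf -p "$SPLUNKD_PID" | tail -n +2 | wc -l
```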
It was version 5.0.2, and 5.0.2.4 fixed it.
Hi, could you share version information? I don't see any information about which version this was seen in and which point release fixed it.
This turned out to be a bug that was fixed in a subsequent point release.
If the web UI doesn't work, SoS isn't useful.
Start by paying attention to which processes are using all the memory. Is it the main splunkd process or the search processes? (SOS can help if this isn't easy to do yourself -- but doing it yourself with moderate time granularity will give you more data, e.g. while true; do sleep 60; date; ps aux | grep splunk; done)
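A slightly more structured version of that sampling loop might look like this (just a sketch; adjust the interval, columns, and output path to taste):

```
# Sample per-process memory for anything with "splunk" in the command line,
# once a minute; RSS/VSZ make it easy to spot which process is growing.
while true; do
    date
    ps -eo pid,ppid,nlwp,rss,vsz,args | grep -i '[s]plunk'
    sleep 60
done >> /tmp/splunk_mem_samples.log
```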
If you're running Splunk 5.x on Linux you can generate memory profiles using jemalloc by just switching on its MALLOC_CONF environment variable. I'll try to enrich this tomorrow with the specifics.
If it's Solaris, you can similarly switch on the DEBUG flags for libumem. Windows is a tougher road.
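For reference, the knobs look roughly like this (a sketch only -- the jemalloc profiling options only take effect if the bundled jemalloc was built with profiling enabled, so confirm against your build before relying on it):

```
# Linux / jemalloc: dump heap profiles periodically to jeprof.out.* files
export MALLOC_CONF="prof:true,prof_prefix:jeprof.out,lg_prof_interval:30"
$SPLUNK_HOME/bin/splunk restart

# Solaris / libumem: enable allocation debugging and transaction logging
export UMEM_DEBUG=default
export UMEM_LOGGING=transaction
export LD_PRELOAD=libumem.so.1
$SPLUNK_HOME/bin/splunk restart
```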
If this is an indexer on 4.x, one large known cause is bundle replication. Check the search heads for lots of large files under $SPLUNK_HOME/etc; lookups are the usual culprit.
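A quick way to spot those on a search head (a sketch; the 10MB threshold is arbitrary and $SPLUNK_HOME is assumed to be set):

```
# Large files under the Splunk configuration tree -- big CSV lookups are the
# usual suspects behind slow or expensive bundle replication.
find "$SPLUNK_HOME/etc" -type f -size +10M -exec ls -lh {} \; | sort -k5 -h
```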
In general, this should probably be a support case. We should strive to have a better external page about memory growth, but it's still quite difficult to pin things down without knowing which process it is, the searches, the data, the growth rate, and the version.
Instead of ps, take a look at $SPLUNK_HOME/bin/splunk list jobs
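Something along these lines (a sketch; the CLI needs Splunk credentials, here passed with -auth, and the grep is just one way to pick out scheduler-dispatched jobs):

```
# List current search jobs from the CLI and pick out scheduled ones
$SPLUNK_HOME/bin/splunk list jobs -auth admin:changeme | grep -i scheduler
```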
This is a good call to use Splunk on Splunk, aka SOS. In addition, turn on perfmon for processes containing "splunk" in the name. You should be able to correlate events and searches there.
How do I translate what I see via ps to a scheduled/saved search inside Splunk?
The first thing I'd do is look at the jobs running during such a memory peak.
For example, I've recently managed to make the Splunk on my laptop use 15G simply by running a very unintelligent search involving multikv and a few huge almost-but-not-quite-table-like events.
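One way to see which jobs were running around a spike, and to map them back to scheduled searches, is to look at the dispatch directory (a sketch, assuming the default location; scheduled-search job IDs typically begin with scheduler__ and embed the owner, app, and saved search):

```
# Most recently dispatched search jobs, newest first; the directory name is
# the search ID (SID).
ls -lt "$SPLUNK_HOME/var/run/splunk/dispatch" | head -20

# Per-job dispatch directory sizes can also point at expensive searches.
du -sh "$SPLUNK_HOME/var/run/splunk/dispatch"/* | sort -h | tail -10
```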