Monitoring Splunk

Why does the Splunk service stop unexpectedly on a search head?

Path Finder

Hello folks!

Please help me resolve this issue.

The Splunk service on our search head stops unexpectedly, and it happens frequently.
Checking /var/log/messages shows the error "Out of memory".

When the Splunk service is running (`free`, values in kB):

                     total      used      free  shared  buffers   cached
Mem:              12182120  11419240    762880      12   104664  5188940
-/+ buffers/cache:           6125636   6056484
Swap:              2064380   2064376         4
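Reading that `free` output: the kernel counts buffers and page cache as "used", so the truly available figure is free + buffers + cached. A quick sanity check, with the values copied from the snapshot above (in kB):

```shell
# Recompute the "-/+ buffers/cache" free figure from the Mem: line above.
# free + buffers + cached = memory the kernel can hand out without swapping.
echo 'Mem: 12182120 11419240 762880 12 104664 5188940' |
  awk '{ printf "available: %d kB (%.1f%% of total)\n", $4+$6+$7, ($4+$6+$7)*100/$2 }'
```

That reproduces the 6056484 kB shown in the -/+ buffers/cache line, so at this snapshot roughly half the RAM is reclaimable. Swap, however, is almost fully used (2064376 of 2064380 kB), which points to heavy memory pressure at some earlier point.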


SplunkTrust

Hi,

It seems the machine ran out of memory and the Splunk process got killed, which can happen on Linux (the kernel's OOM killer terminates processes under memory pressure). I'm not sure how Windows handles this.

Maybe you have a long-running search (or even several) that doesn't finish and has therefore used up all of your available RAM.

You also don't want the machine to start swapping while Splunk is running on it; swapping basically means no more RAM is available.
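One way to discourage swapping on a dedicated Splunk host is to lower vm.swappiness. A minimal sketch (the file name below is an example; any drop-in under /etc/sysctl.d/ works):

```
# /etc/sysctl.d/90-splunk.conf  (example file name)
# Prefer reclaiming page cache over swapping out process memory.
vm.swappiness = 10
```

Apply it with `sysctl --system`. This only reduces the kernel's eagerness to swap; it does not prevent the OOM killer from firing if a search genuinely exhausts RAM.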


Motivator

Verify that your ulimit settings are correct for the splunkd daemon.
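For reference, limits along the lines of what Splunk's documentation commonly recommends can be set for the splunk user via a limits.d drop-in. A sketch (the file name is an example, and the values are the commonly cited recommendations; check the docs for your version):

```
# /etc/security/limits.d/99-splunkd.conf  (example file name)
splunk soft nofile 64000
splunk hard nofile 64000
splunk soft nproc  16000
splunk hard nproc  16000
```

You can check what the running process actually got with `cat /proc/$(pgrep -o splunkd)/limits`.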

If you are running Splunk via systemd, make sure the memory limit is set correctly in the [Service] stanza of your unit file (note that on newer systemd versions MemoryLimit= is deprecated in favour of MemoryMax=). Run `systemctl daemon-reload` and restart the service after editing it.

/etc/systemd/system/multi-user.target.wants/Splunkd.service

[Service]
...
MemoryLimit=100G
...


Path Finder

@skalliger, that's exactly what is happening.
How can I find the saved searches that run for a long time and occupy memory on the search head?
Increasing RAM is one solution, but first I need to verify that my whole Splunk environment is healthy.


SplunkTrust

Well, throwing more hardware at it is always a workaround, but it doesn't fix the underlying problem. You want a healthy environment before you increase your resources further.

Are you using the Monitoring Console? If not, I strongly advise you to do so; it gives you an overview of what is happening in your Splunk deployment. You can spot long-running searches and then optimise them.

There can be several reasons for long-running searches:

  1. Most basic: bad SPL. The solution is to improve the SPL (avoid map, append, transaction, and join when they aren't necessary).
  2. A lot of data needs to be searched (many millions of events), which takes a long time to finish. You may want to either reduce the data that needs to be searched or speed up the searches (data models, summary indexes, accelerated reports).
  3. You are searching "old" data, which takes a long time because either your storage is slow or your indexers are under heavy load (every search run on a search head is distributed to your indexers).
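To answer the question about finding the long-running searches themselves: besides the Monitoring Console, you can query the _audit index. A sketch, assuming default audit logging (field names may vary slightly by version):

```
index=_audit action=search info=completed
| stats max(total_run_time) AS run_time_s BY user search
| sort - run_time_s
| head 20
```

Run this over the time range in which the search head fell over; the `search` and `total_run_time` fields in the audit events tell you which searches ran longest and who launched them.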

Alongside the Monitoring Console, someone created a dashboard for finding slow and/or inefficient searches. It's called "Extended Search Reporting", created by cerby (I think), and found on automine's GitHub: https://gist.github.com/automine/06cdf246416223dacfa4edd895d0b7de

Skalli

Path Finder

@skalliger Thanks a lot!
That was really a great help 🙂
