
How to troubleshoot if splunk is down

Builder

One of our search heads is down and I am not able to log in to it. What is the quickest way to fix it, and on which Splunk component should this troubleshooting be done?


Re: How to troubleshoot if splunk is down

SplunkTrust

Is it Splunk that you can't log in to, or the search head machine itself? Is Splunk still working for people who were already logged in?

If it is only Splunk that you can't log in to, get on the search head machine (command line on Linux, the Windows UI on Windows) and see whether the process is running (you can run $SPLUNK_HOME/bin/splunk status for that). If it is running, check the $SPLUNK_HOME/var/log/splunk/splunkd.log file for errors.
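
For the splunkd.log step, a small filter can save some scrolling. This is only a sketch: the function names are made up for illustration, the /opt/splunk fallback is an assumption for when SPLUNK_HOME is not set, and it assumes the severity shows up in each line as a standalone word such as ERROR or FATAL.

```python
import os
import re
from pathlib import Path

# Assumption: severity appears as a standalone token in each log line.
LEVEL_RE = re.compile(r"\b(ERROR|FATAL)\b")


def splunkd_log_path(splunk_home=None):
    """Build the splunkd.log path from SPLUNK_HOME (fallback is an assumption)."""
    home = splunk_home or os.environ.get("SPLUNK_HOME", "/opt/splunk")
    return Path(home) / "var" / "log" / "splunk" / "splunkd.log"


def find_problem_lines(lines, last=20):
    """Return up to `last` of the most recent lines that mention ERROR or FATAL."""
    hits = [line.rstrip("\n") for line in lines if LEVEL_RE.search(line)]
    return hits[-last:]
```

To use it, open the file returned by splunkd_log_path() and pass the file object to find_problem_lines(); it only reads the log and changes nothing.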

If you can't log on to the search head machine itself (as opposed to Splunk), you will first have to get access to it.


Re: How to troubleshoot if splunk is down

Builder

Hello cpetterborg,

I have a problem starting Splunk: it reports an error reading the pid file.


Re: How to troubleshoot if splunk is down

Builder

http://docs.splunk.com/Documentation/Splunk/7.1.1/Troubleshooting/Whatsinhere

splunk start --debug (do not leave it running like this when everything is back to normal)

Splunk crashes are often caused by low memory.
Check whether the open file descriptor and max user processes limits are sufficient.
Check that you have enough disk space.
See if there is a crash*.log file, for example crash-2018-06-27-20:57:26.log
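
The limit, disk space and crash log checks above can be gathered in one go. The following is a rough sketch, not an official tool: the function names and dictionary keys are invented for illustration. It reports the limits of the user and shell running the script, so it should be run as the Splunk user, and a splunkd started by an init system may still have different limits.

```python
import resource
import shutil
from pathlib import Path


def resource_snapshot(path="/"):
    """Report soft limits for this process and disk space for `path`."""
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    usage = shutil.disk_usage(path)
    return {
        # A value of -1 (resource.RLIM_INFINITY) means "unlimited".
        "open_files_soft_limit": nofile_soft,
        "max_user_processes_soft_limit": nproc_soft,
        "disk_free_gib": round(usage.free / 2**30, 1),
        "disk_total_gib": round(usage.total / 2**30, 1),
    }


def list_crash_logs(log_dir):
    """Return the names of crash-*.log files in `log_dir`, sorted by name."""
    return sorted(p.name for p in Path(log_dir).glob("crash-*.log"))
```

Pass the volume that holds $SPLUNK_HOME to resource_snapshot() and the $SPLUNK_HOME/var/log/splunk directory to list_crash_logs().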

Make a note of the time when the crash occurred. Check splunkd_stderr.log.
Then check in the other logs what Splunk was doing at that time:

audit.log
splunkd.log
metrics.log
web*.log
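
To line those logs up with the crash, the time can be read straight from the crash log's file name. A minimal sketch, assuming the name follows the crash-2018-06-27-20:57:26.log pattern shown above; the helper names are hypothetical, and the timestamp is taken to be in the server's local time zone. The last helper formats the window as earliest/latest time modifiers that can be appended to a search.

```python
from datetime import datetime, timedelta


def crash_time_from_name(filename):
    """Parse the timestamp embedded in a crash log file name."""
    return datetime.strptime(filename, "crash-%Y-%m-%d-%H:%M:%S.log")


def window_around(ts, minutes=5):
    """Return (start, end) spanning `minutes` on each side of `ts`."""
    delta = timedelta(minutes=minutes)
    return ts - delta, ts + delta


def spl_time_bounds(ts, minutes=5):
    """Format the window as SPL earliest/latest modifiers (%m/%d/%Y:%H:%M:%S)."""
    start, end = window_around(ts, minutes)
    fmt = "%m/%d/%Y:%H:%M:%S"
    return f'earliest="{start.strftime(fmt)}" latest="{end.strftime(fmt)}"'
```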

Check:

index=_internal sourcetype=splunkd_crash_log | stats count by host

Check:

index=_internal sourcetype=splunkd loader message=*xml

Check if a user did some ridiculous search:

index=_audit action="search" (id=* OR search_id=*) | eval user=if(user=="n/a",null(),user) | stats max(total_run_time) as total_run_time first(user) as user by search_id | stats count perc95(total_run_time) median(total_run_time) by user

Re: How to troubleshoot if splunk is down

Builder

Hello Azeemering, thank you for your response. When I try to restart Splunk, it fails and gives a message about the splunk.pid file.

I go to /opt/splunk/var/run/splunk/splunk.pid and remove it manually in order to restart Splunk.

How can I overcome this problem, and is there a way to do this automatically instead of deleting the file manually?


Re: How to troubleshoot if splunk is down

Esteemed Legend

Run a health check on your Monitoring Console (MC). Most search head crashes are due to low RAM, which the Health Check validates. Transparent Huge Pages (THP) should be off; if it isn't, RAM is used inefficiently. The Health Check will flag this, too.
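
If the Monitoring Console is not reachable because the search head is down, THP can also be checked on the host itself. A small sketch, assuming a Linux kernel that exposes the setting at /sys/kernel/mm/transparent_hugepage/enabled (some older distributions use a different path); the function names are made up for illustration. The file lists all modes and puts the active one in square brackets.

```python
from pathlib import Path

# Assumption: standard sysfs location on current Linux kernels.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs text, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_thp_mode(path=THP_ENABLED):
    """Read the active THP mode from the host; None if it cannot be read."""
    try:
        return active_thp_mode(Path(path).read_text())
    except OSError:
        return None
```

A result of "never" means THP is off; "always" or "madvise" means it is still enabled in some form.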


Re: How to troubleshoot if splunk is down

Builder

I have a problem with the pid file.
There was a pid file, splunk.pid, at /opt/splunk/var/run/splunk/splunk.pid
that had to be removed before a restart would work.

What is the reason for this, and how can it be overcome?


Re: How to troubleshoot if splunk is down

Esteemed Legend

The pid file is created when Splunk starts, and it serves several purposes; chief among them, it is a foolproof way to provide evidence of a crash on the last run. Startup normally goes like this:

Check for the pid file.
   If present, there was a crash: do some additional diagnostics/logging, then delete it.
Start splunkd and write its pid into the pid file.

What can happen is that Splunk was previously run as root, so the pid file is owned by root, and then Splunk crashed. Now you are coming in as some other, non-root user who does not have permission to delete or overwrite this file, so Splunk cannot start. So you have 2 solutions:

ALWAYS start Splunk as the same user, preferably not root.
Prevent Splunk from crashing.

The former is very easy, the latter...
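
Before deleting the pid file by hand, it is worth confirming that this ownership problem is what you are actually hitting. The sketch below only inspects the file and deletes nothing. The function names and dictionary keys are hypothetical, and it assumes the file holds one or more numeric pids separated by whitespace.

```python
import os
from pathlib import Path


def pid_is_alive(pid):
    """Check whether a process with this pid exists, without disturbing it."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def inspect_pid_file(path):
    """Report who owns the pid file and whether any listed pid is running."""
    path = Path(path)
    owner_uid = path.stat().st_uid
    tokens = path.read_text().split()
    pids = [int(tok) for tok in tokens if tok.isdigit() and int(tok) > 0]
    return {
        "owner_uid": owner_uid,
        "current_uid": os.geteuid(),
        "owned_by_current_user": owner_uid == os.geteuid(),
        "pids": pids,
        "any_alive": any(pid_is_alive(pid) for pid in pids),
    }
```

Run it as the user that starts Splunk, for example on /opt/splunk/var/run/splunk/splunk.pid. If owned_by_current_user is False and any_alive is False, you are most likely looking at a stale file left behind by a run under a different user.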
