How to troubleshoot if splunk is down

vrmandadi · ‎06-27-2018

One of our search heads is down and we are not able to log in to it. What is the quickest way to fix it, and on which Splunk component does this troubleshooting need to be done?

woodcock · ‎07-05-2018

Run a Health Check on your Monitoring Console (MC). Most search head crashes are due to low RAM, which the Health Check validates. Transparent Huge Pages (THP) should be off; if it isn't, RAM is used inefficiently. The Health Check will note this, too.

vrmandadi · ‎07-05-2018

I have a problem with the pid file.
There was a pid file, splunk.pid, at /opt/splunk/var/run/splunk/splunk.pid
that needed to be removed before a restart would work.

What is the reason for this, and how can it be overcome?

woodcock · ‎07-19-2018

The pid file is created when Splunk starts and serves several purposes, chief among them providing foolproof evidence of a crash on the last run. Startup normally goes like this:

Check for pid file
   if present, there was a crash; do some additional diagnostics/logging, then delete it.
Start splunkd, write pid into pid file.
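
The sequence above can be sketched in Python. This only illustrates the logic as described in this post and is not Splunk's actual implementation; the function name `start_with_pid_file` and its return value are hypothetical.

```python
import os
import tempfile


def start_with_pid_file(pid_path, pid):
    """Illustrative startup sequence; returns True if a leftover pid file was found."""
    crashed_last_run = os.path.exists(pid_path)
    if crashed_last_run:
        # A leftover pid file is taken as evidence of an unclean shutdown.
        # Real diagnostics/logging would happen here before cleanup.
        # os.remove raises PermissionError if this user is not allowed to delete it.
        os.remove(pid_path)
    # Record the pid of the newly started process.
    with open(pid_path, "w") as f:
        f.write(f"{pid}\n")
    return crashed_last_run


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        path = os.path.join(run_dir, "splunk.pid")
        print(start_with_pid_file(path, 1234))  # first start, no leftover file
        print(start_with_pid_file(path, 5678))  # leftover file from the previous run
```

The `os.remove` step is where a permissions problem would surface if the file was left behind by a different user.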

What can happen is that the user who previously ran Splunk was root, so the pid file was owned by root, and then Splunk crashed. Now you are coming in as some other non-root user without the permissions needed to delete or overwrite this file, so Splunk cannot start. So you have 2 solutions:

ALWAYS start Splunk as the same user, preferably not root.
Prevent Splunk from crashing.

The former is very easy, the latter...
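
To confirm whether the ownership mismatch described above applies, a quick check along these lines may help. It is a minimal sketch for Unix-like systems only (the `pwd` module is not available on Windows); the helper name `pid_file_owner_mismatch` is hypothetical.

```python
import os
import pwd


def pid_file_owner_mismatch(pid_path):
    """Return the owner's user name if the pid file belongs to another user, else None."""
    if not os.path.exists(pid_path):
        return None
    owner_uid = os.stat(pid_path).st_uid
    if owner_uid == os.geteuid():
        return None
    try:
        return pwd.getpwuid(owner_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        return str(owner_uid)


if __name__ == "__main__":
    # Path taken from this thread; adjust to your own installation.
    owner = pid_file_owner_mismatch("/opt/splunk/var/run/splunk/splunk.pid")
    if owner is not None:
        print(f"pid file is owned by {owner}, not by the current user")
```

Run it as the user that normally starts Splunk; a printed owner such as root would point to the scenario above.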

Azeemering · ‎06-27-2018

http://docs.splunk.com/Documentation/Splunk/7.1.1/Troubleshooting/Whatsinhere

splunk start --debug (do not leave it running like this when all is normal again)

Splunk crashes are often caused by low memory.
Check if the number of open file descriptors and max user processes are sufficient.
Check if you have enough disk space.
See if there is a crash*.log file, for example crash-2018-06-27-20:57:26.log
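
The checks above could be gathered in one place with something like the following. It is a rough sketch assuming a Linux host; the function names are hypothetical, and the limits reported are those of the user running the script, so it should be run as the Splunk user.

```python
import glob
import os
import resource
import shutil


def resource_snapshot(path):
    """Collect free disk space and soft per-user limits for the current user."""
    usage = shutil.disk_usage(path)
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "free_gb": usage.free / 1024**3,
        # A value equal to resource.RLIM_INFINITY means "unlimited".
        "max_open_files": nofile_soft,
        "max_user_processes": nproc_soft,
    }


def find_crash_logs(log_dir):
    """List crash log files in log_dir, sorted by name."""
    return sorted(glob.glob(os.path.join(log_dir, "crash-*.log")))


if __name__ == "__main__":
    print(resource_snapshot("/"))
    # Hypothetical default location; use $SPLUNK_HOME/var/log/splunk on your host.
    print(find_crash_logs("/opt/splunk/var/log/splunk"))
```

What counts as "sufficient" depends on the deployment, so the sketch only reports the values and leaves the thresholds to you.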

Make a note of the time when the crash occurred. Check splunkd_stderr.log.
Check the other logs to see what Splunk was doing at that time:

audit.log
splunkd.log
metrics.log
web*.log
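
To narrow down which part of those logs to read, the crash time can be taken from the crash log's file name. This is a small sketch that assumes the file name follows the pattern of the example in this post (crash-YYYY-MM-DD-HH:MM:SS.log); the function names and the 5-minute default window are arbitrary choices.

```python
import os
from datetime import datetime, timedelta


def crash_time_from_filename(path):
    """Parse the timestamp embedded in a crash log file name."""
    name = os.path.basename(path)
    return datetime.strptime(name, "crash-%Y-%m-%d-%H:%M:%S.log")


def window_around(crash_time, minutes=5):
    """Return the (start, end) time range to inspect in the other logs."""
    delta = timedelta(minutes=minutes)
    return crash_time - delta, crash_time + delta


if __name__ == "__main__":
    crashed_at = crash_time_from_filename("crash-2018-06-27-20:57:26.log")
    start, end = window_around(crashed_at)
    print(f"Inspect log entries between {start} and {end}")
```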

Check:

index=_internal sourcetype=splunkd_crash_log | stats count by host

Check:

index=_internal sourcetype=splunkd loader message=*xml

Check if a user did some ridiculous search:

index=_audit action="search" (id=* OR search_id=*) | eval user=if(user=="n/a",null(),user) | stats max(total_run_time) as total_run_time first(user) as user by search_id | stats count perc95(total_run_time) median(total_run_time) by user

vrmandadi · ‎07-05-2018

Hello Azeemering, thank you for your response. When I try to restart Splunk, it fails and gives a message about the splunk.pid file.

I go to /opt/splunk/var/run/splunk/splunk.pid and remove it manually to restart Splunk.

How can I overcome this problem, and is there a way to do this automatically instead of deleting the file manually?

Azeemering · ‎07-19-2018

See: https://answers.splunk.com/answers/172058/splunk-is-not-starting-due-to-presence-of-pid-file.html

cpetterborg · ‎06-27-2018

Is it Splunk you can't log into, or the search head machine itself? Is it still working for people who have already logged in?

If it is just Splunk that you can't log into, get on the search head machine (command line if Linux, Windows UI if Windows), and see if the process is running (you can do a splunk status if you want for that). If it is running, then go check the $SPLUNK_HOME/var/log/splunk/splunkd.log file for errors.
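
If you want to script the "is the process running" check, something like the following may serve as a starting point. It is a sketch for Unix-like systems only and is not a replacement for splunk status; it assumes the pid file holds one numeric pid per line, and the helper names are hypothetical.

```python
import os


def read_pids(pid_path):
    """Read the numeric pids listed in a pid file, one per line."""
    with open(pid_path) as f:
        return [int(line) for line in f if line.strip().isdigit()]


def is_running(pid):
    """Check whether a process exists, without affecting it (Unix only)."""
    try:
        # Signal 0 performs the existence and permission checks only.
        # Do not use this on Windows, where os.kill would terminate the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


if __name__ == "__main__":
    # Path taken from this thread; adjust to your own installation.
    pid_file = "/opt/splunk/var/run/splunk/splunk.pid"
    if os.path.exists(pid_file):
        for pid in read_pids(pid_file):
            print(pid, "running" if is_running(pid) else "not running (stale entry)")
    else:
        print("no pid file found")
```

A pid file whose entries are all "not running" is stale, which matches the situation where Splunk refuses to start until the file is removed.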

If you can't log onto the search head machine (not in Splunk), you will have to get access.

vrmandadi · ‎07-05-2018

Hello cpetterborg,

I have a problem starting it, as it reports an error reading the pid file.
