Can't you log into Splunk, or can't you log into the search head machine itself? Is it still working for people who are already logged in?
If it is just Splunk you can't log in to, get on the search head machine (command line on Linux, Windows UI on Windows) and check whether the process is running (splunk status will tell you). If it is running, check the
$SPLUNK_HOME/var/log/splunk/splunkd.log file for errors.
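From the shell, that check can be sketched like this. The synthetic log below only exists so the snippet runs anywhere; on a real search head, point LOG at $SPLUNK_HOME/var/log/splunk/splunkd.log instead (the sample log lines are made up for illustration):

```shell
# Sketch: pull recent ERROR/FATAL lines out of splunkd.log.
# The synthetic log stands in for the real file; on a search head set
#   LOG="$SPLUNK_HOME/var/log/splunk/splunkd.log"
LOG="$(mktemp)"
cat > "$LOG" <<'EOF'
06-27-2018 20:55:01.000 +0000 INFO  loader - splunkd starting
06-27-2018 20:57:20.000 +0000 ERROR IndexProcessor - out of memory
06-27-2018 20:57:26.000 +0000 FATAL loader - crash detected
EOF

# Last 20 error lines, newest at the bottom (splunkd.log is appended in order)
errors="$(grep -E 'ERROR|FATAL' "$LOG" | tail -20)"
echo "$errors"
rm -f "$LOG"
```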
If you can't log onto the search head machine itself (not just Splunk), you will have to get access first.
splunk start --debug (do not leave it running like this when all is normal again)
Splunk crashes are often caused by low memory.
Check whether the limits for open file descriptors and max user processes are sufficient.
Check if you have enough disk space.
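A quick shell sanity check for those two limits plus disk space (Linux; the 64000 / 16000 figures are the commonly cited recommended minimums for Splunk, so verify against the docs for your version):

```shell
# Sketch: inspect the OS limits and disk space Splunk cares about.
# 64000 open files / 16000 user processes are commonly cited
# recommended minimums; confirm against your version's documentation.
open_files="$(ulimit -n)"
user_procs="$(ulimit -u)"
echo "open file descriptors: $open_files (recommended >= 64000)"
echo "max user processes:    $user_procs (recommended >= 16000)"

# Free space on the volume holding Splunk (default install path assumed)
df -h "${SPLUNK_HOME:-/opt/splunk}" 2>/dev/null || df -h /
```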
See if there is a crash*.log file, for example crash-2018-06-27-20:57:26.log.
Make note of the time when the crash occurred. Check the splunkd_stderr.log.
Check the other logs to see what Splunk was doing at that time:
index=_internal sourcetype=splunkd_crash_log | stats count by host
index=_internal sourcetype=splunkd loader message=*xml
Check if a user did some ridiculous search:
index=_audit action="search" (id=* OR search_id=*) | eval user=if(user=="n/a",null(),user) | stats max(total_run_time) as total_run_time first(user) as user by search_id | stats count perc95(total_run_time) median(total_run_time) by user
Hello Azeemering, thank you for your response. When I try to restart Splunk it fails and gives a message about the splunk.pid file.
I go to /opt/splunk/var/run/splunk/splunk.pid and remove it manually to restart Splunk.
How can I overcome this problem, and is there a way to do this automatically rather than deleting the file manually?
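One way to automate the cleanup safely is to delete the pid file only when the pid recorded inside it is no longer alive, so you never remove the file of a running splunkd. A sketch (the path is the one from the post above; this is not an official Splunk mechanism):

```shell
# Sketch: remove splunk.pid only if the pid inside it is dead,
# then restart. Adjust PIDFILE for your install.
PIDFILE="${PIDFILE:-/opt/splunk/var/run/splunk/splunk.pid}"
if [ -f "$PIDFILE" ]; then
    oldpid="$(head -n 1 "$PIDFILE")"
    # kill -0 sends no signal; it only tests whether the process exists
    if ! kill -0 "$oldpid" 2>/dev/null; then
        echo "stale pid file (pid $oldpid is dead), removing"
        rm -f "$PIDFILE"
    fi
fi
# /opt/splunk/bin/splunk restart
```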
Run a health check on your MC (Monitoring Console). Most search head crashes are due to low RAM, which the Health Check validates. THP (Transparent Huge Pages) should be off; if it isn't, RAM is used inefficiently, and the Health Check will flag this too.
I have a problem with the pid file.
There was a pid file at /opt/splunk/var/run/splunk/splunk.pid
that needed to be removed before a restart would work.
What is the reason for this and how can it be overcome?
The pid file is created when Splunk starts and serves several purposes, chief among them a foolproof way to provide evidence of a crash on the last run. Startup normally goes like this:
check for the pid file; if it is present, the previous run crashed, so do some additional diagnostics/logging, then delete it. Start splunkd and write its pid into the pid file.
What can happen is that the user who previously ran Splunk was root, so the pid file was owned by root, and Splunk crashed. Now you come in as some other non-root user who lacks permission to delete/overwrite that file, so Splunk cannot start. So you have 2 solutions:
ALWAYS start Splunk as the same user, preferably not root.
Prevent Splunk from crashing.
The former is very easy, the latter...
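To check whether you are in exactly this situation, compare the pid file's owner with the user you are starting Splunk as. A sketch (demonstrated on a temp file so it runs anywhere; the chown line assumes the service account is named splunk, which may differ on your box):

```shell
# Sketch: does the pid file's owner match the current user?
# On a real box set
#   PIDFILE=/opt/splunk/var/run/splunk/splunk.pid
PIDFILE="${PIDFILE:-$(mktemp)}"
owner="$(ls -l "$PIDFILE" | awk '{print $3}')"
me="$(id -un)"
echo "pid file owner: $owner, current user: $me"
if [ "$owner" != "$me" ]; then
    echo "ownership mismatch: Splunk will not be able to replace $PIDFILE"
    # As root, hand the install back to the service account, e.g.
    # (assuming that account is named 'splunk'):
    #   chown -R splunk:splunk /opt/splunk
fi
```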