Having issues on some members of our 6.2 platform where a splunk restart cannot complete because the mongod process does not exit properly.
Is this a necessary component of 6.2 if we're not running Hunk ... and should it then be disabled?
Can anyone please provide details on how to identify, in a Linux environment, which KV/mongod process is hung, and whether manually killing/restarting it is a viable workaround until the bug gets fixed? Thank you!
What I have been doing is to run "ps aux | grep splunk" or "ps aux | grep mongod" after a failed restart. Netstat for the KV Store port works fine too.
Splunk does spit out a message on startup if the KV Store port is already bound, so you should be able to check for that as well.
To date, I have had no noticeable issues from killing the leftover mongod process, and splunk starts normally after that. There is no pid file created for the mongod process that I can find, but it would not be difficult to add a stanza to the stop portion of the /etc/init.d/splunk file (or the splunk_stop script, for that matter) that looks for the pid associated with the KV Store port and kills that process after splunk stops.
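The cleanup described above could be sketched as a small stop hook. This is only a sketch, assuming pgrep/pkill are available and that the only mongod on the box belongs to Splunk's KV Store; tighten the match pattern if other mongod instances run on the host.

```shell
#!/bin/sh
# Hedged sketch of a cleanup step that could be appended to the stop
# portion of /etc/init.d/splunk (or the splunk_stop script).
# Assumption: the only mongod on this host is Splunk's KV Store.
kill_leftover_mongod() {
    pids=$(pgrep -x mongod)
    [ -z "$pids" ] && return 0            # nothing left behind, all good
    echo "Killing leftover mongod process(es): $pids"
    kill $pids                            # polite SIGTERM first
    sleep 5
    # Escalate only if mongod is still alive after the graceful attempt.
    pgrep -x mongod >/dev/null && pkill -9 -x mongod
    return 0
}

kill_leftover_mongod
```

Run after "splunk stop" and before "splunk start"; the graceful-then-forceful sequence avoids SIGKILL-ing a mongod that would have exited cleanly on its own.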
We have the same issue while evaluating a search head cluster on top of an indexer cluster with 6.2.1, before going live with our new production Splunk infrastructure.
After deploying applications into the test search head cluster (using the deployer, also located on the deployment server), some of the search heads don't start.
This would be a problem for our production Splunk infrastructure, because according to the search head cluster documentation, existing search heads can't be migrated into a cluster.
Looks like we have to postpone going live with the production infrastructure; otherwise we would need to reinstall all search heads again before setting up the cluster.
As far as I can see, this problem has already been known since around the beginning of November last year.
Is there already a planned availability date known for the patch?
Actually, KV Store IS required for many of the Splunk applications now: Enterprise Security, the Windows Infrastructure apps, Exchange, etc. Not sure how many others.
I have this issue continually, but only on my search head cluster members. The other search heads I have running KV Store always seem to terminate the KV/mongod process fine, but the cluster members routinely do not. Sadly, this is not a 100% failure. I have been working with Splunk engineers on this for a month, and to date I have no answers.
I cannot disable KV Store, but I also cannot give up my search head cluster (which, aside from some pain, is a REALLY awesome feature and will be coming to ES soon), which makes it even more important to me to find a solution.
I had a suggestion from a co-worker that there are known issues around mongod not terminating due to some NUMA issues, but I haven't yet found anything illuminating.
If someone has found a resolution or workaround for the KV Store staying running during a splunk restart, I would love to hear it.
Splunk does have an issue where, if the mongod process did not shut down correctly, it cannot start again, since Splunk does not try to kill off the running mongod instance. You can manually kill it off and restart Splunk.
KV Store is not a necessary component to run Splunk; however, some apps might use it to keep track of state.
Also, to reiterate, the way to disable KV Store is by editing server.conf:
[kvstore]
disabled = true
jlin > you are correct in saying that if you disable "kvstore" you will not see the issue of an orphaned mongod process.
Not many applications use KV Store. One easy way to check whether the environment is using KV Store is to look at the collections.conf files using btool:
./splunk cmd btool collections list --debug
If this is almost empty, that would mean you have not deployed any apps using KV Store.
Looks like disabling it is a good enough workaround for you. However, it'd be really nice to have some closure on the cause of mongod not wanting to die. Did you look at index=_internal to try and find the cause?
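Besides searching index=_internal, mongod writes its own log file on disk, which can be checked even when splunkd is down. A minimal sketch for pulling shutdown-related lines out of it; the path below assumes the default $SPLUNK_HOME layout, so adjust it for your install:

```shell
#!/bin/sh
# Hedged sketch: inspect mongod's own log for clues about why it refused
# to exit. The default path is an assumption; set SPLUNK_HOME to match
# your installation.
SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunk}
LOG="$SPLUNK_HOME/var/log/splunk/mongod.log"
if [ -f "$LOG" ]; then
    # Shutdown/signal-related lines around the failed restart are the
    # most likely place to find the cause.
    grep -iE 'shutdown|signal|terminat' "$LOG" | tail -20
else
    echo "No mongod log found at $LOG"
fi
```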