I was having problems with one of my heavy forwarders (Splunk 6.6.3) running on Windows 2008, so I noted what inputs I had, uninstalled it, and then installed version 7.0.1. After adding my configurations and restoring my connections, I started Splunk and got the following:
Checking prerequisites...
Checking http port : open
Checking mgmt port : open
Checking appserver port [127.0.0.1:8065]: open
Checking kvstore port : open
Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _internal _introspection _telemetry _thefishbucket <snip>
Done
Bypassing local license checks since this instance is configured with a remote license master.
Checking filesystem compatibility... Done
Checking conf files for problems...
Done
Checking default conf files for edits...
Validating installed files against hashes from 'C:\Program Files\Splunk\splunk-7.0.1-2b5b15c4ee89-windows-64-manifest'
All installed files intact.
Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Splunkd: Starting (pid 3540)
Done
Waiting for web server at https://127.0.0.1:8000 to be available................
................................................................................
........................................................................ Done
It took about 30 minutes before I saw "Done", but the web server still wasn't running. Checking the log files, I found three messages that seemed related:
ERROR UiPythonFallback - Appserver running on port 8065 exited unexpectedly: exited with code 1
ERROR UiHttpListener - An applicaiton server has exited unexpectedly, web UI cannot be used until it is restarted
WARN UiHttpListener - Web UI now stopped
(yes, application is spelled wrong in the logs)
I didn't have anything on that port when I checked with netstat and via the resource monitor. I tried rebooting to no avail. I found the following posts:
and tried the changes they suggested, but I still had problems.
After seeing this post: https://answers.splunk.com/answers/616294/unable-to-start-splunk-1.html
I looked around but didn't see a repair option for splunk.
I turned on debugging (http://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/Enabledebuglogging)
and found the following:
7:00:52.429 PM
03-07-2018 19:00:52.429 -0500 ERROR KVStoreBulletinBoardManager - Failed to start KV Store process. See mongod.log and splunkd.log for details.
host = splunk-04
message = Failed to start KV Store process. See mongod.log and splunkd.log for details.

3/7/18 7:00:52.429 PM
03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not start mongo instance. Initialization failed.
host = splunk-04
message = Could not start mongo instance. Initialization failed.

3/7/18 7:00:52.429 PM
03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not get ping from mongod.
host = splunk-04
message = Could not get ping from mongod.

3/7/18 7:00:44.067 PM
03-07-2018 19:00:44.067 -0500 ERROR KVStoreBulletinBoardManager - KV Store changed status to failed. KVStore process terminated.
host = splunk-04
message = KV Store changed status to failed. KVStore process terminated.

3/7/18 7:00:44.067 PM
03-07-2018 19:00:44.067 -0500 ERROR KVStoreBulletinBoardManager - KV Store process terminated abnormally (exit code 100, status exited with code 100). See mongod.log and splunkd.log for details.
host = splunk-04
message = KV Store process terminated abnormally (exit code 100, status exited with code 100). See mongod.log and splunkd.log for details.

3/7/18 7:00:44.067 PM
03-07-2018 19:00:44.067 -0500 ERROR MongodRunner - mongod exited abnormally (exit code 100, status: exited with code 100) - look at mongod.log to investigate.
host = splunk-04
message = mongod exited abnormally (exit code 100, status: exited with code 100) - look at mongod.log to investigate.
Which led me to these posts:
After I changed the permissions on the whole splunk folder, I restarted splunk AGAIN and I have zero ERRORs in my logs, but it STILL takes over 30 minutes to start. And the web interface still won't work.
I then saw these errors:
WARN HttpListener - Socket error from 127.0.0.1 while idling: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
ERROR UiPythonFallback - Appserver at http://127.0.0.1:8065 never started up!
and searched and found these posts, but they don't seem to be of much help.
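For anyone chasing the same thing: when the errors come and go between restarts, it can help to tally WARN and ERROR lines per component rather than reading splunkd.log top to bottom. Below is a rough Python sketch, assuming the line layout seen in the messages above (date, time, UTC offset, level, component, " - ", message). The function name tally_problems is made up for illustration; it is not part of Splunk.

```python
import re
from collections import Counter

# Assumed line shape, taken from the messages quoted in this thread, e.g.
# "03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not ..."
LINE_RE = re.compile(
    r"^\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4} "
    r"(?P<level>ERROR|WARN) (?P<component>\S+) - (?P<message>.*)$"
)


def tally_problems(lines):
    """Count WARN/ERROR lines per (level, component); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("component"))] += 1
    return counts


# Sample lines copied from the debug output in this post.
sample = [
    "03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not start mongo instance. Initialization failed.",
    "03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not get ping from mongod.",
    "03-07-2018 19:00:44.067 -0500 ERROR MongodRunner - mongod exited abnormally (exit code 100, status: exited with code 100) - look at mongod.log to investigate.",
]

for (level, component), count in tally_problems(sample).most_common():
    print(level, component, count)
```

To run it against a real file, pass an open file handle instead of the sample list, for example tally_problems(open(path_to_splunkd_log, errors="replace")), where path_to_splunkd_log is wherever your instance keeps its logs.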
It IS forwarding logs, so the basic functionality is there. The logs are Windows server logs collected via WMI calls, but they arrive about 10 minutes behind.
I checked the firewall: for splunkd I have an any/any rule for incoming traffic, but nothing for outgoing. (Hopefully that is the simple answer?)
So I'm asking if anyone has any other suggestions?
FYI - I ran into the same issue on one of my two heavy forwarders.
The reason port 8000 was not accepting connections turned out to be a custom app that I had installed on the heavy forwarder. When I removed the app, the web port became accessible.
Upon investigating the app, I noticed it contained an indexes.conf, and I suspect that caused the issue, since the app was meant for an indexer. I deleted it from the heavy forwarder and restarted the Splunk instance, and web GUI access started working.
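If you want to check quickly whether any installed app ships index definitions, something like the following Python sketch may help. It assumes the usual app layout, where each app under etc/apps keeps its .conf files in a default or local subdirectory. The helper name find_conf is hypothetical, and the Windows path in the usage lines is only an example install location.

```python
from pathlib import Path


def find_conf(apps_dir, conf_name="indexes.conf"):
    """Return sorted (app_name, layer) pairs for apps that contain conf_name.

    layer is "default" or "local", the two directories assumed here to hold
    an app's configuration files.
    """
    hits = []
    for conf in Path(apps_dir).glob(f"*/*/{conf_name}"):
        layer = conf.parent.name
        if layer in ("default", "local"):
            hits.append((conf.parent.parent.name, layer))
    return sorted(hits)


# Example install location (an assumption); adjust to your own SPLUNK_HOME.
apps = Path(r"C:\Program Files\Splunk\etc\apps")
if apps.is_dir():
    for app, layer in find_conf(apps):
        print(f"{app}: {layer}/indexes.conf")
```

Some of the built-in apps may legitimately ship their own indexes.conf, so a hit is a lead to investigate, not proof of a problem.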
In the context of the question, it is a valid answer, just not well articulated.
"Splunk Support advised that Windows 2008 was not supported for Splunk 7.0.1, and suggested migrating to a supported platform"
Is a slightly better worded version.
This sometimes happens when you make changes to indexes.conf, can be by defining
The best approach is to isolate where the above error message is coming from. Undo the recent changes you made, whether on the standalone Splunk instance or pushed from the deployment server/deployer.
What seems to fix it in my case is clearing out the custom apps from each SH.
It looks like one of my apps is interfering with the HTTP port or with something else on the web side.
I deleted the custom apps from my Deployer (/opt/splunk/etc/shcluster/apps),
then ran apply shcluster-bundle to remove the apps from each SH instance and restarted each SH instance manually, and the web interface came up on my custom port 8300:
bash-4.2$ /opt/splunk/bin/splunk apply shcluster-bundle force -target https://splunksh01.vagrant.local:8089 -auth admin:pw
One thing I've learned from going back and forth with tech support: Splunk v7.x is not approved for Windows 2008 and that is the OS where this particular HF is located. So while Splunk doesn't even have to try and fix this, they said they are looking at the problem.
I have a newer 2016 server and I will slowly migrate everything to that. But since all the forwarding continues to work, I'm not particularly in a hurry.
And lately I've been doing all my admin from the cli rather than the gui anyway....
I did notice that you are NOT running this on a Windows server, but you may want to check the supported versions for whatever Linux distribution you are on.
I can confirm I am getting the same issue on my SH cluster (7.1.1).
The issue started after I deployed a couple of custom apps from my Deployer to the SH cluster.
The search heads seem to be in the cluster and working, but startup hangs:
[root@njosplunksh01 /opt/splunk/bin]# ./splunk stop
Stopping splunkd...
Shutting down. Please wait, as this may take a few minutes.
.. [ OK ]
Stopping splunk helpers...
[ OK ]
Done.
[root@njosplunksh01 /opt/splunk/bin]# ./splunk start

Splunk> Australian for grep.

Checking prerequisites...
Checking http port : open
Checking mgmt port : open
Checking appserver port [127.0.0.1:8065]: open
Checking kvstore port : open
Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _internal _introspection _telemetry _thefishbucket history main summary
Done
Bypassing local license checks since this instance is configured with a remote license master.
Checking filesystem compatibility... Done
Checking conf files for problems...
Done
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunk/splunk-7.1.1-8f0ead9ec3db-linux-2.6-x86_64-manifest'
All installed files intact.
Done
Checking replication_port port : open
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Done
[ OK ]
Waiting for web server at https://127.0.0.1:8300 to be available.........................
No errors anywhere. I'm reverse proxying Splunk Web through Apache (which uses custom certs), and it has always worked. Apache shows no errors, so this is purely on the Splunk side.
It wasn't resolved. I upgraded the server, or rather I migrated to a new server running Windows 2016.
My issue was the fact it was running on 2008. If you are running on that OS, you will need to upgrade. If not, then there is a different problem and you will need to either contact support or open a new forum post.
A little more information:
The last time I tried to start, when I went to the web page from the server (https://127.0.0.1:8000) I got the following:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <messages>
    <msg type="ERROR">Error connecting: Winsock error 10061</msg>
  </messages>
</response>
Searching for this turned up little information.
Found this in google cache: https://webcache.googleusercontent.com/search?q=cache:CkGgbZ4NmoIJ:https://support.microsoft.com/en-...
and this on answers: https://answers.splunk.com/answers/41449/winsock-error-10055.html
and this: http://www.altn.com/Support/FAQ/FAQResults/?Number=195 which says: Winsock error 10061 means that the server you are attempting to connect to is actively refusing the connection. This usually results from trying to connect to a service that is inactive on the foreign host.
When I tried again some hours later, I simply got the "page cannot be displayed" error.
When you check netstat -nab, all the proper splunkd processes are listening or established on the correct ports; EXCEPT port 8000.
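As a cross-check on netstat, here is a small Python sketch that tries a TCP connection to each port and reports whether it was accepted or actively refused. A refusal is what Winsock error 10061 means, and Python raises it as ConnectionRefusedError. The port list uses numbers mentioned in this thread (8000 web, 8065 appserver, 8089 management); the function name probe and the result strings are my own choices.

```python
import socket


def probe(host, port, timeout=2.0):
    """Try a TCP connect and classify the result.

    Returns "open" if something accepted the connection, "refused" if the
    host actively rejected it (nothing listening), or "no response" for
    timeouts and other socket errors.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts and unreachable hosts.
        return "no response"


# Ports taken from this thread; adjust if your instance uses others.
for port in (8000, 8065, 8089):
    print(f"127.0.0.1:{port} -> {probe('127.0.0.1', port)}")
```

If 8089 reports "open" while 8000 reports "refused", that matches what I saw: splunkd is up but nothing is listening for the web interface.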