
Waiting for web server to be available for over 30 minutes

reswob4
Builder

I was having problems with one of my heavy forwarders (Splunk 6.6.3) running on Windows 2008, so I noted which inputs I had, uninstalled it, and installed version 7.0.1. After adding my configurations and restoring my connections, I started Splunk and got the following:

Checking prerequisites...
        Checking http port [8000]: open
        Checking mgmt port [8089]: open
        Checking appserver port [127.0.0.1:8065]: open
        Checking kvstore port [8191]: open
        Checking configuration...  Done.
        Checking critical directories...        Done
        Checking indexes...
                Validated: _audit _internal _introspection _telemetry _thefishbucket <snip>
        Done


Bypassing local license checks since this instance is configured with a remote license master.

        Checking filesystem compatibility...  Done
        Checking conf files for problems...
        Done
        Checking default conf files for edits...
        Validating installed files against hashes from 'C:\Program Files\Splunk\splunk-7.0.1-2b5b15c4ee89-windows-64-manifest'
        All installed files intact.
        Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...

Splunkd: Starting (pid 3540)
Done


Waiting for web server at https://127.0.0.1:8000 to be available................
................................................................................
........................................................................ Done

It took about 30 minutes before I saw "Done", but the web server still wasn't running. Checking the log files, I found three messages that seemed related:

ERROR UiPythonFallback - Appserver running on port 8065 exited unexpectedly: exited with code 1
ERROR UiHttpListener - An applicaiton server has exited unexpectedly, web UI cannot be used until it is restarted
WARN  UiHttpListener - Web UI now stopped

(Yes, "application" is misspelled in the logs.)

Nothing was using that port when I checked with netstat and Resource Monitor. Rebooting didn't help. I found the following posts:

https://answers.splunk.com/answers/545000/slow-splunkweb-startup-caused-by-splunk-instrument.html

https://answers.splunk.com/answers/563807/why-does-splunkweb-in-662-take-so-long-to-start.html

https://answers.splunk.com/answers/211525/how-to-troubleshoot-why-we-are-getting-appserver-p.html

and tried the changes they suggest, but the problem persisted.
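A quick way to cross-check what netstat and Resource Monitor report is to try binding the port yourself: if the bind fails, some other socket already holds it. This is a minimal Python sketch for illustration, not a Splunk tool, and `port_is_free` is a hypothetical helper name. Bind semantics differ across operating systems and socket options, so netstat remains the authoritative view.

```python
import socket

def port_is_free(host, port):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE / WSAEADDRINUSE: another socket holds the port.
            return False
        return True
```

For example, `port_is_free("127.0.0.1", 8065)` should return True when nothing holds the appserver port, which is presumably what the startup check's "open" result reflects.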

After seeing this post: https://answers.splunk.com/answers/616294/unable-to-start-splunk-1.html
I looked for a repair option for Splunk but didn't find one.

I turned on debug logging (http://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/Enabledebuglogging) and found the following:

03-07-2018 19:00:52.429 -0500 ERROR KVStoreBulletinBoardManager - Failed to start KV Store process. See mongod.log and splunkd.log for details.
03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not start mongo instance. Initialization failed.
03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not get ping from mongod.
03-07-2018 19:00:44.067 -0500 ERROR KVStoreBulletinBoardManager - KV Store changed status to failed. KVStore process terminated.
03-07-2018 19:00:44.067 -0500 ERROR KVStoreBulletinBoardManager - KV Store process terminated abnormally (exit code 100, status exited with code 100). See mongod.log and splunkd.log for details.
03-07-2018 19:00:44.067 -0500 ERROR MongodRunner - mongod exited abnormally (exit code 100, status: exited with code 100) - look at mongod.log to investigate.

(host = splunk-04 for all six events, newest first)
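When a debug-level splunkd.log gets long, grouping ERROR lines by component helps show which subsystem (here the KV store and mongod) fails first. This is a minimal Python sketch that assumes the line layout of the events quoted above (timestamp, UTC offset, level, component, " - ", message); `summarize` is a hypothetical helper, and lines in any other layout are simply skipped.

```python
import re
from collections import Counter

# Assumed layout: "MM-DD-YYYY HH:MM:SS.mmm +/-HHMM LEVEL Component - message".
# An optional "[thread info]" block between component and " - " is tolerated.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+)(?:\s+\[[^\]]*\])?\s+-\s+(?P<msg>.*)$"
)

def summarize(lines, level="ERROR"):
    """Count lines at `level` per component; unparseable lines are ignored."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("level") == level:
            counts[m.group("component")] += 1
    return counts
```

Pointed at $SPLUNK_HOME/var/log/splunk/splunkd.log (opened with `errors="replace"` to tolerate odd bytes), `summarize(...).most_common()` lists the noisiest components first.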

These errors led me to the following posts:

http://docs.splunk.com/Documentation/Splunk/6.4.2/Admin/StartSplunk#Start_Splunk_Enterprise_on_Windo...

https://answers.splunk.com/answers/514443/after-editing-the-kv-store-for-my-custom-app-why-d.html#an...

After changing the permissions on the whole Splunk folder, I restarted Splunk AGAIN. Now there are zero ERRORs in my logs, but it STILL takes over 30 minutes to start, and the web interface still doesn't work.

I then saw these errors:

WARN  HttpListener - Socket error from 127.0.0.1 while idling: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number

ERROR UiPythonFallback - Appserver at http://127.0.0.1:8065 never started up!

I searched and found these posts, but they don't seem to help much:

https://answers.splunk.com/answers/7899/splunkweb-fails-to-start-timeout-when-binding-to-port.html

https://answers.splunk.com/answers/507379/how-to-resolve-splunk-web-not-starting-after-the-h.html

It IS forwarding logs, so the basic functionality works. The logs are Windows server logs collected via WMI calls, but they arrive about 10 minutes behind.

I checked the firewall: for splunkd I have an any/any rule for incoming traffic, but nothing for outgoing. (Hopefully that is the simple answer?)

Does anyone have any other suggestions?

Thanks

1 Solution

reswob4
Builder

Thanks, but we decided to just migrate to 2016 and decommission that server.


knulps
Engager

FYI - I ran into the same issue on one of my two heavy forwarders.

Port 8000 was not accepting connections because of a custom app I had installed on the heavy forwarder. When I removed the app, the web port became accessible.

Investigating the app, I noticed it had an indexes.conf, which I suspect caused the issue, since the app was meant for the indexer. I deleted it from the heavy forwarder and restarted the Splunk instance, and web GUI access started working.
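To find which apps on a heavy forwarder ship their own indexes.conf, one option is to walk $SPLUNK_HOME/etc/apps. This is a minimal Python sketch; `apps_with_indexes_conf` is a hypothetical helper that only looks in each app's default/ and local/ directories. Some built-in apps may legitimately ship an indexes.conf, so treat hits as candidates to review, not as confirmed culprits.

```python
from pathlib import Path

def apps_with_indexes_conf(apps_dir):
    """Return sorted names of apps with an indexes.conf in default/ or local/."""
    hits = set()
    for app in Path(apps_dir).iterdir():
        if not app.is_dir():
            continue  # skip stray files next to the app folders
        for sub in ("default", "local"):
            if (app / sub / "indexes.conf").is_file():
                hits.add(app.name)
    return sorted(hits)
```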


Gilush
New Member

I ran into the same error on CentOS 7.

Running the setup as root did the trick for me.


neelamsantosh
Path Finder

That's not a solution/answer.
Don't mark "decommission that server" as the accepted answer without a proper solution.


nickhills
Ultra Champion

In the context of the question, it is a valid answer, just not well articulated.

"Splunk Support advised that Windows 2008 was not supported for Splunk 7.0.1, and suggested migrating to a supported platform"

Is a slightly better worded version.

If my comment helps, please give it a thumbs up!

reswob4
Builder

Sorry, I should have been clearer. @nickhillscpl has better wording.


askhat_pernebek
Path Finder

It is not an answer -_-


saurabh_tek11
Communicator

This sometimes happens after changes to indexes.conf, for example defining frozenTimePeriodInSecs and coldToFrozenDir.

The best approach is to isolate where the error message above is coming from, then undo the recent changes you made, either on the standalone Splunk instance or on the deployment server/deployer.
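If you suspect a coldToFrozenDir change, one thing worth checking is whether each configured path actually exists on the host. This is a minimal Python sketch, assuming indexes.conf parses cleanly with Python's configparser; Splunk's .conf syntax is INI-like but not identical (trailing-backslash line continuations, for example, are not handled). `missing_frozen_dirs` is a hypothetical helper: it checks existence only, not write permission, and expands variables such as $SPLUNK_DB only if they are set in the current environment.

```python
import configparser
import os

def missing_frozen_dirs(conf_text):
    """Map stanza -> coldToFrozenDir for every stanza whose directory is missing."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep setting names case-sensitive
    # Splunk allows settings before the first [stanza]; give them a placeholder section.
    parser.read_string("[__global__]\n" + conf_text)
    missing = {}
    for stanza in parser.sections():
        path = parser[stanza].get("coldToFrozenDir")
        if path and not os.path.isdir(os.path.expandvars(path)):
            missing[stanza] = path
    return missing
```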


perfecto25
Path Finder

What seems to fix it in my case is clearing out the custom apps from each SH.

It looks like one of my apps is messing up the HTTP port or something else on the web side.

I deleted the custom apps from my deployer (/opt/splunk/etc/shcluster/apps), then ran apply shcluster-bundle to remove them from each SH instance, restarted each SH instance manually, and the web interface came up on my custom port 8300:

bash-4.2$ /opt/splunk/bin/splunk apply shcluster-bundle force -target https://splunksh01.vagrant.local:8089 -auth admin:pw

reswob4
Builder

One thing I've learned from going back and forth with tech support: Splunk v7.x is not supported on Windows 2008, which is the OS this particular HF runs on. So although Splunk isn't obligated to fix this, they said they are looking into the problem.

I have a newer 2016 server and I will slowly migrate everything to that. But since all the forwarding continues to work, I'm not particularly in a hurry.

And lately I've been doing all my admin from the CLI rather than the GUI anyway....

I did notice that you are NOT running this on a Windows server, but you may want to check whether your Linux version is supported....


perfecto25
Path Finder

Yes, in my case I'm running Splunk on CentOS 7.3.


perfecto25
Path Finder

I can confirm I am getting the same issue on my SH cluster (7.1.1).

The issue started after I deployed a couple of custom apps from my deployer to the SH cluster.

The search heads seem to be in the cluster and working, but startup hangs:

[root@njosplunksh01 /opt/splunk/bin]# ./splunk stop
Stopping splunkd...
Shutting down.  Please wait, as this may take a few minutes.
.. [  OK  ]
Stopping splunk helpers...
 [  OK  ]
Done.
[root@njosplunksh01 /opt/splunk/bin]# ./splunk start

Splunk> Australian for grep.

Checking prerequisites...
    Checking http port [8300]: open
    Checking mgmt port [8089]: open
    Checking appserver port [127.0.0.1:8065]: open
    Checking kvstore port [8191]: open
    Checking configuration...  Done.
    Checking critical directories...    Done
    Checking indexes...
        Validated: _audit _internal _introspection _telemetry _thefishbucket history main summary
    Done


Bypassing local license checks since this instance is configured with a remote license master.

    Checking filesystem compatibility...  Done
    Checking conf files for problems...
    Done
    Checking default conf files for edits...
    Validating installed files against hashes from '/opt/splunk/splunk-7.1.1-8f0ead9ec3db-linux-2.6-x86_64-manifest'
    All installed files intact.
    Done
    Checking replication_port port [8090]: open
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  
Done
 [  OK  ]

Waiting for web server at https://127.0.0.1:8300 to be available.........................

No errors anywhere. I'm reverse proxying Splunk Web through Apache (which uses custom certs), and that has always worked. Apache has no errors; this is purely on the Splunk side.
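The "Waiting for web server" step looks like a poll-until-ready loop. To tell whether Splunk Web is merely slow or never comes up, a loop like this minimal Python sketch can time how long the port takes to accept TCP connections. `wait_for_port` is a hypothetical helper, not Splunk's own implementation, and a successful connect only shows that something is listening, not that the UI will render. Behind a reverse proxy, point it at the Splunk Web port itself rather than at Apache.

```python
import socket
import time

def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connect to host:port succeeds.

    Returns elapsed seconds on success, or None once timeout_s has passed.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return time.monotonic() - start
        except OSError:
            pass  # refused or timed out; try again after a short pause
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

For example, `wait_for_port("127.0.0.1", 8300, timeout_s=1800)` returns the number of seconds until the port accepted a connection, or None if it never did within 30 minutes.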

Motoko89
Path Finder

How did you solve it? I am seeing the same issue on an SH cluster running 7.2.5.


reswob4
Builder

It wasn't resolved. I upgraded the server, or rather I migrated to a new server running Windows 2016.

My issue was that it was running on 2008. If you are running on that OS, you will need to upgrade. If not, there is a different problem and you will need to either contact support or open a new forum post.

Sorry...


reswob4
Builder

BTW, this has been submitted to tech support and we are still working through troubleshooting.


reswob4
Builder

A little more information:

The last time I tried to start Splunk, browsing to the web page on the server (https://127.0.0.1:8000) returned the following:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <messages>
    <msg type="ERROR">Error connecting: Winsock error 10061</msg>
  </messages>
</response>

Searching for this didn't turn up much information.

I found this in Google's cache: https://webcache.googleusercontent.com/search?q=cache:CkGgbZ4NmoIJ:https://support.microsoft.com/en-...

and this on answers: https://answers.splunk.com/answers/41449/winsock-error-10055.html

and this: http://www.altn.com/Support/FAQ/FAQResults/?Number=195 which says: Winsock error 10061 means that the server you are attempting to connect to is actively refusing the connection. This usually results from trying to connect to a service that is inactive on the foreign host.

When I tried again some hours later, I simply got the "page cannot be displayed" error.

netstat -nab shows all the proper splunkd processes listening or established on the correct ports, EXCEPT port 8000.
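Winsock error 10061 is WSAECONNREFUSED: the connection attempt got an active refusal because nothing was listening on that port, which matches netstat showing no listener on 8000. This minimal Python sketch checks each Splunk port the same way; `probe` is a hypothetical helper. Python raises ConnectionRefusedError for this case on both Windows (as WinError 10061) and Linux.

```python
import socket

def probe(host, port, timeout_s=2.0):
    """Classify a TCP connect to host:port as listening, refused, timeout or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "listening"
    except ConnectionRefusedError:
        # Nothing listening: Windows reports WinError 10061 (WSAECONNREFUSED).
        return "refused"
    except socket.timeout:
        # No reply at all, e.g. a firewall silently dropping the connection attempt.
        return "timeout"
    except OSError:
        return "error"
```

For example, probing 127.0.0.1 on ports 8000, 8065, 8089 and 8191 should show which of the expected listeners is missing.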


p_gurav
Champion

Can you try changing the web port from 8000 to some other port?
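For reference, Splunk Web's port is set by the httpport setting in the [settings] stanza of web.conf. A minimal local override, assuming 8001 is free on the host (it is an arbitrary example value), followed by a Splunk restart:

```ini
# $SPLUNK_HOME/etc/system/local/web.conf
# Example value only: pick any port that netstat shows as unused.
[settings]
httpport = 8001
```

If that local web.conf already has a [settings] stanza, add the line there instead of creating a second stanza.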
