So I'm very much not an expert on these things, but I think something has gone horribly wrong with my ports....
About a week ago my Splunk Web stopped loading unless I explicitly specified the port in the URL, like
http://myurl.com:8000/en-US/manager/search
I'm now periodically having Splunk Web go down altogether and getting:
503 Service Unavailable
Return to Splunk home page
The splunkd daemon cannot be reached by splunkweb. Check that there are no blocked network ports or that splunkd is still running.
When I check what ports are being used (with netstat -a) I see none of the normal localhost:8089 lines. I can restart Splunk and have it come back up fine (well using port 8000...) but it will go down again seemingly randomly.
Any help (or where to start looking / googling) would be great.
Thanks
Install the SOS app and look at the warnings on the splunkd/web response time dashboard.
When splunkd's response time goes above 30 seconds, splunkweb throws a timeout. Note when it happens and look for a pattern.
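If you want a rough cross-check outside the dashboard, something like the sketch below could time requests to splunkd and tally the slow ones by hour of day. It is only an illustration: the function names are made up, the URL you pass in is an assumption (the question mentions localhost:8089), and certificate verification is skipped on the assumption that splunkd uses a self-signed certificate.

```python
import ssl
import time
import urllib.error
import urllib.request
from collections import Counter

SLOW_SECONDS = 30.0  # the timeout threshold quoted in the answer above


def time_request(url, timeout=60.0):
    """Return seconds until any HTTP response arrives, or None if none does."""
    # Assumption: the server may use a self-signed certificate,
    # so certificate verification is disabled for this probe.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx):
            pass
    except urllib.error.HTTPError:
        pass  # an error status (e.g. 401) still means the server answered
    except (urllib.error.URLError, OSError):
        return None  # refused, unreachable, or timed out
    return time.monotonic() - start


def slow_hours(samples, threshold=SLOW_SECONDS):
    """Count slow or failed samples per hour of day.

    samples is an iterable of (datetime, seconds_or_None) pairs.
    """
    counts = Counter()
    for when, elapsed in samples:
        if elapsed is None or elapsed > threshold:
            counts[when.hour] += 1
    return counts
```

You would call `time_request("https://localhost:8089/")` every minute or so from cron or a loop, store each result with `datetime.now()`, and then pass the collected pairs to `slow_hours` to see which hours the slowdowns cluster in.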
To reduce the load test :
Check the domain controllers used to process authentication requests.
In our case, the LDAP strategy that applied to the users hitting this problem at logon specified the DNS name of the authenticating domain. Splunk does an "A" record DNS query for the servers associated with that domain and sometimes picks a server from the list that sits in a far-off, bandwidth-challenged branch office. In those cases, the authentication process does not complete before Splunk Web times out, and the delay is interpreted as an error. We will likely create another DNS alias whose hosts are limited to a pool of domain controllers nearest to our search heads, and reference that alias in our LDAP strategies instead.
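To see which servers sit behind a domain name and how quickly each one accepts a connection, a minimal sketch along these lines may help. It is not how Splunk itself picks a server; the function names are hypothetical, and it only times a plain TCP connect to port 389 (the standard LDAP port), not a full bind.

```python
import socket
import time


def resolve_ipv4(name):
    """Return the sorted, de-duplicated IPv4 addresses that name resolves to."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def connect_time(address, port, timeout=5.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None  # refused, unreachable, or timed out


def rank_servers(name, port=389, timeout=5.0):
    """Return (address, seconds_or_None) pairs, fastest first, failures last."""
    results = [
        (addr, connect_time(addr, port, timeout)) for addr in resolve_ipv4(name)
    ]
    return sorted(results, key=lambda r: (r[1] is None, r[1] or 0.0))
```

Running `rank_servers("corp.example.com")` from a search head (with your own domain name in place of that placeholder) should show whether some of the returned domain controllers are much slower to reach than others.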
So I'm still not sure what's wrong with my Splunk system, but you're right: my response times keep going well above 30 seconds, and this got me pointed in the right direction.