Splunk Losing Web UI Access

adam_dixon95
Explorer

Hi,

Since upgrading from Splunk 6.5.0 to Splunk 7.1.0, I've been having issues with Splunk losing access to the Web UI after it has been running for some time.

I restart Splunk via /opt/splunk/bin/splunk restart and it comes back up for quite a while (usually a day), but after a period of time it goes down again.

I've noticed that every time I restart the Splunk service, it tells me that splunkd was not running:

splunkd 57690 was not running.
Stopping splunk helpers...

Done.
Stopped helpers.
Removing stale pid file... done.
splunkd is not running.

Splunk> The IT Search Engine.

Checking prerequisites...
        Checking http port [8000]: open
        Checking mgmt port [8089]: open
        Checking appserver port [127.0.0.1:8065]: open
        Checking kvstore port [8191]: open
        Checking configuration...  Done.
        Checking critical directories...        Done
        Checking indexes...
                Validated: _audit _internal _introspection _telemetry _thefishbucket bro cim_modactions cim_summary endpoint_summary firedalerts history ioc main msexchange notable notable_summary os perfmon risk summary threat_activity ubaroute ueba whois windows wineventlog xtreme_contexts
        Done
        Checking filesystem compatibility...  Done
        Checking conf files for problems...
                Invalid key in stanza [syslog:ubaroute] in /opt/splunk/etc/apps/Splunk_TA_ueba/default/outputs.conf, line 7: dropEventsOnQueueFull  (value:  10).
                Your indexes and inputs configurations are not internally consistent. For more information, run 'splunk btool check --debug'
        Done
        Checking default conf files for edits...
        Validating installed files against hashes from '/opt/splunk/splunk-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
        All installed files intact.
        Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...
Done


Waiting for web server at https://127.0.0.1:8000 to be available............... Done

I didn't see anything that stood out in splunkd.log or web_access.log, but in syslog I saw the following:

Out of memory: Kill process 31605 (splunkd) score 484 or sacrifice child
May 23 08:05:42 splunk kernel: [244114.815142] Killed process 31605 (splunkd) total-vm:9945680kB, anon-rss:8845420kB, file-rss:0kB

This is becoming quite an issue - any help would be appreciated.
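
The syslog excerpt above suggests the kernel OOM killer terminated splunkd, which would be consistent with the "was not running" message and the stale pid file on restart. As a rough sketch (not an official Splunk tool), the following Python pulls the process name and memory figures out of a "Killed process" line and converts them to GiB. The function name parse_oom_kill and the regular expression are my own assumptions, based only on the line format quoted above.

```python
import re

# Pattern written against the "Killed process" line quoted in this thread;
# other kernel versions may format the line differently (assumption).
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, name and memory sizes in GiB, or None if the line does not match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    kib_per_gib = 1024 * 1024  # the kernel reports these sizes in kB (KiB)
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_gib": round(int(m.group("total_vm")) / kib_per_gib, 2),
        "anon_rss_gib": round(int(m.group("anon_rss")) / kib_per_gib, 2),
    }


# The line from the syslog excerpt in this post.
line = (
    "May 23 08:05:42 splunk kernel: [244114.815142] Killed process 31605 (splunkd) "
    "total-vm:9945680kB, anon-rss:8845420kB, file-rss:0kB"
)
result = parse_oom_kill(line)
print(result)
```

For the line above this reports roughly 8.44 GiB resident for splunkd at the time it was killed, so comparing that figure with the host's total RAM may be a useful next step.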


OL
Communicator

It is difficult to know exactly what the problem is, but I saw this once when there was a conflict between bucket IDs. Have you scanned splunkd.log for all errors?
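
One possible way to scan the log is to count ERROR lines per component, so the noisiest source stands out. This is only a sketch: the regular expression assumes the usual splunkd.log layout (date, time, UTC offset, level, component, then " - "), and count_errors_by_component is a hypothetical helper name, not part of Splunk.

```python
import re
from collections import Counter

# Assumed splunkd.log layout: "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component - message".
ERROR_RE = re.compile(
    r"\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4} ERROR (?P<component>\S+) - "
)


def count_errors_by_component(lines):
    """Count ERROR entries per logging component in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            counts[m.group("component")] += 1
    return counts


# Illustrative lines in the assumed layout (made up for this example).
sample = [
    "05-23-2018 08:30:11.128 -0600 ERROR KVStoreAdminHandler - An error occurred.",
    "05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from a script",
    "05-23-2018 08:29:40.806 -0600 ERROR ScriptRunner - stderr from a script",
    "this line is not in the expected format and is skipped",
]
counts = count_errors_by_component(sample)
for component, n in counts.most_common():
    print(component, n)
```

To run it against a real log, pass an open file object instead of the sample list, for example open("/opt/splunk/var/log/splunk/splunkd.log") if Splunk is installed under /opt/splunk as in this thread.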


adam_dixon95
Explorer

Hi,

These are the most recent errors in splunkd.log from around this period:

05-23-2018 09:28:12.968 -0600 ERROR HttpListener - Handler for /en-US/modules/@4E8ECBCF7F0F0D7AD1FA3361342436F3A19E6A37E099573C0F7432A76B5B12A4/modules-17be7a83a6e3c3c3e5360ea69841a46394a8d1aa.min.css sent a 0 byte response after earlier claiming a Content-Length of 307!
05-23-2018 09:28:12.968 -0600 ERROR HttpListener - Exception while processing request from 172.20.20.74 for /en-US/modules/@4E8ECBCF7F0F0D7AD1FA3361342436F3A19E6A37E099573C0F7432A76B5B12A4/modules-17be7a83a6e3c3c3e5360ea69841a46394a8d1aa.min.css: Connection closed by peer
05-23-2018 08:32:52.114 -0600 ERROR HttpListener - Handler for /en-US/static/@4E8ECBCF7F0F0D7AD1FA3361342436F3A19E6A37E099573C0F7432A76B5B12A4/fonts/inconsolata-regular.woff sent a 0 byte response after earlier claiming a Content-Length of 32744!
05-23-2018 08:32:52.114 -0600 ERROR HttpListener - Exception while processing request from 172.20.20.74 for /en-US/static/@4E8ECBCF7F0F0D7AD1FA3361342436F3A19E6A37E099573C0F7432A76B5B12A4/fonts/inconsolata-regular.woff: Connection closed by peer
05-23-2018 08:30:11.128 -0600 ERROR KVStoreAdminHandler - An error occurred.
05-23-2018 08:30:11.128 -0600 ERROR KVStorageProvider - An error occurred during the last operation ('replSetGetStatus', domain: '15', code: '13053'): No suitable servers found (`serverSelectionTryOnce` set): [connection closed calling ismaster on '127.0.0.1:8191']
05-23-2018 08:29:40.816 -0600 ERROR AdminManagerExternal - External handler failed with code '1' and output: 'REST ERROR[500]: Splunkd internal error - Fail to get capabilities of sessioned user'. See splunkd.log for stderr output.
05-23-2018 08:29:40.806 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': BaseException: REST ERROR[500]: Splunkd internal error - Fail to get capabilities of sessioned user
05-23-2018 08:29:40.806 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': msgx='Fail to get capabilities of sessioned user',
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': File "/opt/splunk/lib/python2.7/site-packages/splunk/admin.py", line 128, in init
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': admin.init(base.ResourceHandler(Servers), admin.CONTEXT_APP_AND_USER)
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': File "/opt/splunk/etc/apps/Splunk_TA_nessus/bin/ta_tenable_rh_sc_servers.py", line 24, in <module>
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': File "/opt/splunk/bin/runScript.py", line 78, in <module>
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': Traceback (most recent call last):
05-23-2018 07:39:49.108 -0600 ERROR HttpListener - Handler for /en-US/app/SplunkEnterpriseSecuritySuite/ess_security_posture?hideEdit=true&hideTitle=true&hideSplunkBar=true&hideAppBar=true&targetTop=true sent a 0 byte response after earlier claiming a Content-Length of 4650!
05-23-2018 07:39:49.108 -0600 ERROR HttpListener - Exception while processing request from 172.20.20.74 for /en-US/app/SplunkEnterpriseSecuritySuite/ess_security_posture?hideEdit=true&hideTitle=true&hideSplunkBar=true&hideAppBar=true&targetTop=true: Connection closed by peer

Apologies for the wall of text!

Thanks.


OL
Communicator

Two things I would check:

  • Permissions on the Splunk folder. Are you running Splunk as the splunk user? People sometimes run it as root, and this can break things. I would stop Splunk, then run chown -R splunk:splunk /opt/splunk (double-check the user/group name and the path first).
  • KVStore status, to see whether it is healthy.
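
Before running a recursive chown, it may help to see which files are actually owned by someone else. The sketch below walks a directory tree and lists entries whose owner differs from an expected uid. find_foreign_owned is a hypothetical helper, and the demo runs against a temporary directory so it is safe to try anywhere on a Unix-like system.

```python
import os
import tempfile


def find_foreign_owned(root, expected_uid, limit=20):
    """Return up to `limit` paths under `root` whose owner uid is not `expected_uid`."""
    mismatches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                uid = os.lstat(path).st_uid  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if uid != expected_uid:
                mismatches.append(path)
                if len(mismatches) >= limit:
                    return mismatches
    return mismatches


# Demo on a throwaway directory: files we create are owned by the current user.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.conf"), "w").close()
    print(find_foreign_owned(tmp, os.getuid()))           # nothing foreign-owned
    print(len(find_foreign_owned(tmp, os.getuid() + 1)))  # the one file is reported
```

On the real host, the call would look something like find_foreign_owned("/opt/splunk", pwd.getpwnam("splunk").pw_uid), assuming the service account is named splunk as in the chown command above.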