Splunk Losing Web UI Access

Explorer

Hi,

Since upgrading from Splunk 6.5.0 to Splunk 7.1.0, I've been having issues with Splunk losing access to the Web UI after some time.

I restart Splunk with /opt/splunk/bin/splunk restart and it comes back up for a while (usually about a day), but then it goes down again.

I've noticed that every time I restart the Splunk service, it tells me that splunkd was not running:

splunkd 57690 was not running.
Stopping splunk helpers...

Done.
Stopped helpers.
Removing stale pid file... done.
splunkd is not running.

Splunk> The IT Search Engine.

Checking prerequisites...
        Checking http port [8000]: open
        Checking mgmt port [8089]: open
        Checking appserver port [127.0.0.1:8065]: open
        Checking kvstore port [8191]: open
        Checking configuration...  Done.
        Checking critical directories...        Done
        Checking indexes...
                Validated: _audit _internal _introspection _telemetry _thefishbucket bro cim_modactions cim_summary endpoint_summary firedalerts history ioc main msexchange notable notable_summary os perfmon risk summary threat_activity ubaroute ueba whois windows wineventlog xtreme_contexts
        Done
        Checking filesystem compatibility...  Done
        Checking conf files for problems...
                Invalid key in stanza [syslog:ubaroute] in /opt/splunk/etc/apps/Splunk_TA_ueba/default/outputs.conf, line 7: dropEventsOnQueueFull  (value:  10).
                Your indexes and inputs configurations are not internally consistent. For more information, run 'splunk btool check --debug'
        Done
        Checking default conf files for edits...
        Validating installed files against hashes from '/opt/splunk/splunk-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
        All installed files intact.
        Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...
Done


Waiting for web server at https://127.0.0.1:8000 to be available............... Done

I didn't see anything that stood out in splunkd.log or web_access.log, though in syslog I saw the following:

Out of memory: Kill process 31605 (splunkd) score 484 or sacrifice child
May 23 08:05:42 splunk kernel: [244114.815142] Killed process 31605 (splunkd) total-vm:9945680kB, anon-rss:8845420kB, file-rss:0kB
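
For reference, here is a minimal sketch of how the kernel "Killed process" line above could be parsed to see how much memory splunkd held when it was killed. The regular expression and the function name `parse_oom_kill` are my own assumptions based only on the line quoted here; other kernel versions may format the message differently.

```python
import re

# Pattern assumed from the single syslog line quoted in this thread;
# the kernel reports the sizes in kB.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, process name and memory sizes in GiB, or None if no match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    kib_per_gib = 1024 * 1024
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_gib": int(m.group("total_vm")) / kib_per_gib,
        "anon_rss_gib": int(m.group("anon_rss")) / kib_per_gib,
    }


line = (
    "May 23 08:05:42 splunk kernel: [244114.815142] Killed process 31605 "
    "(splunkd) total-vm:9945680kB, anon-rss:8845420kB, file-rss:0kB"
)
info = parse_oom_kill(line)
print(info)
```

For the line above this works out to roughly 8.4 GiB of anonymous resident memory for splunkd at the moment the kernel killed it, which would also be consistent with the restart reporting that splunkd was not running.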

This is becoming quite an issue - any help would be appreciated.

Communicator

It is quite difficult to know exactly what the problem is, but I saw this once when there was a conflict between bucket IDs. Have you scanned splunkd.log for all errors?
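
If it helps with the scan, here is a rough sketch that counts ERROR and FATAL lines in splunkd.log per component. It assumes the usual "timestamp LEVEL Component - message" layout and a default log path under /opt/splunk; the function names `count_by_component` and `summarize_log` are made up for this example, and the INFO sample line is invented purely to show the filtering.

```python
import re
from collections import Counter

# Assumed splunkd.log layout:
# "MM-DD-YYYY HH:MM:SS.mmm -ZZZZ LEVEL Component - message"
LINE_RE = re.compile(
    r"(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"(?P<level>[A-Z]+)\s+(?P<component>\S+) - "
)


def count_by_component(lines, levels=("ERROR", "FATAL")):
    """Count log lines at the given levels, grouped by component name."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("level") in levels:
            counts[m.group("component")] += 1
    return counts


def summarize_log(path="/opt/splunk/var/log/splunk/splunkd.log"):
    """Read a log file (path is an assumption) and summarize its errors."""
    with open(path, errors="replace") as fh:
        return count_by_component(fh)


# Sample lines in the assumed format; the INFO line is invented.
sample = [
    "05-23-2018 08:30:11.128 -0600 ERROR KVStoreAdminHandler - An error occurred.",
    "05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from a script",
    "05-23-2018 08:29:40.806 -0600 ERROR ScriptRunner - more stderr output",
    "05-23-2018 08:29:41.000 -0600 INFO  Metrics - invented non-error line",
]
counts = count_by_component(sample)
for component, n in counts.most_common():
    print(component, n)
```

Running `summarize_log()` on the real file and looking at which components dominate around the time of an outage should narrow things down faster than reading the log top to bottom.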

Explorer

Hi,

These are the most recent errors in splunkd.log from around that time:

05-23-2018 09:28:12.968 -0600 ERROR HttpListener - Handler for /en-US/modules/@4E8ECBCF7F0F0D7AD1FA3361342436F3A19E6A37E099573C0F7432A76B5B12A4/modules-17be7a83a6e3c3c3e5360ea69841a46394a8d1aa.min.css sent a 0 byte response after earlier claiming a Content-Length of 307!
05-23-2018 09:28:12.968 -0600 ERROR HttpListener - Exception while processing request from 172.20.20.74 for /en-US/modules/@4E8ECBCF7F0F0D7AD1FA3361342436F3A19E6A37E099573C0F7432A76B5B12A4/modules-17be7a83a6e3c3c3e5360ea69841a46394a8d1aa.min.css: Connection closed by peer
05-23-2018 08:32:52.114 -0600 ERROR HttpListener - Handler for /en-US/static/@4E8ECBCF7F0F0D7AD1FA3361342436F3A19E6A37E099573C0F7432A76B5B12A4/fonts/inconsolata-regular.woff sent a 0 byte response after earlier claiming a Content-Length of 32744!
05-23-2018 08:32:52.114 -0600 ERROR HttpListener - Exception while processing request from 172.20.20.74 for /en-US/static/@4E8ECBCF7F0F0D7AD1FA3361342436F3A19E6A37E099573C0F7432A76B5B12A4/fonts/inconsolata-regular.woff: Connection closed by peer
05-23-2018 08:30:11.128 -0600 ERROR KVStoreAdminHandler - An error occurred.
05-23-2018 08:30:11.128 -0600 ERROR KVStorageProvider - An error occurred during the last operation ('replSetGetStatus', domain: '15', code: '13053'): No suitable servers found (`serverSelectionTryOnce` set): [connection closed calling ismaster on '127.0.0.1:8191']
05-23-2018 08:29:40.816 -0600 ERROR AdminManagerExternal - External handler failed with code '1' and output: 'REST ERROR[500]: Splunkd internal error - Fail to get capabilities of sessioned user'. See splunkd.log for stderr output.
05-23-2018 08:29:40.806 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': BaseException: REST ERROR[500]: Splunkd internal error - Fail to get capabilities of sessioned user
05-23-2018 08:29:40.806 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': msgx='Fail to get capabilities of sessioned user',
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': File "/opt/splunk/lib/python2.7/site-packages/splunk/admin.py", line 128, in init
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': admin.init(base.ResourceHandler(Servers), admin.CONTEXT_APP_AND_USER)
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': File "/opt/splunk/etc/apps/Splunk_TA_nessus/bin/ta_tenable_rh_sc_servers.py", line 24, in <module>
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': File "/opt/splunk/bin/runScript.py", line 78, in <module>
05-23-2018 08:29:40.805 -0600 ERROR ScriptRunner - stderr from '/opt/splunk/bin/python /opt/splunk/bin/runScript.py execute': Traceback (most recent call last):
05-23-2018 07:39:49.108 -0600 ERROR HttpListener - Handler for /en-US/app/SplunkEnterpriseSecuritySuite/ess_security_posture?hideEdit=true&hideTitle=true&hideSplunkBar=true&hideAppBar=true&targetTop=true sent a 0 byte response after earlier claiming a Content-Length of 4650!
05-23-2018 07:39:49.108 -0600 ERROR HttpListener - Exception while processing request from 172.20.20.74 for /en-US/app/SplunkEnterpriseSecuritySuite/ess_security_posture?hideEdit=true&hideTitle=true&hideSplunkBar=true&hideAppBar=true&targetTop=true: Connection closed by peer

Apologies for the wall of text!

Thanks.

Communicator

Two things I would check:

  • Permissions on the Splunk folder. Are you running Splunk as the splunk user? Sometimes people run it as root, and this can break things. I would stop Splunk, then run chown -R splunk:splunk /opt/splunk (double-check the user/group name and the path).
  • KVStore status to see if it is healthy
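
Before running a recursive chown, it can be worth listing what is actually owned by someone else. Below is a small sketch of that check; `find_foreign_owned` and `uid_for` are hypothetical helper names, and the /opt/splunk path and splunk user in the comments are the same assumptions as in the list above.

```python
import os
import pwd
import tempfile


def uid_for(user):
    """Look up the numeric uid for a user name, e.g. uid_for('splunk')."""
    return pwd.getpwnam(user).pw_uid


def find_foreign_owned(root, expected_uid):
    """Yield root and every path below it not owned by expected_uid."""
    if os.lstat(root).st_uid != expected_uid:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # path vanished or is unreadable; skip it
            if st.st_uid != expected_uid:
                yield path


# Demo on a throwaway directory owned by the current user.
# On a real host (assumed names): find_foreign_owned('/opt/splunk', uid_for('splunk'))
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.conf"), "w").close()
    mismatches = list(find_foreign_owned(tmp, os.geteuid()))
    print(len(mismatches), "paths with unexpected owner")
```

For the KVStore check, the Splunk CLI has a show kvstore-status command (run as /opt/splunk/bin/splunk show kvstore-status, assuming the default install path) that reports the current KV store state.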