This has happened twice in the past week. Users begin contacting me saying they are unable to log in. Both times I ran netstat and saw:
ADMIN: Exiting (status = 0) ...
tcp 0 0 127.0.0.1:31337 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50259 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:30259 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8089 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8666 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8444 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8222 0.0.0.0:* LISTEN
tcp 0 0 :::22 :::* LISTEN
unix 2 [ ACC ] STREAM LISTENING 7956 /var/run/dbus/system_bus_socket
unix 2 [ ACC ] STREAM LISTENING 7997 /var/run/acpid.socket
unix 2 [ ACC ] STREAM LISTENING 7894 /dev/log
splunkd 3517 root 3u IPv4 8307 TCP *:8089 (LISTEN)
splunkd 3517 root 38u IPv4 8399 TCP *:30259 (LISTEN)
splunkd 3517 root 40u IPv4 8401 TCP *:50259 (LISTEN)
I checked multiple files under /var/log/*.
Both times, restarting Splunk brought HTTPS back up; netstat afterward showed:
ADMIN: Exiting (status = 0) ...
tcp 0 0 127.0.0.1:31337 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:50259 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:30259 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8089 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8666 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8444 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8222 0.0.0.0:* LISTEN
tcp 0 0 :::22 :::* LISTEN
unix 2 [ ACC ] STREAM LISTENING 7956 /var/run/dbus/system_bus_socket
unix 2 [ ACC ] STREAM LISTENING 7997 /var/run/acpid.socket
unix 2 [ ACC ] STREAM LISTENING 7894 /dev/log
splunkd 22529 root 3u IPv4 345937 TCP *:8089 (LISTEN)
splunkd 22529 root 39u IPv4 346009 TCP *:30259 (LISTEN)
splunkd 22529 root 41u IPv4 346011 TCP *:50259 (LISTEN)
Where should I start looking to troubleshoot this issue?
Thanks.
What version/platform are you running? There was a known bug that was fixed in 4.0.10 that affected some customers running splunkweb in SSL mode.
The first step in troubleshooting splunkweb issues is to inspect web_service.log at the time of failure. It is very rare for the Python process to just die, so we want to rule out any unanticipated issues. If the Python process truly is disappearing, file a support ticket instead.
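As a sketch of that first step, here is one way to pull error lines out of the log. The sample line and the /tmp path are illustrative only; on a default install the real file lives under $SPLUNK_HOME/var/log/splunk/web_service.log:

```shell
# Illustrative only: create a sample log line like the ones splunkweb
# writes, then filter for errors the same way you would against the
# real $SPLUNK_HOME/var/log/splunk/web_service.log
log=/tmp/web_service.sample.log
printf '%s\n' "871 ERROR module:59 - Splunkd daemon is not responding" > "$log"

# Show the most recent error lines
grep -i 'error' "$log" | tail -n 20
```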
OK, here is what I am seeing in web_service.log:
871 ERROR module:59 - Splunkd daemon is not responding: ('[Errno 24] Too many open files',)
So it appears there is a limit on open files? Where would I go to modify this limit?
thanks
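For what it's worth, errno 24 is EMFILE, the per-process cap on open file descriptors. A minimal shell sketch for inspecting and raising the limit (the 4096 value and the limits.conf entry are examples, not recommendations for your box):

```shell
# Show the current soft and hard limits on open files
ulimit -Sn
ulimit -Hn

# Count the file descriptors this shell has open right now; substitute
# splunkd's PID for $$ to inspect the splunkd process instead
ls /proc/$$/fd | wc -l

# Raise the soft limit for this session (cannot exceed the hard limit,
# hence the guard); a persistent change usually goes in
# /etc/security/limits.conf, e.g.:  root  soft  nofile  8192
ulimit -Sn 4096 || true
```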
4.0.9 build 74233, Dell 710, CentOS 710
I will check both web_service.log and web_access.log. thanks.
This is not a currently known problem. Please file a case at http://splunk.com/support.
If you want to investigate independently, try looking at web_access.log for exceptions or other errors.
Case open thanks.
Try service --status-all when it goes down next time and make sure Apache is running. You can also view web_access.log in your splunk/var/log/splunk directory.
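Along the same lines, a quick liveness check you could script when it happens again; the process name and port come from the netstat/lsof output earlier in the thread, and the exact commands are a sketch, not Splunk's official tooling:

```shell
# Is splunkd still running? (the [s] keeps grep from matching itself)
if ps aux | grep -q '[s]plunkd'; then
    echo "splunkd: running"
else
    echo "splunkd: NOT running"
fi

# Is anything listening on the HTTPS port (443, per the netstat output)?
if netstat -tln 2>/dev/null | grep -q ':443 '; then
    echo "https: listening on 443"
else
    echo "https: nothing listening on 443"
fi
```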