Splunk server error and failed to start pid_check.sh to validate splunkd.pid

Path Finder

Hello,

We have an issue: we uninstalled a previous Splunk instance on our Linux server and are now trying to re-install Splunk (version 7.3.3). Starting Splunk for the first time with "splunk start" shows no error at first, and we can reach the login web page, but when we log in we receive a 500 server error.

When we check the status of Splunk with "splunk status", we get the following message:

Error encountered, failed to start pid_check.sh to validate splunkd.pid. errno=11
Failed to determine if splunkd 5292 was running.
Can't run "btool web list settings --no-log": Resource temporarily unavailable

Do you know how to solve the problem, or how to investigate further?

Here are some logs from splunkd.log:

08-24-2020 14:57:23.353 +0200 WARN  ProcessTracker - executable=splunk-optimize failed to start reason='': Resource temporarily unavailable
08-24-2020 15:00:00.003 +0200 INFO  ExecProcessor - setting reschedule_ms=3599997, for command=python /data/splunk/etc/apps/splunk_instrumentation/bin/instrumentation.py
08-24-2020 15:08:17.884 +0200 WARN  Thread - webui: about to throw a ThreadException: pthread_create: Resource temporarily unavailable; 62 threads active
((same logs on repeat...))
08-24-2020 15:10:18.355 +0200 INFO  ProcessTracker - Start process: type=SplunkOptimize idx=_internal procId=199 latency_sec=118.000
08-24-2020 15:10:18.357 +0200 INFO  ProcessTracker - Start process: type=SplunkOptimize idx=_audit procId=200 latency_sec=116.000
08-24-2020 15:10:18.357 +0200 WARN  ProcessTracker - executable=splunk-optimize failed to start reason='': Resource temporarily unavailable
08-24-2020 15:10:19.355 +0200 INFO  ProcessTracker - Start process: type=SplunkOptimize idx=_introspection procId=201 latency_sec=116.999
08-24-2020 15:17:00.654 +0200 ERROR SearchProcessRunner - preforked search=0/3 on process=0/3 caught exception.  completed_searches=0, process_started_ago=0.029, search_started_ago=0.028, search_ended_ago=0.000, total_usage_time=0.028
08-24-2020 15:17:00.654 +0200 ERROR SearchProcessRunner - preforked process=0/3 died on exception: Main Thread: about to throw a ThreadException: pthread_create: Resource temporarily unavailable; 4 threads active
08-24-2020 15:23:44.592 +0200 ERROR SearchProcessRunner - Error reading from preforked process=0/5: Connection reset by peer
08-24-2020 15:23:44.607 +0200 ERROR SearchProcessRunner - preforked search=0/4 on process=0/4 caught exception.  completed_searches=0, process_started_ago=0.029, search_started_ago=0.027, search_ended_ago=0.000, total_usage_time=0.027
08-24-2020 15:23:44.607 +0200 ERROR SearchProcessRunner - preforked process=0/4 died on exception: Main Thread: about to throw a ThreadException: pthread_create: Resource temporarily unavailable; 4 threads active

It looks like there may be a problem with thread creation? We have never encountered this before.

Thank you very much!

SplunkTrust

Hi

When you say "uninstalled", what does that actually mean, and what was the old version? Did you also remove all Splunk-related files that are not part of the rpm, etc.?

Basically, this could mean that you just removed the rpm but left other files behind, e.g. SPLUNK_HOME/var/.../splunk.pid, and when you try to install again the installer gets confused because some files are left but not all.

r. Ismo

Path Finder

Splunk 7.3.3 was installed on the machine; we had to uninstall it and re-install it (same version). And yes, we made sure to stop Splunk, uninstall it with rpm, and remove everything from $SPLUNK_HOME afterwards, so no files were left...

SplunkTrust

Weird.

This has worked for me w/o any issues.

systemctl stop splunk OR su - splunk -c "/opt/splunk/bin/splunk stop -f"
yum remove -y splunk && rm -fr /opt/splunk
yum install -y ./splunk......

Did you stop the running Splunk processes first? And was there nothing left under /opt/splunk, especially var/run...?

r. Ismo
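
The leftover-file check asked about above could be sketched roughly like this. It assumes the /opt/splunk install path used in the commands above, and the helper name find_pid_files is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: list leftover *.pid files under a Splunk directory.
# Prints nothing if the directory is absent or holds no pid files.
find_pid_files() {
    [ -d "$1" ] || return 0
    find "$1" -type f -name '*.pid'
}

# Example (path assumed from the commands above):
#   find_pid_files /opt/splunk
#   pgrep -l splunkd    # is any splunkd process still running?
```

If both commands print nothing after the uninstall, stale pid files or running processes are probably not the cause.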

Path Finder

We managed to fix the issue by changing the ulimits in /etc/security/limits.conf:

* hard core 0
* hard maxlogins 10
* soft nofile 65535
* hard nofile 65535
* soft nproc 20480
* hard nproc 20480
* soft fsize unlimited
* hard fsize unlimited

Then we stopped Splunk, exited our SSH session, reconnected via SSH, and restarted Splunk. It now works like a charm, with no more issues. So the problem was that our ulimits were not properly configured.
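
For anyone checking the same thing: entries in limits.conf are applied by pam_limits when a login session starts, which is presumably why reconnecting over SSH was needed before the new values took effect. Below is a minimal sketch for listing the nproc and nofile entries that are configured. It assumes a Linux host with the usual /etc/security layout, and the function name list_limits is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: print the nproc/nofile entries from one or more
# limits.conf-style files, ignoring comment lines.
list_limits() {
    awk '!/^[ \t]*#/ && NF >= 4 && ($3 == "nproc" || $3 == "nofile") {
        print FILENAME ": " $1, $2, $3, $4
    }' "$@"
}

# Example (paths assumed, adjust to your distribution):
#   list_limits /etc/security/limits.conf /etc/security/limits.d/*.conf
```

In a fresh login session, running `ulimit -u` and `ulimit -n` in bash should then report the raised values. On Linux, /proc/<pid>/limits shows what an already running splunkd process actually got.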
