Hi,
I'm upgrading my cluster master from version 8.0.3 to 8.2.1. After installing the new version over the old deployment and starting splunk, I get "ERROR: pid xxx terminated with signal 4 (core dumped)", and the Splunk web server is not available. How can I fix this?
My Splunk environment runs on AWS Linux EC2 instances. This is the information I have about the OS:
NAME="Amazon Linux AMI"
VERSION="2018.03"
ID_LIKE="rhel fedora"
Hello, we ran into the same issue during our own upgrade, and I came across your post while we were looking for a solution.
In the end, we found that the cause was a conflict with another agent installed on the host.
Can you check /var/log/messages right after trying to start Splunk?
In our case, we saw the following message:
Aug 7 00:22:54 <splunk_host> kernel: splunkd[53109] trap invalid opcode ip:XXXXXXX sp:XXXXXXXXXX error:0 in <non_splunk_agent>.so[XXXXXX+XXXXX]
Once we saw that, we uninstalled the agent, and Splunk started normally on the search head.
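If it helps, here is a rough sketch of that log check in Python. The function names and the regular expression are my own, written against the kernel line format quoted in this thread, so other kernels or syslog formats may need a different pattern. Reading /var/log/messages usually requires root.

```python
import re
from typing import Optional, Set

# Kernel trap lines in this thread look like:
#   ... splunkd[5860] trap invalid opcode ip:... sp:... error:0 in liboneagentproc.so[7f6a374c6000+84000]
# The pattern is an assumption based on those samples only.
TRAP_RE = re.compile(
    r"splunkd\[(?P<pid>\d+)\] trap invalid opcode .* in (?P<module>\S+?)\["
)


def find_faulting_module(line: str) -> Optional[str]:
    """Return the module named after ' in ' on a splunkd invalid-opcode line, else None."""
    match = TRAP_RE.search(line)
    return match.group("module") if match else None


def scan_log(path: str = "/var/log/messages") -> Set[str]:
    """Collect every distinct module named on splunkd invalid-opcode lines in the log."""
    modules: Set[str] = set()
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            module = find_faulting_module(line)
            if module is not None:
                modules.add(module)
    return modules
```

If the module that comes back is not a Splunk binary (for example a monitoring agent's .so), that agent is the first thing to investigate.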
Hello,
I had the same problem and found this solution:
You need to disable the "usePreloadedPstacks" setting in
$SPLUNK_HOME/etc/system/local/server.conf
[watchdog]
usePreloadedPstacks = false
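To double-check that the setting actually landed in the file, a small reader like the one below may help. It is only a sketch: `get_conf_value` is a made-up helper, it reads a single file's text, and it does not reproduce Splunk's layered configuration precedence (running `splunk btool server list watchdog` should show the merged view).

```python
from typing import Optional


def get_conf_value(conf_text: str, stanza: str, key: str) -> Optional[str]:
    """Return the last value assigned to `key` inside `[stanza]`, or None if absent.

    Handles plain 'key = value' lines, '#' comment lines, and blank lines only.
    """
    current_stanza = None
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current_stanza = line[1:-1]
            continue
        if current_stanza == stanza and "=" in line:
            name, _, val = line.partition("=")
            if name.strip() == key:
                value = val.strip()
    return value


# Example text mirroring the stanza above; the [general] stanza is illustrative only.
example = """
[general]
serverName = example

[watchdog]
usePreloadedPstacks = false
"""
print(get_conf_value(example, "watchdog", "usePreloadedPstacks"))
```

To use it on a real host, read the contents of $SPLUNK_HOME/etc/system/local/server.conf into a string and pass that in.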
Were you able to find a resolution for this, @leahs?
I'm facing the exact same issue while doing a dry run in preparation for the actual upgrade.
Checking prerequisites...
Checking http port [9443]: open
Checking mgmt port [8089]: open
Checking appserver port [127.0.0.1:8065]: open
Checking kvstore port [8191]: open
Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _internal _introspection _metrics _metrics_rollup _telemetry _thefishbucket history main summary
Done
Checking filesystem compatibility... Done
Checking conf files for problems...
Done
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunk/splunk-8.2.1-ddff1c41e5cf-linux-2.6-x86_64-manifest'
All installed files intact.
Done
Checking replication_port port [23456]: open
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
ERROR: pid 2283 terminated with signal 4 (core dumped)
Done
[ OK ]
Waiting for web server at http://127.0.0.1:9443 to be available..............................................................................................................................................................................................................^C
(dev2) splunk@splnkhfvm-xxx:~ $ ./bin/splunk version
Splunk 8.2.1 (build ddff1c41e5cf)
I am upgrading an SH Cluster from v8.0.5 to 8.2.1
(dev2) splunk@splnkhfvm-xxx:~ $ hostnamectl
Static hostname: splnkhfvm-qvjj5.xxx
Icon name: computer-vm
Chassis: vm
Machine ID: xxx
Boot ID: xxx
Virtualization: kvm
Operating System: Red Hat Enterprise Linux
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.9:GA:server
Kernel: Linux 3.10.0-1160.15.2.el7.x86_64
Architecture: x86-64
(dev2) splunk@splnkhfvm-xxx:~ $
Entry in the /var/log/messages log file:
Aug 3 18:04:45 splnkhfvm-xxx sudo: anirban : TTY=pts/1 ; PWD=/home/dasd ; USER=splunk ; COMMAND=/opt/splunk/bin/splunk start
Aug 3 18:04:49 splnkhfvm-qvjj5 kernel: [7308988.389979] traps: splunkd[5860] trap invalid opcode ip:7f6a374da3bf sp:7ffda55435b0 error:0 in liboneagentproc.so[7f6a374c6000+84000]
For anyone who might find this useful:
I recently ran into this issue after installing the Dynatrace OneAgent chart in the same k8s cluster as Splunk. In my case I couldn't simply delete the liboneagentproc.so file, so I had to uninstall the Dynatrace chart and then delete the /opt/oneagent directory in the pod where Splunk runs.
Link to the 8.2.1 known issues: https://docs.splunk.com/Documentation/Splunk/8.2.1/ReleaseNotes/Knownissues
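One possible way a third-party library ends up inside splunkd is the dynamic loader's preload list in /etc/ld.so.preload. This thread does not confirm that is how the agent was injected here, so treat the sketch below as a guess at one thing to check. The `preload_entries` helper and the "oneagent" search string are my own assumptions.

```python
from typing import List


def preload_entries(preload_text: str, needle: str = "oneagent") -> List[str]:
    """Return the entries of an ld.so.preload-style listing whose path contains `needle`.

    Entries are whitespace-separated; anything after '#' on a line is treated as a comment.
    """
    hits: List[str] = []
    for raw in preload_text.splitlines():
        uncommented = raw.split("#", 1)[0]
        for entry in uncommented.split():
            if needle.lower() in entry.lower():
                hits.append(entry)
    return hits


def check_preload(path: str = "/etc/ld.so.preload") -> List[str]:
    """Read the preload file if it exists and return matching entries (empty list if missing)."""
    try:
        with open(path, "r") as handle:
            return preload_entries(handle.read())
    except FileNotFoundError:
        return []
```

An empty result only means nothing matching was listed in that file; the agent could still be loaded some other way.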
Hi @leahs ,
Did you manage to find a solution to this?
I am getting the same error, though mine is a fresh build on an on-prem VM, not in the cloud.
Thanks,
Jono
Hi @jonoped,
I haven't solved the problem yet. The instances are running on this OS:
NAME="Amazon Linux AMI"
VERSION="2018.03"
ID_LIKE="rhel fedora"
What OS are you running?
Lea
Hi @jasen_m ,
Thank you so much!! This solved the issue.