Splunk Enterprise

Terminated with signal 4 (core dumped) when upgrading

leahs
Explorer

Hi,

I'm upgrading my cluster master from version 8.0.3 to 8.2.1. After installing the new version over the old deployment and starting splunk, I get "ERROR: pid xxx terminated with signal 4 (core dumped)", and the Splunk web server is not available. How can I fix this?

My Splunk environment is running on AWS Linux EC2 instances. This is the information I have about the OS:

NAME="Amazon Linux AMI"
VERSION="2018.03"
ID_LIKE="rhel fedora"

1 Solution

jasen_m
Engager

Hello, we encountered this same issue when we also tried to upgrade and I actually saw your post when we were looking for a solution.

Ultimately, we found that the cause was a conflict with another agent we had installed on the host.

Can you try to look in /var/log/messages after trying to start splunk?

In our case, we observed the following message:
Aug 7 00:22:54 <splunk_host> kernel: splunkd[53109] trap invalid opcode ip:XXXXXXX sp:XXXXXXXXXX error:0 in <non_splunk_agent>.so[XXXXXX+XXXXX]

After we saw that, we uninstalled the agent, and Splunk was able to start normally on the search head.
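
Signal 4 is SIGILL (illegal instruction), which is why the kernel logs it as "trap invalid opcode". The check described above can be scripted. This is only a minimal sketch: the function names `find_trap_module` and `scan_lines` are made up for illustration, and the regular expression assumes the kernel message looks like the lines quoted in this thread, which may differ on other kernels or syslog setups.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Assumed line shape, based on the kernel messages quoted in this thread:
#   ... splunkd[PID] trap invalid opcode ip:... sp:... error:0 in <library>.so[ADDR+SIZE]
TRAP_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:?\s+trap invalid opcode\b"
    r".*?\bin\s+(?P<module>[^\s\[]+)\["
)


def find_trap_module(line: str, process: str = "splunkd") -> Optional[str]:
    """Return the module named in an invalid-opcode trap for `process`, else None."""
    match = TRAP_RE.search(line)
    if match and match.group("proc") == process:
        return match.group("module")
    return None


def scan_lines(lines: Iterable[str], process: str = "splunkd") -> Iterator[Tuple[str, str]]:
    """Yield (module, line) for every matching trap line in an iterable of log lines."""
    for line in lines:
        module = find_trap_module(line, process)
        if module is not None:
            yield module, line.rstrip("\n")
```

To use it, open /var/log/messages (usually as root) and pass the file object to `scan_lines`. If the module it reports is a library that does not ship with Splunk, that library's agent is the likely culprit.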


pignardh
Engager

Hello,
I had the same problem and found this solution:


You must deactivate the "usePreloadedPstacks" parameter in
$SPLUNK_HOME/etc/system/local/server.conf
[watchdog]
usePreloadedPstacks = false
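
To confirm the setting actually landed in the local file before restarting, a small read-only check like the following might help. It is a sketch, not a full Splunk conf parser: `get_conf_setting` is a hypothetical helper that only understands plain `[stanza]` headers and `key = value` lines, and it looks at a single file rather than Splunk's merged configuration.

```python
from typing import Optional


def get_conf_setting(conf_text: str, stanza: str, key: str) -> Optional[str]:
    """Return the last value of `key` inside `[stanza]`, or None if it is not set."""
    current = None
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # entering a new stanza
            continue
        if current == stanza and "=" in line:
            name, _, val = line.partition("=")
            if name.strip() == key:
                value = val.strip()  # later assignments override earlier ones
    return value
```

For example, read the contents of the local server.conf into a string and call `get_conf_setting(text, "watchdog", "usePreloadedPstacks")`. After applying the change above it should return `"false"`. For the effective, merged configuration, `splunk btool server list watchdog --debug` is the more reliable check.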

 

anirbandasdeb
Path Finder

Were you able to find a resolution for this, @leahs?

I'm facing the exact same issue while doing a dry run in preparation for the actual upgrade.

Checking prerequisites...
        Checking http port [9443]: open
        Checking mgmt port [8089]: open
        Checking appserver port [127.0.0.1:8065]: open
        Checking kvstore port [8191]: open
        Checking configuration... Done.
        Checking critical directories...        Done
        Checking indexes...
                Validated: _audit _internal _introspection _metrics _metrics_rollup _telemetry _thefishbucket history main summary
        Done
        Checking filesystem compatibility...  Done
        Checking conf files for problems...
        Done
        Checking default conf files for edits...
        Validating installed files against hashes from '/opt/splunk/splunk-8.2.1-ddff1c41e5cf-linux-2.6-x86_64-manifest'
        All installed files intact.
        Done
        Checking replication_port port [23456]: open
All preliminary checks passed.

Starting splunk server daemon (splunkd)...
ERROR: pid 2283 terminated with signal 4 (core dumped)
Done
 [  OK  ]

Waiting for web server at http://127.0.0.1:9443 to be available..............................................................................................................................................................................................................^C
(dev2) splunk@splnkhfvm-xxx:~ $ ./bin/splunk version
Splunk 8.2.1 (build ddff1c41e5cf)

I am upgrading an SH Cluster from v8.0.5 to 8.2.1.

(dev2) splunk@splnkhfvm-xxx:~ $ hostnamectl
   Static hostname: splnkhfvm-qvjj5.xxx
         Icon name: computer-vm
           Chassis: vm
        Machine ID: xxx
           Boot ID: xxx
    Virtualization: kvm
  Operating System: Red Hat Enterprise Linux
       CPE OS Name: cpe:/o:redhat:enterprise_linux:7.9:GA:server
            Kernel: Linux 3.10.0-1160.15.2.el7.x86_64
      Architecture: x86-64
(dev2) splunk@splnkhfvm-xxx:~ $

Entry in the /var/log/messages log file:

Aug  3 18:04:45 splnkhfvm-xxx sudo:  anirban : TTY=pts/1 ; PWD=/home/dasd ; USER=splunk ; COMMAND=/opt/splunk/bin/splunk start
Aug  3 18:04:49 splnkhfvm-qvjj5 kernel: [7308988.389979] traps: splunkd[5860] trap invalid opcode ip:7f6a374da3bf sp:7ffda55435b0 error:0 in liboneagentproc.so[7f6a374c6000+84000]

 

 


vzabawski
Path Finder

To whoever might find this interesting.

I recently ran into this issue after installing the Dynatrace OneAgent chart in the same k8s cluster as Splunk. In my case I wasn't able to just delete the liboneagentproc.so file, so I had to uninstall the Dynatrace chart and then delete the /opt/oneagent directory in the pod where Splunk runs.

Link to the known issues in the release notes: https://docs.splunk.com/Documentation/Splunk/8.2.1/ReleaseNotes/Knownissues


jonoped
New Member

Hi @leahs ,

Did you manage to find a solution to this?
I am receiving the same error, though mine is a fresh build on an on-prem VM, not in the cloud.

Thanks,

Jono


leahs
Explorer

Hi @jonoped,

I haven't solved the problem yet. The instances are running on this OS:

NAME="Amazon Linux AMI"
VERSION="2018.03"
ID_LIKE="rhel fedora"

What OS are you running?

Lea


leahs
Explorer

Hi @jasen_m ,

Thank you so much!! This solved the issue. 
