Getting "unclean shutdown detected" alerts and "fast recovery" prompts on splunkd start-up after 4.2 upgrade

hexx
Splunk Employee

Since I upgraded my indexer to 4.2, I very frequently see the following output on Splunk start-up:

[root@splunk-indexer]# /opt/splunk/bin/splunk start
Splunk> The IT Search Engine.
Checking prerequisites...
PID 3763 was not running. removing stale pid file... done.
Checking http port [10.1.12.212:8000]: open
Checking mgmt port [10.1.12.212:8089]: open
Checking configuration... Done.
Checking index directory...
Validated databases: _audit _blocksignature _internal _thefishbucket history main summary
Splunk has detected an unclean shutdown. Recovery should be attempted in order to ensure accurate search results, but this may take a while. If you choose 'No' here, you will have the option to recover again upon restarting Splunk, however that recovery may take significantly longer.
Perform faster recovery now? [Y/n] Y

This seems to happen fairly often when Splunk is stopped on system shutdown by the "stop" procedure in the "/etc/init.d/splunk" start-up script, but also sometimes even after I manually shut down Splunk with "splunk stop".

What exactly triggers the unclean shutdown warning and the recovery prompt?

Why is this occurring so often in 4.2?

Also, how can I change the behavior of splunkd so that it automatically runs the recovery when the server is restarted?

1 Solution

Genti
Splunk Employee

As the log itself mentions, this is caused by an unclean shutdown. An unclean shutdown might happen when:
- the splunkd process is killed
- the whole server goes down
- the splunkd process crashed

We have also received many reports of this happening after a ./splunk stop or a ./splunk restart.
One thing to mention, though, is that in most cases I have seen (if not all of them) this is a by-product of Splunk taking a very long time to shut down (very long time = 6 minutes and change). When the shutdown takes that long, Splunk sends a forced stop and kills the process. Hence, the next time you start the daemon, it complains about the unclean shutdown.
This is related to a known issue, SPL-37407, with the fix expected in the next maintenance release of the product, 4.2.1.
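
To make the mechanism above concrete, here is a minimal sketch of the general "ask politely, then force" stop pattern. It is not Splunk's actual code: the function name stop_with_timeout, the one-second polling, and the 360-second default (taken from the "6 minutes" figure above) are all assumptions for illustration.

```shell
# Illustrative only: a generic graceful-stop-with-timeout pattern,
# NOT the logic that "splunk stop" really uses.
# stop_with_timeout PID [TIMEOUT_SECONDS]
#   prints "clean" and returns 0 if the process exits after SIGTERM,
#   prints "forced" and returns 1 if it had to be killed with SIGKILL.
stop_with_timeout() {
    pid="$1"
    timeout="${2:-360}"    # assumed default, roughly "6 minutes"

    # Ask the process to shut down; if it is already gone, nothing to do.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Shutdown took too long: force it. Anything the process had
            # not flushed yet is what the next start-up has to recover.
            kill -KILL "$pid" 2>/dev/null
            echo "forced"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    echo "clean"
    return 0
}
```

The "forced" branch corresponds to the case described above: the process did not get to finish its own shutdown, so the next start reports an unclean shutdown.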

This has actually been an issue for a while and was not introduced in 4.2. It is only visible now because the 4.2 release checks for previous unclean shutdowns and also performs a check on all the indexes and databases at start-up.

To have Splunk start without human interaction, you might want to run the following:

./splunk start --answer-yes

Check this answers post for more thorough instructions on how to add these attributes to your init.d/splunk start script.
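
As a rough idea of what that could look like, here is a minimal sketch of an init.d-style wrapper that passes --answer-yes on start. It is not the script that Splunk generates; the function name splunk_ctl and the /opt/splunk default (taken from the question above) are assumptions.

```shell
#!/bin/sh
# Sketch of an /etc/init.d/splunk-style wrapper. Assumed layout,
# not the script produced by "splunk enable boot-start".
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

splunk_ctl() {
    case "$1" in
        start)
            # --answer-yes answers the recovery prompt without a terminal
            "$SPLUNK_HOME/bin/splunk" start --answer-yes
            ;;
        stop)
            "$SPLUNK_HOME/bin/splunk" stop
            ;;
        restart)
            "$SPLUNK_HOME/bin/splunk" stop
            "$SPLUNK_HOME/bin/splunk" start --answer-yes
            ;;
        status)
            "$SPLUNK_HOME/bin/splunk" status
            ;;
        *)
            echo "Usage: $0 {start|stop|restart|status}" >&2
            return 1
            ;;
    esac
}

# In a real init script, finish with:  splunk_ctl "$1"
```

Keeping the dispatch in one function means start and restart both go through the same non-interactive start command.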


staze
Path Finder

I'm running 4.2.4 on Mac OS 10.6.8, and I still see this error every time Splunk starts (whether it's after a reboot or a "restart" of Splunk).

And yes, Splunk takes a very long time to shut down when asked to. So, this bug still exists somewhere...


Michael
Contributor

Bug in "splunk enable boot-start"?

I seem to experience the same thing when I restart my Linux (Red Hat) systems. It appears that Splunk is not shutting down properly, so when it attempts to start back up, it sees an improperly closed database. Then, while it appears that the process is running, it's not until you issue a manual "splunk start" that you see the "unclean shutdown" error and are prompted to fix it.

The "fix" noted above merely shows you how to get past the prompt ("./splunk start --answer-yes"). It gets you going, but doesn't fix the underlying issue. You can easily re-create this by simply killing the process (without a "splunk stop") and then doing a "splunk start". This is essentially what's happening when you reboot.

Looking at the rc.d files, there are startup commands issued for Splunk (/etc/rc.d/rc3.d/S90splunk), but no shutdown ones (i.e., missing: /etc/rc.d/rc3.d/K90splunk). In fact, I find a start file in rc3.d, rc4.d, and rc5.d, but no shutdown ones.

You have to create your own K file to shut it down (/opt/splunk/bin/splunk stop).
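
A minimal sketch of creating those K files as symlinks, assuming a SysV-style layout under /etc/rc.d and an init script at /etc/rc.d/init.d/splunk that handles a "stop" argument. The function name install_kill_links, the K10splunk link name, and the choice of runlevels 0 (halt) and 6 (reboot) are my assumptions, so adjust them to your system.

```shell
# Sketch: add kill (K) links so that the rc system calls the Splunk
# init script with "stop" on halt and reboot.
# install_kill_links [RC_ROOT] [INIT_SCRIPT] [LINK_NAME]
install_kill_links() {
    rc_root="${1:-/etc/rc.d}"                      # assumed SysV rc directory
    init_script="${2:-/etc/rc.d/init.d/splunk}"    # assumed init script path
    link_name="${3:-K10splunk}"                    # assumed name and priority

    for level in 0 6; do
        # rc runs every K* entry in rcN.d with the argument "stop"
        ln -sf "$init_script" "$rc_root/rc$level.d/$link_name"
    done
}
```

Run as root, a plain install_kill_links call would create /etc/rc.d/rc0.d/K10splunk and /etc/rc.d/rc6.d/K10splunk. The links only help if the init script they point to actually runs "/opt/splunk/bin/splunk stop" when it is called with "stop".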

I would think that a database system with its own mechanism for creating startup scripts would also provide for a clean shutdown in init.d. I'm chalking this up to a faulty "splunk enable boot-start". I'm calling it a bug.


hexx
Splunk Employee

As of Splunk 4.2.1, this recovery prompt has been special-cased so that it will be answered in the positive in the case of any unattended splunk start/restart.
