Deployment Architecture

Search head cluster captain overloaded; jobs stuck waiting at 100% or 0% after upgrade from 6.2.x to 6.4.1 (on Solaris)

Communicator

Indexer cluster with 30 peers
Search head cluster of 5 instances connected to the indexer cluster
Configured a scheduled search that runs every 15 minutes.

Noticed that the job inspector shows some searches that just say:

| [earliest time=1/1/70 12:00:00 AM, latest time]

with a dispatch time of 1/1/70 12:00:00 AM (the Unix epoch) and status Running 0%.
The SHC members are on Splunk version 6.4.1 on Solaris.
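One way to spot such stuck jobs is to query splunkd's REST job list and filter for entries still dispatched as RUNNING at 0% progress. A minimal sketch, assuming the default management port 8089 and placeholder admin credentials (both are assumptions, adjust for your environment):

```shell
#!/bin/sh
# Sketch: list search jobs stuck at RUNNING 0% via the splunkd REST API.
# Port 8089 and the admin:changeme credentials are assumptions.

# Read the JSON job list on stdin and print the names (SIDs) of jobs
# whose dispatchState is RUNNING with doneProgress at 0.
stuck_job_names() {
    python3 -c '
import json, sys
for e in json.load(sys.stdin)["entry"]:
    c = e["content"]
    if c.get("dispatchState") == "RUNNING" and c.get("doneProgress") == 0:
        print(e["name"])'
}

if [ "${1:-}" = "--run" ]; then   # guard: only hit the API when asked
    curl -sk -u admin:changeme \
        "https://localhost:8089/services/search/jobs?output_mode=json" |
        stuck_job_names
fi
```

Run it with `--run` on an SHC member; jobs whose SID encodes an epoch-zero dispatch time are the stuck ones described above.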

1 Solution

Splunk Employee

I have seen this happen when stale splunk processes from before the restart were left behind.

Please try the following:

Stop the SHC member.
Use preap to reap the orphaned processes -- https://docs.oracle.com/cd/E23823_01/html/816-5165/preap-1.html
Start Splunk again.
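The steps above might look like this on a Solaris host. A sketch only: the SPLUNK_HOME path is an assumption, and it assumes the stale processes show up as defunct (zombie) splunkd entries in ps, which is what preap is designed to reap.

```shell
#!/bin/sh
# Sketch of the fix on one Solaris SHC member.
# SPLUNK_HOME=/opt/splunk is an assumption; adjust for your install.

# Print PIDs of zombie splunkd processes; reads `ps -eo pid,s,comm`
# style output from stdin (the state letter Z marks a zombie).
zombie_splunkd_pids() {
    awk '$2 == "Z" && $3 ~ /splunkd/ {print $1}'
}

SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunk}

if [ "${1:-}" = "--run" ]; then            # guard so sourcing is side-effect free
    "$SPLUNK_HOME/bin/splunk" stop         # 1. stop the SHC member
    for pid in $(ps -eo pid,s,comm | zombie_splunkd_pids); do
        preap "$pid"                       # 2. force the parent to reap it
    done
    "$SPLUNK_HOME/bin/splunk" start        # 3. start Splunk again
fi
```

preap is Solaris-specific; on other platforms a zombie can only be cleared by its parent (or by killing the parent so init reaps it).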


Getting the same issue.
Restarting may fix it, but can someone from the Splunk team help us understand the root cause, so that we can avoid this in the future?


Motivator

+1 to @rbal[Splunk]'s answer. We have seen jobs from 10 days ago still in the list at Running 0%. The unfortunate part is that the scheduler eventually skips the next occurrences (at least it did in my case).

Do a rolling restart (e.g. `splunk rolling-restart shcluster-members` from the captain) to clear the stuck jobs and start your day fresh 🙂

Thanks,
Raghav

