Splunk Enterprise

Skipped searches - Searchable rolling restart or upgrade is in progress

arielpconsolaci
Path Finder

Hi Splunkers,

Good day. I am experiencing an issue in our cluster where all searches are being skipped with the reason "Searchable rolling restart or upgrade is in progress".

My understanding is that enabling searchable rolling restart on the Cluster Manager (indexer cluster) minimizes the impact on running searches during a bundle push. However, in my case all the searches are skipped regardless.

Seeking advice.

The search head cluster and the indexer cluster all run the same Splunk version, 8.0.2.

Thank you in advance.

1 Solution

gjanders
SplunkTrust

This is working as designed, unfortunately.

You may wish to vote for the idea at https://ideas.splunk.com/ideas/EID-I-12 "Splunk indexing tier searchable rolling restart should allow the scheduler to run jobs as expected".

Here's a copy and paste of the idea:

The title may sound counter-intuitive if you are not familiar with this feature. The current implementation (8.0.1, 7.3.4) of the searchable rolling restart feature at the indexing tier results in the scheduler at the search head level pausing all scheduled jobs until the rolling restart or rolling upgrade completes.

Personally, I would have preferred the current feature be called "searchable ad-hoc only rolling restart". As per https://docs.splunk.com/Documentation/Splunk/latest/Indexer/Userollingrestart#Disable_deferred_sched... there are settings such as:

defer_scheduled_searchable_idxc

However, this setting is also slightly unclear. What it can do is allow continuously scheduled saved searches to run during a rolling restart; it has no effect on real-time scheduled searches.

For those not familiar with the difference, real-time scheduling covers all alerts and most reports, excluding those that are designed to summarize data for summary indexes.

Therefore, the feature in its current form allows only ad-hoc searching while the indexer cluster is restarting or undergoing an upgrade.
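
As a rough illustration of the behaviour described above, here is a minimal Python sketch that reads a savedsearches.conf-style snippet and predicts what happens to each saved search during a searchable rolling restart. The stanza names and sample content are hypothetical, Python's configparser is only an approximation of how Splunk reads .conf files, and the assumed defaults (realtime_schedule = 1, defer_scheduled_searchable_idxc = true) should be checked against the savedsearches.conf spec for your version. It models the outcome described in this post, not Splunk's actual scheduler logic.

```python
import configparser

# Hypothetical savedsearches.conf content; the stanza names are made up.
SAMPLE_CONF = """
[default]
defer_scheduled_searchable_idxc = false

[Hypothetical alert]
enableSched = 1
cron_schedule = */5 * * * *

[Hypothetical summary report]
enableSched = 1
cron_schedule = 0 * * * *
realtime_schedule = 0

[Hypothetical unscheduled report]
search = index=_internal | stats count
"""


def predict_outcomes(conf_text):
    """Map each stanza to its expected fate during a searchable rolling restart."""
    # Treat [default] as the fallback for every other stanza.
    parser = configparser.ConfigParser(default_section="default", interpolation=None)
    parser.optionxform = str  # keep key case such as enableSched
    parser.read_string(conf_text)

    outcomes = {}
    for name in parser.sections():
        scheduled = parser.getboolean(name, "enableSched", fallback=False)
        # Assumed defaults: real-time scheduling on, deferral on.
        realtime = parser.getboolean(name, "realtime_schedule", fallback=True)
        defer = parser.getboolean(name, "defer_scheduled_searchable_idxc", fallback=True)
        if not scheduled:
            outcomes[name] = "not scheduled (ad-hoc runs are unaffected)"
        elif realtime:
            outcomes[name] = "skipped"   # real-time scheduled: the setting does not help
        elif defer:
            outcomes[name] = "deferred"  # continuous, deferral left enabled
        else:
            outcomes[name] = "runs"      # continuous, deferral disabled
    return outcomes


if __name__ == "__main__":
    for name, outcome in predict_outcomes(SAMPLE_CONF).items():
        print(f"{name}: {outcome}")
```

In this sample only the continuously scheduled summary report keeps running, because deferral is disabled in [default]; the alert uses real-time scheduling and is skipped either way.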

 

 

The idea is that the feature should allow the scheduler to continue running searches reliably during the rolling restart. This would mean that searchable rolling restart is truly searchable for all search types, not just ad-hoc searches.

For anyone unfamiliar with continuous and real-time scheduling, refer to https://docs.splunk.com/Documentation/Splunk/latest/Report/Configurethepriorityofscheduledreports


arielpconsolaci
Path Finder

This explains it. Thanks for this!


isoutamo
SplunkTrust

How big is your indexer cluster? Are there enough nodes to do this without disruption?

arielpconsolaci
Path Finder

Hi @isoutamo, thank you for your response. We have 6 search heads and 12 indexers, which should be enough to avoid such an issue.

isoutamo
SplunkTrust

And are those in a single-site cluster? You have probably read this: https://docs.splunk.com/Documentation/Splunk/8.0.2/Indexer/Userollingrestart#Best_practices_for_sear...
If those settings are wrong, the result could be a stuck restart or, as in your case, skipped searches.

Does this work when you do a rolling restart from the command line or GUI, without initiating it through apply cluster bundle? If yes, then you should check that the configurations needed for a searchable rolling restart by default are also in place for apply (I haven't tried this with apply, and I don't have a big enough cluster to test it on right now).

There could also be a situation where not all buckets meet the search factor (SF) across the cluster (e.g. some buckets on some nodes have already been frozen, but their replicas still exist on one node). In that kind of situation a searchable rolling restart cannot be done.
r. Ismo

arielpconsolaci
Path Finder

I appreciate your response, @isoutamo. Yes, I've gone through that document, and our cluster follows the best practices, as shown below.

[clustering]
restart_timeout = 600
rolling_restart = searchable_force
decommission_force_timeout = 180

The cluster is a multi-site cluster.

I am running the indexer bundle push (apply) from the backend, and based on the logs in the UI I've observed that the searchable rolling restart is running.

isoutamo
SplunkTrust

How many buckets do you have in your cluster? Is 180s long enough to make all buckets searchable on other nodes when one node goes down?

arielpconsolaci
Path Finder

Thanks once again, @isoutamo, for answering my queries. We have 30,000 buckets per indexer (slave/member) in the cluster.

How do I check the below?


@isoutamo wrote:

How many buckets do you have in your cluster? Is 180s long enough to make all buckets searchable on other nodes when one node goes down?

isoutamo
SplunkTrust

That's basically not that many buckets (assuming your storage delivers at least 800 IOPS).

I don't have an indexer cluster at hand right now to check the exact strings for the actual service start time, but you can estimate the downtime with this query:

index=_internal (component=ServerConfig "My GUID") OR ( component=IndexProcessor "request state change from=RUN to=SHUTDOWN_SIGNALED")
| transaction host startswith="request state change from=RUN to=SHUTDOWN_SIGNALED" endswith="My GUID"
| eval time = tostring(duration,"duration")
| table _time host duration time

It starts counting at the shutdown signal and stops when splunkd starts again. In a real cluster, an indexer does a lot of work after startup before it is ready for service. You can check those events on an individual indexer or on the CM to get the actual time.
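
To make the pairing logic of the query above concrete, here is a small Python sketch of what the transaction step is approximating: for each host it matches a shutdown event with the next startup event and reports the gap. The event tuples and the labels "shutdown" and "startup" are hypothetical stand-ins for the two log messages matched in the search, and this is not how Splunk implements transaction.

```python
from datetime import datetime


def estimate_downtime(events):
    """Pair each host's shutdown with its next startup; return (host, seconds)."""
    pending = {}    # host -> time of the shutdown still waiting for a startup
    downtimes = []
    # Process in time order so every startup closes the most recent shutdown.
    for ts, host, kind in sorted(events):
        if kind == "shutdown":
            pending[host] = ts
        elif kind == "startup" and host in pending:
            stopped = pending.pop(host)
            downtimes.append((host, (ts - stopped).total_seconds()))
    return downtimes


# Hypothetical events: (timestamp, host, kind)
SAMPLE_EVENTS = [
    (datetime(2020, 5, 1, 10, 0, 0), "idx1", "shutdown"),
    (datetime(2020, 5, 1, 10, 2, 30), "idx1", "startup"),
    (datetime(2020, 5, 1, 10, 3, 0), "idx2", "shutdown"),
    (datetime(2020, 5, 1, 10, 5, 10), "idx2", "startup"),
]

if __name__ == "__main__":
    for host, seconds in estimate_downtime(SAMPLE_EVENTS):
        print(f"{host}: down for about {seconds:.0f}s")
```

With the sample events this reports 150 seconds for idx1 and 130 seconds for idx2. As noted, such a figure only covers the gap between shutdown and splunkd starting, not the time until the indexer is fully back in service.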

r. Ismo 
