Hi Splunkers,
Good day. I am experiencing an issue in our cluster where all searches are being skipped with the reason "Searchable rolling restart or upgrade is in progress".
My understanding is that enabling searchable rolling restart on the Cluster Manager (indexer cluster) during a bundle push minimizes the impact on running searches. In my case, however, all searches are skipped regardless.
Seeking advice.
The SH cluster and the indexer cluster are all on the same Splunk version, 8.0.2.
Thank you in advance.
This is working as designed, unfortunately.
You may wish to vote for https://ideas.splunk.com/ideas/EID-I-12 "Splunk indexing tier searchable rolling restart should allow the scheduler to run jobs as expected".
Here's a copy and paste of the idea: The title may sound counter-intuitive if you are not familiar with this feature. The current implementation (8.0.1, 7.3.4) of the searchable rolling restart feature at the indexing tier results in the scheduler at the search head level pausing all scheduled jobs until the rolling restart or rolling upgrade completes.
Personally I would have preferred the current feature be called "searchable ad-hoc-only rolling restart". As per https://docs.splunk.com/Documentation/Splunk/latest/Indexer/Userollingrestart#Disable_deferred_sched... there are settings such as:
defer_scheduled_searchable_idxc
However, this setting is also slightly unclear. What it can do is allow continuously scheduled saved searches to run during a rolling restart; it has no effect on real-time scheduled searches.
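To make that setting concrete, here is a minimal sketch of how it might be applied, assuming (per the linked docs) that defer_scheduled_searchable_idxc is a per-search setting on the search head side that defaults to true; the stanza name below is hypothetical:

```ini
# savedsearches.conf -- a sketch, assuming this setting is per saved
# search and defaults to true (deferring the search during the restart)
[my_continuous_summary_search]
# hypothetical search name; setting this to false allows this
# continuously scheduled search to keep running during a searchable
# rolling restart. It has no effect on real-time scheduled searches.
defer_scheduled_searchable_idxc = false
```

Note that this only helps searches that use continuous scheduling; real-time scheduled searches (most alerts and reports) are still deferred.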
For those not familiar with the difference: real-time scheduling covers all alerts and most reports, excluding those designed to summarize data into summary indexes.
Therefore the feature in its current form allows ad-hoc-only searching while the indexer cluster is restarting or undergoing an upgrade.
This idea is that the feature should allow the scheduler to continue to run searches reliably during the rolling restart. That would make searchable rolling restart truly searchable for all search types, not just ad-hoc searches.
For anyone unfamiliar with continuous and real time scheduling refer to https://docs.splunk.com/Documentation/Splunk/latest/Report/Configurethepriorityofscheduledreports
This explains it. Thanks for this!
Hi @isoutamo, thank you for your response. We have 6 search heads and 12 indexers, which should be enough to avoid such an issue.
Appreciate your response @isoutamo. Yes, I've gone through that document, and the best practice is applied in our cluster as shown below.
[clustering]
restart_timeout = 600
rolling_restart = searchable_force
decommission_force_timeout = 180
The cluster is in a multi-site cluster.
I am running the indexer apply-bundle push from the backend, and based on the logs in the UI I've observed that the searchable rolling restart is running.
How many buckets do you have in your cluster? Is 180s long enough for all buckets to become searchable on other nodes when one node goes down?
Thanks once again @isoutamo for answering my queries. We have 30,000 buckets per indexer (slave/member) in the cluster.
How do I check the below?
@isoutamo wrote: How many buckets do you have in your cluster? Is 180s long enough for all buckets to become searchable on other nodes when one node goes down?
Basically that's not so many buckets (assuming your storage has at least 800 IOPS).
I don't have an indexer cluster at hand right now to check the exact strings for the actual service start time, but you can get the estimated downtime with this query.
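If you want to verify the bucket count per indexer yourself, one possible sketch (not the only way) is the dbinspect command run from a search head; the field name bucketId is assumed from dbinspect's output:

```
| dbinspect index=*
| stats dc(bucketId) AS buckets BY splunk_server
```

This counts distinct buckets per peer across all indexes you can see; compare the result against what the Cluster Manager's UI reports.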
index=_internal (component=ServerConfig "My GUID") OR (component=IndexProcessor "request state change from=RUN to=SHUTDOWN_SIGNALED")
| transaction host startswith="request state change from=RUN to=SHUTDOWN_SIGNALED" endswith="My GUID"
| eval duration = tostring(duration, "duration")
| table _time host duration
It starts counting at the shutdown signal and ends when splunkd starts again. In a real cluster an indexer does a lot of work before it is ready for service; you can check that on an individual indexer or the CM to get the actual time.
r. Ismo