
Why is a search head in a search head cluster still taking searches while in manual detention?

BlueSocket
Communicator

I need to upgrade a Search Head Cluster from 7.3.4 to 8.1.9 and I have run the first two commands:

splunk upgrade-init shcluster-members

splunk edit shcluster-config -manual_detention on

We are monitoring the active searches using the following command:

splunk list shcluster-member-info | grep "active"

And we see:

active_historical_search_count:1
active_realtime_search_count:0
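
For anyone who wants to script this check instead of eyeballing the grep output, here is a minimal Python sketch. It assumes the counters appear as `key:value` lines exactly like the two above; `parse_active_counts` is a hypothetical helper name, not something Splunk ships.

```python
def parse_active_counts(cli_output: str) -> dict:
    """Extract active_* counters from `splunk list shcluster-member-info` output.

    Assumes each counter sits on its own line as `key:value`,
    as in the output shown above.
    """
    counts = {}
    for line in cli_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.startswith("active_") and value.strip().isdigit():
            counts[key] = int(value.strip())
    return counts


sample = """active_historical_search_count:1
active_realtime_search_count:0"""

counts = parse_active_counts(sample)
print(counts)
# Only treat the member as idle if counters were found and all are 0.
print("idle:", bool(counts) and all(v == 0 for v in counts.values()))
```

The `bool(counts)` guard is there so that empty or unexpected output is not mistaken for an idle member.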

At first the active_historical_search_count never seemed to drop to 0, but after about 90 minutes it did come down to 0. We checked the currently running searches and found some new searches running on the member in detention after 1 hour. We have the following set in server.conf:

decommission_force_finish_idle_time = 0

decommission_node_force_timeout = 300

decommission_search_jobs_wait_secs = 180

...so why does it take 90 minutes to stop running saved searches?

We did find some saved searches that were running for a long time and fixed them, but shouldn't all new searches be sent to another member once this one is in manual detention? What can I do to fix this so that my SHC can be upgraded?


isoutamo
SplunkTrust

Hi

Based on the documentation, those decommission_* parameters are valid only on search peers (indexers) or the cluster master, and only when you run the "splunk offline" command.

When you put an SHC member into manual detention, it just waits for the running searches to finish.

  • On a search head that is in manual detention but not a part of a searchable rolling restart. These searches will run to completion.
  • On a search head that is a part of a rolling upgrade. During rolling upgrade of a search head cluster, you can put a single search head into manual detention and wait for the existing search jobs to run to completion before you shut down the search head.
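
The "wait for the existing search jobs to run to completion" step can be sketched as a polling loop with an upper bound. This is only an illustration: `wait_until_drained` and the `get_counts` callback are hypothetical names, and the callback is assumed to return the active_* counters of the member as a dict (for example, parsed from `splunk list shcluster-member-info`).

```python
import time
from typing import Callable, Dict


def wait_until_drained(
    get_counts: Callable[[], Dict[str, int]],
    timeout_s: float = 3600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every counter is 0; return False if timeout_s passes first.

    get_counts is a hypothetical callback returning e.g.
    {"active_historical_search_count": 1, "active_realtime_search_count": 0}.
    """
    waited = 0.0
    while True:
        counts = get_counts()
        if counts and all(v == 0 for v in counts.values()):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Demo with canned readings instead of a real cluster member.
readings = iter([{"active_historical_search_count": n} for n in (2, 1, 0)])
print(wait_until_drained(lambda: next(readings), sleep=lambda s: None))
```

If the loop returns False, that is the point where you would decide whether to cancel the remaining jobs by hand or simply shut the member down.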

I'm not sure whether there is any way to gracefully force those searches to finish on an SHC member. I usually just wait for some time and then cancel the remaining jobs if needed.

I agree with you that a member in detention shouldn't accept any new searches from the scheduler or from users. But maybe there is an error in the documentation, and it actually means that the node doesn't accept any new user sessions?

r. Ismo


alexanderl
Engager

Hi!

I have the same issue. The "active_historical_search_count" does not go to 0.

I don't see any running searches under "Jobs" in the GUI.

"I usually just wait some time and after that cancel those jobs if needed"

How do you find and cancel the jobs?

Regards Alex


vgrote
Path Finder

And another not so happy user here.

The documentation clearly states "When a search head cluster member is in manual detention, it stops accepting all new searches from the search scheduler or from users. Existing ad-hoc and scheduled search jobs run to completion. New scheduled searches are distributed by the captain to search head cluster members that are up and not in detention."

As expected, an interactive search is refused.

Yet when I monitor the active_historical_search_count of a member in detention, I observe the count going up and down.

When I look at the Job Manager screen, I see lots of newly created jobs.

Either I misunderstood the detention feature, or the documentation is off the mark, or there is a bug.

What is it?


isoutamo
SplunkTrust
You should ask the docs team for clarification. Just leave a comment on that documentation page and they will get back to you later.

isoutamo
SplunkTrust
The easiest way is to shut down the node and start the upgrade. Another option is to check which jobs are running and then cancel them one by one from the GUI.
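
If the GUI is too slow for this, the same thing can be done against the REST API on the management port. The sketch below is hedged: it assumes the job listing comes from `GET /services/search/jobs?output_mode=json`, that each entry carries `content.sid` and `content.dispatchState`, and that you authenticate with a bearer token. The helper names, the sample sids, and the base URL are made up for illustration. It only builds the cancel request and does not send it.

```python
import json
import urllib.parse
import urllib.request

# Dispatch states treated here as "job has not finished yet".
UNFINISHED = {"QUEUED", "PARSING", "RUNNING", "PAUSED", "FINALIZING"}


def unfinished_sids(jobs_json: str) -> list:
    """Return sids of jobs that are not done, from a search/jobs JSON listing."""
    entries = json.loads(jobs_json).get("entry", [])
    return [
        e["content"]["sid"]
        for e in entries
        if e.get("content", {}).get("dispatchState") in UNFINISHED
    ]


def cancel_request(base_url: str, sid: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that cancels one job."""
    quoted = urllib.parse.quote(sid, safe="")
    url = f"{base_url}/services/search/jobs/{quoted}/control"
    body = urllib.parse.urlencode({"action": "cancel"}).encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Made-up listing standing in for the real REST response.
sample = json.dumps({"entry": [
    {"content": {"sid": "scheduler__admin__search__A", "dispatchState": "RUNNING"}},
    {"content": {"sid": "1700000000.1", "dispatchState": "DONE"}},
]})

for sid in unfinished_sids(sample):
    req = cancel_request("https://localhost:8089", sid, token="<your-token>")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send the cancel.
```

Keeping the filtering and the request building separate from the network call makes it easy to dry-run the list of sids before cancelling anything.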