Effects of a kvstore resync on a search head cluster

emallinger
Communicator

Hello everyone,

I have a KV store down on a 4-member search head cluster.

I'd like to resync it without disrupting user activity too much.

I've got 2 options:

- Put the instance in detention, stop it, clean the store, start it, and take it out of detention.

=> But I'm not sure how the load balancer in front of the cluster will handle new requests.

- Log in to the KV store captain and resync all => Does that action launch a rolling restart? Does it affect performance? I couldn't find any information on that.

I mostly used: https://docs.splunk.com/Documentation/Splunk/latest/Admin/ResyncKVstore
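
The two options above correspond roughly to the CLI calls sketched below. This is a sketch, not a tested procedure: the `/opt/splunk` default for `SPLUNK_HOME` and the `DRY_RUN` switch are assumptions added for illustration, and the script only prints the commands unless `DRY_RUN=0` is set. The resync documentation linked above specifies which member each command should be run on.

```shell
# Sketch only. By default (DRY_RUN=1) the commands are printed, not executed.
# Assumptions: SPLUNK_HOME defaults to /opt/splunk; DRY_RUN is an illustrative switch.
SPLUNK="${SPLUNK_HOME:-/opt/splunk}/bin/splunk"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"   # dry run: show the command
  else
    "$@"                 # real run: execute it
  fi
}

# Option 1: detention, stop, clean the local KV store, start, leave detention.
# Intended for the member whose KV store is down.
clean_and_rejoin() {
  run "$SPLUNK" edit shcluster-config -manual_detention on &&
  run "$SPLUNK" stop &&
  run "$SPLUNK" clean kvstore --local &&
  run "$SPLUNK" start &&
  run "$SPLUNK" edit shcluster-config -manual_detention off
}

# Option 2: trigger a KV store resync.
resync() {
  run "$SPLUNK" resync kvstore
}

clean_and_rejoin
```

Some of these commands may prompt for credentials or confirmation, so a real run (`DRY_RUN=0`) is likely to be interactive unless the sketch is adapted.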

Thanks in advance,

Ema


emallinger
Communicator

Hi again,

 

You can back up one collection at a time. And yes, you have to unschedule all the reports that fill your KV store collections. That can be long and painful.

I only had 2 collections, so it was not a stretch in my case.

I had to migrate the KV store from one SH to another, and I used a backup file. I never tried to sync 2 instances, sorry.

 

I suggest you try to back up the KV store in its current state and then try to restore it on a test SH.

If this works => clean the KV store on both of your SHs and restore there.

If not, I'm afraid I would start over: clean everything and then painfully refill your collections with data.

If you still have the underlying data, you can recompute the collections, working through past periods quickly even if the whole job takes a while. There will definitely be a small service interruption for these collections, but in the end you'll come out ahead.

 

And for later, once you have your KV store back up and running, I suggest you add a scheduled backup task (daily, for example).

 

On another matter: why is your KV store failing to start? There should be some insight in `/opt/splunk/var/log/splunk/mongod.log`.

Maybe the mongo certificate just needs to be renewed (probably too easy, but who knows...).
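
A quick way to follow that suggestion is to filter the log for certificate and error messages. The sketch below assumes the default path mentioned above; the `MONGOD_LOG` variable, the function name, and the keyword list are illustrative choices, not an official diagnostic.

```shell
# Assumption: default install path; override MONGOD_LOG for your host.
MONGOD_LOG="${MONGOD_LOG:-/opt/splunk/var/log/splunk/mongod.log}"

# Print the last 20 lines of the given log that mention SSL/TLS,
# certificates, errors, or assertions (case-insensitive).
recent_kvstore_errors() {
  log="$1"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  grep -iE 'ssl|tls|certificate|error|assert' "$log" | tail -n 20
}

recent_kvstore_errors "$MONGOD_LOG" || true
```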

 

Regards,

Ema

Abass42
Path Finder

Over 3 years later, and I am wondering the same thing. I have two individual SHs with a faulty KV store, and I wanted to see the impact of running the `splunk clean kvstore --local` command.

 

Hope we get an answer within the next 3 years 🙂


emallinger
Communicator

Hi,

I recently had to run clean --local and resync, without much success, because the KV store members had trouble recognizing that they were in the same cluster...

Some configuration is very persistent when you move an SH from one SHC to another.

My mistake on that.

 

In the end, we:

- backed up the correct KV store

- cleaned the KV store on all instances

- restored the KV store and resynced

=> good as new!
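
The three steps above might look like this on the command line. This is a sketch under assumptions: the archive name `kvstore_good` is hypothetical, the post does not say which `clean kvstore` flag was used on each instance (`--local` is assumed here), and by default the script only prints the commands (`DRY_RUN=1`).

```shell
# Sketch only. Assumptions: SPLUNK_HOME default, hypothetical archive name,
# and an illustrative DRY_RUN switch that prints instead of executing.
SPLUNK="${SPLUNK_HOME:-/opt/splunk}/bin/splunk"
ARCHIVE="${ARCHIVE:-kvstore_good}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1, on the member that holds the correct data.
backup_good() {
  run "$SPLUNK" backup kvstore -archiveName "$ARCHIVE"
}

# Step 2, on every instance: wipe the local KV store while Splunk is stopped.
clean_instance() {
  run "$SPLUNK" stop &&
  run "$SPLUNK" clean kvstore --local &&
  run "$SPLUNK" start
}

# Step 3: restore the archive, then resync.
restore_and_resync() {
  run "$SPLUNK" restore kvstore -archiveName "$ARCHIVE" &&
  run "$SPLUNK" resync kvstore
}

backup_good
clean_instance
restore_and_resync
```

The archive has to be available on the instance where the restore runs, and the functions are meant to be called on the appropriate hosts rather than all on one machine as the dry run does.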

 

Ema

Abass42
Path Finder

Thank you for the response. The more I read about the backups, the more worried I get. 

 

Did you back up the entire KV store or just a specific collection? I would like to clean the entire store on two servers. 

 

In the Splunk Docs, for the KV store backup, it says

  1. Ensure that the backupRestoreStatus field and the status field are both in the ready state.



Our backupRestoreStatus is ready, but the status on each of the failing servers is starting.

[Screenshot: Abass42_1-1702572024550.png]

Both of the servers look like that. I think you resync'd the entire cluster. Have you had to resync individual servers? And I do not want to move to any other SHC, these two are staying where they are. 
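
The readiness check quoted from the docs could be scripted along these lines. The sketch assumes `splunk show kvstore-status` prints `name : value` lines containing `backupRestoreStatus` and `status`; the function name `kvstore_ready` is made up for illustration, and the parsing may need adjusting to the actual output of your version.

```shell
# Reads KV store status text on stdin and exits 0 only if both
# backupRestoreStatus and status are "ready" (case-insensitive).
kvstore_ready() {
  awk -F' *: *' '
    { key = $1; sub(/^[ \t]+/, "", key) }          # drop leading indentation
    key == "backupRestoreStatus" { backup = tolower($2) }
    key == "status"              { state  = tolower($2) }
    END { exit !(backup == "ready" && state == "ready") }
  '
}

# Example with the state described above: backup ready, status starting.
if printf 'backupRestoreStatus : Ready\nstatus : starting\n' | kvstore_ready; then
  echo "ok to back up"
else
  echo "not ready for backup"
fi
```

On a real server the input would come from something like `"$SPLUNK_HOME/bin/splunk" show kvstore-status | kvstore_ready`.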

 

The Docs also mentioned:
If you are running any searches that use outputlookup with the default append=f parameter, end them or allow them to complete before taking a backup, or the backup fails

 

Is there a search that you ran to get all searches that are using outputlookup? I don't believe we have many, but we may. 
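
One possible starting point is to list saved searches whose SPL mentions `outputlookup`, using the `rest` search command. The sketch below only prints the search text; the function name is made up, and the query covers saved searches only, so ad hoc searches or lookups written through macros would not show up.

```shell
# Prints an SPL query that lists saved searches whose text contains "outputlookup".
outputlookup_spl() {
  cat <<'EOF'
| rest /servicesNS/-/-/saved/searches splunk_server=local
| search search="*outputlookup*"
| table title eai:acl.app eai:acl.owner is_scheduled cron_schedule search
EOF
}

outputlookup_spl
```

The printed query can be pasted into the search bar, or it should be possible to pass it to the CLI with `"$SPLUNK_HOME/bin/splunk" search "$(outputlookup_spl)"`.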

 

I'm trying to back up everything correctly, since I am not entirely sure to what extent the KV store is affecting all of the searches/reports. 

 

Thank you for your response, I wasn't expecting one. I appreciate the help. 
