Solved: Swap indexers from indexer cluster with new peers

krusty · ‎07-22-2020

Hi there,

Our current Splunk installation consists of an indexer cluster with 2 nodes and 1 search head, which also holds the cluster master, license master, and deployment server roles.

I have now added two new indexer peers to the existing cluster, so for the moment we have 4 indexer nodes in production. The goal is to end up with only the 2 new indexers up and running.

I have read in another thread that already-indexed data cannot be replicated to the new servers, so I would need to wait until the retention period is reached. So far no problem, but I have a few questions about the current setup:

- Do I have to change the Replication Factor / Search Factor? The current setting is RF=2 / SF=2

- We created a couple of server classes on our deployment server. Each server class has its own outputs.conf file, in which I defined the tcpout target as only the old indexers. Do I have to change that to the new indexers so that all new data goes directly to them?

- Are there any other configuration files to modify to make sure new data will only go to the new indexers?

- On the cluster overview web page I see that all 4 indexers are searchable. Will the Splunk search head automatically detect where to search for events?

I couldn't find much information about this situation in the Splunk documentation, so any information would be helpful.

Thanks.

isoutamo · ‎07-22-2020

Hi

There is no need to wait for the data retention period. You could do the following: first, change the outputs.conf files to point to the new peers and remove the old ones. Then set the old peers to detention mode. After that you can remove the old nodes one by one, following the instructions in the docs (something like "remove a peer permanently from the cluster"). You must wait until the first one is removed before continuing with the second one.
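
As a rough illustration of the outputs.conf change, here is a minimal sketch of a shell helper that prints a tcpout configuration listing only the new peers. The function name render_outputs_conf, the group name, the host names and port 9997 are all placeholders assumed for this example, not values from this thread.

```shell
# Print an outputs.conf that sends data only to the given indexers.
# render_outputs_conf is a hypothetical helper; group name and servers are placeholders.
render_outputs_conf() {
  group="$1"; shift
  # Join the remaining arguments (host:port pairs) with commas.
  servers=$(IFS=,; echo "$*")
  cat <<EOF
[tcpout]
defaultGroup = ${group}

[tcpout:${group}]
server = ${servers}
EOF
}
```

It could be used like this: render_outputs_conf new_indexers idx3.example.com:9997 idx4.example.com:9997 > outputs.conf, once per deployment app that carries an outputs.conf. After the apps are updated, the deployment server still has to push them out (for example with splunk reload deploy-server).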

SF and RF are OK, as the final setup has only two peers.

As you are using a cluster, there is no need to tell the SH that there are new peers.

If I recall correctly, there are no other files that need to be modified. But if you have added server names to your SPL queries etc., then you must update those.
r. Ismo

ivanreis · ‎07-22-2020

Hi @krusty, just adding more information:

The new indexers will not have the historical data, so if you need to query it, my suggestion is to keep the old indexers until the data retention period expires or you no longer need the historical data.

If you have any heavy forwarders sending data to the indexer cluster, you also have to update their outputs.conf with the new indexer peers.

For further information, please check this document: https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/Addclusterpeer

isoutamo · ‎07-22-2020

Hi

One of the cluster's basic features is rebalancing buckets (on request and/or when e.g. nodes are restarted). This is how you could and should transfer old buckets from the old nodes to the new ones (see "Rebalance indexer cluster primary bucket copies" in the docs). This ensures that all nodes have approximately the same number of searchable buckets. After you have rebalanced the buckets, you should take the old peers away as described in "Take a peer down permanently: the enforce-counts offline command", which automatically "moves" the remaining copies to another peer. After all primary copies have been transferred to another peer, the old peer goes down and you can decommission it. Then you should remove the second old peer.

Also, you must put the old peers into detention mode, otherwise they will keep receiving new data (at least replicas). See "Put a peer into detention" in the docs.

r. Ismo

krusty · ‎08-03-2020

Hi soutamo,

thanks for your answers and links to the docs.
I have read the docs but I do not fully understand how it will work. Maybe you can help me again. 😉
This is the first time I have to do such an operation on our production systems, so I would like to be sure that I do not make any mistakes.
This is what I understood from the docs:

Stop new external data input
Verify that all forwarders send data only to the new peers, so that only internal data will be indexed for the moment.
Is there a way to check that incoming data no longer goes to the "old" indexers? I checked the different menus of the Monitoring Console but could not find any usable report, so I tried searching with this SPL:

index=* source=* splunk_server="<old indexer1>" OR splunk_server="<old indexer2>" | stats count by host

Set "old" indexer peers in detention mode
To stop indexing new internal and external data I have to switch the old indexers into detention mode.
Command to run from the command line of the "old" peer:

splunk edit cluster-config -auth <username>:<password> -manual_detention on

Initiate data rebalancing for all existing indexes by entering this command

splunk rebalance cluster-data -action start -searchable true

Will the master automatically detect which indexes and data need to be replicated?

Remove a peer from the master's list

splunk remove cluster-peers -peers <guid>,<guid>

I read that the peer has to be "down" or "gracefulshutdown" before I can remove it. Will it be enough to take the node offline?

How can I check, before removing the peer from the cluster, that all buckets have been replicated to the "new" indexers? Do you have a search for checking this?

I hope I have not missed anything and that the planned procedure is correct. As I said, this is the first time I have to do such a task.

Many thanks.

isoutamo · ‎08-03-2020

Hi

No problem 😉 It's better to check things before doing anything fatal in production. For that reason it's good to have a test/lab environment where this kind of thing can be tested. Basically you could do it with trial licenses. The only issue is that you cannot use an LM with trial licenses, but you could use local licenses on all nodes 😉

Then to your questions.

Even if you are sending all data to the new indexers, they will replicate it to the old ones unless you have put the old ones into detention mode.

You probably need to check from the servers' DB directories or (better) query _internal to confirm that there is no replication to the old peers. So first put them into detention, then check that replication has ended. Maybe you must roll hot buckets to warm to ensure that all replication to the old peers has stopped?

index=_internal earliest=-10m component=BucketReplicator

Select a reasonable time period based on your activity.

As I said, detention mode is needed to stop replication to the old nodes (both internal and external data).

Then start rebalancing (this could take a long time, depending on your amount of data, the size of your nodes, disk IOPS and the connections between nodes).

When rebalancing is done, remove one node at a time and wait until it has been removed successfully. Then check that everything is as it should be, and continue with the next node (https://docs.splunk.com/Documentation/Splunk/8.0.5/Indexer/Takeapeeroffline#Take_a_peer_down_permane...).

splunk offline --enforce-counts

This command reports what it has done once it is finished. You can also check the situation from the MC and/or CM and clean up excess buckets.
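
For the CM-side check, a minimal sketch of what this could look like from the command line of the cluster master is shown below. The helper name check_cluster_after_offline and the SPLUNK_BIN variable are assumptions made for this example; the two splunk subcommands are the standard ones for showing cluster status and removing excess bucket copies, and both will ask for credentials if you are not logged in.

```shell
# Run on the cluster master after `splunk offline --enforce-counts` has finished on an old peer.
# check_cluster_after_offline and SPLUNK_BIN are hypothetical names for this sketch.
SPLUNK_BIN="${SPLUNK_BIN:-splunk}"

check_cluster_after_offline() {
  # Show peer states and whether the replication and search factors are met.
  "$SPLUNK_BIN" show cluster-status --verbose || return 1
  # Remove excess bucket copies across all indexes.
  "$SPLUNK_BIN" remove excess-buckets
}
```

Only continue with the next old peer once the status output shows that the replication factor and search factor are met again.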

Remove it from dashboards etc.

splunk remove cluster-peers -peers <guid>

Those steps should be enough. At least we have used them with success.
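
To keep the order of the steps straight, here is a small sketch that only prints the commands from this thread as a checklist, with a label for the host each one runs on. Nothing is executed. The function name print_swap_plan and the host labels are hypothetical, -auth is left out for brevity, and the rebalance status line is an addition assumed here for monitoring progress.

```shell
# Print the decommissioning steps in order, one line per command. Nothing is executed.
# print_swap_plan and the host labels passed to it are hypothetical names.
print_swap_plan() {
  master="$1"; shift          # label for the cluster master
  for peer in "$@"; do        # detention first, on every old peer
    echo "[$peer] splunk edit cluster-config -manual_detention on"
  done
  echo "[$master] splunk rebalance cluster-data -action start -searchable true"
  echo "[$master] splunk rebalance cluster-data -action status"
  for peer in "$@"; do        # then take down one old peer at a time
    echo "[$peer] splunk offline --enforce-counts"
    echo "[$master] splunk remove cluster-peers -peers <guid of $peer>"
  done
}
```

For example, print_swap_plan cm old-idx1 old-idx2 prints eight lines: two detention commands, the rebalance start and status commands, and an offline/remove pair for each old peer.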

r. Ismo

krusty · ‎08-24-2020

Hi soutamo,

After a while I was able to do all the steps, and the old indexers/peers are now removed from our environment. Everything works as expected.

Thanks a lot for your help.
