We are running indexer clustering on version 6.3.3.
We have 6 indexers.
We manage outputs.conf on our forwarders through an app called allforwarderoutputs, pushed from the deployment server. The app has outputs.conf defined as below:
[tcpout:group1]
server = server1:9997, server2:9997, server3:9997, server4:9997, server5:9997, server6:9997
autoLB = true
useACK = true
We have replaced the server1 indexer with server7. So I updated allforwarderoutputs/local/outputs.conf by replacing server1 with server7 and pushed it to all forwarders.
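For reference, the updated stanza would look something like this (assuming the same port and settings as before, with only the server1 entry swapped out):

```
[tcpout:group1]
server = server7:9997, server2:9997, server3:9997, server4:9997, server5:9997, server6:9997
autoLB = true
useACK = true
```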
1) Even though restartSplunkd = true is defined on the serverclass alldeploymentclients, the forwarders did not get recycled; I had to restart them all manually. (All forwarders did receive the updated outputs.conf.) Our serverclass is defined as below:
[serverClass:all_deployment_clients:app:all_forwarder_outputs]
restartSplunkWeb = false
restartSplunkd = true
stateOnClient = enabled

[serverClass:all_deployment_clients]
whitelist.0 = *
2) Some of the forwarders are still sending data to the old indexer server1. I have verified that those servers received the new outputs.conf and that no other copies of outputs.conf exist. I have recycled these forwarders several times. I verified that splunkd.log on the forwarder does not have a "Connected to idx=server1" message, but does have "Connected to idx=server7".
Can someone help me with this please?
Did you take your old indexer offline?
It looks like the old indexer is getting data replicated from the other nodes. When you want to replace an indexer with a new one, you need to execute the following command on the old indexer:
$SPLUNK_HOME/bin/splunk offline --enforce-counts
This command ensures that all the buckets are moved to other indexers before the node is taken offline.
The old indexer is still online and part of the cluster.
My thought was that, since it is part of the cluster, the data on it can still be searched and will eventually age out; at that point I would remove it from the cluster and take it offline.
No. If you keep it in the cluster, it will keep receiving new data, not directly from the forwarders but as replicated copies needed to meet the RF and SF.
To decommission an indexer node, we need to use the offline command with the enforce-counts parameter. This ensures no new copies get replicated to the node being decommissioned, and all existing copies on the node are moved to other indexers. Once its bucket count is zero, the Splunk instance is stopped.
So I need to,
run $SPLUNK_HOME/bin/splunk offline --enforce-counts on my old indexer.
run $SPLUNK_HOME/bin/splunk remove cluster-peers -peers <peer_guid> from the master to remove it from the master's list, as per http://docs.splunk.com/Documentation/Splunk/6.3.3/Indexer/Removepeerfrommasterlist
Is that correct?
Yes, you are right.
Execute the first command on the indexer. After that, check the status on the indexer master under Indexer Clustering. It will show "Decommissioning" for the old indexer. Once the status of the old indexer is either "GracefulShutDown" or "Down", execute the second command on the indexer master.
I think @hardikJsheth's answer is right. If you're done with an indexer, remove it, thereby causing the data RF and SF to be enforced on the hosts that remain.
BUT, if that doesn't fix it, then you're likely looking for conf files manually and overlooking something. For that, I recommend btool to see what the forwarder is truly loading. http://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/Usebtooltotroubleshootconfigurati...
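To check what the forwarder actually resolves, a btool run on the forwarder would look roughly like this (a sketch; the exact $SPLUNK_HOME path depends on your install):

```
# Show the merged outputs.conf settings as splunkd sees them,
# with --debug listing the file each setting came from
$SPLUNK_HOME/bin/splunk btool outputs list --debug
```

Any line still pointing at server1 will name the conf file that set it.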
I initiated $SPLUNK_HOME/bin/splunk offline --enforce-counts on old indexer.
Our master dashboard shows "Decommissioning" against the old server. It's been 8 hours now.
Bucket Status -> Fixup Tasks Pending shows a count of 78, and it's not decreasing. The status shows "Cannot fix search count as the bucket hasn't rolled yet."
It does take time to complete the decommissioning process, depending on the index volume and machine configuration.
What's the status of RF and SF? If the RF and SF are met and there are no ongoing fixup tasks, you can put the master node in maintenance mode and then restart the master node.
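The maintenance-mode step would look roughly like this, run on the master node (a sketch; these commands exist in this version, but verify against your environment first):

```
# Suspend bucket fixup activity while working on the cluster
$SPLUNK_HOME/bin/splunk enable maintenance-mode

# Restart the master
$SPLUNK_HOME/bin/splunk restart

# Re-enable normal fixup behavior afterwards
$SPLUNK_HOME/bin/splunk disable maintenance-mode
```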
I am assuming you are asking for the Search Factor and Replication Factor numbers from the Bucket Status -> Fixup Tasks Pending view.
Search Factor = 39
Replication Factor =39