Deployment Architecture

Why am I getting "Missing enough suitable candidates to create a replicated copy" building a multisite indexer cluster staging environment?

harrymclaren
Explorer

Hello,

I have built a Splunk testing / staging environment on top of 6 VMs. Splunk version is 6.2.3 and is running on CentOS 6.6.
I have 1 Search Head, 1 Utilities Server (Cluster Master / Deployment Server) and 4 Indexers (Multi Site Cluster).

The cluster has come up and is showing all hosts within the management console.

Search factor is set to 2 and replication factor is set to 3.
The only indexes are _audit and _internal. Searchable data is fine 2/2 but replicated data is 2/3.
Bucket status is showing 8x fixup tasks - pending. I have rebooted the cluster master and peer nodes (rolling restart).

If I view the bucket statuses I get:

Missing enough suitable candidates to create a replicated copy in order to meet replication policy. Missing={ site2:1 }

Any help or advice would be appreciated.

mleid
Engager

We received the same bucket status as you described on most of our pending fixup actions, both on search factor and replication factor, respectively.

Search factor:

Missing enough suitable candidates to create searchable copy in order to meet replication policy. Missing={ site3:1 }

Replication factor:

Missing enough suitable candidates to create a replicated copy in order to meet replication policy. Missing={ site3:1 }

In our case, we verified all our configurations were correct per the link posted by dxu and elsewhere in the documentation and then just had to wait it out. The status did not go away until our pending fixup actions got into the ~300 range (down from ~30k originally).

So in short, it resolved itself. This was on Splunk 7.3.
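
One configuration check that can be done by hand is whether each site named explicitly in the site replication policy has at least as many peers as the copies it is asked to hold, since a peer holds at most one copy of a given bucket. Below is a minimal sketch of that arithmetic. The function `check_site_policy` is a hypothetical helper, not part of Splunk, and the policy string and peer counts are made-up values rather than anything posted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Splunk command): given a policy written in the
# style of site_replication_factor and a list of peer counts per site, print
# "site:deficit" for every explicitly named site that has too few peers.
# Prints "ok" when every named site has enough peers.
check_site_policy() {
  local policy="$1" peers="$2" entry pair site want have short=""
  # Split "site1:2,site2:1,total:3" into individual "name:count" entries.
  for entry in ${policy//,/ }; do
    site="${entry%%:*}"
    want="${entry##*:}"
    # "origin" and "total" are not tied to one named site, so skip them.
    case "$site" in origin|total) continue ;; esac
    have=0
    # Look up how many peers this site has, e.g. "site1=4 site2=0".
    for pair in $peers; do
      if [ "${pair%%=*}" = "$site" ]; then
        have="${pair##*=}"
      fi
    done
    if [ "$have" -lt "$want" ]; then
      short="${short:+$short }$site:$((want - have))"
    fi
  done
  echo "${short:-ok}"
}

# Example with assumed numbers: site2 is asked for 1 copy but has no peers up.
check_site_policy "site1:2,site2:1,total:3" "site1=4 site2=0"
```

This only covers the peer-count side of the policy. It says nothing about peers that are up but unable to accept copies, or about fixup work that is simply still in progress.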

louismai
Path Finder

I got a similar issue with 7.3.3 when migrating from single-site to multisite. I found that reducing replication_factor and search_factor helped the process go faster.

kamal_jagga
Contributor

Is there any solution for this issue in production for a single-site clustered environment? Kindly advise.

harrymclaren
Explorer

As it's a new build I just cleaned out the indexes and now the issue has gone away:
splunk stop
splunk clean eventdata -index _internal -f
splunk clean eventdata -index _audit -f
splunk start

woodcock
Esteemed Legend

ATTENTION!!!  WARNING!!!  This answer is not really a solution: it is deleting all of the buckets and data so DO NOT USE THIS APPROACH IN PRODUCTION or on useful indexed data.

Eric_Mcknight
Explorer

This is a horrible idea. Deleting data is not the proper way to resolve SF/RF issues.

schose
Builder

I downvoted this post because it causes data loss.

effem
Communicator

I downvoted this post because I can't remove all data to fix my data issue.

sbattista09
Contributor

I downvoted this post because removing data to fix a data issue is not an answer.

nplamondon
SplunkTrust

I downvoted this post because this isn't a realistic answer and should very clearly state that all data in that index will be destroyed.

doe2013
Explorer

I downvoted this post because wiping all data doesn't solve the problem

fabiocaldas
Contributor

I downvoted this post because I can't remove all data to fix my data issue.

mikelanghorst
Motivator

Wiping all data isn't really a realistic answer.

harrymclaren
Explorer

The trigger condition is showing:
"Removing peer "
