My cluster has one issue with data durability; everything else seems fine. All indexers are online and running, and even the health checks return a reasonably good result. I noticed that one peer has 920 buckets and the other has 919. Is that the issue? What should I do?
After looking around in Splunk, I found the bucket.
Any ideas what to do with that?
Bucket-Status
_internal~1260~8D19E36A-C3DF-465D-9B7E-908324F333E5 | Action | _internal | does not meet: primacy & rf | 3 day(s) 20 hour(s) 10 minute(s) | Cannot replicate as bucket hasn't rolled yet. |
When you say "one has" and "the other has", it suggests you have only two indexers. That's not a very good cluster design. As a rule of thumb, you should have at least RF+1 indexers so that, in case of a disaster, the cluster can recover to a complete state (you can compare it to a RAID setup with a hot spare).
But that's not the main point.
The main point is that if the replication factor is not met, it means that for some reason not all buckets are available in enough copies. You can check which buckets are where with the help of the dbinspect command and then look for messages regarding that particular missing bucket in the _internal logs. That should give you some hint about the cause of the missing copy.
I don't know how to identify the missing bucket, or what to do once I have identified it.
The missing one will be the one that is only on one of the indexers.
What to do - well, it will depend on the reason for the bucket not being properly replicated.
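As a rough illustration of "the one that is only on one of the indexers": if you export the bucket listing (for example from dbinspect) as pairs of bucket ID and indexer name, a few lines of Python can pick out the buckets that have fewer copies than the replication factor. This is only a sketch. The peer names `idx1` and `idx2`, the bucket with local ID 1259, the assignment of buckets to peers, and the replication factor of 2 are all made-up assumptions; only the 1260 bucket ID comes from this thread.

```python
from collections import defaultdict


def under_replicated(rows, replication_factor):
    """Return {bucket_id: sorted peer names} for buckets with fewer copies than RF.

    rows is an iterable of (bucket_id, peer_name) pairs, one per bucket copy.
    """
    peers_by_bucket = defaultdict(set)
    for bucket_id, peer in rows:
        peers_by_bucket[bucket_id].add(peer)
    return {
        bucket_id: sorted(peers)
        for bucket_id, peers in peers_by_bucket.items()
        if len(peers) < replication_factor
    }


# Hypothetical listing: the 1259 bucket and both peer names are invented.
rows = [
    ("_internal~1259~8D19E36A-C3DF-465D-9B7E-908324F333E5", "idx1"),
    ("_internal~1259~8D19E36A-C3DF-465D-9B7E-908324F333E5", "idx2"),
    ("_internal~1260~8D19E36A-C3DF-465D-9B7E-908324F333E5", "idx1"),
]

# Assumed replication factor of 2 (two peers, each bucket on both).
for bucket_id, peers in under_replicated(rows, replication_factor=2).items():
    print(bucket_id, "only on", ", ".join(peers))
```

Once you have the bucket ID, you can search the _internal logs for that ID to see why the second copy was not created.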
The error message you mentioned, “Data Durability Root Cause(s): Replication Factor is not met,” indicates that the replication factor for some buckets is not being fulfilled.
Check if any peers are offline or experiencing connectivity problems. An offline peer can prevent the replication factor from being met.
https://docs.splunk.com/Documentation/Splunk/9.2.0/DMC/Usefeaturemonitoring
The message “All data is not searchable” suggests that some buckets are not fully searchable.
Ensure that all buckets have primary copies. If a bucket lacks a primary copy, it can impact searchability.
Resync the state of the affected bucket copies on the manager.
https://docs.splunk.com/Documentation/Splunk/9.2.0/Indexer/Anomalousbuckets
Well, I don't get how to fix that; I can't see a dashboard with faulty buckets.
I don't know which buckets are at fault; the bucket count is still different on both peers.
Well, I somehow fixed my problem by going to the "Bucket-Status" page and "summarizing" the affected bucket in the repair tasks tab. Can someone explain what that did? I still don't get it.