I'm having a frustrating time setting up a test environment with indexer clustering, and I've reached a tipping point! I've searched online for answers, but nothing I've found has fixed my problem. The VM network I set up has one Deployment Server (DS), a Master Node (MN), a Search Head (SH), 3 Indexers, and 2 Forwarders. I set the Replication Factor to 3 and the Search Factor to 2. I took the following steps to set up the network and create the indexer cluster:
Created VMs, installed Splunk on each box, pinged entire network to ensure connectivity between every VM.
On the DS I configured some Apps, created some server classes, and organized the forwarders all nice and neat-like.
On the MN I enabled indexer clustering via UI and set everything to default values and created a simple password for the cluster.
I enabled each indexer as a peer node and connected them to the MN via UI - I received an error saying they couldn't communicate with the MN or the Replication Factor hadn't been met yet.
Finally, I enabled the SH via UI.
This is where I'm running into problems. I haven't begun sending data from my forwarders yet, but the _audit and _internal indexes aren't being fully replicated: there's only one replicated and searchable copy across all three indexers. I waited for over an hour while I worked on other projects, but the replication status stayed the same. A few buckets were replicated to other indexers, but after a brief period the replication stopped, so 4/10 buckets would become 5/11, then 6/12, etc...
So far I have tried:
Checking that all relevant ports were being used by Splunk.
Navigating to the "Bucket Status" page to try to find a manual solution.
Uninstalling and reinstalling Splunk entirely. (yes)
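For the port check, a ping only proves ICMP reachability, so a TCP-level test between the boxes may be more telling. Below is a minimal sketch of such a check, assuming the replication port is 8080 (taken from the error message further down) and the management port is Splunk's default 8089; the helper names `port_open` and `check_cluster_ports` are hypothetical and not part of any Splunk tooling.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False


def check_cluster_ports(hosts, ports, timeout=3.0):
    """Map every (host, port) pair to True/False depending on reachability."""
    return {
        (host, port): port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Run from each indexer in turn, something like `check_cluster_ports(["indexer2_ip_address", "indexer3_ip_address"], [8080, 8089])` would show whether that peer can actually open a connection to the other peers' replication and management ports. A `False` for 8080 on one box only would point at a firewall rule or a mismatched replication port on that box rather than at the cluster configuration as a whole.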
These are some of the error messages I've received on the MN:
**Search peer 'indexer1_name' has the following message: Indexer Clustering: Too many bucket replication errors to target peer='indexer2_ip_address'8080. Will stop streaming data from hot buckets to this target while errors persist. Check for network connectivity from the cluster peer reporting this issue to the replication port of target peer. If this condition persists, you can temporarily put that peer in manual detention.**
**06-28-2018 14:27:08.061 -0400 INFO CMMaster - event=handleReplicationError bid=_internal~7~9EB230C3-F26E-4110-A543-1C5DBB249AAC tgt=E106836F-8C34-4AAF-8922-8E859E898E62 peer_name='indexer2_name' msg='target doesn't have bucket now. ignoring'**
**06-28-2018 14:27:08.061 -0400 INFO CMMaster - replication error src=A6FBB117-781D-4AD8-B620-8981371DE05F tgt=E106836F-8C34-4AAF-8922-8E859E898E62 failing=tgt bid=_internal~7~9EB230C3-F26E-4110-A543-1C5DBB249AAC**
**06-28-2018 14:27:08.056 -0400 INFO CMMaster - postpone_service for bid=_internal~8~E106836F-8C34-4AAF-8922-8E859E898E62 time=150.000**
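To see whether the failures always involve the same source/target pair, the `CMMaster` lines above can be tallied by their `src=` and `tgt=` GUIDs. This is a rough sketch that assumes the lines keep the `key=value` layout shown in the excerpts and that bucket IDs have the form `index~number~GUID`, as they do above; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches bare key=value tokens such as src=..., tgt=..., bid=...
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_fields(line):
    """Extract key=value pairs from one log line into a dict."""
    return dict(FIELD_RE.findall(line))


def split_bid(bid):
    """Split a bucket ID like _internal~7~<GUID> into (index, number, guid)."""
    index, number, guid = bid.rsplit("~", 2)
    return index, number, guid


def tally_replication_errors(lines):
    """Count 'replication error' lines per (src GUID, tgt GUID) pair."""
    counts = Counter()
    for line in lines:
        if "replication error" not in line:
            continue
        fields = parse_fields(line)
        if "src" in fields and "tgt" in fields:
            counts[(fields["src"], fields["tgt"])] += 1
    return counts
```

Feeding the MN's log lines into `tally_replication_errors` and looking at the most common pairs would show whether one target GUID (here the one reported as 'indexer2_name') accounts for most of the errors, which would narrow the problem down to that peer.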
I'm wondering if anyone has a hunch about what the happy heck could be going on that I'm overlooking. I've set up a cluster before in a separate Splunk Lab so this is extra weird to me - I thought I had most of the basics down, but apparently not! Any thoughts or advice would be greatly appreciated. Thanks,