I'm having a frustrating time attempting to set up a test environment with Index Clustering and I've reached a tipping point! I've searched online for answers but I'm not finding anything substantial that's been able to fix my problem. The VM network that I set up has one Deployment Server (DS), a Master Node (MN), a Search Head (SH), 3 Indexers, and 2 Forwarders. I set the Replication Factor to 3, and the Search Factor to 2. I followed the following steps to set up the network and create the index cluster:
This is where I'm running into some problems. I haven't begun sending data from my forwarders yet but the _audit and _internal aren't being replicated fully, there's only one replicated and searchable copy between all three. I've waited for over an hour while I worked on other projects but the replication has stayed the same. There's a few buckets that were replicated to other indexers but after a brief period of time they stopped, so 4/10 buckets would become 5/11, then 6/12, etc...
So far I have tried:
These are some of the error messages I've received on the MN:
**Search peer 'indexer1_name' has the following message: Indexer Clustering: Too many bucket replication errors to target peer='indexer2_ip_address'8080. Will stop streaming data from hot buckets to this target while errors persist. Check for network connectivity from the cluster peer reporting this issue to the replication port of target peer. If this condition persists, you can temporarily put that peer in manual detention.** **06-28-2018 14:27:08.061 -0400 INFO CMMaster - event=handleReplicationError bid=_internal~7~9EB230C3-F26E-4110-A543-1C5DBB249AAC tgt=E106836F-8C34-4AAF-8922-8E859E898E62 peer_name='indexer2_name' msg='target doesn't have bucket now. ignoring'** **06-28-2018 14:27:08.061 -0400 INFO CMMaster - replication error src=A6FBB117-781D-4AD8-B620-8981371DE05F tgt=E106836F-8C34-4AAF-8922-8E859E898E62 failing=tgt bid=_internal~7~9EB230C3-F26E-4110-A543-1C5DBB249AAC** **06-28-2018 14:27:08.056 -0400 INFO CMMaster - postpone_service for bid=_internal~8~E106836F-8C34-4AAF-8922-8E859E898E62 time=150.000**
I'm wondering if anyone has a hunch about what the happy heck could be going on that I'm overlooking. I've set up a cluster before in a separate Splunk Lab so this is extra weird to me - I thought I had most of the basics down, but apparently not! Any thoughts or advice would be greatly appreciated. Thanks,
Just a shot in the dark but did you set the pass4symkey in the server.conf on all required instances?
If that's not it I hope the rest of that doc may be of some use to you.