We have a Splunk cluster with:
- a search head
We are planning maintenance, updates, etc.,
so I tested the high availability of our Splunk cluster.
Specifically, I stopped an indexer for a few hours to see how the buckets would react.
The cluster reacted fine, BUT I have an issue with a few hot buckets that were not replicated from the host that stayed up to the host that was down.
I think that buckets which were created while no peer node was available to receive the replication are not replicated.
I think they will be replicated once they roll to warm.
That would mean that, in my configuration, I have to force hot buckets to roll to warm so that they can be replicated and I meet my replication factor.
Is there a way to start replication of a hot bucket?
Well, after taking an indexing peer down (let's call it SRVLOG2) and bringing it back up 2 hours later,
my cluster wasn't able to rebuild its indexes and I couldn't reach my replication factor of 2.
I had to restart the other indexing peer (SRVLOG3) to get a few more buckets, and finally restart SRVLOG2 to get back to a fully operational cluster.
Obviously I have a bucket replication issue; I got this message:
Too many streaming errors to target=. Not rolling hot buckets on further errors to this target. (This condition might exist with other targets too. Please check the logs.)
Restarting the Splunk service was the first solution I thought of, but I think a lighter solution would be to roll the buckets from hot to warm. I'll try this soon.
Of course, the best would be not to have to do anything when a peer comes back online; bucket-fixing operations from the master should do that job.
PS: thanks for your answer 😃
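For anyone who wants to try the hot-to-warm roll without restarting a peer, here is a minimal sketch in Python. It assumes the peer exposes the `roll-hot-buckets` action under `/services/data/indexes/<index>/` on the management port (8089 by default); please check the REST API reference for your Splunk version before relying on it. The host name, index name, and credentials in the usage comment are placeholders, not values from this cluster.

```python
import base64
import urllib.parse
import urllib.request


def build_roll_request(host, index, username, password, port=8089):
    """Build (but do not send) a POST request that asks one indexing
    peer to roll the hot buckets of a single index to warm.

    Assumption: the endpoint is
    /services/data/indexes/<index>/roll-hot-buckets on the management port.
    """
    # Quote the index name so unusual characters cannot break the URL path.
    path = "/services/data/indexes/{}/roll-hot-buckets".format(
        urllib.parse.quote(index, safe="")
    )
    url = "https://{}:{}{}".format(host, port, path)

    # HTTP basic authentication header built from the given credentials.
    token = base64.b64encode(
        "{}:{}".format(username, password).encode("utf-8")
    ).decode("ascii")

    request = urllib.request.Request(url, data=b"", method="POST")
    request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical usage (placeholder host, index, and credentials):
#   req = build_roll_request("srvlog3.example.com", "main", "admin", "secret")
#   urllib.request.urlopen(req, context=my_ssl_context)
# An SSL context that trusts the peer's certificate is needed if the
# management port uses a self-signed certificate.
```

The request is only built here, not sent, so the URL can be inspected first. It would have to be sent to the peer that holds the hot buckets, once per affected index.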
Are you seeing bucket errors in SRVLOG2/...var/log/splunk/splunk.d?
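One quick way to check is to filter the peer's log for the message quoted earlier in the thread. This is a minimal sketch: the marker text comes from that message, and the log path is deliberately left as a parameter because it depends on the installation.

```python
# Text taken from the error message quoted in this thread.
MARKER = "Too many streaming errors to target"


def find_streaming_errors(lines):
    """Return the lines (without trailing newline) that contain the
    replication streaming-error marker."""
    return [line.rstrip("\n") for line in lines if MARKER in line]


# Hypothetical usage; log_path is whatever file the peer logs to:
#   with open(log_path, encoding="utf-8", errors="replace") as handle:
#       for hit in find_streaming_errors(handle):
#           print(hit)
```

Matching on a plain substring keeps the check independent of the exact timestamp and log-level layout of each line.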
Hot buckets are replicated too. (Replication happens not per event but per slice of data.) See http://docs.splunk.com/Documentation/Splunk/6.0/Indexer/Howclusteredindexingworks for more information.
Could you elaborate on what exactly the issue was?