What are people's experiences and expectations of the impact caused by a multisite indexer cluster rolling restart (on version 6.4.3)? By "impact", I mean that a complete set of events is not returned in searches.
I can identify two classes of data: "historical" data that was sent to the cluster before it was put into maintenance mode, and "recent" data sent to the cluster after it was put into maintenance mode.
For the purposes of this example, let's assume:
- A two-site cluster with a search factor of 2, and 1 copy forced in each site
- The rolling restart will never restart more than 1 indexer at a time
I had expected no search impact on "historical" data, as there should always be a second searchable copy of the data within the cluster. This would require search heads to break site affinity and search across sites, but according to the docs on cluster maintenance mode this should happen, as the cluster master will still attempt to reassign primaries. In testing, I have found that historical data availability is severely and unpredictably affected. Towards the beginning of a rolling restart, a search head will often not search across sites at all. Later in the rolling restart it starts to search across sites, but the results are still incomplete.
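To make that expectation concrete, here is a toy Python model of the reasoning. It is my own sketch, not Splunk code: the peer names, the bucket count, and all function names are made up, and it only models where copies live. It assumes every historical bucket has exactly one searchable copy on some indexer in each site, and it ignores primary reassignment and site affinity, which is exactly where real behaviour seems to diverge.

```python
import itertools
import random


def build_cluster(indexers_per_site=3, buckets=50, seed=0):
    """Build a hypothetical two-site cluster with one copy of each bucket per site."""
    rng = random.Random(seed)
    sites = {
        "site1": [f"site1-idx{i}" for i in range(indexers_per_site)],
        "site2": [f"site2-idx{i}" for i in range(indexers_per_site)],
    }
    # Each bucket gets one searchable copy on a randomly chosen peer in each site.
    placement = {
        bucket: {rng.choice(peers) for peers in sites.values()}
        for bucket in range(buckets)
    }
    return sites, placement


def searchable_buckets(placement, down):
    """Return the buckets that still have a copy on a peer that is not down."""
    return {bucket for bucket, holders in placement.items() if holders - down}


def rolling_restart_coverage(sites, placement):
    """Take down one peer at a time and count the buckets that remain searchable."""
    coverage = {}
    for peer in itertools.chain.from_iterable(sites.values()):
        coverage[peer] = len(searchable_buckets(placement, {peer}))
    return coverage


sites, placement = build_cluster()
coverage = rolling_restart_coverage(sites, placement)
for peer, count in coverage.items():
    print(f"{peer} restarting: {count}/{len(placement)} buckets still searchable")
```

In this model every bucket stays searchable no matter which single peer is down, so any gap seen in practice would have to come from how primaries are reassigned rather than from missing copies.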
I can see that there could be an impact on "recent" data while the rolling restart is in progress. An event could have been written to a certain indexer and not replicated because the cluster was in maintenance mode; when the indexer that holds it goes down for its restart, the data becomes unavailable until that indexer returns.
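The same kind of toy model shows the "recent" data case. Again this is only a sketch of my hypothesis: the bucket and peer names are invented, and whether data really goes unreplicated during maintenance mode is an assumption, not something I have confirmed.

```python
def unavailable_during_restart(placement, restarting_peer):
    """Return the buckets whose only copy lives on the restarting peer."""
    return sorted(
        bucket
        for bucket, holders in placement.items()
        if not (set(holders) - {restarting_peer})
    )


# Hypothetical placement: historical buckets have a copy in each site,
# while the recent bucket is assumed to exist only on the peer that received it.
placement = {
    "historical-1": {"site1-idx0", "site2-idx1"},
    "historical-2": {"site1-idx1", "site2-idx0"},
    "recent-1": {"site1-idx0"},
}

peers = ["site1-idx0", "site1-idx1", "site2-idx0", "site2-idx1"]
gaps = {peer: unavailable_during_restart(placement, peer) for peer in peers}
for peer, missing in gaps.items():
    print(f"{peer} restarting: unavailable buckets = {missing}")
```

Under that assumption only the single-copy bucket drops out, and only while its own indexer is restarting.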
Does this mean it is not possible to restart an indexer cluster without severely impacting data searchability, making it necessary to block user access throughout and to disable alerting and anything else that relies on search? The docs seem to say that indexer clustering provides high availability where the data is always available for searching, but this appears to be a false claim.
If this impact is real and I haven't stuffed it up somehow, how can it be mitigated?