We are preparing to move from a single indexer to an indexer cluster. I'm trying to determine the performance implications of a two-node indexer cluster with the replication factor set to two and the search factor also set to two.
In the "Managing Indexers and Clusters of Indexers" manual, under the section "How indexer clusters work", the subsection "Buckets and indexer clusters", under the heading "Data files", states:
"If the cluster has a search factor greater than 1, some or all of the target peers also create index files for the data. For example, say you have a replication factor of 3 and a search factor of 2. In that case, the source peer streams its raw data to two target peers. One of those peers then uses the raw data to create index files, which it stores in its copy of the bucket. That way, there will be two searchable copies of the data (the original copy and the replicated copy with the index files)."
I’m reading that to mean only the raw data is replicated, not the index files; the index files are recreated on the target peer. So in a two-node cluster with a replication factor of 2 and a search factor of 2, both nodes would always be indexing the data.
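To make that reading concrete, here is a minimal sketch in Python that models which copies of a bucket would carry index (tsidx) files under the behavior the documentation describes. It is only an illustration of the quoted passage, not Splunk code; the function name `bucket_copies` and the peer labels are made up for this example.

```python
def bucket_copies(replication_factor, search_factor):
    """Model the copies of one bucket, per the quoted documentation.

    Hypothetical helper: the source peer always indexes the data, and
    (search_factor - 1) target peers rebuild index files from the raw
    data they receive. Remaining targets keep raw data only.
    """
    if not 1 <= search_factor <= replication_factor:
        raise ValueError("search factor must be between 1 and the replication factor")

    # The source peer holds the original, searchable copy.
    copies = [{"peer": "source", "has_tsidx": True}]

    # Each target peer receives raw data; only enough of them to meet
    # the search factor go on to build index files.
    for i in range(1, replication_factor):
        copies.append({"peer": f"target-{i}", "has_tsidx": i < search_factor})
    return copies


# Documentation example: replication factor 3, search factor 2.
for copy in bucket_copies(3, 2):
    print(copy)

# Two-node case from the question: both copies are searchable.
for copy in bucket_copies(2, 2):
    print(copy)
```

With a replication factor of 2 and a search factor of 2, the single target peer always has to build index files, which matches the conclusion that both nodes end up indexing all the data.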
The first indexer to receive an event indexes it. It then forwards that event to another indexer for replication. The replicated copy does not contain tsidx files unless they are necessary to meet the search factor.
So, to answer the question, setting Search Factor=2 means the replicating indexer has to do more work to create the tsidx files. I don't have specific metrics on that. Of course, the added tsidx files take up more storage space (35% of the uncompressed raw data size).
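For a rough feel of the storage side, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not a sizing tool: the 35% tsidx ratio is the figure mentioned above, while the 15% compressed rawdata ratio is an assumed rule of thumb that varies a lot with the data. The function name `cluster_storage_gb` is hypothetical.

```python
def cluster_storage_gb(daily_raw_gb, replication_factor, search_factor,
                       rawdata_ratio=0.15, tsidx_ratio=0.35):
    """Estimate total cluster disk used per day of ingested data, in GB.

    rawdata_ratio is an ASSUMED compression ratio for the rawdata journal;
    tsidx_ratio is the 35% figure quoted in this thread. Both are rough.
    """
    if search_factor > replication_factor:
        raise ValueError("search factor cannot exceed replication factor")

    # Every replicated copy stores the compressed raw data.
    rawdata_gb = daily_raw_gb * rawdata_ratio * replication_factor
    # Only searchable copies also store tsidx files.
    tsidx_gb = daily_raw_gb * tsidx_ratio * search_factor
    return rawdata_gb + tsidx_gb


# 100 GB/day of uncompressed data on a two-node cluster with RF=2:
print(cluster_storage_gb(100, 2, 1))  # search factor 1
print(cluster_storage_gb(100, 2, 2))  # search factor 2
```

Under these assumptions, going from a search factor of 1 to 2 adds one extra set of tsidx files, i.e. about 35 GB per 100 GB of uncompressed data ingested. It says nothing about the extra CPU cost on the target peer.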
First I'd like to thank you for taking the time to respond to my question.
The phrase "...unless it is necessary to meet the search factor" is confusing me. Can you expand on that? Specifically, consider a two-peer indexer cluster with a replication factor of two and a search factor of two:
My reading of the documentation is that only the compressed raw data is transmitted to the target peer, and the target peer is responsible for creating the tsidx files in order to achieve the search factor of two. The performance implications would then be not only the additional storage (the tsidx files on the target peer), but also the CPU the target peer needs to "reindex" the raw/compressed copy.
I read the documentation the same way as you.