We have a distributed environment with multi-site indexer clustering and search head clustering. The replication factor is set up so that every indexer has a copy of all data that has been indexed by the cluster. I'm trying to figure out whether there is a way to craft a query from our search heads that is directed to a specified indexer in the cluster and returns all data for a particular index from that indexer.
If I specify splunk_server in the query, it seems to direct the request to the indexer specified, but it only returns events that were indexed by that particular indexer, not all events that have been indexed by the cluster.
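For illustration, the kind of search described here looks roughly like the sketch below. The index name `your_index` and the server name `indexerA` are placeholders, not values from our environment.

```
index=your_index splunk_server=indexerA
| stats count by splunk_server
```

The count that comes back reflects only the events that `indexerA` itself returns, not the full contents of the index across the cluster.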
Is there another way to go about this?
Comment from @d: In a cluster, indexers only return data from buckets that are designated as primary. A bucket that is primary on indexerA is not primary on indexerB, and what is being indexed by indexerA is likely to remain primary there until a failure occurs. The only apples-to-apples comparison is to index the same data on each indexer and test it with splunk_server. Anything else will be an approximation.
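One rough way to see how the responding copies are spread across the cluster is to drop the splunk_server filter and group by that field instead. This is only a sketch: `your_index` is a placeholder, and the split it shows depends on which peers currently hold the primary copies.

```
index=your_index
| stats count by splunk_server
```

Each row shows how many events a given indexer returned for the search, which is typically a subset of the index rather than the whole data set.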
Could you tell us why you'd like to do this? It seems like it's fundamentally and precisely not what clustering is supposed to achieve, so it makes me wonder whether a change in how it's set up might help your situation.
My only real thought is that you have a multi-site physical topology but haven't set up multi-site clustering with site-aware search heads.
The reason I am looking to query the indexers in this way is for troubleshooting and performance testing. We have some indexers set up on different hardware/storage and with different OS configurations, and we want to see if we can get an apples-to-apples comparison of performance for searches and indexing. If we can pull the same events from each indexer, then we can be more confident in the uniformity of the data set that we are querying.
Hmm. You might consider posting a different question: How to compare indexer performance when clustering is used?
The question here is very specific and there might be alternate approaches to solve the overall challenge.
That said, there might be approaches that can be used with the hidden bucket field. What is the search factor and replication factor?
I will follow up with a more open-ended question regarding performance comparisons. I am still interested in finding the answer to this particular question as well.
Replication factor: origin:3,total:6
Search factor: origin:2,total:4