I recently inherited this Splunk system, and I am gradually working out how it is set up. While running a search yesterday, I noticed something odd. We have 10 indexers: 5 at site1 and 5 at site2. We have 4 search heads, all assigned to site0. When inspecting my search job, I saw that my results were pulled only from a single site's peers, not from both. Here are some pictures to explain:
My rep factor tells me I should have 2 copies at each site.
My search factor tells me I should have 2 searchable copies at each site.
This would imply that when I run a search across my 10 indexers, it would be pulling data from both sites.
So then I run a search on a specific index, and I see this:
I expected to see data pulled equally from both sites, but I see Site k is left completely alone.
Even if a single indexer was the ingest point for all the data, it would still be scattered across the 10 indexers as it worked to meet the replication/search factors. There is no reason everything should be stuck on one site.
Am I way off base here, or is something configured wrong?
As @sperkins points out, the search heads periodically get the current generation (https://docs.splunk.com/Splexicon:Generation) from the indexer cluster master. The generation tells them where the primary copy of each bucket is, and the primary copies are the ones they use when searching.
"Even if a single indexer was the ingest point for all the data, it would still be scattered across the 10 indexers as it worked to meet the replication/search factors. There is no reason everything should be stuck on one site."
I'm not sure this is true: if all the data is going to a single indexer, the primary copies would initially be the buckets on that single indexer. It is possible to rebalance the primary copies across indexers in the same site (this happens when you restart indexers), but it doesn't appear possible to rebalance the primary copies across the whole cluster: https://docs.splunk.com/Documentation/Splunk/8.2.6/Indexer/Rebalancethecluster#Rebalance_indexer_clu...
Given this, I'd say that if you have disabled search affinity and want your search heads to search across all indexers, you probably need to send an even amount of data to the indexers at both sites.
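To make that reasoning concrete, here is a toy model in Python. It is not Splunk's actual primary-assignment logic; it simply assumes, as described above, that each bucket's primary copy stays on the indexer that ingested it and is never moved to the other site. The indexer names and the `primaries_by_site` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical layout: 5 indexers per site (names are illustrative only).
SITES = {
    "site1": [f"idx1-{n}" for n in range(1, 6)],
    "site2": [f"idx2-{n}" for n in range(1, 6)],
}


def primaries_by_site(ingest_targets, bucket_count):
    """Count primary copies per site.

    Assumption: the primary copy of each bucket lives on the indexer that
    ingested it. Buckets are spread round-robin over ingest_targets.
    """
    site_of = {idx: site for site, members in SITES.items() for idx in members}
    counts = Counter({site: 0 for site in SITES})
    for bucket_number in range(bucket_count):
        origin = ingest_targets[bucket_number % len(ingest_targets)]
        counts[site_of[origin]] += 1
    return dict(counts)


# Forwarders only send to site1: every primary ends up at site1.
skewed = primaries_by_site(SITES["site1"], 100)

# Forwarders send evenly to all 10 indexers: primaries are split across sites.
balanced = primaries_by_site(SITES["site1"] + SITES["site2"], 100)

print(skewed)
print(balanced)
```

Under that assumption, a search head that only reads primary copies would touch one site in the first case and both sites in the second.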
Search Affinity is disabled (Cluster = 0)
I guess I just assumed a search would run everywhere, not just in one place.
The piece that might have thrown me off is that my search factor is 4, so I have 4 searchable copies? But when I search for a single individual log, it still touches all 5 indexers at site G. If my search factor is only 4, why does it touch 5?
Because forwarders load-balance automatically, they distribute data across all the indexers.
Say one search needs to pull 4 buckets of one index (index=example), depending on the parameters set.
Indexer 1 could have the primary copies of buckets 1 and 2 of index example.
Indexer 2 has the primary copy of bucket 3 and the backups for buckets 1 and 2.
Indexers 3 and 4 have the backups for buckets 3 and 4.
Indexer 5 has the primary copy of bucket 4, etc.
So in this search, indexers 1, 2, and 5 would participate.
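The example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Splunk implements it; the `primary_holder` mapping and the `participating_indexers` function are hypothetical names that mirror the bucket layout just described.

```python
# Which indexer holds the primary copy of each bucket of index=example,
# taken from the layout described in the example (illustrative names).
primary_holder = {
    "bucket1": "indexer1",
    "bucket2": "indexer1",
    "bucket3": "indexer2",
    "bucket4": "indexer5",
}


def participating_indexers(buckets_needed, primaries):
    """Return the indexers that must answer a search: one per primary copy."""
    return sorted({primaries[bucket] for bucket in buckets_needed})


result = participating_indexers(
    ["bucket1", "bucket2", "bucket3", "bucket4"], primary_holder
)
print(result)  # ['indexer1', 'indexer2', 'indexer5']
```

Indexers 3 and 4 hold only backup copies of these buckets, so they do not appear in the result. With many more buckets, primaries tend to land on every indexer, which is why a search can touch all 5 peers regardless of the search factor.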
By default, SHC members search only their local site. This is called search affinity, and it is designed to reduce cross-site network traffic. Read more about it here: https://docs.splunk.com/Documentation/Splunk/8.2.6/Indexer/Multisitesearchaffinity
With your replication factor/search factor settings, each site has a searchable copy of the data, so searching a single site (which is site0, the local site of your search heads) is sufficient to query all data. You'll see data being searched across site1 if one or more indexers on site0 are down.
To ensure that exactly one copy of each bucket participates in a search, one searchable copy of each bucket in the cluster is designated as primary. Searches occur only across the set of primary copies.
This is to ensure that there isn’t duplicate data/events being returned in a search.
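A small Python sketch of why this matters, assuming a made-up layout of two buckets with two searchable copies each. The `BucketCopy` class and `search` function are hypothetical and only model the idea of primary copies; they are not Splunk APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketCopy:
    bucket_id: str
    indexer: str
    primary: bool
    events: tuple


# Two buckets, each with a primary and a non-primary searchable copy.
copies = [
    BucketCopy("b1", "idx-a", True, ("e1", "e2")),
    BucketCopy("b1", "idx-b", False, ("e1", "e2")),
    BucketCopy("b2", "idx-b", True, ("e3",)),
    BucketCopy("b2", "idx-c", False, ("e3",)),
]


def search(bucket_copies, primaries_only):
    """Collect events from the primary copies, or from every copy."""
    return [
        event
        for copy in bucket_copies
        if copy.primary or not primaries_only
        for event in copy.events
    ]


all_copies = search(copies, primaries_only=False)
only_primaries = search(copies, primaries_only=True)

print(len(all_copies))  # 6: every event is returned twice
print(only_primaries)   # ['e1', 'e2', 'e3']
```

Searching every searchable copy returns each event once per copy, while restricting the search to primaries returns each event exactly once.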
So if the primary copies are stored in one site, then that is the site the search head will search. The others are “backups” in case one of the indexers goes down. This feature is more about preventing data loss and keeping user searches uninterrupted.
Also check to be sure your SH is configured to disable search affinity: https://docs.splunk.com/Documentation/Splunk/8.2.6/Indexer/Multisitesearchaffinity