I want to start with a couple of statements that I'd like to be corrected on if I'm interpreting them incorrectly.
In a single-site indexer cluster, the search affinity can be replicated, but only one "active"/searchable copy is available at any time.
In a multisite indexer cluster, the search affinity allows for replication of the active portion of the searchable data, so that each site can have an actively searchable copy.
In Splunk 6.3, a search head can be configured to a site value of "site0" which disables its search affinity.
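For reference, this is roughly how I understand the site0 setting would look in server.conf on the search head. It is only a sketch: the master host and the key are placeholders, and I'm assuming the rest of the [clustering] stanza is the usual multisite search head setup.

```ini
# server.conf on the search head (sketch; placeholder values)
[general]
# site0 is the special value that disables search affinity
site = site0

[clustering]
mode = searchhead
multisite = true
# placeholder: management URI of the cluster master
master_uri = https://<master-host>:8089
# placeholder: the cluster's shared secret
pass4SymmKey = <key>
```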
So, my questions are:
Consider an exaggerated example: a two-site multisite cluster where site1 collects ALL (100%) of the data, and site2 acts as a replicated store with a positive search affinity, for DR or continuity or whatever.
1. I expect a search head using site0 to distribute its search load across both sites, but would it receive data back from both sites, or just from the site where the data was first indexed?
2. If data is returned from both sites, will both sites return the full amount of data that is searchable on them, duplicating the returned data? If so, would the search head deduplicate that data, or would the indexers return an intelligent portion of the data?
3. If it's returned from only the origination site, I guess that means that it lacks a performance boost it might get if that search load were balanced over the indexers from both sites?
(#3 isn't really a question, as much as an observation)
Any insight would be helpful, as it affects an active build-out that could change based on how this all actually works behind the scenes.
Thank you in advance,
With site affinity disabled, the searches will be distributed to all sites that are searchable / known peers.
Those peers will return any searchable data based on the results of the query. The indexers aren't aware of the search results that are sent by other peers, so the SH does a bit of work to dedup what it believes are local or replica copies of the buckets.
So, in any scenario where all data has searchable copies in the site local to a search head, disabling affinity on that search head actually causes (at least some) performance overhead? And based on that info, disabling search affinity should be reserved for scenarios where the search head is not local to searchable copies of all the data it would be searching?
Thank you very much for this info. It helps a lot!
I also have a follow-up question, if you'd oblige. To squeeze every bit of performance out of our build-out, which includes a low-latency, high-bandwidth DR site, and considering the information you have provided here, it seems I COULD gain performance by:
1. Distributing indexing across both sites
2. Configuring a replication factor that ensures data redundancy
3. NOT configuring a multisite search factor
4. Disabling affinity on the search-heads
In this way, I would expect that I get the full benefit of distributing search and index load across all indexers, and the search head would never receive duplicate data which it would have to dedup. The downside being that if a site were to fail, the indexes would have to be built at the functioning site before the data could be searched again. Does this all sound correct?
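To make the proposal concrete, here is a rough sketch of what I think the cluster master's server.conf would look like for a two-site cluster. The specific factor values are my assumption of how to express "redundant copy on the other site, but searchable only at the origin site". As far as I can tell the search factor cannot simply be left out, since it falls back to a default, so I'm setting it to a single searchable copy instead.

```ini
# server.conf on the cluster master (sketch; values are assumptions)
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
# one copy at the originating site, two copies in total,
# so with two sites the second copy lands on the other site
site_replication_factor = origin:1,total:2
# only one searchable copy, kept at the originating site
site_search_factor = origin:1,total:1
```

The search heads would then use site = site0 so that searches go to the peers in both sites.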