Solved: Multisite indexer cluster : Why is data from other sites retrieved?

Cloud001 · ‎07-16-2024

Why is data from other sites retrieved?
1. splunk version 9.2.1
2. server.conf : manager-node
[general]
serverName = site01_master
pass4SymmKey = $7$50dW7T6+mDkef5xS4o2BemFWDAur04JWlGHTwFKCNHAXuGtkZkOaEg==
site = site1

[clustering]
available_sites = site1,site2
mode = manager
multisite = true
pass4SymmKey = $7$lBUz3IZR3TZJeUAdYDUZR4tesE3AL0ttpupYUywS3UrG7PdwqHZ01g==
site_replication_factor = origin:3,site1:3,total:6
site_search_factor = origin:2,total:2

3. server.conf : site1-SH
[general]
serverName = site01_sh01
pass4SymmKey = $7$lX74ABK5XURidryB9htlMI9hsjjZZSq0PulPOi3bCbCziiWrBBnN5g==
site = site1

[clustering]
manager_uri = https://192.168.79.141:8089
mode = searchhead
multisite = true
pass4SymmKey = $7$JZddW4jKx48TGUx03PmTHexz76aYtTwK/aW7cQ9AGFsnZaA++xv1lA==

4. server.conf : site2-SH
[general]
serverName = site02_sh01
pass4SymmKey = $7$zFcBrd6VgPug9rgiJvI+mvRI5H7PRWwuaGgg0HBY0UKp4hTMN1CBmQ==
site = site2

[clustering]
manager_uri = https://192.168.79.141:8089
mode = searchhead
multisite = true
pass4SymmKey = $7$3u+CM93kvNCnGZolsv6K9EdD6fyYpalpNDyfL/+Bq0D0Vuzd5u3kuQ==

Cloud001 · ‎07-16-2024

.

PickleRick · ‎07-16-2024

Are you sure you have searchable buckets from this site2 index in site1 and the other way around?

 site_search_factor = origin:2,total:2

In this case, the searchable copies of a bucket originating in site2 will stay in site2. So a search from site1 will reach across the intersite link for primaries, since it has no searchable primaries in its own site.
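For comparison, here is a sketch of a search factor that would also place a searchable copy outside the originating site. The values are Splunk's documented defaults (origin:1,total:2), not something posted in this thread, and whether they suit this cluster is an assumption. The setting belongs in the [clustering] stanza of the manager node's server.conf:

```
[clustering]
mode = manager
multisite = true
available_sites = site1,site2
site_replication_factor = origin:3,site1:3,total:6
# origin:1,total:2 keeps one searchable copy on the originating site
# and places the remaining searchable copy on another site.
site_search_factor = origin:1,total:2
```

In a two-site cluster this should leave one searchable copy of every bucket on each site, so each search head can find primaries locally instead of reaching across the intersite link.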

Cloud001 · ‎07-16-2024

A search effector exists on each site.

How do I ensure that only data from the site the SH belongs to is retrieved?

Tom_Lundie · ‎07-16-2024

Hi,

Quoting the docs:

If the cluster is not in a valid state and the local site does not have a full complement of primaries (typically, because some peers on the site are down), remote peers also participate in the search, providing results from any primaries missing from peers local to the site.

Looking at your diagram and search, am I correct in thinking that index=site01_* is only configured on site01 and index=site02_* is only configured on site02? If so, firstly this is a misconfiguration and bad practice. However, it makes sense that your search affinity is not working because you would not have a copy of the data in both sites. Only site02 would have data in site02_* and therefore index=site0* would return data from both sites!

You should be managing your indexes via the cluster manager so that it's consistent. If you want different indexes per site, then you should be using a multi-cluster deployment. If you want to restrict access between sites and search heads then you can use RBAC and search filters. Search affinity is not designed to be a security control and should not be treated as such.
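As a rough illustration of the RBAC route, a role can be limited to one site's indexes in authorize.conf on the search head. The role name below is hypothetical, and the index pattern is taken from the test indexes discussed above:

```
# authorize.conf on the site1 search head (role name is hypothetical)
[role_site1_user]
# No importRoles here: index access inherited from another role is
# additive and could widen the restriction again.
search = enabled
srchIndexesAllowed = site01_*
srchIndexesDefault = site01_*
```

A matching role for site2 users would use site02_* instead. This restricts what a user may search regardless of which peers hold the primaries, which is a different job from search affinity.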

Cloud001 · ‎07-16-2024

This is a temporary index created for testing purposes.

The index is deployed from master.

How do I ensure that only my own site's data is retrieved?

Tom_Lundie · ‎07-17-2024

Okay, everything should be working then...

You can check which search peers returned the event data using the following search:

index=* | stats values(splunk_server) by index

So long as your search factor is met, the values of splunk_server should be the names of the peers local to the SH that you run the search from.

You can also check the search logs in the Job Inspector.

What is your overall goal here? As I say, search affinity is not a security control and is only designed to make searches more efficient. All data in site1 is replicated to site2 and vice-versa anyway according to your config.

Cloud001 · ‎07-17-2024

A. The search results are shown below.

B. My goals are as follows

1. site1's SH wants to retrieve only the data that site1's indexer has.

2. site2's SH wants to retrieve only the data that site2's indexer has.

3. site1's indexer stores RAW data from site1 and site2.

4. site2's indexer stores only site2's RAW data.

C. Is it possible to configure the following structure?

D. server.conf Option

1. On site1_SH, there is no difference between the behavior when server.conf is set to site=site0 and when it is set to site=site1.

2. On site2_SH, there is no difference in behavior between setting server.conf to site=site0 and site=site2.

PickleRick · ‎07-17-2024

1. There is no such thing as "just raw data". Even if a bucket is not searchable, it still retains at least the default metadata fields and any fields extracted at index time.

2. If you want the data to not be shared across sites why not just make two separate clusters?

3. You can't differentiate (site) RF/SF between indexes. You can only enable/disable replication altogether for an index.

4. https://docs.splunk.com/Documentation/Splunk/9.2.2/Indexer/Multisitearchitecture#Multisite_searching...

When there are no primaries in the site for which you have affinity set, the SH will reach out to another site for primaries. That's by design. See point 2.
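To illustrate point 3: the per-index switch is the repFactor setting in indexes.conf, which only turns replication on or off. A minimal sketch, assuming a hypothetical index name and default-style paths:

```
# indexes.conf, distributed to the peers from the manager node
# (index name and paths are hypothetical)
[site01_test]
homePath   = $SPLUNK_DB/site01_test/db
coldPath   = $SPLUNK_DB/site01_test/colddb
thawedPath = $SPLUNK_DB/site01_test/thaweddb
# auto = replicate according to the cluster-wide site RF/SF
# 0    = do not replicate this index at all
repFactor = auto
```

There is no per-index equivalent of site_replication_factor or site_search_factor, so an index is either replicated under the cluster-wide policy or not replicated.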

Cloud001 · ‎07-17-2024

I misunderstood search affinity and misunderstood the purpose of multi-site configuration.

Thank you for your kind notice.

gcusello · ‎07-16-2024

Hi @Cloud001,

What are the Replication Factor and the Search Factor?

Anyway, logs are usually replicated between the indexers of each site and also between the sites. That way, you have at least one searchable copy (or more) in each site.

e.g. to have two copies of data in each site, you should have:

site_replication_factor = origin:2, site1:2, total:4

For more details, see https://docs.splunk.com/Documentation/Splunk/9.2.2/Indexer/Multisitearchitecture

Ciao.

Giuseppe

Cloud001 · ‎07-16-2024

site_replication_factor = origin:3,site1:3,total:6

site_search_factor = origin:2,total:2
