Multisite indexer cluster: why is data from other sites retrieved?

Cloud001
Explorer

Why is data from other sites retrieved?
 1. Splunk version: 9.2.1
 2. server.conf : manager-node
     [general]
     serverName = site01_master
     pass4SymmKey = $7$50dW7T6+mDkef5xS4o2BemFWDAur04JWlGHTwFKCNHAXuGtkZkOaEg==
     site = site1

     [clustering]
     available_sites = site1,site2
     mode = manager
     multisite = true
     pass4SymmKey = $7$lBUz3IZR3TZJeUAdYDUZR4tesE3AL0ttpupYUywS3UrG7PdwqHZ01g==
     site_replication_factor = origin:3,site1:3,total:6
     site_search_factor = origin:2,total:2

 3. server.conf : site1-SH
     [general]
     serverName = site01_sh01
     pass4SymmKey = $7$lX74ABK5XURidryB9htlMI9hsjjZZSq0PulPOi3bCbCziiWrBBnN5g==
     site = site1

     [clustering]
     manager_uri = https://192.168.79.141:8089
     mode = searchhead
     multisite = true
     pass4SymmKey = $7$JZddW4jKx48TGUx03PmTHexz76aYtTwK/aW7cQ9AGFsnZaA++xv1lA==

 4. server.conf : site2-SH
     [general]
     serverName = site02_sh01
     pass4SymmKey = $7$zFcBrd6VgPug9rgiJvI+mvRI5H7PRWwuaGgg0HBY0UKp4hTMN1CBmQ==
     site = site2

     [clustering]
     manager_uri = https://192.168.79.141:8089
     mode = searchhead
     multisite = true
     pass4SymmKey = $7$3u+CM93kvNCnGZolsv6K9EdD6fyYpalpNDyfL/+Bq0D0Vuzd5u3kuQ==

 


PickleRick
SplunkTrust

1. There is no such thing as "just raw data": even if a bucket is not searchable, it still retains at least the default metadata fields and the fields extracted at index time.

2. If you want the data to not be shared across sites why not just make two separate clusters?

3. You can't set different (site) RF/SF per index. You can only enable or disable replication for an index altogether.

4. https://docs.splunk.com/Documentation/Splunk/9.2.2/Indexer/Multisitearchitecture#Multisite_searching...

When there are no primaries in the site for which you have set affinity, the SH will reach for primaries in another site. That's by design. See point 2.
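To illustrate point 3, here is a minimal indexes.conf sketch, assuming hypothetical index names. Replication is only an on/off switch per index (`repFactor`); how many copies, and how many searchable copies, land in each site is governed cluster-wide by `site_replication_factor` and `site_search_factor` in the manager's server.conf.

```ini
# indexes.conf, pushed to all peers from the manager node
# (index names are hypothetical, for illustration only)

[site01_test]
homePath   = $SPLUNK_DB/site01_test/db
coldPath   = $SPLUNK_DB/site01_test/colddb
thawedPath = $SPLUNK_DB/site01_test/thaweddb
# Replicated according to the cluster-wide site RF/SF
repFactor  = auto

[local_only_test]
homePath   = $SPLUNK_DB/local_only_test/db
coldPath   = $SPLUNK_DB/local_only_test/colddb
thawedPath = $SPLUNK_DB/local_only_test/thaweddb
# Not replicated: each bucket exists only on the peer that indexed it
repFactor  = 0
```

`repFactor` accepts only `0` or `auto`, so there is no per-index way to say "keep this index in site2 only".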



PickleRick
SplunkTrust

Are you sure you have searchable buckets from this site2 index in site1 and the other way around?

 site_search_factor = origin:2,total:2

In this case, all searchable copies of a bucket originating in site2 stay in site2. So the search will reach across the intersite link for primaries, since the SH has no searchable primaries in its own site.
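For contrast, a search head with site affinity can only use local primaries for every bucket if the search factor places a searchable copy in each site. A sketch of the relevant manager-node settings, assuming two sites with at least two peers each (the values are an illustration, not a sizing recommendation):

```ini
# server.conf on the manager node (only the relevant attributes shown;
# pass4SymmKey and other settings stay as they are)
[clustering]
mode = manager
multisite = true
available_sites = site1,site2
# At least 2 copies of every bucket in each site, wherever it originated
site_replication_factor = origin:2,site1:2,site2:2,total:4
# At least 1 searchable copy in each site, so each site's SH finds local primaries
site_search_factor = origin:1,site1:1,site2:1,total:2
```

The trade-off is that every bucket originating in site2 is then also stored and searchable in site1, and vice versa.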


Cloud001
Explorer

A search effector exists on each site.

How do I ensure that only data from the site the SH belongs to is retrieved?

 

[Image: diagram and search]


Tom_Lundie
Contributor

Hi,

Quoting the docs:

If the cluster is not in a valid state and the local site does not have a full complement of primaries (typically, because some peers on the site are down), remote peers also participate in the search, providing results from any primaries missing from peers local to the site.

Looking at your diagram and search, am I correct in thinking that index=site01_* is only configured on site01 and index=site02_* is only configured on site02? If so, this is a misconfiguration and bad practice. It would also explain why your search affinity is not working: you would not have a copy of the data in both sites. Only site02 would have data in site02_*, so index=site0* would return data from both sites!

You should be managing your indexes via the cluster manager so that it's consistent. If you want different indexes per site, then you should be using a multi-cluster deployment. If you want to restrict access between sites and search heads then you can use RBAC and search filters. Search affinity is not designed to be a security control and should not be treated as such.
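As a rough illustration of the RBAC route, a minimal authorize.conf sketch for the site1 search head, assuming a hypothetical role name and the site01_* index naming from this thread. It restricts which indexes the role can read; it does not control which peers answer the search. A real role usually needs more capabilities than shown, and importing a role with broader index permissions (such as `user`) may widen what this role can search.

```ini
# authorize.conf on the site1 search head
# (role name and index pattern are hypothetical)
[role_site1_user]
# Capability required to run searches
search = enabled
# Indexes this role is allowed to search (wildcards are allowed)
srchIndexesAllowed = site01_*
# Indexes searched when the user does not specify index=
srchIndexesDefault = site01_*
```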


Cloud001
Explorer

This is a temporary index created for testing purposes.


The index is deployed from master.

How do I ensure that only my own site's data is retrieved?


Tom_Lundie
Contributor

Okay, everything should be working then...

You can check which search peers returned the event data using the following search:

index=* | stats values(splunk_server) by index

As long as your search factor is met, the values of splunk_server should be the names of the peers local to the SH you run the search from.

You can also check the search logs in the Job Inspector.

What is your overall goal here? As I say, search affinity is not a security control and is only designed to make searches more efficient. All data in site1 is replicated to site2 and vice-versa anyway according to your config.


Cloud001
Explorer

A. The search results are shown below.

[Screenshot: search results]

B. My goals are as follows:

1. site1's SH should retrieve only the data that site1's indexer holds.

2. site2's SH should retrieve only the data that site2's indexer holds.

3. site1's indexer stores raw data from both site1 and site2.

4. site2's indexer stores only site2's raw data.


 

C. Is it possible to configure the following structure?

 

[Image: proposed structure]

D. server.conf option

1. On site1's SH, the behavior is the same whether server.conf sets site=site0 or site=site1.

2. On site2's SH, the behavior is the same whether server.conf sets site=site0 or site=site2.
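Setting a search head's site to `site0` in the `[general]` stanza is the documented way to disable search affinity. With `site_search_factor = origin:2,total:2`, every searchable copy of a bucket stays in its origin site, so the SH has to take primaries from those same peers with or without affinity, which would be consistent with seeing no difference between the two settings. A sketch of the site1 SH with affinity disabled, reusing the values from the original configuration:

```ini
# server.conf on site01_sh01 (pass4SymmKey settings omitted)
[general]
serverName = site01_sh01
# site0 disables search affinity: primaries may come from any site
site = site0

[clustering]
manager_uri = https://192.168.79.141:8089
mode = searchhead
multisite = true
```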

 


Cloud001
Explorer

I misunderstood search affinity and the purpose of a multisite configuration.

Thank you for the kind explanation.


gcusello
SplunkTrust

Hi @Cloud001,

what are the Replication Factor and the Search Factor?

Anyway, logs are usually replicated between the indexers of each site and between the sites; this way, you have at least one searchable copy (or more) in each site.

E.g., to have two copies of the data in each site, you should have:

site_replication_factor = origin:2, site1:2, total:4

For more details, see https://docs.splunk.com/Documentation/Splunk/9.2.2/Indexer/Multisitearchitecture

Ciao.

Giuseppe

Cloud001
Explorer

site_replication_factor = origin:3,site1:3,total:6

site_search_factor = origin:2,total:2
