Solved: Search over indexed data.

yunit11 · ‎05-30-2017

We have a headquarters in the US and subsidiaries in Africa and the MESA region, connected over fairly unreliable VPN links. There are approximately 50 endpoints in each region. We want each subsidiary's search head to search ONLY that region's local data, while the headquarters search head should be able to search the aggregated data from the subsidiaries as well as the headquarters data itself. The subsidiaries' data must be available 24/7 from the headquarters search head, although it does not have to be fully up to date.

We tried two approaches:

1. Universal Forwarders on the endpoints forward data to a heavy forwarder in each region, which has local indexing enabled. The heavy forwarder also sends the data on to the headquarters indexer. There are search heads in Africa and MESA as well as at headquarters. The drawback of this approach is that the same data is indexed twice, so it counts against the license twice.

2. Multisite cluster. The same Universal Forwarders on the endpoints forward data to the local indexer. Replication settings: site_replication_factor = origin:1,site1:1,total:2, and site_search_factor is set to the same value. There is a single search head in each region. In this configuration data is replicated from the subsidiaries to headquarters without consuming the license twice, and the headquarters search head can search the aggregated data. But we do not want a regional search head to search data that did not originate in its own region (for example, Africa's search head must not be able to search MESA data). There is an option to allow only selected indexes (srchIndexesAllowed = <string>), but we want to configure access control centrally, i.e. at the US headquarters.

skalliger · ‎05-31-2017

Actually, a multisite cluster is a bad decision for your scenario. The idea of a multisite cluster in Splunk is to replicate data amongst several locations/data centers.

This is, clearly, not what you want to have. So, here is what I would do:

One indexer cluster per region (2x idx)
Put one of the indexers physically into your HQ's data center.
Join the region specific Search Heads to one specific indexer cluster.
Your HQ Search Heads will be the only search heads that are allowed to join all the indexer clusters and thus, search all the data. Reference: https://docs.splunk.com/Documentation/Splunk/6.6.0/Indexer/Configuremulti-clustersearch

Any more questions?

Edit: Example below.
- 1 indexer cluster (2x idx) in Africa, 1 SH in Africa (an SH cluster would require at least 3 SHs), only joining that idx cluster
- 1 indexer cluster (2x idx) in MESA, 1 SH in MESA, only joining that idx cluster
- One indexer from each cluster will be physically located at your HQ, so you will need a search factor (SF) and replication factor (RF) of 2 to keep the data searchable from HQ during a VPN outage.
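
For the HQ search head, the multi-cluster search setup from the reference above comes down to a [clustering] stanza in server.conf that points at one [clustermaster:<label>] stanza per indexer cluster. Below is a minimal Python sketch that renders such a fragment. The host names, labels and keys are made-up placeholders, and the stanza layout is as I remember it from the 6.x documentation, so check it against the linked page before using it.

```python
def render_search_head_conf(masters):
    """Render a server.conf fragment for a search head that joins several
    indexer clusters.

    `masters` maps a label (hypothetical, e.g. "africa") to a tuple of
    (master_uri, pass4SymmKey) for that cluster's master node.
    """
    # The [clustering] stanza lists the per-cluster stanzas by name.
    refs = ", ".join(f"clustermaster:{label}" for label in masters)
    lines = ["[clustering]", "mode = searchhead", f"master_uri = {refs}", ""]

    # One stanza per indexer cluster, each with its own master and key.
    for label, (uri, key) in masters.items():
        lines += [
            f"[clustermaster:{label}]",
            f"master_uri = {uri}",
            f"pass4SymmKey = {key}",
            "",
        ]
    return "\n".join(lines).rstrip() + "\n"


# HQ search head: joins both regional clusters (placeholder hosts and keys).
hq_conf = render_search_head_conf({
    "africa": ("https://cm-africa.example.com:8089", "changeme-africa"),
    "mesa": ("https://cm-mesa.example.com:8089", "changeme-mesa"),
})

# Africa search head: joins only its own cluster.
africa_conf = render_search_head_conf({
    "africa": ("https://cm-africa.example.com:8089", "changeme-africa"),
})

print(hq_conf)
```

The regional search heads get the same kind of fragment with a single cluster in it, which is what keeps MESA data out of reach of the Africa search head.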

Skalli

yunit11 · ‎05-31-2017

Thanks, the configuration looks a bit simpler than the multisite solution. The only problem I see here is master node management. If I understand correctly, it would require a master node in each cluster. How do we manage apps in that case? I think we can use a Deployment Server to provision apps to the master nodes, which in turn would push the apps to their indexer clusters. Should we configure the master nodes as deployment clients in that case?

skalliger · ‎05-31-2017

The answer kind of depends on the deployment.

First, it's correct that you would need one master node for each indexer cluster.

Second, you may have only one deployer for multiple SH clusters. This is only possible if all of your Search Heads get the same apps/configurations. Otherwise you need one deployer for each Search Head cluster.
Reference: http://docs.splunk.com/Documentation/Splunk/6.5.0/DistSearch/PropagateSHCconfigurationchanges#Deploy...

Third, your explanation is correct. You could make the master and the deployer deployment clients and push configurations from the deployment server to them, and from there on to the indexers and SHs respectively.

Edit: If you are not using SH clusters at all, deploying the configurations with the deployment server to the standalone Search Heads is alright then.
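
To make the third point a bit more concrete: the deployment client on a master or deployer has to stage the received apps in the directory that node distributes from, not in its own etc/apps. The Python sketch below renders a deploymentclient.conf for either role. The deployment server address is a placeholder, and the setting names and directories are as I recall them for Splunk 6.x, so treat them as assumptions and verify them against deploymentclient.conf.spec for your version.

```python
# Where each kind of node stages apps for distribution
# (assumed Splunk 6.x directory names; verify for your version).
STAGING_DIRS = {
    "master": "$SPLUNK_HOME/etc/master-apps",
    "deployer": "$SPLUNK_HOME/etc/shcluster/apps",
}


def render_deployment_client_conf(role, deployment_server):
    """Render deploymentclient.conf for a master node or a deployer.

    `role` is "master" or "deployer"; `deployment_server` is host:port of
    the deployment server's management port (placeholder value below).
    """
    lines = [
        "[deployment-client]",
        # Keep the local repositoryLocation instead of the server's default.
        "serverRepositoryLocationPolicy = rejectAlways",
        f"repositoryLocation = {STAGING_DIRS[role]}",
        "",
        "[target-broker:deploymentServer]",
        f"targetUri = {deployment_server}",
    ]
    return "\n".join(lines) + "\n"


master_conf = render_deployment_client_conf("master", "ds.example.com:8089")
print(master_conf)
```

Note that this only gets the apps onto the master or deployer. Each of them still has to distribute the bundle to its own indexers or search heads afterwards.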

woodcock · ‎05-30-2017

You are forgetting BY FAR the cheapest option: site-specific Indexers w/ site-specific Search Heads plus a corporate Search Head.

You set up each region's Forwarders to send ONLY to that region's Indexer tier. You set up each region's single Search Head to peer with ONLY that region's Indexer tier. Then you set up a global/universal Search Head that peers with ALL the Indexers. We have done this several times for clients and it works very well.
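
The layout can be sketched as a small Python model of who sends to whom and who peers with whom. The region keys and indexer host names are hypothetical, and this only illustrates the topology; it is not Splunk configuration.

```python
# One indexer tier per region (hypothetical host names).
INDEXER_TIERS = {
    "africa": ["idx-africa-1"],
    "mesa": ["idx-mesa-1"],
    "us": ["idx-us-1"],
}


def forwarder_targets(region):
    """Forwarders send only to the indexer tier of their own region."""
    return list(INDEXER_TIERS[region])


def search_peers(search_head):
    """Return the indexers a search head peers with.

    A regional search head peers only with its own region's tier; the
    search head named "global" peers with every indexer.
    """
    if search_head == "global":
        return [idx for region in sorted(INDEXER_TIERS)
                for idx in INDEXER_TIERS[region]]
    return list(INDEXER_TIERS[search_head])


for head in ("africa", "mesa", "global"):
    print(head, "->", search_peers(head))
```

Because the Africa search head never peers with the MESA indexers, the isolation between regions comes from the topology itself and needs no per-role index restrictions.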

yunit11 · ‎05-30-2017

Hello, thanks for the reply.

What did you mean by a site-specific search head? Is it just a single search head attached to the indexer in a non-clustered configuration?
If I'm right, then the main issue with this approach is that the data will not be available from the headquarters' search head if the VPN tunnel is down. We should avoid this as much as possible.

woodcock · ‎05-30-2017

Yes, this is true, it won't work if VPN is down. I thought that your VPN comment was regarding the reliability (timeliness) of transport of the Forwarder data to the Indexers.
