I have three geographically separated sites where I am implementing a multisite Splunk indexer cluster. The master site will have (1) search head, (2) clustered indexers, (1) master node, and (1) deployment server. Each of the other two sites will have (1) search head and (2) clustered indexers. My main issue is ensuring that each local indexer cluster indexes only data that is produced in the geographic area in which it is located.
http://docs.splunk.com/Documentation/Splunk/6.2.0/Indexer/Sitereplicationfactor
Make your origin = your total like this:
site_replication_factor = origin:2,total:2
site_search_factor = origin:2,total:2
This configuration sets buckets to replicate only within the site where they were indexed, so a forwarder sending data into site1 will not have that data replicated outside of site1. (It seems like this is what you wanted; I just wanted to expand a little more on this setting.)
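For reference, a bucket's origin site is the site of the peer that first indexes it, and that comes from the `site` attribute in the peer's own server.conf. Below is a minimal sketch of a remote-site peer, assuming the site is named site2. The master hostname, replication port, and key are placeholders, not values from this thread:

```
# server.conf on an indexer (peer) in the remote site
[general]
# Assumed site name; it must be one of the names in the master's available_sites
site = site2

# Placeholder port for receiving replicated data from other peers
[replication_port://9887]

[clustering]
# Hypothetical master node address
master_uri = https://master.example.com:8089
mode = slave
pass4SymmKey = <your cluster key>
```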
I am aware of this. I have created a separate server class to hold all of my site-specific apps/conf files, and one of those files (the outputs.conf file) lists the indexers that belong to the remote site.
So I am pretty sure this all worked. I made the changes in the Master server.conf file and was able to bring all Splunk services back up. The only problem I am faced with now is the remote index cluster replicating over the wire. It is making painfully slow progress that will eventually lead to full replication at all sites.
I have one question. I have pushed Splunk to a few clients at my remote site via the deployment server that is located at my main site. Should those new clients be sending data to my remote search head, or does the replication have to complete first?
Thanks for the assistance.
Tom Forbes
Hey, those were just examples. If you read the docs carefully, you should be able to get rid of replication across the WAN altogether.
Forwarders should always send data to indexers, and deployment servers should only be used to manage forwarders, so "new clients send data to my remote search head" doesn't make sense to me.
new clients = forwarders.
I have multiple sites that are separated geographically. So when I refer to a remote search head or remote indexer cluster I mean the server instance that is not part of my main site that hosts (in addition to a search head and indexer cluster) a master node and deployment server.
Depends on what you put in the outputs.conf on the remote forwarders. If you're having them send to indexers in your main site, search heads in your main site should see the data as soon as it arrives. Search heads in the remote site wouldn't see the data at all because you wanted to keep data in the origin site only and in this case the origin will be the main site.
The origin is where the data is first indexed if that helps.
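As a rough illustration of the outputs.conf point above, a site-specific outputs.conf pushed to the remote forwarders might look like the sketch below. The group name and indexer hostnames are made up for the example, and 9997 is only the conventional receiving port, so substitute whatever your remote indexers actually listen on:

```
# outputs.conf deployed to forwarders in the remote site
[tcpout]
# Hypothetical group name
defaultGroup = site2_indexers

[tcpout:site2_indexers]
# Hypothetical hostnames for the two remote-site indexers
server = idx1.site2.example.com:9997,idx2.site2.example.com:9997
```

With the forwarders pointed only at their local indexers, the remote site becomes the origin site for that data.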
Do I make this change to my master only?
The answer to your question is in the link under syntax:
Configure the site replication factor with the site_replication_factor attribute in the master's server.conf file. The attribute resides in the [clustering] stanza, in place of the single-site replication_factor attribute. For example:
[clustering]
mode = master
multisite = true
available_sites = site1,site2
site_replication_factor = origin:2,total:3
site_search_factor = origin:1,total:2
What about the "available_sites" value?
available_sites MUST list all the site names in the multisite cluster as per this: http://docs.splunk.com/Documentation/Splunk/6.2.0/Indexer/Multisiteconffile
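Putting that together with the earlier origin = total suggestion, the master's server.conf for a three-site cluster could look roughly like this. It assumes the sites are named site1, site2, and site3 and that the master itself sits in site1, so adjust to your own naming:

```
# server.conf on the master node
[general]
# Assumed: the master lives in the main site
site = site1

[clustering]
mode = master
multisite = true
# Must list every site in the cluster (assumed names)
available_sites = site1,site2,site3
# origin = total keeps all copies in the site where the data was first indexed
site_replication_factor = origin:2,total:2
site_search_factor = origin:2,total:2
```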
To find that I googled "available_sites Splunk".