I am planning a multisite architecture.
I have 3 sites in 3 different locations (different countries across Europe). First, I need each site to store its indexed data locally, so that data does not cross the internet during indexing or replication, only during searches. Second, I need to be able to run searches across all 3 sites from a single search head located at one of the sites.
My first plan is a single indexer cluster (2 or 3 indexer peers at each site, with the master node and search head at one site), but I am not sure whether cluster replication can be set up so that indexed data is replicated only among the local peers at each site.
My second plan is 3 separate single-site indexer clusters (each with 2-3 indexer peers and its own cluster master) plus one search head at one of the 3 sites, but here I am not sure whether a single search head can search across all 3 single-site clusters.
Splunk gurus, please help me work out which of these 2 plans will work properly, and which would be the better choice given the requirements above.
If you don't require replication across sites, you do not need to (and probably should not) set up a multisite cluster. Instead, configure three independent indexing environments (whether each one is clustered depends on your HA/DR requirements).
There is no problem configuring a search head to search across multiple indexing environments. If you go with 3 indexer clusters, just configure the search head with all three cluster masters. The only requirement is WAN connectivity between the search head and the cluster masters/indexers on the management port (8089 by default).
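A minimal sketch of that search head configuration in server.conf, assuming three clusters whose masters are reachable at the hypothetical host names below. The stanza labels (site_a, site_b, site_c) and the secrets are placeholders. Newer Splunk releases may use the renamed settings (manager_uri, [clustermanager:...]), so check the docs for your version.

```ini
# server.conf on the single search head (sketch; host names and keys are placeholders)
[clustering]
mode = searchhead
# One entry per indexer cluster this search head should search
master_uri = clustermaster:site_a, clustermaster:site_b, clustermaster:site_c

[clustermaster:site_a]
master_uri = https://cm-a.example.com:8089
pass4SymmKey = <secret of cluster A>

[clustermaster:site_b]
master_uri = https://cm-b.example.com:8089
pass4SymmKey = <secret of cluster B>

[clustermaster:site_c]
master_uri = https://cm-c.example.com:8089
pass4SymmKey = <secret of cluster C>
```

After a restart, the search head learns the peer lists from each master and fans searches out to all three clusters over port 8089.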
There is also no issue with configuring search across both clustered and non-clustered environments. All of this is documented in detail in the Splunk documentation.
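As a rough sketch of the mixed case: non-clustered indexers are listed as ordinary search peers in distsearch.conf on the same search head, alongside the [clustering] settings in server.conf. The host names are hypothetical. Editing this file alone does not set up authentication, because each peer also needs the search head's public key. Adding peers with the `splunk add search-server` CLI command (or through Splunk Web) handles that exchange, so that is usually the simpler route.

```ini
# distsearch.conf on the search head (sketch; host names are placeholders)
[distributedSearch]
# Standalone (non-clustered) indexers, reached on the management port
servers = https://idx-standalone1.example.com:8089,https://idx-standalone2.example.com:8089
```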
The only thing you really need to be concerned about is how WAN latency will affect your overall search performance.
I would also take a closer look at the Distributed Deployment Manual for some basics, if you haven't already read it.
Hope this helps!
As ssievert says, if you don't need replication across sites, simple is better. For instance, that way you don't need to bother with search affinity, site-specific considerations, etc.
However, you might want to consider the replication factor. As your environment grows, and your HA requirements with it, multisite clustering may become more cost-effective, because it lets you control how many copies of the data are kept at each site. That can make quite a difference to the amount of storage you need. There is no rule of thumb here, so I would discuss this in detail with your Splunk architect.
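For comparison, here is a sketch of the first plan from the question: one multisite cluster that keeps every replicated copy on the site where the data was indexed. It assumes sites named site1 to site3, the master and search head at site1, and 2 copies per bucket. These are illustrative values, so verify the constraints on site_replication_factor and site_search_factor for your version before relying on them.

```ini
# server.conf on the cluster master (sketch; values are assumptions)
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2,site3
# Both copies of each bucket stay on its origin site, so replication never crosses the WAN
site_replication_factor = origin:2,total:2
site_search_factor = origin:2,total:2
pass4SymmKey = <cluster secret>

# server.conf on the search head (separate file on a separate instance)
[general]
# site0 disables search affinity; with origin-only copies it would not help anyway
site = site0

[clustering]
mode = searchhead
multisite = true
master_uri = https://cm.example.com:8089
pass4SymmKey = <cluster secret>
```

With total:2, this costs the same storage as three separate clusters with a replication factor of 2. Raising total (for example, keeping one extra copy at another site) is where the multisite storage trade-off mentioned above comes into play.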