Hi All - I realize a number of variations of this question have been asked before, but it is still either illogical or not clear to me how to accomplish what I'm seeking.
We would like to create a scenario where we have multiple sites that each index their own data (with the site server acting as a funnel point for all UFs at that location and also as a SH for the data). I'd like this data, and only this data, to be searchable locally, but also have it flow into the enterprise repository where all data is searchable (via a separate SHC) along with data from all the other sites. Reading the documentation, it looks like indexAndForward is the way to go, but it also appears that there is an indexing cost on both sets of indexers.
I was hoping that someone has a solution to perform something like this without having to pay for the exact same data to be indexed twice (and even going into the same indexes, etc.).
Anyone have any thoughts?
Depends on your search types. If your searches are mostly transforming searches, then remote searches will not be a big issue over the WAN because not much data would be returned from the indexers (relative to the total data set). If they're primarily raw searches, then you're right, you'd want that data local (see https://docs.splunk.com/Documentation/Splunk/8.0.1/Search/Writebettersearches). Good Splunk users will write efficient searches; inexperienced ones will do things like "index = *" over all time. I've seen users try to search 9 billion events using a simple "stats count" query, and complain they didn't have a search memory quota that allowed it. I explained that their quota existed precisely to stop people from doing terrible searches like that, and then taught them how to do a tstats search...
There's really no way around the licensing with indexandforward, unfortunately (depending on your license of course - an unlimited license would not matter :).)
So, what I am reading is that there is no way to avoid the double license hit if I ingest in two places, even if one is a subset of the other?
Looking at outputs.conf, would it not be possible to configure the indexer to use the forwardedindex.<n>.blacklist = settings?
Legally, no, you have to pay for the amount of data ingested. If you ingest it twice (on the same indexer or different indexers), you pay for it twice.
If you blacklist an index, then you won't forward that data and won't be charged for it twice. But I don't think that's what you're looking for (correct me if I'm misinterpreting your request).
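For reference, here is a rough sketch of what index-and-forward with an index filter might look like in outputs.conf on the site indexer. The group name, host names, and the `local_only` index are placeholders made up for illustration, and the filter numbering assumes the default forwardedindex.0 through forwardedindex.2 entries are still in place, so check it against the outputs.conf spec for your version. Anything that is both indexed locally and forwarded still counts against the license twice; the blacklist only keeps the named index out of the forwarded stream.

```ini
# outputs.conf on the site indexer (sketch only; all names are hypothetical)
[tcpout]
defaultGroup = central_indexers
# Index the data locally in addition to forwarding it
indexAndForward = true
# Filters are tested in order starting from forwardedindex.0.
# This adds one more entry after the defaults to keep a
# site-only index (placeholder name) out of the forwarded data.
forwardedindex.3.blacklist = local_only
forwardedindex.filter.disable = false

[tcpout:central_indexers]
# Placeholder hosts for the enterprise indexers
server = central-idx1.example.com:9997, central-idx2.example.com:9997
```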
If it were me, I would see if having the central SHC reach out to the remote indexers delivers reasonable performance (remember, the slowest indexer to respond will affect the total time of the search), and work out what data you really need forwarded. Beware of things that might consume your bandwidth unnecessarily (do you need _internal data forwarded? It's free, but can be big if you have a large environment). You also haven't mentioned what kind of use you have for the data - if you're running Enterprise Security, your performance for your data summarization might be terrible if run over a WAN (and you can't use the summaries generated by the remote site, as each set of accelerated data is accessible by only one search head cluster).
You haven't really said much about quantity of data, and if you want all the data searchable at both sites.
For what you're describing, you don't really need to index all the data again at a centralised location; you can point your central SH or SHC at all locations and search them separately. For example, say you have an Indexer Cluster (or standalone indexers, it doesn't matter) in Asia that indexes data from local servers, plus a local SH in Asia that searches that data. Then you have the same setup in Europe, and another in the US (the locations don't really matter). At a central location you then set up a Search Head or a SHC and configure it to join the Asia/Europe/US Indexer Clusters, or just add the Indexers as Search Peers if you're not using Indexer Clustering. This way the central location searches all locations, but the SH at each location can only search its local data.
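As a minimal sketch of the central search head side, assuming the 8.0-era setting names: with indexer clusters the search head joins each cluster through server.conf, and without clustering the indexers are listed as search peers in distsearch.conf. All host names, labels, and keys below are hypothetical placeholders. Search peers are normally added through the UI or `splunk add search-server` so that the keys get exchanged; editing distsearch.conf by hand means distributing the keys yourself.

```ini
# server.conf on the central search head (clustered indexers; sketch only)
[clustering]
mode = searchhead
master_uri = clustermaster:asia, clustermaster:europe, clustermaster:us

[clustermaster:asia]
master_uri = https://asia-cm.example.com:8089
pass4SymmKey = <asia cluster key>

[clustermaster:europe]
master_uri = https://europe-cm.example.com:8089
pass4SymmKey = <europe cluster key>

[clustermaster:us]
master_uri = https://us-cm.example.com:8089
pass4SymmKey = <us cluster key>

# distsearch.conf on the central search head (non-clustered indexers instead)
[distributedSearch]
servers = https://asia-idx1.example.com:8089, https://europe-idx1.example.com:8089, https://us-idx1.example.com:8089
```

The local search head at each site would carry only its own cluster or peers, which is what keeps it limited to local data.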
See these .conf slides about a similar deployment using ES: https://conf.splunk.com/files/2016/slides/enterprise-security-multi-tenant-fundamentals.pdf. Just use this as an example, as the idea works with or without ES.
Hope this helps.
Thanks for the response! Much appreciated.
I am trying to avoid allowing users to execute searches that will effectively go across the WAN to indexers that are not geographically close, mostly because I don't want or need super large result sets going from the indexer (in Asia, for example) all the way to the SH (in Europe, for example). I'd like to keep the central data center searches hitting central data center indexers only, and the remote search heads hitting the remote indexers only.
It makes sense but I don't know if that is a big concern. How much data per day are you talking about?
I know a few customers that used this design to remotely search various small locations ingesting between 5GB and 50GB per day. Not all data is transferred from the Indexers to the Search Heads. It also depends on what kind of searches you perform from that central location.
One other idea, though I've never tried it: use multi-site clusters, and don't use a SHC, because you would need site affinity at the central location. For example, Asia has a multi-site cluster that keeps one copy on the original site and sends another copy to the remote site (a centrally located indexer). The local SH in Asia would be site aware and only search in Asia (unless the Asia indexer is down, in which case it would search the central indexer). Europe would be the same: one indexer in Europe, one in the central location. This way you would have 2 centrally located indexers, one in each multi-site cluster. The central SH would then use site affinity, so it would search only the centrally located indexers, falling back to the remote indexers only if the central indexer for that cluster is down.
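To make the multi-site idea a bit more concrete, here is an untested sketch of the server.conf pieces for one such cluster, assuming site1 is Asia and site2 is the central location, and using the 8.0-era setting names. Host names and keys are placeholders. As far as I know, copies replicated between peers in a cluster do not count against the license again, which is the attraction of this approach, but please confirm that for your license before relying on it.

```ini
# server.conf on the cluster master for the Asia/central cluster (sketch only)
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
# Keep one copy on the originating site and two copies in total,
# so the second copy lands on the other site
site_replication_factor = origin:1,total:2
# Make a copy searchable on each site so site affinity can work
site_search_factor = origin:1,total:2
pass4SymmKey = <cluster key>

# server.conf on the central search head for this cluster
[general]
# site2 = central location; searches prefer site2 peers
site = site2

[clustering]
mode = searchhead
multisite = true
master_uri = https://asia-cm.example.com:8089
pass4SymmKey = <cluster key>
```

The local Asia search head would be the same as the central one except for `site = site1`.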