Is there a way to quantify how much data is sent between search head and indexers?

mattbrowne · ‎01-26-2017

Hi,

I'm at the planning stage of designing a Splunk deployment for our global setup. I've been tasked with making it as lightweight on the network as possible, as our WAN links are expensive (in time and cost) and I can't get in the way of existing traffic. So I think I need to ignore the best-practice examples of having indexers all replicate their data between them, as that appears to be all about search performance. We're happy to accept slower searches in exchange for lower data replication cost.

Please point me at the docs if this idea is covered; I haven't found anything myself.

I'm planning the following:

One indexer in each data center around the globe, with hosts sending their logs to their local indexer and nowhere else.
One search head in each data center and users will use the search head nearest to them.

Am I right in thinking that a search head will send a query to each indexer (or should I be saying search peer?), and that each will prepare a result set and send it back to the requesting search head, which collates and presents the results to the user?

If that's all true and would work, is there a way to quantify how much data is sent between the indexers and the search head? Is it as simple as just the _raw values that meet the search criteria, with the search head doing any further processing?

Thanks in advance!

skalliger · ‎01-26-2017

I am not a Splunk certified architect, so I would rather give you some tips than a complete answer. First tip: I would consider contacting Splunk Professional Services if you are planning such an environment.

Talking about your questions, there are a few things you might want to consider:

You only want to have one indexer per data center? Are they going to be VMs? I would set up at least two indexers in every data center as a clustered environment. But this would mean one additional master for every indexer cluster.
Depending on your expected search load, one search head may be insufficient. But for a SH cluster, you would need at least 3 search heads.
When it comes to network traffic, always prefer Universal Forwarders over Heavy Forwarders if possible; see this blog post: http://blogs.splunk.com/2016/12/12/universal-or-heavy-that-is-the-question/
A SH will not necessarily send its query to all indexers that exist; that depends on your overall setup and configuration. You can define one or more master_uris on your search heads, which means that search head A might have only one master and therefore only one indexer cluster to search, while search head B may have two master_uris defined (multiple stanzas) and therefore be able to search multiple indexer clusters.
When the SH sends a search to its indexers, the search gets split into several parts. Depending on your searches, the load (results) might be very different. Whereas distributed streaming commands are run on the indexers, centralized and transforming commands are run on the search heads. For a better understanding, browse Splunk's .conf2016 slides for "Behind the Magnifying Glass: How Search Works".
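
As a rough way to reason about the traffic question, here is a back-of-the-envelope sketch in Python. It is not a Splunk API and not a measurement: it assumes the simplest case, where every matching event is shipped unmodified from a remote indexer to the search head, and it ignores compression, protocol overhead, and any reduction done on the indexers. The `Peer` class, the site names, and all numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """A hypothetical search peer (indexer) and what one search matches on it."""
    name: str
    site: str
    matching_events: int   # assumed number of events that meet the search criteria
    avg_event_bytes: int   # assumed average size of one returned event


def wan_bytes(peers, search_head_site):
    """Rough upper bound on bytes crossing the WAN for one search.

    Peers in the same site as the search head are skipped, since
    their results never leave the data center.
    """
    return sum(
        p.matching_events * p.avg_event_bytes
        for p in peers
        if p.site != search_head_site
    )


# Made-up example: three data centers, one indexer each.
peers = [
    Peer("idx-london", "london", 50_000, 400),
    Peer("idx-tokyo", "tokyo", 20_000, 400),
    Peer("idx-nyc", "nyc", 30_000, 400),
]

print(wan_bytes(peers, "london"))
```

For real numbers you would replace the placeholders with event counts and sizes observed in your own environment; a search that filters tightly will sit well below this bound.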

Edit: typo

Skalli

mattbrowne · ‎01-31-2017

Hi,

Thanks for the reply. We're talking with a Splunk representative as well, but it's useful (and quicker) to get the views of the community at times!

I'll take a look at the resources you mention; they seem very useful.

Thanks

kirilb123 · ‎08-22-2017

I am interested in what final solution you arrived at.
