Getting Data In

Is there a way to quantify how much data is sent between search head and indexers?

mattbrowne
Engager

Hi,

I'm at the planning stage of designing a Splunk deployment for our global setup. I've been tasked with making it as lightweight on the network as possible, because our WAN links are expensive (in time and cost) and I can't get in the way of existing traffic. So I think I need to ignore the best-practice examples in which indexers all replicate their data between them, as that appears to be all about search performance. We're happy to accept slower searches in exchange for lower data replication cost.

Please point me at docs if this idea is covered but I haven't found anything myself.

I'm planning the following:

  • One indexer in each data center around the globe, with hosts sending their logs to their local indexer and nowhere else.
  • One search head in each data center and users will use the search head nearest to them.

Am I right in thinking that a search head will send a query to each indexer (or should I be saying search peer?), and each one will prepare a result set and send it back to the requesting search head, which collates the results and presents them to the user?

If that's all true and would work, is there a way to quantify how much data is sent between the indexers and the search head? Is it as simple as just the _raw values that meet the search criteria, with the search head doing any further processing?

Thanks in advance!


skalliger
Motivator

I am not a Splunk certified architect, so I would rather give you some tips than a complete answer. First tip: consider contacting Splunk Professional Services if you are planning such an environment.

Talking about your questions, there are a few things you might want to consider:

  • You only want to have one indexer per data center? Are they going to be VMs? I would set up at least two indexers in every data center as a clustered environment. But this would mean one additional master for every indexer cluster.
  • Depending on your expected search load, one search head may be insufficient. But if you go for a SH cluster, you would need at least 3 search heads.
  • When it comes to network traffic, always prefer Universal Forwarders over Heavy Forwarders if possible; see this blog post: http://blogs.splunk.com/2016/12/12/universal-or-heavy-that-is-the-question/
  • A SH will not necessarily send its query to all indexers that exist; that depends on your overall setup and configuration. You can define one or more master_uris on your search heads, which means that search head A might have only one master and therefore only one indexer cluster to search, while search head B may have two master_uris defined (multiple stanzas) and therefore be able to search multiple indexer clusters.
  • When the SH sends a search to its indexers, the search gets split into several parts. Depending on your searches, the load (results) might be very different. Whereas distributed streaming commands are run on the indexers, centralized and transforming commands are run on the search heads. For a better understanding, browse Splunk's conf2016 slides for "Behind the Magnifying Glass: How Search Works".
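
To make the last point a bit more concrete, here is a toy Python model of why the amount of data crossing the WAN depends so much on the search. It is not Splunk code and uses no Splunk API: the peer names, the event layout, and the use of JSON length as a stand-in for wire size are all made up for illustration. It also assumes that a simple count-by-status search can be partially aggregated on each peer before anything is sent back, which is worth checking against the slides mentioned above.

```python
import json


def make_events(n, host):
    """Generate n synthetic events for one hypothetical indexer (search peer)."""
    events = []
    for i in range(n):
        status = 500 if i % 10 == 0 else 200  # every 10th event is an error
        raw = f"2017-01-01T00:00:{i % 60:02d} host={host} status={status} bytes={i}"
        events.append({"_raw": raw, "status": status})
    return events


def payload_size(obj):
    """Rough stand-in for bytes on the wire: length of the JSON encoding."""
    return len(json.dumps(obj).encode("utf-8"))


def peer_raw_results(events, status):
    """Case 1: the peer only filters and ships every matching raw event."""
    return [e["_raw"] for e in events if e["status"] == status]


def peer_partial_counts(events):
    """Case 2: the peer pre-aggregates and ships only a small count table."""
    counts = {}
    for e in events:
        counts[e["status"]] = counts.get(e["status"], 0) + 1
    return counts


def merge_counts(partials):
    """Search-head side: combine the partial tables from all peers."""
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    return total


# Three hypothetical data centers, one indexer each (names are invented).
peers = {name: make_events(1000, name) for name in ("dc-london", "dc-tokyo", "dc-nyc")}

# Bytes sent back if every peer returns matching raw events.
raw_bytes = sum(payload_size(peer_raw_results(ev, 500)) for ev in peers.values())

# Bytes sent back if every peer returns only its partial counts.
partials = [peer_partial_counts(ev) for ev in peers.values()]
agg_bytes = sum(payload_size(p) for p in partials)
totals = merge_counts(partials)

print("raw events returned: ", raw_bytes, "bytes")
print("partial counts only: ", agg_bytes, "bytes")
print("merged totals:       ", totals)
```

In this model the raw-event case grows with the number of matching events, while the pre-aggregated case grows only with the number of distinct values per peer. Real numbers will also depend on compression, field extraction, and which parts of a given search actually run on the peers, so treat this as a way to reason about the question, not as a measurement.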

Edit: typo

Skalli


mattbrowne
Engager

Hi,

Thanks for the reply. We're talking with a Splunk representative as well, but it's useful (and quicker) to get the views of the community at times!

I'll take a look at the resources you mention; they seem very useful.

Thanks


kirilb123
New Member

I'm interested to know: what was the final solution you arrived at?
