Getting Data In

heavy forwarder network traffic requirements

Contributor

hey guys -

a few questions we've got regarding the forwarding of indexed data between a heavy forwarder and indexers. we are deploying splunk across several regions connected by WAN links, so we're trying to be mindful of how we deploy the forwarders and what to expect.

  1. is there a formula for calculating bandwidth requirements for forwarded indexes? (i.e. the ratio of total index size to transfer size if using encryption and/or compression)
  2. when forwarding indexed data between indexers - is the transfer steady or bursty (scheduled and/or at regular intervals)? (using this for WAN link capacity planning)
  3. is there some rule of thumb or formula for estimating search traffic bandwidth utilisation to search peers? (would like to use this for determining whether to place indexers at remote branch sites or colocate them at the head office site)

Communicator

A heavy forwarder is aware of props, so it parses data before sending it, and it can compress the stream if compression is enabled in outputs.conf. It's really hard to determine your bandwidth requirements, since they depend entirely on your environment: what you're logging (raw log files or scripted inputs), how many forwarders you have, what OS they run, how chatty the logs on each forwarder are, and so on.
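Since there is no universal formula, one way to get a rough number is to plug your own measurements into a back-of-the-envelope calculation. The sketch below is only that: the function names, the `overhead_factor` multiplier, and the `peak_to_average` multiplier are assumptions made for illustration, not Splunk settings, and the compression ratio has to be measured on your own data.

```python
SECONDS_PER_DAY = 86_400


def average_link_mbps(daily_raw_gb, compression_ratio=1.0, overhead_factor=1.0):
    """Average sustained rate (Mbit/s) needed to forward one day's data.

    daily_raw_gb      -- raw data produced per day, in GB (10**9 bytes)
    compression_ratio -- raw size / on-wire size; 1.0 means no compression
                         (assumption: measure this on your own logs)
    overhead_factor   -- multiplier for metadata/protocol overhead
                         (assumption: not a published Splunk figure)
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    wire_bytes = daily_raw_gb * 1e9 * overhead_factor / compression_ratio
    return wire_bytes * 8 / SECONDS_PER_DAY / 1e6


def peak_link_mbps(average_mbps, peak_to_average):
    """Scale the average by a peak-to-average ratio taken from your own
    traffic graphs (hypothetical parameter, environment specific)."""
    return average_mbps * peak_to_average


if __name__ == "__main__":
    avg = average_link_mbps(daily_raw_gb=10.8, compression_ratio=4.0)
    print(f"average: {avg:.2f} Mbit/s, peak: {peak_link_mbps(avg, 3.0):.2f} Mbit/s")
```

The result is an average over 24 hours. Logs are rarely spread evenly across the day, so size the link against the peak figure rather than the average.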

That being said, Splunk is quite tolerant of network slowdowns and outages, especially when you use multiple indexers with auto load balancing, which is highly recommended when you need to scale out indexers or want to ensure that your data is always being received somewhere.

Contributor

thanks adam. to clarify:

1 - i'm mostly curious as to what compression is used, to estimate the actual transfer sizes (regardless of speed).
3 - i'd like to see some sizing guidance regarding search traffic. for instance, how much search data does an average splunk user generate per day, including queries and responses? (there must be a figure, because there is already a one-CPU-per-user guideline.)
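For point 1, a rough way to estimate transfer sizes without knowing the exact wire format is to compress a sample of your own logs and look at the ratio. The sketch below uses Python's `zlib` purely as a stand-in. It does not reproduce what the forwarder puts on the wire, the `compression_ratio` helper is a hypothetical name, and the sample line is synthetic, so treat the result as a ballpark only.

```python
import zlib


def compression_ratio(sample: bytes, level: int = 6) -> float:
    """Return raw size divided by zlib-compressed size for a log sample.

    A ratio of 4.0 means the sample shrank to a quarter of its raw size.
    zlib is a stand-in here; the real on-wire ratio may differ.
    """
    if not sample:
        raise ValueError("sample must not be empty")
    return len(sample) / len(zlib.compress(sample, level))


if __name__ == "__main__":
    # Synthetic, highly repetitive sample. Real logs usually compress less
    # well than this, so read a chunk of an actual log file instead.
    sample = b"host=web01 status=200 bytes=512 uri=/index.html\n" * 1000
    print(f"ratio on synthetic sample: {compression_ratio(sample):.1f}")
```

Running this against a few representative files per sourcetype gives a range of ratios rather than a single number, which is more useful for planning.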

as for #2 - this may already be answered... according to splunk, forwarding is pretty much a FIFO queue, so the data goes out as fast or as steadily as it comes in.
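To see what the FIFO behaviour means for a WAN link, here is a toy simulation. It is not Splunk's implementation: `simulate_fifo`, the tick-based model, and the KB figures are all assumptions chosen for illustration. It shows that output tracks input while the link has headroom, and that a burst larger than the link capacity drains over the following ticks.

```python
from collections import deque


def simulate_fifo(arrivals_kb, link_kb_per_tick):
    """Simulate a FIFO queue drained by a link with fixed capacity per tick.

    Returns (sent_per_tick, backlog_per_tick), both lists in KB.
    """
    queue = deque()
    sent, backlog = [], []
    for arriving in arrivals_kb:
        if arriving:
            queue.append(arriving)
        budget = link_kb_per_tick
        sent_now = 0
        # Drain the oldest data first until the link budget is used up.
        while queue and budget > 0:
            chunk = queue[0]
            take = min(chunk, budget)
            sent_now += take
            budget -= take
            if take == chunk:
                queue.popleft()
            else:
                queue[0] = chunk - take
        sent.append(sent_now)
        backlog.append(sum(queue))
    return sent, backlog


if __name__ == "__main__":
    print(simulate_fifo([5, 5, 5], link_kb_per_tick=10))
    print(simulate_fifo([30, 0, 0, 0], link_kb_per_tick=10))
```

In the first call the link is never saturated, so every tick sends exactly what arrived. In the second, a 30 KB burst takes three ticks to clear a 10 KB-per-tick link, which is the case to plan for on a constrained WAN.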