Yesterday we observed that the indexing queues on our indexers were at around 90%+ fill.
This caused connections from our Heavy Forwarders (HFs) to the indexers to fail.
Once the indexing queues drained, data from the HFs started flowing to the indexers again and was written to disk.
I have a few questions about this:
Due to MAJOR improvements in S2S (the Splunk-to-Splunk protocol) and in the Universal Forwarder (UF) build, if you are on v6 (particularly the later v6 releases), you should only be using HFs for things like DBConnect. For things like syslog, you should DEFINITELY be using a Universal Forwarder. This is the answer to your question #3.
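One common way to do syslog with a UF is to let the syslog daemon (rsyslog or syslog-ng) write what it receives to files and have the UF monitor those files. A minimal inputs.conf sketch, assuming the daemon writes each sending host's messages to /var/log/remote/<hostname>/messages; that path and the index name are hypothetical:

```ini
# inputs.conf on the syslog server's Universal Forwarder.
# Hypothetical layout: the syslog daemon writes one directory per
# sending host, e.g. /var/log/remote/<hostname>/messages
[monitor:///var/log/remote/*/messages]
sourcetype = syslog
# Hypothetical index name; use the index your syslog data belongs in.
index = network
# Take the host field from the 4th path segment (<hostname>):
# var = 1, log = 2, remote = 3, <hostname> = 4
host_segment = 4
```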
This is our infrastructure:
Servers -> UF -> HF -> Indexers
Desktops -> UF -> HF -> Indexers
Syslog Servers -> HF -> Indexers
DBConnect HF -> Indexers
We are on version 6.4.4.
Your architecture is very v4-era, and that HF tier is now both an albatross around your neck and your bottleneck. With the updated v6 hotness, it should look like this:
Servers -> UF -> Indexers
Desktops -> UF -> Indexers
Syslog Servers -> UF -> Indexers
DBConnect HF -> Indexers
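A minimal outputs.conf sketch for the UFs in that layout, assuming three indexers listening on the conventional receiving port 9997. The group name primary_indexers and the host names are hypothetical; useACK is optional and only shown because indexer acknowledgment is commonly turned on:

```ini
# outputs.conf on every Universal Forwarder: send straight to the
# indexers, with no HF tier in between.
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# Hypothetical indexer host names; list every indexer directly.
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
# Let the forwarder itself rotate across the listed indexers.
autoLB = true
# Optional: indexer acknowledgment.
useACK = true
```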
The key on all the UFs is to set autoLB = true in outputs.conf, and also to set EVENT_BREAKER (together with EVENT_BREAKER_ENABLE) in props.conf for every sourcetype you forward, to ensure proper balancing. Do not use external load balancers, either.
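For the EVENT_BREAKER part, a props.conf sketch on the UF might look like this. The sourcetype name my_app_log is hypothetical, and the regex shown is the simple one-event-per-line case; multi-line events need a regex that matches your actual event boundary:

```ini
# props.conf on the Universal Forwarder, one stanza per sourcetype
# that needs balancing. "my_app_log" is a hypothetical sourcetype.
[my_app_log]
# Turn on the UF's lightweight event breaker so it can switch
# indexers on an event boundary instead of staying on one indexer
# for the whole stream.
EVENT_BREAKER_ENABLE = true
# Event boundary regex; this one breaks on newlines,
# i.e. one event per line.
EVENT_BREAKER = ([\r\n]+)
```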
Thank you for your input, @woodcock. Is there any documentation where this is published, so that I can read through it before proceeding with these major changes?
Judging by your response, you are suggesting that I remove the HF tier completely. Am I understanding this correctly?
AutoLB is set to true, with indexer acknowledgment (useACK) enabled.
Keep the HF for DBConnect only, and yes, ditch the rest. The documentation of this evolution is not as clear as it should be, but all of the testing I have seen matches the PS (Professional Services) scuttlebutt/buzz I have been hearing: best practices have evolved to exclude HFs except in a (very) few extreme circumstances. Here are a few places where it is documented to some extent:
https://www.splunk.com/blog/2014/03/18/time-based-load-balancing.html
http://docs.splunk.com/Documentation/Forwarder/6.6.0/Forwarder/Configureloadbalancing
https://docs.splunk.com/Documentation/Splunk/6.6.0/Admin/Outputsconf
forceTimebasedAutoLB = [true|false]
* Forces existing streams to switch to newly elected indexer every
  AutoLB cycle.
* On universal forwarders, use the EVENT_BREAKER_ENABLE and
  EVENT_BREAKER settings in props.conf rather than forceTimebasedAutoLB
  for improved load balancing, line breaking, and distribution of events.
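Since you are on 6.4.4: as far as I know, EVENT_BREAKER_ENABLE and EVENT_BREAKER were only added to props.conf in 6.5, so until the UFs are upgraded the time-based option quoted above may be the one available (please check the spec files for your exact version). A minimal sketch, with hypothetical indexer host names and group name:

```ini
# outputs.conf on a pre-6.5 Universal Forwarder, where EVENT_BREAKER
# is assumed not to be available yet.
[tcpout:primary_indexers]
# Hypothetical indexer host names.
server = idx1.example.com:9997, idx2.example.com:9997
# Force existing streams to switch to the newly elected indexer
# every autoLB cycle.
forceTimebasedAutoLB = true
# Length of one autoLB cycle, in seconds.
autoLBFrequency = 30
```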