Getting Data In

Intermediate Heavy Forwarder setup

mxanareckless
Path Finder

There doesn't seem to be much documentation or discussion online covering the setup of an intermediate heavy forwarder.

We need this for the following reasons:

* to scrub/anonymize personal information from data coming from universal forwarders (a sketch of what I have in mind follows this list)

* to reduce load on the indexing server, whose parsing queues are consistently full
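For the scrubbing piece, what I have in mind is a SEDCMD in props.conf on the heavy forwarder, since that should be the first full Splunk instance that parses the data. This is only a rough sketch; the sourcetype name and pattern below are made up:

Heavy forwarder's props.conf:

# hypothetical sourcetype; replace with the real one
[my_app_logs]
# mask anything that looks like a US SSN before it reaches the indexer
SEDCMD-mask_ssn = s/\d{3}-\d{2}-\d{4}/XXX-XX-XXXX/g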

Here is the deployment:

[uf] > [hf] > [indexer]

Does anybody have example .conf files that would support this? So far, mine look as such:

Universal forwarder's outputs.conf:

[tcpout]
defaultGroup = pspr-heavy-forwarder
[tcpout:pspr-heavy-forwarder]
disabled = false
server = 192.168.60.213:9997

Heavy forwarder's outputs.conf:

[tcpout]
defaultGroup = central-indexer
indexAndForward = false
sendCookedData = true
useACK = true

[tcpout:central-indexer]
disabled = false
server = 192.168.60.211:9997

Indexer's inputs.conf:

[default]
queue = indexQueue

I've directed all universal forwarders to send to the intermediate forwarder, but the main indexer is still showing saturated queues. Local monitoring is limited to Splunk's own logs. Is there a way I can view exactly what is going into these queues, so I know where to chase the problem?


1 Solution

gcusello
Legend

Hi @mxanareckless ,

I'm having a problem similar to yours, but mine is related to syslog forwarding to a third party, which you don't have!

There isn't any special configuration needed on the Heavy Forwarder, and you can use the default settings, because the bandwidth limits aren't present on HFs.

There are only two hints I can give you:

  • first, check the throughput of your storage: Splunk requires at least 800 IOPS (or more) on its disks. How many IOPS does your storage provide?
  • then, check the resources on your Indexers (especially CPUs!): Splunk requires at least 12 CPUs per Indexer, and more if you have to index many logs or run many scheduled searches.

You can measure IOPS using a tool like Bonnie++.

Disks usually are the bottleneck in Splunk architectures.

Then check the resources (again, mainly CPUs) on the HFs: I use HFs only when I need to concentrate flows, never to move work from the Indexers to another system; I prefer to give more resources to the Indexers.
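If you want a quick look at Indexer CPU from Splunk itself, you can search the _introspection index (the host filter is a placeholder, and I'm assuming the default resource-usage collection is running), something like:

index=_introspection sourcetype=splunk_resource_usage component=Hostwide host=<your_indexer>
| eval cpu_pct = 'data.cpu_system_pct' + 'data.cpu_user_pct'
| timechart avg(cpu_pct) AS avg_cpu_pct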

Ciao.

Giuseppe


isoutamo
SplunkTrust

Hi

unfortunately I haven’t our configurations on my hands, but those are really simple. 

UF: 

  • point to IHF(s)
  • useACK = true

IHF:

  • normal input listening for the UFs
  • output points to the CM with IDX discovery, or directly to the indexers
  • useACK = true
  • if needed, you could add parallel ingestion pipelines here

IDX:

  • normal input (without any queue definitions)

Please remember: if/when you need to make any props.conf, transforms.conf, etc. changes, those must be done on the first non-UF node in the path. Only the indexing-queue part of the processing happens on the indexer nodes! A minimal sketch of these files is below.
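Roughly, the .conf files would look something like this (the IPs and ports are reused from your post; group names are placeholders, and indexer discovery would replace the static server list if you go through the CM):

UF's outputs.conf:

[tcpout]
defaultGroup = intermediate_hf

[tcpout:intermediate_hf]
server = 192.168.60.213:9997
useACK = true

IHF's inputs.conf:

# plain splunktcp listener for the UFs
[splunktcp://9997]
disabled = false

IHF's outputs.conf:

[tcpout]
defaultGroup = central_indexer

[tcpout:central_indexer]
server = 192.168.60.211:9997
useACK = true

IHF's server.conf (only if the box has spare CPU cores for an extra pipeline):

[general]
parallelIngestionPipelines = 2

IDX's inputs.conf:

# plain listener, no queue overrides
[splunktcp://9997]
disabled = false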


r. Ismo 

isoutamo
SplunkTrust
SplunkTrust
And one way to monitor what is happening on those IHFs is to add them as indexers to the Monitoring Console (MC) in their own group.
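Until they're in the MC, a quick way to see how full the queues are from the internal logs (the host filter is a placeholder) is something like:

index=_internal source=*metrics.log* group=queue host=<ihf_or_indexer>
| eval fill_pct = round(current_size_kb / max_size_kb * 100, 1)
| timechart avg(fill_pct) by name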


mxanareckless
Path Finder

@gcusello 

 

I did suspect this was an NFS issue, as the bottleneck first appeared after I migrated the indexes over. It turns out Splunk was using the slower backup pool and not the IOPS-optimized pool. Thanks for pointing me in the right direction!
