Getting Data In

Recommended ports & best practices for intermediate forwarding?

dineshraj9
Builder

We have a requirement to add a Heavy Forwarder (HF) tier between our Universal Forwarders (UF) and Indexers (IDX).

Is there a recommended port for communication between UF -> HF?
I know that port 9997 can be used for communication between HF -> IDX.

I am aware that all of the above ports are configurable. I just want to know whether there are any recommendations or best practices for setting up intermediate forwarding.

1 Solution

sloshburch
Splunk Employee

Given all the details you provided (about filtering), I would encourage a slightly different approach, but first, to answer your question.

There is no recommendation around ports. You are welcome to use the default ones. If you prefer security through obscurity, you can use alternate ones. All ports are treated equally, so there's no difference there (assuming Splunk is not running as root, you'll be limited to port 1024 or above due to OS restrictions on Unix).

BUT, don't worry about that, because the impact of adding an HF to your topology is devastating compared to letting the indexers do the filtering directly. To elaborate:

  • Indexers can do filtering, and when they discard data it does NOT count against your license. If you find documentation that says the contrary, let us know. You can validate this with a small filtering test: the discarded events will not show up in your indexes, nor in index=_internal source=*license_usage* type=Usage.
  • The HF cooks the data. That means it adds event parsing, metadata, and a lot of extra payload. Conversely, the UF does the minimal amount of work to send the data along to the indexers. This is important because it means the HF has a lot more content to send across the wire to the indexers (and compress, assuming SSL), and in turn the indexers must unpack all of that, which is certainly more resource intensive than handling the UF's payload. This may seem subtle, but it is often the root cause of indexer performance issues.
  • Think about the way the UF load balances. With many UFs sending data, the data gets sprayed across all the indexers pretty evenly. When all the data is bottlenecked by an HF, it is concentrated into one stream bouncing among the indexers. The result is buckets with a significant density of data AND an uneven data load on the indexers. The impact of that is horrible search performance, because the map-reduce parallelization of searches is minimized. Because of this, we even encourage having roughly twice as many forwarders as indexers to ensure a smooth distribution of data. That may not always be practical, but it helps demonstrate the point made in this bullet.
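
To make the first bullet concrete, here is a minimal sketch of indexer-side filtering through the nullQueue. The sourcetype name my_sourcetype, the transform name drop_debug, and the regex are hypothetical placeholders that do not come from this thread, so adjust them to your own data.

```ini
# props.conf on the indexers (my_sourcetype is a placeholder sourcetype)
[my_sourcetype]
TRANSFORMS-drop_debug = drop_debug

# transforms.conf on the indexers
# Events matching REGEX are routed to the nullQueue and discarded before indexing.
# The regex below is only an example pattern.
[drop_debug]
REGEX = \sDEBUG\s
DEST_KEY = queue
FORMAT = nullQueue
```

Once the configuration has taken effect (index-time changes usually need a restart), the discarded events should be absent both from the target index and from the license usage search mentioned in the first bullet.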

That was a lot of stream-of-brain pre-coffee so shout if any of it is unclear.


dineshraj9
Builder

Thanks Slosh for the detailed response! Just one last question.
If I do a lot of filtering at the indexer level, will it impact indexer performance (increased CPU or indexing lag)?


sloshburch
Splunk Employee

Yes. Even a pebble dropping through the air makes a slight impact on the wind around it. (How's that for zen!)

Unfortunately, I can't think of a reliable way to measure the impact, but I would gamble that unless we're talking about a regex on every single event, and a really poorly written regex at that, you may not even be able to notice the difference.

The worst that happens is you make a business case for more indexers, which means your searches will perform that much better. So yes, there's a cost, but there are also benefits. You'll have to decide which approach produces the best net positive for you when everything is taken into account. Fair?


adonio
Ultra Champion

Hello there,
I haven't seen any best practices around it either, but I think it's very safe to keep the default ports on the HF, as I
have seen it done that way in many environments.
The best practice is to avoid using an intermediate forwarder between forwarders and indexers.
Hope it helps.


dineshraj9
Builder

My requirement is to filter events at the heavy forwarder, before I send them to the Indexer.

So can both hops communicate on 9997?

UF -> HF - 9997
HF -> IDX - 9997

Also, if I don't use a heavy forwarder and do the filtering at the indexers, will the filtered data consume any license?


thepittman
Engager

Yes, that works.
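
A minimal sketch of what both hops on 9997 could look like. The host names (hf1.example.com, idx1.example.com, and so on) and the group names hf_tier and indexers are hypothetical and do not come from this thread.

```ini
# inputs.conf on the heavy forwarders AND on the indexers:
# listen for forwarder traffic on 9997
[splunktcp://9997]
disabled = 0

# outputs.conf on the universal forwarders: send to the HF tier
[tcpout]
defaultGroup = hf_tier

[tcpout:hf_tier]
server = hf1.example.com:9997, hf2.example.com:9997

# outputs.conf on the heavy forwarders: send on to the indexers
[tcpout]
defaultGroup = indexers

[tcpout:indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
```

Listing more than one server per group lets each forwarder load balance across them, which matters for the data distribution concern raised in the accepted answer.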
