Getting Data In

HFs as intermediate forwarders behind an F5: how to add the HF hostname to all logs?

GaetanVP
Contributor

Hello Splunkers,

I am currently using an F5 load balancer in front of two HFs that are used as intermediate forwarders and also do the parsing for incoming data.

I would like to create a new index-time field for all logs passing through my HFs that indicates which HF did the job.

In other words, I want to keep track of which HF was chosen for each log.

I suppose I need to use a props.conf file, but I do not know where to place it, and I do not know how to dynamically set a field to the hostname of the machine.

By the way, I am using a DS to deploy apps on my HFs, and I would like to avoid any custom/manual configuration on each HF.

Thanks a lot,

GaetanVP

 

 


PickleRick
SplunkTrust
SplunkTrust

Just as a side note - if you're receiving tcp/udp on a network port on your forwarder, you can simply set the source for each input listening on a port (if you have just one input, you'll have just one source). If you don't set it, the source gets named as protocol:port (for example tcp:8514), but you can set it to say "forwarder1:8514" and "forwarder2:8514" and you don't have to fiddle with custom fields.
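A minimal sketch of that approach, assuming a hypothetical raw TCP input on port 8514 (the source value would differ in each HF's inputs.conf):

# inputs.conf on HF1 (port, source name and sourcetype are examples)
[tcp://8514]
source = forwarder1:8514
sourcetype = syslog

# same stanza on HF2, with source = forwarder2:8514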

GaetanVP
Contributor

Hello @PickleRick,

That's a nice trick indeed, but I would not be able to use my DS anymore to deploy inputs.conf on both HFs - since "forwarder1" and "forwarder2" are hardcoded.

I would prefer to keep a generic inputs.conf (deployed by the DS) in my specific app location and have a custom file in /etc/system/local on each HF.

Or maybe I am missing something!

But your suggestion is great and can fit a lot of other situations, thanks 👍


PickleRick
SplunkTrust
SplunkTrust

You can have several different apps with different definitions of an input and push them to the different forwarders. That's what serverclasses are for 😉

And try not to touch etc/system/local. Those settings are not overridable by apps, so if you put something there, you can't manage it with the DS anymore.
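A minimal serverclass.conf sketch of that idea (serverclass names, app names and hostnames below are hypothetical):

[serverClass:hf1_inputs]
whitelist.0 = hf1.example.com

[serverClass:hf1_inputs:app:hf1_source_override]
restartSplunkd = true

[serverClass:hf2_inputs]
whitelist.0 = hf2.example.com

[serverClass:hf2_inputs:app:hf2_source_override]
restartSplunkd = true

Each per-HF app then carries only the setting that differs, while the common inputs.conf stays in the generic app pushed to both HFs.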

PickleRick
SplunkTrust
SplunkTrust

Expanding on @gcusello 's answer.

While you can add the _meta parameter to your splunktcp input on each HF (each value must be unique so that you can track each HF separately - that's the whole idea here), you have to remember that this setting will overwrite any _meta you might have created earlier (on your source UFs). It might not be an issue in your case, but it's good to know - I use a custom field to distinguish between source environments, so it would be unacceptable for me.
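A minimal sketch of that per-HF setting, assuming the usual splunktcp receiving port 9997 and the field name used later in this thread (the value would be HF2 on the second forwarder):

[splunktcp://9997]
_meta = hf::HF1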

And you should _not_ use an external LB in front of your HFs. Just use Splunk's internal load-balancing strategy. If you have some external source writing to HEC inputs then by all means - deploy an HTTP LB in front of your HFs - but for S2S communication just go straight to the HFs.
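For reference, Splunk's built-in load balancing on the sending UFs is just a matter of listing both HFs in outputs.conf (hostnames and port are examples):

[tcpout:intermediate_hfs]
server = hf1.example.com:9997, hf2.example.com:9997

autoLB is enabled by default, so events are spread across both HFs and traffic fails over if one of them goes down.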

GaetanVP
Contributor

Thanks @PickleRick,

All clear on your first point, good to know!

For the second point, I am not sure I understand. I have many use cases where a network appliance (on which a UF cannot be installed) sends udp or tcp logs to a specific port. So instead of targeting a single HF, I target an F5 virtual IP and then load balance to my two Splunk HFs - it keeps the protocol and port.

What is wrong with that approach? As far as I know, Splunk does not provide an HA feature for HFs listening for incoming network data.

Also, I see a lot of Splunk add-ons that need to be installed on an HF just to pull data from network appliances, but with that pulling mechanism you always end up with your HF being a single point of failure...

I made this post in the past where we discussed the pulling limitation: https://community.splunk.com/t5/All-Apps-and-Add-ons/Multiple-HF-for-one-Event-Hub-Splunk-Add-on-for...

Thanks a lot for always sharing your knowledge!

 


gcusello
SplunkTrust
SplunkTrust

Hi @GaetanVP,

as @PickleRick said: the load balancer is useful for receiving syslog or HEC traffic, but it isn't useful for Universal Forwarders that send logs to the HFs, because Splunk has an automatic load-balancing feature for its internal communications.

Ciao.

Giuseppe

GaetanVP
Contributor

Hello @gcusello, yes understood.

But in my use cases I am talking about logs coming in directly over udp or tcp, not from a UF.

For instance, the Palo Alto documentation literally says to configure a udp or tcp input (via GUI or CLI) to forward data from the Panorama instance to a Splunk indexer or HF:

https://splunk.paloaltonetworks.com/firewalls-panorama.html#gui

So here, if I do not want a single point of failure, I need a load balancer between the Panorama machine and my 2 HFs listening for incoming network logs.

I also cannot tell my Panorama to send the logs to both HFs, otherwise I would end up with duplicated events.

Thanks for your time,

GaetanVP

 


PickleRick
SplunkTrust
SplunkTrust

Ok. You wrote "intermediate forwarder", which in Splunk terminology means a forwarder that receives data from another forwarder (a setup typically used for gathering data from isolated environments from which you don't want to open traffic to your Splunk environment directly, instead using an intermediate forwarder placed in a DMZ).

Secondly, as I said before, for just receiving tcp/udp streamed "syslogs" a UF is enough. You don't need an HF.

But it's generally not the best idea to receive network streams directly on a Splunk forwarder (performance reasons, no way of recording the network stream metadata - you don't know which IP the event came from - and longer interruptions in case of a restart, especially on an HF).

And always take non-Splunk-provided instructions with a grain (or sometimes even a spoonful) of salt - they are often written by people not very proficient in splunking and might only cover the easiest or most typical options and scenarios.

GaetanVP
Contributor

Hi again @PickleRick, thanks again for all this information (you even made me laugh with the last sentence)!

So basically, for the Palo Alto use case above, how would you have done it?


PickleRick
SplunkTrust
SplunkTrust

Remember that if you have a single load balancer, you have just moved the SPOF to that LB 🙂

So the question is where you want this SPOF to be (or, in other words, which solution you trust the most to be resilient to outages).

Anyway, for syslog there are usually several options, and choosing the appropriate one often comes down to local conditions (and sometimes politics :-)).

You can of course receive plain tcp/udp syslog on a network port on a forwarder (again - it doesn't have to be an HF, a UF suffices). Apart from the possible performance reasons and the downtime during forwarder restarts, you need separate inputs on different ports for each sourcetype, which quickly becomes hard to manage. But for small installations it can be sufficient. You can put an LB in front of that, but typically if you have so much data and it's so important that you want an LB... you probably want a different option for receiving the events than listening directly on the forwarder.

You can use SC4S.

You can use rsyslog receiving syslog and sending events to HEC output(s).

You can use any syslog daemon which writes events to files and have a UF pick up the events from those files (this was the recommended option before SC4S).
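A minimal sketch of that option (paths, template name and sourcetype are hypothetical): rsyslog writes one file per sending host, and a UF monitors the directory:

# rsyslog: one file per sending host
$template PerHostFile,"/var/log/remote/%HOSTNAME%/syslog.log"
*.* ?PerHostFile

# inputs.conf on the UF
[monitor:///var/log/remote/*/syslog.log]
sourcetype = syslog
host_segment = 4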

In any case, instead of using an LB, you can run two (or more) syslog receivers on hosts coupled with keepalived or a similar solution, where you have a floating IP, and if one node fails, the IP fails over to the other one (kinda like what an LB cluster should do if you have one).
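A minimal keepalived sketch of that floating-IP idea (interface, router id and virtual IP are hypothetical; the second node would run the same block with state BACKUP and a lower priority):

vrrp_instance splunk_syslog {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.0.2.50
    }
}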

GaetanVP
Contributor

Hello @PickleRick,

Haha you're right... My LB is now the SPOF! I had never thought about that... 🤔

I will definitely take a look at Splunk Connect for Syslog, I have never tried it so far.

Thanks again for your help,

GaetanVP


PickleRick
SplunkTrust
SplunkTrust

Still, a single SC4S (or any other syslog server) will still be a SPOF. Designing HA solutions (especially if the protocols you're using are not "HA-friendly") is not that easy 😉

gcusello
SplunkTrust
SplunkTrust

Hi @GaetanVP,

your approach is correct, and it's the one I usually try to apply in all my projects.

Ciao.

Giuseppe

GaetanVP
Contributor

Ok, thanks @gcusello, I will try to use the answer you gave above.

Bye,

GaetanVP


gcusello
SplunkTrust
SplunkTrust

Hi @GaetanVP,

you have to create a metadata field by adding a different inputs.conf to each HF:

[default]
_meta = hf:HF1

where hf is the field name and HF1 (or HF2) is the field value.

Then, on the indexers, you have to add a conf file called "fields.conf" containing:

[foo]
INDEXED = true

Ciao.

Giuseppe

GaetanVP
Contributor

Hello @gcusello,

I tried your solution but it is not working...

In fields.conf, the [foo] stanza should be [hf] here, right?

Do we agree that afterwards I should see the hf field/value when I do a search, right?

Thanks a lot,

GaetanVP 


gcusello
SplunkTrust
SplunkTrust

Hi @GaetanVP,

sorry, my mistake:

[default]
_meta = hf::HF1
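For reference, once hf is declared as indexed in fields.conf you can verify it directly against the index, for example:

| tstats count where index=* by hf

or, to filter on a specific forwarder:

index=* hf::HF1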

Ciao.

Giuseppe

GaetanVP
Contributor

Please do not apologize!

It's working like a charm now 👍

Thanks,

GaetanVP


GaetanVP
Contributor

Hello @gcusello,

Ok, it means I cannot have a generic inputs.conf deployed on both HFs, right?

Because HF1 is basically a hardcoded value, there is no Splunk variable to say "for this field, look at the hostname of the machine itself", right?

If I want to add the metadata to all my logs, can I create the inputs.conf files in the /etc/system/local path of the HFs? Will the [default] stanza apply to all logs?

I've never worked with fields.conf... I will take a look at it, thanks


gcusello
SplunkTrust
SplunkTrust

Hi @GaetanVP,

answering your questions:

to my knowledge, there isn't an environment variable to assign as a meta value.

yes: the [default] stanza will apply to all logs.

fields.conf: me too, I discovered it when I had a problem like yours.

Ciao.

Giuseppe
