We use an F5 load balancer in front of two intermediate forwarders to receive syslog messages.
Where can I investigate this issue?
Your options are to work on the F5 LB: speak to your network team about the VIP/pool/failover/iRules config and test to see what works best. (It's not my area of expertise; I'm only aware of the concepts.)
Note: The Splunk UF is not a load balancer in the networking sense. It has an automatic load-balancing function that sprays data across multiple indexers, if you have more than one, to even the data out; it is not designed to fail over to another UF based on load. The UF is an agent that collects data and sends it to Splunk.
This sounds like an LB issue, not a Splunk one.
As to why your F5 is not switching: it might be due to the continuous stream of syslog data being sent. You will need to check your F5 LB config options, such as round-robin or least connections, ensure it is configured for Layer 4 routing, and test it.
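To illustrate why a continuous stream might stay on one pool member, here is a minimal Python simulation. It assumes the load balancer picks a member once per connection (or flow) and keeps every later message on that member; this is a simplified model, not F5's actual behaviour, and the pool member names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def simulate_l4_round_robin(pool, connections):
    """Assign each connection to a pool member at connect time (round-robin).

    `connections` is a list of message counts, one entry per connection.
    Every message on a connection goes to the member chosen when it opened.
    Returns a Counter of messages received per pool member.
    """
    members = cycle(pool)
    received = Counter({member: 0 for member in pool})
    for message_count in connections:
        member = next(members)  # the decision is made once per connection
        received[member] += message_count
    return received


if __name__ == "__main__":
    pool = ["forwarder-1", "forwarder-2"]  # hypothetical pool member names
    # One syslog source holding a single long-lived connection open.
    print(simulate_l4_round_robin(pool, [10_000]))
    # The same volume spread over many short connections.
    print(simulate_l4_round_robin(pool, [100] * 100))
```

In this model a single long-lived stream never reaches the second member, while many short connections split evenly. Whether your F5 behaves this way depends on its actual configuration, so treat this only as a way to reason about what to test.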
Using Splunk instances such as HFs as syslog receivers is generally for testing and non-production environments.
Why? Because if you restart the HF you will lose data from UDP sources. Syslog is fire-and-forget, and as a protocol it is not ideal for load balancing, so this setup is only acceptable if you can live with losing data. Other issues you can run into are data imbalance on the indexers and data not being parsed correctly, as the TAs need reconfiguring to handle sourcetype/parsing when syslog is sent to Splunk receiver ports.
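The "fire and forget" point can be seen with a small Python sketch of a UDP syslog sender. The function name, the test port 5514, and the `<134>` priority value (facility local0, severity info) are assumptions chosen for illustration; the key observation is that the sender only learns that the datagram was handed to the OS, not that anything received it.

```python
import socket


def send_syslog_udp(message, host, port):
    """Send one syslog-style line over UDP.

    Returns the number of bytes handed to the OS. A successful return does
    not mean any receiver got the message: UDP has no acknowledgement.
    """
    payload = f"<134>{message}".encode("utf-8")  # <134> = local0.info
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # 127.0.0.1:5514 is a placeholder; point it at a test listener, not production.
    sent = send_syslog_udp("test message from sender", "127.0.0.1", 5514)
    print(f"handed {sent} bytes to the OS; delivery is not confirmed")
```

If the receiver is restarting while this runs, the sender typically sees no error and the message is simply gone, which is the data-loss risk described above.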
The best practice for syslog data in Splunk production environments is Splunk SC4S; if HA is required, look at Keepalived (Layer 4) or vMotion. SC4S can handle the data, apply metadata for parsing, and offers many other features to effectively handle common syslog data. LB and HA are two different concepts.
Thanks @deepakc for your reply.
In our deployment we run a UF on our rsyslog box. The rsyslog server writes each data source to its own file, and we then use "Monitor Files & Directories" as the data input.
As you mentioned, "F5 is not switching it might be due to the continuous stream of syslog data being sent".
- I believe the solution to the LB issue is to increase the size of these files on the UF itself; in that scenario the second UF would only take over if the first one goes down, because of the continuous stream of syslog data.
- Another suggestion: can we configure our LB to handle the following case?
If the first Universal Forwarder becomes overwhelmed by the continuous stream of syslog data, another UF takes over and handles the load.
Please advise on the best practice in this scenario.