Rsyslog failover and load balancing while forwarding logs to an FQDN (DNS) which has 2 Heavy Forwarder IPs configured in DNS Round Robin

simonselvin2019 · 10-22-2019

Two heavy forwarders (HFs) are configured to receive syslog input on UDP/TCP port 1600. The Linux servers are configured to send their logs to a single DNS name instead of an IP address. That DNS entry uses DNS round robin and contains the IPs of both heavy forwarders. The goal is load balancing, and, if one HF fails, the senders should fall back to the second HF listed in the DNS entry.
But this doesn't work with the rsyslog daemon on Linux, because it doesn't try the next HF. If one HF fails, rsyslog doesn't reach out to the second HF listed in the DNS entry, so the logs are not forwarded in spite of having two HFs configured with DNS round robin.
I need advice on this, please, and whether there is a workaround.
We are going agentless for Unix servers.

FrankVl · 10-22-2019

Yeah, rsyslog resolves the destination's DNS name only once and doesn't resolve it again until it is restarted.
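To see why round-robin DNS on its own gives neither failover nor dependable load balancing for a client that resolves only once, here is a toy Python simulation. Everything in it is hypothetical: `RoundRobinDNS` stands in for the DNS server (rotating its A records on every query), and the 192.0.2.x addresses stand in for the two HFs. It does not model rsyslog internals, only the resolve-once behaviour described above.

```python
class RoundRobinDNS:
    """Toy DNS server: each query returns the same A records, rotated by one."""

    def __init__(self, records):
        self.records = list(records)
        self.queries = 0

    def resolve(self):
        start = self.queries % len(self.records)
        self.queries += 1
        return self.records[start:] + self.records[:start]


def resolve_once(dns, up, attempts):
    """Resolve a single time (like rsyslog until restart) and reuse that IP."""
    target = dns.resolve()[0]
    return [target if target in up else None for _ in range(attempts)]


def resolve_every_attempt(dns, up, attempts):
    """Resolve again before every attempt and use the first IP returned."""
    results = []
    for _ in range(attempts):
        target = dns.resolve()[0]
        results.append(target if target in up else None)
    return results


HF1, HF2 = "192.0.2.10", "192.0.2.11"  # hypothetical HF addresses

# HF1 is down: the resolve-once client never reaches HF2.
print(resolve_once(RoundRobinDNS([HF1, HF2]), {HF2}, 4))
# -> [None, None, None, None]

# Re-resolving spreads the attempts, but half of them still hit the dead HF1.
print(resolve_every_attempt(RoundRobinDNS([HF1, HF2]), {HF2}, 4))
# -> [None, '192.0.2.11', None, '192.0.2.11']
```

The second result shows that even re-resolving is not enough by itself: a sender also has to retry the other address when a connection fails.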

One alternative is to use a network load balancer between your syslog sources and the Splunk servers.

Another could be to use a tool like keepalived on the Splunk servers to have them take over each other’s IP address when one goes down.

simonselvin2019 · 10-24-2019

Thanks for the response. I thought of the LB option, but then all the data would travel through the LB.
I was looking for a simple option, achieved through configuration, without adding extra components.
I tried forcing rsyslog to send logs using TCP instead of UDP, and I disabled the UDP data input on both my HFs.
I then enabled the TCP input on both HFs.
When I disable the TCP input on one of the HFs, I see that rsyslog starts sending the logs to the other HF automatically within 30-60 seconds. So failover is happening, but both HFs need to be in the same network tier as the servers. I am not sure whether this also does load balancing when both HFs are enabled, as I don't know of a way to check whether both HFs are receiving logs at the same time in a distributed manner.
This doesn't happen with UDP: it does not fail over (presumably because UDP is connectionless, so rsyslog never sees a broken connection it could react to).

FrankVl · 10-25-2019

Right, it sounds like rsyslog re-resolves the DNS record of its destination when the TCP connection times out. That is something, at least (hopefully it does that before data gets dropped). Maybe you can also tune the timeout in the rsyslog config to make it switch over more quickly.
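A minimal Python sketch of the client-side behaviour being described: look the name up again on every send, try each returned address in turn, and bound each attempt with a timeout (the knob suggested above for faster switching). This is not how rsyslog is implemented; it only illustrates the technique. `send_with_failover`, `resolve_all` and the `resolver` parameter are hypothetical names. Note that `getaddrinfo` may reorder the records and OS-level caches may serve a stale answer, so the order is not guaranteed to match the DNS server's rotation.

```python
import socket


def resolve_all(fqdn, port):
    """Fresh lookup: every distinct (ip, port) the resolver returns for fqdn."""
    addrs = []
    for *_, sockaddr in socket.getaddrinfo(fqdn, port, type=socket.SOCK_STREAM):
        addr = sockaddr[:2]  # (ip, port); IPv6 sockaddrs carry two extra fields
        if addr not in addrs:
            addrs.append(addr)
    return addrs


def send_with_failover(fqdn, port, payload, timeout=5.0, resolver=resolve_all):
    """Resolve fqdn on every call and try each address until one takes the data.

    `timeout` applies to each connection attempt and to the send, so a dead HF
    costs at most roughly that long before the next address is tried.
    """
    errors = []
    for addr in resolver(fqdn, port):
        try:
            with socket.create_connection(addr, timeout=timeout) as sock:
                sock.sendall(payload)
                return addr
        except OSError as exc:  # refused, unreachable or timed out
            errors.append((addr, exc))
    raise ConnectionError(f"no reachable target for {fqdn}: {errors}")
```

For example, `send_with_failover("syslog.example.com", 1600, b"<13>Oct 25 10:00:00 web01 app: test\n")` (hypothetical FQDN) returns the `(ip, port)` that accepted the message, or raises `ConnectionError` if no HF answered. A shorter `timeout` switches faster but risks giving up on an HF that is merely slow.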

Checking whether both HFs receive data in parallel can be done in various ways. You can check the incoming TCP connections on each HF (e.g. using netstat) or check for flowing data using tcpdump.
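On Linux, the connection table that netstat reports can also be read directly from `/proc/net/tcp`. The rough Python sketch below counts established TCP connections on local port 1600 per sending host; running it on each HF and comparing the results shows whether the senders are spread across both. It assumes IPv4 only (IPv6 connections live in `/proc/net/tcp6`, not handled here) and a little-endian machine, because the kernel prints addresses in host byte order. The function names are hypothetical.

```python
import socket
from collections import Counter

TCP_ESTABLISHED = "01"  # state code used in /proc/net/tcp


def decode(addr_hex):
    """Turn '0500000A:0640' into ('10.0.0.5', 1600).

    The kernel prints the IPv4 address in host byte order, so reversing the
    bytes assumes a little-endian host (e.g. x86-64).
    """
    ip_hex, port_hex = addr_hex.split(":")
    return socket.inet_ntoa(bytes.fromhex(ip_hex)[::-1]), int(port_hex, 16)


def senders_on_port(proc_net_tcp, port=1600):
    """Count ESTABLISHED connections to local `port`, keyed by remote IP."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_ESTABLISHED:
            continue
        _, local_port = decode(fields[1])
        if local_port == port:
            remote_ip, _ = decode(fields[2])
            counts[remote_ip] += 1
    return counts
```

On an HF you would call it as `senders_on_port(open("/proc/net/tcp").read())`. With rsyslog over TCP each sender typically holds one long-lived connection, so this shows which HF each source host is pinned to rather than per-message balance.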

Or look in metrics.log on each HF to see how much data it is processing per input, sourcetype, etc.
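A rough Python sketch of that comparison: sum the `kb` field per series for one throughput group in an HF's `metrics.log` (normally under `$SPLUNK_HOME/var/log/splunk/`). The line format assumed here, `... Metrics - group=per_sourcetype_thruput, series="syslog", kbps=..., kb=..., ev=...`, is from memory and should be checked against your own file before relying on it. `thruput_kb` is a hypothetical name; passing `group="per_host_thruput"` would break the totals down by sending host instead.

```python
import re
from collections import defaultdict

# key=value pairs; values may be double-quoted (e.g. series="syslog")
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def thruput_kb(lines, group="per_sourcetype_thruput"):
    """Sum the kb field per series for one metrics.log throughput group."""
    totals = defaultdict(float)
    for line in lines:
        if "Metrics -" not in line:
            continue
        fields = {key: value.strip('"') for key, value in PAIR.findall(line)}
        if fields.get("group") == group and "series" in fields and "kb" in fields:
            totals[fields["series"]] += float(fields["kb"])
    return dict(totals)
```

Running it against each HF's `metrics.log` over the same time window and comparing the totals gives a quick view of how evenly the syslog volume is split.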

Also: sending data straight to a TCP or UDP input on an HF is not the recommended approach. Best practice is to run a dedicated syslog daemon on the HF server that receives the syslog traffic and writes it to files (e.g. one file per source host, per hour or so), and then let Splunk monitor those files. This not only creates a local cache that can ride out short Splunk outages, but also helps with troubleshooting whether data is coming in (since you can look at the files the syslog daemon is writing).
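In practice the syslog daemon (rsyslog, syslog-ng) produces this layout through its own configuration; the Python sketch below only illustrates the "one file per source host, per hour" idea. The layout `<base>/<host>/<YYYY-MM-DD-HH>.log` and the function names are hypothetical.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def hourly_path(base_dir, host, when):
    """<base_dir>/<host>/<YYYY-MM-DD-HH>.log: one file per source host per hour."""
    # Keep the host name usable as a directory name and block tricks like "..".
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", host)
    if safe in ("", ".", ".."):
        safe = "unknown"
    return Path(base_dir) / safe / f"{when:%Y-%m-%d-%H}.log"


def write_message(base_dir, host, message, when=None):
    """Append one syslog message to its host/hour file and return the path."""
    when = when or datetime.now(timezone.utc)
    path = hourly_path(base_dir, host, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(message.rstrip("\n") + "\n")
    return path
```

Splunk would then monitor the base directory. With a layout such as `/var/log/remote/<host>/...`, the `host_segment` setting of the monitor stanza in inputs.conf (4 in that example) should set the host field from the directory name rather than from the HF.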

simonselvin2019 · 10-25-2019

Thanks Frank,
I will try looking into the metrics.log file.
We have configured the heavy forwarders to receive logs from the Unix hosts and forward them to the indexers, as we aren't using the Splunk Universal Forwarder there like we do for Windows.
So these HFs act as the centralized syslog server, which accumulates the logs and then sends them to the indexers in real time.
I understand the idea of forwarding the logs to a centralized syslog server and then on to the Splunk HF/indexers. Wouldn't there be too many components in this scenario? Also, I am not sure whether logs that are not forwarded by the system they originate from would end up with a different timestamp/hostname by the time they reach the HF/indexers; they could carry the timestamp of the centralized syslog server that forwards the data.
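On the timestamp/hostname question: in the classic BSD syslog format (RFC 3164) the originating host name and the event time are part of the message header itself, so a relay can keep them as long as it writes the received header instead of stamping its own; whether it does depends on the relay's output template. The small Python sketch below pulls those fields out of a line, using the example message from RFC 3164. `parse_rfc3164` is a hypothetical name, and RFC 5424 messages are not handled.

```python
import re

# RFC 3164 ("BSD syslog") header: <PRI>Mmm dd hh:mm:ss HOSTNAME MSG
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>\S+) "
    r"(?P<message>.*)$"
)


def parse_rfc3164(line):
    """Split a BSD-syslog line into its header fields, or return None."""
    m = RFC3164.match(line)
    if m is None:
        return None
    pri = int(m["pri"])
    return {
        "facility": pri // 8,  # PRI = facility * 8 + severity
        "severity": pri % 8,
        "timestamp": m["timestamp"],
        "hostname": m["hostname"],
        "message": m["message"],
    }


print(parse_rfc3164("<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8")["hostname"])
# -> mymachine
```

If the files written on the HF still start with the sender's own header like this, Splunk sees the original host and time, not those of the relay.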

FrankVl · 10-25-2019

The idea is that you run the syslog receivers on your current HFs. They just take over the UDP/TCP input part and write the data to files, and the HFs then read from those files instead of from the network.
