As the title suggests, I'm looking into whether or not it's possible to load balance Universal Forwarder hosts that are also running rsyslog.
To ask pointedly: is there anyone here doing something like this?
The rsyslog config on each host is quite complex. I'm using 9 different custom ports for up to 20 different source devices. If you're curious, it's set up like this: port xxxx is used for PDUs, port cccc for switches, port vvvv for routers, etc. The Universal Forwarders then send the data directly to Splunk Cloud.
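For context, the layout is roughly like the sketch below (the port numbers and file paths are placeholders, not my real values):

    # /etc/rsyslog.d/10-network-devices.conf -- illustrative layout only
    module(load="imudp")    # UDP syslog listener support
    module(load="imtcp")    # TCP syslog listener support

    # One listener per device class, each bound to its own ruleset
    input(type="imudp" port="5514" ruleset="pdus")
    input(type="imudp" port="5515" ruleset="switches")
    input(type="imudp" port="5516" ruleset="routers")

    # Each ruleset writes to its own file for the UF to monitor
    ruleset(name="pdus")     { action(type="omfile" file="/var/log/remote/pdus.log") }
    ruleset(name="switches") { action(type="omfile" file="/var/log/remote/switches.log") }
    ruleset(name="routers")  { action(type="omfile" file="/var/log/remote/routers.log") }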
It's likely not the best, and it's certainly not pretty, but it gets the job done. Currently there are two dedicated UF hosts, one at each of two physical sites. These sites are being combined into a single colo, hence the LB question.
Thanks!
So what I'm understanding here is that, yeah, syslog itself doesn't lend itself well to being load balanced. A Splunk Universal Forwarder should not be used to receive syslog traffic, but a Heavy Forwarder "can".
And if I want to load balance a pair of Splunk UFs, I should set up a real load balancer. And if I want to LB syslog, I should also set up a real LB and keep the syslog and Splunk functions on dedicated hosts. Does this look accurate?
The question is whether you really want LB or whether you're mistaking it for HA.
And receiving syslog directly on a Splunk box, regardless of whether it's a UF or an HF, is not a great idea.
No, I'm not confusing the two; I'm well aware of the differences. My scenario is that I currently have about a hundred devices sending syslog data to two receivers at two sites. That data is then picked up by two UFs and forwarded to Splunk Cloud.
When those two sites get rolled up into a single colo I'll need to combine them (for lack of a better word). Hence the load balancing. HA would be fine, if I could determine whether both the UF and rsyslog can be operated in a high-availability setup. I'm pretty sure the former is possible; the latter, though, not from what I understand.
Ok, from the start. The source.
Syslog is meant to be a simple, low-overhead protocol for sending data "locally".
The source typically can send events to one destination. Some sources can send the same event to multiple destinations at the same time (sometimes in different formats).
And that's it.
You can't do any load balancing at the source level. At least I've never seen a source capable of something like that, and I've seen quite a few.
Oh, and if you're sending syslog over UDP, you can't even verify reachability.
So any syslog solution you come up with will have a single receiving point (for a given source). You might try an HA setup with multiple receivers in an active-passive configuration, but that's it. I don't think anyone has bothered to implement a network-level load balancer for syslog (especially since syslog is usually meant to be sent only across your local network segment; it's generally not good practice to route syslog events across big networks).
After this point you can do load-balancing to downstream components.
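For example, the forwarder-to-indexer leg load balances natively; in outputs.conf you just list several receivers and the forwarder rotates among them (hostnames below are made up):

    # outputs.conf on the forwarder -- Splunk's built-in autoLB downstream
    [tcpout]
    defaultGroup = primary_indexers

    [tcpout:primary_indexers]
    server = idx1.example.com:9997, idx2.example.com:9997
    autoLBFrequency = 30    # switch targets every 30 seconds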
You can't load balance syslog traffic without an external load balancer (and even then syslog doesn't load balance very well, since load balancers typically don't speak syslog; you can build your own rsyslog-based load balancer, but then you're introducing another SPOF).
You can get a relatively-high-availability setup with an active-standby pair and a floating IP using keepalived or a similar solution. It's still not 100% foolproof, and you'll lose data when the primary node fails before the IP fails over to the secondary node (and TCP connections time out or get reset).
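A minimal keepalived sketch for the primary node could look like this (interface, VIP and password are placeholders):

    # /etc/keepalived/keepalived.conf on the primary -- illustrative values
    vrrp_instance SYSLOG_VIP {
        state MASTER              # the standby node uses state BACKUP
        interface eth0            # NIC carrying the syslog traffic
        virtual_router_id 51
        priority 150              # standby gets a lower priority, e.g. 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass s3cret
        }
        virtual_ipaddress {
            192.0.2.10/24         # floating IP the sources send syslog to
        }
    }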
As @scelikok pointed out, this is outside the scope of the UFs. Maybe you could switch from UFs to HFs and have the job done? With HFs you can receive data on TCP/UDP ports, transform or discard some of it, and then send it on to Splunk.
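On an HF this is just network input stanzas in inputs.conf, along these lines (ports, sourcetypes and index are examples):

    # inputs.conf on the HF -- example ports and sourcetypes
    [udp://5514]
    sourcetype = pdu:syslog
    index = network

    [tcp://5515]
    sourcetype = switch:syslog
    index = network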
Kind regards,
Rafael Santos
I knew HFs could handle custom ports, but our team has been limited to UFs for over a year now. I'd be tickled pink to ditch rsyslog entirely... Perhaps I should have asked a different question: replacing rsyslog with a Heavy Forwarder.
Splunk recommends not using a forwarder as a syslog receiver because it can lead to data loss.
The preferred method is to use a dedicated syslog server (syslog-ng or rsyslog) to write syslog events to disk files and have a UF monitor those files and forward the contents to your indexers. Another option is Splunk Connect for Syslog (SC4S), which wraps around syslog-ng and eliminates the need for a forwarder.
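In that model the UF side is just a file monitor, something like this (paths and index are examples):

    # inputs.conf on the UF -- monitors the files the syslog daemon writes
    [monitor:///var/log/remote/*.log]
    sourcetype = syslog
    index = network
    disabled = false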
Finally got around to reading about SC4S, and I'm fairly certain this is the route forward. I got the OK to replace the pair of UFs with a Heavy Forwarder if needed, but I have a good feeling SC4S will be able to handle things very well.
I've read that too, and it was kind of ambiguous to me. The same Linux server is running an instance of the Splunk UF as well as rsyslog. The latter is writing to log files that are listed in the former's inputs.conf.
If this is the exact scenario that Splunk says not to do, they should word it a bit better.
The syslog daemon would have to write its log files across the network, or the UF would have to reach out and read the log files remotely. Either of those seemed like a bad idea.
In any case, with our impending colo changes I'm going to have to refactor these hosts anyway. So if separating the services is the best path forward, that's what we'll do. But I will still need to LB a pair of forwarders... unless someone knows how well the UF and/or HF scales hardware-wise? Like, if I throw 8 cores, a fistful of memory, and 25 Gb of network throughput at it, can a single Splunk host process enough data to keep up with, say, 100k events per day?
What Splunk says to avoid is having any Splunk instance listen on a TCP/UDP port for syslog data. Whenever Splunk restarts, any data sent to the port is lost until Splunk comes back up, which can take minutes. A dedicated syslog receiver restarts much faster.
The problem could be alleviated somewhat by fronting the Splunk TCP port with a load balancer.
If you plan to refactor, consider putting multiple SC4S instances (they're Docker containers) close to the syslog sources.
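To give a flavor of the deployment, SC4S is driven almost entirely by an env file pointing it at your HEC endpoint; a sketch looks like this (URL and token are placeholders; check the SC4S docs for the current image tag and variable names):

    # /opt/sc4s/env_file -- placeholder values
    SC4S_DEST_SPLUNK_HEC_DEFAULT_URL=https://your-stack.splunkcloud.com:443
    SC4S_DEST_SPLUNK_HEC_DEFAULT_TOKEN=00000000-0000-0000-0000-000000000000
    SC4S_DEST_SPLUNK_HEC_DEFAULT_TLS_VERIFY=yes

    # run the container listening on the standard syslog ports
    docker run -d --name sc4s \
      -p 514:514 -p 514:514/udp \
      --env-file=/opt/sc4s/env_file \
      ghcr.io/splunk/splunk-connect-for-syslog/container:latest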
Yup. And these Splunk instances aren't listening on TCP/UDP ports, so that's good.
Hey @scelikok, thanks for the pointer. I realize my question might fall into a grey area; I hope others who have insight or experience still chime in. Learning about load balancing just the UF will also be super helpful!
Hi @Skeer-Jamf,
This is outside the Splunk context, but you can look at the Linux Keepalived service for redundancy. Keepalived supports an active/passive failover mode, and a load-balancing setup is also possible.
It creates and manages a virtual IP address and forwards incoming traffic to healthy backend servers.
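For the load-balancing side, a virtual_server block like the sketch below (all addresses are placeholders) spreads TCP syslog across two backends and health-checks them:

    # keepalived.conf fragment -- illustrative LVS load balancing for TCP syslog
    virtual_server 192.0.2.10 514 {
        delay_loop 6
        lb_algo rr            # round-robin across the real servers
        lb_kind NAT
        protocol TCP

        real_server 192.0.2.21 514 {
            TCP_CHECK { connect_timeout 3 }   # drop a backend if its port stops answering
        }
        real_server 192.0.2.22 514 {
            TCP_CHECK { connect_timeout 3 }
        }
    }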