Solved: Re: Splunk High Availability

cvolcko · 05-01-2013

Hello

I am new to Splunk and have a couple of demo versions set up for testing. We want to use Splunk primarily for logging, to help with troubleshooting in our VMware vSphere infrastructure environment.

Today our ESXi and *nix infrastructure machines log to a syslog collector on UDP 514. The collector sits behind a NetScaler, which provides failover from one collector to the other if one is down for whatever reason.

As I understand it, if we install the Universal Forwarder on the machines where we can, the forwarder can handle the "heartbeating" and fail over from one Splunk receiver to the other. If this is accurate, what do we do for machines where we cannot install the Splunk Universal Forwarder, or where it is not logical to? We would rather not switch to syslog-ng over TCP because of network concerns.

We are trying to get rid of the NetScaler in our equation. But as I understand it, Splunk does not recommend putting Splunk receivers behind load balancers. I also think I am correct that in the Splunk world, HA is focused more on high availability of data that has already been received/collected.

Do you have any recommendations for our scenario?

Thanks,
-Christian

Navisite, Inc.
A Time Warner Company

jonuwz · 05-01-2013

We have our servers spitting out UDP 514 to a load balancer to spread the load over a pair of Splunk indexers.

The important thing to remember is that UDP is one-way.

Your load balancer will not be able to tell if there is something listening on UDP on the target devices.

If the box goes down, yes, a ping test will pick that up. But if the UDP listener goes down - nothing. The traffic will go down a black hole.

We worked around this by having the same process listening on UDP 514 and TCP 514.
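
As a rough illustration of that workaround, here is a minimal Python sketch of one process that binds the same port number for both UDP and TCP. It is not what we actually run, just the idea: `open_listeners`, `serve_once` and `handle_message` are made-up names, and it binds to an ephemeral port on localhost because binding 514 normally needs root.

```python
import selectors
import socket


def open_listeners(host="127.0.0.1", port=0):
    """Bind a TCP and a UDP socket to the same port number.

    port=0 lets the OS pick a free TCP port; the UDP socket then
    reuses that number. Returns (tcp_socket, udp_socket, port).
    """
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    tcp.bind((host, port))
    tcp.listen()
    bound_port = tcp.getsockname()[1]

    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind((host, bound_port))
    return tcp, udp, bound_port


def serve_once(tcp, udp, handle_message, timeout=1.0):
    """Handle whatever is ready on either socket within `timeout` seconds.

    TCP connections (e.g. load balancer probes) are accepted and closed.
    UDP datagrams are passed to handle_message. Returns the number of
    events handled in this round.
    """
    sel = selectors.DefaultSelector()
    sel.register(tcp, selectors.EVENT_READ)
    sel.register(udp, selectors.EVENT_READ)
    handled = 0
    try:
        for key, _events in sel.select(timeout):
            if key.fileobj is tcp:
                conn, _addr = tcp.accept()
                conn.close()  # the probe only needs the connect to succeed
            else:
                data, _addr = udp.recvfrom(65535)
                handle_message(data)
            handled += 1
    finally:
        sel.close()
    return handled
```

Because both sockets live in the same process, the TCP port stops answering when that process dies, which is the signal the load balancer needs.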

That way the load balancer can at least check that the receiving process is up on the target boxes by polling TCP 514. You can still use UDP for the actual messages though.
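
The check itself is nothing more than a TCP connect. A real load balancer has its own monitor configuration for this, so the Python below is only a sketch of the logic; `tcp_listener_up` is a hypothetical helper name and the 2-second timeout is an arbitrary choice.

```python
import socket


def tcp_listener_up(host, port, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

A call such as `tcp_listener_up("indexer1.example.com", 514)` (hostname made up) tells you only that the process is accepting connections, not that it is actually indexing what arrives over UDP.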

You will still lose any messages sent to a failed process until the load balancer determines that the process has stopped.
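
To get a feel for how big that gap is, a back-of-the-envelope estimate helps. The sketch below assumes the load balancer marks a target down after a fixed number of consecutive failed checks and ignores the probe timeout itself; the function name, parameters and numbers are all illustrative, not measurements from our setup.

```python
def worst_case_lost_messages(check_interval_s, failures_before_down,
                             messages_per_s, share_of_traffic=0.5):
    """Rough upper bound on messages lost before a dead listener is detected.

    check_interval_s:      seconds between health checks (assumed setting)
    failures_before_down:  consecutive failures before the target is marked down
    messages_per_s:        total syslog rate arriving at the load balancer
    share_of_traffic:      fraction sent to the failed target (0.5 for a pair)
    """
    detection_window_s = check_interval_s * failures_before_down
    return detection_window_s * messages_per_s * share_of_traffic


# Example: checks every 5 s, 3 failures to mark down, 200 msg/s over two targets.
print(worst_case_lost_messages(5, 3, 200))
```

With those example settings, the detection window is 15 seconds, and half of the traffic during that window goes to the dead listener.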

The universal forwarder uses TCP, so I'm not sure whether you'd have the same concerns with it as with your "syslog-ng TCP" option.

Aside from pre-parsing logs:

syslog-ng PE 4.2 has RLTP and disk buffers, which is approximately what the universal forwarder offers (minus failover).

rsyslogd does everything that the universal forwarder does in terms of buffering and failover.

gavind · 05-02-2013

This worked with the firewall disabled, and we then found out that we just needed to open UDP 514. Thanks.

jonuwz · 05-01-2013

Yeah, we have enough devices that you can't install forwarders on that it seemed simpler to come up with a one-size-fits-all solution.

That said, we are going to use forwarders on some boxes where we must get the logs at all costs.

cvolcko · 05-01-2013

Hi,

Thank you for the input. Yes, today we have rsyslog behind a NetScaler, and we have syslog running on 514 TCP/UDP. Even though Splunk doesn't recommend using LBs, I have seen a decent number of people using them. By my logic, if nothing is falling on the floor today with the rsyslog *nix boxes, then putting a Splunk receiver in their place should be a one-to-one swap. Thanks for sharing the insight!