Deployment Architecture

How to have HA for syslog inputs

ultima
Explorer

Hi.

I was wondering. In a setup with 2 universal forwarders (UFs) and 2 indexers, where the UF config looks like this:

[tcpout:SplunkIndexerGroup1]
server = SplunkIndexer1:port,SplunkIndexer2:port
autoLB = true
useACK = true

[tcpout]
defaultGroup = SplunkIndexerGroup1
disabled = false


If I send regular syslog data via UDP to both UFs simultaneously (e.g. Cisco ASA logs), they will both forward the logs to the indexers, where they will be stored as duplicates.

How can I make it so that only one copy of each log is forwarded, or have Splunk recognize the duplicate and index only one?

1 Solution

ultima
Explorer

Splunk customer support called me, and we talked through the solution.

As both of the answers have pointed out, it is not possible to have Splunk deduplicate the events.

What we did is install Heartbeat for Linux
linux-ha.org/wiki/Main_Page
and split rsyslog across two servers.

If one server goes down, the other will notice, take over the cluster IP, and start the syslog service.
The Splunk forwarder on both servers then monitors the /var/log/syslog file and forwards it to the indexers.

This makes the UDP logging from any source highly available and prevents it from being duplicated.
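To make the failover setup concrete, here is a minimal sketch. The node names (syslog1, syslog2), the cluster IP 192.168.1.50, and the resource script name are assumptions for illustration, not details from the post. With classic Heartbeat, /etc/ha.d/haresources names the preferred node, the floating IP, and the service to start on failover:

```
# /etc/ha.d/haresources (identical on both nodes)
# preferred-node  cluster-IP     service started on the active node
syslog1 192.168.1.50 rsyslog
```

rsyslog on the active node listens for UDP syslog and writes to /var/log/syslog (legacy directive syntax):

```
# /etc/rsyslog.conf (both nodes)
$ModLoad imudp
$UDPServerRun 514
*.* /var/log/syslog
```

and the universal forwarder on each node simply monitors that file in inputs.conf:

```
[monitor:///var/log/syslog]
sourcetype = syslog
disabled = false
```

Because only the node holding the cluster IP receives the UDP traffic at any given moment, each event is written, and therefore indexed, exactly once.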

Thanks for the responses, and thanks to Splunk customer support for contacting me 🙂


thepittman
Engager

Do you have any more detailed information for this setup?


halr9000
Motivator

Smooth. 🙂


echalex
Builder

Unfortunately, the problem isn't really in Splunk, so the solution isn't in Splunk either. The two forwarders cannot know that the other forwarder has received the event, and neither can the indexers. As soon as the data is duplicated at the syslog level, the events are two separate events. So if you want high availability, you need to achieve it in how you use syslog. However, as halr9000 points out, you can't really have HA over UDP. You can round-robin the events, but you cannot ensure they make it to the syslog server, so you may be missing 50% of them. At best, you can duplicate the events and hope that at least one server gets each of them. But then you have to accept the duplicates in Splunk too.

Potentially, you could consolidate the events by forwarding to one forwarder and have a scripted input run uniq on it, but this strikes me as a dirty hack and a very bad idea, AND it won't be highly available.


halr9000
Motivator

I suggest not sending duplicate data in the first place. Instead, consider putting a syslog server in place that can listen on UDP and buffer events to disk. Then install a Splunk forwarder on that system to relay the events to Splunk over TCP / HTTPS. A forwarder can be configured to intelligently send events out to multiple index servers simultaneously. This is important, because the more indexers that can participate in a distributed search, the faster your search will complete. And the forwarder gracefully handles failure scenarios automatically. If one indexer goes down, it will continue sending events to the others. If the network goes down, it will buffer events and send when it comes back up, and so on.
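One common way to wire up the syslog-server-plus-forwarder pattern is to have the syslog daemon write each sending host to its own file, then let the forwarder pick up the whole tree. This is a sketch; the directory layout and the host_segment value are assumptions, not details from the thread:

```
# rsyslog on the syslog server: one file per sending host (legacy syntax)
$ModLoad imudp
$UDPServerRun 514
$template PerHost,"/var/log/remote/%HOSTNAME%/syslog.log"
*.* ?PerHost
```

```
# inputs.conf on the forwarder: monitor the tree and take the host
# field from the 4th path segment (/var(1)/log(2)/remote(3)/<host>(4))
[monitor:///var/log/remote/*/syslog.log]
sourcetype = syslog
host_segment = 4
disabled = false
```

With useACK = true in outputs.conf (as in the question), the forwarder also waits for indexer acknowledgement before discarding events, which covers the indexer-failure case described above.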

Another solution would be to use a load balancer, but as pointed out in this answer, it's impossible for a load balancer to know if a UDP service is listening.

ultima
Explorer

I need to have HA on all servers. If I use one UDP syslog server, the logs will be lost if that server goes down. I need to send the UDP logs to two instances and then forward them.


ultima
Explorer

So, if I forward UDP data to two syslog servers with a UF installed, and then use the UF to forward that data over TCP to one or more indexers, the data will not be duplicated using the configuration mentioned in the question?


ultima
Explorer

Yes, the title might be confusing.
But yeah, my question would be "how do I do HA for syslog inputs"


halr9000
Motivator

Are you sending data to two UFs as a high availability requirement? If so, you may want to change the title and tags a bit to get more visibility. This isn't a Cisco question, it's a "how do I do HA for syslog inputs" question, if I'm reading you right. Or, how do I de-dupe data question, perhaps.
