Deployment Architecture

How to have HA for syslog inputs

ultima
Explorer

Hi.

I was wondering: in a setup with 2 UFs and 2 indexers, where the UF config looks like this:

[tcpout:SplunkIndexerGroup1]
server = SplunkIndexer1:port,SplunkIndexer2:port
autoLB = true
useACK = true

[tcpout]
defaultGroup = SplunkIndexerGroup1
disabled = false


If I send regular syslog data via UDP to both UFs simultaneously (e.g. Cisco ASA logs), they will both forward the same log to the indexers, where it will be stored twice as duplicate events.

How can I make it so that only one of the logs is forwarded, or have Splunk recognize that it is a duplicate and index only one?
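(For reference, the UDP input on each UF is along these lines. This is just a minimal sketch; the port, sourcetype and index below are placeholders, not my exact config.)

# inputs.conf on each UF (port, sourcetype and index are examples)
[udp://514]
sourcetype = cisco:asa
index = network
connection_host = ip
disabled = false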

1 Solution

ultima
Explorer

Splunk customer support called me, and we talked about the solution.

As both of the answers have pointed out, it is not possible to have Splunk separate the events.

What we did was install Heartbeat for Linux (linux-ha.org/wiki/Main_Page) and split rsyslog across two servers.

So if one server goes down, the other one will notice, take over the cluster IP and start the syslog service.
The Splunk forwarder on both servers then monitors the /var/log/syslog file and forwards it to the indexers.

This makes the UDP logs from any source highly available and prevents them from being duplicated.
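A rough sketch of what this looks like, assuming a classic Heartbeat v1 configuration (the node names, cluster IP and interface below are made-up examples, not our exact setup):

# /etc/ha.d/ha.cf on both syslog servers (values are examples)
keepalive 2
deadtime 30
bcast eth0
node syslog1
node syslog2
auto_failback on

# /etc/ha.d/haresources (identical on both nodes)
# syslog1 is the preferred node; the cluster IP and the rsyslog service
# fail over together, so only one node receives the UDP stream at a time
syslog1 IPaddr::192.168.1.50/24/eth0 rsyslog

# inputs.conf on the UF on both servers (sourcetype/index are examples)
[monitor:///var/log/syslog]
sourcetype = syslog
index = network
disabled = false

Since the devices send to the cluster IP, only the active node writes new events to /var/log/syslog, so only one copy of each event ever reaches the indexers.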

Thanks for the responses, and thanks to Splunk customer support for contacting me 🙂

thepittman
Engager

Do you have any more detailed information for this setup?

halr9000
Motivator

Smooth. 🙂

echalex
Builder

Unfortunately, the problem isn't really in Splunk, so the solution isn't in Splunk either. The two forwarders cannot know that the other forwarder has received the event, and neither can the indexers. As soon as the data is duplicated by means of syslog, the events are two separate events. So, if you want high availability, you need to achieve it in how you use syslog. However, as halr9000 points out, you can't really have HA over UDP. You can round-robin the events, but you will not be able to ensure they make it to the syslog server, so you may be missing 50%. At best, you can duplicate the events and hope that at least one server gets each of them. But then you have to accept the duplicates in Splunk too.

Potentially, you could consolidate the events by forwarding to one forwarder and have a scripted input run uniq on it, but this strikes me as a dirty hack and a very bad idea, AND it won't be highly available.

halr9000
Motivator

I suggest not sending duplicate data in the first place. Instead, consider putting a syslog server in place that can listen on UDP and buffer events to disk. Then install a Splunk forwarder on that system to relay the events to Splunk over TCP / HTTPS. A forwarder can be configured to intelligently send events out to multiple index servers simultaneously. This is important, because the more indexers that can participate in a distributed search, the faster your search will complete. And the forwarder gracefully handles failure scenarios automatically. If one indexer goes down, it will continue sending events to the others. If the network goes down, it will buffer events and send when it comes back up, and so on.
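For example, something along these lines (legacy rsyslog syntax; the file path is a placeholder):

# /etc/rsyslog.conf on the syslog server: listen on UDP and buffer to disk
$ModLoad imudp
$UDPServerRun 514
# all messages (including those received over UDP) are written to a local
# file; a Splunk forwarder then monitors this file and relays it over TCP
# with autoLB/useACK, as in the outputs.conf shown in the question
*.* /var/log/remote-syslog.log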

Another solution would be to use a load balancer, but as pointed out in this answer, it's impossible for a load balancer to know if a UDP service is listening.

ultima
Explorer

I need to have HA on all servers. If I use one UDP syslog server, the logs will be lost if that server goes down. I need to send the UDP logs to two instances and then forward them.

ultima
Explorer

So, if I forward UDP data to two syslog servers with a UF installed, and then use the UF to forward this data over TCP to one or more indexers, will the data not be duplicated using the configuration mentioned in the question?

ultima
Explorer

Yes, the title might be confusing.
But yeah, my question is really "how do I do HA for syslog inputs".

halr9000
Motivator

Are you sending data to two UFs as a high availability requirement? If so, you may want to change the title and tags a bit to get more visibility. This isn't a Cisco question, it's a "how do I do HA for syslog inputs" question, if I'm reading you right. Or perhaps a "how do I de-dupe data" question.
