I have multiple forwarders and an indexer cluster. If the syslog source devices were to send the same syslog messages to both forwarders, would the events be duplicated in an index defined in the cluster, or is there a way to stop that from happening?
Normally I would have suggested using a load balancer or anycast, but that solution is not available to us yet. We want to ensure no syslog messages are lost even if one forwarder goes down, but we don't want duplicates either.
I spent some time researching the exact issue. In the end the team decided against any complexity at this point, so we are going with two syslog servers behind a load balancer.
When digging around, I did find a few resources that may be of interest.
One is an article from 2014, "The State of Highly-Available Syslog", which also mentions the second resource: a blog entry and an open source tool (duplog) from 2013, "Overengineering Syslog: Redundancy, High Availability, Deduplication, and Splunk".
I have not used duplog, so I can't comment on it.
If you're sending the same data to both forwarders, the data will get duplicated. Splunk has no way to know whether incoming data has already been indexed, so it cannot avoid indexing duplicates. This has to be sorted out before the data is sent to the indexers (or to the forwarders, for that matter). The best option would be to set up a load balancer if one is available.
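To illustrate what "sorting it out before the indexers" could look like, here is a minimal Python sketch of content-based deduplication over a short time window. It is only an assumption about one possible approach, not how Splunk or duplog works: the class name `SyslogDeduplicator`, the `window_seconds` parameter, and the `is_new` method are all hypothetical names made up for this example.

```python
import hashlib
import time
from collections import OrderedDict


class SyslogDeduplicator:
    """Hypothetical helper: drop messages already seen within a time window."""

    def __init__(self, window_seconds=30.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._seen = OrderedDict()  # digest -> time first seen, oldest first

    def _expire(self, now):
        # Forget digests older than the window so memory stays bounded.
        while self._seen:
            first_seen = next(iter(self._seen.values()))
            if now - first_seen < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def is_new(self, raw_message: bytes) -> bool:
        """Return True the first time a message is seen inside the window."""
        now = self.clock()
        self._expire(now)
        digest = hashlib.sha256(raw_message).digest()
        if digest in self._seen:
            return False  # same bytes already arrived via another path
        self._seen[digest] = now
        return True


if __name__ == "__main__":
    dedup = SyslogDeduplicator(window_seconds=30)
    msg = b"<34>Oct 11 22:14:15 host1 su: 'su root' failed for user1"
    for copy in (msg, msg):  # the same message arriving over two paths
        print("forward" if dedup.is_new(copy) else "drop")
```

Two caveats with this kind of approach: a device that legitimately emits two byte-identical messages within the window would lose one of them, and the deduplication step has to sit at a single point that sees both copies, which reintroduces a single point of failure unless that step is made redundant as well.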