Getting Data In

Forwarder Redundancy

melonman
Motivator

Hi,

I'd like to know the best practices and patterns for making forwarders highly available and redundant. As I understand it:
- Search head pooling provides search head redundancy,
- Index & Forward (replication) on the indexers provides indexer redundancy,
- AutoLB provides HA for the connection between forwarders and indexers.

However, the forwarder itself looks like a single point of failure.

How do people configure forwarders to eliminate this single point of failure?

Thanks,

1 Solution

rturk
Builder

I have recently had to roll out a Distributed Deployment with HA & fault-tolerance considerations in mind, and this was one of the concerns that was raised. You can address this a number of ways.

Intermediary Heavy Forwarders - In the design that I settled on I essentially had two types of heavy forwarders.

  • Aggregating Heavy Forwarders - Placed in a POP or a network segment, they receive "local" data before forwarding cooked events on to the Load Balancing Heavy Forwarders. They not only accept data from Universal Forwarders, but also have network ports (such as UDP 514 for syslog) opened for collection.
  • Load Balancing Heavy Forwarders - Essentially a gateway to your "farm" of indexers, with AutoLB configured to spread cooked events from the Aggregating Heavy Forwarders between indexers.
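
As a sketch only (hostnames, ports, and group names below are illustrative, not from the original post), the two-tier chain above could be wired together with `outputs.conf` stanzas along these lines:

```ini
# outputs.conf on a Universal Forwarder: send everything to the
# local Aggregating Heavy Forwarder in this POP/segment
[tcpout]
defaultGroup = agg_hf

[tcpout:agg_hf]
server = agg-hf01.pop1.example.com:9997

# outputs.conf on an Aggregating Heavy Forwarder: forward cooked
# events on to the Load Balancing Heavy Forwarders
[tcpout]
defaultGroup = lb_hf

[tcpout:lb_hf]
server = lb-hf01.example.com:9997, lb-hf02.example.com:9997

# outputs.conf on a Load Balancing Heavy Forwarder: spread cooked
# events across the indexer farm (AutoLB is on by default when
# multiple servers are listed in a tcpout group)
[tcpout]
defaultGroup = indexers

[tcpout:indexers]
server = idx01.example.com:9997, idx02.example.com:9997, idx03.example.com:9997
```

Each snippet lives on a different tier; together they give every Universal Forwarder a single, simple target while the heavy forwarders handle fan-in and load balancing.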

This approach was also attractive as it greatly simplified the network/firewall configuration required for the environment to facilitate indexing data from many environments. Also configuring "sufficient" buffering on all forwarders guarded against outages (your environment and data volumes will determine what's appropriate for you here).
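What counts as "sufficient" buffering depends entirely on your data rates and tolerable outage window, but as an illustration, the in-memory output queue can be enlarged and a persistent (disk-backed) queue added for network inputs (sizes below are placeholders, not recommendations):

```ini
# outputs.conf: enlarge the in-memory output queue so brief
# downstream outages don't drop events (size is illustrative)
[tcpout]
maxQueueSize = 100MB

# inputs.conf: disk-backed persistent queue for a syslog input,
# so buffered data survives longer outages and restarts
[udp://514]
sourcetype = syslog
queueSize = 10MB
persistentQueueSize = 500MB
```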

Another benefit of this architecture was the ability to have intermediary deployment servers on the Aggregating Heavy Forwarders (this may not apply to your environment).

AutoLB to Multiple Forwarders - You can configure your Universal Forwarders to send data to one or many intermediary forwarders. With multiple forwarders as targets, the UF will detect (via heartbeats) when one of them has failed and redirect data to the remaining active forwarders until the failed forwarder returns.
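
A minimal sketch of this on a Universal Forwarder (hostnames are placeholders; `autoLBFrequency` shown at its default just to make the behaviour explicit):

```ini
# outputs.conf on a Universal Forwarder: AutoLB across two
# intermediary forwarders; if one stops responding, the UF
# shifts its traffic to the other until it comes back
[tcpout]
defaultGroup = intermediaries

[tcpout:intermediaries]
server = hf01.example.com:9997, hf02.example.com:9997
autoLB = true
autoLBFrequency = 30
```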

If you're referring to the redundancy of the Universal Forwarder on the servers themselves, most OSes can restart failed processes automatically, and Splunk keeps track of how much of each log file it has already forwarded, so a restarted forwarder picks up where it left off.
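
For example, on a Linux host you can have the service manager supervise the forwarder and restart it on failure. One option is a systemd unit along these lines (a sketch, assuming a default `/opt/splunkforwarder` install path; newer Splunk versions can also generate a unit for you via `splunk enable boot-start`):

```ini
# /etc/systemd/system/splunkforwarder.service (illustrative sketch)
[Unit]
Description=Splunk Universal Forwarder
After=network.target

[Service]
Type=forking
ExecStart=/opt/splunkforwarder/bin/splunk start --accept-license
ExecStop=/opt/splunkforwarder/bin/splunk stop
# Restart the forwarder automatically if the process dies
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
```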

Hope this has helped 🙂


melonman
Motivator

Thanks a lot R.Turk.
This helped a lot! I also found the following, and I'm sharing it for everyone else:

http://splunk-base.splunk.com/answers/39482/anycast-redundancy-with-syslog
