Getting Data In

Forwarder Redundancy

melonman
Motivator

Hi,

I'd like to know the best practices and patterns for making forwarders highly available and redundant. As I understand it:
- Search head pooling provides search head redundancy,
- the indexers' index-and-forward (replication) configuration provides indexer redundancy,
- AutoLB provides HA for the connection between forwarders and indexers.

However, the forwarder itself looks like a single point of failure.

How do people configure forwarders to eliminate this single point of failure?

Thanks,

1 Solution

rturk
Builder

I recently had to roll out a distributed deployment with HA and fault-tolerance considerations in mind, and this was one of the concerns that was raised. You can address it in a number of ways.

Intermediary Heavy Forwarders - In the design that I settled on I essentially had two types of heavy forwarders.

  • Aggregating Heavy Forwarders - Placed in a POP or a network segment, they receive "local" data before forwarding cooked events on to the Load Balancing Heavy Forwarders. They not only accept data from Universal Forwarders, but also have network ports (such as UDP 514 for syslog) opened for collection.
  • Load Balancing Heavy Forwarders - Essentially a gateway to your "farm" of indexers, with AutoLB configured to spread cooked events from the Aggregating Heavy Forwarders between indexers.
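As a sketch of how the aggregating tier might be wired up (the hostnames, ports, and output-group names here are placeholder assumptions, not from the original post), each aggregating heavy forwarder listens for both forwarder traffic and raw syslog, and forwards everything on to the load-balancing tier:

```
# inputs.conf on an aggregating heavy forwarder
[splunktcp://9997]          # cooked data from Universal Forwarders
disabled = false

[udp://514]                 # raw syslog collected locally
sourcetype = syslog
connection_host = ip

# outputs.conf on the same host, pointing at the load-balancing tier
[tcpout]
defaultGroup = lb_hfs

[tcpout:lb_hfs]
server = lb-hf1.example.com:9997, lb-hf2.example.com:9997
```

Listing more than one server in the tcpout group gives you AutoLB between the two load-balancing heavy forwarders by default.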

This approach was also attractive as it greatly simplified the network/firewall configuration required for the environment to facilitate indexing data from many environments. Also configuring "sufficient" buffering on all forwarders guarded against outages (your environment and data volumes will determine what's appropriate for you here).

Another benefit of this architecture was the ability to have intermediary deployment servers on the Aggregating Heavy Forwarders (this may not apply to your environment).

AutoLB to Multiple Forwarders - You can configure your Universal Forwarders to send data to one or many intermediary forwarders. With multiple forwarders as targets, the UF will detect (via heartbeats) when one of them has failed and redirect data to the remaining active forwarders until the failed one returns.
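A minimal UF outputs.conf for this might look like the following (the server names are placeholders):

```
# outputs.conf on a Universal Forwarder
[tcpout]
defaultGroup = intermediaries

[tcpout:intermediaries]
# With more than one server listed, autoLB is on by default: the UF
# rotates targets every autoLBFrequency seconds and skips any target
# whose connection has failed until it comes back.
server = agg-hf1.example.com:9997, agg-hf2.example.com:9997
autoLBFrequency = 30
```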

If you're referring to the redundancy of the Universal Forwarder process on the servers themselves, most OSes can restart failed processes automatically, and Splunk keeps track of how far it has read into each monitored log file, so a restarted forwarder should pick up where it left off.
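Splunk ships `splunk enable boot-start` to register itself with the OS init system. On a systemd host you could also supervise the process directly; a sketch of such a unit file, assuming a default /opt/splunkforwarder install path (newer Splunk versions can generate an equivalent unit for you):

```
# /etc/systemd/system/SplunkForwarder.service (sketch)
[Unit]
Description=Splunk Universal Forwarder
After=network.target

[Service]
Type=forking
ExecStart=/opt/splunkforwarder/bin/splunk start --accept-license --no-prompt
ExecStop=/opt/splunkforwarder/bin/splunk stop
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
```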

Hope this has helped 🙂


melonman
Motivator

Thanks a lot, R.Turk - this helped a lot! I also found the following, which I'm sharing for everyone else:

http://splunk-base.splunk.com/answers/39482/anycast-redundancy-with-syslog
