Currently we have a standalone Splunk instance. All of the data that is indexed comes as UDP data over a data diode.
We want to move to one search head and three indexers. In order to have load balancing on the indexers, we plan to have all of the data from the data diode go to a server with a heavy forwarder installed. The heavy forwarder will balance the load to the Splunk indexers.
The heavy forwarder and its server become a single point of failure.
Questions:
Is there a better way to set this up with a data diode?
How do we size the server with the heavy forwarder?
How can we "harden" the heavy forwarder to avoid downtime?
Assuming, for the context of this question, that the data diodes are OSI layer 2 based and support UDP, there are a few solutions here. Ranked in order of preference for architectural availability and for support from Splunk PS and Architecture:
1) VIP + syslog servers collecting UDP inputs + UFs doing file monitoring and forwarding to indexers
2) UDP syslog receiver + UF doing file monitoring and forwarding to indexers
3) UDP Direct into Splunk + forwarding to indexers
So stepping back, let's ask the question: do you have the wallet for HA? If so, option 1 is the best choice here, and the reasoning goes as follows. First and foremost, Splunk with direct UDP inputs is not a valid architecture for environments where HA / zero data loss is a requirement. What happens when you restart Splunk, or the service locks up or hangs? UDP has no retransmission, so anything sent while the listener is down is lost. Adding a VIP in front of these inputs reduces the risk, but it also increases the time and effort spent managing the forwarders.
Most environments that have these requirements use syslog-ng or rsyslog, mainly because they are purpose-built to take syslog and they do it well at scale. They can also apply filtering and custom ingest rules, whereas a Splunk UDP input is basically one port, one sourcetype (or heavy transforms, at a CPU cost). In this setup, use the forwarder just to read the files, and use logrotate to delete them after N days.
The other two options are variations on option 1 with fewer of the additional components.
I recommend using a UF over an HF, unless you are required to mask / filter data before it is indexed.
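To make the "receiver writes files, UF monitors them" pattern from options 1 and 2 concrete, here is a minimal sketch in Python. It is only an illustration of the pattern; in production you would use rsyslog or syslog-ng as described above. The port (5514), the output path, and the function names are all assumptions made up for this example.

```python
import socket
from datetime import datetime, timezone


def format_record(data: bytes, sender_ip: str, received_at: datetime) -> str:
    """Turn one datagram into a single log line: timestamp, sender, payload."""
    # Decode leniently so a malformed byte never crashes the receiver.
    text = data.decode("utf-8", errors="replace")
    # Collapse embedded line breaks so each datagram stays on one line.
    payload = " ".join(text.splitlines())
    return f"{received_at.isoformat()} {sender_ip} {payload}\n"


def serve(bind_addr: str, port: int, out_path: str) -> None:
    """Receive UDP datagrams forever and append each one to out_path."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        # Line-buffered append, so each record is flushed as it is written.
        with open(out_path, "a", buffering=1, encoding="utf-8") as out:
            while True:
                data, (sender_ip, _sender_port) = sock.recvfrom(65535)
                now = datetime.now(timezone.utc)
                out.write(format_record(data, sender_ip, now))


# Example call (hypothetical port and path, blocks forever):
# serve("0.0.0.0", 5514, "/var/log/diode/opc.log")
```

The point of the design is that the receiver is a small process that does nothing but write to disk, so restarting Splunk or the forwarder does not drop datagrams. The UF picks up where it left off in the file, and logrotate handles cleanup.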
I appreciate all of your feedback. I will be studying your posts carefully. Thank you.
To explain more about what we are doing, it is not really syslog. We have an OPC server. The data we are interested in is OPC DA. We send that data into Kepware which converts the data to "tcp" (really udp?). Next, the data goes over the data diode to Splunk.
The reason I put tcp in quotes is that Kepware calls the converted data tcp. But when the data goes over the diode it has to be udp.
You could consider using a virtual IP address (VIP) managed by a high availability mechanism, such as Pacemaker on Linux. I have used this with success on CentOS for syslog receivers. It allowed us to run two syslog servers side by side, with only one holding the VIP at a time. This way, if one syslog server goes down, the other takes over. Each syslog server was also running a heavy forwarder to send events to the indexers.
I am assuming the data you're forwarding is syslog, since you mentioned UDP?
The reality is that syslog is the real problem in terms of HA.
Most syslog implementations don't support any kind of HA or load balancing; they tend to send to a single destination address.
If you're worried about faults at the HF layer, you can always add more HFs.
If you have something which can load balance UDP, then all you have really achieved is to move the SPOF to the load balancer (although an HA-LB may be mitigation enough for you).
The 'Splunk Way' of achieving better durability would be to use universal forwarders on your source servers, which can load balance events onto an HF pool. The HFs then load balance events into an indexer cluster. If implemented properly, that setup should be more or less bulletproof; however, I am not sure about compatibility with a data diode.
With regard to sizing, Splunk provides the following guidance on forwarding/indexing ratios and hardware:
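As a rough illustration of why a pool removes the single point of failure, here is a small Python sketch of rotating across targets and skipping any that are down. This is not how Splunk forwarders are implemented (they handle load balancing internally); the class name, the host names, and the health-check callback are hypothetical and exist only for this example.

```python
from typing import Callable, Optional, Sequence


class TargetPool:
    """Rotate through a fixed list of targets, skipping unreachable ones."""

    def __init__(self, targets: Sequence[str]) -> None:
        if not targets:
            raise ValueError("at least one target is required")
        self._targets = list(targets)
        self._next = 0  # index where the next search starts

    def pick(self, is_up: Callable[[str], bool]) -> Optional[str]:
        """Return the next reachable target in rotation, or None if all are down."""
        count = len(self._targets)
        for offset in range(count):
            idx = (self._next + offset) % count
            candidate = self._targets[idx]
            if is_up(candidate):
                # Start the next search just after the target we used.
                self._next = (idx + 1) % count
                return candidate
        return None


# Hypothetical indexer addresses; idx2 is treated as down.
pool = TargetPool(["idx1:9997", "idx2:9997", "idx3:9997"])
down = {"idx2:9997"}
picks = [pool.pick(lambda t: t not in down) for _ in range(4)]
```

With one of three targets down, traffic keeps flowing to the remaining two. With a single heavy forwarder there is nothing to rotate to, which is the SPOF concern raised in the question.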
http://docs.splunk.com/Documentation/Splunk/7.0.1/Capacity/Forwarder-to-indexerratios
For a detailed steer on Splunk HA architectures, take a look at the following presentation from conf16:
https://conf.splunk.com/files/2016/slides/architecting-splunk-for-high-availability-and-disaster-rec...