Hi,
I am successfully mirroring a filtered set of events at a heavy forwarder to a local TCP syslog target (syslog-ng), while sending all other events on to the primary indexer on a different host (using [tcpout]).
When the local TCP syslog endpoint is stopped, the primary indexer stops receiving events as well, even though it is healthy and unrelated to the TCP syslog endpoint. It appears the heavy forwarder doesn't like it when a configured TCP syslog output fails. Does that mean my TCP syslog server has become a single point of failure for everything the heavy forwarder sends? Or, more likely, have I got my config wrong? 🙂
If the outage is short (i.e. the TCP syslog target is restored quickly), all events then appear in the primary indexer as if they had been held back on the forwarder. I'm not sure what happens if there is an extended outage of the TCP syslog target, though.
Any ideas on how I can make the forwarder a bit more resilient to a TCP syslog endpoint failure?
Thanks for any pointers!
GB
outputs.conf...
[tcpout]
defaultGroup = default-autolb-group

# all events go to the primary indexer
[tcpout:default-autolb-group]
disabled = 0
server = splunkindexer.network.internal:9997

[syslog]

# matching events are also written to the local syslog-ng listener
[syslog:writetofiles]
server = 127.0.0.1:2514
type = tcp
transforms.conf..
# route every event to the indexer output group
[routeAll]
REGEX=.
DEST_KEY=_TCP_ROUTING
FORMAT=default-autolb-group

# events matching index.php" are also routed to the local syslog output
[syslogRouting]
REGEX=index.php\"
DEST_KEY=_SYSLOG_ROUTING
FORMAT=writetofiles
props.conf
[default]
TRANSFORMS-routing=routeAll,syslogRouting

[syslog]
TRANSFORMS-routing=routeAll,syslogRouting
splunkd.log
..this appears relevant from the forwarder:
07-15-2015 17:05:30.348 +1000 INFO TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
07-15-2015 17:10:55.765 +1000 WARN TcpInputProc - Stopping all listening ports. Queues blocked for more than 300 seconds
The problem here is the TCP connection. TCP connections require handshakes; if there is no handshake, nothing can be sent. If nothing can be sent, the queues start to fill on the forwarder, and once they fill they back up all of the forwarder's queues, including the ones feeding your other tcpout destinations. I had the same problem. The fix was to use UDP. This WILL cause data loss to your syslog-ng, BUT your primary Splunk instance will still have the data that is lost on the syslog output. It's a pretty simple config change.
[syslog:writetofiles]
server = 127.0.0.1:2514
type = udp
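If you do switch, remember the syslog-ng side has to listen on UDP as well. A minimal sketch, assuming syslog-ng with made-up source/destination names and file path - adjust to your setup:

# listen for the forwarder's syslog output on UDP 2514
source s_splunk_udp {
    udp(ip(127.0.0.1) port(2514));
};

# write each host's events to its own file (path is just an example)
destination d_splunk_files {
    file("/var/log/splunk-mirror/${HOST}.log");
};

log { source(s_splunk_udp); destination(d_splunk_files); };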
Thanks! Yes, I had considered UDP as well, but data loss was something I was trying to avoid if possible.
I have done some testing with the reverse situation, where TCP syslog is working but the downstream tcpout (indexer) host is not available, and similar behavior happens - i.e. TCP syslog eventually stops. So this isn't a TCP-syslog-only thing; it's a case (as you say) of the forwarder not being able to successfully hand off events to one of its configured outputs.
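One thing I did notice while digging through outputs.conf.spec is dropEventsOnQueueFull, which (for a [tcpout] group) trades blocking for data loss once that group's queue backs up. I haven't tested it, and I'm not sure whether anything equivalent exists for the [syslog] stanza, but as a sketch it would look something like this (the 30 is just a number of seconds I picked for illustration):

[tcpout:default-autolb-group]
disabled = 0
server = splunkindexer.network.internal:9997
# start dropping new events for this group if its queue stays full for ~30 seconds
dropEventsOnQueueFull = 30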
I have now had a (better) read of the Admin manual regarding forwarding, and all the load-balancing options make a lot more sense now - but you can't load-balance TCP syslog 😞
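For anyone else who lands here: a load-balanced [tcpout] group is just a comma-separated server list, something like the below (the second indexer hostname is made up for illustration):

[tcpout:default-autolb-group]
disabled = 0
server = splunkindexer1.network.internal:9997, splunkindexer2.network.internal:9997
# how often (in seconds) the forwarder rotates between the listed indexers
autoLBFrequency = 30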
Thanks again for your insights!
GB