Heavy forwarder stops forwarding when tcp syslog mirror destination fails?

brodieg
Engager

Hi,
I am successfully mirroring a filtered set of events at a heavy forwarder and sending them to a local TCP Syslog target (syslog-ng) and all other events on to the primary indexer on a different host (using [tcpout]).

When the local TCP syslog endpoint is stopped, the primary indexer also stops receiving events, even though it is healthy and unrelated to the TCP syslog endpoint. It appears the heavy forwarder doesn't like it when a configured TCP syslog output fails. Does that mean my TCP syslog server has become a point of failure for all forwarding from the heavy forwarder? Or, more likely, have I got my config wrong? 🙂

If the outage is short (i.e. the TCP syslog target is restored), all events appear in the primary indexer as if they had been held back on the forwarder. I'm not sure what happens if there is an extended outage of the TCP syslog target, though.

Any ideas on how I can make the forwarder a bit more resilient to TCP syslog endpoint failure?

Thanks for any pointers!

GB

outputs.conf...

[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
disabled = 0
server = splunkindexer.network.internal:9997


[syslog]

[syslog:writetofiles]
server = 127.0.0.1:2514
type = tcp

transforms.conf..

[routeAll]
REGEX=.
DEST_KEY=_TCP_ROUTING
FORMAT=default-autolb-group

[syslogRouting]
REGEX=index.php\"
DEST_KEY=_SYSLOG_ROUTING
FORMAT=writetofiles

props.conf

[default]
TRANSFORMS-routing=routeAll,syslogRouting

[syslog]
TRANSFORMS-routing=routeAll,syslogRouting

splunkd.log

This appears relevant, from the forwarder:

07-15-2015 17:05:30.348 +1000 INFO  TailingProcessor - Could not send data to output queue (parsingQueue), retrying...
07-15-2015 17:10:55.765 +1000 WARN  TcpInputProc - Stopping all listening ports. Queues blocked for more than 300 seconds

alacercogitatus
SplunkTrust

The problem here is the TCP connection. TCP requires a handshake to establish a connection before anything can be sent. If the syslog target is down, there is no handshake and nothing can be sent, so the queues start to fill on the forwarder. Once they fill, they back up all of the other queues, including those feeding your tcpout settings. I had the same problem, and the fix was to use UDP. This WILL cause data loss at your syslog-ng target whenever it is unavailable, BUT your primary Splunk instance will still have the data that is lost on the syslog output. It's a pretty simple config change:

[syslog:writetofiles]
 server = 127.0.0.1:2514
 type = udp

brodieg
Engager

Thanks! Yes, I had considered UDP as well, but the loss of data was something I was trying to avoid if possible.

I have done some testing with the situation reversed, where TCP syslog is working but the downstream tcpout (indexer) host is not available, and similar behavior occurs, i.e. TCP syslog eventually stops. So this isn't a TCP-syslog-only thing; it's a case (as you say) of the forwarder not being able to hand off events to a configured downstream target.
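
For the reversed case described above, one option is to let the tcpout group drop events rather than block once its output queue is full. The fragment below is only a sketch: `dropEventsOnQueueFull` and `maxQueueSize` are documented outputs.conf settings for tcpout stanzas, but the values 30 and 10MB are illustrative assumptions, not tested recommendations. Dropping means the indexer loses events during the outage, and nothing in this thread establishes whether an equivalent setting exists for [syslog] outputs.

```ini
# outputs.conf (sketch; the two values below are assumptions, not tested recommendations)
[tcpout:default-autolb-group]
disabled = 0
server = splunkindexer.network.internal:9997
# Buffer up to 10MB in this group's output queue before it counts as full.
maxQueueSize = 10MB
# Once the queue has been full for 30 seconds, drop new events for this group
# instead of blocking the rest of the pipeline. The default (-1) blocks.
dropEventsOnQueueFull = 30
```

This trades completeness at the indexer for keeping the other output flowing, so it only makes sense if losing indexer-bound events during an outage is acceptable.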

I have now had a (better) read of the Admin manual regarding forwarding, and all the load-balancing options make a lot more sense now, but you can't load-balance TCP syslog 😞
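
As a minimal sketch of the load balancing mentioned above: a tcpout group with more than one entry in `server` is load-balanced across those receivers, so a single unavailable indexer need not stall the whole group. The second host name is hypothetical, and the `autoLBFrequency` value is shown only for illustration.

```ini
# outputs.conf (sketch; splunkindexer2 is a hypothetical second indexer)
[tcpout:default-autolb-group]
disabled = 0
server = splunkindexer.network.internal:9997, splunkindexer2.network.internal:9997
# Seconds between switching to another receiver in the list.
autoLBFrequency = 30
```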

Thanks again for your insights!

GB
