Forwarding events - Splunk stops working when endpoint is not available

nicocin — Tue, 17 Jan 2017 12:50:04 GMT

Hello

We forward events using the outputs.conf on the indexers:

outputs.conf

[tcpout] 
defaultGroup = default 
disabled = 0 
indexAndForward = 1 

[tcpout-server://x.x.x.x:601]
[tcpout:default] 
disabled = 0 
server = x.x.x.x:601
sendCookedData = false

When the endpoint (x.x.x.x) is not available, Splunk stops all listeners:

12-21-2016 12:02:45.982 +0100 WARN  TcpOutputProc - Write operation timed out for 10.128.16.36:601 in 300 seconds.
12-21-2016 12:02:45.982 +0100 INFO  TcpOutputProc - Connected to idx=10.128.16.36:601
12-21-2016 12:02:45.982 +0100 WARN  TcpOutputProc - Forwarding to indexer group default blocked for 300 seconds.
12-21-2016 12:02:56.871 +0100 INFO  TcpInputProc - Stopping IPv4 port 514
12-21-2016 12:02:56.871 +0100 INFO  TcpInputProc - Stopping IPv4 port 1514
12-21-2016 12:02:56.871 +0100 INFO  TcpInputProc - Stopping IPv4 port 2514
12-21-2016 12:02:56.871 +0100 INFO  TcpInputProc - Stopping IPv4 port 4514
12-21-2016 12:02:56.871 +0100 INFO  TcpInputProc - Stopping IPv4 port 5514
12-21-2016 12:02:56.871 +0100 INFO  TcpInputProc - Stopping IPv4 port 3514
12-21-2016 12:02:56.871 +0100 INFO  TcpInputProc - Stopping IPv4 port 9997
12-21-2016 12:02:56.871 +0100 WARN  TcpInputProc - Stopping all listening ports. Queues blocked for more than 300 seconds
12-21-2016 12:03:11.816 +0100 WARN  TcpOutputProc - Forwarding to indexer group default blocked for 400 seconds.

I don't know why this happens. Any hints?

Thanks & Regards
Nicolas

Re: Forwarding events - Splunk stops working when endpoint is not available

jwelch_splunk — Tue, 17 Jan 2017 13:30:31 GMT

If you take a look at this diagram:

https://wiki.splunk.com/Community:HowIndexingWorks

"3. Detail Diagram - Standalone Splunk"

You can see how the flow works. When Splunk cannot communicate with a 3rd party system, the output queue fills up. That in turn backs up each queue earlier in the pipeline, until eventually we no longer accept any new data, because we can't do anything with the data we already have.

Two of the main reasons why indexers stop indexing are:
1. An indexing loop, e.g. an outputs.conf that is configured to forward data to another indexer, which forwards it back.
2. Forwarding of data to a 3rd party is configured and there is an issue with that receiving system.

You could play around with these settings from outputs.conf, although these are just ideas and may not solve your issue / use case, depending on how important it is that the 3rd party gets the data:

dropEventsOnQueueFull = <integer>
* If set to a positive number, wait <integer> seconds before throwing out
all new events until the output queue has space.
* Setting this to -1 or 0 will cause the output queue to block when it gets
full, causing further blocking up the processing chain.
* If any target group's queue is blocked, no more data will reach any other
target group.
* Using auto load-balancing is the best way to minimize this condition,
because, in that case, multiple receivers must be down (or jammed up)
before queue blocking can occur.
* Defaults to -1 (do not drop events).
* DO NOT SET THIS VALUE TO A POSITIVE INTEGER IF YOU ARE MONITORING FILES!

dropClonedEventsOnQueueFull = <integer>
* If set to a positive number, do not block completely, but wait up to
<integer> seconds to queue events to a group. If it cannot enqueue to a
group for more than <integer> seconds, begin dropping events for the
group. It makes sure that at least one group in the cloning configuration
will get events. It blocks if event cannot be delivered to any of the
cloned groups.
* If set to -1, the TcpOutputProcessor will make sure that each group will
get all of the events. If one of the groups is down, then Splunk will
block everything.
* Defaults to 5.
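
Applied to the configuration from the question, the first setting could look something like the sketch below. The value 30 is an arbitrary example and not a recommendation, and placing the setting in the target group stanza rather than in [tcpout] is an assumption about what fits this setup.

```ini
# outputs.conf on the indexer (sketch only, values are illustrative)
[tcpout]
defaultGroup = default
disabled = 0
indexAndForward = 1

[tcpout:default]
disabled = 0
server = x.x.x.x:601
sendCookedData = false
# Wait 30 seconds on a full output queue, then drop new events
# instead of blocking the pipeline. 30 is an arbitrary example value.
dropEventsOnQueueFull = 30
```

Any events dropped this way never reach the 3rd party, and the warning above about monitored files still applies, so this only makes sense if losing forwarded data during an outage is acceptable.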

Others might recommend increasing the queue size, but if the 3rd party is down for an extended period of time, the same issue will occur.

Your best bet is to figure out why the third party system is not taking data and fix that so it is more reliable.
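
For reference, the queue size in question is the maxQueueSize setting in outputs.conf. A minimal sketch, assuming the same target group as in the question and an arbitrary example value:

```ini
# outputs.conf (sketch only)
[tcpout:default]
server = x.x.x.x:601
sendCookedData = false
# Larger in-memory output queue; 100MB is an arbitrary example value.
maxQueueSize = 100MB
```

A bigger queue only delays the blocking by however long it takes to fill, so it helps with short interruptions and not with long outages.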
