Getting Data In

Why do my forwarders have inconsistent connectivity to the indexers?

ddrillic
Ultra Champion

We have a case on multiple servers where we see constant errors such as -

10-04-2018 13:25:55.480 -0500 INFO  TcpOutputProc - Removing quarantine from idx=<indexer1>:9997
10-04-2018 13:25:55.480 -0500 INFO  TcpOutputProc - Removing quarantine from idx=<indexer2>:9997
10-04-2018 13:25:55.480 -0500 INFO  TcpOutputProc - Removing quarantine from idx=<indexer3>:9997
10-04-2018 13:25:55.480 -0500 INFO  TcpOutputProc - Removing quarantine from idx=<indexer4>:9997
10-04-2018 13:25:55.489 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer
10-04-2018 13:25:55.491 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer
10-04-2018 13:25:55.493 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer
10-04-2018 13:25:55.501 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer
10-04-2018 13:25:55.509 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer

telnet is fine -

telnet <indexer1> 9997
telnet <indexer2> 9997
telnet <indexer3> 9997
telnet <indexer4> 9997

We see lots of data for -

index=_internal host=<host name>

And nothing for -

index=<customer index> host=<host name>

harsmarvania57
Ultra Champion

Hi @ddrillic,

Since connectivity from the UF to the indexers looks good, as you mentioned, I'd start troubleshooting by checking for congestion on the indexers and their resource usage (they might be too busy due to a resource constraint).
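One way to check for congestion without the Monitoring Console is to scan the indexer's metrics.log for group=queue lines that report blocked=true. A minimal sketch in Python; the sample lines below are made up for illustration (the field layout follows the documented metrics.log format, but verify the queue names and paths against your own environment):

```python
import re

# Hypothetical sample metrics.log lines; real files live under
# $SPLUNK_HOME/var/log/splunk/metrics.log on the indexer.
sample_lines = [
    '10-04-2018 13:25:55.480 -0500 INFO  Metrics - group=queue, name=indexqueue, '
    'blocked=true, max_size_kb=500, current_size_kb=499, current_size=1200',
    '10-04-2018 13:25:55.480 -0500 INFO  Metrics - group=queue, name=parsingqueue, '
    'max_size_kb=6144, current_size_kb=12, current_size=30',
]

def blocked_queues(lines):
    """Return the names of queues that metrics.log reports as blocked."""
    blocked = []
    for line in lines:
        if 'group=queue' in line and 'blocked=true' in line:
            m = re.search(r'name=(\w+)', line)
            if m:
                blocked.append(m.group(1))
    return blocked

print(blocked_queues(sample_lines))  # ['indexqueue']
```

A persistently blocked indexqueue usually points at slow disk or an indexer that cannot keep up, which would explain the forwarder-side resets.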


ddrillic
Ultra Champion

Makes sense @harsmarvania57, but all other data sources look fine.

When searching for a specific path - index=_internal host=<host name> <path>

I see -

10-07-2018 14:04:24.766 -0500 INFO  Metrics - group=per_source_thruput, series="<path to log file>", kbps=0.05002527423023678, eps=0.419355447451456, kb=1.55078125, ev=13, avg_age=0.6153846153846154, max_age=3

What does it mean?


harsmarvania57
Ultra Champion

This is a metrics.log entry that reports per_source_thruput from the UF to the indexer for each source you are monitoring. For a detailed explanation of the fields in this log message, see https://docs.splunk.com/Documentation/Splunk/7.1.2/Troubleshooting/Aboutmetricslog#Thruput_messages
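If it helps, the key=value fields in such an entry are easy to pull apart programmatically. A small, self-contained sketch using the thruput line quoted above (the series path is elided exactly as in the original message):

```python
import re

entry = ('10-07-2018 14:04:24.766 -0500 INFO  Metrics - group=per_source_thruput, '
         'series="<path to log file>", kbps=0.05002527423023678, '
         'eps=0.419355447451456, kb=1.55078125, ev=13, '
         'avg_age=0.6153846153846154, max_age=3')

def parse_thruput(line):
    """Extract the numeric key=value fields from a metrics.log thruput entry."""
    return {k: float(v) for k, v in re.findall(r'(\w+)=([\d.]+)', line)}

fields = parse_thruput(entry)
# kb / ev gives the average event size (in KB) over the sampling interval.
print(round(fields['kb'] / fields['ev'], 3))  # 0.119
```

So this particular entry says the forwarder read 13 events (~1.55 KB) from that source during the sampling interval, i.e. the file is being read; the question is what happens to the data downstream.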

ddrillic
Ultra Champion

We do see the following error -

-- 10-12-2018 14:36:41.029 -0500 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group indexers has been blocked for 410303 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.


harsmarvania57
Ultra Champion

This means that queues on the indexers may be blocking for various reasons; you need to check the Monitoring Console for the status of blocked queues.
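For a sense of scale, the "blocked for 410303 seconds" in that warning can be converted to days. A trivial sketch (the warn string is abridged from the message quoted above):

```python
import re

warn = ('10-12-2018 14:36:41.029 -0500 WARN TcpOutputProc - Tcpout Processor: '
        'The TCP output processor has paused the data flow. Forwarding to '
        'output group indexers has been blocked for 410303 seconds.')

m = re.search(r'blocked for (\d+) seconds', warn)
days = int(m.group(1)) / 86400  # seconds per day
print(f'{days:.1f}')  # 4.7
```

Nearly five days of blocked forwarding strongly suggests the receiving side has been unable to accept data for a long time, not a transient hiccup.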
