Getting Data In

Why do my forwarders have inconsistent connectivity to the indexers?

ddrillic
Ultra Champion

We have a case on multiple servers where we see constant errors such as -

10-04-2018 13:25:55.480 -0500 INFO  TcpOutputProc - Removing quarantine from idx=<indexer1>:9997
10-04-2018 13:25:55.480 -0500 INFO  TcpOutputProc - Removing quarantine from idx=<indexer2>:9997
10-04-2018 13:25:55.480 -0500 INFO  TcpOutputProc - Removing quarantine from idx=<indexer3>:9997
10-04-2018 13:25:55.480 -0500 INFO  TcpOutputProc - Removing quarantine from idx=<indexer4>:9997
10-04-2018 13:25:55.489 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer
10-04-2018 13:25:55.491 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer
10-04-2018 13:25:55.493 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer
10-04-2018 13:25:55.501 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer
10-04-2018 13:25:55.509 -0500 ERROR TcpOutputFd - Read error. Connection reset by peer

telnet is fine -

telnet <indexer1> 9997
telnet <indexer2> 9997
telnet <indexer3> 9997
telnet <indexer4> 9997

We see lots of data for -

index=_internal host=<host name>

And nothing for -

index=<customer index> host=<host name>

harsmarvania57
Ultra Champion

Hi @ddrillic,

Since connectivity from the UF to the indexers looks good, as you mentioned, I'd start troubleshooting by checking for congestion on the indexers and their resource usage (they might be too busy due to a resource constraint).
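One way to check for congestion without the Monitoring Console is to scan the indexer's metrics.log for group=queue lines that report blocked=true. A minimal sketch in Python; the sample lines below are made up for illustration (the field layout follows the documented metrics.log format, but verify the queue names and paths against your own environment):

```python
import re

# Hypothetical sample metrics.log lines; real files live under
# $SPLUNK_HOME/var/log/splunk/metrics.log on the indexer.
sample_lines = [
    '10-04-2018 13:25:55.480 -0500 INFO  Metrics - group=queue, name=indexqueue, '
    'blocked=true, max_size_kb=500, current_size_kb=499, current_size=1200',
    '10-04-2018 13:25:55.480 -0500 INFO  Metrics - group=queue, name=parsingqueue, '
    'max_size_kb=6144, current_size_kb=12, current_size=30',
]

def blocked_queues(lines):
    """Return the names of queues that metrics.log reports as blocked."""
    blocked = []
    for line in lines:
        if 'group=queue' in line and 'blocked=true' in line:
            m = re.search(r'name=(\w+)', line)
            if m:
                blocked.append(m.group(1))
    return blocked

print(blocked_queues(sample_lines))  # ['indexqueue']
```

A persistently blocked indexqueue usually points at slow disk or an indexer that cannot keep up, which would explain the forwarder-side resets.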


ddrillic
Ultra Champion

Makes sense @harsmarvania57, but all other data sources look fine.

When searching for a specific path - index=_internal host=<host name> <path>

I see -

10-07-2018 14:04:24.766 -0500 INFO  Metrics - group=per_source_thruput, series="<path to log file>", kbps=0.05002527423023678, eps=0.419355447451456, kb=1.55078125, ev=13, avg_age=0.6153846153846154, max_age=3

What does it mean?


harsmarvania57
Ultra Champion

This is a metrics.log entry that reports per_source_thruput from the UF to the indexer for each source you are monitoring. For a detailed explanation of the fields in this log message, see https://docs.splunk.com/Documentation/Splunk/7.1.2/Troubleshooting/Aboutmetricslog#Thruput_messages
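If it helps, the key=value fields in such an entry are easy to pull apart programmatically. A small, self-contained sketch using the thruput line quoted above (the series path is elided exactly as in the original message):

```python
import re

entry = ('10-07-2018 14:04:24.766 -0500 INFO  Metrics - group=per_source_thruput, '
         'series="<path to log file>", kbps=0.05002527423023678, '
         'eps=0.419355447451456, kb=1.55078125, ev=13, '
         'avg_age=0.6153846153846154, max_age=3')

def parse_thruput(line):
    """Extract the numeric key=value fields from a metrics.log thruput entry."""
    return {k: float(v) for k, v in re.findall(r'(\w+)=([\d.]+)', line)}

fields = parse_thruput(entry)
# kb / ev gives the average event size (in KB) over the sampling interval.
print(round(fields['kb'] / fields['ev'], 3))  # 0.119
```

So this particular entry says the forwarder read 13 events (~1.55 KB) from that source during the sampling interval, i.e. the file is being read; the question is what happens to the data downstream.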

ddrillic
Ultra Champion

We do see the following error -

-- 10-12-2018 14:36:41.029 -0500 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group indexers has been blocked for 410303 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.


harsmarvania57
Ultra Champion

This means that queues on the indexers may be blocking for various reasons; you need to check the Monitoring Console for the status of blocked queues.
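For a sense of scale, the "blocked for 410303 seconds" in that warning can be converted to days. A trivial sketch (the warn string is abridged from the message quoted above):

```python
import re

warn = ('10-12-2018 14:36:41.029 -0500 WARN TcpOutputProc - Tcpout Processor: '
        'The TCP output processor has paused the data flow. Forwarding to '
        'output group indexers has been blocked for 410303 seconds.')

m = re.search(r'blocked for (\d+) seconds', warn)
days = int(m.group(1)) / 86400  # seconds per day
print(f'{days:.1f}')  # 4.7
```

Nearly five days of blocked forwarding strongly suggests the receiving side has been unable to accept data for a long time, not a transient hiccup.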
