Splunk Enterprise

Why has forwarding suddenly stopped for a few of the forwarders?

yj055
Loves-to-Learn Lots

We have a distributed architecture:

Search head cluster with 6 hosts across 3 data centres

Indexer cluster with 6 index peers and 1 index master

Forwarders on all servers in the environment: web tier, app tier, and load balancer tier

A few months back, the web tier stopped sending data. Its logs stopped arriving in Splunk, but the other tiers are working.

When we checked the activity on the web tier, we found that patching had taken place and splunkd had been restarted. After that, forwarding stopped on the web tier.

But the splunkd process came up fine and is still running on those hosts.

We also observed that the WARN messages below started appearing at exactly the same time.

[Note the blocked duration: it starts at 10 seconds and keeps growing.]

WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group index_peers has been blocked for 10 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.

----------

------

+0000 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group index-peers has been blocked for 9725460 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
+0000 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group index-peers has been blocked for 9725470 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.

=============================================================================

We suspect this WARN message points to the cause because the same thing happened on another tier recently.

The load balancer tier stopped forwarding recently. The WARN above started appearing at the same time, beginning with "blocked for 10 seconds".

  • The Splunk forwarder is running fine on all of these hosts.
  • The app tier is still sending data, so the indexers are fine.
  • There is no disk space or memory issue on any of them.
  • No config changes were made anywhere (inputs.conf, outputs.conf, or any other relevant file). Everything is the same; forwarding just stopped suddenly.

 

What could have caused forwarding to stop so suddenly?

Splunk Enterprise

Version: 7.2.1, Build: be11b2c46e23

 

 


Stefanie
Builder

This sounds more like a network connectivity problem...

Are you able to test port 9997 on the affected forwarders to ensure it's open?

Could it be that these affected servers are on a different VLAN?

Are the indexers listed in your outputs.conf listed as FQDNs or IPs? Can you run the nslookup command on those indexers from your forwarders?
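
If telnet or nslookup is not available on the forwarders, the same two checks (name resolution and a TCP connection to the receiving port) can be approximated with a few lines of Python. This is only a sketch: `check_receiver` is a hypothetical helper name, and 9997 is simply the port mentioned above, so substitute whatever receiving port your indexers actually listen on.

```python
import socket


def check_receiver(host: str, port: int, timeout: float = 5.0) -> dict:
    """Resolve `host`, then attempt a TCP connection to `host:port`.

    Returns a small report instead of raising, so it can be run
    against a whole list of indexers in a loop.
    """
    report = {"host": host, "port": port, "addresses": [],
              "connected": False, "error": None}

    # Step 1: name resolution (roughly what nslookup would tell you).
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        report["addresses"] = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        report["error"] = f"DNS lookup failed: {exc}"
        return report

    # Step 2: TCP connect (roughly what telnet host port would tell you).
    try:
        with socket.create_connection((host, port), timeout=timeout):
            report["connected"] = True
    except OSError as exc:
        report["error"] = f"TCP connect failed: {exc}"
    return report
```

Running `check_receiver(<indexer name>, 9997)` from an affected forwarder and from a working one, and comparing the two reports, should show whether the difference is in DNS or in reachability of the port. A successful connect only proves the port is open; it does not prove the indexer is accepting forwarded data.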

 


yj055
Loves-to-Learn Lots

We couldn't observe any connectivity issue as such. As I mentioned, only a few forwarders stopped working. The rest of the forwarders are still sending data to the same indexer cluster.

And the port we use is 8089, which is open:

[splunkuser@fwdernode system]$ telnet idxnode1.iuser.iroot.adidom.com 8089
Trying xx.yy.zz.aa...
Connected to idxnode1.iuser.iroot.adidom.com.
Escape character is '^]'.


outputs.conf

[tcpout]
defaultGroup = index_peers

 

The indexers are listed as index_peers.

server.conf on indexer node 1

[general]
serverName = node1_idx01


server.conf on indexer node 2

[general]
serverName = node2_idx01


Stefanie
Builder

Thank you, that's helpful.

Can you read splunkd.log while you restart the splunkforwarder service on the affected machines? What does it say before it starts saying "Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group index-peers has been blocked"?


yj055
Loves-to-Learn Lots

I restarted splunkd on one of the affected forwarders. Here is the sequence of messages in splunkd.log.

Just before shutdown:

03-28-2022 11:24:23.074 +0100 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group index_peers has been blocked for 9989240 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
03-28-2022 11:24:33.085 +0100 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group index_peers has been blocked for 9989250 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
03-28-2022 11:24:37.798 +0100 INFO PipelineComponent - Performing early shutdown tasks
03-28-2022 11:24:37.798 +0100 INFO IndexProcessor - handleSignal : Disabling streaming searches.
03-28-2022 11:24:37.798 +0100 INFO IndexProcessor - request state change from=RUN to=SHUTDOWN_SIGNALED
03-28-2022 11:24:37.798 +0100 INFO loader - Shutdown HTTPDispatchThread
03-28-2022 11:24:37.798 +0100 INFO ShutdownHandler - Shutting down splunkd
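
The "blocked for N seconds" counter in these warnings can be turned back into an approximate start time, which helps check whether the blockage began at the patching restart. The following is a minimal sketch, assuming the timestamp layout shown in the log lines above and assuming the counter was not reset in between; `blocked_since` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Matches the timestamp and the blocked duration in a TcpOutputProc warning.
BLOCKED_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) "
    r"WARN\s+TcpOutputProc - .*has been blocked for (?P<secs>\d+) seconds"
)


def blocked_since(line: str) -> Optional[datetime]:
    """Return the approximate time the output group became blocked.

    Returns None if the line is not a "blocked for N seconds" warning.
    """
    match = BLOCKED_RE.search(line)
    if match is None:
        return None
    logged_at = datetime.strptime(match.group("ts"), "%m-%d-%Y %H:%M:%S.%f %z")
    return logged_at - timedelta(seconds=int(match.group("secs")))
```

Applied to the 11:24:33 warning above (blocked for 9989250 seconds), this gives roughly 2 December 2021, 20:37, expressed at the log's +0100 offset. The offset is treated as fixed, so if daylight saving time changed in between, the local wall-clock hour may be off by one. That date can then be compared with the patching record for the host.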

Starting splunkd:

03-28-2022 11:25:40.952 +0100 INFO loader - Splunkd starting (build be11b2c46e23).

03-28-2022 11:25:46.164 +0100 INFO TcpOutputProc - Will resolve indexer names at 450.000 second interval.
03-28-2022 11:25:46.589 +0100 WARN TailReader - Could not send data to output queue (parsingQueue), retrying...
03-28-2022 11:25:46.614 +0100 WARN TailReader - Could not send data to output queue (parsingQueue), retrying...

03-28-2022 11:26:11.164 +0100 INFO TcpOutputProc - Initialization time for indexer discovery service for default group=index_peers has been completed.
03-28-2022 11:26:11.212 +0100 INFO ScheduledViewsReaper - Scheduled views reaper run complete. Reaped count=0 scheduled views
03-28-2022 11:26:11.215 +0100 INFO TailReader - Continuing...
03-28-2022 11:26:11.215 +0100 INFO TailReader - ...continuing.

The TcpOutputProc warnings start appearing again, with the counter reset to 10 seconds:

03-28-2022 11:26:11.411 +0100 WARN TcpOutputProc - Applying quarantine to ip=xx.yy.zz.aa  port=xxxx _numberOfFailures=2
03-28-2022 11:26:21.036 +0100 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group index_peers has been blocked for 10 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
03-28-2022 11:26:21.218 +0100 WARN TailReader - Could not send data to output queue (parsingQueue), retrying...
03-28-2022 11:26:21.218 +0100 WARN TailReader - Could not send data to output queue (parsingQueue), retrying...
03-28-2022 11:26:31.047 +0100 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group index_peers has been blocked for 20 seconds. This will probably stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.

 

 


Stefanie
Builder

This has me stumped. 🤔


@yj055 wrote:

The TcpOutputProc warnings start appearing again, with the counter reset to 10 seconds:

03-28-2022 11:26:11.411 +0100 WARN TcpOutputProc - Applying quarantine to ip=xx.yy.zz.aa  port=xxxx _numberOfFailures=2


 

Was this one of the correct IP/port combinations for your indexer or are you using indexer discovery?

ip=xx.yy.zz.aa port=xxxx 

 

 

Maybe you can try to completely uninstall the forwarder and reinstall.

 


yj055
Loves-to-Learn Lots

Yes, that is the correct IP and port of the indexer.

The log message appears for all the index peers with the correct receiver port (that is, the log line below occurs multiple times, once for each indexer IP and port):

03-28-2022 11:26:11.411 +0100 WARN TcpOutputProc - Applying quarantine to ip=xx.yy.zz.aa  port=xxxx _numberOfFailures=2
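
To confirm that every index peer is being quarantined, rather than only some of them, the quarantine warnings can be tallied per IP and port. This is a rough sketch that assumes the message layout shown in the line above; `quarantined_targets` is a hypothetical helper name, and the log path in the usage note is the usual default location, which may differ on your hosts.

```python
import re
from collections import Counter

# Matches the "Applying quarantine" warning emitted by TcpOutputProc.
QUARANTINE_RE = re.compile(
    r"WARN\s+TcpOutputProc - Applying quarantine to "
    r"ip=(?P<ip>\S+)\s+port=(?P<port>\S+)"
)


def quarantined_targets(lines):
    """Count quarantine warnings per (ip, port) pair."""
    counts = Counter()
    for line in lines:
        match = QUARANTINE_RE.search(line)
        if match:
            counts[(match.group("ip"), match.group("port"))] += 1
    return counts
```

For example, `quarantined_targets(open("/opt/splunkforwarder/var/log/splunk/splunkd.log"))` returns one entry per indexer address that the forwarder has quarantined. If all six peers show up, the forwarder is failing against the whole output group, not against a single indexer.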

 

As I mentioned in my initial post, the entire set of web tier forwarders stopped working a few months back.

The load balancer tier forwarders were working until last week. Then the LB forwarders stopped, and splunkd.log started showing the same WARN messages from TcpOutputProc: "Tcpout Processor: The TCP output processor has paused the data flow".

I see the same messages in the console health notifications as well, as a "TCPOutAutoLB-0" warning.

We would like to know what caused all these production forwarders to stop suddenly before we decide on and plan a fresh install.

More forwarders may stop. As of now, the application tier forwarders are still working, and those source types are still delivering data.

 

 
