Why does universal forwarder stop sending data?

JuanAntunes
Explorer

Hello!

I have an environment with about 200 machines, all Windows servers. All of them send data over TCP port 9997 directly to my Heavy Forwarder, and all of that data goes into the "Windows" index.

About once or twice a day, the logs sent by the Universal Forwarders stop arriving from all machines, leaving the Windows index empty. All other data that does not arrive through TCP 9997 is unaffected, such as some scripts that collect other types of information and write to other indexes.

The problem is only resolved when Splunk is restarted on the Heavy Forwarder.


While trying to diagnose the problem, the only thing I could find is this message on all servers with a Universal Forwarder installed:


02-16-2022 15:20:51.293 -0400 WARN TcpOutputProc - Tcpout Processor: The TCP output processor has paused the data flow. Forwarding to output group default-autolb-group has been blocked for 82200 seconds

Has anyone been through something similar, or can anyone help me identify what is happening?
Note that the log on the Heavy Forwarder doesn't show anything relevant.

Thanks in advance!

0 Karma

isoutamo
SplunkTrust

Hi

You obviously have blocked queues, at least on the HF side and maybe on the indexer side too. An easy way to see the situation on the HF side is to add it to the MC (Monitoring Console) as an indexer, e.g. with a custom group such as IHF. Then you can easily see what is happening with the queues and pipelines on that node (and on other nodes). If you don't have the MC in place yet, I strongly recommend setting it up.

Here are two excellent .conf presentations on how to investigate the situation even without the MC.

r. Ismo

0 Karma

somesoni2
Revered Legend

Use the DMC to see what's going on with the HF. The UF logs suggest that the HF (as defined in outputs.conf for the stanza default-autolb-group) is down or unavailable, which causes data ingestion to stop. Use the "Indexing Performance" dashboards in the DMC to see whether any queues are filling up.
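
If the DMC is not available, a rough way to get the same picture is to scan metrics.log on the HF (normally under $SPLUNK_HOME/var/log/splunk/metrics.log) for queues reported as blocked. The sketch below is only an illustration: it assumes the usual `group=queue, name=..., blocked=true` layout of metrics.log, the function name `blocked_queue_counts` is made up for this example, and the sample lines are invented rather than taken from the poster's environment.

```python
import re
from collections import Counter

# Matches the key=value pairs that follow "Metrics - " in a metrics.log line.
KV_PATTERN = re.compile(r"(\w+)=([^,\s]+)")


def blocked_queue_counts(lines):
    """Count how often each queue is reported with blocked=true."""
    counts = Counter()
    for line in lines:
        fields = dict(KV_PATTERN.findall(line))
        if fields.get("group") == "queue" and fields.get("blocked") == "true":
            counts[fields.get("name", "unknown")] += 1
    return counts


# Invented sample lines in the assumed metrics.log layout.
SAMPLE = [
    "02-16-2022 15:20:20.000 -0400 INFO  Metrics - group=queue, name=parsingqueue, "
    "blocked=true, max_size_kb=512, current_size_kb=511, current_size=1200",
    "02-16-2022 15:20:20.000 -0400 INFO  Metrics - group=queue, name=aggqueue, "
    "max_size_kb=1024, current_size_kb=3, current_size=10",
    "02-16-2022 15:20:51.000 -0400 INFO  Metrics - group=queue, name=parsingqueue, "
    "blocked=true, max_size_kb=512, current_size_kb=511, current_size=1210",
    "02-16-2022 15:20:51.000 -0400 INFO  Metrics - group=queue, name=indexqueue, "
    "blocked=true, max_size_kb=500, current_size_kb=499, current_size=900",
]

if __name__ == "__main__":
    # To scan a real file, pass an open file object instead of SAMPLE.
    for name, count in blocked_queue_counts(SAMPLE).most_common():
        print(name, count)
```

The queue that is blocked most often, and furthest downstream, is usually the place to start looking.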

0 Karma

SanjayReddy
SplunkTrust

Hi @JuanAntunes

There are a couple of possible reasons for this issue.

Please check whether any queues are filling up on the UF side because some sources are sending too much data at once.

Also check for network issues between the UF and the HF: look in splunkd.log for timeouts, and check from the HF side as well.

Also check splunkd.log for any ERROR or WARN messages.

When we faced the same issue, it turned out to be caused by intermittent network issues.

In your case it might be the same issue or a different one.
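
For the splunkd.log check, something like the following Python sketch can summarize WARN and ERROR messages per component and pull out how long each output group has been blocked. It is only a sketch under assumptions: the line layout is inferred from the warning quoted in the original post, the function name `summarize` is made up, and the ERROR sample line (including its host address) is hypothetical.

```python
import re
from collections import Counter

# Timestamp (date, time, offset), then level, component, " - ", message.
LINE_PATTERN = re.compile(r"^\S+ \S+ \S+\s+(WARN|ERROR)\s+(\S+) - (.*)$")
# Wording taken from the TcpOutputProc warning quoted in the question.
BLOCKED_PATTERN = re.compile(
    r"Forwarding to output group (\S+) has been blocked for (\d+) seconds"
)


def summarize(lines):
    """Return (count per (level, component), longest blocked seconds per group)."""
    by_component = Counter()
    longest_block = {}
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if not match:
            continue  # INFO/DEBUG lines and continuation lines are skipped
        level, component, message = match.groups()
        by_component[(level, component)] += 1
        blocked = BLOCKED_PATTERN.search(message)
        if blocked:
            group, seconds = blocked.group(1), int(blocked.group(2))
            longest_block[group] = max(seconds, longest_block.get(group, 0))
    return by_component, longest_block


SAMPLE = [
    # Quoted from the original post.
    "02-16-2022 15:20:51.293 -0400 WARN TcpOutputProc - Tcpout Processor: "
    "The TCP output processor has paused the data flow. Forwarding to output "
    "group default-autolb-group has been blocked for 82200 seconds",
    # Hypothetical lines, for illustration only.
    "02-16-2022 15:21:02.110 -0400 ERROR TcpOutputFd - Connection to host=192.0.2.10:9997 failed",
    "02-16-2022 15:21:05.000 -0400 INFO  TcpOutputProc - Connected to idx=192.0.2.10:9997",
]

if __name__ == "__main__":
    counts, blocks = summarize(SAMPLE)
    for (level, component), count in counts.most_common():
        print(level, component, count)
    for group, seconds in blocks.items():
        print(group, "blocked for up to", seconds, "seconds")
```

Running it against splunkd.log on both a UF and the HF makes it easier to see whether timeouts or connection errors line up with the times forwarding stops.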

0 Karma

gcusello
SplunkTrust

Hi @JuanAntunes,

Some additional questions:

  • Does your HF meet the reference hardware specification?
  • Which other jobs are scheduled on your Heavy Forwarder?
  • Are you sure that, when forwarding stops, there isn't a job using up the available bandwidth?

It seems that sometimes, when a scheduled job starts, your forwarding stops.

Ciao.

Giuseppe

0 Karma