Getting Data In

Forwarding of data dies

tkwaller
Builder

We have about 1,000 UFs whose data is not getting indexed and searchable.
They are throwing this error:
10-05-2016 14:54:05.162 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
10-05-2016 14:54:10.163 +0000 INFO TailReader - ...continuing.
10-05-2016 14:54:20.165 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
10-05-2016 14:54:25.166 +0000 INFO TailReader - ...continuing.
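Those TailReader lines mean the UF's file tailer cannot hand data to its local parsingQueue, i.e. something downstream of that queue is blocked. A quick way to see how often and when this happens on a given UF is to count the retry lines in splunkd.log. This is just a sketch; the log path in the comment is the usual Linux UF default, so adjust it for your hosts:

```python
import re
from collections import Counter

# Matches the TailReader retry lines quoted above; captures the minute
# and the queue name so retries can be bucketed over time.
RETRY = re.compile(
    r"^(?P<minute>\d{2}-\d{2}-\d{4} \d{2}:\d{2}):\d{2}\.\d+ .*"
    r"Could not send data to output queue \((?P<queue>\w+)\)"
)

def count_retries_per_minute(lines):
    """Count 'Could not send data' retries per (minute, queue) pair."""
    counts = Counter()
    for line in lines:
        m = RETRY.search(line)
        if m:
            counts[(m.group("minute"), m.group("queue"))] += 1
    return counts

# Demo on the two retry lines quoted above:
sample = [
    "10-05-2016 14:54:05.162 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...",
    "10-05-2016 14:54:10.163 +0000 INFO TailReader - ...continuing.",
    "10-05-2016 14:54:20.165 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...",
]
print(count_retries_per_minute(sample))

# On a real UF you would feed it the whole file, e.g.:
# with open("/opt/splunkforwarder/var/log/splunk/splunkd.log") as f:
#     print(count_retries_per_minute(f).most_common(10))
```

If the retries cluster at specific times of day, correlating those windows with load on the HWFs they forward to is usually the next step.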

All hosts have unlimited throughput to our HWFs, which in turn have unlimited throughput to the indexers. Our HWFs have dual pipelines, so it's definitely not blocking there. We have about 2,800 UFs forwarding to 24 HWFs, which forward to roughly 28 indexers.

Via the DMC I can see our queues are basically at 0, so it shows no data backing up.
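The DMC's queue panels are derived from the group=queue entries in each instance's metrics.log, so they can be cross-checked directly on a suspect UF or HWF. Below is a rough sketch, assuming the usual `name=..., max_size_kb=..., current_size_kb=...` field layout; verify the format against your own metrics.log before relying on it:

```python
import re

# group=queue lines in splunkd's metrics.log report per-queue sizes; the
# field layout below is assumed from typical entries -- verify locally.
QUEUE = re.compile(
    r"group=queue, name=(?P<name>\w+),"
    r".*?max_size_kb=(?P<max>\d+), current_size_kb=(?P<cur>\d+)"
)

def peak_fill_ratios(lines):
    """Highest observed fill ratio (0.0 = empty, 1.0 = full) per queue name."""
    worst = {}
    for line in lines:
        m = QUEUE.search(line)
        if m:
            ratio = int(m.group("cur")) / max(int(m.group("max")), 1)
            worst[m.group("name")] = max(worst.get(m.group("name"), 0.0), ratio)
    return worst

# Illustrative lines only (not copied from a real host):
sample = [
    "INFO Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=512, current_size_kb=512, current_size=1290",
    "INFO Metrics - group=queue, name=indexqueue, max_size_kb=500, current_size_kb=5, current_size=12",
]
print(peak_fill_ratios(sample))
```

A queue that sits near 1.0 on one specific HWF while the fleet average is near 0 would explain why the DMC's aggregate view looks clean.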

Any idea what the issue could be?

Thanks for the thoughts!

1 Solution

guilmxm
SplunkTrust

Hi,

Nice deployment 😉

You should probably start by opening a case with Splunk support.
In the meantime, here are some links that may be interesting for you:

https://answers.splunk.com/answers/5590/could-not-send-data-to-the-output-queue.html
http://splunkgeek.blogspot.co.uk/2015/05/could-not-send-data-to-output-queue.html
http://wiki.splunk.com/Community:HowIndexingWorks

Most probably you will need to investigate what is running on the UF side: are there complex regexes? A huge number of files being monitored? And so on.
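On the complex-regex point: a pathological pattern in props/transforms can pin a pipeline even on modest data volumes. A small, self-contained way to sanity-check a suspect pattern (the patterns below are illustrative only, not taken from any real config):

```python
import re
import time

def seconds_per_attempt(pattern, text, repeats=5):
    """Average wall-clock time for one search of `pattern` against `text`."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    for _ in range(repeats):
        compiled.search(text)
    return (time.perf_counter() - start) / repeats

# Nested quantifiers backtrack explosively on near-miss input:
victim = "a" * 18 + "b"
slow = seconds_per_attempt(r"(a+)+$", victim)   # pathological nesting
fast = seconds_per_attempt(r"a+$", victim)      # equivalent, linear
print(f"pathological: {slow:.6f}s  linear: {fast:.6f}s")
```

Python's regex engine backtracks like PCRE does, so a pattern that is slow here on representative log lines is worth rewriting before it ever reaches a forwarder.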

If none of the queues on the HWFs/indexers show high usage, then the investigation should focus on the UFs and the job they're doing.

You said upgrading does not help; have you tried upgrading a group of UFs to 6.4.x for testing purposes?



tkwaller
Builder

The root cause of this was an HWF that also runs as a syslog collector. For some reason that HWF becomes too busy and stops accepting data.
It is still undetermined HOW this one HWF could stop the flow of data through the entire environment, as there are many others that should have taken over.
For now the issue is fixed.
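For anyone hitting the same symptom: one thing worth double-checking in that situation is the load-balancing configuration in the UFs' outputs.conf. The setting names below are real outputs.conf settings, but the stanza name, hosts, and values are illustrative only:

```ini
# Illustrative stanza; adjust the group name and hosts for your environment
[tcpout:hwf_group]
server = hwf1.example.com:9997,hwf2.example.com:9997

# Rotate targets on a timer even mid-file, so a UF cannot stay pinned
# to a busy or stuck HWF until that file is fully drained
autoLBFrequency = 30
forceTimebasedAutoLB = true

# Give up on an unresponsive connection instead of blocking the output queue
connectionTimeout = 20
writeTimeout = 300
```

Without forceTimebasedAutoLB, a UF streaming a large file can stick to one target until the file finishes, which is one plausible way a single stuck HWF stalls many forwarders at once.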


daniel333
Builder

Hey,

I work on the same team as @Tkwaller. Restarting the Universal Forwarder often makes the problem go away for days, but it eventually resurfaces.


tkwaller
Builder

Also, just FYI: the UFs are running Splunk 6.3.3 on Linux, but upgrading has no effect on this issue.
All other servers, including the Splunk HWFs and indexers, are on 6.4.3.


tkwaller
Builder

This has been ongoing intermittently for MONTHS, ever since installing 6.4.1 on the Splunk admin servers, search heads, HWFs, and indexers.
