Getting Data In

Forwarding of data dies

tkwaller
Builder

We have about 1,000 UFs whose data is not getting indexed and searchable.
They are throwing this error:
10-05-2016 14:54:05.162 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
10-05-2016 14:54:10.163 +0000 INFO TailReader - ...continuing.
10-05-2016 14:54:20.165 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
10-05-2016 14:54:25.166 +0000 INFO TailReader - ...continuing.
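Those TailReader lines mean the UF's file tailer cannot hand data to its local parsingQueue, i.e. something downstream of that queue is blocked. A quick way to see how often and when this happens on a given UF is to count the retry lines in splunkd.log. This is just a sketch; the log path in the comment is the usual Linux UF default, so adjust it for your hosts:

```python
import re
from collections import Counter

# Matches the TailReader retry lines quoted above; captures the minute
# and the queue name so retries can be bucketed over time.
RETRY = re.compile(
    r"^(?P<minute>\d{2}-\d{2}-\d{4} \d{2}:\d{2}):\d{2}\.\d+ .*"
    r"Could not send data to output queue \((?P<queue>\w+)\)"
)

def count_retries_per_minute(lines):
    """Count 'Could not send data' retries per (minute, queue) pair."""
    counts = Counter()
    for line in lines:
        m = RETRY.search(line)
        if m:
            counts[(m.group("minute"), m.group("queue"))] += 1
    return counts

# Demo on the two retry lines quoted above:
sample = [
    "10-05-2016 14:54:05.162 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...",
    "10-05-2016 14:54:10.163 +0000 INFO TailReader - ...continuing.",
    "10-05-2016 14:54:20.165 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...",
]
print(count_retries_per_minute(sample))

# On a real UF you would feed it the whole file, e.g.:
# with open("/opt/splunkforwarder/var/log/splunk/splunkd.log") as f:
#     print(count_retries_per_minute(f).most_common(10))
```

If the retries cluster at specific times of day, correlating those windows with load on the HWFs they forward to is usually the next step.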

All hosts have unlimited throughput to our HWFs, which in turn have unlimited throughput to the indexers. Our HWFs have dual pipelines, so it's definitely not blocking there. We have about 2,800 UFs forwarding to 24 HWFs, which forward to roughly 28 indexers.

Via the DMC I can see our queues are basically at 0, so it shows no data backing up.
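The DMC's queue panels are derived from the group=queue entries in each instance's metrics.log, so they can be cross-checked directly on a suspect UF or HWF. Below is a rough sketch, assuming the usual `name=..., max_size_kb=..., current_size_kb=...` field layout; verify the format against your own metrics.log before relying on it:

```python
import re

# group=queue lines in splunkd's metrics.log report per-queue sizes; the
# field layout below is assumed from typical entries -- verify locally.
QUEUE = re.compile(
    r"group=queue, name=(?P<name>\w+),"
    r".*?max_size_kb=(?P<max>\d+), current_size_kb=(?P<cur>\d+)"
)

def peak_fill_ratios(lines):
    """Highest observed fill ratio (0.0 = empty, 1.0 = full) per queue name."""
    worst = {}
    for line in lines:
        m = QUEUE.search(line)
        if m:
            ratio = int(m.group("cur")) / max(int(m.group("max")), 1)
            worst[m.group("name")] = max(worst.get(m.group("name"), 0.0), ratio)
    return worst

# Illustrative lines only (not copied from a real host):
sample = [
    "INFO Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=512, current_size_kb=512, current_size=1290",
    "INFO Metrics - group=queue, name=indexqueue, max_size_kb=500, current_size_kb=5, current_size=12",
]
print(peak_fill_ratios(sample))
```

A queue that sits near 1.0 on one specific HWF while the fleet average is near 0 would explain why the DMC's aggregate view looks clean.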

Any idea what the issue could be?

Thanks for the thoughts!

1 Solution

guilmxm
SplunkTrust

Hi,

Nice deployment 😉

You should probably start by opening a case with Splunk support.
In the meantime, here are some links that may be interesting for you:

https://answers.splunk.com/answers/5590/could-not-send-data-to-the-output-queue.html
http://splunkgeek.blogspot.co.uk/2015/05/could-not-send-data-to-output-queue.html
http://wiki.splunk.com/Community:HowIndexingWorks

Most probably you will need to investigate what is running on the UF side: are there complex regexes? A huge number of files being monitored? And so on.
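On the complex-regex point: a pathological pattern in props/transforms can pin a pipeline even on modest data volumes. A small, self-contained way to sanity-check a suspect pattern (the patterns below are illustrative only, not taken from any real config):

```python
import re
import time

def seconds_per_attempt(pattern, text, repeats=5):
    """Average wall-clock time for one search of `pattern` against `text`."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    for _ in range(repeats):
        compiled.search(text)
    return (time.perf_counter() - start) / repeats

# Nested quantifiers backtrack explosively on near-miss input:
victim = "a" * 18 + "b"
slow = seconds_per_attempt(r"(a+)+$", victim)   # pathological nesting
fast = seconds_per_attempt(r"a+$", victim)      # equivalent, linear
print(f"pathological: {slow:.6f}s  linear: {fast:.6f}s")
```

Python's regex engine backtracks like PCRE does, so a pattern that is slow here on representative log lines is worth rewriting before it ever reaches a forwarder.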

If none of the queues on the HWFs/indexers show high usage, then the investigation should focus on the UFs and the job they're doing.

You said upgrading does not help; have you tried upgrading a group of UFs to 6.4.x for testing purposes?



tkwaller
Builder

The root cause of this was an HWF that also runs as a syslog collector. For some reason that HWF becomes too busy and stops accepting data.
It is still undetermined HOW this one HWF could stop the flow of data through the entire environment, as there are many others that should have taken over.
For now the issue is fixed.
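For anyone hitting the same symptom: one thing worth double-checking in that situation is the load-balancing configuration in the UFs' outputs.conf. The setting names below are real outputs.conf settings, but the stanza name, hosts, and values are illustrative only:

```ini
# Illustrative stanza; adjust the group name and hosts for your environment
[tcpout:hwf_group]
server = hwf1.example.com:9997,hwf2.example.com:9997

# Rotate targets on a timer even mid-file, so a UF cannot stay pinned
# to a busy or stuck HWF until that file is fully drained
autoLBFrequency = 30
forceTimebasedAutoLB = true

# Give up on an unresponsive connection instead of blocking the output queue
connectionTimeout = 20
writeTimeout = 300
```

Without forceTimebasedAutoLB, a UF streaming a large file can stick to one target until the file finishes, which is one plausible way a single stuck HWF stalls many forwarders at once.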


daniel333
Builder

Hey,

I work on the same team as @Tkwaller. Restarting the Universal Forwarder often makes the problem go away for days, but it eventually resurfaces.


tkwaller
Builder

Also, just FYI: the UFs are running Splunk 6.3.3 on Linux, but upgrading has no effect on this issue.
All other servers, including the Splunk HWFs and indexers, are on 6.4.3.


tkwaller
Builder

This has been ongoing intermittently for MONTHS, ever since installing 6.4.1 on the Splunk admin servers, search heads, HWFs, and indexers.
