Hello there. I am seeing the subject message on my HEC HWF servers. We are using indexer discovery, and here is my outputs.conf file:
[indexAndForward]
index = false
[tcpout]
defaultGroup = group1
forwardedindex.filter.disable = true
indexAndForward = false
[indexer_discovery:index_cluster]
pass4SymmKey = $1$C15X23+M+dxTVmLJ/AE=
master_uri = https://10.26.20.8:8089
[tcpout:group1]
autoLBFrequency = 30
forceTimebasedAutoLB = true
indexerDiscovery = index_cluster
useACK = true
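It can help to see how the stanzas above fit together. The default group group1 points at [indexer_discovery:index_cluster] through indexerDiscovery, and useACK is enabled on that group. The Python sketch below uses only the standard-library configparser to parse this outputs.conf and report both facts. check_outputs is a hypothetical helper, not a Splunk tool. The sketch ignores Splunk's configuration layering across apps and system/local directories, so on a real host, splunk btool outputs list --debug is the authoritative view.

```python
import configparser

# The outputs.conf from the post, inlined so the sketch runs standalone.
OUTPUTS_CONF = """
[indexAndForward]
index = false
[tcpout]
defaultGroup = group1
forwardedindex.filter.disable = true
indexAndForward = false
[indexer_discovery:index_cluster]
pass4SymmKey = $1$C15X23+M+dxTVmLJ/AE=
master_uri = https://10.26.20.8:8089
[tcpout:group1]
autoLBFrequency = 30
forceTimebasedAutoLB = true
indexerDiscovery = index_cluster
useACK = true
"""

def check_outputs(text):
    """Hypothetical helper: report how the default tcpout group is wired up."""
    # interpolation=None keeps '$' and '%' in values literal.
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str  # preserve camelCase keys such as defaultGroup
    cp.read_string(text)
    group = cp.get("tcpout", "defaultGroup")
    section = "tcpout:" + group
    if not cp.has_section(section):
        return [f"missing [{section}] stanza"]
    findings = []
    disc = cp.get(section, "indexerDiscovery", fallback=None)
    if disc is not None:
        target = "indexer_discovery:" + disc
        if cp.has_section(target):
            findings.append(f"{section} uses indexer discovery via [{target}]")
        else:
            findings.append(f"{section} references missing [{target}]")
    ack = cp.getboolean(section, "useACK", fallback=False)
    findings.append(f"useACK={'on' if ack else 'off'}")
    return findings

if __name__ == "__main__":
    for finding in check_outputs(OUTPUTS_CONF):
        print(finding)
```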
The full context of the messages is:
12-07-2017 02:13:30.017 +0000 INFO IntrospectionGenerator:resource_usage - RU_main - I-data gathering (Resource Usage) starting; period=10s
12-07-2017 02:13:30.018 +0000 INFO IntrospectionGenerator:resource_usage - RU_main - I-data gathering (IO Statistics) starting; interval=60s
12-07-2017 02:13:31.465 +0000 INFO TcpOutputProc - Initializing connection for non-ssl forwarding to 10.26.200.99:9997
12-07-2017 02:13:31.465 +0000 INFO TcpOutputProc - Initializing connection for non-ssl forwarding to 10.26.200.187:9997
12-07-2017 02:13:31.465 +0000 INFO TcpOutputProc - Initializing connection for non-ssl forwarding to 10.26.201.36:9997
12-07-2017 02:13:31.465 +0000 INFO TcpOutputProc - Initializing connection for non-ssl forwarding to 10.26.200.73:9997
12-07-2017 02:13:31.465 +0000 INFO TcpOutputProc - Initializing connection for non-ssl forwarding to 10.26.201.72:9997
12-07-2017 02:13:31.465 +0000 INFO TcpOutputProc - Initializing connection for non-ssl forwarding to 10.26.200.155:9997
12-07-2017 02:13:31.465 +0000 INFO TcpOutputProc - Will resolve indexer names at 450 second interval.
12-07-2017 02:13:38.383 +0000 INFO DC:DeploymentClient - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
12-07-2017 02:13:48.024 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
12-07-2017 02:13:50.383 +0000 INFO DC:DeploymentClient - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
12-07-2017 02:13:56.310 +0000 INFO TcpOutputProc - Initialization time for indexer discovery service for default group=group1 has been completed.
12-07-2017 02:13:56.313 +0000 INFO TcpOutputProc - Connected to idx=10.26.200.73:9997 using ACK.
12-07-2017 02:13:56.962 +0000 INFO TailReader - ...continuing.
12-07-2017 02:13:57.763 +0000 INFO KeyManagerLocalhost - Checking for localhost key pair
12-07-2017 02:13:57.764 +0000 INFO KeyManagerLocalhost - Public key already exists: /opt/splunk/etc/auth/distServerKeys/trusted.pem
12-07-2017 02:13:57.764 +0000 INFO KeyManagerLocalhost - Reading public key for localhost: /opt/splunk/etc/auth/distServerKeys/trusted.pem
12-07-2017 02:13:57.764 +0000 INFO KeyManagerLocalhost - Finished reading public key for localhost: /opt/splunk/etc/auth/distServerKeys/trusted.pem
12-07-2017 02:13:57.764 +0000 INFO KeyManagerLocalhost - Reading private key for localhost: /opt/splunk/etc/auth/distServerKeys/private.pem
12-07-2017 02:13:57.764 +0000 INFO KeyManagerLocalhost - Finished reading private key for localhost: /opt/splunk/etc/auth/distServerKeys/private.pem
12-07-2017 02:13:57.859 +0000 INFO HttpPubSubConnection - SSL connection with id: connection_10.26.210.210_8089_ip-10-26-210-210.ec2.internal_ip-10-26-210-210_predix_aws_us_east_1_hec
12-07-2017 02:13:57.862 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.26.210.210_8089_ip-10-26-210-210.ec2.internal_ip-10-26-210-210_predix_aws_us_east_1_hec
12-07-2017 02:14:01.965 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
12-07-2017 02:14:02.384 +0000 INFO HttpPubSubConnection - Running phone uri=/services/broker/phonehome/connection_10.26.210.210_8089_ip-10-26-210-210.ec2.internal_ip-10-26-210-210_predix_aws_us_east_1_hec
12-07-2017 02:14:02.385 +0000 INFO DC:HandshakeReplyHandler - Handshake done.
12-07-2017 02:14:06.079 +0000 WARN TcpOutputProc - Forwarding to indexer group group1 blocked for 10 seconds.
12-07-2017 02:14:11.965 +0000 INFO TailReader - ...continuing.
12-07-2017 02:14:19.966 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
12-07-2017 02:14:21.030 +0000 WARN TcpOutputProc - Forwarding to indexer group group1 blocked for 10 seconds.
12-07-2017 02:14:26.220 +0000 INFO TcpOutputProc - Closing stream for idx=10.26.200.73:9997
12-07-2017 02:14:26.221 +0000 INFO TcpOutputProc - Connected to idx=10.26.200.187:9997 using ACK.
12-07-2017 02:14:26.386 +0000 INFO TailReader - ...continuing.
12-07-2017 02:14:34.439 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
12-07-2017 02:14:36.012 +0000 WARN TcpOutputProc - Forwarding to indexer group group1 blocked for 10 seconds.
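To quantify the pattern in the log above, here is a small Python sketch that pairs each TailReader "Could not send data to output queue" line with the next "...continuing." line. It also counts the TcpOutputProc "blocked" warnings. The function name stalls is hypothetical, and the sketch assumes only the splunkd.log line layout shown in this post.

```python
import re
from datetime import datetime

# Excerpt of the splunkd.log lines quoted in the post.
LOG = """\
12-07-2017 02:13:48.024 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
12-07-2017 02:13:56.962 +0000 INFO TailReader - ...continuing.
12-07-2017 02:14:01.965 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
12-07-2017 02:14:06.079 +0000 WARN TcpOutputProc - Forwarding to indexer group group1 blocked for 10 seconds.
12-07-2017 02:14:11.965 +0000 INFO TailReader - ...continuing.
12-07-2017 02:14:19.966 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
12-07-2017 02:14:21.030 +0000 WARN TcpOutputProc - Forwarding to indexer group group1 blocked for 10 seconds.
12-07-2017 02:14:26.220 +0000 INFO TcpOutputProc - Closing stream for idx=10.26.200.73:9997
12-07-2017 02:14:26.221 +0000 INFO TcpOutputProc - Connected to idx=10.26.200.187:9997 using ACK.
12-07-2017 02:14:26.386 +0000 INFO TailReader - ...continuing.
12-07-2017 02:14:34.439 +0000 INFO TailReader - Could not send data to output queue (parsingQueue), retrying...
12-07-2017 02:14:36.012 +0000 WARN TcpOutputProc - Forwarding to indexer group group1 blocked for 10 seconds.
"""

# Groups: timestamp, level, component, message (the timezone token is skipped).
LINE_RE = re.compile(r"^(\S+ \S+) \S+ (\w+) (\S+) - (.*)$")

def stalls(log_text):
    """Return ([(start_time, seconds_stalled), ...], blocked_warning_count)."""
    pending = None  # time of the last unresolved "Could not send data" line
    found = []
    blocked = 0
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%m-%d-%Y %H:%M:%S.%f")
        component, msg = m.group(3), m.group(4)
        if component == "TailReader" and msg.startswith("Could not send data"):
            pending = ts
        elif component == "TailReader" and msg == "...continuing." and pending is not None:
            seconds = round((ts - pending).total_seconds(), 3)
            found.append((pending.strftime("%H:%M:%S"), seconds))
            pending = None
        elif component == "TcpOutputProc" and "blocked for" in msg:
            blocked += 1
    return found, blocked

if __name__ == "__main__":
    print(stalls(LOG))
```

On this excerpt, it reports stalls of about 8.9, 10, and 6.4 seconds, plus three "blocked" warnings. The third stall clears a fraction of a second after the forwarder closes its stream to 10.26.200.73 and connects to 10.26.200.187.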
Any thoughts are more than welcome. Again, this is only happening on my HEC HWF. Thanks!
So after digging deeper into this issue, it seems that my indexers may not be set up correctly. It appears that THP is set to madvise, which should be OK, but I am not convinced that it is working. Could this be causing the issue on the indexers as well as on the HECs?
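You can check which THP mode is actually in effect, rather than what a boot script or config file says, by reading the kernel's sysfs file. The active mode is the bracketed token, for example "always [madvise] never". With madvise, the kernel uses huge pages only for memory regions that a process explicitly requests them for via madvise(). Below is a minimal Python sketch. It assumes a Linux host that exposes /sys/kernel/mm/transparent_hugepage/enabled; some older Red Hat kernels use a redhat_transparent_hugepage directory instead. active_thp_mode is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: standard upstream sysfs location for the THP setting.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")

def active_thp_mode(text):
    """Return the active (bracketed) mode, e.g. 'always [madvise] never' -> 'madvise'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # no bracketed token: unexpected file format

if __name__ == "__main__":
    if THP_PATH.exists():
        print("THP enabled mode:", active_thp_mode(THP_PATH.read_text()))
    else:
        print("THP sysfs file not found (non-Linux host or different kernel layout)")
```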
It looks like the parsingQueue is blocked for one of several possible reasons. The answer at https://answers.splunk.com/answers/5590/could-not-send-data-to-the-output-queue.html has a very good explanation of the same type of problem.
Hey there, it turns out that this issue only happens when I enable useACK. I have zero network connectivity issues and am completely stumped on this one.
ANY help is MUCH appreciated, as Splunk support has no idea what the problem is either.
If you read this documentation, http://docs.splunk.com/Documentation/Splunk/7.0.1/Forwarding/Protectagainstlossofin-flightdata#How_t..., and the topics that follow it on the same page, you will get a good idea of how acknowledgment works.
There are several possible reasons your forwarder's output queue might be full. From the doc:
A wait queue can fill up when something is wrong with the network or indexer; however, it can also fill up even when the indexer is functioning normally. This is because the indexer only sends the acknowledgment after it has written the data to the file system. Any delay in writing to the file system will slow the pace of acknowledgment, leading to a full wait queue.
There are a few reasons that a normal functioning indexer might delay writing data to the file system (and so delay its sending of acknowledgments):
The indexer is very busy. For example, at the time the data arrives, the indexer might be dealing with multiple search requests or with data coming from a large number of forwarders.
The indexer is receiving too little data. For efficiency, an indexer only writes to the file system periodically -- either when a write queue fills up or after a timeout of a few seconds. If a write queue is slow to fill up, the indexer will wait until the timeout to write. If data is coming from only a few forwarders, the indexer can end up in the timeout condition, even if each of those forwarders is sending a normal quantity of data. Since write queues exist on a per hot bucket basis, the condition occurs when some particular bucket is getting a small amount of data. Usually this means that a particular index is getting a small amount of data.
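The mechanism quoted above can be illustrated with a toy Python model. It is not Splunk's implementation, and all names and numbers are hypothetical. Each tick, the forwarder tries to send one block, and every sent block waits in a bounded wait queue until it is acknowledged. The indexer acknowledges blocks only when it writes them. That write happens either when its buffer reaches flush_size blocks or when the oldest buffered block has waited flush_timeout ticks.

```python
def simulate(ticks, wait_capacity, flush_size, flush_timeout):
    """Toy model of indexer acknowledgment; returns the number of blocked ticks.

    Hypothetical model, not Splunk internals: a sent block stays in the
    forwarder's wait queue (capacity wait_capacity) until the indexer writes
    it and acknowledges it. The indexer writes when it holds flush_size
    unwritten blocks or when the oldest one has waited flush_timeout ticks.
    """
    unacked = 0     # blocks in the wait queue == blocks in the indexer's write buffer
    oldest_age = 0  # ticks since the oldest unwritten block arrived
    blocked = 0
    for _ in range(ticks):
        # Forwarder side: send only if the wait queue has room.
        if unacked < wait_capacity:
            unacked += 1
        else:
            blocked += 1
        # Indexer side: age the buffer; write and acknowledge on size or timeout.
        if unacked:
            oldest_age += 1
            if unacked >= flush_size or oldest_age >= flush_timeout:
                unacked = 0  # every written block is acknowledged at once
                oldest_age = 0
    return blocked

if __name__ == "__main__":
    print(simulate(100, wait_capacity=5, flush_size=100, flush_timeout=10))   # too little data
    print(simulate(100, wait_capacity=5, flush_size=4, flush_timeout=10))     # buffer fills quickly
    print(simulate(100, wait_capacity=10, flush_size=100, flush_timeout=10))  # larger wait queue
```

The first scenario runs for 100 ticks with a 5-block wait queue, and its writes are driven only by the timeout. In that case, the forwarder is blocked in 50 of the 100 ticks, even though nothing is broken. The second scenario lets the buffer fill (flush_size=4), and the third enlarges the wait queue to 10 blocks. Both bring the count down to 0, which mirrors the "too little data" case and the queue-size suggestion in this thread.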
If none of the above clues helps, I'd suggest increasing the queue size as described in the documentation and checking whether that makes a difference.
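Before increasing queue sizes, it helps to know how the output queue and the wait queue relate when useACK is on. As I read outputs.conf.spec for Splunk 7.x, maxQueueSize = auto resolves to 7MB when useACK = true and to 500KB without it. With useACK enabled, the wait queue is sized at 3x maxQueueSize. Please verify these values against the spec for your version. The Python helper below (queue_sizes is a hypothetical name) only does that arithmetic. It assumes binary KB/MB/GB units and handles only values with a size suffix.

```python
import re

# Assumption: binary units (1KB = 1024 bytes).
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def queue_sizes(max_queue_size, use_ack):
    """Return (output_queue_bytes, wait_queue_bytes) for a maxQueueSize value.

    Assumptions from outputs.conf.spec (7.x), verify for your version:
    'auto' means 7MB with useACK and 500KB without; with useACK the wait
    queue is 3x maxQueueSize. Bare integers (item counts) are not handled.
    """
    if max_queue_size == "auto":
        max_queue_size = "7MB" if use_ack else "500KB"
    m = re.fullmatch(r"(\d+)(KB|MB|GB)", max_queue_size)
    if m is None:
        raise ValueError(f"unsupported maxQueueSize value: {max_queue_size!r}")
    output_bytes = int(m.group(1)) * UNITS[m.group(2)]
    wait_bytes = 3 * output_bytes if use_ack else 0  # no wait queue without ACK
    return output_bytes, wait_bytes

if __name__ == "__main__":
    for value in ("auto", "20MB"):
        out, wait = queue_sizes(value, use_ack=True)
        print(f"maxQueueSize={value}: output {out // 1024**2}MB, wait {wait // 1024**2}MB")
```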