In my lab setup, I have a heavy forwarder (HF) hosted in AWS that forwards data to an indexer at home.
Every now and then, forwarding is interrupted because my old router suddenly decides the traffic is a SIP attack and starts dropping it. While that happens, the queue on the HF fills up and the HF freaks out.
I wanted to make sure that, even while this happens, I don't lose the data the HF receives via its HTTP Event Collector (HEC), so I created a 1 GB persistent queue on the HEC input. The connection went down again, but once I fixed it, the data that I know was generated during the outage never arrived. Nothing in my indexer. While the connection was still down, I had a look at the SPLUNK_HOME/var/run/splunk/httpin directory; there was one file in it, containing just a short, meaningless string.
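For reference, a persistent queue on a HEC input is set per token stanza in inputs.conf. The sketch below mirrors the 1 GB queue described above, assuming a token stanza named `http://hec_lab` (a hypothetical name) and a placeholder token value. It is not a verified config: check inputs.conf.spec for your Splunk version to confirm that `persistentQueueSize` is accepted on HEC token stanzas.

```ini
# inputs.conf on the heavy forwarder (sketch, not a verified config)
# "hec_lab" and the token value are hypothetical placeholders
[http://hec_lab]
token = <your-token-guid>
# On-disk queue that takes data once this input's in-memory queue is full
persistentQueueSize = 1GB
```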
Going through the internal logs, I noticed this around the time the connection was lost:
TcpInputProc - Stopping all listening ports. Queues blocked for more than 300 seconds
I'm sure I haven't filled up the persistent queue, so if all ports get stopped when the standard queue gets full, what is the point of the persistent queue?
Or am I doing something wrong here?
I did see those docs, but they don't really explain why I'm seeing this behaviour.
I have [queue] maxSize = 500KB (the default) in server.conf and the persistent queue set to 1GB, which is larger than the size of the queue. I don't see the queueSize setting in the documentation for inputs.conf. I'm assuming it might be a deprecated setting, but I will put it in my conf anyway and test whether it makes a difference.
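As a hedged pointer: the inputs.conf specification does describe a per-input `queueSize` setting (the in-memory input queue) for network inputs such as `[tcp://...]`, and it states that a non-zero `persistentQueueSize` must be larger than the in-memory queue size. A minimal sketch of pairing the two on a TCP input, where the port and sizes are illustrative assumptions rather than recommended values:

```ini
# inputs.conf (sketch; port 9994 and both sizes are illustrative)
[tcp://9994]
# In-memory queue for this input
queueSize = 10MB
# On-disk queue; must be larger than the in-memory queue size
persistentQueueSize = 1GB
```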
Please look for other errors around that time.
A blocked queue is just a symptom, not the cause of the issue. When setting a persistent queue (persistentQueueSize) in inputs.conf (remember, it's per input), also make sure to increase all queue sizes accordingly (a general setting in server.conf). I would not suggest playing around with persistent queues and queue sizes if you don't have much experience with them. If you have a license and are thus entitled to open support cases, I'd suggest doing so. Splunk Support may then look at your environment and suggest settings for your queues (or for other problems you might be facing).
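Queue sizes are raised in server.conf: a `[queue]` stanza sets the default `maxSize` for all queues, and a `[queue=<queueName>]` stanza overrides a single queue. A minimal sketch, where the sizes are illustrative assumptions and not tuned recommendations:

```ini
# server.conf on the heavy forwarder (sketch; sizes are illustrative)
# Default maximum size for every queue not overridden below
[queue]
maxSize = 10MB

# Override for one named queue
[queue=parsingQueue]
maxSize = 20MB
```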
Take this with a grain of salt, because I'm just here looking for details about the persistent queue myself, but it sounds like you think it is a queue where the heavy forwarder holds onto data it can't send to the indexer. I believe the point of the persistent queue is to hold streaming data (UDP/TCP/HEC) that the heavy forwarder can't process immediately because its queues are filling up. So it's useful when your heavy forwarder receives too much data to process immediately, because it caches that data instead of dropping it. I don't think it will work as a cache for data that the heavy forwarder is trying to send to an indexer, though. I believe the default behavior of universal and heavy forwarders is to cache data that cannot be transmitted to an indexer, but your router may be dropping the traffic without the heavy forwarder knowing it is being dropped.
I think what you want is indexer acknowledgement. Edit your outputs.conf and set useACK=true so that the heavy forwarder resends data to the indexer when it doesn't receive acknowledgement that the data arrived. I believe it would then hold outbound data on the heavy forwarder until you fix your router.
server=, , ...
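A minimal outputs.conf sketch of the useACK suggestion above, assuming a single target group; the group name `home_indexer` and the address `indexer.example.lan:9997` are hypothetical placeholders:

```ini
# outputs.conf on the heavy forwarder (sketch)
[tcpout]
defaultGroup = home_indexer

[tcpout:home_indexer]
server = indexer.example.lan:9997
# Keep a copy of sent data until the indexer acknowledges it; resend if no ack arrives
useACK = true
```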
Here's the Splunk doc for indexer acknowledgement:
Persistent queues and useACK are two different kinds of configuration that have nothing to do with each other. Persistent queues are configured in inputs.conf, per input; useACK, however, is set in outputs.conf and applies to all outgoing data.
So yes, from what I can read in the docs, persistent queues fill up when the processing pipeline is full. That happens when data is streamed in faster than the pipeline can process it, but also when the output is failing (for example, when connectivity to the indexer is lost), which is my scenario. I'm not 100% sure at this point; I'll try to find some more info on this.
I am facing similar issues with the HEC persistent queue. I want to store data in the persistent queue when there is an internet outage and data cannot be forwarded. Did you happen to find the root cause? Many thanks in advance!