Hello,
we have a problem with persistent queues in our infrastructure.
We have TCP inputs sending SSL traffic to a heavy forwarder that acts as an intermediate forwarder. We do not parse on the HF; all we do is put the data from the TCP input directly into the index queue. Most of the time that works perfectly fine for nearly 1 TB of data per day. But sometimes the source pushes nearly 1 TB per hour, which obviously overwhelms the HF, hence the persistent queue.
We have the following inputs.conf:
[tcp-ssl:4444]
index = xxx
persistentQueueSize = 378000MB
sourcetype = xxx
disabled = false
queue = indexQueue
I expect the total size of all files in "/opt/splunk/var/run/splunk/tcpin/" for port 4444 not to exceed the allocated 378 GB.
But as can be seen below, the total size of all files for port 4444 is 474 GB, way more than the allocated 378 GB.
Some files are marked as corrupted, probably because we hit the disk limit on the server and Splunk could no longer write to them.
Has anyone else experienced this behavior before?
Thanks in advance and best regards,
Eric
I had a chat with the Splunk support and we figured out what went wrong:
persistentQueueSize is a PER-pipeline setting. Since we use the HF as an intermediate forwarder, we run multiple data pipelines to parallelize the workload.
In our case Splunk tried to use 378 GB per pipeline, hence the disk overflow. Our solution was to divide the total allocated space by the number of pipelines (378 GB / number of pipelines -> new persistentQueueSize).
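As a sketch, assuming the HF runs with parallelIngestionPipelines = 2 in server.conf (your pipeline count may differ), the corrected configuration would look roughly like this:

# server.conf on the heavy forwarder
[general]
parallelIngestionPipelines = 2

# inputs.conf -- persistentQueueSize is applied per pipeline,
# so the 378 GB total budget is divided by the pipeline count
[tcp-ssl:4444]
index = xxx
persistentQueueSize = 189000MB
sourcetype = xxx
disabled = false
queue = indexQueue

With two pipelines this comes to 2 x 189000 MB, roughly the intended 378 GB of disk in total for port 4444.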