How to troubleshoot blocked queues that are preventing data from being indexed?

arber · ‎09-17-2014

Hello, I'm currently having a problem with the Splunk system we use. We collect data from other clients using syslog: the clients send data to the Splunk server via syslog, and Splunk then reads the contents of the folder where the data is stored. Today the system stopped indexing. We can see the logs still arriving in the folder that Splunk reads, but they do not show up in searches.
Searching splunkd.log, I found this:

09-17-2014 16:15:44.357 +0200 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...

Also, in metrics.log:

09-17-2014 16:03:33.811 +0200 INFO  Metrics - group=queue, name=splunktcpin, blocked=true, max_size_kb=500, current_size_kb=499, current_size=661, largest_size=1081, smallest_size=0
09-17-2014 16:03:33.811 +0200 INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=874, largest_size=1399, smallest_size=0
09-17-2014 16:04:10.812 +0200 INFO  Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=1024, current_size_kb=1023, current_size=2826, largest_size=2855, smallest_size=735
09-17-2014 16:05:14.809 +0200 INFO  Metrics - group=queue, name=splunktcpin, blocked=true, max_size_kb=500, current_size_kb=499, current_size=739, largest_size=813, smallest_size=0
09-17-2014 16:06:16.811 +0200 INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=659, largest_si

Any idea how to get the queues unblocked?

Regards

Arber

ltrand · ‎09-23-2014

On the UF you can manage queue sizes in the $SPLUNK_HOME/etc/system/local/server.conf file. To increase the parsing queue, set maxSize in the parsingQueue stanza, for example:

# This is the default size
[queue=parsingQueue]
maxSize = 500KB

# A reasonable size if watching a DNS server
[queue=parsingQueue]
maxSize = 10MB

# If you are crazy and want to allow unthrottled forwarding. USE WITH CARE
[queue=parsingQueue]
maxSize = 0

I would suggest identifying the servers that need this and defining them as a server class, so you can easily manage which ones have this setting and which have the default.

arber · ‎09-24-2014

I changed the queue sizes: I raised the parsingQueue from 6 MB to 20 MB and the aggQueue from 1 MB to 20 MB, but the queues are still blocked, and indexing of that specific file has stopped.

ltrand · ‎09-24-2014

Can you provide some log output from splunkd or metrics in /splunk/var/log? We had to tune this several times as well, and it was a balancing act. The queues take RAM, so if you have plenty available, feel free to crank it up. Since I don't know which logs your router is sending or at what rate, I can't suggest a good number to aim for. For reference, I have an IDS whose UF parsing queue is set to 300MB to make it work, but it generates a large volume of logs, so that's what it takes to keep up with the rate.

Think of it this way: if you're trying to drain a basin that is filling at a rate of 5 gallons/min and you can only bail 3 gallons/min, you'll never keep up. So when Splunk is bailing out the logs, it needs to do so at a rate equal to or better than the incoming rate.
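The basin analogy can be put into numbers with a small sketch. The rates below are hypothetical (500 KB/s in, 300 KB/s out, mirroring the 5 vs. 3 gallons example) and are not measurements from this thread; the function name is likewise made up for illustration. It only shows that when the inflow exceeds the outflow, a larger queue delays the moment it fills but does not prevent it.

```python
from typing import Optional


def seconds_until_full(queue_kb: float, in_kb_per_s: float,
                       out_kb_per_s: float) -> Optional[float]:
    """Return the seconds until an empty queue of queue_kb fills up.

    Returns None when the queue drains at least as fast as it fills,
    i.e. when it never fills.
    """
    net_kb_per_s = in_kb_per_s - out_kb_per_s
    if net_kb_per_s <= 0:
        return None
    return queue_kb / net_kb_per_s


# Hypothetical rates: filling at 500 KB/s, draining at 300 KB/s.
print(seconds_until_full(1024, 500, 300))    # 1 MB queue   -> 5.12
print(seconds_until_full(102400, 500, 300))  # 100 MB queue -> 512.0
print(seconds_until_full(1024, 300, 500))    # drains faster -> None
```

Under these assumed rates, growing the queue a hundredfold buys only a few minutes, so if raising maxSize does not help, the thing to fix is whatever is limiting the outflow.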

arber · ‎09-25-2014

metrics.log

09-25-2014 16:09:12.905 +0200 INFO Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=6144, current_size_kb=6143, current_size=4790, largest_size=6126, smallest_size=3166
09-25-2014 16:09:12.905 +0200 INFO Metrics - group=queue, name=splunktcpin, blocked=true, max_size_kb=500, current_size_kb=499, current_size=615, largest_size=845, smallest_size=0
09-25-2014 16:09:12.905 +0200 INFO Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=644, largest_size=781, smallest_size=0
09-25-2014 16:09:56.924 +0200 INFO Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=1024, current_size_kb=1023, current_size=1596, largest_size=1816, smallest_size=637
09-25-2014 16:10:36.903 +0200 INFO Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=1024, current_size_kb=1023, current_size=1516, largest_si

This time I tried increasing the size to 100 MB. I set aggqueue to 100 MB (even though the log above shows 1 MB), but I would still get something like: INFO Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=102400, current_size_kb=102300, current_size=1036648, largest_si

Still the same thing: the queues are blocked. What is strange is that this is happening with just this one file. It was fine before, and I'm pretty sure the log volume from this host has not changed.
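For anyone wanting to summarise lines like the ones above instead of reading them by eye, here is a minimal Python sketch. It assumes the comma-separated key=value layout shown in the metrics.log excerpts in this thread; the function names are made up for illustration and are not part of any Splunk tooling. It reports, per queue, the highest fill ratio seen while blocked=true.

```python
import re

# Matches key=value pairs such as "name=aggqueue" or "max_size_kb=1024".
KV_PATTERN = re.compile(r"(\w+)=([^,\s]+)")


def parse_queue_line(line):
    """Return the key=value fields of a group=queue line, or None otherwise."""
    fields = dict(KV_PATTERN.findall(line))
    if fields.get("group") != "queue":
        return None
    return fields


def blocked_summary(lines):
    """Map queue name -> highest current/max size ratio seen while blocked."""
    summary = {}
    for line in lines:
        fields = parse_queue_line(line)
        if not fields or fields.get("blocked") != "true":
            continue
        try:
            name = fields["name"]
            ratio = int(fields["current_size_kb"]) / int(fields["max_size_kb"])
        except (KeyError, ValueError, ZeroDivisionError):
            continue  # skip truncated or malformed lines
        summary[name] = max(summary.get(name, 0.0), ratio)
    return summary


sample = [
    "09-25-2014 16:09:12.905 +0200 INFO Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=6144, current_size_kb=6143, current_size=4790, largest_size=6126, smallest_size=3166",
    "09-25-2014 16:09:56.924 +0200 INFO Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=1024, current_size_kb=1023, current_size=1596, largest_size=1816, smallest_size=637",
]
for name, ratio in sorted(blocked_summary(sample).items()):
    print(f"{name}: {ratio:.1%} full while blocked")
```

To run it against a real file, pass the open file object (an iterable of lines) to blocked_summary in place of the sample list.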

jagadeeshm · ‎04-17-2017

So how did you eventually resolve this issue?

arber · ‎04-18-2017

We created another folder on the Splunk server for these devices to send their logs to, and then set up the monitor on the new folder. That seemed to work.

arber · ‎09-23-2014

The problem is that the device is a Juniper router, so it is not possible to install the UF. When I look it up in the SoS app, I can see that this log file has a status of ignored (reading batch file), while all the other files in the monitored folder have a status of reading.

ltrand · ‎09-23-2014

It will ignore files that overload its buffer in an effort to preserve logging for the rest. The server.conf and limits.conf files should be edited at whatever point Splunk is touching the data. So if you have syslog receiving the data and writing it to file on the indexer, and the indexer is watching the local files, then that is where you change it.

linu1988 · ‎09-23-2014

I have seen errors like this on Splunk 5 indexers. Mostly, after restarting the services it would continue without any issue. Could it be due to all the network ports getting used up?

martin_mueller · ‎09-17-2014

Go dig for messages about the actual reason though, preferably at the time indexing stopped for the first time. Those queues are only a symptom.

Alternatively, take a look at http://wiki.splunk.com/Community:HowIndexingWorks and see if queues further down suddenly aren't blocked. Then the processor after the bottom-most blocked queue might be to blame.
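The "bottom-most blocked queue" check can be sketched in a few lines of Python. The queue order used here (parsingqueue, aggqueue, typingqueue, indexqueue) is an assumption taken from the pipeline described on the HowIndexingWorks page, and the function name is hypothetical. Input queues such as splunktcpin sit upstream of the pipeline and are ignored.

```python
# Assumed pipeline order, upstream first (per the HowIndexingWorks page).
PIPELINE_ORDER = ["parsingqueue", "aggqueue", "typingqueue", "indexqueue"]


def bottom_most_blocked(blocked_names):
    """Return the furthest-downstream blocked pipeline queue, or None."""
    blocked = {name.lower() for name in blocked_names}
    last = None
    for queue in PIPELINE_ORDER:
        if queue in blocked:
            last = queue
    return last


# Queue names that show blocked=true in a metrics.log excerpt.
seen_blocked = ["splunktcpin", "typingqueue", "aggqueue", "parsingqueue"]
print(bottom_most_blocked(seen_blocked))  # typingqueue
```

The result only narrows down where to look: the processor reading from that queue is the first suspect. It is only as reliable as the excerpt it is fed, since a queue missing from a pasted snippet may still have been blocked.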

arber · ‎09-17-2014

No, disk space is OK. Indexing has stopped only for the clients that use syslog to send data.

martin_mueller · ‎09-17-2014

The key question is, why did indexing stop? Blocked queues are usually just a symptom of something down the line not working properly, they're usually not a cause of anything.

Disk space would be a common issue... should be shown prominently in Splunk though.
