Receiving "blocked=true messages" on Splunk instance

user789
New Member

I noticed on my Splunk instance that I am getting messages like these:

02-07-2020 15:20:36.038 -0500 INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=993, largest_size=993, smallest_size=993
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=1024, current_size_kb=1023, current_size=2035, largest_size=2035, smallest_size=2035
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=auditqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=809, largest_size=809, smallest_size=809
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=998, largest_size=998, smallest_size=998
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=6144, current_size_kb=6143, current_size=99, largest_size=99, smallest_size=99
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=splunktcpin, blocked=true, max_size_kb=500, current_size_kb=499, current_size=995, largest_size=995, smallest_size=995

How can I resolve this?

[Screenshot]

0 Karma

codebuilder
Motivator

Based on your screenshot, you have multiple compounding issues.

You need to disable Transparent Huge Pages:
https://docs.splunk.com/Documentation/Splunk/8.0.1/ReleaseNotes/SplunkandTHP

Your ulimits are not set correctly and need to be increased:
https://docs.splunk.com/Documentation/Splunk/8.0.1/Troubleshooting/ulimitErrors#Set_limits_using_the...

Your system resources are below the recommendation, which usually means you're running on VMware.

If correcting the first two issues does not ease the congestion, you may want to consider increasing the number of parallel ingestion pipelines.
https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/Pipelinesets
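
To see where a host currently stands on the first two points, a minimal Python sketch along these lines may help. It assumes Linux and the standard sysfs path for THP (some older RHEL kernels use a different path). The function names are illustrative and are not part of Splunk. The limits it prints are those of the process running the script, so run it as the same user that runs splunkd, and keep in mind that a service started by an init system can have different limits.

```python
import re
import resource
from pathlib import Path

# Assumed location: the standard sysfs path on most Linux kernels.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from e.g. 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def report():
    """Print the active THP mode and the current process's key resource limits."""
    if THP_PATH.exists():
        print("THP mode:", active_thp_mode(THP_PATH.read_text()))
    else:
        print("THP status file not found at", THP_PATH)
    for label, limit in (("open files", resource.RLIMIT_NOFILE),
                         ("user processes", resource.RLIMIT_NPROC)):
        soft, hard = resource.getrlimit(limit)
        print(f"{label}: soft={soft} hard={hard}")


if __name__ == "__main__":
    report()
```

A THP mode other than never, or limits below the values in the linked ulimit page, would suggest the change has not taken effect for that user.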

0 Karma

user789
New Member

None of these seemed to fix it. I am running on AWS, and it is a c4.4xlarge.
As for the ulimit, the file did not exist, so I created it, and added that text to the file.

0 Karma

user789
New Member

In the output of netstat -tulpn, I noticed that port 9997 is not listening, even though it is defined under Settings -> Receive data. I tried to disable the receiver (which failed), then received a similar error when re-enabling it:
Error occurred attempting to enable 9997: .
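
For anyone who prefers to confirm the listener from a script rather than from netstat, here is a minimal sketch. It only tests whether a TCP connection can be opened; the function name, host, and timeout are illustrative assumptions, and a successful connection does not prove that Splunk is the process answering.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, port_accepts_connections("127.0.0.1", 9997) run on the indexer itself would return False while nothing is listening on the receiving port.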

0 Karma

to4kawa
SplunkTrust
Queue messages
Queue messages look like:

... group=queue, name=parsingqueue, max_size=1000, filled_count=0, empty_count=8, current_size=0, largest_size=2, smallest_size=0

Most of these values are not interesting. But current_size, especially considered in aggregate, across events, can tell you which portions of Splunk indexing are the bottlenecks. If current_size remains near zero, then probably the indexing system is not being taxed in any way. If the queues remain near 1000, then more data is being fed into the system (at the time) than it can process in total.

Sometimes you will see messages such as

... group=queue, name=parsingqueue, blocked=true, max_size=1000, filled_count=0, empty_count=8, current_size=0, largest_size=2, smallest_size=0

This message contains the blocked string, indicating that it was full, and someone tried to add more, and couldn't. A queue becomes unblocked as soon as the code pulling items out of it pulls an item. Many blocked queue messages in a sequence indicate that data is not flowing at all for some reason. A few scattered blocked messages indicate that flow control is operating, and is normal for a busy indexer.

If you want to look at the queue data in aggregate, graphing the average of current_size is probably a good starting point.
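
Outside of Splunk, the same aggregate view can be approximated from raw metrics.log lines. The following is a rough sketch, not Splunk's own tooling: the function names are made up for illustration, and it assumes the comma-separated key=value layout shown in the messages above.

```python
import re
from collections import defaultdict

# Matches key=value pairs such as name=typingqueue or current_size=993.
PAIR = re.compile(r"(\w+)=([^,\s]+)")


def parse_queue_line(line):
    """Return the key=value pairs of a group=queue line as a dict, else None."""
    fields = dict(PAIR.findall(line))
    return fields if fields.get("group") == "queue" else None


def summarize(lines):
    """Per queue name: (average current_size, number of blocked=true events)."""
    sizes = defaultdict(list)
    blocked = defaultdict(int)
    for line in lines:
        fields = parse_queue_line(line)
        if fields is None or "current_size" not in fields:
            continue
        name = fields.get("name", "?")
        sizes[name].append(float(fields["current_size"]))
        if fields.get("blocked") == "true":
            blocked[name] += 1
    return {name: (sum(v) / len(v), blocked[name]) for name, v in sizes.items()}
```

A queue whose average stays close to its maximum and whose blocked count keeps growing is the one to look at first, along with whatever stage sits downstream of it.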

There are queues in place for data going into the parsing pipeline, and for data between parsing and indexing. Each networking output also has its own queue, which can be useful to determine whether the data is able to be sent promptly, or alternatively whether there's some network or receiving system limitation.

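In your case, the messages appear because the queues have filled to their maximum size, for example current_size_kb=499 against max_size_kb=500 for typingqueue.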
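
How full a queue is can be read off a single log line by comparing current_size_kb with max_size_kb. A small sketch, assuming both fields are present as in the lines from the question (the helper name is hypothetical):

```python
import re


def fill_percent(line):
    """Return current_size_kb as a percentage of max_size_kb, or None if unavailable."""
    current = re.search(r"\bcurrent_size_kb=(\d+)", line)
    maximum = re.search(r"\bmax_size_kb=(\d+)", line)
    if not current or not maximum or int(maximum.group(1)) == 0:
        return None
    return 100.0 * int(current.group(1)) / int(maximum.group(1))
```

Every line in the question comes out within a fraction of a percent of full, which matches the blocked=true flag on each of them.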

0 Karma

user789
New Member

How would I graph that?

0 Karma

to4kawa
SplunkTrust
your_log
| extract pairdelim="," kvdelim="="
| table _time current_size_kb

Then open the Visualization tab and select Line Chart.

0 Karma

user789
New Member

Are there any good tutorials on this? I am new to Splunk. Thanks.

0 Karma

to4kawa
SplunkTrust
0 Karma

user789
New Member

I've attached an image. I don't have many hosts reporting to this server, and it looks like I have plenty of RAM, even though it's not the recommended amount.

0 Karma

to4kawa
SplunkTrust

Why not talk to your sales vendor?

0 Karma