Re: Receiving "blocked=true messages" on Splunk in...

user789 · ‎02-07-2020

I noticed on my Splunk instance that I am getting messages like these:

02-07-2020 15:20:36.038 -0500 INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=993, largest_size=993, smallest_size=993
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=1024, current_size_kb=1023, current_size=2035, largest_size=2035, smallest_size=2035
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=auditqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=809, largest_size=809, smallest_size=809
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=998, largest_size=998, smallest_size=998
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=6144, current_size_kb=6143, current_size=99, largest_size=99, smallest_size=99
02-07-2020 15:21:35.038 -0500 INFO  Metrics - group=queue, name=splunktcpin, blocked=true, max_size_kb=500, current_size_kb=499, current_size=995, largest_size=995, smallest_size=995

How can I resolve this?

codebuilder · ‎02-11-2020

Based on your screenshot, you have multiple compounding issues.

You need to disable Transparent Huge Pages:
https://docs.splunk.com/Documentation/Splunk/8.0.1/ReleaseNotes/SplunkandTHP

Your ulimits are not set correctly and need to be increased:
https://docs.splunk.com/Documentation/Splunk/8.0.1/Troubleshooting/ulimitErrors#Set_limits_using_the...

Your system resources are below the recommendation, which usually means you're running on VMware.

If correcting the first two issues does not ease the congestion, you may want to consider increasing the parallel ingestion pipelines.
https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/Pipelinesets

----
An upvote would be appreciated and Accept Solution if it helps!

user789 · ‎02-18-2020

None of these seemed to fix it. I am running on AWS, and it is a c4.4xlarge.
As for the ulimit, the file did not exist, so I created it, and added that text to the file.

user789 · ‎02-18-2020

I noticed that under netstat -tulpn, port 9997 is not listening, even though it is defined under Settings -> Receive data. I disabled the receiver (which failed), then received a similar error when re-enabling:
Error occurred attempting to enable 9997: .

to4kawa · ‎02-07-2020

Queue messages
Queue messages look like

... group=queue, name=parsingqueue, max_size=1000, filled_count=0, empty_count=8, current_size=0, largest_size=2, smallest_size=0

Most of these values are not interesting. But current_size, especially considered in aggregate, across events, can tell you which portions of Splunk indexing are the bottlenecks. If current_size remains near zero, then probably the indexing system is not being taxed in any way. If the queues remain near 1000, then more data is being fed into the system (at the time) than it can process in total.

Sometimes you will see messages such as ... group=queue, name=parsingqueue, blocked=true, max_size=1000, filled_count=0, empty_count=8, current_size=0, largest_size=2, smallest_size=0

This message contains the blocked string, indicating that it was full, and someone tried to add more, and couldn't. A queue becomes unblocked as soon as the code pulling items out of it pulls an item. Many blocked queue messages in a sequence indicate that data is not flowing at all for some reason. A few scattered blocked messages indicate that flow control is operating, and is normal for a busy indexer.
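To illustrate the difference between a long sequence of blocked messages and a few scattered ones, here is a minimal Python sketch that tallies both per queue. It assumes the key=value layout of the metrics.log lines quoted in the question. The function names (parse_queue_line, blocked_runs) and the sample lines are made up for this example and are not part of Splunk.

```python
import re

# Matches key=value pairs such as name=typingqueue or blocked=true.
PAIR = re.compile(r"(\w+)=([^,\s]+)")

def parse_queue_line(line):
    """Return the fields of a group=queue metrics line as a dict, else None."""
    fields = dict(PAIR.findall(line))
    return fields if fields.get("group") == "queue" else None

def blocked_runs(lines):
    """Map queue name -> (total blocked messages, longest consecutive run)."""
    total, current, longest = {}, {}, {}
    for line in lines:
        fields = parse_queue_line(line)
        if fields is None:
            continue
        name = fields.get("name", "unknown")
        total.setdefault(name, 0)
        longest.setdefault(name, 0)
        if fields.get("blocked") == "true":
            total[name] += 1
            current[name] = current.get(name, 0) + 1
            longest[name] = max(longest[name], current[name])
        else:
            current[name] = 0  # an unblocked sample ends the run
    return {name: (total[name], longest[name]) for name in total}

# Hypothetical sample in the same layout as the lines in the question.
SAMPLE = [
    "INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    "INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    "INFO  Metrics - group=queue, name=typingqueue, max_size_kb=500, current_size_kb=12",
    "INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    "INFO  Metrics - group=queue, name=indexqueue, max_size_kb=500, current_size_kb=0",
]

for name, (count, run) in blocked_runs(SAMPLE).items():
    print(f"{name}: blocked {count} times, longest run {run}")
```

A queue whose longest run covers nearly every sample, as in the log from the question, points to data not flowing at all. A short run among many unblocked samples is more likely ordinary flow control.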

If you want to look at the queue data in aggregate, graphing the average of current_size is probably a good starting point.
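Outside of Splunk, the same aggregate can be sketched in a few lines of Python. This is only an illustration: it assumes the newer log layout shown in the question, where the fields are called current_size_kb and max_size_kb, and the function name average_fill and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Matches key=value pairs such as name=typingqueue or current_size_kb=499.
PAIR = re.compile(r"(\w+)=([^,\s]+)")

def average_fill(lines):
    """Map queue name -> (mean current_size_kb, mean fraction of max_size_kb used)."""
    sizes = defaultdict(list)
    ratios = defaultdict(list)
    for line in lines:
        fields = dict(PAIR.findall(line))
        if fields.get("group") != "queue":
            continue
        try:
            current = float(fields["current_size_kb"])
            maximum = float(fields["max_size_kb"])
        except (KeyError, ValueError):
            continue  # skip lines without usable size fields
        if maximum <= 0:
            continue
        name = fields.get("name", "unknown")
        sizes[name].append(current)
        ratios[name].append(current / maximum)
    return {
        name: (sum(sizes[name]) / len(sizes[name]),
               sum(ratios[name]) / len(ratios[name]))
        for name in sizes
    }

# Hypothetical sample lines in the layout of the log from the question.
SAMPLE = [
    "INFO  Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    "INFO  Metrics - group=queue, name=typingqueue, max_size_kb=500, current_size_kb=251",
    "INFO  Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=1024, current_size_kb=1023",
]

for name, (avg_kb, avg_ratio) in average_fill(SAMPLE).items():
    print(f"{name}: average {avg_kb:.1f} KB, {avg_ratio:.0%} of maximum")
```

Dividing by max_size_kb makes queues with different limits comparable, since 499 KB means "full" for a 500 KB queue but nearly empty for a 6144 KB one.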

There are queues in place for data going into the parsing pipeline, and for data between parsing and indexing. Each networking output also has its own queue, which can be useful to determine whether the data is able to be sent promptly, or alternatively whether there's some network or receiving system limitation.

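In your case, blocked=true appears because each queue has filled to its maximum size (current_size_kb has reached max_size_kb, which is 500 KB for most of the queues in your log).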

user789 · ‎02-10-2020

How would I graph that?

to4kawa · ‎02-10-2020

your_log
| extract pairdelim="," kvdelim="="
| table _time current_size_kb

Visualization >> Line chart

user789 · ‎02-10-2020

Are there any good tutorials on this? I am new to Splunk. Thanks.

to4kawa · ‎02-10-2020

Welcome to the Search Tutorial

Is this what you are looking for?

user789 · ‎02-10-2020

I've attached an image - I don't have many hosts reporting to this server, and it looks like I have plenty of RAM, even though it's not the recommended amount.

to4kawa · ‎02-10-2020

Why not talk to your sales vendor?
