I noticed that my Splunk instance is logging messages like these:
02-07-2020 15:20:36.038 -0500 INFO Metrics - group=queue, name=typingqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=993, largest_size=993, smallest_size=993
02-07-2020 15:21:35.038 -0500 INFO Metrics - group=queue, name=aggqueue, blocked=true, max_size_kb=1024, current_size_kb=1023, current_size=2035, largest_size=2035, smallest_size=2035
02-07-2020 15:21:35.038 -0500 INFO Metrics - group=queue, name=auditqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=809, largest_size=809, smallest_size=809
02-07-2020 15:21:35.038 -0500 INFO Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=998, largest_size=998, smallest_size=998
02-07-2020 15:21:35.038 -0500 INFO Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=6144, current_size_kb=6143, current_size=99, largest_size=99, smallest_size=99
02-07-2020 15:21:35.038 -0500 INFO Metrics - group=queue, name=splunktcpin, blocked=true, max_size_kb=500, current_size_kb=499, current_size=995, largest_size=995, smallest_size=995
How can I resolve this?
Based on your screenshot, you have multiple compounding issues.
You need to disable Transparent Huge Pages:
https://docs.splunk.com/Documentation/Splunk/8.0.1/ReleaseNotes/SplunkandTHP
Your ulimits are not set correctly and need to be increased:
https://docs.splunk.com/Documentation/Splunk/8.0.1/Troubleshooting/ulimitErrors#Set_limits_using_the...
Your system resources are below the recommendation, which usually means you're running on VMware.
If correcting the first two issues does not ease the congestion, you may want to consider increasing the parallel ingestion pipelines.
https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/Pipelinesets
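Before changing anything, it may help to confirm what the host currently reports for the first two items above. The following is only a rough sketch, assuming a Linux host where the THP status lives at /sys/kernel/mm/transparent_hugepage/enabled; the helper names are made up for illustration and are not part of Splunk.

```python
import resource
from pathlib import Path

# Usual Linux location of the THP status file (assumed; some distros differ).
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from a line like 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def open_file_limits() -> tuple:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    if THP_PATH.exists():
        print("THP mode:", active_thp_mode(THP_PATH.read_text()))
    else:
        print("THP status file not found at", THP_PATH)
    print("open files (soft, hard):", open_file_limits())
```

The limits printed are those of the process running the script, so run it as the same user that runs splunkd; the limits of an already running splunkd may still differ until it is restarted.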
None of these seemed to fix it. I am running on AWS, and it is a c4.4xlarge.
As for the ulimit, the file did not exist, so I created it, and added that text to the file.
I noticed from netstat -tulpn that port 9997 is not listening, even though it is defined under settings -> receive data. I disabled the receiver (which failed), then received a similar error when re-enabling it:
Error occurred attempting to enable 9997: .
Queue messages
Queue messages look like
... group=queue, name=parsingqueue, max_size=1000, filled_count=0, empty_count=8, current_size=0, largest_size=2, smallest_size=0
Most of these values are not interesting. But current_size, especially considered in aggregate, across events, can tell you which portions of Splunk indexing are the bottlenecks. If current_size remains near zero, then probably the indexing system is not being taxed in any way. If the queues remain near 1000, then more data is being fed into the system (at the time) than it can process in total.
Sometimes you will see messages such as: ... group=queue, name=parsingqueue, blocked=true, max_size=1000, filled_count=0, empty_count=8, current_size=0, largest_size=2, smallest_size=0
This message contains the blocked string, indicating that it was full, and someone tried to add more, and couldn't. A queue becomes unblocked as soon as the code pulling items out of it pulls an item. Many blocked queue messages in a sequence indicate that data is not flowing at all for some reason. A few scattered blocked messages indicate that flow control is operating, and is normal for a busy indexer.
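To tell "many blocked messages in a sequence" apart from "a few scattered" ones, one option is to count the longest run of consecutive blocked samples per queue. This is a minimal sketch, assuming the metrics.log lines have been exported as plain text; the function name and regex are illustrative only, not anything Splunk ships.

```python
import re
from collections import defaultdict

# Matches key=value pairs such as "name=indexqueue" or "blocked=true".
PAIR = re.compile(r"(\w+)=([^,\s]+)")


def longest_blocked_runs(lines):
    """Return, per queue name, the longest run of consecutive blocked=true samples."""
    current = defaultdict(int)  # length of the run in progress
    longest = defaultdict(int)  # longest run seen so far
    for line in lines:
        fields = dict(PAIR.findall(line))
        name = fields.get("name")
        if fields.get("group") != "queue" or name is None:
            continue
        # Extend the run on a blocked sample, reset it otherwise.
        current[name] = current[name] + 1 if fields.get("blocked") == "true" else 0
        longest[name] = max(longest[name], current[name])
    return dict(longest)


if __name__ == "__main__":
    sample = [
        "... group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
        "... group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
        "... group=queue, name=indexqueue, max_size_kb=500, current_size_kb=10",
    ]
    print(longest_blocked_runs(sample))
```

A long run for a queue suggests data is not flowing at all, while runs of one or two on a busy indexer are more likely ordinary flow control.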
If you want to look at the queue data in aggregate, graphing the average of current_size is probably a good starting point.
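Outside of Splunk, the same aggregate can be approximated from exported log lines. The sketch below averages current_size per queue; it assumes plain-text metrics.log lines as input, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches key=value pairs such as "current_size=993".
PAIR = re.compile(r"(\w+)=([^,\s]+)")


def average_current_size(lines):
    """Return the mean current_size per queue name across all given lines."""
    sizes = defaultdict(list)
    for line in lines:
        fields = dict(PAIR.findall(line))
        if fields.get("group") != "queue":
            continue
        if "name" in fields and "current_size" in fields:
            sizes[fields["name"]].append(float(fields["current_size"]))
    return {name: mean(values) for name, values in sizes.items()}


if __name__ == "__main__":
    sample = [
        "... group=queue, name=parsingqueue, max_size=1000, current_size=0, largest_size=2",
        "... group=queue, name=parsingqueue, max_size=1000, current_size=2, largest_size=2",
    ]
    for queue, avg in average_current_size(sample).items():
        print(queue, avg)
```

An average that stays close to the queue's maximum points at that queue, or whatever drains it, as the likely bottleneck.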
There are queues in place for data going into the parsing pipeline, and for data between parsing and indexing. Each networking output also has its own queue, which can be useful to determine whether the data is able to be sent promptly, or alternatively whether there's some network or receiving system limitation.
These messages appear because the queue has filled up to its maximum size (max_size_kb, which is 500 KB for most of the queues above).
How would I graph that?
your_log
| extract pairdelim="," kvdelim="="
| table _time current_size_kb
Then choose Visualization > Line Chart.
Are there any good tutorials on this? I am new to Splunk. Thanks.
Is this about?
I've attached an image. I don't have many hosts reporting to this server, and it looks like I have plenty of RAM, even though it's not the recommended amount.
Why not talk to your sales vendor?