I am facing issues when searching logs, and it takes a long time to index them.
While investigating, I am seeing that the queues are frequently blocked, as shown below.
INFO Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499, current_size=1466, largest_size=1466, smallest_size=0
Is it worth increasing the max size of the queues? Also, which config file do I need to change to do that?
As others have answered, your indexing problem will not be solved by resizing the queues, since the root problem stays the same: in this case, slow disk writes. You can use the information from this post to calculate the IOPS of the disk arrays used by the indexers.
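If the indexers run on Linux, a quick way to sanity-check the disks while the queue is blocked is iostat from the sysstat package (assuming the volume holding $SPLUNK_DB is the device you watch; column names vary slightly between sysstat versions):
# sample extended disk statistics every 5 seconds, 12 times
iostat -x 5 12
# %util close to 100% means the device is saturated; w_await is the average write latency in ms
Sustained saturation or high write latency on the hot/warm volume lines up with a blocked indexqueue.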
On the other hand, I recommend using Splunk Sizing so that you can get an estimated IOPS recommendation for the disks based on the type of RAID you are using.
If your Splunk instance is a standalone instance, you may need to move to a distributed environment, with a dedicated server or cluster for each tier (search head, indexer, forwarder, license manager) instead of a single server.
Best wishes
LGS
Increasing the queue size may add an additional buffer for bursts of incoming log data; however, it is unlikely to fix your blocked indexing queue.
For example, if you make your incoming TCP input queue bigger, more data can queue there while data is being written to disk; you can do the same with the other queues to allow a bigger buffer as data travels down the pipeline.
However, the only real ways to resolve an index queue issue on an indexer are to index less per indexer (i.e. add more indexers) or to get faster I/O (faster disk for your hot/warm storage). If your data is being forwarded, look at the system it is forwarded to as well.
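To confirm which queue is actually the bottleneck, a search along these lines (using the same fields that appear in the metrics.log event you posted) shows how full each queue runs over time:
index=_internal source=*metrics.log host=YOUR_INDEXER(S) group=queue
| eval fill_pct=round(current_size_kb/max_size_kb*100,1)
| timechart span=10m perc95(fill_pct) by name
The furthest-downstream queue that is persistently full usually points to the real constraint (for indexqueue, that is disk I/O); queues upstream of it just back up behind it.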
Good luck!
We had an interesting discussion about it recently at What's the maxSize we can set for the event-processing queues?
It's crucial to adjust the queue sizes as you compensate for slow I/O with memory caching - marvelous!!
Can you please post your indexer's $SPLUNK_HOME/etc/system/local/server.conf?
Hi @chintan_shah,
Indexqueue blocking can be due to many reasons:
1.) Storage latency - if the indexer is not able to write to storage at the required IOPS, the indexqueue will fill up. -> Check with your storage team whether you are getting the required IOPS from storage.
2.) Universal forwarders sending more logs than the indexers can handle. -> In this case, if you are running an indexer cluster, you need to add more indexers to your cluster (see the example search below).
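As a rough check for point 2, you can chart indexing throughput per indexer from metrics.log (group=thruput is standard; interpret the numbers against your own hardware sizing):
index=_internal source=*metrics.log host=YOUR_INDEXER(S) group=thruput name=thruput
| timechart span=1h avg(instantaneous_kbps) AS avg_kbps max(instantaneous_kbps) AS peak_kbps
If the peaks are sustained rather than occasional bursts, adding indexers (or reducing what the forwarders send) is the fix, not bigger queues.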
Before increasing any queue size, I'd recommend contacting Splunk Support.
Thanks,
Harshil
I am seeing the above issue for the exec queue. Please guide me on what to do for that - is the exec queue for running scripts, or something else?
I have a heavy forwarder (HF) in the middle of a Splunk data flow.
I needed to edit the queue configuration, since the HF couldn't manage all the data the UFs send.
So, take a look here,
https://docs.splunk.com/Documentation/Splunk/9.0.3/Admin/Serverconf#Queue_settings
I needed to raise the buffers of all the major queues to 2GB (2000MB); otherwise the HF was always blocked and, moreover, did not send the needed ACKs back to the UFs, blocking all the streams many times a day.
The config can be edited in server.conf, in the [queue] stanza or in per-queue [queue=<queueName>] stanzas.
[queue]
maxSize = [<integer>|<integer>[KB|MB|GB]]
[queue=<queueName>] ( * you can look at metrics.log for the names)
maxSize = [<integer>|<integer>[KB|MB|GB]]
...
* Get all the queue names with:
index=_internal source=*metrics.log host=YOUR_INDEXER(S) group=queue | dedup name | table host name max_size_kb
I did so for the HF (2000MB) and my indexers (1000MB for the major queues).
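For example, the per-queue stanzas could look like this (illustration only; confirm the exact queue names with the metrics.log search above before using them, and restart the instance afterwards):
# server.conf on the heavy forwarder - example values, not a recommendation
[queue=parsingQueue]
maxSize = 2GB
[queue=aggQueue]
maxSize = 2GB
[queue=typingQueue]
maxSize = 2GB
[queue=indexQueue]
maxSize = 2GB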
As in my answer to the main question, the problem in your case will probably persist even after resizing the queues, and you may create other issues on your instance. The best practice is not to change the queue sizes; do it only in an emergency while you request a support check to determine the real problem.
In your case, the problem may be disk IOPS, or a networking problem (check whether you are using the same network device for input and output, the network speed, and whether there is any blockage at the network level), or the number of events coming into your instance from other forwarders, or the number of forwarders sending data through this intermediate forwarder. Also check that the server does not have other problems related to CPU or RAM and, if it is Linux, that it is set up correctly: Transparent Huge Pages disabled, ulimits raised, and filesystems consistent with Splunk's recommendations (System requirements for use of Splunk Enterprise on-premises - Splunk Documentation), etc.
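For the Linux checks, these are standard OS commands (nothing Splunk-specific) to verify Transparent Huge Pages and the open file limit for the user running Splunk:
# THP should report [never] for Splunk
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
# open file descriptor limit; Splunk's guidance is a high value such as 64000
ulimit -n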
I hope you can find the real issue and solve it.