Getting Data In

Blocking on indexers

ntguru5
New Member

I am seeing a lot of blocking on my three indexers, in the range of 500-1000 blocked-queue messages a day per host. The heaviest are indexqueue and typingqueue, followed by aggqueue; splunktcpin is in the double-digit range.
The indexes are striped across all three indexers. I'm at a loss on where to begin looking. Has anyone else run into this kind of blocking on their Splunk indexers?
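For reference, I've been counting the blocking with a search against the indexers' own metrics.log. This is just a rough sketch; it assumes the default _internal index and the standard group=queue metric lines:

index=_internal source=*metrics.log* sourcetype=splunkd group=queue blocked=true
| timechart span=1h count by name

That gives a per-queue count of blocked intervals over time, which is where the 500-1000/day figure comes from.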

0 Karma

richnavis
Contributor

I believe I've identified the root cause as slow disk for the cold DB. Our configuration is to have the hot/warm DBs on locally attached (virtually, anyway) disks, and point the cold DBs to a CIFS share on a NetApp... so indexes.conf looks something like this...

[databases]

coldPath = \\netapp\splunk\SplunkIndex02\DATA_2\databases\colddb

homePath = F:\CustomIndex\DATA_2\databases\db

thawedPath = \\netapp\splunk\SplunkIndex02\DATA_2\databases\thaweddb

maxWarmDBCount = 32

So, I created another locally attached drive and used it as the coldPath on ONE of the three indexers we have. After 4 hours, we have not seen ANY blocking on the indexer with the "locally attached" drive, while the other indexers continue to see blocking at the same rate as before. In this particular case, the slow disk was the cold DB. If there were a way to have Splunk roll buckets to cold on a schedule, rather than constantly, this would not be a problem.
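As far as I can tell there is no schedule-based roll; warm buckets roll to cold when limits in indexes.conf are hit, so the workaround would be to let more data stay in warm on the local disk before it rolls. A rough sketch of what I mean (the values are placeholders, not tested recommendations, and the settings are as I read them in indexes.conf.spec):

[databases]
homePath = F:\CustomIndex\DATA_2\databases\db
coldPath = \\netapp\splunk\SplunkIndex02\DATA_2\databases\colddb
thawedPath = \\netapp\splunk\SplunkIndex02\DATA_2\databases\thaweddb
# Allow more warm buckets to accumulate on the local disk before rolling to cold
maxWarmDBCount = 300
# Optional safety cap on the home path; buckets roll to cold once this is exceeded
homePath.maxDataSizeMB = 200000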

romantercero
Path Finder

Yeah, I've had this happen. How many GB is each indexer handling daily? A safe number is around 100GB/day per indexer.

0 Karma

richnavis
Contributor

More Info... I looked at a low indexing volume period (800MB/indexer) and we still saw 28 indexqueue blocking events...

0 Karma

richnavis
Contributor

Approx 30GB/day... However, this even happens at substantially lower indexing volumes.

0 Karma

MarioM
Motivator

Have you tried increasing the queue maxSize in splunk/etc/system/local/server.conf?

##########################################################################################
# Queue settings
##########################################################################################
[queue]
maxSize = [<integer>|<integer>[KB|MB|GB]]
        * Specifies default capacity of a queue.
        * If specified as a lone integer (for example, maxSize=1000), maxSize indicates the maximum number of events allowed
          in the queue.
        * If specified as an integer followed by KB, MB, or GB (for example, maxSize=100MB), it indicates the maximum
          RAM allocated for queue.
        * The default is 500KB.

[queue=<queueName>]
maxSize = [<integer>|<integer>[KB|MB|GB]]
        * Specifies the capacity of a queue. It overrides the default capacity specified in [queue].
        * If specified as a lone integer (for example, maxSize=1000), maxSize indicates the maximum number of events allowed
          in the queue.
        * If specified as an integer followed by KB, MB, or GB (for example, maxSize=100MB), it indicates the maximum
          RAM allocated for queue.
        * The default is inherited from maxSize value specified in [queue]
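So on each indexer, something like the following in server.conf should bump the specific queues you are seeing blocked. This is only a sketch: I'm guessing at the camel-case queue names (verify them on your system, e.g. with btool or metrics.log), and the sizes are just examples, not recommendations:

[queue=indexQueue]
maxSize = 10MB

[queue=typingQueue]
maxSize = 5MB

A restart of splunkd is needed for the new queue sizes to take effect.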

richnavis
Contributor

The interesting part is that if you look at disk queueing, disk response times, and IOPS, there is not much to indicate a disk bottleneck... Queueing is less than 1, response time is sub-20ms, and IOPS are less than 100... We tested the disks before installing Splunk and were able to reach upwards of 3000 IOPS... Of note: these machines are virtualized, but are not sharing resources with other servers... essentially dedicated from a server AND SAN perspective...
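For what it's worth, another way to watch this is to chart how full each queue gets straight from metrics.log rather than relying on the disk counters. A sketch; depending on the Splunk version the fields are current_size/max_size or current_size_kb/max_size_kb, so adjust accordingly:

index=_internal source=*metrics.log* sourcetype=splunkd group=queue
| eval fill_pct = round(current_size / max_size * 100, 1)
| timechart span=15m max(fill_pct) by name

A queue that sits near 100% while the ones downstream of it stay low points at where the pipeline is backing up.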

0 Karma

mikelanghorst
Motivator

With that many blocking messages a day, I'd be concerned that a larger queue size would just delay the issue, since it seems the data isn't getting written out to disk quickly enough.

0 Karma

ntguru5
New Member

Thanks! I bumped indexqueue to 2000 and will look into increasing any others.
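To confirm the new values are actually picked up, btool can show the effective queue settings (a sketch, run from the Splunk bin directory):

splunk btool server list queue --debug

The --debug flag shows which .conf file each effective value comes from.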

0 Karma