Getting Data In

Blocking on indexers

ntguru5
New Member

I am seeing a lot of blocking on my three indexers, in the range of 500-1000 blocked-queue messages a day per host. The heaviest are indexqueue and typingqueue, followed by aggqueue; splunktcpin is in the double-digit range.
The indexes are striped across all three indexers. I'm at a loss on where to begin looking. Has anyone else run into this kind of blocking on their Splunk indexers?
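For reference, I've been counting the blocking with a search against the indexers' own metrics.log. This is just a rough sketch; it assumes the default _internal index and the standard group=queue metric lines:

index=_internal source=*metrics.log* sourcetype=splunkd group=queue blocked=true
| timechart span=1h count by name

That gives a per-queue count of blocked intervals over time, which is where the 500-1000/day figure comes from.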

0 Karma

richnavis
Contributor

I believe I've identified the root cause as slow disk for the cold DB. Our configuration is to have the hot/warm DBs on locally attached (virtually, anyway) disks, and point the cold DBs to a CIFS share on a NetApp... so indexes.conf looks something like this...

[databases]

coldPath = \\netapp\splunk\SplunkIndex02\DATA_2\databases\colddb

homePath = F:\CustomIndex\DATA_2\databases\db

thawedPath = \\netapp\splunk\SplunkIndex02\DATA_2\databases\thaweddb

maxWarmDBCount = 32

So, I created another locally attached drive and used it as the coldPath on ONE of the three indexers we have. After 4 hours, we have not seen ANY blocking on the indexer with the "locally attached" drive, while the other indexers continue to see blocking at the same rate as before. In this particular case, the slow disk was the cold DB. If there were a way to have Splunk roll buckets to cold on a schedule, rather than constantly, this would not be a problem.
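As far as I can tell there is no schedule-based roll; warm buckets roll to cold when limits in indexes.conf are hit, so the workaround would be to let more data stay in warm on the local disk before it rolls. A rough sketch of what I mean (the values are placeholders, not tested recommendations, and the settings are as I read them in indexes.conf.spec):

[databases]
homePath = F:\CustomIndex\DATA_2\databases\db
coldPath = \\netapp\splunk\SplunkIndex02\DATA_2\databases\colddb
thawedPath = \\netapp\splunk\SplunkIndex02\DATA_2\databases\thaweddb
# Allow more warm buckets to accumulate on the local disk before rolling to cold
maxWarmDBCount = 300
# Optional safety cap on the home path; buckets roll to cold once this is exceeded
homePath.maxDataSizeMB = 200000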

romantercero
Path Finder

Yeah, I've had this happen. How many GB is each indexer handling daily? A safe number is around 100GB/day per indexer.

0 Karma

richnavis
Contributor

More Info... I looked at a low indexing volume period (800MB/indexer) and we still saw 28 indexqueue blocking events...

0 Karma

richnavis
Contributor

Approx 30GB/day... However, this even happens at substantially lower indexing volumes.

0 Karma

MarioM
Motivator

Have you tried increasing the queue maxSize in splunk/etc/system/local/server.conf?

##########################################################################################
# Queue settings
##########################################################################################
[queue]
maxSize = [<integer>|<integer>[KB|MB|GB]]
        * Specifies default capacity of a queue.
        * If specified as a lone integer (for example, maxSize=1000), maxSize indicates the maximum number of events allowed
          in the queue.
        * If specified as an integer followed by KB, MB, or GB (for example, maxSize=100MB), it indicates the maximum
          RAM allocated for queue.
        * The default is 500KB.

[queue=<queueName>]
maxSize = [<integer>|<integer>[KB|MB|GB]]
        * Specifies the capacity of a queue. It overrides the default capacity specified in [queue].
        * If specified as a lone integer (for example, maxSize=1000), maxSize indicates the maximum number of events allowed
          in the queue.
        * If specified as an integer followed by KB, MB, or GB (for example, maxSize=100MB), it indicates the maximum
          RAM allocated for queue.
        * The default is inherited from maxSize value specified in [queue]
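So on each indexer, something like the following in server.conf should bump the specific queues you are seeing blocked. This is only a sketch: I'm guessing at the camel-case queue names (verify them on your system, e.g. with btool or metrics.log), and the sizes are just examples, not recommendations:

[queue=indexQueue]
maxSize = 10MB

[queue=typingQueue]
maxSize = 5MB

A restart of splunkd is needed for the new queue sizes to take effect.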

richnavis
Contributor

The interesting part is that if you look at disk queueing, disk response times, and IOPS, there is not much to indicate a disk bottleneck... Queueing is less than 1, response time is sub-20ms, and IOPS are less than 100... We tested the disks before installing Splunk and were able to reach upwards of 3000 IOPS... Of note: these machines are virtualized, but are not sharing resources with other servers... essentially dedicated from a server AND SAN perspective...
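For what it's worth, another way to watch this is to chart how full each queue gets straight from metrics.log rather than relying on the disk counters. A sketch; depending on the Splunk version the fields are current_size/max_size or current_size_kb/max_size_kb, so adjust accordingly:

index=_internal source=*metrics.log* sourcetype=splunkd group=queue
| eval fill_pct = round(current_size / max_size * 100, 1)
| timechart span=15m max(fill_pct) by name

A queue that sits near 100% while the ones downstream of it stay low points at where the pipeline is backing up.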

0 Karma

mikelanghorst
Motivator

With that many blocking messages a day, I'd be concerned that a larger queue size would just delay the issue, since it seems the data isn't getting written out to disk quickly enough.

0 Karma

ntguru5
New Member

Thanks! I bumped indexqueue to 2000 and will look into increasing any others.
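To confirm the new values are actually picked up, btool can show the effective queue settings (a sketch, run from the Splunk bin directory):

splunk btool server list queue --debug

The --debug flag shows which .conf file each effective value comes from.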

0 Karma