Getting Data In

Blocking on indexers

ntguru5
New Member

I am seeing a lot of blocking on my three indexers, in the range of 500-1000 a day per host. The heaviest are indexqueue and typingqueue, followed by aggqueue; splunktcpin is in the double-digit range.
The indexes are striped across all three indexers. I'm at a loss as to where to begin looking. Has anyone else had this issue with blocking on their Splunk indexers?

0 Karma

richnavis
Contributor

I believe I've identified the root cause: slow disk for the cold DB. Our configuration has the hot/warm DBs on locally attached (virtually, anyway) disks and points the cold DBs to a CIFS share on a NetApp, so indexes.conf looks something like this:

[databases]

coldPath = \netapp\splunk\SplunkIndex02\DATA_2\databases\colddb

homePath = F:\CustomIndex\DATA_2\databases\db

thawedPath = \netapp\splunk\SplunkIndex02\DATA_2\databases\thaweddb

maxWarmDBCount = 32

So I created another locally attached drive and used it as the coldPath on ONE of our three indexers. After 4 hours, we have not seen ANY blocking on the indexer with the "locally attached" drive, while the other indexers continue to see blocking at the same rate as before. In this particular case, the slow disk was the cold DB. If there were a way to have Splunk roll the files to cold on a schedule, rather than constantly, this would not be a problem.
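
As a rough sketch, the stanza on the test indexer would differ from the one above only in coldPath. The G: drive letter and path below are hypothetical placeholders, not taken from the real configuration; everything else is copied from the stanza already shown.

```
[databases]
# Hot/warm buckets stay on the existing locally attached disk
homePath = F:\CustomIndex\DATA_2\databases\db
# Hypothetical local drive replacing the NetApp CIFS share for cold buckets
coldPath = G:\CustomIndex\DATA_2\databases\colddb
# Thawed path left unchanged
thawedPath = \netapp\splunk\SplunkIndex02\DATA_2\databases\thaweddb
maxWarmDBCount = 32
```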

romantercero
Path Finder

Yeah, I've had this happen. How many GB is each indexer handling daily? A safe number is 100GB.

0 Karma

richnavis
Contributor

More info: I looked at a low indexing volume period (800MB/indexer) and we still saw 28 indexqueue blocking events.

0 Karma

richnavis
Contributor

Approx. 30GB/day. However, this happens even at substantially lower indexing volumes.

0 Karma

MarioM
Motivator

Have you tried increasing the queue maxSize in splunk/etc/system/local/server.conf? From the spec:

##########################################################################################
# Queue settings
##########################################################################################
[queue]
maxSize = [<integer>|<integer>[KB|MB|GB]]
        * Specifies default capacity of a queue.
        * If specified as a lone integer (for example, maxSize=1000), maxSize indicates the maximum number of events allowed
          in the queue.
        * If specified as an integer followed by KB, MB, or GB (for example, maxSize=100MB), it indicates the maximum
          RAM allocated for queue.
        * The default is 500KB.

[queue=<queueName>]
maxSize = [<integer>|<integer>[KB|MB|GB]]
        * Specifies the capacity of a queue. It overrides the default capacity specified in [queue].
        * If specified as a lone integer (for example, maxSize=1000), maxSize indicates the maximum number of events allowed
          in the queue.
        * If specified as an integer followed by KB, MB, or GB (for example, maxSize=100MB), it indicates the maximum
          RAM allocated for queue.
        * The default is inherited from maxSize value specified in [queue]
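
As an illustration of the per-queue form, a minimal sketch of an override in server.conf might look like the following. The stanza names assume the queues are called indexQueue and typingQueue in server.conf, and the sizes are arbitrary example values, not recommendations.

```
# splunk/etc/system/local/server.conf
# Assumed queue names; verify them against your own deployment.

[queue=indexQueue]
# Size given as RAM rather than as an event count (example value only)
maxSize = 10MB

[queue=typingQueue]
maxSize = 10MB
```

Queues without their own stanza keep the default from [queue].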

richnavis
Contributor

The interesting part is that if you look at disk queueing, disk response times and IOPS, there is not much to indicate a disk bottleneck. Queueing is less than 1, response time is sub-20ms, and IOPS are less than 100. We tested the disks before installing Splunk and were able to reach upwards of 3000 IOPS. Of note: these machines are virtualized, but are not sharing resources with other servers. They are essentially dedicated from a server AND SAN perspective.

0 Karma

mikelanghorst
Motivator

Seeing that many messages a day, I would be concerned that a larger queue size would just delay the issue, since it seems the data isn't being written to disk quickly enough.

0 Karma

ntguru5
New Member

Thanks! I bumped indexqueue to 2000 and will look into increasing any others.

0 Karma